In my experience, ASR-to-NER pipelines don't perform adequately out of the box. Even though SOTA ASR systems claim 85% word accuracy, the distribution of errors is worth looking into. Critical entities like credit card numbers or addresses are particularly error-prone, and even a single wrong word or digit there renders the result useless.
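To make the point about error distribution concrete, here is a minimal sketch comparing overall word accuracy with entity-level accuracy on one utterance. The transcripts, the entity list, and the helper names (`word_accuracy`, `entity_accuracy`) are made up for illustration and do not come from any particular ASR toolkit.

```python
def word_errors(ref: str, hyp: str) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    r, h = ref.split(), hyp.split()
    prev = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        cur = [i]
        for j, hw in enumerate(h, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (rw != hw)))  # match or substitution
        prev = cur
    return prev[-1]


def word_accuracy(ref: str, hyp: str) -> float:
    """1 - WER, measured against the reference length."""
    return 1 - word_errors(ref, hyp) / len(ref.split())


def entity_accuracy(entities: list[str], hyp: str) -> float:
    """Fraction of reference entities that survive verbatim in the hypothesis."""
    padded = f" {hyp} "
    return sum(f" {e} " in padded for e in entities) / len(entities)


# Hypothetical example: a single misrecognized word ("too" for "two").
REF = ("please charge my card number four one one one two two two two "
       "to the billing address twelve oak street")
HYP = ("please charge my card number four one one one two two too two "
       "to the billing address twelve oak street")
ENTITIES = ["four one one one two two two two", "twelve oak street"]

print(f"word accuracy:   {word_accuracy(REF, HYP):.2f}")        # 0.95
print(f"entity accuracy: {entity_accuracy(ENTITIES, HYP):.2f}")  # 0.50
```

One wrong word out of twenty still gives 95% word accuracy, but the card number is unusable, so half of the entities in this toy utterance are lost.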
These ASR errors cascade into the NER step, further degrading recall and precision. Combining ASR and NER into a joint model or integrated approach can reduce these issues in theory, but it is more complex to implement and less commonly used.
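The cascade can be illustrated with a toy pipeline. The sketch below assumes a deliberately simple rule-based extractor (`extract_digit_runs`, a hypothetical stand-in for a real NER model) that picks up runs of spoken digits, and scores it on a clean reference transcript versus an ASR hypothesis with one misrecognized word.

```python
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}


def extract_digit_runs(transcript: str, min_len: int = 4) -> set[str]:
    """Toy 'NER': return digit strings for runs of min_len+ spoken digit words."""
    runs, current = set(), []
    # The trailing empty string is a sentinel that flushes the final run.
    for word in transcript.lower().split() + [""]:
        if word in DIGITS:
            current.append(DIGITS[word])
        else:
            if len(current) >= min_len:
                runs.add("".join(current))
            current = []
    return runs


def precision_recall(predicted: set[str], gold: set[str]) -> tuple[float, float]:
    """Exact-match precision and recall over extracted entity strings."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical data: the ASR output differs from the reference by one word.
GOLD = {"41112222"}
REFERENCE = "my card number is four one one one two two two two"
ASR_OUTPUT = "my card number is four one one one two two too two"

print(precision_recall(extract_digit_runs(REFERENCE), GOLD))   # (1.0, 1.0)
print(precision_recall(extract_digit_runs(ASR_OUTPUT), GOLD))  # (0.0, 0.0)
```

On the ASR output the extractor returns the truncated number `411122`, which is both a false positive and a miss, so one upstream error hurts precision and recall at the same time. A joint model would not necessarily avoid this; the sketch only shows why the two-stage setup is fragile.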