Hacker News

In my unmeasured empirical observation, Google has amazing speech recognition.


I tried feeding the four examples from this announcement into Google as dictation inputs and it just sits there blankly. Google understands the JFK speech test file in the repo perfectly. The samples in the announcement are clearly outside the capabilities of anything Google has launched publicly, but I don't know how that translates to overall utility in everyday applications.


I agree they have the best compared to Apple, Amazon, and Microsoft. However, I don't think it is as good as what OpenAI is showing here.


My experience with the APIs is that Google is excellent and Microsoft is slightly better. The offline model I've been using, which is nearly as good as both, is Facebook's wav2vec2-large-960h-lv60-self.

Don't believe what's on marketing pages; the results rarely transfer to the real world. I will have to make time to try it and see. In theory, given the task diversity and sheer number of hours, it should be a lot more robust, but I will wait on evidence before believing any SoTA claims.


Weird. I started working on an ASR SaaS in my spare time, and at least on the test podcasts, Google was the worst: https://www.sammaspeech.com/blogs/post/speech-recognition-ac...



