

woodson on May 21, 2021 | on: Voice2json: Offline speech and intent recognition ...

Then you need a lot of people to listen to those 12B hours of audio, with multiple listeners agreeing, for each chunk of audio, that what is spoken corresponds to the transcript.
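
The agreement step described in this comment can be sketched roughly as follows. This is only an illustration: the function name `accepted_chunks`, the vote format, and both thresholds are assumptions made for the example and are not taken from Mozilla's or any other project's actual validation pipeline.

```python
def accepted_chunks(votes, min_listeners=3, min_agreement=2 / 3):
    """Return the chunk ids whose transcript enough listeners confirmed.

    votes: dict mapping chunk id -> list of booleans, one per listener,
    where True means "what is spoken matches the transcript".
    Both thresholds are hypothetical values chosen for illustration.
    """
    accepted = []
    for chunk_id, chunk_votes in votes.items():
        if len(chunk_votes) < min_listeners:
            continue  # too few independent listeners so far
        share_yes = sum(chunk_votes) / len(chunk_votes)
        if share_yes >= min_agreement:
            accepted.append(chunk_id)
    return accepted


votes = {
    "clip-a": [True, True, True],
    "clip-b": [True, False, False],
    "clip-c": [True, True],  # only two listeners
}
print(accepted_chunks(votes))
```

Every clip needs several independent votes under a scheme like this, which is why the listening effort grows as a multiple of the total audio duration.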

londons_explore on May 21, 2021

Lots of machine learning systems can use unsupervised and semi-supervised learning. Then nobody has to listen to and annotate all that audio.

woodson on May 22, 2021

Yes, but then you don't need Mozilla collecting read speech samples. You can just scrape any audio out there, run speech activity detection, and there you go.
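
As a rough idea of what "run speech activity detection" can mean in its simplest form, here is a minimal energy-threshold sketch. It is a deliberately naive stand-in: practical detectors are usually model-based, and the function name `speech_segments`, the frame length, and the threshold are all assumptions made for this example.

```python
import math


def speech_segments(samples, frame_len=160, threshold=0.1):
    """Return (start, end) sample-index pairs for runs of high-energy frames.

    samples: sequence of floats in [-1.0, 1.0].
    frame_len and threshold are hypothetical values; a trailing partial
    frame is ignored.
    """
    segments = []
    start = None
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        if rms >= threshold:
            if start is None:
                start = i  # a loud run begins here
        elif start is not None:
            segments.append((start, i))  # the loud run ended at this frame
            start = None
    if start is not None:
        # Audio ended while still inside a loud run.
        segments.append((start, (len(samples) // frame_len) * frame_len))
    return segments


# Two silent frames, two loud frames, one silent frame.
audio = [0.0] * 320 + [0.5] * 320 + [0.0] * 160
print(speech_segments(audio))
```

A plain energy threshold also fires on music and background noise, so it only approximates "speech" activity; it is shown here because it fits in a few lines.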
