Yes, but for transcriptions of hour-long material like podcasts, interviews, speeches, essays, and long emails. So it would be more like castingwords.com, but faster and cheaper because it uses speech recognition.
You mean like Twilio or lavarockhq.com? I know it's hard but I consider that an advantage. But I'm still brainstorming, exploring existing solutions, and working on a very basic demo to see if it can be done with open-source software.
Not like either, except for the speech component. Yours can be done with open-source speech software, but the accuracy will be bad unless you buy or build a good acoustic model. Very painful.
Yes, but the acoustic model (or models, if you account for male/female voices and different accents) can be improved with all the input the service gets over time. The poor accuracy can be corrected by human workers through Amazon Mechanical Turk.
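
To make that loop concrete, here is a rough sketch of how it might work, assuming the recognizer reports a confidence score per segment. Everything in it is hypothetical: the `Segment` type, the 0.85 threshold, and both function names are made up for illustration, and the step of actually posting tasks to Mechanical Turk is left out (the `corrections` dict stands in for whatever the workers send back).

```python
from dataclasses import dataclass


@dataclass
class Segment:
    audio_id: str      # hypothetical identifier for a slice of audio
    text: str          # the recognizer's best guess
    confidence: float  # assumed to be in the range 0.0 to 1.0


def route_segments(segments, threshold=0.85):
    """Split segments into auto-accepted ones and ones needing human review.

    The threshold is an arbitrary placeholder, not a recommended value.
    """
    accepted, needs_review = [], []
    for seg in segments:
        if seg.confidence >= threshold:
            accepted.append(seg)
        else:
            needs_review.append(seg)
    return accepted, needs_review


def apply_corrections(needs_review, corrections):
    """Merge human corrections into the reviewed segments.

    `corrections` maps audio_id -> corrected text (stand-in for worker output).
    Returns the final segments plus (audio_id, text) pairs that could later
    be used as training data for the acoustic model.
    """
    final, training_pairs = [], []
    for seg in needs_review:
        if seg.audio_id in corrections:
            fixed = corrections[seg.audio_id]
            final.append(Segment(seg.audio_id, fixed, 1.0))
            training_pairs.append((seg.audio_id, fixed))
        else:
            # No correction came back; keep the recognizer's guess as-is.
            final.append(seg)
    return final, training_pairs
```

The idea is that only the low-confidence segments cost money for human review, and each corrected segment doubles as a labeled example for retraining later.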