I'm surprised by the quality on non-English languages, given that 80+% of the training data is English, and the rest is split between tens of languages.
It's sometimes close to perfect, and sometimes goes off the rail; I think that maybe the model tries to establish some sort of consistency for each sentence; if starts wrong for the first few words of a sentence, it can't build the rest properly.
1. Make sure you're using a model that isn't suffixed with `.en` (`base`, not `base.en).
2. Use `model.transcribe(your_input_audio, language='Japanese', task='translate')` ... with the appropriate input language.
I'm surprised by the quality on non-English languages, given that 80+% of the training data is English, and the rest is split between tens of languages.