For you or anyone else reading this, I recently ran across this video documenting...

sethgecko · on April 29, 2023

I’ve done something similar here: https://github.com/mcdallas/summarize. It feeds an audio file to Whisper and then summarizes the transcript. You can easily wrap it with yt-dlp to download the audio portion of a video.
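For anyone wanting to try the yt-dlp step, here is a minimal sketch of what such a wrapper might look like. It assumes yt-dlp and ffmpeg are installed and on the PATH; the function names, output template, and default format are illustrative choices of mine, not anything taken from the linked repo.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ytdlp_command(url: str, out_dir: str = ".", audio_format: str = "mp3") -> List[str]:
    """Build a yt-dlp command line that keeps only the audio track.

    Hypothetical helper: the name and defaults are assumptions for illustration.
    """
    return [
        "yt-dlp",
        "-x",                            # --extract-audio: drop the video stream
        "--audio-format", audio_format,  # convert the audio (needs ffmpeg)
        "-o", str(Path(out_dir) / "%(id)s.%(ext)s"),  # output filename template
        url,
    ]


def download_audio(url: str, out_dir: str = ".", audio_format: str = "mp3") -> None:
    """Run yt-dlp; raises CalledProcessError if the download fails."""
    subprocess.run(build_ytdlp_command(url, out_dir, audio_format), check=True)
```

Calling download_audio("https://example.com/watch?v=abc") should leave an audio file named after the video ID in the current directory, which can then be handed to whatever transcription and summarization tool you are using.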

mkagenius · on April 29, 2023

I also did the same, but it's a web app: https://github.com/mkagenius/audioGPT (I also have it hosted, but I'm afraid that if I post the link it would eat through all my credits).

cced · on April 29, 2023

I’m currently working on this, with the caveat that I want to do the work locally. I'm using Whisper, but the summarization portion of this task is not straightforward given the limited context size of models.

Does anyone have any additional insight into this problem?
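One common workaround for the context limit is to split the transcript into overlapping chunks, summarize each chunk, and then summarize the combined partial summaries. Below is a rough sketch of that idea, not a definitive implementation. It assumes word count is an acceptable stand-in for token count, and `summarize` is a hypothetical callable you would supply yourself (for example, a wrapper around a local model); none of these names come from the thread.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int, overlap: int = 0) -> List[str]:
    """Split text into chunks of at most max_words words, sharing `overlap` words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reached the end of the text
    return chunks


def summarize_long_text(
    text: str,
    summarize: Callable[[str], str],  # assumed: takes text, returns a shorter text
    max_words: int = 500,
    overlap: int = 50,
) -> str:
    """Summarize each chunk, then summarize the joined partial summaries."""
    chunks = split_into_chunks(text, max_words, overlap)
    if not chunks:
        return ""
    if len(chunks) == 1:
        return summarize(chunks[0])
    combined = " ".join(summarize(chunk) for chunk in chunks)
    n_combined = len(combined.split())
    # Repeat the process only if it is still too long AND actually got shorter,
    # so a summarizer that fails to shrink its input cannot loop forever.
    if max_words < n_combined < len(text.split()):
        return summarize_long_text(combined, summarize, max_words, overlap)
    return summarize(combined)
```

The overlap is there so that a sentence cut at a chunk boundary still appears whole in one of the chunks. The trade-off is that detail gets lost at every level of summarization, so larger chunks (as large as your model's context allows) tend to give better results.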

messe · on April 29, 2023

I'll check it out (or maybe let my script check it out first), thanks.

From what I remember the Whisper API docs weren't too bad, but I didn't try actually implementing anything, so you could be right that they're underdetailed.