Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

For you or anyone else reading this I recently ran across this video documenting setting up and using whisper. It's probably a little overdetailed, but I found the github docs a little underdetailed so might be useful. Whisper is pretty powerful. One of the more useful open source ai tools available right now.

https://www.youtube.com/watch?v=XX-ET_-onYU

But as you implied in your comment, it should be possible to do it quite well with any video by transcripting with whisper and then sending the text to gpt or another LLM to summarize.



I’ve done something similar here https://github.com/mcdallas/summarize it feeds an audio file to whisper and then summarizes the transcript. You can easily wrap it with yt-dlp to download the audio portion of a video


I also did the same but its a web app, https://github.com/mkagenius/audioGPT (i also have it hosted but I am afraid if i post the link, it would eat through all my credits)


I’m currently working on this with the caveat that I want to do the work locally. Using whisper but the summarization portion if this task is not straightforward given the limited context size of models.

Does anyone have any additional insight into this problem?


I'll check it out (or maybe let my script check it out first), thanks.

From what I remember the Whisper API docs weren't too bad, but I didn't try actually implementing anything, so you could be right that they're underdetailed.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: