hey we have similar ideas :)
I made a focus helper thing as well: https://grgv.xyz/blog/awf/
webllm looks cool, but I would need to upgrade my laptop for it...
Exactly what I had in mind, you nailed it. With some focus on improving the UI (a list of premade prompts, or a simpler way to make them) it could become widely used. Also, I think using the OpenAI backend is a huge blocker. I wonder how a small quantized model running in the browser would perform. Sad thing is that for now WebGPU is not running in browsers on Linux. Maybe CPU inference is enough.
unfortunately only gpt-4 worked well in my experience; smaller models work only for blocking simple things like "cat videos page", but not for anything less trivial.
I have another proof-of-concept where a smaller model fails compared to gpt-4: https://grgv.xyz/blog/apc/