Unfortunately, only gpt-4 worked well in my experience. Smaller models worked well only for blocking simple things like a "cat videos page", but not for anything less trivial.
I have another proof-of-concept where a smaller model fails compared to gpt-4: https://grgv.xyz/blog/apc/