
Gad, they sure like to say "BM25" over and over again. That's a near worthless approach to result ranking. Doing any halfway ok job requires much more tuned and/or more powerful approaches.


It's common to do a hybrid of BM25 with other fuzzy search or pgvector.
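One common way to combine the two result lists is reciprocal rank fusion (RRF). A rough sketch, assuming you already have a BM25 ranking and a vector ranking as lists of document ids; the function name, the ids, and k=60 are illustrative choices, not anything from the article:

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc ids into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every doc it returns
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism
    return sorted(scores, key=lambda d: (-scores[d], d))

bm25_hits = ["d3", "d1", "d7"]    # hypothetical lexical ranking
vector_hits = ["d1", "d9", "d3"]  # hypothetical embedding ranking
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

RRF only uses rank positions, so it sidesteps the problem that BM25 scores and vector distances live on different scales.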


BM25 is quite bad and has to be re-fit (term statistics, plus k1 and b tuning) for each new corpus. SPLADEv2 is much better, and there are even better sparse embeddings these days.
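For context on the per-corpus point, here is a minimal BM25 sketch. It assumes whitespace tokenization and the non-negative IDF variant (the one with log(1 + ...)); k1=1.2 and b=0.75 are just commonly used defaults. The document frequencies and the average document length come from the corpus itself, so scores are not comparable across corpora:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every doc against the query with a plain BM25 formula."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    # Corpus-level statistics: average length and per-term document frequency
    avgdl = sum(len(t) for t in tokenized) / n
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization is controlled by b, tf saturation by k1
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

Nothing here is learned in the neural sense, but idf, avgdl, k1, and b all depend on or get tuned against the specific collection.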


Can you please elaborate why?



