Has your lab tried any of the newer causal inference–style evaluation methods? I mean things like interventional or counterfactual benchmarking, or using causal graphs to separate genuine reasoning gains from the effects of more data or more scale. I'm wondering whether that's something you've looked into yet, or whether it's still too experimental for practical benchmarking work.
