
The test sample is just a small, arbitrary sample from a universe of similar data.

You (probably) don't care about test-set performance per se but instead want to be able to claim that one model works better _in general_ than another. For that, you need to bust out the tools of statistical inference.
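One way to make that concrete is a paired bootstrap over the test examples. This is only a minimal sketch, not something from the thread: it assumes you have per-example 0/1 correctness for two models on the same test set, and the function name and parameters are hypothetical.

```python
import random

def paired_bootstrap_diff(correct_a, correct_b, n_resamples=2000, seed=0):
    """Return (observed_diff, (lo, hi)) for accuracy(A) - accuracy(B).

    correct_a and correct_b are per-example 0/1 outcomes for two models
    scored on the same test examples (an assumption of this sketch).
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(correct_a)
    rng = random.Random(seed)
    # Work with per-example differences so the pairing is preserved.
    diffs = [a - b for a, b in zip(correct_a, correct_b)]
    observed = sum(diffs) / n
    resampled = []
    for _ in range(n_resamples):
        # Resample test examples with replacement.
        sample = [diffs[rng.randrange(n)] for _ in range(n)]
        resampled.append(sum(sample) / n)
    resampled.sort()
    # Percentile interval covering roughly the middle 95% of resamples.
    lo = resampled[int(0.025 * n_resamples)]
    hi = resampled[int(0.975 * n_resamples) - 1]
    return observed, (lo, hi)
```

If the interval excludes zero, that is some evidence the gap would hold up on other samples from the same universe. It says nothing about data that the test set does not represent.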



The test set allows you to make this claim if it is representative of the universe of novel data the model will run on and there is no data leakage between the training and test sets.

That isn't always true, of course, especially for aggregate series over time, and statistical measures are of course used to report model performance. Still, a Bonferroni correction struck me as a weird place to apply this, though after the other comments from yesterday I saw where they were taking it.
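For reference, the Bonferroni correction itself is tiny. The sketch below assumes the reason for applying it is that several comparisons are run against the same test set; the thread doesn't spell that out, and the function name is hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one reject/keep decision per p-value.

    Each p-value is compared against alpha / m, where m is the number
    of comparisons. This keeps the family-wise error rate at or below alpha.
    """
    m = len(p_values)
    if m == 0:
        return []
    threshold = alpha / m
    return [p <= threshold for p in p_values]

# Three hypothetical model-vs-baseline comparisons on one test set.
decisions = bonferroni_reject([0.01, 0.02, 0.04])
print(decisions)  # [True, False, False]
```

With three comparisons the per-test threshold drops from 0.05 to about 0.0167, so only the first comparison still counts as significant.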



