
Sounds to me like you're describing Laplace smoothing in a Naive Bayes classifier. This is a pretty standard technique for avoiding the problem you're seeing, where the probability comes out as 100% or 0% because the model has never seen a particular feature value.
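A minimal sketch of add-one (Laplace) smoothing for one class's feature likelihoods; the function and variable names here are my own, not from the thread:

```python
from collections import Counter

def laplace_likelihoods(feature_counts, vocab, alpha=1.0):
    """Per-feature likelihoods with add-alpha smoothing.

    feature_counts: Counter of feature -> count within one class.
    vocab: all possible features, so unseen ones still get mass.
    alpha: pseudocount added to every feature (1.0 = classic Laplace).
    """
    total = sum(feature_counts.values())
    v = len(vocab)
    return {f: (feature_counts[f] + alpha) / (total + alpha * v)
            for f in vocab}

# Features seen in a hypothetical "spam" class; "prize" never appears.
spam_counts = Counter({"free": 3, "winner": 1})
vocab = ["free", "winner", "prize"]
probs = laplace_likelihoods(spam_counts, vocab)
# Unsmoothed, P("prize" | spam) would be 0 and zero out the whole product.
```

With alpha=1 and a three-word vocabulary, "prize" gets (0+1)/(4+3) = 1/7 instead of 0, and the likelihoods still sum to 1.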


More generally, it's equivalent to applying a Dirichlet prior with uninformative parameters (i.e. a symmetric Dirichlet, where all of the concentration parameters are equal).

This is important, because while adding a single pseudocount to each column will prevent zero probabilities, it's probably not reflective of the true distribution of values. If instead you add pseudocounts using a Dirichlet whose parameters are set from prior knowledge, you can often improve the performance of the classifier (especially in low-count situations) without unfairly biasing the results.
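To make the uniform-vs-informative distinction concrete, here is a small sketch of the Dirichlet posterior mean for multinomial probabilities; the example alphas are invented for illustration:

```python
def dirichlet_posterior_mean(counts, alphas):
    """Posterior mean of multinomial probabilities under Dirichlet(alphas).

    counts: observed count per category.
    alphas: concentration parameters. All equal -> uniform pseudocounts
            (Laplace smoothing); unequal -> an informative prior.
    """
    total = sum(counts) + sum(alphas)
    return [(c + a) / total for c, a in zip(counts, alphas)]

counts = [3, 1, 0]  # sparse observations; last category never seen
uniform = dirichlet_posterior_mean(counts, [1.0, 1.0, 1.0])
# Informative prior: pseudocounts proportional to some assumed
# background frequencies rather than all equal.
informed = dirichlet_posterior_mean(counts, [0.5, 0.3, 0.2])
```

Both versions give the unseen category nonzero mass, but the informative prior lets prior knowledge shape how much, which matters most when counts are low.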


Thanks, yes, that's what it's called.

All due credit to Laplace for the technique, but the word "smoothing" makes me wince, because it makes it sound as though this is some artificial approximation. Under the assumption of a uniform prior on the underlying probability, (n+1)/(m+2) really _is_ the exact probability of the event repeating after n successes in m trials. Like I said, you can confirm this experimentally with a quick program.
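Here's one way that quick program might look (my sketch, not the commenter's): draw a bias uniformly at random, keep only runs that match the observed count, and see how often the next trial succeeds.

```python
import random

def rule_of_succession_sim(n, m, draws=100_000, seed=0):
    """Estimate P(next success | n successes in m trials), uniform prior on p."""
    rng = random.Random(seed)
    kept = next_success = 0
    while kept < draws:
        p = rng.random()                       # uniform prior over the bias
        successes = sum(rng.random() < p for _ in range(m))
        if successes != n:                     # condition on the observed count
            continue
        kept += 1
        next_success += rng.random() < p       # simulate one more trial
    return next_success / kept

est = rule_of_succession_sim(n=3, m=4)
exact = (3 + 1) / (4 + 2)   # rule of succession: (n+1)/(m+2) = 2/3
```

The estimate converges on (n+1)/(m+2), matching the claim that this is the exact posterior predictive probability, not an approximation.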


There are other smoothing techniques, more prevalent in NLP, that I'm learning about in my NLP class; they distribute probability mass more evenly, but I don't think they'd really help in a classifier. Witten-Bell and Good-Turing are the ones I can think of off the top of my head.
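For comparison's sake, here's a minimal sketch of Witten-Bell smoothing for unigrams (my own simplified formulation, assuming a known finite vocabulary): it reserves probability mass T/(N+T) for unseen types, where N is the token count and T the number of distinct types seen, and splits that mass evenly over the unseen types.

```python
from collections import Counter

def witten_bell_unigram(tokens, vocab):
    """Witten-Bell smoothed unigram probabilities (minimal sketch)."""
    counts = Counter(tokens)
    n = sum(counts.values())          # total tokens observed
    t = len(counts)                   # distinct types observed
    z = len(vocab) - t                # types never observed

    def prob(w):
        if counts[w]:
            return counts[w] / (n + t)             # seen: discounted count
        return t / (z * (n + t)) if z else 0.0     # unseen: share of reserve

    return {w: prob(w) for w in vocab}

probs = witten_bell_unigram(["a", "a", "b"], vocab=["a", "b", "c", "d"])
```

Unlike a fixed pseudocount, the reserved mass here grows with the number of distinct types seen, which is the intuition behind using it for word sequences.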



