
Sounds to me like you're describing Laplace smoothing in a Naive Bayes classifier. This is a pretty standard technique for avoiding the problem you're seeing, where the probability comes out as 100% or 0% because the model has never seen a particular feature value.
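A minimal sketch of add-one (Laplace) smoothing for one class's feature likelihoods; the function and variable names here are my own, not from the thread:

```python
from collections import Counter

def laplace_likelihoods(feature_counts, vocab, alpha=1.0):
    """Per-feature likelihoods with add-alpha smoothing.

    feature_counts: Counter of feature -> count within one class.
    vocab: all possible features, so unseen ones still get mass.
    alpha: pseudocount added to every feature (1.0 = classic Laplace).
    """
    total = sum(feature_counts.values())
    v = len(vocab)
    return {f: (feature_counts[f] + alpha) / (total + alpha * v)
            for f in vocab}

# Features seen in a hypothetical "spam" class; "prize" never appears.
spam_counts = Counter({"free": 3, "winner": 1})
vocab = ["free", "winner", "prize"]
probs = laplace_likelihoods(spam_counts, vocab)
# Unsmoothed, P("prize" | spam) would be 0 and zero out the whole product.
```

With alpha=1 and a three-word vocabulary, "prize" gets (0+1)/(4+3) = 1/7 instead of 0, and the likelihoods still sum to 1.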


More generally, it's equivalent to applying a Dirichlet prior with uninformative parameters (i.e. a symmetric Dirichlet, where all of the concentration parameters are equal).

This is important, because while adding a single pseudocount to each column will prevent zero probabilities, it's probably not reflective of the true distribution of values. If instead you add pseudocounts using a Dirichlet whose parameters are set from prior knowledge, you can often improve the performance of the classifier (especially in low-count situations) without unfairly biasing the results.
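To make the uniform-vs-informative distinction concrete, here is a small sketch of the Dirichlet posterior mean for multinomial probabilities; the example alphas are invented for illustration:

```python
def dirichlet_posterior_mean(counts, alphas):
    """Posterior mean of multinomial probabilities under Dirichlet(alphas).

    counts: observed count per category.
    alphas: concentration parameters. All equal -> uniform pseudocounts
            (Laplace smoothing); unequal -> an informative prior.
    """
    total = sum(counts) + sum(alphas)
    return [(c + a) / total for c, a in zip(counts, alphas)]

counts = [3, 1, 0]  # sparse observations; last category never seen
uniform = dirichlet_posterior_mean(counts, [1.0, 1.0, 1.0])
# Informative prior: pseudocounts proportional to some assumed
# background frequencies rather than all equal.
informed = dirichlet_posterior_mean(counts, [0.5, 0.3, 0.2])
```

Both versions give the unseen category nonzero mass, but the informative prior lets prior knowledge shape how much, which matters most when counts are low.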


Thanks, yes, that's what it's called.

All due credit to Laplace for the technique, but the word "smoothing" makes me wince, because it makes it sound as though this is some artificial approximation. Under the assumption of a uniform prior on the underlying probability, (n+1)/(m+2) really _is_ the exact probability of the event repeating after n successes in m trials. Like I said, you can confirm this experimentally with a quick program.
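Here's one way that quick program might look (my sketch, not the commenter's): draw a bias uniformly at random, keep only runs that match the observed count, and see how often the next trial succeeds.

```python
import random

def rule_of_succession_sim(n, m, draws=100_000, seed=0):
    """Estimate P(next success | n successes in m trials), uniform prior on p."""
    rng = random.Random(seed)
    kept = next_success = 0
    while kept < draws:
        p = rng.random()                       # uniform prior over the bias
        successes = sum(rng.random() < p for _ in range(m))
        if successes != n:                     # condition on the observed count
            continue
        kept += 1
        next_success += rng.random() < p       # simulate one more trial
    return next_success / kept

est = rule_of_succession_sim(n=3, m=4)
exact = (3 + 1) / (4 + 2)   # rule of succession: (n+1)/(m+2) = 2/3
```

The estimate converges on (n+1)/(m+2), matching the claim that this is the exact posterior predictive probability, not an approximation.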


There are other smoothing techniques, more prevalent in NLP, that I'm learning about in my NLP class; they distribute probability mass more evenly, but I don't think they'd really help in a classifier. Witten-Bell and Good-Turing are the ones I can think of off the top of my head.
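For comparison's sake, here's a minimal sketch of Witten-Bell smoothing for unigrams (my own simplified formulation, assuming a known finite vocabulary): it reserves probability mass T/(N+T) for unseen types, where N is the token count and T the number of distinct types seen, and splits that mass evenly over the unseen types.

```python
from collections import Counter

def witten_bell_unigram(tokens, vocab):
    """Witten-Bell smoothed unigram probabilities (minimal sketch)."""
    counts = Counter(tokens)
    n = sum(counts.values())          # total tokens observed
    t = len(counts)                   # distinct types observed
    z = len(vocab) - t                # types never observed

    def prob(w):
        if counts[w]:
            return counts[w] / (n + t)             # seen: discounted count
        return t / (z * (n + t)) if z else 0.0     # unseen: share of reserve

    return {w: prob(w) for w in vocab}

probs = witten_bell_unigram(["a", "a", "b"], vocab=["a", "b", "c", "d"])
```

Unlike a fixed pseudocount, the reserved mass here grows with the number of distinct types seen, which is the intuition behind using it for word sequences.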



