As I said, I could not read the whole thing. As I was skimming, I noticed the tests, tried to load the main page, and I was disconnected.
Once again, "words that are most unique" to a character is a parameter that can easily be counted from the set of ALL words with no sampling uncertainty because, yes, we have the population.
I think the idea is that what we are really trying to measure is something unobservable, like the underlying nature of the character or the writers' tendencies to give characters certain ways of speaking. We can say that Stan uses a word at a certain rate, corrected for that word's base rate in the corpus, and compare this with the rate for another character. If the difference in rates is very small, we still know for certain that the difference holds for this corpus, but it may not reflect any substantive difference between the characters.
If this is the view taken, then the population is all of the text that might have been generated by the data-generating process of the scripts -- things like the writers' mental models of the characters. In this view the actual scripts are just a sample from all of the scripts that could have been written while keeping the variable of interest (the characters' character) constant.
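To make that view concrete, here is a rough sketch of what treating the scripts as a sample could look like. Everything in it is made up for illustration: the toy lines, the word "dude", and the function names are my assumptions, not the analysis from the post. For brevity it compares raw per-token rates rather than base-rate-corrected ones, and it resamples lines with replacement (a plain bootstrap) as a stand-in for "scripts that could have been written".

```python
import random

# Hypothetical toy data: (speaker, line) pairs standing in for script lines.
LINES = [
    ("Stan", "dude this is pretty messed up"),
    ("Stan", "dude come on"),
    ("Stan", "oh my god dude"),
    ("Kyle", "dude that is not cool"),
    ("Kyle", "you are such a jerk"),
    ("Kyle", "this is not fair"),
]

def rate(lines, speaker, word):
    """Occurrences of `word` per token spoken by `speaker` (0.0 if no tokens)."""
    tokens = [t for s, text in lines if s == speaker for t in text.split()]
    return tokens.count(word) / len(tokens) if tokens else 0.0

def rate_difference(lines, a, b, word):
    """Difference in per-token rate of `word` between speakers `a` and `b`."""
    return rate(lines, a, word) - rate(lines, b, word)

def bootstrap_interval(lines, a, b, word, n_boot=2000, seed=0):
    """Percentile interval for the rate difference, resampling lines with replacement."""
    rng = random.Random(seed)
    diffs = sorted(
        rate_difference([rng.choice(lines) for _ in lines], a, b, word)
        for _ in range(n_boot)
    )
    return diffs[int(0.025 * n_boot)], diffs[int(0.975 * n_boot)]

# Exact for this corpus: no sampling uncertainty if the corpus is the population.
observed = rate_difference(LINES, "Stan", "Kyle", "dude")

# Uncertain if the corpus is one draw from the scripts that could have been written.
low, high = bootstrap_interval(LINES, "Stan", "Kyle", "dude")
print(f"observed difference: {observed:.3f}, bootstrap interval: [{low:.3f}, {high:.3f}]")
```

The observed difference is a fixed number either way. The interval only means something if you accept that the lines are draws from some larger process, which is exactly the point in dispute.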