That's definitely one solution, but it still wouldn't quite capture it. As an extreme example, if rapper A produced 100 songs, each with exactly the same lyrics, they should surely be penalized compared with rapper B, who produced 100 songs with no words shared between them, even if rapper A's average unique-words-per-song is higher than rapper B's.
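To make the extreme case concrete, here is a rough sketch with made-up data. The function names and the toy lyrics are my own assumptions for illustration, not anything from the original analysis. It compares the per-song average against the number of distinct words across the whole body of work:

```python
def per_song_average(songs):
    """Mean number of distinct words per song."""
    return sum(len(set(song.split())) for song in songs) / len(songs)


def corpus_vocabulary(songs):
    """Number of distinct words across all songs combined."""
    return len({word for song in songs for word in song.split()})


# Hypothetical toy data (assumption, for illustration only):
# rapper A repeats the same 6-word song 100 times;
# rapper B writes 100 songs of 5 words each, with no word reused between songs.
rapper_a = ["one two three four five six"] * 100
rapper_b = [" ".join(f"w{i}_{j}" for j in range(5)) for i in range(100)]

print(per_song_average(rapper_a), per_song_average(rapper_b))    # 6.0 5.0
print(corpus_vocabulary(rapper_a), corpus_vocabulary(rapper_b))  # 6 500
```

On the per-song average rapper A comes out ahead (6.0 vs 5.0), but across all songs rapper B uses 500 distinct words to rapper A's 6, which is the gap the per-song measure hides.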