That doesn't account for any overlaps in tracking data for groups of users.
Instead of a single per-user unique value, I could use several values that each track a different group of users. The set of values together would uniquely identify a user, but any two PDFs would have at least one shared group value present in both.
Using your method, leaking a single PDF would only identify a group containing the two users whose PDFs you compared.
If the groups are re-randomized for each new article, though, every PDF you leak further identifies you as the common member of the leaking groups.
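A toy simulation of that narrowing effect (all numbers and names here are hypothetical, just to illustrate the mechanic): each article randomly partitions users into groups, each PDF embeds only its owner's group ID, and intersecting the groups revealed by successive leaks shrinks the suspect pool toward the one common member.

```python
import random

NUM_USERS = 1000
NUM_GROUPS = 8
LEAKER = 123  # the user who keeps leaking PDFs

rng = random.Random(0)

def make_partition():
    """Assign every user to a random group for one article."""
    return [rng.randrange(NUM_GROUPS) for _ in range(NUM_USERS)]

suspects = set(range(NUM_USERS))
for article in range(5):
    partition = make_partition()
    leaked_group = partition[LEAKER]  # the group ID read out of the leaked PDF
    # Only users assigned to the leaked group remain consistent with this leak.
    suspects &= {u for u in range(NUM_USERS) if partition[u] == leaked_group}
    print(f"leak {article + 1}: {len(suspects)} suspects remain")

# The leaker survives every intersection; an unrelated user survives each
# leak only with probability 1/NUM_GROUPS, so bystanders drop out fast.
assert LEAKER in suspects
```

With a thousand users and eight groups, each leak cuts the suspect set by roughly a factor of eight, so a handful of leaks is typically enough to single someone out.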
This opens up the opportunity for some kind of distributed file-submission tool where you compare hashes of segments of your document against everyone else's in some zero-knowledge way. No actual piracy would happen until enough people had submitted their document information for the system to assemble a de-DRMed copy.
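A crude sketch of the segment-comparison step (not actually zero-knowledge, since plain hashes still leak which segments match, and the toy documents below are made up), showing how comparing per-segment hashes locates the watermark sites without anyone sharing document content:

```python
import hashlib

def segment_hashes(doc: bytes, seg_size: int):
    """Hash fixed-size segments of a document; participants share only these."""
    return [hashlib.sha256(doc[i:i + seg_size]).hexdigest()
            for i in range(0, len(doc), seg_size)]

# Three toy watermarked copies: identical except for a per-user marker.
copies = [
    b"AAAA-u01-BBBB-CCCC",
    b"AAAA-u02-BBBB-CCCC",
    b"AAAA-u03-BBBB-CCCC",
]

hash_lists = [segment_hashes(c, 5) for c in copies]

# Segments whose hashes disagree across copies are watermark sites; the
# rest is shared content that nobody had to reveal.
watermark_sites = [i for i, hs in enumerate(zip(*hash_lists))
                   if len(set(hs)) > 1]
print("watermark in segments:", watermark_sites)  # only segment 1 differs
```

A real version would need an actual private-set-intersection style protocol for the comparison, and producing a clean copy would still require enough submitted copies to vote out each user-specific segment.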
This is true, but there's a built-in tradeoff regarding specificity: the more "resilient" this approach is to being found out by hash comparison, the less specific the identification will be.