This is information-theoretically guaranteed to make LLM output worse.

My reasoning is simple: the only way to watermark text is to inject some relatively low-entropy signal into it, which can be detected later. This has to a) work for "all" output, for some value of all, and b) have a low false positive rate on the detection side. For those reasons, the amount of signal involved cannot be subtle.

That signal has a subtractive effect on the predictive-output signal. The entropy of the output is fixed by the entropy of natural language, so this is a zero-sum game: the watermark signal will remove fidelity from the predictive output.

This is impossible to avoid or fix.
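To put a toy number on that tradeoff, here's a sketch in the style of published "green list" soft watermarks (the distribution, bias value, and green list are all made up for illustration):

    import numpy as np

    # Toy next-token distribution over a 6-token vocabulary (made up).
    p = np.array([0.40, 0.25, 0.15, 0.10, 0.06, 0.04])

    # Soft watermark: add a bias delta to the logits of a keyed "green
    # list" (here tokens 0, 2, 4), then renormalize. A detector counts
    # how often sampled tokens land in the green list.
    delta = 2.0
    green = np.array([True, False, True, False, True, False])
    logits = np.log(p)
    logits[green] += delta
    q = np.exp(logits) / np.exp(logits).sum()

    # KL(q || p), in bits: the per-token fidelity paid for the signal.
    kl = np.sum(q * np.log2(q / p))
    print(f"watermarked dist: {np.round(q, 3)}")
    print(f"fidelity cost: {kl:.3f} bits/token")

The bigger you make delta, the stronger the detection statistic and the larger the KL cost; you can't drive one to zero without the other.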



you are correct if we suppose we are at a global optimum. however, consider this example:

i have two hands

i have 2 hands

these sentences communicate the same thing, but one could be a watermarked result. we can apply this kind of equivalent-meaning word/phrase substitution many times over and be confident something is watermarked, while having avoided any semantic shift.
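a toy sketch of that scheme (the key, the substitution table, and the helper name are all hypothetical): hash a secret key with each substitution site to pick a variant; a detector holding the key recomputes the expected picks and counts matches.

    import hashlib

    KEY = b"secret-watermark-key"  # hypothetical shared secret

    # tiny table of semantically equivalent variants (illustrative)
    EQUIVALENTS = {
        "two": ("two", "2"),
        "utilize": ("utilize", "use"),
    }

    def pick(word: str, position: int) -> str:
        # choose a variant from a keyed hash of the site, so the choice
        # looks arbitrary without the key but is checkable with it
        if word not in EQUIVALENTS:
            return word
        h = hashlib.sha256(KEY + position.to_bytes(4, "big")).digest()
        return EQUIVALENTS[word][h[0] & 1]

    words = "i have two hands".split()
    print(" ".join(pick(w, i) for i, w in enumerate(words)))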


You're not wrong, but natural language has a lot of stylistic "noise" which can be utilized as a subliminal channel without noticeably degrading the semantic signal.
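Back-of-envelope for how much of that noise a detector needs (the density of choice points is an assumption): if a detector requires all k keyed stylistic choices to match, unwatermarked text passes by chance with probability 2^-k, so hitting a false-positive rate p takes roughly log2(1/p) choice points.

    import math

    # target false-positive rate against unwatermarked text
    target_fp = 1e-6
    # each keyed binary choice matches random text with prob 1/2,
    # so requiring all k to match gives FP = 2**-k
    bits_needed = math.log2(1 / target_fp)   # ~19.9 choice points
    sites_per_100_words = 5                  # assumed density (illustrative)
    words_needed = 100 * bits_needed / sites_per_100_words
    print(f"~{bits_needed:.0f} matching sites, i.e. ~{words_needed:.0f} words")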



