
"An LLM generates text one token at a time. These tokens can represent a single character, word or part of a phrase. To create a sequence of coherent text, the model predicts the next most likely token to generate. These predictions are based on the preceding words and the probability scores assigned to each potential token.

For example, with the phrase “My favorite tropical fruits are __.” The LLM might start completing the sentence with the tokens “mango,” “lychee,” “papaya,” or “durian,” and each token is given a probability score. When there’s a range of different tokens to choose from, SynthID can adjust the probability score of each predicted token, in cases where it won’t compromise the quality, accuracy and creativity of the output.

This process is repeated throughout the generated text, so a single sentence might contain ten or more adjusted probability scores, and a page could contain hundreds. The final pattern of scores for both the model’s word choices combined with the adjusted probability scores are considered the watermark. This technique can be used for as few as three sentences. And as the text increases in length, SynthID’s robustness and accuracy increases."

Better link: https://deepmind.google/technologies/synthid/
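
For a rough sense of the mechanism the quote describes, here is a minimal sketch of biasing candidate-token probabilities with a secret key before sampling. This is not SynthID's actual algorithm; the key, the bias function, and the logits interface are all made up for illustration.

    # Minimal sketch: nudge each candidate token's score with a key-dependent
    # bias before sampling. Illustrative only -- not SynthID's real scheme.
    import hashlib, math, random

    def keyed_bias(key: str, context: str, token: str, strength: float = 0.5) -> float:
        # Deterministic pseudo-random bias in [0, strength] for this (context, token).
        h = hashlib.sha256(f"{key}|{context}|{token}".encode()).digest()
        return strength * (h[0] / 255.0)

    def sample_watermarked(logits: dict, context: str, key: str) -> str:
        # logits: token -> raw model score (a hypothetical interface).
        biased = {t: s + keyed_bias(key, context, t) for t, s in logits.items()}
        z = sum(math.exp(s) for s in biased.values())
        r, acc = random.random(), 0.0
        for t, s in biased.items():
            acc += math.exp(s) / z
            if r <= acc:
                return t
        return t  # fallback for floating-point edge cases

    print(sample_watermarked(
        {"mango": 2.1, "lychee": 1.8, "papaya": 1.7, "durian": 1.2},
        context="My favorite tropical fruits are", key="secret"))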



I'm fascinated that this approach works at all, but that said, I don't believe watermarking text will ever be practical. Yes, you can do an academic study where you have exactly one version of an LLM in exactly one parameter configuration, and you can have an algorithm that tweaks the logits of different tokens in a way that produces a recognizable pattern. But note that the pattern will be recognizable only when the LLM version and the parameter configuration are locked, which they won't be in the real world. You will have a bunch of different models, and people will use them with a bunch of different parameter combinations.

If your "detector" has to recognize AI-generated text from a variety of models and a variety of parameter combinations, it's no longer going to work. Even if you imagine someone brute-forcing all those different combos, the trouble is that some of the combos will produce false positives just because you tested so many of them. Want to get rid of those false positives? Go ahead, make the pattern stronger. And now you're visibly altering the generated text to an extent where it becomes a quality issue.

In summary, this will not work in practice. Ever.
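
For what it's worth, detection in the published "green list" style of watermarking (a generic sketch, not necessarily Google's detector) comes down to a statistical test on token counts, which is exactly where the brute-forcing concern bites: every extra model/parameter combination you test is another chance for innocent text to cross the threshold.

    # Generic 'green list' detector sketch, not Google's actual detector.
    import hashlib

    def is_green(key: str, prev_token: str, token: str) -> bool:
        # A token is 'green' if a keyed hash of (prev_token, token) is even,
        # so roughly half of all tokens are green for any given key.
        h = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
        return h[0] % 2 == 0

    def watermark_z_score(tokens: list, key: str) -> float:
        # z-score of the green-token count; large positive values suggest
        # the text was sampled with a bias toward this key's green tokens.
        n = len(tokens) - 1
        hits = sum(is_green(key, a, b) for a, b in zip(tokens, tokens[1:]))
        return (hits - 0.5 * n) / (0.25 * n) ** 0.5

    # Testing one key/configuration is one hypothesis test; testing k of them
    # multiplies the false-positive rate roughly k-fold unless the threshold
    # is raised, which in turn requires a stronger (more visible) bias.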


Even with temperature = 0, LLMs are still non-deterministic, as their internal, massively parallelized calculations are done with floating point arithmetic, which is order-dependent. Running the same LLM with the exact same parameters multiple times might still yield slightly different probabilities in the output, making this watermarking scheme even less robust.
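
A toy illustration of that order dependence, with no LLM involved; the same effect inside parallel reductions in a matrix multiply can nudge logits by a few ulps and occasionally flip which token gets sampled.

    # Floating-point addition is not associative, so the order of operations matters.
    a = (0.1 + 0.2) + 0.3   # 0.6000000000000001
    b = 0.1 + (0.2 + 0.3)   # 0.6
    print(a == b)           # False: same numbers, different grouping, different result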


This isn't necessarily true; it depends on the implementation. I can say that because I've published research that embeds steganographic text into the output of GPT-2, and we had to deal with this. Running everything locally was usually fine: the model was deterministic as long as you had the same initial conditions. The problems occurred when trying to run the model on different hardware.


That's not my experience, unless LLM providers are caching results. It's frustratingly difficult to get a model to output substantially different text for a given prompt. It's like internally it always follows mostly the same reasoning for step 1, then step 2 applies light fudging of the output to give the appearance of randomness, but the underlying structure is generally the same. That's why there's so much blog spam that all pretty much reads the same, except that while one "delves" into a topic another "dives" into it.

How long until they can write genuinely unique output without piles of additional prompting?


Hmm, I ask LLMs to write me stories all the time, and I only give them a couple of sentences as a prompt, loosely describing the setting of the story. And if I prompt them the exact same way, the events of the story are usually very different.


This is generally not a problem for most inference.


In practice, every programmer or writer who gets LLM output does a lot of rewriting against already existing code or already existing text. Stitching together parts of many LLM outputs is the only way to use an LLM effectively, even stitching together parts from different LLMs, which I do all the time.

Recognizing only parts of a watermark, with many watermarked fragments scattered all around, doesn't seem possible at all, in my mind.

They can, however, develop software to sell very expensively to universities, schools, etc., and it will occasionally catch a very guilty person who uses it all the time, doesn't even try to make the answer better, and always hands over the LLM answer in one piece.

At the end of the day, it will lead to so many false accusations that people will stop trusting it. In chess, false accusations of cheating have happened all the time for 15 years or more. Right now, former world chess champion Kramnik has accused over 50 top chess players of cheating in the span of 2 months, including the five-time US champion Nakamura.

If software like that gets applied to schools and universities, we are going to have the fun of our lives.


Couldn’t this be easily disrupted as a watermarking system by simply changing words to interfere with the relative checksum?

I suspect sentence structure is also being used, or, more likely, is the primary “watermark”. Similar to how you can easily tell that something is at least NOT a Yoda quote based on it having the wrong structure. Combine that with other negative patterns, like the quote containing Harry Potter references instead of Star Wars, and you can start to build up a profile from signals like these.

By rewriting the sentence structure and altering usual wording instead of directly copying the raw output, it seems like you could defeat any current raw watermarking.

Though that hasn’t stopped Google and others in the past from using bad science and stats to make unhinged, entitled claims, like when they added CAPTCHA problems everybody said would be “literally impossible” for bots to solve.

What a surprise how trivial those turned out to be to automate, and how the data they produce can be sold for profit at the expense of massive amounts of consumer time.


In principle, it seems like you could have semantic watermarking. For instance, suppose I want a short story. There are lots of different narrative and semantic aspects of it that each carry some number of bits of information: setting, characters, events; and those lie on a probability distribution like anything else. You just subtly shift the probability distribution of those choices, and then the watermark is resistant to word choice, reordering, and any transformation that maintains the semantic meaning.
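
A very rough sketch of that idea, assuming some way to extract the same attributes back out of the finished story at detection time; the option lists, key, and weights here are all hypothetical.

    # Hypothetical sketch of a semantic watermark: bias high-level story choices,
    # not individual words, so paraphrasing leaves the signal intact.
    import hashlib, random

    SETTINGS     = ["desert outpost", "space station", "medieval port", "rainforest"]
    PROTAGONISTS = ["smuggler", "archivist", "botanist", "deserter"]

    def biased_choice(options, key, slot, strength=2.0):
        # Options whose keyed hash is 'green' get a higher sampling weight.
        weights = [strength if hashlib.sha256(f"{key}|{slot}|{o}".encode()).digest()[0] % 2 == 0
                   else 1.0
                   for o in options]
        return random.choices(options, weights=weights, k=1)[0]

    outline = {
        "setting": biased_choice(SETTINGS, "secret-key", "setting"),
        "protagonist": biased_choice(PROTAGONISTS, "secret-key", "protagonist"),
    }
    print(outline)
    # A detector with the key would re-extract these attributes from the story
    # and check whether 'green' choices occur more often than chance.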


Much simpler: make every sentence contain an even number of words. Then the chance of 10 sentences in a row all being even by accident is about 0.1%.
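
The arithmetic behind that figure, assuming an unwatermarked sentence is even-length with probability 1/2:

    p_false_positive = 0.5 ** 10   # 10 independent sentences all even-length by chance
    print(p_false_positive)        # 0.0009765625, i.e. roughly 0.1%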



