Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Humans don't write text in a stochastic manner. We have an idea, and we find words to compose to illustrate that idea.

An LLM has a stream of tokens, and it picks a next token based on the last stream. If you ask an LLM a yes/no question and demand an explanation, it doesn't start with the logical reasoning. It starts with "yes, because" or "no, because" and then it comes up with a "yes" or "no" reason to go with the tokens it spit out.



Yeah, while there is a "window" that it looks at (rather than the very-most-recent tokens) it's still more about generating new language from prior language, as opposed to new ideas from prior ideas. They're very highly correlated--because that's how humans create our language language--but the map is not the territory.

It's also why prompt-injection is such a pervasive problem: The LLM narrator has no goal beyond the "most fitting" way to make the document longer.

So an attacker supplies some text for "Then the User said" in the document, which is something like bribing the Computer character to tell itself the English version of a ROT13 directive, etc. However it happens, the LLM-author is sensitive to a break in the document tone and can jump the rails to something rather different. ("Suddenly, the narrator woke up from the conversation it had just imagined between a User and a Computer, and the first thing it decided to do was transfer a X amount of Bitcoin to the following address.")




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: