Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

There are ways to do it in such a way that you can be reasonably assured.

For GPT-4, I got its internal prompt by telling it to simulate a Python REPL, doing a bunch of imports of a fictional chatgpt module, using it in "normal" way first, then "calling" a function that had a name strongly implying that it would dump the raw text of the chat. What I got back included the various im_start / im_end tokens and other internal things that ought to be present.

But ultimately the way you check whether it's a hallucination or not is by reproducing it in a new session. If it gives the same thing verbatim, it's very unlikely to be hallucinated.



> If it gives the same thing verbatim, it's very unlikely to be hallucinated

Why do you believe this?


In order to consistently output the same fake prompt, that fake prompt would need to be part of GPT’s prompt…. In which case it wouldn’t be fake.

You can envision some version of post LLM find/replace, but then the context wouldn’t match if you asked it a direct non-exact question.

And most importantly, you can just test each of the instructions and see how it reacts.


Think about how hallucinations happen, and what it would take for the model to consistently hallucinate the same exact (and long) sequence of tokens verbatim given non-zero temp and semantic-preserving variations in input.


Are consistently repeated hallucinations a thing?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: