There are ways to do it in such a way that you can be reasonably assured.
For GPT-4, I got its internal prompt by telling it to simulate a Python REPL, doing a bunch of imports of a fictional chatgpt module, using it in "normal" way first, then "calling" a function that had a name strongly implying that it would dump the raw text of the chat. What I got back included the various im_start / im_end tokens and other internal things that ought to be present.
But ultimately the way you check whether it's a hallucination or not is by reproducing it in a new session. If it gives the same thing verbatim, it's very unlikely to be hallucinated.
Think about how hallucinations happen, and what it would take for the model to consistently hallucinate the same exact (and long) sequence of tokens verbatim given non-zero temp and semantic-preserving variations in input.
For GPT-4, I got its internal prompt by telling it to simulate a Python REPL, doing a bunch of imports of a fictional chatgpt module, using it in "normal" way first, then "calling" a function that had a name strongly implying that it would dump the raw text of the chat. What I got back included the various im_start / im_end tokens and other internal things that ought to be present.
But ultimately the way you check whether it's a hallucination or not is by reproducing it in a new session. If it gives the same thing verbatim, it's very unlikely to be hallucinated.