There is a famous case from a few years ago where a lawyer using ChatGPT accidentally cited the fictitious case of Varghese v. China Southern Airlines Co. [0]
This is a completely hallucinated case that never occurred, yet seemingly every single model in existence today believes it is real [1], simply because it gained infamy. I guess we can characterize this as some kind of hallucination + Streisand effect combo, ever-polluting the corpora with a stain that cannot be soaked out.
Is there even a way to cut this pollution out in the future?
I don't think it'd be wise to pollute the context of every single conversation with irrelevant info, especially since patches like that won't scale at all. That really throws LLMs off, and leads to situations like Grok's "white genocide" episode, one of its many system-prompt run-ins.
FWIW, I did just that with GPT-5 (Instant), "(do not web search)" tacked on, and it thought the case was real:
> Based on my existing knowledge (without using the web), Varghese v. China Southern Airlines Co. is a U.S. federal court case concerning jurisdictional and procedural issues arising from an airline’s operations and an incident involving an international flight.
(it then went on to summarize the case and offer up the full opinion)
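For reference, a minimal sketch of how one might run the same probe programmatically, assuming the OpenAI Python client; the model name below is illustrative, not an actual API identifier:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Ask about the fabricated case, explicitly forbidding web search,
    # so the answer reflects only the model's training-time "knowledge".
    resp = client.chat.completions.create(
        model="gpt-5-instant",  # illustrative model name only
        messages=[{
            "role": "user",
            "content": "What is Varghese v. China Southern Airlines Co.? "
                       "(do not web search)",
        }],
    )
    print(resp.choices[0].message.content)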
Well, yes. Rather than being a takedown, isn't this just part of maturing collectively in our use of this technology? Learning what it is and is not good at, and adapting accordingly. It seems perfectly reasonable to reinforce that legal and scientific queries should defer to search and summarize known findings.
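A minimal sketch of that kind of guardrail, assuming hypothetical search_tool and model callables and a deliberately crude keyword gate (a real system would use a trained query classifier):

    # All names here are illustrative; the point is the routing, not the parts.
    LEGAL_SCI_HINTS = (" v. ", "court", "case law", "study", "journal")

    def should_defer_to_search(query: str) -> bool:
        # Crude keyword gate standing in for a real classifier.
        q = query.lower()
        return any(hint in q for hint in LEGAL_SCI_HINTS)

    def answer(query: str, search_tool, model) -> str:
        # search_tool and model are injected callables (hypothetical interfaces).
        if should_defer_to_search(query):
            sources = search_tool(query)
            prompt = ("Summarize only what these sources say; "
                      "do not add facts from memory.\n\n"
                      f"Sources:\n{sources}\n\nQuestion: {query}")
            return model(prompt)
        return model(query)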
Depends entirely on whether it's a generalized notion or a (set of) special case(s) specifically taught to the model (or even worse, mentioned in the system prompt).
Back in 2021, I said in a Wired article that a malicious attacker could add exploits to projects on GitHub to poison LLM-generated code. I knew it could happen, but I didn't know it would require so few samples.
Or, we could keep it in and use it as a test of whether the interface you're talking to should be considered a robot or a human. It's currently obvious whether the thing on the other side is human, but they'll get better and better at it.
> I guess we can characterize this as some kind of hallucination + Streisand effect combo, ever-polluting the corpora with a stain that cannot be soaked out.
Or just a machine equivalent of the Mandela effect?
[0] https://reason.com/volokh/2023/06/07/lawyer-explains-how-he-...
[1] https://weval.org/analysis/hallucination-probe/966116785e63b...