I'd be interested to see other examples of Lexis+ AI failures, because I thought...

rfw300 · on May 31, 2024

There are more examples on page 18 of the paper: https://dho.stanford.edu/wp-content/uploads/Legal_RAG_Halluc...

hn_throwaway_99 · on May 31, 2024

Thanks very much for linking that. This makes me think that legal support is actually one of the worst possible uses of LLM-based AI (at least as implemented here), primarily because so much of the source material is directly contradictory, e.g. a legislature passes a law which is subsequently overturned by the courts, or decisions in lower courts are reversed by higher courts, or higher courts reverse themselves over time. It feels like you'd absolutely have to annotate all the source material in some way to say whether it was still controlling law/precedent.
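To make the annotation idea concrete, here is a minimal sketch, assuming each source gets a status label attached before indexing and that anything not marked as good law is dropped before it reaches the model. The `Source` class, the status strings, and `controlling_only` are all hypothetical names for illustration, not how any of these products actually work.

```python
from dataclasses import dataclass

# Hypothetical status labels; a real system would need far richer signals.
GOOD_LAW = "good_law"
OVERRULED = "overruled"


@dataclass
class Source:
    cite: str
    text: str
    status: str  # hypothetical annotation attached before indexing


def controlling_only(retrieved):
    """Keep only the sources still annotated as good law."""
    return [s for s in retrieved if s.status == GOOD_LAW]


# Toy data: placeholder cases, not real citations.
retrieved = [
    Source("Case A (1990)", "Holding X.", OVERRULED),
    Source("Case B (2015)", "Holding not-X, overruling Case A.", GOOD_LAW),
]

context = controlling_only(retrieved)
print([s.cite for s in context])
```

A hard filter like this is probably too blunt in practice (a case can be overruled on one point and still good on another), but it shows where the annotation would have to sit in the pipeline.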

rfw300 · on May 31, 2024

Your instinct is correct. The major legal research providers (Thomson Reuters and LexisNexis) both provide “citators”, which are human annotations of which cases and statutes have been overruled, upheld, criticized, etc. One of the issues the paper describes is the fairly ham-handed way this gets integrated into these systems, causing even more trouble.

lukan · on May 31, 2024

Pretty much the same thing happens when I try to get an LLM to output correct code targeting a certain library: its training data contains various conflicting versions of the library, and the result is an incompatible mix of code for different versions thrown together.

vharuck · on May 31, 2024

Is there actually an effective way to handle queries with RAG where time periods are relevant? I made a proof-of-concept RAG for documents on a government website shortly after GPT-3.5 came out and remember this being a big problem. The most glaring wrong answer was "Who is currently the governor?" It answered with the previous governor, likely because he was listed as such in 8 years of documents versus the 2 years at the time for the current governor.
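A toy sketch of why that happens and one possible mitigation, assuming each document carries a publication date and that the relevant fact has already been extracted into a field. The `Doc` class, the `governor` field, and both functions are hypothetical; the data is made up to mirror the 8-years-versus-2-years split.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    published: date
    governor: str  # hypothetical field extracted from the document


# Toy corpus: 8 yearly documents naming the previous governor, 2 naming the current one.
docs = [Doc(date(y, 1, 1), "Previous Governor") for y in range(2013, 2021)]
docs += [Doc(date(y, 1, 1), "Current Governor") for y in range(2021, 2023)]


def majority_answer(docs):
    """Answer with whatever most documents say, ignoring dates."""
    return Counter(d.governor for d in docs).most_common(1)[0][0]


def latest_answer(docs):
    """For 'currently' questions, trust only the most recent document."""
    return max(docs, key=lambda d: d.published).governor


print(majority_answer(docs))
print(latest_answer(docs))
```

Counting mentions picks the previous governor, while sorting by date picks the current one. Whether this generalizes depends on having reliable dates and on detecting that the question is time-sensitive in the first place, which is the hard part.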