
It’s only a losing strategy if you assume everyone universally adopts the slow strategy, and no research teams spot it in the interim. For things with large splash radius, that’s unrealistic, so defenders have an information advantage.

Makes actual security patches tougher to roll out though - you need to be vigilant to bypass the slowdown when you’re actually fixing a critical flaw. But nobody said this would be easy!


> Makes actual security patches tougher to roll out though

Yeah. 7 days in 2026 is a LONG TIME for security patches, especially for anything public facing.

Stuck between a rock (dependency compromise) and a hard place (legitimate security vulnerabilities).

Doesn't seem like a viable long-term solution.


It seems the benchmarks here are heavily biased towards single-shot explanatory tasks, not agentic loops where code is generated: https://github.com/drona23/claude-token-efficient/blob/main/...

And I think this raises a really important question. When you're deep into a project that's iterating on a live codebase, does Claude's default verbosity, where it's allowed to expound on why it's doing what it's doing when it's writing massive files, allow the session to remain more coherent and focused as context size grows? And in doing so, does it save overall tokens by making better, more grounded decisions?

The original link here has one rule that says: "No redundant context. Do not repeat information already established in the session." To me, I want more of that. Those are goal-oriented quasi-reasoning tokens that I do want it to emit, visualize, and use, and that very possibly keep it from getting "lost in the sauce."

By all means, use this in environments where output tokens are expensive, and you're processing lots of data in parallel. But I'm not sure there's good data on this approach being effective for agentic coding.


I wrote a skill called /handoff. Whenever a session is nearing a compaction limit or has served its usefulness, it generates and commits a markdown file explaining everything it did or talked about. It’s called /handoff because you do it before a compaction. (“Isn’t that what compaction is for?” Yes, but those go away. This is like a permanent record of compacted sessions.)

I don’t know if it helps maintain long term coherency, but my sessions do occasionally reference those docs. More than that, it’s an excellent “daily report” type system where you can give visibility to your manager (and your future self) on what you did and why.

Point being, it might be better to distill that long term cohesion into a verbose markdown file, so that you and your future sessions can read it as needed. A lot of the context is trying stuff and figuring out the problem to solve, which can be documented much more concisely than wanting it to fill up your context window.
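
If you're curious what's inside: a slash command is just a markdown prompt file dropped into .claude/commands/. A stripped-down sketch of the idea (my real version, linked in the edit below, is more detailed):

    ---
    description: Write a permanent handoff doc before compaction
    ---
    Summarize this session into a new markdown file at
    docs/agents/handoff/YYYY-MM-DD-NNN-short-slug.md. Cover: what we
    attempted, what worked, key decisions and why, open questions,
    and next steps. Don't repeat information already captured in
    existing handoff docs. Commit the file when done.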

EDIT: Someone asked for installation steps, so I posted it here: https://news.ycombinator.com/item?id=47581936


Did you call it '/handoff' or did Claude name it that? The reason I'm asking is because I noticed a pattern with Claude subtly influencing me. For example, the first time I heard the word 'gate' was from Claude, and a week later I hear it everywhere, including on Hacker News. I didn't use the word 'handoff' but Claude creates handoff files also [0]. I was thinking about this all day. Because Claude didn't just use the word 'gate', it created an entire system around it that includes handoffs that I'm starting to see everywhere. This might mean Claude is very quietly leading and influencing us in a direction.

[0] https://github.com/search?q=repo%3Aadam-s%2Fintercept%20hand...


I was reading through the Claude docs and it was talking about common patterns to preserve context across sessions. One pattern was a "handoff file", which they explained like "have claude save a summary of the current session into a handoff file, start a new session, then tell it to read the file."

That sounded like a nice idea, so I made it effortless beyond typing /handoff.

The generated docs turned out to be really handy for me personally, so I kept using it, and committed them into my project as they're generated.


Oh, so the word 'gate' is probably in the documentation also!

I see. So this isn't as scary. Claude is helping me understand how to use it properly.


If this was more than just a gut reaction [0], I have a tough time navigating what swings this topic between scary and not scary for you.

Unless you're a true and invested believer in souls, free will, and other spiritualistic nonsense (or have a vested political affiliation to pretend so), it should be tautological that everything you read and experience biases you. LLM output, then, is no different.

If you are a believer, then either nothing ever did, or LLMs are special in some way, or everything else is. Which just doesn't make sense to me.

[0] It's jarring to observe the boundaries of one's agency, sure, but LLMs are really nothing special in this way. For example, I somewhat frequently catch myself using words and phrases I saw earlier during the day elsewhere, even if I did not process them consciously.


I have noticed similar phenomena with Claude, where its vocabulary subtly shifts how I think/frame/write about things or points me to subtle gaps in my own understanding. And I also usually come around to understand that it's often not arbitrary. But I do think some confirmation bias is at play: when it tries to shift me into the wrong directions repeatedly, I learn how to make it stop doing that.

It definitely adds a layer of cognitive load, in wrangling/shepherding/accommodating/accepting the unpredictable personalities and stochastic behaviors of the agents. It has strong default behaviors for certain small tasks, and where humans would eventually habituate to prescribed procedures/requirements, the LLMs never really internalize my preferences. In that way, they are more like contractors than employees.


Why would it be scary? Claude is just parroting other human knowledge. It has no goal or agency.

You can’t verify that there is no influence by the makers of Claude.

I would certainly expect everyone to assume that influence rather than not.

By that logic, nothing computers do is scary.

Yes I think that is their argument.

Computers don't do anything.

What's their value then?

Just like with absolutely any other tool, their value is in what it enables humans using them to accomplish.

E.g., a hammer doesn't do anything, and neither does a lawnmower. It would be silly to argue (just because these tools are static objects doing nothing in the absence of direct human involvement) that those tools don't have a very clear value.


Seems equally silly to me to suggest that hammers and lawnmowers don't do anything, but I mean here we are.

When people use other people like tools, i.e. use them to enable themselves to accomplish something, do those people cease to do things as well? Or is that not a terminology you recognize as sensible maybe?

I appreciate that for some people the verb "do" is evidently human(?) exclusive, I just struggle to wrap my head around why. Or is this an animate vs. inanimate thing, so animals operating tools also do things in your view?

How do you phrase things like "this API consumes that kind of data" in your day to day?


> Seems equally silly to me to suggest that hammers and lawnmowers don't do anything, but I mean here we are.

To be clear, I am not the person you were originally replying to. I personally don't care much for the terminology semantics of whether we should say "hammers do things" (with the opponents claiming it to be incorrect, since hammers cannot do anything on their own). I am more than happy to use whichever of the two terms the majority agrees upon to be the most sensible, as long as everyone agrees on the actual meaning of it.

> I appreciate that for some people the verb "do" is evidently human(?) exclusive, I just struggle to wrap my head around why. Or is this an animate vs. inanimate thing, so animals operating tools also do things in your view?

To me, it isn't human-exclusive. I just thought that in the context of this specific comment thread, the user you originally replied to used it as a human-exclusive term, so I tried explaining in my reply how they (most likely) used it. For me, I just use whichever term that I feel makes the most sense to use in the context, and then clarify the exact details (in case I suspect the audience to have a number of people who might use the term differently).

> How do you phrase things like "this API consumes that kind of data" in your day to day?

I would use it the exact way you phrased it, "this API consumes that kind of data", because I don't think anyone in the audience would be confused or unclear about what that actually means (depends on the context ofc). Imo it wouldn't be wrong to say "this API receives that kind of data as input" either, but it feels too verbose and awkward to actually use.


I'm not sure how to respond then, because having a preferred position on this is kind of essential to continue. It's the contended point. Can an LLM do things? I think they can, they think they cannot. They think computers cannot do anything in general outright.

To me, what's essential for any "doing" to happen is an entity, a causative relationship, and an occurrence. So a lawnmower can absolutely mow the lawn, but also the wind can shape a canyon.

In a reference frame where a lawnmower cannot mow independently because humans designed it or operate it, humans cannot do anything independently either. Which is something I absolutely do agree with by the way, but then either everything is one big entity, or this is not a salient approach to segmenting entities. Which is then something I also agree with.

And so I consider the lawnmower its own entity, the person operating or designing it their own entity, and just evaluate the process accordingly. The person operating the lawnmower has a lot of control over where the lawnmower goes and whether it is on, the lawnmower has a lot of control over the shape of the grass, and the designer of the lawnmower has a lot of control over what shapes the lawnmower can hope to create.

Clearly they then have some additional logic applied, where they segment humans (or tools) in a more special way. I wanted to probe into that further, because the only such labeling I can think of is spiritualistic and anthropocentric. I don't find such a model reasonable or interesting, but maybe they have some other rationale that I might. Especially so, because to me claiming that a given entity "does things" is not assigning it a soul, a free will, or some other spiritualistic quality, since I don't even recognize those as existing (and thus take great issue with the unspoken assumption that I do, or that people like me do).

The next best thing I can maybe think of is to consider the size of the given entity's internal state, and its entropy with relation to the occurred causative action and its environment. This is because that's quite literally how one entity would be independent of another, while being very selective about a given action. But then LLMs, just like humans, got plenty of this, much unlike a hammer or a lawnmower. So that doesn't really fit their segmentation either. LLMs have a lot less of it, but still hopelessly more than any virtual or physical tool ever conceived prior. The closest anything comes (very non-coincidentally) are vector and graph databases, but then those only respond to very specific, grammar-abiding queries, not arbitrary series of symbols.


Computers perform computations. They do what programmers instruct them to do by their nature.

Agreed, just like hammers get nails hammered into a wood board. They do what the human operator manually guides them to do by their nature.

I am not disagreeing with you in the slightest, I feel like this is just a linguistic semantics thing. And I, personally, don't care how people use those words, as long as we are on the same page about the actual meaning of what was said. And, in this case, I feel like we are fully on the same page.


FWIW I have worked with people using the word "gate" for years.

For example, "let's gate the new logic behind a feature flag".



Claude has trained me on the use of the word 'invariant'. I never used it before, but it makes sense as a term for a rule the system guarantees. I would have used 'validation' for application-side rules or 'constraint' for db rules, but 'invariant' is a nice generic substitute.

I've started saying "gate" and "bound(ed)" and "handoff" a lot (and even "seam" and "key off" sometimes) since Codex keeps using the terms. They're useful, no doubt, but AI definitely seems to prefer using them.

I've actually been doing this for a year. I call it /checkpoint instead and it does something like:

* update our architecture.md and other key md files in folders affected by updates and learnings in this session

* update claude.md with changes in workflows/tooling/conventions (not project summaries)

* commit

It's been pretty good so far. Nothing fancy. Recently I also asked to keep memories within the repo itself instead of in ~/.claude.

Only downside is it's slow, but it keeps enough to pass the baton. Maybe "handoff" would have been a better name!


Did the same. Although I'm considering a pipeline where sessions are periodically translated to .md with most tool outputs and other junk stripped, then used as a source to query against for context. I am testing out a semi-continuous ingestion of it into my rag/knowledge db.
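
The stripping step is pleasantly small, by the way. A rough sketch in Python (the field names here are assumptions about the session schema and may need adjusting):

    import json, sys, pathlib

    def session_to_md(path: str) -> str:
        """Boil a session .jsonl down to lean markdown: keep the
        user/assistant prose, drop tool outputs and other junk."""
        out = []
        for raw in pathlib.Path(path).read_text().splitlines():
            try:
                msg = json.loads(raw).get("message") or {}
            except json.JSONDecodeError:
                continue
            if msg.get("role") not in ("user", "assistant"):
                continue
            content = msg.get("content")
            # content is either a plain string or a list of typed blocks
            blocks = content if isinstance(content, list) else [
                {"type": "text", "text": content or ""}]
            for b in blocks:
                # keep prose blocks, skip tool_use/tool_result noise
                if b.get("type") == "text" and b.get("text", "").strip():
                    out.append(f"**{msg['role']}**: {b['text'].strip()}")
        return "\n\n".join(out)

    if __name__ == "__main__":
        print(session_to_md(sys.argv[1]))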

Is this available online? I'd love documentation of my prompts.

I’ll post it here, one minute.

Ok, here you go: https://gist.github.com/shawwn/56d9f2e3f8f662825c977e6e5d0bf...

Installation steps:

- In your project, download https://gist.github.com/shawwn/56d9f2e3f8f662825c977e6e5d0bf... into .claude/commands/handoff.md

- In your project's CLAUDE.md file, put "Read `docs/agents/handoff/*.md` for context."

Usage:

- Whenever you've finished a feature, done a coherent "thing", or otherwise want to document all the stuff that's in your current session, type /handoff. It'll generate a file named e.g. docs/agents/handoff/2026-03-30-001-whatever-you-did.md. It'll ask you if you like the name, and you can say "yes" or "yes, and make sure you go into detail about X" or whatever else you want the handoff to specifically include info about.

- Optionally, type "/rename 2026-03-23-001-whatever-you-did" into claude, followed by "/exit" and then "claude" to re-open a fresh session. (You can resume the previous session with "claude 2026-03-23-001-whatever-you-did". On the other hand, I've never actually needed to resume a previous session, so you could just ignore this step entirely; just /exit then type claude.)

Here's an example so you can see why I like the system. I was working on a little blockchain visualizer. At the end of the session I typed /handoff, and this was the result:

- docs/agents/handoff/2026-03-24-001-brownie-viz-graph-interactivity.md: https://gist.github.com/shawwn/29ed856d020a0131830aec6b3bc29...

The filename convention stuff was just personal preference. You can tell it to store the docs however you want to. I just like date-prefixed names because it gives a nice history of what I've done. https://github.com/user-attachments/assets/5a79b929-49ee-461...

Try to do a /handoff before your conversation gets compacted, not after. The whole point is to be a permanent record of key decisions from your session. Claude's compaction theoretically preserves all of these details, so /handoff will still work after a compaction, but it might not be as detailed as it otherwise would have been.


I already do this manually each time I finish some work/investigation (I literally just say

"write a summary handoff md in ./planning for a fresh convo"

and it's generally good enough), but maybe a skill like you've done would save some typing, hmm

My ./planning directory is getting pretty big, though!


Thanks! The last link is broken, though, or maybe you didn't mean to include it? Also, if you've never actually resumed a session, do you use these docs at some other time? Do you reference them when working on a related feature, or just keep them for keepsake to track what you've done and why?

Thank you. It was just a screenshot of my handoff directory. I originally tried to upload to imgur but got attacked by ads, then uploaded to github via “new issue” pasting. I thought such screenshots were stable, but looks like GitHub prunes those now.

It wasn’t anything important. I appreciate you pointing that out though.

I just keep old sessions for keepsake. No reason really. I thought maybe I’d want them for some reason but never did.

The docs are the important part. It helps me (and future sessions) understand old decisions.


Oh wow, thank you so much!!!!!

Thanks!!!

I've got something similar but I call them threads. I work with a number of different contexts and my context discipline is bad, so I needed a way to hand off work that was planned in one context but needs to be executed from another. I wanted a little bit of order to the chaos, so my threads skill will add and search issues created in my local forgejo repo. Gives me a convenient way to explicitly save session state to be picked up later.

I've got a separate script which parses the jsonl files that claude creates for sessions and indexes them in a local database for longer term searchability. A number of times I've found myself needing some detail I knew existed in some conversation history, but CC is pretty bad and slow at searching through the flat files for relevant content. This makes that process much faster and more consistent. Again, this is due to my lack of discipline with contexts. I'll be working with my recipe planner context and have a random idea that I just iterate with right there. Later I'll never remember that idea started from the recipe context. With this setup I don't have to.
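
For anyone wanting to replicate it, the core is not much code. A minimal sketch of the indexing side using SQLite's FTS5 (my real script does more normalization, and the message extraction here is simplified, so treat the field names as assumptions):

    import json, sqlite3, pathlib, sys

    db = sqlite3.connect("sessions.db")
    db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS msgs "
               "USING fts5(session, role, text)")

    def index_session(path: pathlib.Path) -> None:
        # one row per prose message; tool outputs and junk are skipped
        for raw in path.read_text().splitlines():
            try:
                m = json.loads(raw).get("message") or {}
            except json.JSONDecodeError:
                continue
            c = m.get("content")
            text = c if isinstance(c, str) else " ".join(
                b.get("text", "") for b in (c or []) if b.get("type") == "text")
            if m.get("role") in ("user", "assistant") and (text or "").strip():
                db.execute("INSERT INTO msgs VALUES (?, ?, ?)",
                           (path.stem, m["role"], text))
        db.commit()

    def search(query: str):
        # BM25-ranked full-text search across all indexed sessions
        return db.execute(
            "SELECT session, role, snippet(msgs, 2, '[', ']', '...', 12) "
            "FROM msgs WHERE msgs MATCH ? ORDER BY rank", (query,)).fetchall()

    if __name__ == "__main__":
        for p in pathlib.Path(sys.argv[1]).glob("*.jsonl"):
            index_session(p)
        for row in search(sys.argv[2]):
            print(row)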


Wouldn't the next phase of this be automatic handoffs executed with hooks?

Your system is great and I do something similar; my problem is that I have a bunch of sessions and forget to 'handoff'.

The clawbots handle this automatically with journals to save knowledge/memory.


When working on a task I have a task/{name}.md that I write a running log to. Is this not a common workflow?

I think Cursor does something similar under the hood.

> No explaining what you are about to do. Just do it.

Came here for the same reason.

I can't count how many times this exact section of Claude output let me know that it was doing the wrong thing so I could abort and refine my prompt.


Seems crazy to me that people aren't already including rules to prevent useless language in their system/project-level CLAUDE.md.

As for redundancy... it's quite useful according to recent research. Pulled from Gemini 3.1: "two main paradigms: generating redundant reasoning paths (self-consistency) and aggregating outputs from redundant models (ensembling)." Both have fresh papers written about their benefits.
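
Self-consistency in particular is trivial to wire up: sample the same prompt several times at nonzero temperature and majority-vote the final answers. A sketch, with a hypothetical sample() standing in for whatever model call you use:

    from collections import Counter

    def sample(prompt: str, temperature: float = 0.8) -> str:
        """Hypothetical stand-in for your model call; returns one completion."""
        raise NotImplementedError

    def extract_answer(completion: str) -> str:
        # naive: assumes the completion ends with "Answer: <x>"
        return completion.rsplit("Answer:", 1)[-1].strip()

    def self_consistency(prompt: str, k: int = 5) -> str:
        # k redundant reasoning paths; the most common final answer wins
        answers = [extract_answer(sample(prompt)) for _ in range(k)]
        return Counter(answers).most_common(1)[0][0]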


There was also that one paper that showed very noticeable benchmark improvements in non-thinking models from just writing the prompt twice. The same paper remarked that thinking models often repeat the relevant parts of the prompt, achieving the same effect.

Claude is already pretty light on flourishes in its answers, at least compared to most other SotA models. And for everything else it's not at all obvious to me which parts are useless. And benchmarking it is hard (as evidenced by this thread). I'd rather spend my time on something else


"No such thing as junk DNA" kinda applies here.

Also: inference-time scaling. Generating more tokens on the way to an answer helps produce better answers.

Not all extra tokens help, but optimizing for minimal length when the model was RL'd on task performance seems detrimental.


I liked playing with the completion models (davinci 2/3). It was a challenge to arrange a scenario for it to complete in a way that gave me the information I wanted.

That was how I realized why the chat interfaces like to start with all that seemingly unnecessary/redundant text.

It basically seeds a document/dialogue for it to complete, so if you make it start out terse, then it will be less likely to get the right nuance for the rest of the inference.


I made a test [0] which runs several different configurations against coding tasks from easy to hard. There is a test which each one has to pass. Because of temperature, the number of tokens per one-shot varies widely across all the different configurations, including this one. However, across 30 tests, this one does perform worse.

[0] https://github.com/adam-s/testing-claude-agent


Some redundancy also helps to keep a running todo list on the context tip, in the event of compacting or truncation.

Distilled mini/nano models need regular reminders about their objectives.

As documented by Manus https://manus.im/blog/Context-Engineering-for-AI-Agents-Less...


There's an ancient paper that shows repetition improves non-reasoning weights: https://arxiv.org/html/2512.14982v1

If the model gets dumber as its context window is filled, any way of compressing the context in a lossless fashion should give a multiplicative gain in the 50% METR horizon on your tasks, as you'll simply get more done before the collapse. (At least in the spherical cow^Wtask model, anyway.)

Verbose output helps until it pushes code out of context and Claude loses the thread on the next edit.

Is there any good documentation about contracts? https://en.cppreference.com/w/cpp/language/contracts.html is incredibly confusing - its first displayed example seems to be an edge case where the assertion itself causes a mutation?

https://en.cppreference.com/w/cpp/language/function.html#Fun... is vaguely better, but still quite dense.

IMO the syntax makes things hard for a newcomer to understand, and I see approachability as core to any programming language's community goals.

    double square_root(double num) asserts_pre(num >= 0)
would have been far more self-evident than just

    double square_root(double num) pre(num >= 0)
But I suppose brevity won out.

I believe that https://isocpp.org/files/papers/P2900R14.pdf is the paper, which doesn't mean it's good documentation, as it's meant for modifying the standard. However, in its early sections, it does link to other papers which have more information, and the "proposed wording" section should be where the standardese lives, with the rest of it being context.

If anything, Sora was an experimental question: giving away video generation is expensive, but is the voluntary user labeling and engagement data, which can be fed into RLHF, accretive enough to model training that it's a meaningful trade to make?

The shutdown of the service makes it clear that the answer was "no."

(It's not a particularly useful signal, though, in evaluating OpenAI's future. It could mean that OpenAI is less interested in video data, which might have implications for their AGI ambitions. It could equally mean that OpenAI has enough data that it's hit diminishing returns, or has found a cheaper source of labeling, or doesn't consider it meaningful one way or another. So there are a lot of think pieces claiming the shutdown is a sign of weakness, but I don't think it's worth jumping to conclusions.)


I thought the expectation was cleaning up the balance sheet in preparation for an IPO... along with a pivot towards codegen revenue.

OpenAI is undergoing a significant strategic pivot toward developing world models.

Gosh, I wish this had existed a year ago; I spent an absurd amount of time creating a system for print brochure typesetting in HTML that would iteratively try to find viable break points (keeping in mind that bullets etc. could appear anywhere) to ensure non-orphaned lines, all by using the Selection API and repeatedly finding bounding boxes of prospective renders.

It works, and still runs quite successfully in production, but there are still off-by-one hacks where I have no idea why they work. The iterative line generation feature here is huge.


At least, not that we remember.

I wholeheartedly disagree with this. For any iteration, Claude should be reading your codebase, reading hundreds of thousands of tokens of (anonymized) production data, asking itself questions about backwards compatibility that goes beyond existing test suites, running scripts and CI to test that backwards compatibility, running a full-stack dev server and Chrome instance to QA that change, across multiple real-world examples.

And if you're building a feature that will call AI at runtime, you'll be iterating on multiple versions of a prompt that will be used at runtime, each of which adds token generation to each round of this.

In practice on anything other than a greenfield project, if you're asking for meaningful features in complex systems, you'll be at that 10 minute mark or more. But you've also meaningfully reduced time-to-review, because it's doing all that QA, and can provide executive summaries of what it finds. So multitasking actually works.


I agree with everything you said - but it's also the case that a set of parameters was created that, instead of requiring multi-person validation of target validity and provenance, prioritized speed in providing decision makers with options.

This certainly doesn't absolve the person implementing those parameters, but it is equally the responsibility of the very top of the decision-making structure.


Your point is well taken, though it's worth pointing out that literally yesterday Palantir was co-awarded a contract for building orbital weapons systems [0].

The broader point is Palantir's specific confluence of:

- access to granular, non-anonymized data across industry silos

- its chairman's specific pro-authoritarian mission (so pointedly so that the Catholic Church felt the need to make a specific rebuke a few days ago [1])

- a regulatory environment in which its monetary risks are arguably minimized if it takes the broadest possible reading of e.g. HIPAA's law enforcement exceptions that mention "written administrative requests" [2]

- documented concerns about governance [3]

Those concerned with this confluence are far from conspiracy theorists, and may be quite rationally interested in protecting e.g. the public reputation of their hospital networks and their ability to provide services - to say nothing of their desire to protect the privacy of their patients.

[0] https://www.usnews.com/news/top-news/articles/2026-03-24/and...

[1] https://www.nytimes.com/2026/03/17/world/europe/peter-thiel-... - https://archive.is/2EOXa

[2] https://www.hhs.gov/hipaa/for-professionals/faq/505/what-doe...

[3] https://comptroller.nyc.gov/reports/letter-to-palantir-techn...


Have you considered that a weapons platform like that could be necessary? Or are you just opposed to Palantir being part of it?

This comment is written in an interesting way. If it's unnecessary, the OP's comment is fine. If the platform is "necessary" in some abstract sense, you've avoided articulating that argument by putting the burden back on OP to justify their position.

That seems like an interesting discussion though. Why would it be necessary?


There's ample evidence that medium-range ballistic missile technology is proliferating, fired from land-based systems. It is difficult to intercept these with ground-based launchers. But if intercepting from orbit, the probability that you score a hit is higher. The catch is that it is a) extremely complex, and b) very expensive to develop and implement a system like this. Enter Palantir and Anduril.

The weight of this argument rests on how much you care about being in range of MRBMs, how likely you think it is that MRBMs will be a decisive factor in a future conflict, and whether or not you want the United States to be victorious in this potential conflict. Many people do not care about this threat, don't think MRBMs will matter, and/or want the United States to lose. I am not one of those people.


I think that three things can simultaneously be true:

(1) missile defense systems based on deep data fusion with cutting-edge espionage systems for launch detection are becoming increasingly useful and necessary

(2) we should be thoughtful that these types of espionage fusion systems could also be used for domestic surveillance, and advocate for close scrutiny and oversight of these systems

(3) a healthcare administrator can make a rational argument that their patient information should not be handled by the same company building those espionage-advised data fusion systems... lest the close relationship with government quietly transform into unauthorized data sharing with other government actors, without clear paths towards legal recourse if this were to occur, and with potentially irrevocable consequences for patients


well, there _is_ this:

https://en.wikipedia.org/wiki/Outer_Space_Treaty

"bars states party to the treaty from placing weapons of mass destruction in Earth orbit, installing them on the Moon or any other celestial body, or otherwise stationing them in outer space"

but 1. today's sentiment is: to hell with these treaties-schmeaties, and 2. what you mentioned is not yet a weapon of _mass_ destruction, so we're all good!


Claiming a particular weapons system is “necessary” is war brained. There are other ways of survival besides bombing the shit out of each other.

True, I am partial to battle drill 1A.

> The broader point is Palantir's specific confluence of:

> - access to granular, non-anonymized data across industry silos

Do you have evidence that Palantir itself - not customers using Palantir software - has access to this data?


I think you are maybe reading into the initial claim too much and not hearing the follow ups. There are two things here: 1. the overall character, broad charter, and people that compose the company, and 2. the theory that it is a specific agent in illegal or harmful data trafficking. And sure, I think we can take 2 away completely here if we simply must assume good faith from these guys and the contracts that they make, but that still kinda leaves 1 which is pretty big. Like 1 answers your follow up question of why everyone hates them either way, but you still are countering it by trying to ask what it has to do with 2. If that makes sense?

And really, I don't think anyone wants to "oh sweet summer child" you in your doubts here, but it's really extremely hard to not want to just... gesture around the world right now and ask why you still believe in some kind of sanctity or infallibility of something like the legal contract or other various forms of de jure "accountability" when it comes to tech companies, especially one as big as this.


This pattern in which people make claims about Palantir having access to private information, then retreat back to something along the lines of "I don't like the character of the company" is exactly the kind of thing that leads me to believe people don't actually have tangible complaints with the company.

This is true, but Palantir also describes what they do in a way that is going to cause skepticism and confusion. When they talk about the ontology acting as a "digital twin" of the customer environment one could be forgiven for thinking this does actually mean Palantir is exfiltrating customer data and cloning it, which is not what happens.

> When they talk about the ontology acting as a "digital twin" of the customer environment one could be forgiven for thinking this does actually mean Palantir is exfiltrating customer data and cloning it, which is not what happens.

This is basically saying you have the same DB schema in your dev environment as you do on prod. If anyone made that kind of leap in logic, I would conclude they have little to no technical know-how.


Oh, I agree with you. Perhaps I should've said "one could forgive a journalist ...", since journalists tend not to be familiar with these things.

I do think that what we think of as RAG will change!

When any given document can fit into context, and when we can generate highly mission-specific summarization and retrieval engines (for which large amounts of production data can be held in context as they are being implemented)... is the way we index and retrieve still going to be based on naive chunking, and off-the-shelf embedding models?

For instance, a system that reads every article and continuously updates a list of potential keywords with each document and the code assumptions that led to those documents being generated, then re-runs and tags each article with those keywords and weights, and does the same to explode a query into relevant keywords with weights... this is still RAG, but arguably a version where dimensionality is closer tied to your data.

(Such a system, for instance, might directly intuit the difference in vector space between "pet-friendly" and "pets considered," or between legal procedures that are treated differently in different jurisdictions. Naive RAG can throw dimensions at this, and your large-context post-processing may just be able to read all the candidates for relevance... but is this optimal?)
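
To make that concrete, a toy sketch of the scoring half (in practice the keyword extraction and query explosion would be LLM calls; they're stubbed out here, and the example tags and IDs are made up):

    from collections import defaultdict

    # doc_id -> {keyword: weight}, produced by the LLM tagging pass
    doc_tags: dict[str, dict[str, float]] = {
        "listing-42": {"pet-friendly": 0.9, "garden": 0.4},
        "listing-77": {"pets considered": 0.8, "garden": 0.2},
    }

    def explode_query(query: str) -> dict[str, float]:
        # stub: an LLM would expand the query into weighted keywords
        # drawn from the same evolving vocabulary as the documents
        return {"pet-friendly": 1.0, "pets considered": 0.6}

    def retrieve(query: str, top_k: int = 5) -> list[tuple[str, float]]:
        q = explode_query(query)
        scores: dict[str, float] = defaultdict(float)
        for doc_id, tags in doc_tags.items():
            for kw, w in q.items():
                scores[doc_id] += w * tags.get(kw, 0.0)
        return sorted(scores.items(), key=lambda kv: -kv[1])[:top_k]

    print(retrieve("can I bring my dog?"))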

I'm very curious whether benchmarks have been done on this kind of approach.

