Not sure why you're emphasizing a round-trip request, like these models aren't already taking a few seconds to respond? Not even sure that matters, since these all run in the same datacenter, or you can at least send requests to somewhere close.

I'd probably reach for embeddings, though, to find relevant prompt info to include
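A rough sketch of what that embedding-based selection could look like. This uses a toy bag-of-words vector as a stand-in for a real embedding model, and the tool names and snippet texts are made up for illustration:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" stand-in; a real setup would call
    # an actual embedding model here instead.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    na, nb = norm(a), norm(b)
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical per-tool prompt snippets; only the top matches get
# included in the context instead of the whole catalog.
SNIPPETS = {
    "search": "search the web for recent news and articles",
    "calendar": "create, move, or cancel calendar events",
    "files": "read, write, and list files on disk",
}

def select_snippets(query, k=2):
    q = embed(query)
    ranked = sorted(SNIPPETS,
                    key=lambda name: cosine(q, embed(SNIPPETS[name])),
                    reverse=True)
    return ranked[:k]

print(select_snippets("cancel my meeting and move the other events"))
# → ['calendar', 'search']
```

The retrieval step happens before the model call, so the only added latency is one embedding lookup, which was the point about round trips above.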



> I'd probably reach for embeddings, though, to find relevant prompt info to include

So, tool selection, instead of depending only on the ability of the model given the information in context, now depends on both the accuracy of a RAG-like context-stuffing step and then the model doing the right thing given that context.

I can't imagine that the number of input prompt tokens you save doing that will ever warrant the output-quality cost of reaching for a RAG-like workaround. The context window is large enough that you shouldn't often have the problems RAG-like workarounds mitigate anyway, and because the system prompt, long as it is, is very small compared to the context window, there's only a narrow band where shaving anything off the system prompt meaningfully mitigates context pressure even when you have it.

I can see something like that being a useful approach for a model with a smaller useful context window, in a toolchain doing a more narrowly scoped set of tasks: the set of situations it needs to handle is more constrained, so identifying which function bucket a request fits in and which prompt best suits it is easy, and a smaller, focused prompt is a bigger win than it would be with a big-window model like GPT-5.


I don't think making the prompt smaller is the only goal. Instead of having 1000 tokens of general prompt instructions, you could have 1000 tokens of specific prompt instructions.

There was also a paper that went by showing that model performance went down when extra unrelated info was added; that must be happening to some degree here too with a prompt like this one.



