Not sure why you're emphasizing a round-trip request, like these models aren't already taking a few seconds to respond? Not even sure that matters, since these all run in the same datacenter, or you can at least send requests to somewhere close.

I'd probably reach for embeddings, though, to find relevant prompt info to include
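A rough sketch of what that embedding-based selection could look like. This uses a toy bag-of-words vector as a stand-in for a real embedding model, and the tool names and snippet texts are made up for illustration:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" stand-in; a real setup would call
    # an actual embedding model here instead.
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    na, nb = norm(a), norm(b)
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical per-tool prompt snippets; only the top matches get
# included in the context instead of the whole catalog.
SNIPPETS = {
    "search": "search the web for recent news and articles",
    "calendar": "create, move, or cancel calendar events",
    "files": "read, write, and list files on disk",
}

def select_snippets(query, k=2):
    q = embed(query)
    ranked = sorted(SNIPPETS,
                    key=lambda name: cosine(q, embed(SNIPPETS[name])),
                    reverse=True)
    return ranked[:k]

print(select_snippets("cancel my meeting and move the other events"))
# → ['calendar', 'search']
```

The retrieval step happens before the model call, so the only added latency is one embedding lookup, which was the point about round trips above.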



> I'd probably reach for embeddings, though, to find relevant prompt info to include

So, tool selection, instead of depending only on the ability of the model given the information in context, now depends on both the accuracy of a RAG-like context-stuffing step and then the model doing the right thing given that context.

I can't imagine that the number of input prompt tokens you save doing that will ever warrant the output-quality cost of reaching for a RAG-like workaround. The context window is large enough that you shouldn't often have the problems RAG-like workarounds mitigate anyway, and because the system prompt, long as it is, is very small compared to the context window, there's only a narrow band where shaving anything off the system prompt meaningfully mitigates context pressure even when you have it.

I can see something like that being a useful approach for a model with a smaller useful context window, in a toolchain doing a more narrowly scoped set of tasks: the set of situations it needs to handle is more constrained, so identifying which function bucket a request fits in and which prompt best suits it is easy, and a smaller, focused prompt is a bigger win than it would be with a big-window model like GPT-5.


I don't think making the prompt smaller is the only goal. Instead of having 1000 tokens of general prompt instructions, you could have 1000 tokens of specific prompt instructions.

There was also a paper that went by showing that model performance went down when extra unrelated info was added; that must be happening to some degree here too with a prompt like this one.



