People often describe the models as averaging their training data, but even for base models that predict the most likely next token this is imprecise, even misleading: what is most likely is conditional on the input as well as on everything generated so far. So a strange input will produce a strange output, hardly an average or a reversion to the mean.
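To make the conditioning point concrete, here's a minimal sketch with a toy bigram model. The corpus and contexts are made up for illustration, and it shares nothing with a real LLM except the shape of the objective: pick the most likely next token *given the context*.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for training data; real models use vast corpora
# and subword tokens, but the conditioning works the same way in spirit.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "colorless green ideas sleep furiously ."
).split()

# Count next-token frequencies conditioned on the previous token.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

# An unconditional "average" would always emit the globally most frequent token.
print("unconditional mode:", Counter(corpus).most_common(1)[0][0])  # -> 'the'

# But prediction is conditional: the most likely next token depends on context,
# and an unusual context ('green') yields an unusual continuation, not the mode.
for context in ["the", "sat", "green"]:
    print(f"after {context!r} ->", following[context].most_common(1)[0][0])
```

Even in this trivial model, "most likely" after `green` is `ideas`, not the corpus-wide average token.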
On top of that, the models people use have been heavily shaped by reinforcement learning, which rewards something quite different from the most likely next token. So I don’t think it’s clarifying to say “the model is basically a complex representation of the average of its training data.”
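To see why RL-shaped models drift even further from "most likely under the training data," here's a deliberately tiny contrast between the two training signals. The one-token response space and the reward function are invented for illustration; real RLHF involves learned reward models, KL penalties, and far more machinery.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["yes", "no", "maybe"]          # a toy one-token "response" space

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def grad_log_prob(logits, i):
    """Gradient of log softmax(logits)[i] with respect to the logits."""
    g = -softmax(logits)
    g[i] += 1.0
    return g

# Pretraining-style MLE: fit the token frequencies in the data ("no" dominates).
data = ["no"] * 8 + ["yes"] * 2
mle = np.zeros(len(vocab))
for _ in range(300):
    for tok in data:
        mle += 0.01 * grad_log_prob(mle, vocab.index(tok))

# RL-style REINFORCE: sample, score with a reward, reinforce what scored well.
# The reward is made up here; real reward models are learned from preferences.
def reward(tok):
    return 1.0 if tok == "maybe" else 0.0

rl = np.zeros(len(vocab))
for _ in range(2000):
    i = rng.choice(len(vocab), p=softmax(rl))
    rl += 0.1 * reward(vocab[i]) * grad_log_prob(rl, i)

print("MLE policy:", dict(zip(vocab, softmax(mle).round(2))))  # tracks the data
print("RL policy: ", dict(zip(vocab, softmax(rl).round(2))))   # tracks the reward
```

The MLE policy ends up mirroring the data's frequencies; the RL policy ends up mirroring the reward, no matter how rare the rewarded answer was in the data.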
The "averaging" framing does point at a real phenomenon, underspecified inputs producing generic outputs, but modern agentic coding tools don't have this problem the way the chat UIs did, because they can pull arbitrary context from the codebase.
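A rough sketch of that grounding, with a hypothetical `build_prompt` helper that doesn't correspond to any particular tool's API:

```python
from pathlib import Path

def build_prompt(task: str, repo_root: str, budget: int = 8000) -> str:
    """Hypothetical: ground the request in actual repo contents so the model
    conditions on this specific codebase rather than on a generic one."""
    chunks = [
        f"### {path}\n{path.read_text(errors='ignore')}"
        for path in sorted(Path(repo_root).rglob("*.py"))
    ]
    context = "\n\n".join(chunks)[:budget]  # crude truncation, sketch only
    return f"{context}\n\nTask: {task}"
```

Real tools are far more selective about what they pull in (search hits, editor state, tool-call results), but the effect is the same: the context becomes so specific that the "most likely" continuation is specific too.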