First you have to define what "good code" is, something that programmers still haven't settled on in the more than half a century that the field has existed.
I also think what the other reply said is true: going from average to “good code” is way harder, because it implies a need for LLMs to self-critique beyond what they do today. I don’t think just training on a set of hand-picked samples is enough.
There’s also the knowledge cutoff aspect. I’ve found that LLMs often produce outdated Go code that doesn’t utilise the modern language features. Or for cases where it knows about a commonly used library, it uses deprecated methods. RAG/MCP can kind of paper over this problem but it’s still fundamental to LLMs until we have some kind of continuous training.
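To make the outdated-Go point concrete, here's a hedged sketch of the kind of drift I mean (the file name is invented): `io/ioutil.ReadFile` has been deprecated since Go 1.16 in favour of `os.ReadFile`, and since Go 1.18 `any` is the idiomatic spelling of `interface{}`, but models with older cutoffs still reach for the old forms.

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// An older-cutoff model often emits ioutil.ReadFile here, which has been
	// deprecated since Go 1.16; os.ReadFile is the modern replacement.
	data, err := os.ReadFile("config.json") // file name is a made-up example
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Likewise, `interface{}` still compiles, but `any` (Go 1.18+) is idiomatic.
	var v any = len(data)
	fmt.Println(v)
}
```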
AIs can self-critique via mechanisms like chain of thought, or via user-specified guard rails like a hook that requires the test suite to pass before a task can be considered complete/ready for human review. These can and do result in higher-quality code.
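As a rough sketch of that guard-rail idea (this isn't any particular tool's hook API; the command and messages are assumptions), the gate can be as simple as a small program that runs the test suite and refuses to report success otherwise:

```go
// testgate is a hypothetical pre-completion hook: an agent runner could invoke
// it after each task and only accept the task as done if it exits 0.
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	cmd := exec.Command("go", "test", "./...")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "tests failed; task is not complete:", err)
		os.Exit(1)
	}
	fmt.Println("tests passed; ready for human review")
}
```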
Agree that "good code" is vague - it probably always be. But we can still agree that code quality is going up over time without having a complete specification for what defines "good".
Unfortunately I can only give anecdotes, but in my experience the LLM's 'thinking' does not lead to code quality improvements in the same way that a programmer thinking for a while would.
In my experience having LLMs write Go, the code tends to be poorly factored from the start, probably because the model lacks a mental model of how the pieces compose together. Furthermore, once a structure is in place, there doesn't seem to be a trigger point that causes the LLM to step back and think about reorganising the code, or how the code it wants to write could be better integrated into what's already there. It tends to be very biased by the structures that already exist and doesn't really question them.
A programmer might write a function, notice it becoming too long or doing too much, and then decide to break it down into smaller subroutines. I've never seen an LLM really do this; they seem biased towards being additive.
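For illustration only (the function and rules here are invented), the kind of step-back I mean is noticing a routine doing parsing, validation and formatting inline and splitting it into composable helpers; in my experience the model just keeps appending to the big function instead:

```go
package report

import (
	"errors"
	"strings"
)

// Clean is the refactored shape: imagine the original version did all of this
// inline in one long function. Splitting it out keeps each piece testable.
func Clean(raw string) (string, error) {
	line, err := parse(raw)
	if err != nil {
		return "", err
	}
	if err := validate(line); err != nil {
		return "", err
	}
	return format(line), nil
}

func parse(raw string) (string, error) {
	s := strings.TrimSpace(raw)
	if s == "" {
		return "", errors.New("empty input")
	}
	return s, nil
}

func validate(line string) error {
	if strings.ContainsRune(line, '\x00') {
		return errors.New("unexpected control character")
	}
	return nil
}

func format(line string) string {
	return strings.ToUpper(line)
}
```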
I believe good code comes from an intuition which is very hard to convey. Imprinting hard rules into the LLM like 'refactor long functions' will probably just lead to overcorrection and poor results. It needs to build its own taste for good code, and I'm not sure if that's possible with current technology.
> Furthermore, once a structure is in place, there doesn't seem to be a trigger point that causes the LLM to step back and think about reorganising the code, or how the code it wants to write could be better integrated into what's already there.
Older models did do this, and it sucked. You'd ask for a change to your codebase and they would refactor a chunk of it and make a bunch of other unrelated "improvements" at the same time.
This was frustrating and made for code that was harder to review.
The latest generation of models appear to have been trained not to do that. You ask for a feature, they'll build that feature with the least changes possible to the code.
I much prefer this. If I want the code refactored I'll say to the model "look for opportunities to refactor this" and then it will start suggesting larger changes.
> A programmer might write a function, notice it becoming too long or doing too much, and then decide to break it down into smaller subroutines. I've never seen an LLM really do this; they seem biased towards being additive.
The nice thing is a programmer with an LLM just steps in here, and course-corrects, and still has that value add, without taking all the time to write the boilerplate in between.
And in general, the cleaner your codebase the cleaner LLM modifications will be; it does pick up on coding style.
>The nice thing is a programmer with an LLM just steps in here, and course-corrects
This does not seem to be the direction things are going. People are talking about shipping code they haven't edited, most notably the author of Claude Code. Sometimes they haven't even read the code at all. With LLMs the path of least resistance is to take your hands off the wheel completely. Only programmers taking particular care are still playing an editorial role.
When the code is constructed by an LLM, the human in the driving seat doesn't get a chance to build the mental models that they usually would writing it manually. This stifles the ability to see opportunities to refactor. It is widely considered to be harder to read code than to write it.
>And in general, the cleaner your codebase the cleaner LLM modifications will be
Whilst true, this is a kind of "you're holding it wrong" argument. If LLMs had a model of what differentiates good code from bad code, whatever they pull into their context should make no difference.
> Whilst true, this is a kind of "you're holding it wrong" argument. If LLMs had a model of what differentiates good code from bad code, whatever they pull into their context should make no difference.
Good code is in the eye of the beholder. What reviewers in one shop consider good code can be dramatically different from what reviewers in another would accept.
Conforming to the existing codebase's style is good in and of itself; if the context it pulls in made no difference, that would make it useless.
> When the code is constructed by an LLM, the human in the driving seat doesn't get a chance to build the mental models that they usually would writing it manually
I'm asking the LLM for alternatives and options constantly, to test different models. It can give me a written description of the options, or spin up subagents to try 4 different things at once.
> It is widely considered to be harder to read code than to write it
Even more than writing code, I think LLMs are exceptional at reading code. They can review huge amounts of code incredibly fast and understand very complex systems. And then you can just ask it questions! Don't understand something? Ask more questions!
I have mcp-neovim-server open, so I just ask it to open the relevant pieces of code at those lines, and it can then show me. CodeCompanion makes it easy to ask questions about a line. It's amazing how well this works.
Reading code was one of the extremely hard parts of programming, and the machine is far far better at it than us!
> When the code is constructed by an LLM, the human in the driving seat doesn't get a chance to build the mental models that they usually would writing it manually.
Here's one way to tell me you haven't tried the thing without saying you haven't tried the thing. The ability to do deep inquiry into topics and to test and try different models is far, far better than it has ever been. We aren't stuck with what we write; we can keep iterating and trying at vastly lower cost, doing the hard work to discover what a good model is. Programmers have rarely had the luxury of time and space to keep working on a problem again and again, to adjust and change and tweak until the architecture truly sings. Now you can try a week's worth of architectures in an afternoon. There is no better time for those who want to understand to do so.
I feel like one thing missing from this thread is that most people adopting AI at a serious level are building really strong AGENTS.md files that refine tastes and practices and forms. The AI is pretty tasteless and isn't deliberate. It is up to us to explore the possibility space when working on problems, and to create good context that steers towards good solutions. And our ability to get information out, to probe into systems, to assess, to test hypotheses, is vastly higher, which we can keep using to become far better steersfolk.
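For anyone who hasn't seen one, AGENTS.md is just a plain instructions file the agent reads alongside the repo; a minimal sketch (these particular rules are made-up examples, not a recommendation) might look like:

```markdown
# AGENTS.md (hypothetical excerpt)

- Run `go test ./...` and `go vet ./...` before declaring any task done.
- Prefer the standard library; ask before adding a new dependency.
- Match the existing package layout; don't add new top-level packages unprompted.
- Keep functions small; if one grows past a screen, propose a refactor first.
```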