All this praise for AI.. I honestly don't get it. I have used Opus 4.5 for work and private projects. My experience is that all of the AIs struggle when the project grows. They always find some kind of local minimum that they cannot get out of, while telling you that this time their solution will work.. but it doesn't. They waste an enormous amount of my time with this behaviour. In the end I always have to do it myself.
Maybe when AIs are able to say: "I don't know how this works" or "This doesn't work like that at all." they will be more helpful.
What I use AIs for is searching for stuff in large codebases. Sometimes I don't know the name or the file name, and I describe to them what I am looking for. Or I let them generate a python/bash script for some random task. Or I use them to find specific things in a file that a regex cannot find. Simple, small tasks.
It might well be I am doing it totally wrong.. but I have yet to see a medium to large sized project with maintainable code that was generated by AI.
At what point does the project outgrow the AI in your experience? I have a 70k LOC backend/frontend/database/docker app where Claude still one-shots most features/tasks I throw at it. It's perhaps not as good at remembering all the intertwined side-effects between functionalities/UIs, and I have to tell it things like "in the calendar view, we must hide it as well", but that takes little time/effort.
Does it break down at some point to the extent that it simply does not finish tasks? Honest question, as I've seen this sentiment stated before and assumed that sooner or later I'd face it myself, but so far I haven't.
I find that with more complex projects (a full-stack application with some 50 controllers and services, and about 90 distinct full-feature pages) it often starts writing code that simply breaks functionality.
For example, we had to update some more complex code to correctly calculate a financial penalty amount. The amount is defined by law and recently received an overhaul, so we had to change our implementation.
Every model we tried (and we have corporate access and legal allowance to use pretty much all of them) failed to update it correctly. Models would start changing parts of the calculation that didn't need to be updated. After being told that those specific parts shouldn't be touched and asked to retry, most of them would go right back to changing them. The legal definition of the calculation logic is, surprisingly, pretty clear, and we do have rigorous tests in place to ensure the calculations are correct.
Beyond that, it was frustrating trying to get the models to stick to our coding standards. Our application has developers from other teams doing work as well, so we enforce a minimum standard to ensure code quality doesn't suffer and other people can take over without much issue. This standard is documented in the code itself but also explicitly written out in the repository in simple language. Even when explicitly prompting the models to stick to the standard and copy-pasting it into the actual chat, they would ignore 50% of it.
The most apt comparison I can make is to a consultant who always agrees with you to your face but, when doing the actual work, ignores half of your instructions, so you end up running after them trying to minimize the mess and cleanup you have to do. It outputs more code, but it doesn't meet the standards we have. I'd genuinely be happy to offload tasks to AI so I can focus on the more interesting parts of my work, but from my experience and that of my colleagues, it's just not working out for us (yet).
I noticed that you said "models" & not "agents". Agents can receive feedback from automated QA systems, such as linters, unit, & integration tests, which can dramatically improve their work.
There's still the risk that the agent will try to modify the QA systems themselves, but that's why there will always be a human in the loop.
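The loop described above can be sketched in a few lines of Python. To be clear, `generate_patch`, `apply_patch`, and `run_qa` are hypothetical stand-ins for the model call, the workspace edit, and the linter/test runner; this isn't any real agent framework's API, just the shape of the feedback cycle:

```python
def agent_loop(generate_patch, apply_patch, run_qa, max_rounds=3):
    """Ask the model for a patch, apply it, and feed QA failures back
    as context until the suite passes or we give up and hand the task
    back to a human."""
    feedback = ""  # empty on the first attempt
    for round_no in range(1, max_rounds + 1):
        patch = generate_patch(feedback)  # model call (stand-in)
        apply_patch(patch)                # edit the workspace
        ok, output = run_qa()             # linters / unit / integration tests
        if ok:
            return True, round_no
        feedback = output                 # failures become the next prompt
    return False, max_rounds              # human takes over from here
```

The important part is the second-to-last line of the loop: the QA output, not the human, supplies the correction signal on each round, while the round cap keeps a stuck model from burning tokens forever.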
Should've clarified in that case. I used models as a general stand-in for AI.
To provide a bit more context:
- We use VS Code (plus derivatives like Cursor) hooked up to general models, with context access to the entire repository.
- We have an MCP server with access to our company-internal framework and tools (especially the documentation), so the models should know how they are used.
So far, we've found 2 use-cases that make AI work for us:
1. Code review. This took quite a bit of refinement of the instructions, but we've got it to a point where it provides decent comments on the things we want it to comment on. It still fails on the more complex application logic, but it will consistently point out minor things. It's now used as a pre-PR review so engineers can fix things before publishing a PR. Less noise for the rest of the developers.
2. CRUD cruft, like tests for a controller. We still create the controller endpoint, but given the controller, its DTOs, and an example of how another controller's tests are done, it will produce decent code. Even then, we still often have to fix a couple of things and debug to see where it went wrong, like fixing a broken test by removing the actual strictlyEquals() call.
Just keeping up with the newest AI changes is hard. We all have personal curiosity, but at the end of the day we need to deliver our product and only have so much time to experiment with AI stuff. Never mind all the other developments in our regulation-heavy environment and tech stack we need to keep on top of.
> At what point does the project outgrow the AI in your experience? I have a 70k LOC backend/frontend/database/docker app that Claude still mostly one shots most features/tasks I throw at it.
How do you do this?
Admittedly, I'm using Copilot, not CC.
I can't get Copilot to finish a refactor properly, let alone a feature. It'll miss an import rename, leave in duplicated code, update half the use cases but not all of them.. etc. And that's with all the relevant files in context, and letting it search the codebase so it can gather more.
It can talk about DRY, or good factoring, or SOLID, but it only applies them when it feels like it, despite what's in AGENTS.md. I have much better results when I break the task down into small chunks myself and NOT tell it the whole story.
I'm having trouble at 150k LOC, but I'm not sure size per se is the issue, as opposed to whether the set of relevant context is easy to find. The "relevant" part threatens to pull in disparate parts of the codebase; the "easy to find" part determines whether a human has to manually curate the context.
I think most of us - if not _all_ of us - don't know how to use these things well yet. And that's OK. It's an entirely new paradigm. We've honed our skills and intuition based on humans building software. Humans make mistakes, sure, but humans have a degree and style of learning and failure patterns we are very familiar with. Humans understand the systems they build to a high degree, this knowledge helps them predict outcomes, and even helps them achieve the goals of their organisation _outside_ writing software.
I kinda keep saying this, but in my experience:
1. You trade the time you'd take to understand the system for time spent testing it.
2. You trade the time you'd take to think about simplifying the system (so you have less code to type) for execution (so you build more in less time).
I really don't know if these are _good_ tradeoffs yet, but it's what I observe. I think it'll take a few years until we truly understand the net effects. The feedback cycles for decisions in software development and business can be really long, several years.
I think the net effects will be positive, not negative. I also think they won't be 10x. But that's just me believing stuff, and it is relatively pointless to argue about beliefs.
> Maybe when AIs are able to say: "I don't know how this works" or "This doesn't work like that at all." they will be more helpful.
Funny you say that; I encountered this in a seemingly simple task. Opus inserted something along the lines of "// TODO: someone with flatbuffers reflection expertise should write this". That was actually better than I anticipated, even though the task was specifically related to fbs reflection, because I didn't waste more time and could immediately start rewriting it from scratch.