Hacker Newsnew | past | comments | ask | show | jobs | submit | marianebekker's commentslogin

Agreed on the shift — what stood out to us is how quickly teams hit system-level constraints once agents leave the demo phase. Things like evaluation drift, tool reliability, and recovery behavior start to matter more than model choice. Curious which of those has been the biggest bottleneck for you in practice?

One thing that surprised us while putting this together was how uneven the stack still is. Planning and execution tooling feels fairly mature, but evaluation and long-term reliability lag far behind. Curious how people here are testing and validating agents in production today.

Thank you!

Thank you!

Thank you!

Over the last 18 months, agentic AI has shifted from prompt-driven chatbots to system-level engineering. This post maps the open-source tools teams are actually using to build, run, and evaluate agents in production, and where they sit in the agent stack.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: