marianebekker's comments

marianebekker · 2026-02-09T19:14:57 1770664497

Agreed on the shift. What stood out to us is how quickly teams hit system-level constraints once agents leave the demo phase. Evaluation drift, tool reliability, and recovery behavior start to matter more than model choice. Curious which of those has been the biggest bottleneck for you in practice?

marianebekker · 2026-02-09T19:05:20 1770663920

One thing that surprised us while putting this together was how uneven the stack still is. Planning and execution tooling feels fairly mature, but evaluation and long-term reliability lag far behind. Curious how people here are testing and validating agents in production today.

marianebekker · 2026-02-09T19:05:01 1770663901

Thank you!

marianebekker · 2026-02-09T19:04:58 1770663898

Thank you!

marianebekker · 2026-02-09T19:04:54 1770663894

Thank you!

marianebekker · 2026-02-09T18:21:05 1770661265

Over the last 18 months, agentic AI has shifted from prompt-driven chatbots to system-level engineering. This post maps the open-source tools teams are actually using to build, run, and evaluate agents in production, and where they sit in the agent stack.