What I Learned Building Software with Autonomous AI Agents

June 12, 2026 · 6 min read

A few months ago I caught myself reviewing a pull request at midnight that I hadn't asked anyone to write. An agent had noticed a failing integration test in one of my projects, traced it to a bad assumption in a serializer, fixed it, and opened the PR with a description that was honestly better than most I get from people. I approved it and went to bed.

That's roughly where I've landed after spending the better part of a year building two SaaS products, StayRecap and RetroStoreManager, almost entirely through a pipeline of AI agents.

I don't mean "AI-assisted" in the tab-complete sense. I mean a pipeline that reads a spec, works out what needs to change, and hands the work to agents that write the code, review each other's work, run the tests, and ship the result across four separate repositories per product. When something breaks in production, the pipeline takes a swing at fixing it before I'm awake.

I want to write down what that's actually like, because most of what I read about agentic coding is either breathless or apocalyptic, and the truth is more mundane and more interesting than either.

Most of the job was never the typing

The thing that caught me off guard is how much of engineering turns out to be coordination, not code. Agents are genuinely good at the mechanical majority of the work: boilerplate, wiring services together, scaffolding tests, the kind of fifty-file refactor that's tedious but not hard. On the team I lead at Victra, leaning on agents for exactly that category of work has given us a real three-to-five-times speedup on the grindy parts of a sprint, the parts nobody was excited to do anyway.

What agents don't do is decide what's worth building. That part got more important, not less.

Agents will build the wrong thing, beautifully

There's a ceiling, and it's the same one you hit with a fast junior engineer who doesn't yet hold the whole system in their head. An agent will take a vague instruction and confidently produce something clean, well-tested, and not what you meant. The cost of a sloppy spec used to be a few wasted hours. Now it's a few wasted repos.

So the spec became the product. I spend more time than I used to writing down what "done" looks like: the constraints, the edge cases, the things that must never happen. That document is now the highest-leverage thing I touch all day. Vague in, polished-wrong out.

The hard line: the AI never does the math

StayRecap generates financial reports, and property owners make real decisions off them. A made-up number there isn't a bug, it's a liability. So the whole thing is built so the model literally cannot invent a figure. A deterministic layer computes and verifies every number first; the language model only gets to describe numbers that already exist. It writes the sentence. It never does the arithmetic.

That split is the most useful design rule I've taken from any of this: let the model handle language, and never let it handle truth. It isn't specific to reports. Anywhere correctness matters, the move is the same: pin the model to verified ground and let it be fluent on top of that, never underneath it.
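
To make that split concrete, here is a minimal sketch of the idea in Python. It is not StayRecap's actual code: the function names, the shape of the booking records, and the regex-based check are all assumptions made up for illustration. A deterministic function computes the figures, and a guard rejects any model-written sentence that contains a number the deterministic layer didn't produce.

```python
import re
from decimal import Decimal

# Hypothetical record shape: {"revenue": "1000.25", "nights": 4}
def compute_figures(bookings):
    """Deterministic layer: every number in the report is computed here."""
    total_revenue = sum(Decimal(b["revenue"]) for b in bookings)
    nights_booked = sum(b["nights"] for b in bookings)
    return {
        "total_revenue": total_revenue,
        "nights_booked": Decimal(nights_booked),
    }

# Matches plain numbers such as 7, 1,250 or 1,250.50 (an assumed format).
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

def numbers_in(text):
    """Extract every numeric literal from a piece of prose."""
    return {Decimal(m.replace(",", "")) for m in NUMBER.findall(text)}

def check_narrative(text, figures):
    """Guard: reject prose containing any number that was not computed."""
    allowed = set(figures.values())
    unknown = numbers_in(text) - allowed
    if unknown:
        raise ValueError(f"narrative contains unverified numbers: {sorted(unknown)}")
    return text
```

In a setup like this, the model would be handed the output of `compute_figures` and asked to write around it, and whatever it returns would pass through `check_narrative` before anyone sees it. A real version would need to handle percentages, dates, and rounding, but the shape is the point: the model's text is checked against the numbers, never the other way around.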

Where this leaves me

My job didn't shrink. It moved up a level. I write fewer functions and spend more time on the things that were always the actual work: deciding what to build, drawing the boundaries between systems, and reviewing with the kind of suspicion that comes from thirteen years of watching confident code fail in interesting ways.

I don't think this replaces engineers. I think it quietly retires the part of the job most of us liked least and puts a lot more weight on judgment. That's a trade I'll take.