Steve Yegge wrote about Gas Town — a vision of AI agents building software autonomously. The idea took off: agents spawning agents, tools calling tools, some setups even A/B testing their own agent configurations to optimize behavior on the fly. Very futuristic. But reading about it, I kept coming back to one question:
Given all the possible AI automation in the world, what is the minimum control over the development process I still want to have?
The answer led me to a flow I’m much happier with — three non-negotiables:
- I’m not ready to give up implementation plans — I want to hear how the AI plans to address an issue before it writes a line of code.
- I’m not ready for PRs being merged without my review.
- I can’t work on more than three parallel tracks simultaneously — my brain starts overheating.
Everything else I’ve built follows from these constraints. The whole flow runs on Claude Code — an AI coding agent that lives in the terminal.
From Epic to Specs
It starts with an orchestrator agent. When I kick off a sprint, the orchestrator reads an Epic — a group of related stories — and builds a dependency graph. It filters down to unblocked stories and picks the top candidates in epic order.
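That filtering step can be sketched in a few lines. This is my own illustration, not the orchestrator's actual schema; I'm assuming each story carries an explicit `depends_on` list:

```python
def unblocked(stories, done):
    """Stories whose dependencies are all complete, preserving epic order."""
    return [
        s for s in stories
        if s["id"] not in done
        and all(dep in done for dep in s.get("depends_on", []))
    ]

epic = [
    {"id": "auth-1", "depends_on": []},
    {"id": "auth-2", "depends_on": ["auth-1"]},
    {"id": "billing-1", "depends_on": []},
]

# With auth-1 merged, both auth-2 and billing-1 become candidates.
candidates = unblocked(epic, done={"auth-1"})
```

Because the list comprehension preserves input order, "epic order" falls out for free: whatever priority the Epic encodes in its story ordering survives the filter.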
Each story is just intent — what to build and roughly why, but not how. The orchestrator delegates the “how” to planner agents: it spawns one for each candidate story, in parallel, so several planners explore the codebase simultaneously, each producing a Spec. It picks more candidates than needed; some will be dismissed later if they touch the same files and risk merge conflicts.
A Spec is the bridge between intent and code. The planner reads the story, explores the codebase, and produces a document with a fixed structure:
- Description: what changes and why
- Relevant Docs: codebase files, standards, prior art
- Implementation: steps in execution order, with exact file paths
- Files to Modify: needed for overlap detection across parallel stories
- Validation: tests and linters to run after implementation
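As a data structure, a Spec is small. Here is a hypothetical shape (the field names mirror the sections above; the actual artifact is a markdown document, not code):

```python
from dataclasses import dataclass

@dataclass
class Spec:
    story_id: str
    description: str            # what changes and why
    relevant_docs: list[str]    # codebase files, standards, prior art
    implementation: list[str]   # ordered steps with exact file paths
    files_to_modify: list[str]  # input to overlap detection
    validation: list[str]       # commands to run after implementation

spec = Spec(
    story_id="auth-2",
    description="Add session refresh to the auth flow",
    relevant_docs=["docs/auth.md"],
    implementation=["1. Extend app/services/session.rb", "2. Wire into controller"],
    files_to_modify=["app/services/session.rb", "app/controllers/sessions_controller.rb"],
    validation=["bundle exec rspec spec/services", "bundle exec rubocop"],
)
```

The fixed structure is the point: every downstream consumer (the overlap check, the implementer, the validator) reads a known field instead of parsing free-form prose.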
When all planners report back, the orchestrator compares the “Files to Modify” lists across Specs. Stories that touch the same files risk merge conflicts, so the orchestrator minimizes overlap — dropping lower-priority candidates in favor of ones that won’t collide, keeping the best set that fits three tracks.
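The overlap check reduces to a greedy set-intersection pass. A minimal sketch, assuming Specs arrive already sorted by epic priority (function and field names are my own):

```python
def pick_tracks(specs, max_tracks=3):
    """Keep the highest-priority Specs whose file sets don't collide."""
    chosen, touched = [], set()
    for spec in specs:  # specs are in priority (epic) order
        files = set(spec["files_to_modify"])
        if files & touched:
            continue  # shares a file with an already-chosen Spec: drop it
        chosen.append(spec)
        touched |= files
        if len(chosen) == max_tracks:
            break
    return chosen
```

Greedy selection is deliberately simple: because candidates are visited in priority order, a lower-priority Spec is only ever dropped in favor of higher-priority work it would collide with.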
Then it presents me the selected Specs. I review each one — push back, suggest changes, or approve. This is where architectural decisions happen. That’s constraint #1 satisfied.
From Spec to Merged PRs
Once I approve a Spec, the orchestrator assigns it to an available track. I keep three permanent git worktrees — track-1, track-2, track-3 — each in its own terminal, each with its own full Docker Compose stack. Docker Compose uses the directory name as the project name, so each track gets fully isolated networks, volumes, and databases.
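The track setup is a one-time script. A sketch of the commands involved (paths and branch names are illustrative; in practice you would run these by hand or via a small wrapper):

```python
def track_setup_commands(n_tracks=3):
    """Commands to create one permanent worktree per track.

    Docker Compose derives its default project name from the project
    directory's basename, so running the stack from ../track-N yields
    isolated networks, volumes, and databases per track.
    """
    cmds = []
    for i in range(1, n_tracks + 1):
        track = f"track-{i}"
        cmds.append(f"git worktree add ../{track} -b {track}")
        cmds.append(f"docker compose --project-directory ../{track} up -d")
    return cmds
```

The key property is that nothing needs per-track configuration: the directory name alone namespaces each Compose stack.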
An implementer agent takes the approved Spec and works through it, splitting it into small, independently deployable pull requests. Each PR must pass validation on its own — nothing breaks if the next PR hasn’t landed yet. One story typically produces two or three PRs.
Each PR validates itself before I ever see it. The Spec includes validation commands — tests, linters, and specialist reviewers — and the implementer runs them after finishing its changes. If something fails, it reads the error, fixes the code, and re-validates. Up to three rounds. If it still can’t fix it, it stops and asks me.
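The fix-and-retry loop has a simple shape. A sketch under stated assumptions: `run_cmd` and `fix` stand in for the agent's tool calls, which I'm inventing for illustration:

```python
def validate_with_retries(run_cmd, fix, commands, max_rounds=3):
    """Run validation commands; on failure, attempt a fix and re-validate."""
    for attempt in range(max_rounds):
        failures = [cmd for cmd in commands if not run_cmd(cmd)]
        if not failures:
            return "pass"
        for cmd in failures:
            fix(cmd)  # agent reads the error and edits the code
    return "escalate"  # still failing after three rounds: stop and ask the human
```

The cap matters: without `max_rounds`, a confused agent can loop on a failure it doesn't understand; with it, the failure surfaces to a human after a bounded cost.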
When validation passes, the agent creates a PR. I review the diff — it’s small, it matches the Spec I already approved, so the review is fast. Sometimes I also want to verify the result manually — check a page in the browser, or test that the MCP connection works. Each track can be exposed via an HTTPS tunnel when I need to check it from a browser or connect an external client. That’s constraint #2 satisfied.
After I merge, CI/CD deploys automatically. If there are more steps in the Spec, the implementer continues with the next PR batch on the same track. When the story is complete, the implementer attaches the Spec to the story for history and the orchestrator picks the next unblocked story from the Epic for the freed track.
From “start sprint” to “three agents writing code”, most of the time is me reading Specs. That’s constraint #3 — three tracks is the most I can follow without losing focus.
Agents and Skills
Claude Code has two extensibility mechanisms: agents and skills. Understanding the distinction helped me design this workflow.
An agent is a long-running process with its own context window. It can reason, explore the codebase, make decisions, and call other tools. Each agent gets a fresh context — which makes it useful for isolating large operations that would otherwise bloat the main conversation. Use agents when the work requires judgment: planning an implementation, writing code, deciding what to fix.
A skill is a reusable recipe — a set of instructions that loads into the calling agent’s context when invoked. Skills are cheap: no new context window, no overhead. Optionally, a skill can run in a forked, isolated context for noisy tasks. Skills replaced what used to be slash commands. Use them for focused, repeatable work: running a linter, executing tests, checking code against a domain style guide.
In this workflow, the orchestrator, planner, and implementer are agents. The linter, test runner, and specialist reviewers are skills that the agents invoke as needed.
The validator sits in between. It’s a skill — it doesn’t decide what to validate, it just runs the commands the Spec lists and reports pass/fail. But tests and linters produce hundreds of lines of output that would trash the implementer’s context. So the validator runs in a forked context: an isolated subagent that absorbs all the noise and returns only a concise summary. The implementer sees “3 tests failed in user_spec.rb”, not 200 lines of RSpec output.
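The pattern, stripped of the agent machinery, is "absorb noise, return a summary." A hypothetical sketch where `run` returns a pass/fail flag plus the full output:

```python
def run_in_fork(commands, run):
    """Validator pattern: swallow verbose command output in an isolated
    context and hand back only a short pass/fail summary."""
    failures = []
    for cmd in commands:
        ok, output = run(cmd)  # output may be hundreds of lines
        if not ok:
            # Keep only the last line, which test runners typically
            # use for their failure tally.
            failures.append((cmd, output.splitlines()[-1]))
    if not failures:
        return "all validations passed"
    return "; ".join(f"{cmd}: {tail}" for cmd, tail in failures)
```

The caller's context grows by one short string per run, regardless of how chatty the underlying tools are.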
What My Day Looks Like Now
My workflow has settled into a rhythm:
- Create stories — from brainstorming, or from things discovered while working on current stories (they always spawn new ones).
- Organize them into Epics — group related work, set priorities and dependencies.
- Point the orchestrator at an Epic — and let it plan.
- Review Specs and PRs — this is where I spend most of my time. Push back on Specs, approve good ones, merge clean PRs.
That’s it. I write stories, review Specs, and review code. The agents handle everything in between.
The Takeaway
What remains after automation is judgment: knowing what to build, how it should fit together, and whether the result is correct. The two gates I kept — Spec approval and PR review — are exactly where that judgment lives. Everything between them is automated. And three parallel tracks turns out to be just enough to keep me busy without losing focus.