Boris Cherny runs five AI coding agents simultaneously in his terminal. While one debugs a legacy module, another writes documentation, and a third executes test suites. He switches between them using numbered tabs and system notifications, treating software development less like writing and more like commanding a fleet. This isn't a futuristic vision—it's how the head of Claude Code at Anthropic works today, and the engineering world is scrambling to understand why it matters.
Last week, Cherny shared his personal development workflow in a thread on X that has since become required reading in Silicon Valley. The response has been extraordinary, with developers calling it "game-changing," industry observers suggesting Anthropic might be approaching "their ChatGPT moment," and engineers reporting that adopting even parts of his system makes coding "feel more like Starcraft" than traditional programming. The thread points to a shift in how the most productive developers now work, one that much of the industry has yet to absorb.
The Architecture of Parallel Development
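Traditional software development follows what engineers call the "inner loop": write code, test it, fix bugs, repeat. This sequential process has defined programming for decades. Cherny hasn't so much abandoned it as parallelized it: the loop still runs, but several copies of it run at once, and agents handle most of the iterations.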
His setup involves running five instances of Claude simultaneously in iTerm2, each handling a separate work stream. He numbers his terminal tabs 1 through 5 and relies on system notifications to alert him when an agent needs human input. While one agent refactors code, another runs integration tests, and a third generates API documentation. He supplements this with 5 to 10 additional Claude sessions running in browser tabs on claude.ai, using a "teleport" command to move a session between his local machine and the web interface when its context needs to travel with him.
This parallel execution model changes the economics of software development. A single developer can approach the output of a small team, not by working faster but by orchestrating multiple autonomous processes. The human becomes a coordinator rather than a typist, making high-level decisions while agents handle implementation details.
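The coordinator pattern can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cherny's setup: each "agent" is a hypothetical stand-in that only sleeps, and `notify` prints instead of raising a real system notification. What it shows is the shape of the work: start every session at once, then deal with each one in the order it finishes, stopping only for the sessions that ask for input.

```python
import asyncio
from dataclasses import dataclass

@dataclass
class AgentResult:
    tab: int           # terminal tab number the session runs in
    task: str
    needs_input: bool  # True if the session stopped to ask the human something

async def run_agent(tab: int, task: str, needs_input: bool, seconds: float) -> AgentResult:
    # Hypothetical stand-in for a real agent session: it only sleeps.
    await asyncio.sleep(seconds)
    return AgentResult(tab, task, needs_input)

def notify(result: AgentResult) -> None:
    # Hypothetical stand-in for a system notification.
    print(f"[tab {result.tab}] needs input: {result.task}")

async def coordinate(work: list[tuple[str, bool, float]]) -> list[AgentResult]:
    # Start every session at once, numbering tabs from 1.
    sessions = [asyncio.create_task(run_agent(tab, task, asks, secs))
                for tab, (task, asks, secs) in enumerate(work, start=1)]
    finished = []
    # Handle sessions in the order they finish, not the order they started.
    for next_done in asyncio.as_completed(sessions):
        result = await next_done
        if result.needs_input:
            notify(result)
        finished.append(result)
    return finished

if __name__ == "__main__":
    work = [
        ("debug legacy module", True, 0.03),
        ("write documentation", False, 0.01),
        ("run integration tests", False, 0.02),
        ("refactor billing code", False, 0.05),
        ("generate API docs", True, 0.04),
    ]
    done = asyncio.run(coordinate(work))
    print("finished tabs:", [r.tab for r in done])
```

The human's attention is the only serialized resource here: sessions that finish cleanly are collected without interruption, and only the ones flagged `needs_input` demand a context switch.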
Why the Slowest Model Wins
In an industry obsessed with reducing latency, Cherny's model choice seems counterintuitive. He uses Opus 4.5 (Anthropic's largest, slowest, and most expensive model) exclusively, with extended thinking enabled for every task.
"It's the best coding model I've ever used, and even though it's bigger & slower than Sonnet, since you have to steer it less and it's better at tool use, it is almost always faster than using a smaller model in the end," Cherny explained. This points to an insight about AI-assisted development that many organizations miss: the bottleneck is not how quickly a model generates tokens but how many correction cycles its output requires.
Faster models that produce lower-quality output impose a hidden tax. Developers spend time reviewing flawed code, explaining what went wrong, and re-prompting for fixes. Cherny's approach inverts this calculation, paying more up front in compute to eliminate downstream correction time. For enterprise teams evaluating AI coding tools, this suggests that model selection should optimize for total time to a correct result rather than raw speed, even when the per-request cost is higher.
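The trade-off reduces to simple arithmetic. As a rough model (an assumption for illustration, not a formula from the thread), the time to reach a correct change is the number of attempts multiplied by the cost of each attempt, where an attempt costs generation time plus the human time spent reviewing and re-prompting. The figures below are hypothetical, chosen only to show how a slower model can still finish first.

```python
def minutes_to_correct_change(generation_min: float, review_min: float, attempts: int) -> float:
    """Rough model: every attempt costs generation time plus human review time."""
    return attempts * (generation_min + review_min)

# Hypothetical figures, not measurements:
# a fast model that needs three tries versus a slow one that gets it right once.
fast = minutes_to_correct_change(generation_min=1, review_min=4, attempts=3)
slow = minutes_to_correct_change(generation_min=3, review_min=4, attempts=1)
print(f"fast model: {fast} min, slow model: {slow} min")
```

Because review time is paid on every attempt, cutting the number of attempts usually matters more than cutting generation time. Under this model the slower option loses only when its extra generation time outweighs the review rounds it saves.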
The Compound Effect of Quality
The advantage of using a more capable model compounds over time. When an agent produces correct code on the first attempt, it can immediately move to the next task. When it needs multiple correction rounds, it pulls the human back in, and every other agent waiting on that human's attention stalls too. Cherny's five-agent system works only because each agent reliably completes its assigned work without constant human intervention.
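Why reliability matters so much for parallelism can be made concrete. Assuming, for illustration only, that each agent finishes a task unaided with some fixed probability and that agents succeed or fail independently, the chance that none of five agents needs the human drops quickly as per-agent reliability falls. The reliability values below are hypothetical.

```python
def chance_nobody_needs_help(reliability: float, agents: int = 5) -> float:
    """Probability that every agent finishes unaided, assuming independence."""
    return reliability ** agents

def expected_interruptions(reliability: float, agents: int = 5) -> float:
    """Average number of agents per round that will stop and ask for input."""
    return agents * (1 - reliability)

# Hypothetical per-agent reliability values.
for r in (0.95, 0.9, 0.7):
    print(f"reliability {r}: all five unaided {chance_nobody_needs_help(r):.0%}, "
          f"~{expected_interruptions(r):.1f} interruptions per round")
```

At 90% per-agent reliability, all five sessions finish unaided only about 59% of the time (0.9^5 ≈ 0.59), which is why a more capable model pays off more in a five-agent setup than in a single session.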
Building Institutional Memory Into the Codebase
Large language models suffer from a fundamental limitation: they don't retain context between sessions. An AI might make the same architectural mistake repeatedly because it has no memory of previous corrections. Cherny's team addresses this with a deceptively simple mechanism.
They maintain a single file called CLAUDE.md in their git repository. Every time a human reviewer spots an error in AI-generated code, whether it's the wrong naming convention, a violated architectural principle, or a misunderstood business rule, they don't just fix the code. They add a new instruction to CLAUDE.md, and because Claude Code reads that file into its context at the start of each session, the correction carries into all future work.
This creates a self-improving system. The codebase accumulates institutional knowledge that persists across sessions and team members. A mistake corrected once becomes a standing instruction the agent sees in every later session. As product leader Aakash Gupta observed while analyzing the thread, "Every mistake becomes a rule." The model itself doesn't change, but the longer the team works this way, the more closely its instructions reflect the team's specific standards and practices.
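The feedback loop itself is mechanically simple. The sketch below is a hypothetical helper, `record_rule`, that appends a reviewer's correction to CLAUDE.md as a bullet and skips rules already recorded. The thread describes the practice, not any particular tooling, so the function name, the bullet format, and the example rules are all assumptions for illustration.

```python
import tempfile
from pathlib import Path

def record_rule(claude_md: Path, rule: str) -> bool:
    """Append a reviewer's correction to CLAUDE.md as a Markdown bullet.

    Returns False (and writes nothing) if the same rule is already recorded,
    so a correction is stored once however often it is caught in review.
    """
    text = claude_md.read_text(encoding="utf-8") if claude_md.exists() else ""
    bullet = f"- {rule.strip()}"
    if bullet in text.splitlines():
        return False
    if text and not text.endswith("\n"):
        text += "\n"  # keep the new bullet on its own line
    claude_md.write_text(text + bullet + "\n", encoding="utf-8")
    return True

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        md = Path(workdir) / "CLAUDE.md"
        # Hypothetical corrections a reviewer might record.
        print(record_rule(md, "Use snake_case for database column names."))  # True
        print(record_rule(md, "Use snake_case for database column names."))  # False
        print(record_rule(md, "Never call the payments API from UI code."))   # True
        print(md.read_text(encoding="utf-8"))
```

The deduplication check is the one design choice worth noting: without it, a correction caught repeatedly would clutter the file the agent reads every session.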
This approach has broader implications for how organizations should think about AI integration. Rather than treating each AI interaction as isolated, successful teams will build feedback loops that capture corrections and encode them as persistent instructions. The competitive advantage goes to organizations that systematically convert human expertise into machine-readable rules.
Automation Layers: From Slash Commands to Subagents
Cherny's workflow eliminates repetitive tasks through two mechanisms: slash commands and specialized subagents. Slash commands are custom shortcuts stored in the project repository that run complex multi-step operations with a single command. His most-used command, /commit-push-pr, automates the entire process of committing code, pushing to the remote repository, and opening a pull request, tasks that traditionally require multiple manual steps and context switching.
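To make concrete what such a command saves, the sketch below spells out the manual steps behind a commit, push, and pull request. It is an illustration under assumptions, not the contents of Cherny's command (Claude Code slash commands are prompts the agent follows, not scripts like this one). It assumes a GitHub-hosted repository with the GitHub CLI (`gh`) installed, and it defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess

def commit_push_pr_steps(message: str) -> list[list[str]]:
    """The manual steps a /commit-push-pr-style command folds into one."""
    return [
        ["git", "add", "-A"],                     # stage every change in the working tree
        ["git", "commit", "-m", message],         # record the change with a message
        ["git", "push", "-u", "origin", "HEAD"],  # push the current branch and set its upstream
        ["gh", "pr", "create", "--fill"],         # open a PR, using commit info for title/body
    ]

def commit_push_pr(message: str, dry_run: bool = True) -> None:
    for step in commit_push_pr_steps(message):
        print("$", shlex.join(step))
        if not dry_run:
            # check=True stops at the first failing step, so a failed push never opens a PR.
            subprocess.run(step, check=True)

if __name__ == "__main__":
    # Dry run: prints the four commands without executing anything.
    commit_push_pr("Fix pagination bug in search results")
```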
Subagents take this further by creating specialized AI personas for specific phases of development. Cherny uses a code-simplifier agent that reviews completed work and refactors for clarity, and a verify-app agent that runs end-to-end tests before any code ships. These agents operate with narrow, well-defined mandates, which tends to make them more reliable than a single general-purpose prompt asked to do everything.
The strategic insight here is that AI coding tools become far more valuable when integrated into the development workflow rather than bolted onto it. Teams that invest in building custom commands and specialized agents create compounding productivity gains that generic AI assistants can't match.
Verification Loops: The Difference Between Code Generation and Code That Works
The feature that likely explains Claude Code's reported $1 billion in annual recurring revenue isn't its ability to generate code; it's its ability to verify that the code actually works. Cherny's agents don't just write functions; they check that those functions behave correctly.
"Claude tests every single change I land to claude.ai/code using the Claude Chrome extension," Cherny wrote. "It opens a browser, tests the UI, and iterates until the code works and the UX feels good." This closed loop, in which the AI generates code, tests it, identifies failures, and iterates, improves the quality of the final result by a factor Cherny estimates at "2-3x" compared with generation alone.
This addresses the fundamental problem with first-generation AI coding tools: they could write plausible-looking code, but humans still had to verify correctness. By giving the AI access to testing infrastructure—whether through browser automation, bash command execution, or test suite integration—the human role shifts from verification to approval. The agent handles the tedious work of ensuring the code compiles, passes tests, and meets specifications.
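The closed loop can be sketched generically. In this minimal Python illustration, `generate` stands in for the model and `verify` for whatever checks the agent can run (tests, a browser, a build). Both toy implementations are hypothetical: the "model" deliberately writes a buggy first draft so the loop has something to catch. A real system would run candidate code in a sandbox rather than with `exec`.

```python
from typing import Callable, Optional

def generate_and_verify(
    generate: Callable[[list[str]], str],
    verify: Callable[[str], list[str]],
    max_attempts: int = 5,
) -> tuple[Optional[str], int]:
    """Generate, check, feed failures back, repeat.

    generate receives the failure messages from the previous attempt
    (empty on the first try); verify returns failures, empty meaning pass.
    Returns (candidate, attempts), or (None, max_attempts) if nothing passed.
    """
    feedback: list[str] = []
    for attempt in range(1, max_attempts + 1):
        candidate = generate(feedback)
        feedback = verify(candidate)
        if not feedback:
            return candidate, attempt
    return None, max_attempts

def toy_generate(feedback: list[str]) -> str:
    # Hypothetical stand-in for a model: the first draft has an off-by-one bug,
    # and any draft written after a failure report is correct.
    if not feedback:
        return "def add(a, b):\n    return a + b + 1\n"
    return "def add(a, b):\n    return a + b\n"

def toy_verify(source: str) -> list[str]:
    # Hypothetical stand-in for a test suite: run the candidate on a few cases.
    namespace: dict = {}
    exec(source, namespace)
    failures = []
    for a, b in [(1, 2), (0, 0), (-3, 3)]:
        got = namespace["add"](a, b)
        if got != a + b:
            failures.append(f"add({a}, {b}) returned {got}, expected {a + b}")
    return failures

if __name__ == "__main__":
    code, attempts = generate_and_verify(toy_generate, toy_verify)
    print(attempts)  # 2
    print(code)
```

The design choice that matters is that failure messages flow back into the next attempt as input, which is what separates a verification loop from blind retrying.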
What This Means for Development Team Structure
If AI agents can reliably verify their own work, the traditional division of labor in software teams begins to break down. Junior developers have historically spent significant time on tasks like writing tests, fixing bugs found in QA, and ensuring code meets style guidelines. When agents can handle these responsibilities autonomously, the human role concentrates on higher-level decisions: architecture, product direction, and complex problem-solving that requires business context.
This doesn't necessarily mean smaller teams—it means teams with different skill distributions. Organizations will need fewer people focused on implementation details and more focused on system design and strategic direction.
The Productivity Multiplier That Changes Everything
The reaction to Cherny's thread shows how quickly developers rally around concrete practice they can copy. This isn't speculation about future capabilities; it's documentation of a working system already in daily use at one of the world's leading AI companies.
What makes Cherny's workflow significant isn't any single technique. It's the combination: parallel execution, deliberate model selection, persistent institutional memory, workflow automation, and closed-loop verification. Each element amplifies the others. Parallel execution only works if agents are reliable enough to run unsupervised. Institutional memory only matters if agents consistently follow it. Verification loops only add value if the underlying model is capable enough to iterate toward correct solutions.
The developers who adopt this approach first aren't just picking up a new tool; they're working under a different set of constraints. Typing speed no longer limits them, and the hours in a day limit them far less. Their bottleneck becomes decision-making: choosing which problems to solve, how to architect solutions, and when to ship. As Jeff Tang summarized on X, this is about giving engineers "more power": the ability to execute at a scale previously reserved for entire teams.
The tools to achieve this multiplication are already available. What's missing isn't technology but the shift in mental model required to stop thinking of AI as an assistant and start treating it as a workforce. The programmers making that transition now won't just be more productive. They'll be competing in a fundamentally different game, one in which many of the constraints that limited previous generations of developers matter far less. Everyone else will still be typing.