One to Two

Chapter 10: Making It Shippable

Here's what shipping looked like at Apex in the early days: push code to main from whatever branch I happened to be on, run the deployment script, and watch the logs to see if anything exploded. If something exploded at 2am, I fixed it at 2am.

The commit log is the evidence. Forty-four commits between midnight and 5am. Nine commits in thirteen minutes on the night of April 2. Not all firefighting — some were genuine late-night flow state. But when I categorized them, the majority were reactive: fixing things that broke in production, patching issues a test suite would have caught, untangling deploy problems a pipeline would have prevented. The flow-state commits were maybe a quarter. The rest were the tax for shipping without a safety net. With interest.

Making it shippable means reaching a state where deploying new code is boring. Not exciting. Not terrifying. Not a ritual you perform with your fingers crossed. Boring. You push. The pipeline runs. Tests pass. It deploys. You go to bed. You sleep.

Three things get you there: tests, a pipeline, and the discipline to use them.

Tests

The most common excuse for skipping tests during zero-to-one is “we're moving too fast.” I used that excuse. And the cost of that speed was having absolutely no idea whether changes broke existing behavior. At Apex, we went from roughly 20% coverage to over a thousand tests. The rule became simple: every production bug becomes a test. You fix the bug, you write the test that would have caught it, you never have that exact bug again. Coverage grew organically from pain, which is the only way it sticks.

You don't need 100% coverage. You need the critical path tested, the things that have broken before tested, and the boundaries tested. The AI is genuinely good at writing tests — point it at a function and tell it to focus on edge cases and error conditions. It'll find things you wouldn't think of.

But watch for the failure modes. AI-generated tests tend to test implementation rather than behavior — they'll assert that a function calls another function in a specific order rather than asserting the output is correct. They hallucinate edge cases that don't apply to your system. They can be verbose in ways that slow your test suite down without adding real coverage. The fix: review the tests the AI writes. Read them like you'd read any code. Delete the ones that are testing the shape of the code rather than the outcome. Keep the ones that would actually catch a regression. The AI is the drafter. You're the editor.
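The distinction is easier to see in code. Here's a minimal sketch — the `slugify` helper is hypothetical, not from the Apex codebase — showing a test that asserts the outcome next to one that merely restates the shape of the code:

```python
# A hypothetical slugify() helper, used only to illustrate the difference.
def normalize(text: str) -> str:
    return text.strip().lower()

def slugify(title: str) -> str:
    return "-".join(normalize(title).split())

# Behavior-shaped test (keep these): it asserts what the user-visible
# output must be, so it survives any refactor that preserves behavior.
def test_slugify_behavior():
    assert slugify("  Hello World ") == "hello-world"
    assert slugify("ALREADY-lower") == "already-lower"

# Implementation-shaped test (delete these): it restates how the slug is
# assembled — split on whitespace, join with "-" — so a refactor to a
# single regex substitution breaks its premise even though users see
# identical output.
def test_slugify_tokenization():
    tokens = normalize("  Hello World ").split()
    assert tokens == ["hello", "world"]
    assert "-".join(tokens) == slugify("  Hello World ")
```

The second test passes today, but it would never catch a regression the first one misses — it only pins the current internals.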

The Pipeline

At minimum: run tests, check for known vulnerabilities, build, deploy to staging, verify staging works, deploy to production. That's the full version. Start with step one: automated tests that run when you push. GitHub Actions, basic CI workflow — an hour to set up. Then add steps as the pain justifies them.
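Step one can be sketched in a single workflow file. This assumes a Python project with a `requirements.txt` and a pytest suite — swap the setup and test steps for your own stack:

```yaml
# .github/workflows/ci.yml — minimal sketch: run tests on every push.
name: ci
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt pytest pip-audit
      - run: pip-audit   # check dependencies for known vulnerabilities
      - run: pytest      # the safety net
```

Staging deploys, production deploys, and the review agent bolt on later as additional jobs — the point is that this much takes an hour, and it runs on every push from day one.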

The AI Review Agent

This deserves its own section because it changed how I think about quality when building with AI.

The principle: the agent that writes the code should not be the only agent that judges the code.

AI agents have the same problem as human authors, magnified. A writing agent builds momentum. It makes an architectural choice in file one and reinforces it across twenty files because consistency feels like correctness. It doesn't get tired, but it does get locked in.

A second AI reviewing the diff catches a different class of problems. It sees the change as a reader, not as the author. It asks: does this diff actually match the issue description? Did the writing agent introduce a dependency that wasn't there before? Is there a simpler way to do what this code does? Are there error conditions that aren't handled? When 45% of AI-generated code contains security vulnerabilities, a second set of eyes scanning for injection points and authentication gaps isn't paranoia — it's basic hygiene.

The setup is straightforward. Add a step in your CI pipeline — after tests pass, before merge. The review agent gets the diff, the issue description, and a prompt that says: review this change for correctness, security, simplicity, and adherence to the project's existing patterns. Flag anything that concerns you. Most CI platforms support this as a GitHub Action or a webhook. The cost is a few cents per review and a minute or two of pipeline time.
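The core of that step is just prompt assembly. Here's a sketch — `build_review_prompt` and `ask_model` are illustrative names, and `ask_model` is a stub standing in for whichever model API your pipeline calls:

```python
def build_review_prompt(diff: str, issue: str) -> str:
    # The review agent gets the diff, the issue description, and the
    # review charter: correctness, security, simplicity, consistency.
    return (
        "You are reviewing a change written by another agent.\n\n"
        f"Issue description:\n{issue}\n\n"
        f"Diff:\n{diff}\n\n"
        "Review this change for correctness, security, simplicity, and "
        "adherence to the project's existing patterns. "
        "Flag anything that concerns you."
    )

def ask_model(prompt: str) -> str:
    # Placeholder: replace with a real model API call in your pipeline.
    return "No concerns."

def review(diff: str, issue: str) -> str:
    return ask_model(build_review_prompt(diff, issue))
```

In CI, the wrapper around this would post the model's response as a comment on the pull request and optionally fail the check if the response flags anything.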

What it catches: security issues the author was too focused on functionality to notice, inconsistencies with the codebase, overly complex solutions, missing error handling. What it doesn't catch: fundamental architectural mistakes, whether the feature is the right thing to build, anything requiring business context. The review agent is a strong second opinion on the code. It is not a substitute for understanding your product.

I didn't have this at Apex. The fix-to-feature ratio from Chapter 8 tells me review at merge time would have caught a meaningful percentage of what I spent nights fixing.

Rollback: The Safety Net Under the Safety Net

Catching bugs before deploy is important. The ability to undo a deploy in under a minute is more important.

No pipeline is perfect. Tests pass, review signs off, staging looks good — production still breaks. The question isn't whether this will happen. It's how long your users suffer when it does.

Rollback should be a single command. Not a process. Not a meeting. One command that puts the previous known-good version back in production while you figure out what went wrong at a pace that doesn't involve panic.

This means versioned deployments, backward-compatible database migrations, and testing the rollback itself. The deploy you practice every day. The rollback you need the one day everything goes wrong. If you've never tested it, you don't have a rollback strategy. You have a hope.
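One common way to make rollback a single command is the symlink-swap pattern: each deploy lands in its own `releases/<version>` directory, the server reads from a `current` symlink, and rolling back just repoints that symlink at the previous release. A minimal sketch, with illustrative paths:

```python
import os

def rollback(app_root: str) -> str:
    """Repoint 'current' at the previous release. Returns its path."""
    releases_dir = os.path.join(app_root, "releases")
    current = os.path.join(app_root, "current")
    releases = sorted(os.listdir(releases_dir))
    if len(releases) < 2:
        raise RuntimeError("no previous release to roll back to")
    previous = os.path.join(releases_dir, releases[-2])
    tmp = current + ".tmp"
    os.symlink(previous, tmp)   # build the new link off to the side...
    os.replace(tmp, current)    # ...then swap it in atomically
    return previous
```

The swap is atomic, the old release is never deleted on deploy, and — crucially — this is exactly the kind of script you can and should run against staging until it's boring too.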

At Apex, the deploys that cost me the most sleep weren't the ones where something broke. They were the ones where something broke and I couldn't quickly get back to the previous state. The fix-forward pressure at 3am — “I'll just push another commit to fix this” — is how nine commits in thirteen minutes happens. A working rollback would have meant: revert, go to bed, fix it tomorrow with a clear head.

The Discipline

This is the hard part. When you're moving fast, the pipeline feels like friction. Tests take two minutes. Staging deploy takes five. You know the code is fine. You just want to ship it.

This is the exact impulse that created the problems you're now cleaning up. I know because it was my impulse, repeatedly, and it cost me dozens of midnight commits.

The rule: never deploy directly to production. Always go through the pipeline. Especially — especially — for “small” changes. The changes that break production are never the ones you thought were risky. They're the one-line fixes you were sure about.

I learn this the specific way you learn things you already know but haven't felt yet. It's a Tuesday at Apex. A customer has reported a display bug — a label is wrong on one of the dashboard panels. I can see the fix. One string, one file. I've been through the pipeline a dozen times today for real changes. This isn't a real change. This is a typo.

So I push it straight to production. Skip the tests, skip staging. The label fix works. But the file I edited has an export that another module depends on, and my editor's auto-import has quietly added a circular dependency. The dashboard panel loads fine in isolation — which is how I test it, eyeballing it in the browser. In production, with the full module graph, the import cycle causes a silent failure in the billing summary component. Customers can't see their invoices. It takes three hours to trace because the error isn't where the change is. The error is two modules away, caused by a side effect of a side effect of my “typo fix.”

Three hours of firefighting and a groveling email to affected customers. For a label change. The pipeline would have caught it in ninety seconds — the integration tests exercise the full module graph. I knew that. I just didn't think I needed it for something so small.

That's the trap. “So small” is the most dangerous phrase in deployment. The pipeline exists precisely for the moments when you're sure you don't need it.

The Evolution

At Apex, the deployment evolved in phases. It started with what I can only describe as YOLO: push to main, run the script, watch the logs, hope. Then came tests — coverage climbed, and the midnight commits started to thin out, not because I was more disciplined but because the tests caught things before they hit production. Next was the pipeline itself: automated test runs, staging separated from production (a 4am fix — shamefully late, that one still bothers me), and deploys that finally started getting boring. Boring was the goal. The last phase was the AI review agent — the principle that the agent that writes shouldn't be the only one that judges. I didn't fully get here at Apex. I should have started earlier.

Each phase reduced the midnight commits. Each phase gave back a piece of my sleep, my weekends, my ability to think about something other than whether the last deploy was slowly corrupting data.

What Boring Shipping Buys You

When shipping is boring, you've hit one of the clearest markers of two. You can deploy on a Tuesday afternoon and go have dinner. You can hand the deployment process to someone new and not hover. The builder's attention is freed for the things that actually matter — the product, the customers, the next problem worth solving.

But there's something deeper that nobody talks about. Boring shipping changes your relationship with your own work. When every deploy is a dice roll, you carry it with you — at dinner, in the shower, at your kid's soccer game, you're half-present because part of your brain is still listening for the alert. The cognitive overhead of unreliable shipping is enormous and invisible. You don't notice it until it's gone.

The first week at Apex when I deployed three times and didn't think about any of them afterward, I realized how much mental capacity I'd been burning on anxiety that felt like diligence. I thought I was being responsible, staying alert, keeping watch. I was actually just operating a system so fragile that it required constant human attention to avoid falling over.

Boring shipping is a gift you give yourself. It's the builder saying: I've built something trustworthy enough that I don't have to babysit it. That trust — in your tests, in your pipeline, in your rollback plan — is what frees you to do the creative work that only you can do. The system handles the mechanics. You handle the meaning.