Singletrack

Software Delivery Process

Menu

← Blog

Singletrack blog

Most of my Singletrack work runs on Opus or Sonnet. That makes sense for Phase 1 (Product Manager), Phase 2 (Scout), Phase 3 (Architect), and Phase 5 (Verifier) — those phases are the reasoning. They earn their tokens.

Phase 4 — the Coder — is where it gets interesting. Once the Architect has produced a real spec, the actual implementation is often mechanical: write this helper, update these call sites, add these tests, generate this workflow YAML. You don’t need a frontier model to type that out. You need a frontier model to decide what should be typed, and then a fast, cheap model to type it.

So I built a small companion skill called /haiku that hands one focused implementation task off to claude --model claude-haiku-4-5 -p as a subprocess. The orchestrating model (Opus or Sonnet) writes a tight spec, Haiku executes it, and then the orchestrator independently verifies the result. Across the last six PRs I’ve shipped with this pattern, Haiku has delivered ~85% of the implementation cleanly, with the heavier model only stepping in for the 1–2 nuanced parts per PR.

What “delegable” actually looks like

The skill is opinionated about what counts as a Haiku-shaped task. The line that’s held up well in practice:

Delegate when the spec eliminates judgment calls.

  • Mechanical refactor with a clear before/after pattern → yes
  • Adding tests against a known spec → yes
  • Implementing a helper with a clear signature + verify command → yes
  • Updating multiple call sites of an API to a new pattern → yes
  • Boilerplate generation (CI workflow YAML, config files, scaffolding) → yes

And the things that look delegable but aren’t:

  • “Where should this helper live?” → architecture, not implementation
  • “Why is this test flaking in CI?” → debugging, needs repo-wide judgment
  • “Review this diff for security issues” → adversarial; needs independence, not just cost savings
  • “Decide whether to widen tolerance or pad audio” → that’s the spec, not the execution

If “the spec is figuring out the spec,” Haiku is the wrong tool. If the spec is written and the work is typing it into the right files, Haiku is exactly right.

The spec template that earns its keep

The first few times I tried this, Haiku would happily roam — broadening assertions to silence failures, adding “helpful” extra test cases, inventing function signatures when the spec referenced something stale. The fix was treating the spec like a contract:

TASK <ID>: <one-line imperative>

CONTEXT — read these files first:
- <file:line> — <why>

EXISTING BEHAVIOR THAT MUST BE PRESERVED:
- <invariant 1>
- <invariant 2>

CHANGES TO MAKE:
### 1. <file path>
<specific change, with the exact new code block>

VERIFY:
<exact command 1>
<exact command 2>
Both must succeed. If <X> fails, do NOT broaden the test to make it pass;
STOP and report what actually happened.

CONSTRAINTS:
- Edit ONLY the listed files.
- Do NOT add features beyond what's specified.

That last block is the critical one. Haiku will, by default, try to be helpful when something doesn’t pass — relaxing assertions, expanding scope, trying workarounds. Telling it explicitly to stop and report failure instead of papering over it is the difference between a clean handoff and a silent regression.

The orchestrator’s job hasn’t shrunk

One important finding: cost savings only show up if the orchestrating model does its job at both ends.

Before the spawn:

  • Read the files the spec will reference. Lock down accurate line numbers.
  • Verify the functions and tests the spec assumes actually exist.
  • Write the verify command and confirm it would actually catch a regression.

After the spawn:

  • Do NOT trust Haiku’s “all tests pass” summary. Run the verify commands independently.
  • Read the modified files. Confirm the diff matches the spec.
  • If verify fails, diagnose it yourself — don’t just re-spawn Haiku with “try again.”

The pattern only works because the heavier model is acting as a tight loop around the cheaper one. Skip the verification step and you’ve just bought yourself a high-confidence regression at a discount.

What this changes about how I plan work

The most useful side effect has been that I now think about specs differently. When I’m in the Architect phase, I’m not just asking “is this design right?” — I’m also asking “is this spec delegable?” If the answer is no, that’s usually a signal the architecture isn’t quite settled yet. Vague specs and ambiguous handoffs are the same problem wearing different hats.

Estimated cost per delegated task is in the $0.10–$0.30 range. Compared to the cost of running Opus end-to-end on rote work, that’s a real number across a typical week. But the bigger win has been the discipline: separating “what should happen” from “make it happen” forces a level of clarity I’d been getting away without.

Where to find it

The skill lives in my Claude Code skills directory and is now also published as the haiku skill in the Singletrack skill repo. It’s a single SKILL.md — drop it in ~/.claude/skills/haiku/ and invoke with /haiku or any of the voice triggers ("delegate to haiku", "have haiku do it", "cheap model for this").