Posts on evals, failure modes, production agents, and how we scope engagements. Updated monthly.
A head-to-head comparison from a senior engineering studio that ships both — developer ergonomics, cost, latency, tool-use reliability, and a framework for picking the right one per workflow.
The 5 patterns our studio uses to ship Claude Agent SDK agents that survive real traffic — subagents, tool-use retries, skills, orchestration, and evals-as-gate.
A walkthrough of the three-layer eval harness we ship with every agent — unit tests for prompts, property tests for outputs, and a drift detector that runs nightly against production traffic.
How we replaced a three-person triage rotation with a single Claude-powered classifier — and what we measured before saying it worked.
What we actually produce in a one-week automation audit — stakeholder interviews, workflow map, ranked build list with estimated $ impact.
Zapier is a great tool until it isn't. Here are the four signals that tell us a workflow has outgrown low-code — and what we usually replace it with.
Tiered model routing, aggressive caching, and shrinking prompts — the engineering we do to get agent unit economics that actually work at scale.