Field notes

Writing from the engineering team.

Posts on evals, failure modes, production agents, and how we scope engagements. Updated monthly.

Engineering · 28 min read

OpenAI Agent Builder vs Claude Agent SDK: a studio's decision framework

A head-to-head comparison from a senior engineering studio that ships both — developer ergonomics, cost, latency, tool-use reliability, and a framework for picking the right one per workflow.

Sadig Muradov · May 2026
Engineering · 22 min read

Claude Agent SDK in production: a studio's playbook

The five patterns our studio uses to ship Claude Agent SDK agents that survive real traffic: subagents, tool-use retries, skills, orchestration, and evals-as-gate.

Sadig Muradov · May 2026
Engineering · 12 min read

How we build eval suites that catch drift before customers do

A walkthrough of the three-layer eval harness we ship with every agent — unit tests for prompts, property tests for outputs, and a drift detector that runs nightly against production traffic.

Maria Chen · Staff Eng · Apr 2026
Case study · 6 min read

Cutting inbound triage time 74% for a 40-person SaaS

How we replaced a three-person triage rotation with a single Claude-powered classifier — and what we measured before saying it worked.

Case study · Mar 2026
Playbook · 8 min read

The $1,500 audit — what a good automation roadmap looks like

What we actually produce in a one-week automation audit: stakeholder interviews, a workflow map, and a ranked build list with estimated dollar impact.

Playbook · Mar 2026
Opinion · 5 min read

When Zapier stops being the answer (and what comes next)

Zapier is a great tool until it isn't. Here are the four signals that tell us a workflow has outgrown low-code — and what we usually replace it with.

Opinion · Feb 2026
Engineering · 9 min read

Why our agents cost $0.002/request instead of $0.12

Tiered model routing, aggressive caching, and shrinking prompts — the engineering we do to get agent unit economics that actually work at scale.

Engineering · Feb 2026