Field notes

Writing from the engineering team.

Posts on evals, failure modes, production agents, and how we scope engagements. Updated monthly.

Guide · 30 min read

AI automation agency for ops teams — what 'custom' actually costs

A buyer's guide to AI automation agencies for ops teams — pricing models, how to tell a studio from a course, 6 archetypes compared, and what "custom" actually costs in 2026.

Sadig Muradov · Jun 2026
Engineering · 28 min read

OpenAI Agent Builder vs Claude Agent SDK: a studio's decision framework

A head-to-head comparison from a senior engineering studio that ships both — developer ergonomics, cost, latency, tool-use reliability, and a framework for picking the right one per workflow.

Sadig Muradov · May 2026
Engineering · 22 min read

Claude Agent SDK in production: a studio's playbook

The 5 patterns our studio uses to ship Claude Agent SDK agents that survive real traffic — subagents, tool-use retries, skills, orchestration, and evals-as-gate.

Sadig Muradov · May 2026
Opinion · 26 min read

When Zapier stops being the answer: 4 ops signals

The four signals that tell us an ops workflow has outgrown Zapier — branching logic, retries and rate limits, human-in-the-loop review, and real observability — plus the typed, tested Claude-based replacement we ship when the signals fire.

Sadig Muradov · Apr 2026
Guide · 24 min read

Agentic AI for ops teams: a working definition + 6 patterns we ship

A plain-English definition of "agentic AI" for ops teams, plus the 6 patterns we actually ship — triage, extraction, research, copilots, orchestrators, and human-in-the-loop review — with cost profiles and the eval hooks that keep each one honest in production.

Sadig Muradov · Apr 2026
Engineering · 20 min read

Eval suites that catch drift before customers do: the 3-layer harness

A walkthrough of the 3-layer eval harness we ship with every production agent (prompt unit tests, property tests on outputs, and nightly drift detection on production traffic), with cost per layer, the specific failure modes each layer catches, and why we use LLM-as-judge only where it earns its keep.

Sadig Muradov · Apr 2026