Why our agents cost $0.002/request instead of $0.12
Tiered model routing, aggressive caching, and shrinking prompts — the engineering we do to get agent unit economics that actually work at scale.
Draft — full engineering write-up coming soon.
Most teams' first agent deployment runs on the biggest available model with the longest possible prompt. It works, and then the invoice arrives. This post covers the three levers we pull on every engagement to bring per-request cost down by 50–100x: routing simple requests to smaller models, caching aggressively at multiple layers, and compressing context without losing quality.