AI Engineering Insights

Prompt caching in production: where the savings disappear

KV cache reuse can cut inference costs and latency substantially on paper. Most production deployments capture only a fraction of those savings, and the reason is structural rather than a missing configuration flag.

8 min read · July 21, 2026

Why agent memory degrades in production

What happens to agent memory stores after weeks of production use, and what the read path never exposes.

6 min read · May 28, 2026

Context engineering as a production discipline

Failure modes, architectural patterns, and evidence from real systems.

6 min read · April 08, 2026

Why LLM evaluation metrics look stable but customers are unhappy

Classic metrics hide the failures users notice. Production evaluation should measure friction, drift, and task completion.

4 min read · January 25, 2026

Context limits degrade routing quality faster than generation

Routing and classification under long prompts: score dilution, margin collapse, routing collapse, and practical caps.

5 min read · January 18, 2026

Production AI systems, evaluation, and architecture