-
Domain-driven design for AI systems: architectural patterns and production experience
Exploring how domain-driven design principles (bounded contexts, anti-corruption layer, ubiquitous language, domain events) enable modularity, safety, and traceability in production AI and LLM systems.
-
Semantic prompt caching: when LLM-judge beats exact match
Standard prompt caching requires an exact prefix match. An LLM judge can instead validate semantic equivalence, rescuing cache hits on paraphrases at the cost of a controllable latency overhead.
-
The reranking trap: when cross-encoders make things worse
Cross-encoders and LLM rerankers promise better retrieval precision, but a 200% latency penalty, diversity collapse, and production failures reveal when this expensive step becomes counterproductive.
-
Structured output engineering for production LLMs
Moving from 85% parse rates to production-grade reliability: constrained decoding guarantees the output format, Pydantic validation ensures correctness, and token optimization cuts costs by 50%.
-
The chunk size dilemma: identifying the optimal value in RAG systems
Choosing a chunk size is non-trivial: too small loses context; too large dilutes semantics through mean pooling. A systematic methodology for identifying the sweet spot.