-
Context engineering as a production discipline
Failure modes, architectural patterns, and evidence from real systems.
-
Why LLM evaluation metrics look stable but customers are unhappy
Classic metrics hide the failures users notice. Production evaluation should measure friction, drift, and task completion.
-
Context limits degrade routing quality faster than generation
Routing and classification under long prompts: score dilution, margin collapse, routing collapse, and practical caps on context length.
-
RAG fails upstream
Why most RAG failures originate in the data preparation layer, and what to do about it.
-
Embeddings for intent classification: architecture trade-offs
A practical guide to building intent classifiers with embeddings: when shallow classifiers beat fine-tuning, how to handle confidence thresholds, and what actually matters in production.