-
Embeddings for intent classification: architecture trade-offs
Practical guide to building intent classifiers with embeddings. When shallow classifiers beat fine-tuning, how to handle confidence thresholds, and what actually matters in production.
-
Similarity metrics for embeddings
Why is cosine almost always the default, and what actually works?
-
Tokenizers: production economics cheat-sheet
Compact reference for tokenizer selection, metrics, and failure modes in production LLM systems.
-
The metric gap: bridging business outcomes and AI component optimization
Why high component scores often mask system failures. A methodology for using end-to-end (E2E) evaluation to prioritize engineering work.
-
Reflection vs evaluation: why the Agent-Critic pattern fails without separation of concerns
Architectural separation of reflection (context generation) and evaluation (quality gating) prevents confirmation bias, premature stopping, and infinite loops in multi-agent research systems.