-
Similarity metrics for embeddings
Why is cosine almost always the default, and what actually works?
-
Tokenizers: production economics cheat sheet
Compact reference for tokenizer selection, metrics, and failure modes in production LLM systems.
-
The metric gap: bridging business outcomes and AI component optimization
Why high component-level scores often mask system-level failures. A methodology for using end-to-end (E2E) evaluation to prioritize engineering work.
-
Reflection vs evaluation: why the Agent-Critic pattern fails without separation of concerns
Architectural separation of reflection (context generation) and evaluation (quality gating) prevents confirmation bias, premature stopping, and infinite loops in multi-agent research systems.
-
Vector search + hard filters in Elasticsearch: the hidden RAG bottleneck
HNSW graph traversal breaks down under restrictive metadata filtering. A hybrid retrieval strategy for production RAG systems.