-
Vector search + hard filters in Elasticsearch: the hidden RAG bottleneck
HNSW graph topology breaks under metadata filtering. A hybrid retrieval strategy for production RAG systems.
-
Architecture design: a constraint-satisfaction approach
A methodology for narrowing the architectural search space by defining constraints hierarchically: problem, boundaries, and trade-offs.
-
Classification with LLMs: getting accurate probabilities from structured output
Verbalized confidence in a JSON schema provides fast probability estimates for classification tasks. Optimization patterns improve calibration.
-
Token optimization: three production patterns that reduce LLM costs by 70%
API-level caching, semantic similarity-based caching, and dynamic compression with LLMLingua form a layered approach to token reduction. Each pattern targets different inefficiencies in the prompt processing pipeline.
-
Hierarchical signal tuning: optimizing components before fusion
Fusion algorithms like linear combination or RRF cannot fix poor input signals. Effective hybrid search requires a bottom-up optimization strategy: tuning field weights within BM25 and embedding strategies within dense components before attempting to merge them.
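
To make the fusion point concrete, here is a minimal sketch of reciprocal rank fusion (RRF) in Python. The function name, the document IDs, and the constant k=60 (a commonly used default) are illustrative assumptions, not taken from the article. The sketch shows that RRF only reorders documents the component rankers already returned, so a document that neither BM25 nor the dense retriever surfaces cannot appear in the fused list.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with RRF.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is an assumed, commonly used default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical component outputs: one BM25 ranking, one dense ranking.
bm25_ranking = ["a", "b", "c"]
dense_ranking = ["c", "a", "d"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

If the relevant document is missing from both `bm25_ranking` and `dense_ranking`, no choice of `k` brings it back, which is why the component signals need tuning before the merge step.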