AI Engineering Insights

Vector search + hard filters in Elasticsearch: the hidden RAG bottleneck

Metadata filtering breaks the HNSW graph topology that vector search relies on. The post presents a hybrid retrieval strategy for production RAG systems.
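A toy illustration of why hard filters and nearest-neighbour search interact badly. This is a minimal sketch under stated assumptions, not the article's strategy: exact brute-force search stands in for the HNSW index, and the `tenant` field and corpus are hypothetical. It contrasts post-filtering (retrieve the top k, then filter) with pre-filtering (filter, then rank), so it shows the recall loss from post-filtering, not graph-traversal effects inside HNSW.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query, docs, k):
    """Exact nearest neighbours; a stand-in for an approximate (HNSW) index."""
    return sorted(docs, key=lambda d: cosine(query, d["vec"]), reverse=True)[:k]


def post_filter(query, docs, k, pred):
    """Retrieve k candidates first, then drop the ones that fail the filter."""
    return [d for d in top_k(query, docs, k) if pred(d)]


def pre_filter(query, docs, k, pred):
    """Apply the filter first, then rank only the eligible documents."""
    return top_k(query, [d for d in docs if pred(d)], k)


# Hypothetical corpus: tenant "a" dominates the neighbourhood of the query.
docs = [
    {"id": "a1", "vec": [1.0, 0.0], "tenant": "a"},
    {"id": "a2", "vec": [0.9, 0.1], "tenant": "a"},
    {"id": "a3", "vec": [0.8, 0.2], "tenant": "a"},
    {"id": "b1", "vec": [0.1, 0.9], "tenant": "b"},
    {"id": "b2", "vec": [0.0, 1.0], "tenant": "b"},
]
query = [1.0, 0.0]


def is_tenant_b(doc):
    """Hypothetical hard filter: only tenant "b" documents are allowed."""
    return doc["tenant"] == "b"


print([d["id"] for d in post_filter(query, docs, 3, is_tenant_b)])  # → []
print([d["id"] for d in pre_filter(query, docs, 3, is_tenant_b)])   # → ['b1', 'b2']
```

With post-filtering, every candidate in the top 3 belongs to tenant "a", so the filter leaves nothing even though eligible documents exist.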

5 min read · December 12, 2025
Architecture design: a constraint-satisfaction approach

A methodology for narrowing the architectural search space by defining constraints hierarchically: problem, boundaries, and trade-offs.

5 min read · December 09, 2025
Classification with LLMs: getting accurate probabilities from structured output

Adding a verbalized confidence field to the JSON output schema yields fast probability estimates for classification tasks. A set of optimization patterns then improves how well those estimates are calibrated.
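One way to turn verbalized confidences into a usable distribution is sketched below. It assumes a hypothetical output schema in which the model returns one confidence per label under a `scores` key; the labels and the clamp-then-normalize step are illustrative choices, not the article's calibration patterns.

```python
import json

# Hypothetical label set and schema: {"scores": {"billing": 0.7, "bug": 0.2, ...}}
LABELS = ["billing", "bug", "other"]


def parse_verbalized_confidence(raw: str, labels=LABELS) -> dict[str, float]:
    """Convert a model's JSON confidences into a normalized probability distribution."""
    scores = json.loads(raw)["scores"]
    # Missing labels count as 0; values are clamped into [0, 1] in case the model strays.
    clipped = {lab: min(max(float(scores.get(lab, 0.0)), 0.0), 1.0) for lab in labels}
    total = sum(clipped.values())
    if total == 0:
        # No usable probability mass: fall back to a uniform distribution.
        return {lab: 1.0 / len(labels) for lab in labels}
    # Verbalized confidences rarely sum to 1, so rescale them.
    return {lab: value / total for lab, value in clipped.items()}


print(parse_verbalized_confidence('{"scores": {"billing": 1.0, "bug": 0.5, "other": 0.5}}'))
# → {'billing': 0.5, 'bug': 0.25, 'other': 0.25}
```

Normalization only makes the numbers sum to 1; whether they match observed accuracy is a separate calibration question.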

8 min read · December 05, 2025
Token optimization: three production patterns that reduce LLM costs by 70%

API-level caching, semantic similarity-based caching, and dynamic compression with LLMLingua form a layered approach to token reduction. Each pattern targets different inefficiencies in the prompt processing pipeline.
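Of the three patterns, semantic similarity-based caching is the least self-explanatory: a new prompt reuses a stored response when its embedding is close enough to a cached prompt's embedding. The sketch below is a minimal version under stated assumptions. The embedding function is injected (here a hypothetical bag-of-words `toy_embed`, not a real model), the 0.9 threshold is an illustrative value, and a linear scan replaces the vector index a production cache would use.

```python
import math
from typing import Callable, Optional

Vector = list[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Returns a stored LLM response when a new prompt is similar enough to a cached one."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.9):
        self.embed = embed          # assumption: any text-embedding function
        self.threshold = threshold  # assumption: illustrative cut-off, tune on real traffic
        self.entries: list[tuple[Vector, str]] = []

    def get(self, prompt: str) -> Optional[str]:
        query = self.embed(prompt)
        best_score, best_response = -1.0, None
        for vec, response in self.entries:  # linear scan over cached prompts
            score = cosine(query, vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((self.embed(prompt), response))


# Hypothetical toy embedding: word counts over a fixed vocabulary.
VOCAB = ["reset", "password", "refund", "order", "how", "my", "do", "i"]


def toy_embed(text: str) -> Vector:
    words = text.lower().replace("?", "").split()
    return [float(words.count(w)) for w in VOCAB]


cache = SemanticCache(toy_embed)
cache.put("How do I reset my password?", "Go to Settings > Security.")
print(cache.get("How do I reset password?"))  # similarity ≈ 0.91, served from cache
print(cache.get("Where is my refund?"))       # similarity ≈ 0.29, cache miss (None)
```

The threshold is the main design knob: set it too low and unrelated prompts receive stale answers; set it too high and the cache rarely hits.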

12 min read · December 02, 2025
Hierarchical signal tuning: optimizing components before fusion

Fusion algorithms such as linear combination or Reciprocal Rank Fusion (RRF) cannot fix poor input signals. Effective hybrid search requires a bottom-up optimization strategy: tune field weights within BM25 and embedding strategies within the dense component before merging the two.
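For reference, RRF scores each document as the sum of 1 / (k + rank) over every result list that contains it, with ranks starting at 1. The sketch below implements that formula. It uses k = 60, the constant commonly used as a default; the document IDs and the two input rankings are hypothetical.

```python
from collections import defaultdict


def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d))."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from the two components.
bm25_results = ["d1", "d2", "d3"]
dense_results = ["d3", "d1", "d4"]

print([doc_id for doc_id, _ in rrf([bm25_results, dense_results])])
# → ['d1', 'd3', 'd2', 'd4']
```

RRF only reorders documents according to the ranks it receives. A relevant document that both components rank poorly, or miss entirely, cannot be promoted by fusion, which is why tuning each component comes first.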

4 min read · November 26, 2025

Production AI systems, evaluation, and architecture