AI Engineering Insights

The chunk size dilemma: identifying the optimal value in RAG systems

Finding the optimal chunk size is non-trivial: chunks that are too small lose context, while chunks that are too large dilute semantics through mean pooling. A systematic methodology for identifying the sweet spot.

8 min read · October 20, 2025

Mitigating positional bias in LLM-as-a-judge evaluation: the swapping technique

LLM judges often show a strong preference for whichever option is presented first (positional bias). A position-swapping methodology significantly improves agreement with human ratings.

4 min read · October 10, 2025

Hybrid retrieval with RRF: solving the score normalization problem

Pure vector search isn't always enough. Weighted averaging of BM25 and vector scores breaks due to incompatible scales. RRF solves this by using ranks instead of scores.

9 min read · October 06, 2025

LLM orchestration: a pragmatic guide to complexity

Most production apps are simple chains, yet everyone is building agents. Here's a clear framework for deciding when you really need loops, graphs, and agents in your LLM app.

4 min read · October 02, 2025

How Qdrant's scalar quantization cut our RAG latency by 3x

A deep dive into how we cut RAG retrieval latency by 3x and costs by 65% using Qdrant's scalar quantization and a hybrid storage strategy, without sacrificing search quality.

4 min read · September 28, 2025

Production AI systems, evaluation, and architecture