-
Mitigating positional bias in LLM-as-a-judge evaluation: the swapping technique
Large language model judges often exhibit a strong preference for the first presented option (position bias). A position-swapping methodology significantly improves agreement with human ratings.
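The swapping technique is simple to sketch: query the judge twice with the two answers in both orders, and only accept a verdict that survives the swap. A minimal illustration (the `judge` callable and its "A"/"B" return convention are assumptions for this sketch, not a specific API):

```python
def judge_with_swap(judge, prompt, answer_a, answer_b):
    """Run the judge on both orderings and keep only consistent verdicts.

    `judge(prompt, first, second)` is a hypothetical callable that
    returns "A" if it prefers the first-presented answer, else "B".
    """
    first = judge(prompt, answer_a, answer_b)   # answer_a shown first
    second = judge(prompt, answer_b, answer_a)  # answer_b shown first

    # Map the second verdict back to the original A/B labels.
    second_mapped = "A" if second == "B" else "B"

    if first == second_mapped:
        return first  # verdict is stable under swapping: keep it
    return "tie"      # verdict flipped with position: treat as a tie
```

A judge that always prefers whichever answer comes first will now produce a tie instead of a biased win, which is exactly the failure mode this technique filters out.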
-
Hybrid retrieval with reciprocal rank fusion: solving the score normalization problem
Pure vector search isn't always enough. Weighted averaging of BM25 and vector scores breaks due to incompatible scales. Reciprocal rank fusion solves this by using ranks instead of scores.
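Because RRF operates on ranks, the incompatible BM25 and cosine-similarity scales never meet. A compact sketch (the k=60 constant is the conventional default from the original RRF paper; input lists are doc ids ordered best-first):

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked result lists by summing 1/(k + rank).

    Each element of `ranked_lists` is a list of doc ids ordered
    best-first. Raw scores are ignored entirely; only positions matter,
    so BM25 and vector results need no normalization.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Documents ranked highly by both retrievers accumulate two large reciprocal terms and float to the top, while a document that only one retriever liked needs a very high rank there to compete.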
-
LLM orchestration: a pragmatic guide to complexity
Most production apps are simple chains, yet everyone is building agents. Here's a clear framework for deciding when your LLM app actually needs loops, graphs, or agents.
-
How Qdrant's scalar quantization cut our RAG latency by 3x
A deep dive into how we cut RAG retrieval latency by 3x and costs by 65% using Qdrant's scalar quantization and a hybrid storage strategy, without sacrificing search quality.
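The core idea behind scalar quantization is easy to sketch: map float32 components to int8 with a shared scale factor, shrinking vectors 4x and letting distance kernels run on integers. This is a generic symmetric-quantization sketch, not Qdrant's internal implementation:

```python
import numpy as np

def quantize(vectors):
    """Symmetric int8 scalar quantization with one global scale.

    A simplified sketch: real systems (Qdrant included) pick scales
    per segment and keep the originals around for optional rescoring.
    """
    scale = np.abs(vectors).max() / 127.0
    q = np.clip(np.round(vectors / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction; error is bounded by half a step."""
    return q.astype(np.float32) * scale
```

The 4x memory reduction is what drives both the latency and the cost wins: smaller vectors mean more of the index fits in RAM and each distance computation touches a quarter of the bytes.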
-
Why vision-language models ignore visual evidence (and how to fix it)
Vision-language models have a strong contextual bias, prioritizing 'logical' conclusions over visual facts. We fixed this in a production case by explicitly telling the model to ignore what it thought it knew.
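The fix amounts to a grounding instruction prepended to the question. The exact wording below is illustrative, not the article's production prompt:

```python
def build_grounded_prompt(question):
    """Prefix a VLM question with an instruction to trust the image.

    Hypothetical wording: the point is to explicitly override the
    model's contextual prior, as the article's fix does.
    """
    return (
        "Answer using ONLY what is visible in the image. "
        "Ignore what you expect to be true from context or prior knowledge. "
        "If the image contradicts your expectation, trust the image.\n\n"
        f"Question: {question}"
    )
```

The instruction has to be explicit about discarding prior knowledge; a generic "look carefully" nudge typically isn't enough to beat a strong contextual prior.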