-
Mitigating positional bias in LLM-as-a-judge evaluation: the swapping technique
Large language model judges often exhibit a strong preference for the first presented option (position bias). A position-swapping methodology significantly improves agreement with human ratings.
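The swapping technique is simple to sketch: query the judge twice with the two answers in both orders, and only accept a verdict that survives the swap. A minimal illustration (the `judge` callable and its "A"/"B" return convention are assumptions for this sketch, not a specific API):

```python
def judge_with_swap(judge, prompt, answer_a, answer_b):
    """Run the judge on both orderings and keep only consistent verdicts.

    `judge(prompt, first, second)` is a hypothetical callable that
    returns "A" if it prefers the first-presented answer, else "B".
    """
    first = judge(prompt, answer_a, answer_b)   # answer_a shown first
    second = judge(prompt, answer_b, answer_a)  # answer_b shown first

    # Map the second verdict back to the original A/B labels.
    second_mapped = "A" if second == "B" else "B"

    if first == second_mapped:
        return first  # verdict is stable under swapping: keep it
    return "tie"      # verdict flipped with position: treat as a tie
```

A judge that always prefers whichever answer comes first will now produce a tie instead of a biased win, which is exactly the failure mode this technique filters out.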
-
Hybrid retrieval with reciprocal rank fusion: solving the score normalization problem
Pure vector search isn't always enough. Weighted averaging of BM25 and vector scores breaks due to incompatible scales. Reciprocal rank fusion solves this by using ranks instead of scores.
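Because RRF operates on ranks, the incompatible BM25 and cosine-similarity scales never meet. A compact sketch (the k=60 constant is the conventional default from the original RRF paper; input lists are doc ids ordered best-first):

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked result lists by summing 1/(k + rank).

    Each element of `ranked_lists` is a list of doc ids ordered
    best-first. Raw scores are ignored entirely; only positions matter,
    so BM25 and vector results need no normalization.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Documents ranked highly by both retrievers accumulate two large reciprocal terms and float to the top, while a document that only one retriever liked needs a very high rank there to compete.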
-
LLM orchestration: a pragmatic guide to complexity
Most production apps are simple chains, yet everyone is building agents. Here's a clear framework for deciding when your LLM app actually needs loops, graphs, or agents.
-
How Qdrant's scalar quantization cut our RAG latency by 3x
A deep dive into how we cut RAG retrieval latency by 3x and costs by 65% using Qdrant's scalar quantization and a hybrid storage strategy, without sacrificing search quality.
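The core idea behind scalar quantization is easy to sketch: map float32 components to int8 with a shared scale factor, shrinking vectors 4x and letting distance kernels run on integers. This is a generic symmetric-quantization sketch, not Qdrant's internal implementation:

```python
import numpy as np

def quantize(vectors):
    """Symmetric int8 scalar quantization with one global scale.

    A simplified sketch: real systems (Qdrant included) pick scales
    per segment and keep the originals around for optional rescoring.
    """
    scale = np.abs(vectors).max() / 127.0
    q = np.clip(np.round(vectors / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction; error is bounded by half a step."""
    return q.astype(np.float32) * scale
```

The 4x memory reduction is what drives both the latency and the cost wins: smaller vectors mean more of the index fits in RAM and each distance computation touches a quarter of the bytes.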
-
Why vision-language models ignore visual evidence (and how to fix it)
Vision-language models have a strong contextual bias, prioritizing 'logical' conclusions over visual facts. We fixed this in a production case by explicitly telling the model to ignore what it thought it knew.
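The fix amounts to a grounding instruction prepended to the question. The exact wording below is illustrative, not the article's production prompt:

```python
def build_grounded_prompt(question):
    """Prefix a VLM question with an instruction to trust the image.

    Hypothetical wording: the point is to explicitly override the
    model's contextual prior, as the article's fix does.
    """
    return (
        "Answer using ONLY what is visible in the image. "
        "Ignore what you expect to be true from context or prior knowledge. "
        "If the image contradicts your expectation, trust the image.\n\n"
        f"Question: {question}"
    )
```

The instruction has to be explicit about discarding prior knowledge; a generic "look carefully" nudge typically isn't enough to beat a strong contextual prior.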