-
Hybrid retrieval with RRF: solving the score normalization problem
Pure vector search isn't always enough. Weighted averaging of BM25 and vector scores breaks because the two scores live on incompatible scales. RRF sidesteps this by fusing ranks instead of scores.
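As a rough illustration of the rank-based idea, here is a minimal sketch of Reciprocal Rank Fusion. The function name `rrf_fuse`, the document IDs, and the two example rankings are hypothetical. The constant `k = 60` is a commonly used default and is assumed here, not taken from the article.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document earns 1 / (k + rank) from every list it appears in,
    where rank starts at 1. Raw BM25 or vector scores are never used,
    so their scales do not need to be comparable.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a keyword retriever and a vector retriever.
bm25_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([bm25_ranking, vector_ranking])
print(fused)
```

Documents that appear near the top of both lists rise to the top of the fused list, regardless of how large or small the underlying scores were.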
-
LLM orchestration: a pragmatic guide to complexity
Most production apps are simple chains, yet everyone is building agents. Here's a clear framework for deciding when you really need loops, graphs, and agents in your LLM app.
-
How Qdrant's scalar quantization cut our RAG latency by 3x
A deep dive into how we cut RAG retrieval latency by 3x and costs by 65% using Qdrant's scalar quantization and a hybrid storage strategy, without sacrificing search quality.
-
Why VLMs ignore visual evidence (and how to fix it)
VLMs have a strong contextual bias, prioritizing logical conclusions over visual facts. We fixed this in a production case by explicitly telling the model to ignore what it thought it knew.
-
Our agents argued endlessly. Here's how a hybrid AI pattern tamed LLM chaos
A deep dive into building a medical ranking PoC where pure LLM reasoning failed, and how a hybrid pattern combining LLM feature extraction with a deterministic rule engine achieved stable, auditable results.