Lately, I’ve been diving deep into prototyping and debugging LLM applications — from RAG to multimodal agents. The circumstances are almost always the same: no time for complex solutions, no perfect datasets, and a pressing need for high quality right now (or yesterday). This forced me to develop a pragmatic, almost detective-like approach to debugging, which I want to share.

Step 1: simplify and isolate

When a system behaves unpredictably, my first reaction is to disconnect everything non-essential. Any “enhancing” but optional steps (like complex re-rankers or additional prompts) are temporarily removed. This helps isolate the core of the problem. If a task is too complex, I try swapping a giant model for something smaller and faster. It’s surprising how often this not only speeds up iterations but also improves stability for a specific task.
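To make this concrete, here is a minimal sketch of the idea, with optional stages kept behind parameters so they can be switched off one at a time. The pipeline and stage names (`rerank`, `expand_query`, etc.) are hypothetical, stand-ins for whatever enhancing steps your system has.

```python
# Hypothetical RAG pipeline: optional stages are injectable and default to None,
# so a debugging run can strip the system down to its core path.

def run_pipeline(query, retrieve, generate, rerank=None, expand_query=None):
    """Run the core path; 'expand_query' and 'rerank' are optional stages."""
    if expand_query is not None:   # optional "enhancing" step
        query = expand_query(query)
    docs = retrieve(query)         # core: always on
    if rerank is not None:         # optional re-ranker
        docs = rerank(query, docs)
    return generate(query, docs)

# Debugging run: every optional stage disabled -> only the core path remains.
answer = run_pipeline(
    "why is latency high?",
    retrieve=lambda q: ["doc about latency"],
    generate=lambda q, docs: f"answer based on {len(docs)} doc(s)",
)
# -> "answer based on 1 doc(s)"
```

If the bug disappears with the optional stages off, you have already narrowed it down to the steps you removed.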

Step 2: a granular breakdown

After simplifying, I begin a step-by-step analysis, testing each component (retrieval, prompting, generation) in isolation. Think of it as examining every clue at a crime scene.
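The breakdown above can be sketched as a few cheap sanity checks, one per stage, so a bad final answer can be attributed to a specific component. The stage functions here are hypothetical placeholders for your real retriever and generator.

```python
# Toy stand-ins for real pipeline stages (illustrative only).

def retrieve(query):
    corpus = {"latency": "doc about latency", "cost": "doc about cost"}
    return [text for key, text in corpus.items() if key in query.lower()]

def generate(query, docs):
    return "no context" if not docs else f"answer grounded in: {docs[0]}"

# 1. Test retrieval alone: does the right document come back?
docs = retrieve("Why is latency high?")
assert docs == ["doc about latency"]

# 2. Test generation alone, with a hand-picked context,
#    so a weak answer cannot be blamed on retrieval.
assert generate("anything", ["doc about latency"]).startswith("answer grounded")

# 3. Test the failure path explicitly: what happens with an empty context?
assert generate("anything", []) == "no context"
```

Each assertion is a "clue examined separately": when one fails, you know which room of the crime scene to search.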

Step 3: the pursuit of consistency

The goal of debugging isn’t just to find a bug, but to achieve stable, predictable behavior. To do this, I heavily rely on structured and granular output. I instruct the model to return its response in JSON format and even apply post-processing to force the output into the required schema. This granularity is not just for reliability in the moment; it also makes it much easier to spot performance drift over time.
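A minimal sketch of that post-processing step: parse the model's reply as JSON, then force it into the required shape by dropping unexpected keys and filling missing ones with safe defaults. The schema and its field names are hypothetical, chosen only for illustration.

```python
import json

# Hypothetical required schema with safe defaults for each field.
REQUIRED = {"verdict": "unknown", "confidence": 0.0, "evidence": []}

def parse_to_schema(raw):
    """Parse model output as JSON, then coerce it into the required schema:
    unexpected keys are dropped, missing keys get their defaults."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}  # unparseable reply degrades to all-defaults, never crashes
    return {key: data.get(key, default) for key, default in REQUIRED.items()}

# A messy reply: one extra key, one required field missing.
raw = '{"verdict": "supported", "extra": "noise"}'
parse_to_schema(raw)
# -> {'verdict': 'supported', 'confidence': 0.0, 'evidence': []}
```

Because every response now has the same keys and types, you can log them over time and watch individual fields for drift instead of eyeballing free-form text.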

The guiding principle: interpretability

Every decision the system makes must be explainable with human logic. If we can't explain why an agent chose a particular tool or why a RAG system gave a specific answer, then we don't understand the system.

This pragmatic approach — from simplification to a detailed breakdown — helps me quickly manage complex systems that, at first glance, seem like uncontrollable black boxes. It proves that even without perfect conditions, a structured, iterative process can lead to high-quality results.