-
Hybrid intent classification: compact-encoder-first routing for production systems
Production chatbots route most requests through fast compact encoder classifiers, escalating to LLMs only on low-confidence queries. This hybrid architecture mitigates the latency and cost overheads of monolithic LLM solutions, achieving significant speed gains while preserving high classification accuracy.
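As a rough illustration of the routing idea, here is a minimal sketch of confidence-threshold escalation. Everything in it is an assumption for illustration: `route_intent`, the 0.8 threshold, and the toy `toy_encoder` / `toy_llm` stand-ins are hypothetical and are not taken from any particular production system.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RoutedIntent:
    intent: str
    confidence: float  # confidence reported by the compact encoder
    source: str        # "encoder" or "llm"


def route_intent(
    text: str,
    encoder: Callable[[str], Dict[str, float]],
    llm_fallback: Callable[[str], str],
    threshold: float = 0.8,  # hypothetical cut-off; tune on held-out data
) -> RoutedIntent:
    """Use the fast encoder first; escalate to the LLM only on low confidence."""
    probs = encoder(text)
    intent, confidence = max(probs.items(), key=lambda kv: kv[1])
    if confidence >= threshold:
        return RoutedIntent(intent, confidence, "encoder")
    # Low confidence: pay the latency and cost of the LLM for this query only.
    return RoutedIntent(llm_fallback(text), confidence, "llm")


# Toy stand-ins so the sketch runs without any model (assumptions, not real models).
def toy_encoder(text: str) -> Dict[str, float]:
    if "refund" in text.lower():
        return {"refund": 0.95, "other": 0.05}
    return {"refund": 0.4, "other": 0.6}


def toy_llm(text: str) -> str:
    return "needs_clarification"


if __name__ == "__main__":
    print(route_intent("I want a refund", toy_encoder, toy_llm))
    print(route_intent("hmm, not sure", toy_encoder, toy_llm))
```

The threshold is the main design lever: raising it sends more traffic to the LLM, lowering it trades accuracy on ambiguous queries for speed.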
-
Few-shot prompt ordering: the impact of example position
Investigating positional bias in few-shot prompting. 'Lost in the Middle' suggests that content near the start and end of a context carries the most weight; the order in which few-shot examples appear likewise remains an important factor for performance stability.
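One simple way to probe ordering sensitivity is to build the same prompt under several example orderings and compare model outputs across them. The sketch below only constructs the prompts; `build_prompts`, the `Input:`/`Label:` template, and the `max_orderings` cap are hypothetical choices made for illustration.

```python
from itertools import islice, permutations
from typing import List, Sequence, Tuple


def build_prompts(
    examples: Sequence[Tuple[str, str]],
    query: str,
    max_orderings: int = 6,  # cap, since permutations grow factorially
) -> List[str]:
    """Return one prompt per ordering of the few-shot examples."""
    prompts = []
    for order in islice(permutations(examples), max_orderings):
        shots = "\n\n".join(f"Input: {x}\nLabel: {y}" for x, y in order)
        prompts.append(f"{shots}\n\nInput: {query}\nLabel:")
    return prompts


if __name__ == "__main__":
    shots = [
        ("great movie", "positive"),
        ("boring plot", "negative"),
        ("it was fine", "neutral"),
    ]
    for p in build_prompts(shots, "loved the soundtrack"):
        print(p, end="\n---\n")
```

Sending each prompt to the same model and measuring how often the predicted label changes gives a crude stability score for a given example set.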
-
Temporal for LLM pipelines: durable execution starter pack
LLM agents often crash, losing state and expensive API work. Temporal provides durable execution for LLM pipelines: automatic state recovery, configurable retries, and long-running orchestration, at the cost of determinism constraints and operational overhead.
-
GraphRAG: beyond vector search for connecting the dots
Vector search finds similar text while GraphRAG finds connected facts. A look at the trade-offs, high indexing costs, and lighter-weight alternatives.
-
Domain-driven design for AI systems: architectural patterns and production experience
Exploring how domain-driven design principles (bounded contexts, anti-corruption layers, ubiquitous language, domain events) enable modularity, safety, and traceability in production AI and LLM systems.