Most hallucination issues are retrieval issues: teams jump to model or prompt tuning before checking whether the relevant evidence was ever retrieved in the first place.
Retrieval quality signals to track
- Hit rate at K (hit@K): how often at least one relevant chunk appears in the top-K results.
- Context precision: how much of the retrieved context is actually relevant to the query, rather than filler.
- Coverage drift: whether newly ingested documents degrade relevance for existing queries.
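The first two signals are cheap to compute from a labeled evaluation set. A minimal sketch, assuming each entry pairs the set of relevant chunk IDs with the ranked list the retriever returned (all names here are illustrative, not a specific library's API):

```python
def hit_rate_at_k(evals, k):
    """Fraction of queries whose top-k results contain at least one relevant chunk."""
    hits = sum(
        1 for relevant, retrieved in evals
        if set(retrieved[:k]) & set(relevant)
    )
    return hits / len(evals)

def context_precision(evals, k):
    """Average fraction of the top-k retrieved chunks that are actually relevant."""
    total = 0.0
    for relevant, retrieved in evals:
        topk = retrieved[:k]
        total += sum(1 for c in topk if c in set(relevant)) / max(len(topk), 1)
    return total / len(evals)

evals = [
    ({"a", "b"}, ["a", "x", "y"]),  # hit: "a" is relevant; 1 of 3 chunks useful
    ({"c"},      ["x", "y", "z"]),  # miss: no relevant chunk retrieved
]
print(hit_rate_at_k(evals, k=3))      # 0.5
print(context_precision(evals, k=3))  # ~0.17
```

Tracking both matters: hit@K can stay flat while context precision falls, which means the model is being handed more irrelevant text to hallucinate from.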
Reliability improvements that compound
Invest in query expansion, chunking tuned to the domain's semantics, and a periodic offline evaluation set. These changes usually improve answer quality faster than prompt experiments alone.
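A periodic offline evaluation can be as small as a fixed query set with known-relevant documents, run on a schedule and compared against a stored baseline. A sketch, where `retrieve` stands in for whatever retrieval function the system exposes and the baseline value is a placeholder:

```python
# Fixed (query, relevant-doc-ids) pairs; in practice curated from real traffic.
EVAL_SET = [
    ("how do I rotate api keys", {"doc-17", "doc-42"}),
    ("what is the retry policy", {"doc-08"}),
]

def retrieve(query, k=5):
    # Stand-in: a real system would query its vector or keyword index here.
    fake_index = {
        "how do I rotate api keys": ["doc-42", "doc-03"],
        "what is the retry policy": ["doc-99", "doc-08"],
    }
    return fake_index.get(query, [])[:k]

def eval_hit_rate(eval_set, k=5):
    hits = sum(1 for q, relevant in eval_set if set(retrieve(q, k)) & relevant)
    return hits / len(eval_set)

BASELINE = 1.0  # hit@5 recorded for the last accepted retrieval release
score = eval_hit_rate(EVAL_SET)
assert score >= BASELINE, f"retrieval regression: {score:.2f} < {BASELINE:.2f}"
```

The point of the assertion is that a regression fails loudly before anyone ships, which is what makes the evaluation set worth maintaining.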
Team workflow
Treat retrieval changes like API changes: version them, evaluate them, and release them through one pipeline.