Glossary

RAG (Retrieval-Augmented Generation)

AI pattern where an LLM generates answers from documents retrieved at query time, rather than from training data alone.

Definition

Retrieval-Augmented Generation is the dominant production pattern for LLM applications that need to answer questions over private or up-to-date content. The system first retrieves relevant chunks from a document store (often using embeddings and vector search, sometimes combined with keyword retrieval), then passes those chunks to the LLM as context so the generated answer is grounded in the source material. RAG sidesteps the cost and risk of fine-tuning while keeping answers current and citable.
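The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration, not a production design: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the names `retrieve`, `build_prompt` and `CHUNKS` are hypothetical, and the final call to an LLM is deliberately left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Place the retrieved chunks in the prompt so the answer can cite them."""
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(context, start=1))
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{numbered}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical document store: three pre-chunked passages.
CHUNKS = [
    "Refunds are issued within 14 days of a return request.",
    "Our office is closed on public holidays.",
    "Return requests must include the original order number.",
]

if __name__ == "__main__":
    question = "How long do refunds take?"
    top_chunks = retrieve(question, CHUNKS, k=2)
    # In a real system this prompt would be sent to the LLM.
    print(build_prompt(question, top_chunks))
```

Swapping `embed` for a real embedding model and the sorted scan for a vector index changes the retrieval quality and speed, but not the overall shape: retrieve, assemble context, then generate.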

Why it matters

RAG is what makes 'AI over our docs' actually work in production. Done well, it substantially reduces hallucinations and lets you cite sources. Done badly, it returns confident answers from irrelevant chunks, which is why an evaluation harness that checks both retrieval quality and answer accuracy is mandatory.
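As a rough idea of what the retrieval half of such a harness might measure, the sketch below computes a hit rate at k: the share of test questions for which the expected document appears among the top k results. Everything here is illustrative; `DOCS`, `CASES`, `keyword_retriever` and `hit_rate_at_k` are hypothetical names, and the keyword-overlap retriever is a placeholder for whatever retriever is actually under test.

```python
from typing import Callable

# Hypothetical document store keyed by document id.
DOCS = {
    "refunds": "Refunds are issued within 14 days of a return request.",
    "hours": "Support is available on weekdays from 9 to 17.",
    "shipping": "Orders ship within two business days.",
}

# Labelled test cases: (question, id of the document that should be retrieved).
CASES = [
    ("when are refunds issued", "refunds"),
    ("when do orders ship", "shipping"),
    ("what are the support hours", "hours"),
]


def keyword_retriever(question: str, k: int) -> list[str]:
    """Placeholder retriever: rank documents by shared whitespace-split words."""
    q = set(question.lower().split())
    ranked = sorted(
        DOCS,
        key=lambda doc_id: len(q & set(DOCS[doc_id].lower().split())),
        reverse=True,
    )
    return ranked[:k]


def hit_rate_at_k(
    cases: list[tuple[str, str]],
    retriever: Callable[[str, int], list[str]],
    k: int = 3,
) -> float:
    """Fraction of cases whose expected document is in the top-k results."""
    if not cases:
        return 0.0
    hits = sum(1 for question, expected in cases if expected in retriever(question, k))
    return hits / len(cases)


if __name__ == "__main__":
    for k in (1, 2):
        print(f"hit rate @ {k}: {hit_rate_at_k(CASES, keyword_retriever, k):.2f}")
```

With this toy data the third question is missed at k=1, because the word "hours" never appears in the matching document, and is recovered at k=2. Surfacing that kind of vocabulary mismatch is exactly what a retrieval metric is for. A fuller harness would also score the generated answers, which this sketch does not attempt.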

Working on RAG (Retrieval-Augmented Generation)?

Our AI Development team ships this in production. Tell us your scope and we'll share a written recommendation and fixed quote within 48 hours.