Retrieval-Augmented Generation — AI Glossary

What Is RAG

RAG (Retrieval-Augmented Generation) is an architectural pattern that combines information retrieval with text generation. Instead of relying solely on what the model learned during training, a RAG system first retrieves documents relevant to the query, then passes them to the model as context for its response.

How RAG Works

Indexing: documents are split into chunks, converted to numerical vectors (embeddings), and stored in a vector database such as Pinecone or Weaviate
Retrieval: when a user submits a query, the query is embedded the same way and the chunks whose vectors are most similar to it are retrieved
Generation: the retrieved chunks are added to the prompt, and the model generates a response grounded in them
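
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: the bag-of-words `embed` function stands in for a real embedding model, a plain in-memory list stands in for a vector database such as Pinecone or Weaviate, and all function names, the sample documents, and the prompt wording are hypothetical. The final call to a language model is left out; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def split_into_chunks(text, max_words=12):
    """Indexing, part 1: split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): word counts instead of a learned dense vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents, max_words=12):
    """Indexing, part 2: store each chunk with its vector (in-memory stand-in for a vector database)."""
    index = []
    for doc in documents:
        for chunk in split_into_chunks(doc, max_words):
            index.append((chunk, embed(chunk)))
    return index


def retrieve(index, query, top_k=2):
    """Retrieval: return the top_k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(query, chunks):
    """Generation, input side: place the retrieved chunks into the prompt as context."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"


# Hypothetical sample documents for illustration only.
documents = [
    "Refunds are processed within five business days of the return being received.",
    "The office is closed on public holidays and on weekends.",
    "Support tickets are answered within one business day.",
]
index = build_index(documents)
top_chunks = retrieve(index, "How long do refunds take?", top_k=1)
prompt = build_prompt("How long do refunds take?", top_chunks)
print(prompt)  # this prompt would then be sent to a language model
```

In a real system the word-count vectors would be replaced by embeddings from a trained model, which match on meaning rather than on shared words, and the linear scan in `retrieve` would be replaced by the vector database's nearest-neighbor search.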

When to Use RAG

Corporate knowledge base (internal documentation)
Up-to-date information (news, product updates)
Specialized domains (medicine, law)
Responses that must cite their sources

RAG vs Fine-tuning

Aspect	RAG	Fine-tuning
Freshness	Current as of the last index update	Needs retraining
Cost	Cheaper to start	More expensive, needs GPUs
Transparency	Has sources	No sources
Complexity	Medium	High

Related Terms