AISuffer

Retrieval-Augmented Generation

A technique that improves LLM responses by adding relevant information from external sources before generating an answer.

What Is RAG

RAG (Retrieval-Augmented Generation) is an architectural pattern that combines information retrieval with text generation. Instead of relying solely on the model’s training data, RAG first retrieves documents relevant to the user's query, then passes them to the model as context in the prompt.

How RAG Works

  1. Indexing: documents are split into chunks, and each chunk is converted to a numerical vector (embedding)
  2. Retrieval: when a user submits a query, the query is embedded the same way and the chunks most similar to it are retrieved
  3. Generation: the retrieved chunks are added to the prompt, and the model generates a response based on them
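
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production setup: the embedding here is a toy word-count vector standing in for a real embedding model, the similarity measure is plain cosine similarity, and the final call to an LLM is left out, so the script only builds the prompt. The function names, the sample documents, and the prompt wording are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split a document into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: sparse word-count vector (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Step 1, indexing: store (source, chunk, vector) for every chunk."""
    index = []
    for source, text in documents.items():
        for chunk in chunk_text(text):
            index.append((source, chunk, embed(chunk)))
    return index


def retrieve(index, query, k=2):
    """Step 2, retrieval: return the k chunks most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda entry: cosine(query_vec, entry[2]), reverse=True)
    return [(source, chunk) for source, chunk, _ in ranked[:k]]


def build_prompt(query, hits):
    """Step 3, generation: place retrieved chunks in the prompt (model call omitted)."""
    context = "\n".join(f"[{source}] {chunk}" for source, chunk in hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical sample documents, keyed by source name.
documents = {
    "vacation.txt": "Employees receive 25 days of paid vacation per year.",
    "expenses.txt": "Travel expenses must be submitted within 30 days of the trip.",
    "security.txt": "Laptops must use full disk encryption.",
}

question = "How many vacation days do employees receive?"
index = build_index(documents)
hits = retrieve(index, question, k=1)
prompt = build_prompt(question, hits)
print(prompt)
```

Keeping the source name next to each chunk is what lets the final answer point back to where the information came from. In a real system the toy embed function would be replaced by an embedding model, and the list-based index by a vector store.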

When to Use RAG

  • Corporate knowledge base (internal documentation)
  • Up-to-date information (news, product updates)
  • Specialized domains (medicine, law)
  • When you need source transparency in responses

RAG vs Fine-tuning

Aspect        RAG               Fine-tuning
Freshness     Always current    Needs retraining
Cost          Cheaper to start  More expensive, needs GPUs
Transparency  Has sources       No sources
Complexity    Medium            High