Retrieval-Augmented Generation
A technique that improves LLM responses by adding relevant information from external sources before generating an answer.
What Is RAG
RAG (Retrieval-Augmented Generation) is an architectural pattern that combines information retrieval with text generation. Instead of relying solely on the model’s training data, RAG first finds documents relevant to the query, then passes them to the model as context for its answer.
How RAG Works
- Indexing: documents are split into chunks, and each chunk is converted to a numerical vector (an embedding)
- Retrieval: the user's query is typically embedded the same way, and the chunks whose vectors are closest to it are selected
- Generation: the selected chunks are added to the prompt, and the model generates a response based on them
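The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the `embed` function is a toy word-count vector standing in for a real embedding model, the index is a plain list rather than a vector database, and all function names and sample documents are hypothetical. The generation step stops at building the prompt, since the call to a model depends on whichever LLM API you use.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 12) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding (assumption): sparse word counts instead of a learned vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: list[str]) -> list[tuple[str, Counter]]:
    """Indexing: chunk every document and store each chunk with its vector."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 2) -> list[str]:
    """Retrieval: return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(query: str, chunks: list[str]) -> str:
    """Generation (first half): put the retrieved chunks into the prompt."""
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


# Hypothetical sample data for illustration only.
documents = [
    "Refunds are processed within 14 days of the return request.",
    "The office is closed on public holidays.",
    "Support is available by email from Monday to Friday.",
]
question = "How long do refunds take?"

index = build_index(documents)
top = retrieve(index, question, k=1)
prompt = build_prompt(question, top)
print(prompt)  # this string would then be sent to the LLM of your choice
```

Numbering the chunks in the prompt (`[1]`, `[2]`, ...) is one simple way to let the model refer back to its sources, which is what makes the answers traceable.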
When to Use RAG
- Corporate knowledge base (internal documentation)
- Up-to-date information (news, product updates)
- Specialized domains (medicine, law)
- When you need source transparency in responses
RAG vs Fine-tuning
| Aspect | RAG | Fine-tuning |
|---|---|---|
| Freshness | Current as soon as the index is updated | Needs retraining to pick up new data |
| Cost | Cheaper to start | More expensive, needs GPUs |
| Transparency | Can cite the retrieved sources | No source attribution |
| Complexity | Medium | High |