Chunking Strategies for RAG: Size Matters
By Learnia Team
You've built a RAG system, but the AI keeps returning irrelevant answers. The problem might not be your model—it might be how you're chunking your documents.
What Is Chunking?
Chunking is the process of breaking large documents into smaller pieces for storage and retrieval in a RAG system.
Why We Chunk
Problem:
- Your document: 50,000 tokens
- Context window: 8,000 tokens
- Embedding models: Max 512 tokens
Solution:
- Split into ~100 chunks of 500 tokens each
- Embed and store each chunk
- Retrieve only relevant chunks
You can't feed entire documents to most AI systems—chunking makes them manageable.
Why Chunking Strategy Matters
Bad Chunking
Chunk 1: "...increased by 15%. The new policy"
Chunk 2: "requires all employees to submit forms"
Chunk 3: "by Friday. Safety regulations mandate"
Chunks split mid-sentence. Context lost. Retrieval fails.
Good Chunking
Chunk 1: "Revenue increased by 15% in Q3 2024."
Chunk 2: "The new expense policy requires all employees
to submit reimbursement forms by Friday each week."
Chunk 3: "Safety regulations mandate quarterly equipment
inspections for all manufacturing facilities."
Complete thoughts. Clear context. Effective retrieval.
The Chunking Trade-Off
Small Chunks (100-200 tokens)
✅ Precise retrieval
✅ Less noise in results
❌ May lose context
❌ More chunks to search
Large Chunks (1000+ tokens)
✅ More context preserved
✅ Fewer chunks to manage
❌ More noise in results
❌ May exceed model limits
The Sweet Spot
For most use cases: 300-500 tokens per chunk with 50-100 token overlap
5 Chunking Strategies
1. Fixed-Size Chunking
Split by character/token count:
Every 500 tokens → new chunk
Overlap: 50 tokens between chunks
Simple but blunt. May cut mid-sentence.
Best for: Quick prototypes, uniform documents
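A minimal sketch of fixed-size chunking with overlap, using whitespace-separated words as a stand-in for real tokens (production code would count tokens with the same tokenizer your embedding model uses):

```python
def fixed_size_chunks(text, chunk_size=500, overlap=50):
    """Split text into fixed-size chunks with overlapping boundaries.

    "Tokens" here are whitespace-separated words, a rough proxy for
    real tokenizer tokens. Each chunk repeats the last `overlap`
    tokens of the previous one.
    """
    tokens = text.split()
    chunks = []
    step = chunk_size - overlap  # how far the window advances each time
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # last window already covers the end of the text
    return chunks
```

Note how the window advances by `chunk_size - overlap`, so consecutive chunks share their boundary tokens.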
2. Sentence-Based Chunking
Split on sentence boundaries:
Chunk until reaching ~500 tokens
Always end on a complete sentence
Respects natural language boundaries.
Best for: General text, articles, documentation
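A sketch of sentence-based packing. The regex splitter here is deliberately naive; a library such as spaCy or NLTK would handle abbreviations and edge cases far better:

```python
import re

def sentence_chunks(text, max_tokens=500):
    """Greedily pack whole sentences into chunks of at most max_tokens.

    Sentences are split on ., !, or ? followed by whitespace; token
    count is whitespace words. No sentence is ever cut in half.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))  # flush before overflowing
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks
```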
3. Paragraph-Based Chunking
Keep paragraphs together:
Each paragraph = one chunk (if reasonable size)
Combine small paragraphs
Split very large paragraphs
Preserves topical coherence.
Best for: Well-structured documents, reports
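One way to sketch the merge-small / split-large logic, again counting whitespace words as tokens and treating blank lines as paragraph breaks:

```python
def paragraph_chunks(text, min_tokens=100, max_tokens=500):
    """One chunk per paragraph; merge short paragraphs, split long ones.

    Paragraphs are blank-line separated. Small paragraphs accumulate
    in a buffer until they reach min_tokens; oversized paragraphs are
    hard-split at max_tokens.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, buffer = [], ""
    for para in paragraphs:
        candidate = (buffer + "\n\n" + para).strip() if buffer else para
        size = len(candidate.split())
        if size < min_tokens:
            buffer = candidate           # still too small, keep merging
        elif size <= max_tokens:
            chunks.append(candidate)
            buffer = ""
        else:
            if buffer:
                chunks.append(buffer)    # flush what we had so far
            words = para.split()         # hard-split the oversized paragraph
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
            buffer = ""
    if buffer:
        chunks.append(buffer)
    return chunks
```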
4. Semantic Chunking
Split based on meaning changes:
Use AI to detect topic shifts
Start new chunk when topic changes
Most accurate but slower/costlier.
Best for: Complex documents, mixed content
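Assuming you already have one embedding vector per sentence (from any embedding model), topic-shift detection can be sketched as a cosine-similarity check between consecutive sentences. The 0.7 threshold below is an illustrative value, not a recommendation:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_chunks(sentences, embeddings, threshold=0.7):
    """Start a new chunk whenever consecutive sentence embeddings diverge.

    embeddings[i] is assumed to be the vector for sentences[i].
    """
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    for i in range(1, len(sentences)):
        if cosine(embeddings[i - 1], embeddings[i]) < threshold:
            chunks.append(" ".join(current))  # topic shift detected
            current = []
        current.append(sentences[i])
    chunks.append(" ".join(current))
    return chunks
```

The embedding calls are exactly what makes this strategy slower and costlier than the others.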
5. Document Structure Chunking
Follow document hierarchy:
Respect headers, sections, lists
Each H2 section = logical chunk
Tables kept intact
Leverages author's organization.
Best for: Technical docs, manuals, structured content
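For Markdown sources, splitting on H2 headings can be as simple as a zero-width regex split that keeps each heading attached to its own section body:

```python
import re

def heading_chunks(markdown_text):
    """Split a Markdown document so each '## ' section becomes one chunk.

    The lookahead split keeps the heading with its body, so retrieval
    sees the author's own labels. Any content before the first H2
    becomes a chunk of its own.
    """
    parts = re.split(r"(?m)^(?=## )", markdown_text)
    return [p.strip() for p in parts if p.strip()]
```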
The Overlap Question
Why Overlap?
Without overlap:
Chunk 1: "...the company achieved record sales."
Chunk 2: "This was primarily due to the new product line."
The connection between "record sales" and "new product line" is lost.
With overlap (last 2 sentences repeated):
Chunk 1: "...the company achieved record sales."
Chunk 2: "...achieved record sales. This was primarily
due to the new product line."
Context preserved across chunk boundaries.
How Much Overlap?
10-15% of chunk size is typical
Example: 500 token chunks, 50-75 token overlap
Too little: Context breaks
Too much: Wasted storage, duplicate results
Chunk Size by Use Case
| Use Case | Recommended Size | Why |
|----------|-----------------|-----|
| Q&A / Factoid | 200-300 tokens | Precise answers |
| General chat | 400-500 tokens | Balanced context |
| Summarization | 800-1000 tokens | More source material |
| Legal/Technical | 300-400 tokens | Specific clauses |
| Creative content | 500-800 tokens | Flow and context |
Common Chunking Mistakes
1. One Size Fits All
Using the same chunk size for FAQs and legal contracts ❌
Different content types need different strategies
2. Ignoring Structure
Splitting a table across chunks ❌
Separating a heading from its content ❌
Breaking up a code block ❌
3. No Metadata
Chunk without knowing its source document ❌
No idea which section it came from ❌
Always preserve: source, page, section, date
4. Never Testing
Set chunk size once, never evaluate ❌
Retrieval quality varies—test and iterate
Metadata: The Secret Weapon
Good chunks include context:
{
  "text": "The return policy allows 30 days...",
  "metadata": {
    "source": "customer-policies.pdf",
    "section": "Returns & Refunds",
    "page": 12,
    "last_updated": "2024-06-15"
  }
}
This enables:
- Filtering by source
- Citing specific pages
- Showing freshness
- Debugging retrieval issues
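A minimal sketch of source and section filtering over chunks shaped like the dictionary above; real vector stores expose equivalent metadata filters natively, so this is only to show the idea:

```python
def filter_chunks(chunks, source=None, section=None):
    """Keep only chunks whose metadata matches the given filters.

    Each chunk is a dict with "text" and a "metadata" dict holding
    keys like "source" and "section". A filter left as None is ignored.
    """
    results = []
    for chunk in chunks:
        meta = chunk.get("metadata", {})
        if source and meta.get("source") != source:
            continue
        if section and meta.get("section") != section:
            continue
        results.append(chunk)
    return results
```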
Evaluation: Is Your Chunking Working?
Test with Real Queries
Question: "What's the vacation policy for new employees?"
Check:
1. Is the right chunk retrieved?
2. Does it contain the complete answer?
3. Is there too much irrelevant content?
Metrics to Track
Retrieval precision: % of retrieved chunks that are relevant
Retrieval recall: % of relevant chunks that are retrieved
Answer quality: Does the LLM produce correct answers?
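The first two metrics can be computed per query from sets of chunk IDs; a hypothetical `retrieval_metrics` helper might look like this:

```python
def retrieval_metrics(retrieved_ids, relevant_ids):
    """Precision and recall for one query, given chunk-ID collections.

    Precision: share of retrieved chunks that were relevant.
    Recall: share of relevant chunks that were retrieved.
    """
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Averaging these over a set of real user queries gives a concrete number to compare chunking strategies against.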
Key Takeaways
- Chunking = splitting documents for RAG retrieval
- Size matters: 300-500 tokens typical, adjust per use case
- Strategy matters: fixed, sentence, paragraph, semantic, structural
- Overlap preserves context across boundaries (10-15%)
- Metadata makes chunks traceable and filterable
Ready to Master RAG?
This article covered the what and why of chunking strategies. But production RAG systems require end-to-end design including embedding selection, retrieval tuning, and integration.
In our Module 5 — RAG & Context Engineering, you'll learn:
- Complete RAG architecture design
- Advanced chunking implementations
- Hybrid search strategies
- Retrieval evaluation and optimization
- Production deployment patterns
Module 5 — RAG (Retrieval-Augmented Generation)
Ground AI responses in your own documents and data sources.