
Chunking Strategies for RAG: Size Matters

By Learnia Team


This article is written in English. Our training modules are available in French.

You've built a RAG system, but the AI keeps returning irrelevant answers. The problem might not be your model—it might be how you're chunking your documents.


What Is Chunking?

Chunking is the process of breaking large documents into smaller pieces for storage and retrieval in a RAG system.

Why We Chunk

Problem:
- Your document: 50,000 tokens
- Context window: 8,000 tokens
- Many embedding models: input capped around 512 tokens

Solution:
- Split into ~100 chunks of 500 tokens each
- Embed and store each chunk
- Retrieve only relevant chunks

You can't feed entire documents to most AI systems—chunking makes them manageable.
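
To make the flow concrete, here is a minimal sketch of that pipeline in Python. Everything here is illustrative: embed() is a toy stand-in for a real embedding model, and the "store" is an in-memory list rather than a vector database.

import math

def embed(text: str) -> list[float]:
    # Placeholder: swap in a real embedding model here.
    # This toy version just normalizes character frequencies.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunk(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size split; words stand in for tokens.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

document = "..."  # your 50,000-token document goes here
store = [(embed(c), c) for c in chunk(document)]  # one embedding per chunk

def retrieve(query: str, k: int = 3) -> list[str]:
    # Score every chunk by cosine similarity to the query, return the top k.
    q = embed(query)
    scored = sorted(store, key=lambda entry: -sum(a * b for a, b in zip(q, entry[0])))
    return [text for _, text in scored[:k]]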


Why Chunking Strategy Matters

Bad Chunking

Chunk 1: "...increased by 15%. The new policy"
Chunk 2: "requires all employees to submit forms"
Chunk 3: "by Friday. Safety regulations mandate"

Chunks split mid-sentence. Context lost. Retrieval fails.

Good Chunking

Chunk 1: "Revenue increased by 15% in Q3 2024."
Chunk 2: "The new expense policy requires all employees 
          to submit reimbursement forms by Friday each week."
Chunk 3: "Safety regulations mandate quarterly equipment 
          inspections for all manufacturing facilities."

Complete thoughts. Clear context. Effective retrieval.


The Chunking Trade-Off

Small Chunks (100-200 tokens)

✅ Precise retrieval
✅ Less noise in results
❌ May lose context
❌ More chunks to search

Large Chunks (1000+ tokens)

✅ More context preserved
✅ Fewer chunks to manage
❌ More noise in results
❌ May exceed model limits

The Sweet Spot

For most use cases: 300-500 tokens per chunk with 50-100 token overlap


5 Chunking Strategies

1. Fixed-Size Chunking

Split by character/token count:

Every 500 tokens → new chunk
Overlap: 50 tokens between chunks

Simple but blunt. May cut mid-sentence.

Best for: Quick prototypes, uniform documents
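
A rough sketch of fixed-size chunking with overlap, using whitespace-separated words as a stand-in for tokens (a real implementation would count tokens with your model's tokenizer):

def fixed_size_chunks(text: str, size: int = 500, overlap: int = 50) -> list[str]:
    # Slide a window of `size` words, stepping by size - overlap each time.
    words = text.split()
    chunks = []
    step = size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks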

2. Sentence-Based Chunking

Split on sentence boundaries:

Chunk until reaching ~500 tokens
Always end on a complete sentence

Respects natural language boundaries.

Best for: General text, articles, documentation
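
A minimal sentence-based chunker might look like the sketch below. The regex is a crude sentence splitter (a library such as nltk or spaCy handles edge cases better), and word count again stands in for token count:

import re

def sentence_chunks(text: str, max_tokens: int = 500) -> list[str]:
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, current_len = [], [], 0
    for sentence in sentences:
        length = len(sentence.split())
        if current and current_len + length > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += length
    if current:
        chunks.append(" ".join(current))
    return chunks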

3. Paragraph-Based Chunking

Keep paragraphs together:

Each paragraph = one chunk (if reasonable size)
Combine small paragraphs
Split very large paragraphs

Preserves topical coherence.

Best for: Well-structured documents, reports
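
In the same spirit, a paragraph-based splitter can merge small paragraphs up to a size limit and fall back to sentence splitting for oversized ones (this sketch reuses sentence_chunks from the previous example):

def paragraph_chunks(text: str, max_tokens: int = 500) -> list[str]:
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current, current_len = [], [], 0
    for para in paragraphs:
        length = len(para.split())
        if length > max_tokens:
            # Oversized paragraph: flush what we have, then split it by sentence.
            if current:
                chunks.append("\n\n".join(current))
                current, current_len = [], 0
            chunks.extend(sentence_chunks(para, max_tokens))
        elif current_len + length > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_len = [para], length
        else:
            current.append(para)
            current_len += length
    if current:
        chunks.append("\n\n".join(current))
    return chunks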

4. Semantic Chunking

Split based on meaning changes:

Use AI to detect topic shifts
Start new chunk when topic changes

Most accurate, but slower and more expensive.

Best for: Complex documents, mixed content
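
A common implementation embeds consecutive sentences and starts a new chunk when similarity drops below a threshold. A sketch, assuming you supply an embed() function backed by whatever embedding model you use:

import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def semantic_chunks(sentences: list[str], embed, threshold: float = 0.7) -> list[str]:
    # Start a new chunk whenever the next sentence drifts away from
    # the previous one in embedding space.
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    prev_vec = embed(sentences[0])
    for sentence in sentences[1:]:
        vec = embed(sentence)
        if cosine(prev_vec, vec) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
        prev_vec = vec
    chunks.append(" ".join(current))
    return chunks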

5. Document Structure Chunking

Follow document hierarchy:

Respect headers, sections, lists
Each H2 section = logical chunk
Tables kept intact

Leverages author's organization.

Best for: Technical docs, manuals, structured content
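
For Markdown-style documents, a structure-aware splitter can simply break on headings. A minimal sketch that treats each level-2 heading and its body as one chunk:

import re

def heading_chunks(markdown: str) -> list[str]:
    # Split at the start of every "## " heading, keeping the heading
    # together with the section text that follows it.
    parts = re.split(r"(?m)^(?=## )", markdown)
    return [part.strip() for part in parts if part.strip()]

Real documents need more care: keep tables and code blocks intact, and attach each chunk's heading path as metadata (more on that below).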


The Overlap Question

Why Overlap?

Without overlap:
Chunk 1: "...the company achieved record sales."
Chunk 2: "This was primarily due to the new product line."

The connection between "record sales" and "new product line" is lost.

With overlap (end of the previous chunk repeated):
Chunk 1: "...the company achieved record sales."
Chunk 2: "...achieved record sales. This was primarily 
          due to the new product line."

Context preserved across chunk boundaries.

How Much Overlap?

10-15% of chunk size is typical
Example: 500 token chunks, 50-75 token overlap

Too little: Context breaks
Too much: Wasted storage, duplicate results
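
As a concrete illustration, a small helper can carry the tail of each chunk into the next one. Here chunks are represented as lists of sentences, and the last two sentences are repeated:

def add_overlap(chunks: list[list[str]], n_sentences: int = 2) -> list[str]:
    # Prepend the last n sentences of the previous chunk to each chunk.
    result = []
    for i, sentences in enumerate(chunks):
        prefix = chunks[i - 1][-n_sentences:] if i > 0 else []
        result.append(" ".join(prefix + sentences))
    return result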

Chunk Size by Use Case

| Use Case | Recommended Size | Why |
|----------|------------------|-----|
| Q&A / Factoid | 200-300 tokens | Precise answers |
| General chat | 400-500 tokens | Balanced context |
| Summarization | 800-1000 tokens | More source material |
| Legal/Technical | 300-400 tokens | Specific clauses |
| Creative content | 500-800 tokens | Flow and context |


Common Chunking Mistakes

1. One Size Fits All

Using same chunk size for FAQs and legal contracts ❌
Different content types need different strategies

2. Ignoring Structure

Splitting a table across chunks ❌
Separating a heading from its content ❌
Breaking up a code block ❌

3. No Metadata

Chunk without knowing its source document ❌
No idea which section it came from ❌

Always preserve: source, page, section, date

4. Never Testing

Set chunk size once, never evaluate ❌
Retrieval quality varies—test and iterate

Metadata: The Secret Weapon

Good chunks include context:

{
  "text": "The return policy allows 30 days...",
  "metadata": {
    "source": "customer-policies.pdf",
    "section": "Returns & Refunds",
    "page": 12,
    "last_updated": "2024-06-15"
  }
}

This enables:

  • Filtering by source
  • Citing specific pages
  • Showing freshness
  • Debugging retrieval issues
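
With metadata attached, filtering is straightforward even before you reach for a vector database's built-in filters. A sketch, assuming chunks are stored as dicts shaped like the example above:

def filter_chunks(chunks: list[dict], source: str | None = None, section: str | None = None) -> list[dict]:
    # Keep only chunks whose metadata matches the requested source/section.
    result = []
    for chunk in chunks:
        meta = chunk.get("metadata", {})
        if source and meta.get("source") != source:
            continue
        if section and meta.get("section") != section:
            continue
        result.append(chunk)
    return result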

Evaluation: Is Your Chunking Working?

Test with Real Queries

Question: "What's the vacation policy for new employees?"

Check:
1. Is the right chunk retrieved?
2. Does it contain the complete answer?
3. Is there too much irrelevant content?

Metrics to Track

Retrieval precision: % of retrieved chunks that are relevant
Retrieval recall: % of relevant chunks that are retrieved
Answer quality: Does the LLM produce correct answers?
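
Both retrieval metrics are easy to compute once you have a handful of test queries with hand-labeled relevant chunks. A minimal sketch:

def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    # retrieved and relevant are sets of chunk IDs.
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Example: 2 of 3 retrieved chunks are relevant (precision 0.67),
# and 2 of 4 relevant chunks were found (recall 0.5).
p, r = precision_recall({"c1", "c2", "c7"}, {"c1", "c2", "c3", "c4"})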

Key Takeaways

  1. Chunking = splitting documents for RAG retrieval
  2. Size matters: 300-500 tokens typical, adjust per use case
  3. Strategy matters: Fixed, sentence, paragraph, semantic, structural
  4. Overlap preserves context across boundaries (10-15%)
  5. Metadata makes chunks traceable and filterable

Ready to Master RAG?

This article covered the what and why of chunking strategies. But production RAG systems require end-to-end design including embedding selection, retrieval tuning, and integration.

In our Module 5 — RAG & Context Engineering, you'll learn:

  • Complete RAG architecture design
  • Advanced chunking implementations
  • Hybrid search strategies
  • Retrieval evaluation and optimization
  • Production deployment patterns

Explore Module 5: RAG & Context Engineering

GO DEEPER

Module 5 — RAG (Retrieval-Augmented Generation)

Ground AI responses in your own documents and data sources.