RAG from Scratch

2026

A minimal retrieval-augmented generation (RAG) implementation built from the ground up, without framework abstractions. It follows the core loop: documents → chunk → embed → store → query → retrieve → generate. The goal is to understand each step of the RAG pipeline before reaching for tools like LangChain.

What's Inside

Chunking — configurable text splitting with overlap
Embedding — sentence-transformers for vector representations
Vector store — cosine similarity search implemented with NumPy
Generation — context-augmented prompts sent to Ollama or Gemini
Modular architecture — separate modules for each pipeline stage
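
The stages listed above can be sketched in a few small functions. This is a minimal illustration under stated assumptions, not the repository's actual code: the names `chunk_text`, `top_k_cosine`, and `build_prompt`, the character-based splitting, and the default sizes are all hypothetical. The embedding step is left out; in the project the vectors would come from sentence-transformers, while here small hand-written vectors stand in for them.

```python
import numpy as np


def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into character windows; neighbors share `overlap` characters.

    Hypothetical helper: the project may split on tokens or sentences instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


def top_k_cosine(query_vec, doc_vecs, k=3):
    """Return (row index, cosine score) for the k rows most similar to the query."""
    docs = np.asarray(doc_vecs, dtype=float)
    query = np.asarray(query_vec, dtype=float)
    eps = 1e-12  # guards against division by zero for all-zero vectors
    # After L2 normalization, a dot product equals cosine similarity
    docs_n = docs / np.maximum(np.linalg.norm(docs, axis=1, keepdims=True), eps)
    query_n = query / max(np.linalg.norm(query), eps)
    scores = docs_n @ query_n
    order = np.argsort(-scores)[:k]  # highest score first
    return [(int(i), float(scores[i])) for i in order]


def build_prompt(question, contexts):
    """Assemble a context-augmented prompt (illustrative wording only)."""
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    chunks = chunk_text("abcdefghij", chunk_size=4, overlap=2)
    # Stand-in vectors; real ones would come from an embedding model
    vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.1]]
    hits = top_k_cosine([1.0, 0.0], vectors, k=2)
    print(build_prompt("Which chunk matches?", [chunks[i] for i, _ in hits]))
```

The resulting prompt string is what would be sent to the model backend (Ollama or Gemini in this project). Normalizing the vectors once at index time, rather than on every query, is a common refinement when the store grows.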

Repository: GitHub
Platform: Python CLI
Stack: Python, NumPy, sentence-transformers, Ollama, Gemini

Architecture