RAG from Scratch
2026
A minimal retrieval-augmented generation (RAG) implementation built from the ground up, without framework abstractions. It follows the core loop: documents → chunk → embed → store → query → retrieve → generate. The goal is to understand each step of the RAG pipeline before reaching for tools like LangChain.
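As a rough illustration of the loop from chunking through retrieval, here is a minimal sketch. It is not the project's actual code: the function names (`chunk_text`, `toy_embed`, `top_k`) are hypothetical, and `toy_embed` is a hashed bag-of-words stand-in used only so the example runs without downloading a model. The project itself uses sentence-transformers for embeddings.

```python
import zlib

import numpy as np


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows; neighbours share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def toy_embed(texts: list[str], dim: int = 256) -> np.ndarray:
    """Stand-in embedding (assumption): count words into hashed buckets."""
    vecs = np.zeros((len(texts), dim), dtype=np.float32)
    for i, text in enumerate(texts):
        for word in text.lower().split():
            vecs[i, zlib.crc32(word.encode("utf-8")) % dim] += 1.0
    return vecs


def top_k(query_vec: np.ndarray, matrix: np.ndarray, k: int = 3) -> list[tuple[int, float]]:
    """Return (row index, cosine similarity) for the k most similar rows."""
    q = query_vec / (np.linalg.norm(query_vec) + 1e-12)
    m = matrix / (np.linalg.norm(matrix, axis=1, keepdims=True) + 1e-12)
    scores = m @ q  # dot product of unit vectors equals cosine similarity
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order]


# documents -> chunk -> embed -> store -> query -> retrieve
documents = [
    "Cosine similarity compares the angle between two vectors.",
    "Overlapping chunks keep sentences that straddle a boundary retrievable.",
]
chunks = [c for doc in documents for c in chunk_text(doc, chunk_size=80, overlap=20)]
store = toy_embed(chunks)  # the "vector store" here is just a NumPy matrix
query_vec = toy_embed(["how does cosine similarity compare vectors"])[0]
hits = top_k(query_vec, store, k=2)
retrieved = [chunks[i] for i, _ in hits]
```

The retrieved chunks would then be placed into the prompt for the generation step. Swapping `toy_embed` for a real embedding model should leave the rest of the sketch unchanged, since `top_k` only assumes a 2-D array with one row per chunk.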
What's Inside
- Chunking — configurable text splitting with overlap
- Embedding — sentence-transformers for vector representations
- Vector store — cosine similarity search implemented with NumPy
- Generation — context-augmented prompts sent to Ollama or Gemini
- Modular architecture — separate modules for each pipeline stage
- Repository: GitHub
- Platform: Python CLI
- Stack: Python, NumPy, sentence-transformers, Ollama, Gemini
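The generation step listed under What's Inside (context-augmented prompts sent to Ollama or Gemini) might look roughly like the sketch below. The prompt wording, the function names `build_prompt` and `generate_with_ollama`, and the model name `llama3` are assumptions for illustration, not taken from the project. The request targets Ollama's local REST endpoint `/api/generate` on its default port.

```python
import json
import urllib.request


def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a context-augmented prompt from retrieved chunks (hypothetical template)."""
    context = "\n\n".join(f"[{i + 1}] {chunk}" for i, chunk in enumerate(chunks))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


def generate_with_ollama(
    prompt: str,
    model: str = "llama3",  # assumption: any model already pulled locally works
    host: str = "http://localhost:11434",  # Ollama's default local address
) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["response"]


# Building the prompt needs no server; the Ollama call is left to the caller.
prompt = build_prompt(
    "How are chunks ranked?",
    ["Chunks are ranked by cosine similarity to the query embedding."],
)
print(prompt)
```

Calling `generate_with_ollama(prompt)` requires an Ollama server running locally. Keeping prompt assembly separate from the model call means the same prompt could be sent to a different backend, such as Gemini, without touching the retrieval code.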
Architecture