Semantic Search with LangChain

2026

A semantic search engine for natural language queries over PDF documents. Instead of matching keywords, it embeds passages and queries as vectors and retrieves the passages whose embeddings lie closest to the query's. Delivered as a walkthrough Jupyter notebook covering the full pipeline.
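To make the contrast with keyword matching concrete, here is a minimal sketch that ranks two passages by cosine similarity. The three-dimensional vectors are hand-written toy values chosen for illustration, not real model output (all-MiniLM-L6-v2 produces 384-dimensional vectors), and the names `cosine_similarity` and `keyword_overlap` are hypothetical helpers rather than anything taken from the notebook.

```python
import math
import re


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_overlap(query, passage):
    """Number of distinct lowercase words shared by query and passage."""
    words = lambda s: set(re.findall(r"[a-z]+", s.lower()))
    return len(words(query) & words(passage))


# ASSUMPTION: toy vectors written by hand, standing in for real embeddings.
passages = {
    "The car would not start this morning.": [0.9, 0.1, 0.0],
    "Bake the bread for forty minutes.": [0.0, 0.2, 0.9],
}
query_text = "automobile engine trouble"
query_vector = [0.8, 0.2, 0.1]

# Keyword matching finds nothing: the query shares no words with either passage.
overlaps = {p: keyword_overlap(query_text, p) for p in passages}

# Vector similarity still ranks the car passage first.
best_passage = max(passages, key=lambda p: cosine_similarity(query_vector, passages[p]))
print(overlaps)
print(best_passage)
```

In the actual project the vectors would come from the embedding model, so the ranking reflects whatever notion of similarity that model has learned.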

Pipeline

Document loading — parse PDFs with PyPDF
Text splitting — chunk documents for embedding
Embedding — all-MiniLM-L6-v2 via HuggingFace
Vector storage — ChromaDB for similarity search
Retrieval — query by natural language and get relevant passages
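The shape of the steps above can be sketched in dependency-free Python. This is not the notebook's code: it skips PDF parsing and replaces LangChain's splitter, the all-MiniLM-L6-v2 model, and ChromaDB with small stand-ins so it runs anywhere. `split_text`, `toy_embed`, and `InMemoryIndex` are hypothetical names, and `toy_embed` is a plain word-count vector, so by itself it behaves like keyword matching. Swapping in a real embedding function is what would make the search semantic.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, chunk_overlap=50):
    """Fixed-size character chunks; neighbours share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def toy_embed(text):
    """ASSUMPTION: word-count vector as a placeholder for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Stand-in for a vector store: keeps (chunk, vector) pairs in a list."""

    def __init__(self, embed):
        self.embed = embed
        self.entries = []

    def add(self, chunks):
        for chunk in chunks:
            self.entries.append((chunk, self.embed(chunk)))

    def query(self, question, k=2):
        """Return the k stored chunks most similar to the question."""
        q = self.embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


# Usage: split a sample text, index the chunks, then query the index.
sample = "Vector stores hold embeddings for each chunk. Invoices are due within thirty days."
index = InMemoryIndex(toy_embed)
index.add(split_text(sample, chunk_size=45, chunk_overlap=10))
top = index.query("vector embeddings", k=1)
print(top[0])
```

Passing the embedding function into the index keeps the storage and retrieval logic independent of the model, which loosely mirrors how a vector store is handed an embedding object when it is created.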

Repository: GitHub
Platform: Jupyter Notebook
Stack: LangChain, ChromaDB, HuggingFace, PyPDF, Python