Semantic Search with LangChain
2026
A semantic search engine for natural-language queries over PDF documents. Instead of matching keywords, it retrieves the passages whose vector embeddings are most similar to the embedding of the query. It is delivered as a walkthrough Jupyter notebook that covers the full pipeline.
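To show what "similar embeddings" means in practice, here is a minimal, dependency-free sketch of ranking passages by cosine similarity. The three-dimensional vectors are invented for illustration; in the notebook the vectors would come from all-MiniLM-L6-v2, which produces 384-dimensional embeddings. A vector store such as ChromaDB performs this nearest-neighbor ranking with an index, so the brute-force loop below is only illustrative and is not the project's code.

```python
import math

# Hypothetical toy embeddings (assumption): real ones would come from
# all-MiniLM-L6-v2 and have 384 dimensions; 3 dimensions keep this readable.
PASSAGES = {
    "The cat sat on the mat.": [0.9, 0.1, 0.0],
    "Felines enjoy resting on rugs.": [0.85, 0.15, 0.05],
    "Quarterly revenue grew by 12%.": [0.0, 0.2, 0.95],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec: list[float], k: int = 2) -> list[tuple[str, float]]:
    """Brute-force top-k: score every passage, then keep the k best."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in PASSAGES.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embedding of the query "Where do cats like to nap?"
query = [0.88, 0.12, 0.02]
for text, score in search(query):
    print(f"{score:.3f}  {text}")
```

"Felines enjoy resting on rugs." shares no words with the query, yet it ranks near the top because its vector points in nearly the same direction. This is the behavior that keyword matching cannot provide.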
Pipeline
- Document loading — parse PDFs with PyPDF
- Text splitting — chunk documents for embedding
- Embedding — all-MiniLM-L6-v2 via HuggingFace
- Vector storage — ChromaDB for similarity search
- Retrieval — query by natural language and get relevant passages
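The text-splitting step can be pictured with a simplified chunker that cuts text into fixed-size, overlapping character windows. This is an assumption-labeled stand-in, not the notebook's splitter: LangChain's `RecursiveCharacterTextSplitter` also takes `chunk_size` and `chunk_overlap`, but it prefers to break on separators such as paragraphs and sentences, which this sketch ignores. The default sizes of 200 and 50 are hypothetical, not the project's settings.

```python
def split_text(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Simplified stand-in (assumption) for a real text splitter: it cuts purely
    by length and ignores paragraph, sentence and word boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each window starts this far after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


page = "abcdefghijklmnopqrstuvwxyz"
print(split_text(page, chunk_size=10, chunk_overlap=3))
```

The overlap means that the last few characters of one chunk are repeated at the start of the next. As a result, a phrase that straddles a boundary still appears whole in at least one chunk, provided it is no longer than the overlap. The cost is some duplicated text in the vector store.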
Repository: GitHub
Platform: Jupyter Notebook
Stack: LangChain, ChromaDB, HuggingFace, PyPDF, Python