## Table of Contents
- Prerequisites
- Step 1: Install and Import Dependencies
- Step 2: Prepare Your Documents
- Step 3: Create a Chroma Vector Store
- Step 4: Execute Semantic Queries
- How It Works
- Further Reading
## Prerequisites
- Python 3.7+
- An OpenAI API key
- Install the required libraries (a typical command is shown after this list).
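The install command itself was lost in extraction; with the classic LangChain API used in this tutorial, a typical set of packages looks like this (exact package names vary across LangChain versions, so treat them as an assumption):

```bash
pip install langchain openai chromadb tiktoken
```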
Ensure your `OPENAI_API_KEY` environment variable is set, for example with `export OPENAI_API_KEY="sk-..."` in your shell.

## Step 1: Install and Import Dependencies
Begin by importing LangChain’s embedding model and the Chroma vector store:
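The original import snippet is not preserved; here is a minimal sketch using the classic LangChain import paths (newer releases move these classes into the `langchain_openai` and `langchain_chroma` packages):

```python
# Classic import paths; in newer LangChain releases these classes live in
# the langchain_openai and langchain_chroma packages instead.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
```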
## Step 2: Prepare Your Documents

Here, we create a small collection of sports headlines. In real applications, you might load text from files, PDFs, or a database.
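The original headline list is not shown; below is a stand-in collection (the `docs` name and the headlines themselves are assumptions, chosen to match the cricket and football queries in Step 4):

```python
# Hypothetical sample headlines covering two sports.
docs = [
    "India clinch the Cricket World Cup after a thrilling final",
    "Star batsman scores a century in the opening test match",
    "Premier League title race goes down to the final day",
    "Goalkeeper's heroics seal a dramatic derby win",
]
```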
## Step 3: Create a Chroma Vector Store

Chroma will automatically generate embeddings for each document and store them in a local vector database. The two key components are summarized below, with a sketch after the table:

| Component | Description | Example |
|---|---|---|
| Embeddings | Converts text into high-dimensional vectors | `OpenAIEmbeddings(model="text-embedding-ada-002")` |
| Vector Database | Stores and indexes embedding vectors for similarity operations | `Chroma.from_texts(texts=docs, embedding=embeddings)` |
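A minimal sketch assembled from the two calls in the table; the `embeddings` and `vectorstore` variable names are assumptions:

```python
# Embedding model from the table above; reads OPENAI_API_KEY from the environment.
embeddings = OpenAIEmbeddings(model="text-embedding-ada-002")

# Embed every document and index the vectors in a local Chroma store.
vectorstore = Chroma.from_texts(texts=docs, embedding=embeddings)
```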
## Step 4: Execute Semantic Queries
With your documents indexed, you can now query the vector store. Semantic search will return contextually related headlines, even without shared keywords.

### Cricket Query
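The original query snippet was stripped; here is a sketch where the query string and the `k` value are assumptions:

```python
# Ask about cricket; matching documents are returned by meaning, not keywords.
results = vectorstore.similarity_search("Who won the cricket match?", k=2)
for doc in results:
    print(doc.page_content)
```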
### Football Query
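Again, a sketch with an assumed query string; note that it shares no keywords with the football headlines it should retrieve:

```python
# Ask about soccer; semantically related football headlines are returned.
results = vectorstore.similarity_search("Who is winning in soccer?", k=2)
for doc in results:
    print(doc.page_content)
```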
Adjust `k` to change the number of returned results. For instance, `k=1` returns only the single most similar document.

## How It Works
- **Embedding Generation**: Both documents and queries are transformed into vector embeddings by the same model.
- **Vector Similarity**: Chroma computes distances between the query vector and stored document vectors, retrieving the top-`k` closest matches.
- **Semantic Matching**: Unlike keyword-based search, semantic search finds conceptually related content, even if the exact terms differ.
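To make the retrieval step concrete, here is a hand-rolled version of the ranking a vector store performs, assuming cosine similarity for illustration (Chroma's actual distance metric is configurable) and reusing `embeddings` and `docs` from the earlier steps:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Score in [-1, 1]; higher means the vectors point in more similar directions.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Embed the query and the documents with the same model.
query_vec = np.array(embeddings.embed_query("Who won the cricket match?"))
doc_vecs = [np.array(v) for v in embeddings.embed_documents(docs)]

# Rank documents by similarity and keep the top-k, as a vector store does.
k = 2
scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
for i in sorted(range(len(docs)), key=scores.__getitem__, reverse=True)[:k]:
    print(f"{scores[i]:.3f}  {docs[i]}")
```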