Prerequisites
- Python 3.7+
- openai and langchain Python packages
- An OpenAI API key
Make sure your OPENAI_API_KEY environment variable is set before running the examples.
1. Import and Initialize the Embedding Model
Begin by importing the OpenAIEmbeddings class from LangChain and creating an instance with your chosen model. Replace "text-embedding-3-large" with any supported model from the OpenAI Embeddings guide.
Different models produce vectors of varying dimensions and performance characteristics. Refer to the OpenAI documentation for details on each embedding model.
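A minimal sketch of this step follows. It rests on two assumptions not stated above: that the class is imported from the langchain_openai integration package (older LangChain releases exposed it as langchain.embeddings.OpenAIEmbeddings), and that a small helper, hypothetically named make_embedder here, is a reasonable place to check the API key before creating the model.

```python
import os


def make_embedder(model: str = "text-embedding-3-large"):
    """Create an OpenAIEmbeddings instance after checking that the API key is set.

    make_embedder is a hypothetical helper name used only for this sketch.
    """
    if not os.environ.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running.")

    # Assumption: the langchain-openai package is installed. The import is lazy
    # so the key check above runs even when the package is missing.
    from langchain_openai import OpenAIEmbeddings

    return OpenAIEmbeddings(model=model)
```

Calling make_embedder() with the key set should return an object that exposes the embed_documents method used in step 3.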
2. Prepare the Input Documents
In a real application, you’d split your source material (e.g., PDFs or web pages) into text chunks. For demonstration, we’ll use four sports headlines:

| Document Index | Headline | Category |
|---|---|---|
| 1 | Thrilling Finale Awaits: The Countdown to the Cricket World Cup Championship | Cricket |
| 2 | Global Giants Clash: Football World Cup Semi-Finals Set the Stage for Epic Showdowns | Football |
| 3 | Record Crowds and Unforgettable Moments: Highlights from the Cricket World Cup | Cricket |
| 4 | From Underdogs to Contenders: Football World Cup Surprises and Breakout Stars | Football |
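The table above can be written as a plain Python list, which is the input shape the embedding step expects. The variable names documents and categories are illustrative choices, not names from the original.

```python
# The four headlines from the table, in document-index order.
documents = [
    "Thrilling Finale Awaits: The Countdown to the Cricket World Cup Championship",
    "Global Giants Clash: Football World Cup Semi-Finals Set the Stage for Epic Showdowns",
    "Record Crowds and Unforgettable Moments: Highlights from the Cricket World Cup",
    "From Underdogs to Contenders: Football World Cup Surprises and Breakout Stars",
]

# Category labels are kept separately; they are metadata and are not embedded.
categories = ["Cricket", "Football", "Cricket", "Football"]

for index, (headline, category) in enumerate(zip(documents, categories), start=1):
    print(f"{index}. [{category}] {headline}")
```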
3. Generate Embeddings
Use the embed_documents method to convert each string into its corresponding vector. Depending on the number of documents, this may take a few seconds.
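The call itself is a single line. The sketch below wraps it in a hypothetical helper, embed_headlines, and exercises it with StubEmbedder, an invented offline stand-in that returns tiny fixed-size vectors so the example runs without an API key. Only the embed_documents method name comes from the text above.

```python
from typing import List


def embed_headlines(embedder, texts: List[str]) -> List[List[float]]:
    """Embed all texts in one call; embedder is any object with embed_documents()."""
    return embedder.embed_documents(texts)


class StubEmbedder:
    """Hypothetical offline stand-in: one 3-dimensional vector per text."""

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        # Not a real embedding: the first value is just the text length.
        return [[float(len(text)), 0.0, 1.0] for text in texts]


vectors = embed_headlines(StubEmbedder(), ["first headline", "second headline"])
print(len(vectors))
```

With a real model, the same helper would be given the OpenAIEmbeddings instance from step 1 and the four headlines from step 2.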
4. Inspect and Validate the Results
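As a preview of the three checks in the subsections that follow (count, sample values, dimensionality), here is a minimal sketch. The helper name summarize_embeddings and the toy vectors are assumptions made for illustration; real output from the model would replace toy_vectors.

```python
from typing import Dict, List


def summarize_embeddings(texts: List[str], vectors: List[List[float]]) -> Dict[str, object]:
    """Run the three checks from this section and return what was found."""
    # 4.1: there should be one embedding per input document.
    if len(vectors) != len(texts):
        raise ValueError(f"expected {len(texts)} embeddings, got {len(vectors)}")

    # 4.3: every vector should share one dimensionality.
    dimensions = {len(vector) for vector in vectors}
    if len(dimensions) != 1:
        raise ValueError(f"inconsistent dimensions: {sorted(dimensions)}")

    return {
        "count": len(vectors),
        "dimensions": dimensions.pop(),
        # 4.2: preview only the first few floats of the first embedding.
        "preview": vectors[0][:3],
    }


# Toy 4-dimensional vectors stand in for real model output here.
toy_texts = ["headline a", "headline b"]
toy_vectors = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
summary = summarize_embeddings(toy_texts, toy_vectors)
print(summary)
```

Raising on a mismatch, rather than printing a warning, is a deliberate choice in this sketch: it stops malformed vectors before they reach a vector database.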
4.1 Confirm Count
Ensure that the number of embeddings matches the number of input documents.
4.2 View a Sample Embedding
Each embedding is a list of floats; printing the first document’s embedding shows the raw values.
4.3 Check Embedding Dimensions
Determine the dimensionality of the vectors your model produces.
Vector size varies by model. For example, text-embedding-3-large outputs 3072-dimensional embeddings. Always verify dimensions before storing in a vector database.
5. Next Steps
With embeddings generated, the typical workflow involves:
- Storing vectors in a vector database (e.g., Pinecone, Weaviate, or Chroma).
- Performing similarity searches to retrieve semantically related documents.
- Building applications like semantic search engines or Q&A chatbots.
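To make the similarity-search step concrete, the sketch below ranks stored vectors against a query vector using brute-force cosine similarity in plain Python. This is an illustrative stand-in, not how any of the vector databases named above work internally, and cosine similarity is only one common choice of metric; the function names and toy vectors are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vector: Sequence[float], document_vectors: List[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, most similar first."""
    scores = [
        (index, cosine_similarity(query_vector, vector))
        for index, vector in enumerate(document_vectors)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional vectors; real embeddings would have far more dimensions.
toy_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
ranking = rank_by_similarity([1.0, 0.0], toy_vectors)
print([index for index, _ in ranking])
```

A linear scan like this is fine for a handful of documents; a vector database becomes worthwhile once the collection is too large to compare against exhaustively.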