This tutorial demonstrates how to integrate Pinecone with LangChain for multimodal tasks, using OpenCLIP to generate image and text embeddings.
We cover setting up a Pinecone index, processing a multimodal dataset, and uploading vectors efficiently in parallel. We also explore how to perform text-based and image-based searches against the Pinecone index.
By the end of this guide, you'll be able to build a scalable and efficient multimodal vector search system.
[Note] If you are using a .env file, proceed as follows.
from dotenv import load_dotenv
load_dotenv(override=True)
True
Using Multimodal Embeddings
We will use the datasets library to load a sample dataset and process it for embedding generation. First, we initialize the PineconeDocumentManager helper, which wraps the Pinecone client and the OpenCLIP utilities used throughout this tutorial.
from utils.pinecone import PineconeDocumentManager
import os
multimodal_pc = PineconeDocumentManager(
api_key=os.getenv("PINECONE_API_KEY"),
)
Step 1: Load and Save Dataset Images Temporarily
The dataset we use here includes images and associated metadata (e.g., prompts and categories). The images are saved temporarily for embedding generation.
from datasets import load_dataset
# Load dataset
dataset = load_dataset("Pupba/animal-180", split="train")
# Process first 50 images
images = dataset[:50]["png"]
image_paths = [multimodal_pc.save_temp_image(img) for img in images]
metas = dataset[:50]["json"]
prompts = [data["prompt"] for data in metas]
categories = [data["category"] for data in metas]
print("Image Path:", image_paths[10])
print("Prompt:", prompts[10])
print("Category:", categories[10])
images[10]
Image Path: C:\Users\Public\Documents\ESTsoft\CreatorTemp\tmppxen5rk3.png
Prompt: a rabbit lying on a soft blanket, warm indoor lighting, cozy atmosphere, highly detailed, 8k resolution.
Category: rabbit
Step 2: Load OpenCLIP for Embedding Generation
OpenCLIP will be used to generate embeddings for both images and text.
# Load OpenCLIP model
model = "ViT-H-14-378-quickgelu"
checkpoint = "dfn5b"
image_embedding = multimodal_pc._initialize_openclip(
model_name=model,
checkpoint=checkpoint,
)
[INFO] OpenCLIP model initialized.
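The helper's _initialize_openclip call wraps the open_clip library. As an illustration of how embeddings are generated for both modalities, here is a minimal sketch that loads the same model and checkpoint directly with open_clip and embeds one image and one text prompt; the variable names (clip_model, preprocess, tokenizer) are ours for illustration, not part of the tutorial helper.
import torch
import open_clip
from PIL import Image
# Load the same architecture and checkpoint used above, directly via open_clip (illustrative)
clip_model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-H-14-378-quickgelu", pretrained="dfn5b"
)
tokenizer = open_clip.get_tokenizer("ViT-H-14-378-quickgelu")
clip_model.eval()
with torch.no_grad():
    # Embed one of the images saved in Step 1
    image_tensor = preprocess(Image.open(image_paths[10])).unsqueeze(0)
    image_vec = clip_model.encode_image(image_tensor)
    image_vec = image_vec / image_vec.norm(dim=-1, keepdim=True)  # unit-normalize
    # Embed a text prompt the same way
    text_tokens = tokenizer(["a rabbit lying on a soft blanket"])
    text_vec = clip_model.encode_text(text_tokens)
    text_vec = text_vec / text_vec.norm(dim=-1, keepdim=True)
print(image_vec.shape, text_vec.shape)  # both vectors are 1024-dimensional
Unit-normalizing the embeddings makes the dotproduct metric used for the index in the next step behave like cosine similarity.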
Step 3: Create Pinecone Index for Multimodal Data
We create a Pinecone index to store image embeddings. This index will later be used for searching.
from pinecone import ServerlessSpec, PodSpec
# Create or reuse the index
index_name = "langchain-opentutorial-multimodal-1024"
# Set to True when using the serverless method, and False when using the PodSpec method.
use_serverless = True
if use_serverless:
spec = ServerlessSpec(cloud="aws", region="us-east-1")
else:
spec = PodSpec(environment="us-west1-gcp", pod_type="p1.x1", pods=1)
multimodal_pc.create_index(
index_name=index_name,
dimension=1024,
metric="dotproduct",
spec=spec
)
index = multimodal_pc.get_index(index_name)
Using existing index: langchain-opentutorial-multimodal-1024
Step 4: Upload Data to Pinecone
We will vectorize the dataset images using OpenCLIP and upload them to the Pinecone index.
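In the tutorial this step is handled by the PineconeDocumentManager helper, which is what enables the parallel uploads mentioned in the introduction. As a rough sketch of the underlying idea, reusing the clip_model and preprocess objects from the illustrative open_clip snippet above together with the index handle from Step 3, the upload could look like this (a sketch, not the helper's actual implementation):
import torch
from PIL import Image
vectors = []
with torch.no_grad():
    for i, path in enumerate(image_paths):
        # Embed each saved image and unit-normalize it for the dot-product index
        img = preprocess(Image.open(path)).unsqueeze(0)
        emb = clip_model.encode_image(img)
        emb = emb / emb.norm(dim=-1, keepdim=True)
        vectors.append(
            {
                "id": f"image-{i}",
                "values": emb.squeeze(0).tolist(),
                "metadata": {"prompt": prompts[i], "category": categories[i]},
            }
        )
# Upsert in small batches to stay within Pinecone's request-size limits
batch_size = 16
for start in range(0, len(vectors), batch_size):
    index.upsert(vectors=vectors[start : start + batch_size])
Once the vectors are in the index, it can be queried with either a text prompt or an image. The output below shows the top matches for a text-based search.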
Text Query: a running elephant
Category: elephant, Prompt: a majestic elephant walking through the savanna, golden sunlight illuminating its wrinkled skin, highly detailed, 8k resolution., Score: 0.36785552
Category: elephant, Prompt: a baby elephant exploring its surroundings, soft sunlight, highly detailed, photorealistic, adorable and realistic., Score: 0.365934
Category: elephant, Prompt: an elephant walking through a dusty savanna, soft natural lighting, highly detailed, photorealistic, natural textures., Score: 0.36491212
Category: elephant, Prompt: an elephant walking through tall grass, golden sunlight reflecting off its skin, highly detailed, natural lighting, ultra-realistic., Score: 0.35923028
Category: elephant, Prompt: an elephant spraying water with its trunk, playful expression, soft natural lighting, highly detailed, 8k resolution., Score: 0.34974286
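The results above come from embedding the query text with OpenCLIP and sending that vector to the index. A minimal sketch of such a text-based search, again reusing clip_model and tokenizer from the illustrative open_clip snippet (a sketch, not the tutorial helper's actual search method), could look like this:
import torch
query = "a running elephant"
with torch.no_grad():
    # Embed the query text and unit-normalize it, matching the stored image vectors
    tokens = tokenizer([query])
    query_vec = clip_model.encode_text(tokens)
    query_vec = query_vec / query_vec.norm(dim=-1, keepdim=True)
# Retrieve the five most similar image vectors along with their metadata
results = index.query(
    vector=query_vec.squeeze(0).tolist(),
    top_k=5,
    include_metadata=True,
)
print("Text Query:", query)
for match in results.matches:
    meta = match.metadata
    print(f"Category: {meta['category']}, Prompt: {meta['prompt']}, Score: {match.score}")
An image-based search works the same way, except the query vector comes from encode_image applied to a query image rather than encode_text.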