Multimodal Embeddings With Langchain


Overview

This tutorial covers how to perform text embedding and image embedding using a Multimodal Embedding Model with LangChain.

A Multimodal Embedding Model is a model that can vectorize images as well as text, placing both in the same vector space.

In this tutorial, we will build a simple image similarity search example using a Multimodal Embedding Model and LangChain.


Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup along with useful functions and utilities for these tutorials.

  • You can check out langchain-opentutorial for more details.

Multimodal Embedding

Multimodal embedding is the process of creating a vector that represents an image’s features and context, making it compatible with text search in the same vector space.

Concept of Multimodal Embedding

Image Similarity Search is a technique that allows you to find images in a database that are similar to a given query (either an image or text describing the image) using vector-based representations.

The process involves converting images or text into embedding vectors that capture their visual or semantic features.

These vectors are then compared using similarity metrics, such as Cosine Similarity or Euclidean Distance, to find the most similar images in the database based on their vector representations.
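To make the two metrics concrete, here is a minimal sketch using NumPy; the vectors below are made-up toy values, not real CLIP embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Straight-line distance between two embedding vectors (0.0 = identical)."""
    return float(np.linalg.norm(a - b))

# Toy 3-dimensional "embeddings" (real CLIP embeddings have hundreds of dimensions).
query = np.array([1.0, 0.0, 0.0])
close = np.array([0.9, 0.1, 0.0])  # similar direction to the query
far   = np.array([0.0, 1.0, 0.0])  # orthogonal to the query

assert cosine_similarity(query, close) > cosine_similarity(query, far)
assert euclidean_distance(query, close) < euclidean_distance(query, far)
```

Both metrics agree here on which vector is the better match; cosine similarity ignores vector magnitude, which is why it is the usual choice for comparing embeddings.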

Setting Image Data

In this tutorial, example images are provided. These images are copyright-free, were created using SDXL, and cover a variety of subjects (e.g., dog, cat, woman, man, ...).

The images are located at ./data/for_embed_images.zip.

Create a list containing the image paths.
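One way to build that list, assuming the zip has been extracted to ./data/for_embed_images (the folder name is an assumption; adjust it to wherever you unzip the archive):

```python
from pathlib import Path

def list_image_paths(directory: str) -> list[str]:
    """Collect jpg/jpeg/png file paths under `directory`, sorted for reproducibility."""
    image_dir = Path(directory)
    if not image_dir.is_dir():
        return []
    extensions = {".jpg", ".jpeg", ".png"}
    return sorted(str(p) for p in image_dir.iterdir() if p.suffix.lower() in extensions)

# Assumes ./data/for_embed_images.zip has been extracted to this folder.
image_paths = list_image_paths("./data/for_embed_images")
print(f"Found {len(image_paths)} images")
```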

Model Load and Embedding Images

In this tutorial, we use OpenCLIP, an open-source implementation of OpenAI's CLIP.

OpenCLIP can be used with LangChain to easily embed both text and images.

You can load the OpenCLIP embedding model using the Python libraries open_clip_torch and langchain-experimental.
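A minimal sketch of loading the model. The model/checkpoint pair below (ViT-B-32 with laion2b_s34b_b79k) is one lightweight choice from open_clip's pretrained list, not necessarily the only option; loading triggers a weight download, so the instantiation is wrapped in a helper rather than run at import time:

```python
# A lightweight OpenCLIP variant; larger variants embed more accurately
# but require multi-gigabyte weight downloads.
MODEL_NAME = "ViT-B-32"
CHECKPOINT = "laion2b_s34b_b79k"

def load_openclip_embeddings():
    """Load the OpenCLIP embedding model via LangChain (downloads weights on first call)."""
    # Requires: pip install open_clip_torch langchain-experimental
    from langchain_experimental.open_clip import OpenCLIPEmbeddings
    return OpenCLIPEmbeddings(model_name=MODEL_NAME, checkpoint=CHECKPOINT)

# Usage (commented out here to avoid the download):
# clip_embd = load_openclip_embeddings()
# text_vectors = clip_embd.embed_documents(["a photo of a dog"])
# image_vectors = clip_embd.embed_image(image_paths)  # list of local image file paths
```

Because both `embed_documents` and `embed_image` come from the same model, the resulting vectors live in one shared space and can be compared directly.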

Image Similarity Search with Text

Image Similarity Search with Text finds the images in the dataset that are most relevant to a given text query.

We will measure similarity with cosine similarity, since it is the metric most commonly used in image similarity search.

Steps

  1. Text Query Embedding

  2. Calculate the similarity between the Text Query Embedding Vector and the Image Embedding Vector

  3. Get similar images
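The three steps above can be sketched as follows. The embedding values here are synthetic stand-ins; in the actual tutorial the query vector would come from the model's text embedding and the matrix rows from the image embeddings:

```python
import numpy as np

def top_k_similar(query_vec: np.ndarray, image_matrix: np.ndarray, k: int = 3):
    """Rank image embeddings (rows of `image_matrix`) by cosine similarity to `query_vec`."""
    q = query_vec / np.linalg.norm(query_vec)
    m = image_matrix / np.linalg.norm(image_matrix, axis=1, keepdims=True)
    scores = m @ q                        # cosine similarity per image
    order = np.argsort(scores)[::-1][:k]  # indices of the k best matches, best first
    return order, scores[order]

# Synthetic stand-ins: four "image" embeddings and one "text query" embedding.
image_embeddings = np.array([
    [0.9, 0.1, 0.0],   # image 0: close to the query direction
    [0.0, 1.0, 0.0],   # image 1
    [0.7, 0.3, 0.1],   # image 2: also fairly close
    [0.0, 0.0, 1.0],   # image 3
])
text_query_embedding = np.array([1.0, 0.0, 0.0])

indices, scores = top_k_similar(text_query_embedding, image_embeddings, k=2)
print(indices)  # the two best-matching image indices, most similar first
```

With real embeddings, the returned indices would be looked up in the image path list to display the matching images.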


Image Similarity Search with Image

Image Similarity Search with Image finds the images in the dataset that are most relevant to a given image query.

We will again measure similarity with cosine similarity, since it is the metric most commonly used in image similarity search.

Steps

  1. Image Query Embedding

  2. Calculate the similarity between the Image Query Embedding Vector and the Image Embedding Vector

  3. Get similar images
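The flow is the same as the text case, except the query vector comes from embedding an image. One practical detail: if the query image is itself part of the dataset, its own embedding comes back as a perfect match (similarity 1.0), so it is typically excluded from the results. A sketch with synthetic stand-in vectors:

```python
import numpy as np

def similar_images_to(query_index: int, image_matrix: np.ndarray, k: int = 2):
    """Find the k images most similar to image `query_index`, excluding the image itself."""
    m = image_matrix / np.linalg.norm(image_matrix, axis=1, keepdims=True)
    scores = m @ m[query_index]      # cosine similarity to the query image
    scores[query_index] = -np.inf    # drop the trivial self-match (similarity 1.0)
    return np.argsort(scores)[::-1][:k]

# Synthetic stand-ins for image embeddings.
image_embeddings = np.array([
    [1.0, 0.0, 0.0],   # image 0: the query image
    [0.9, 0.1, 0.0],   # image 1: a near-duplicate of image 0
    [0.0, 1.0, 0.0],   # image 2
])
print(similar_images_to(0, image_embeddings, k=1))  # nearest neighbor of image 0
```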

