CacheBackedEmbeddings


Overview

Embeddings can be stored or temporarily cached to avoid recalculation.

CacheBackedEmbeddings handles this caching. It wraps an embedder and stores the embeddings it produces in a key-value store: each text is hashed, and the hash is used as the key in the cache.
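The hash-as-key idea can be illustrated with a minimal standard-library sketch. This is not LangChain's implementation: `fake_embed`, `HashKeyedCache`, and the choice of SHA-256 are assumptions made for illustration only.

```python
import hashlib
from typing import Callable, Dict, List


def fake_embed(text: str) -> List[float]:
    """Hypothetical stand-in for a real embedding model."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


class HashKeyedCache:
    """Wraps an embedding function and caches results under a hash of the text."""

    def __init__(self, embed_fn: Callable[[str], List[float]]) -> None:
        self.embed_fn = embed_fn
        self.store: Dict[str, List[float]] = {}  # plays the role of the key-value store
        self.misses = 0  # number of times the underlying embedder was called

    def embed(self, text: str) -> List[float]:
        # The hash of the text is the cache key (SHA-256 is an assumption here).
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.store:
            self.misses += 1
            self.store[key] = self.embed_fn(text)
        return self.store[key]


cached = HashKeyedCache(fake_embed)
first = cached.embed("hello world")
second = cached.embed("hello world")  # served from the cache
```

Embedding the same text twice calls the underlying function only once; the second call is a plain dictionary lookup.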


Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup tools, helper functions, and utilities for tutorials.

  • You can check out langchain-opentutorial for more details.

Use a configuration file to manage API keys as environment variables.

Check whether the ./cache/ directory used for persistent storage exists, and create it if it does not.
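One way to do this with the standard library is sketched below. The variable names `CACHE_DIR` and `cache_ready` are illustrative; only the `./cache/` path comes from the text above.

```python
import os

CACHE_DIR = "./cache/"  # path taken from the text above

# exist_ok=True makes the call safe to repeat: it creates the directory
# when missing and does nothing when it already exists.
os.makedirs(CACHE_DIR, exist_ok=True)

cache_ready = os.path.isdir(CACHE_DIR)
```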

Using Embeddings with LocalFileStore (Persistent Storage)

The primary supported method for initializing CacheBackedEmbeddings is from_bytes_store.

It accepts the following parameters:

  • underlying_embeddings: The embedder used to generate embeddings.

  • document_embedding_cache: One of the ByteStore implementations for caching document embeddings.

  • namespace: (Optional, default is "") The namespace for the document cache, used to avoid collisions with other caches. For example, set it to the name of the embedding model being used.

Note: It is important to set the namespace parameter to avoid conflicts when the same text is embedded using different embedding models.
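To see why the namespace matters, the sketch below builds cache keys by prefixing a text hash with a namespace. The `cache_key` helper, the model names `model-a` and `model-b`, and the exact key layout are assumptions for illustration, not the library's actual key format.

```python
import hashlib


def cache_key(namespace: str, text: str) -> str:
    """Hypothetical key scheme: namespace followed by a hash of the text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return namespace + digest


# Same text, different namespaces: the keys differ, so the two models
# never read each other's vectors.
key_a = cache_key("model-a", "hello")
key_b = cache_key("model-b", "hello")

# Same text, empty namespace for both models: the keys are identical,
# so the second model would silently reuse the first model's vector.
shared_1 = cache_key("", "hello")
shared_2 = cache_key("", "hello")
```

With the default empty namespace, two different models sharing one store would collide on every text they both embed, which is the conflict the note above warns about.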

First, let's look at an example of storing embeddings using the local file system and retrieving them with the FAISS vector store.

The cache is empty prior to embedding.

Load the document, split it into chunks, embed each chunk and load it into the vector store.

Create FAISS database from documents.

If we create the vector store again, it is much faster because the embeddings are read from the cache and none need to be recomputed.
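The effect of a persistent cache on a second pass can be sketched without any external library. The example below is an illustration only: `CountingEmbedder` and `embed_with_file_cache` are hypothetical names, a temporary directory stands in for `./cache/`, and vectors are stored as JSON files, which is not necessarily how LocalFileStore serializes them.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import List


class CountingEmbedder:
    """Hypothetical embedder that records how many texts it actually embeds."""

    def __init__(self) -> None:
        self.calls = 0

    def embed(self, text: str) -> List[float]:
        self.calls += 1
        return [float(len(text))]


def embed_with_file_cache(
    embedder: CountingEmbedder, cache_dir: Path, texts: List[str]
) -> List[List[float]]:
    """Embed texts, reading from and writing to one file per text hash."""
    vectors = []
    for text in texts:
        path = cache_dir / hashlib.sha256(text.encode("utf-8")).hexdigest()
        if path.exists():
            # Cache hit: load the stored vector, no embedder call.
            vectors.append(json.loads(path.read_text()))
        else:
            # Cache miss: compute the vector and persist it.
            vector = embedder.embed(text)
            path.write_text(json.dumps(vector))
            vectors.append(vector)
    return vectors


chunks = ["first chunk", "second chunk", "third chunk"]
with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp)
    embedder = CountingEmbedder()
    first_pass = embed_with_file_cache(embedder, cache_dir, chunks)
    calls_after_first = embedder.calls
    second_pass = embed_with_file_cache(embedder, cache_dir, chunks)
    calls_after_second = embedder.calls
```

The call counter does not move during the second pass, which is where the speedup comes from: every chunk is served from disk.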

Here are some of the embeddings that were created.

Using InMemoryByteStore (Non-Persistent)

To use a different ByteStore, simply specify the desired ByteStore when creating the CacheBackedEmbeddings.

Below is an example of creating the same cached embedding object using the non-persistent InMemoryByteStore.
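What "non-persistent" means in practice can be shown with a small in-memory store. `DictByteStore` below is a hypothetical stand-in written for this sketch, not the InMemoryByteStore class itself; its `mset` and `mget` methods are modeled loosely on a batch key-value interface.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class DictByteStore:
    """Hypothetical in-memory byte store; its contents vanish with the object."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def mset(self, pairs: Sequence[Tuple[str, bytes]]) -> None:
        # Store several key/value pairs at once.
        for key, value in pairs:
            self._data[key] = value

    def mget(self, keys: Sequence[str]) -> List[Optional[bytes]]:
        # Return None for keys that were never stored.
        return [self._data.get(key) for key in keys]


store = DictByteStore()
store.mset([("k1", b"\x00\x01")])
hit = store.mget(["k1"])[0]

# A new store object, as after a process restart, starts out empty.
fresh_store = DictByteStore()
miss = fresh_store.mget(["k1"])[0]
```

An in-memory store suits short-lived sessions and tests; a file-backed store is the better fit when embeddings should survive a restart.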
