LlamaCpp Embeddings With LangChain


Overview

This tutorial covers how to perform text embedding using Llama-cpp and LangChain.

Llama-cpp is an open-source package, implemented in C++, that lets you run LLMs such as LLaMA efficiently on local hardware.

In this tutorial, we will build a simple example that measures the similarity between documents and an input query using Llama-cpp and LangChain.

Table of Contents

  • Overview

  • Environment Setup

  • Llama-cpp Installation and Model Serving

  • Model Load and Embedding


Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup, along with useful functions and utilities for tutorials.

  • You can check out langchain-opentutorial for more details; a minimal install sketch follows this note.
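
A minimal installation sketch; the extra packages listed here (langchain-community and scikit-learn) are our assumption of what this tutorial needs, not a list from the original notebook:

```python
# Install the tutorial helper package plus the libraries used below.
%pip install -qU langchain-opentutorial langchain-community scikit-learn
```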

Alternatively, you can set LANGCHAIN_API_KEY in a .env file and load it, as sketched below.

[Note] This is not necessary if you've already set LANGCHAIN_API_KEY in previous steps.
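
A minimal sketch of loading the key from a .env file (assumes python-dotenv is installed and a .env file exists in your working directory):

```python
# Load environment variables such as LANGCHAIN_API_KEY from a .env file.
from dotenv import load_dotenv

load_dotenv(override=True)
```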

Llama-cpp Installation and Model Serving

Llama-cpp is an open-source project that makes it easy to run large language models (LLMs) locally, letting you download and experiment with a variety of models on your own computer.

To install llama-cpp-python:

  1. Make sure you have a working C++ build environment (e.g., a compiler and CMake on Linux or macOS), since llama-cpp-python may be compiled from source during installation.

  2. Download or specify your chosen embedding model file (e.g., CompendiumLabs/bge-large-en-v1.5-gguf).

  3. Here, we use bge-large-en-v1.5-q8_0.gguf as an example; you can download it from the CompendiumLabs/bge-large-en-v1.5-gguf repository on Hugging Face.

  4. Check that llama-cpp-python can find the model path (a minimal install sketch follows this list).
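
A minimal install sketch; the exact build flags (e.g., for GPU acceleration) depend on your platform, so consult the llama-cpp-python documentation if the default CPU build is not what you want:

```python
# Install llama-cpp-python; this may compile llama.cpp from source,
# which is why a C++ build environment is required.
%pip install -qU llama-cpp-python
```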

Below, we will demonstrate how to serve an embedding model using Llama-cpp. You can follow the official llama-cpp-python documentation for more details.

Identify Supported Embedding Models and Serve a Model

You can find a variety of embedding models, which typically come in different quantizations (e.g., q4_0, q4_1, q5_0, q8_0).

1. Search models

  • You can look for models on Hugging Face or other community websites.

2. Download or Pull a Model

  • For instance, you can download the model from Hugging Face if it is hosted there (see the sketch after this list).

3. Verify the Model

  • Check that the .gguf (or legacy .bin) file is accessible to your environment.
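
Putting these three steps together, here is a sketch that downloads the example model with huggingface_hub and verifies that the file is accessible (the repository and filename come from the installation section above):

```python
import os

from huggingface_hub import hf_hub_download

# Download the GGUF embedding model (or reuse a cached copy);
# hf_hub_download returns the local file path.
model_path = hf_hub_download(
    repo_id="CompendiumLabs/bge-large-en-v1.5-gguf",
    filename="bge-large-en-v1.5-q8_0.gguf",
)

# Verify the model file is accessible before handing it to llama-cpp-python.
print(model_path, os.path.exists(model_path))
```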

Model Load and Embedding

Now that you have installed llama-cpp-python and have downloaded a model, let's see how to load it and use it for text embedding.

Below, we define a query and some documents to embed using Llama-cpp within LangChain.
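
For example (the query and documents below are our own sample data, not from the original notebook):

```python
# A query and a small set of documents to compare it against.
query = "What is LangChain?"
documents = [
    "LangChain is a framework for developing applications powered by language models.",
    "Llama-cpp lets you run LLaMA-family models efficiently on local hardware.",
    "Seoul is the capital of South Korea.",
]
```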

Load the Embedding Model

Below is how you can initialize the LlamaCppEmbeddings class by specifying the path to your GGUF model file (model_path).

For example, you might have a downloaded model path: ./bge-large-en-v1.5-q8_0.gguf.

We demonstrate how to instantiate the embeddings class and then embed queries and documents using Llama-cpp.
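
A minimal sketch, assuming the GGUF file sits in the current directory at the path shown above:

```python
from langchain_community.embeddings import LlamaCppEmbeddings

# Initialize the embeddings class with the local GGUF model path.
embeddings = LlamaCppEmbeddings(model_path="./bge-large-en-v1.5-q8_0.gguf")
```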

Embedding Queries and Documents

Now let's embed both the query and the documents. We will verify the dimension of the output vectors.

However, there is currently a known issue that cannot be resolved when using the latest models with LlamaCppEmbeddings. The link to the issue is posted below; please check it, and if it has been resolved in the latest version, you can follow the original official LangChain tutorial as written. Until then, a workaround sketch follows the link.

  • Issue link : https://github.com/langchain-ai/langchain/issues/22532
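
One workaround is to bypass LlamaCppEmbeddings and wrap llama-cpp-python's Llama class in a custom Embeddings implementation. The sketch below is our own illustration of that idea, not the official fix; the class name and defaults are assumptions:

```python
from llama_cpp import Llama
from langchain_core.embeddings import Embeddings


class CustomLlamaCppEmbeddings(Embeddings):
    """Embeddings that call llama-cpp-python's Llama directly."""

    def __init__(self, model_path: str, **kwargs):
        # embedding=True loads the model in embedding mode.
        kwargs.setdefault("embedding", True)
        self.model = Llama(model_path=model_path, **kwargs)

    def embed_query(self, text: str) -> list[float]:
        return self.model.embed(text)

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self.model.embed(text) for text in texts]


# Replace the earlier LlamaCppEmbeddings instance with the workaround.
embeddings = CustomLlamaCppEmbeddings(model_path="./bge-large-en-v1.5-q8_0.gguf")
```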

Check Custom Embeddings

  • To check whether the embedding results come out as expected, we print the dimensions of each embedding vector, as in the sketch below.
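
For example, reusing the query and documents defined earlier:

```python
embedded_query = embeddings.embed_query(query)
embedded_documents = embeddings.embed_documents(documents)

# Each vector should have the model's embedding dimension (1024 for bge-large).
print(len(embedded_query))
print([len(vector) for vector in embedded_documents])
```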

The Similarity Calculation Results

We can use the vector representations of the query and documents to calculate similarity. Here, we use the cosine similarity provided by scikit-learn.
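
A minimal sketch using the vectors from the previous step:

```python
from sklearn.metrics.pairwise import cosine_similarity

# Compare the query vector against every document vector.
similarities = cosine_similarity([embedded_query], embedded_documents)[0]

# Rank the documents by similarity to the query, highest first.
for score, doc in sorted(zip(similarities, documents), reverse=True):
    print(f"{score:.4f} | {doc}")
```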


This concludes the LlamaCpp Embeddings With LangChain tutorial.
