Exploring RAG in LangChain


Overview

This tutorial walks through the entire process of indexing, retrieval, and generation using LangChain's RAG framework. It provides a broad overview of a typical RAG application pipeline and demonstrates how to retrieve and generate responses effectively using LangChain's key features, such as document loaders, text splitters, embedding models, vector stores, and retrievers, structured in a modular design.

1. Question Processing

The question processing stage involves receiving a user's question, handling it, and finding relevant data. The following components are required for this process:

  • Data Source Connection : To find answers to the question, you need to connect to various text data sources. LangChain makes it easy to establish these connections.

  • Data Indexing and Retrieval : To efficiently find relevant information, the data must be indexed. LangChain automates the indexing process and provides tools to retrieve data related to the user's question.

2. Answer Generation

Once the relevant data is found, the next step is to generate an answer based on it. The following components are essential for this stage:

  • Answer Generation Model : LangChain uses advanced natural language processing (NLP) models to generate answers from the retrieved data. These models take the user's question and the retrieved data as input and generate an appropriate answer.

Architecture

This tutorial builds a typical RAG application as outlined in the Q&A Introduction. It consists of two main components:

  • Indexing : A pipeline that collects data from the source and indexes it. This process typically occurs offline.

  • Retrieval and Generation : The actual RAG chain processes user queries in real-time, retrieves relevant data from the index, and passes it to the model.

The entire workflow from raw data to generating an answer is as follows:

Indexing

  • Indexing pipeline diagram. Image source: https://python.langchain.com/docs/tutorials/rag/

  1. Load : The first step is to load the data. For this, we will use Document Loaders.

  2. Split : Text splitters divide large Documents into smaller chunks. This is useful for indexing data and passing it to the model, as large chunks can be difficult to retrieve and may not fit within the model's limited context window.

  3. Store : The split data needs to be stored and indexed in a location for future retrieval. This is often accomplished using VectorStore and Embeddings Models.

Retrieval and Generation

  • Retrieval and generation diagram. Image source: https://python.langchain.com/docs/tutorials/rag/

  1. Retrieval : When user input is provided, Retriever is used to retrieve relevant chunks from the data store.

  2. Generation : A ChatModel / LLM generates an answer using a prompt that includes the question and the retrieved data.

Document Used for Practice

A European Approach to Artificial Intelligence - A Policy Perspective

  • Author: Digital Enlightenment Forum under the guidance of EIT Digital, supported by contributions from EIT Manufacturing, EIT Urban Mobility, EIT Health, and EIT Climate-KIC

  • Link : https://eit.europa.eu/news-events/news/european-approach-artificial-intelligence-policy-perspective

  • File Name: A European Approach to Artificial Intelligence - A Policy Perspective.pdf

Please copy the downloaded file into the data folder for practice.


Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for tutorials.

  • You can check out the langchain-opentutorial for more details.

Once the environment variables are set, you can proceed. Alternatively, you can set API keys such as OPENAI_API_KEY in a .env file and load them, as sketched below.

[Note] This is not necessary if you've already set the required API keys in previous steps.
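For example, here is a minimal sketch of loading keys from a .env file, assuming the python-dotenv package is installed:

```python
# Load variables such as OPENAI_API_KEY from a local .env file into the environment.
from dotenv import load_dotenv

load_dotenv()
```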

Explore Each Module

The following are the modules used in this content.

Below is an example of using a basic RAG model for handling web pages (WebBaseLoader) .

In each step, you can configure various options or apply new techniques.

If a warning about the USER_AGENT not being set is displayed when using the WebBaseLoader, add USER_AGENT=myagent to the .env file.
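Below is a minimal end-to-end sketch of that pipeline. It assumes langchain-openai, langchain-community, langchain-text-splitters, faiss-cpu, and beautifulsoup4 are installed; the target URL is the blog post used in the official LangChain RAG tutorial, and the prompt wording is illustrative.

```python
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Step 1: Load -- parse only the post title and body from the page.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(class_=("post-content", "post-title"))
    ),
)
docs = loader.load()

# Step 2: Split into chunks small enough for retrieval and the context window.
splits = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)

# Steps 3-5: Embed, store, and expose the store as a retriever.
vectorstore = FAISS.from_documents(splits, OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

# Steps 6-7: Prompt and LLM, composed into a chain.
prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the following context.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def format_docs(docs):
    # Concatenate retrieved chunks into a single context string.
    return "\n\n".join(d.page_content for d in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is task decomposition?"))
```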

Step 1: Load Document

Web Page

WebBaseLoader uses bs4.SoupStrainer to parse only the necessary parts from a specified web page.

[Note]

  • bs4.SoupStrainer makes it convenient to extract only the desired elements from a web page.

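For example, here is a short sketch of targeted loading (the class names depend on the page's HTML structure):

```python
import bs4
from langchain_community.document_loaders import WebBaseLoader

# Keep only elements with the "post-content" class while parsing.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(parse_only=bs4.SoupStrainer(class_="post-content")),
)
docs = loader.load()
print(docs[0].page_content[:300])
```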

Here is another example, a BBC news article. Try running it!
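A sketch under the assumption of a hypothetical article URL (substitute a real one); news sites often wrap the story in an article tag:

```python
import bs4
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    web_paths=("https://www.bbc.com/news/articles/example",),  # hypothetical URL
    bs_kwargs=dict(parse_only=bs4.SoupStrainer("article")),  # keep only the article body
)
docs = loader.load()
print(docs[0].page_content[:300])
```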

PDF

The following section covers the document loader for importing PDF files.
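A minimal sketch using PyPDFLoader (requires the pypdf package), pointed at the practice document copied into the data folder:

```python
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(
    "data/A European Approach to Artificial Intelligence - A Policy Perspective.pdf"
)
docs = loader.load()
# One Document per page; the page number is recorded in each document's metadata.
print(len(docs), docs[0].metadata)
```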

CSV

The following section covers the document loader for importing CSV files.

A CSV loader references data by row number rather than page number, since each row becomes a separate document.
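A minimal sketch using CSVLoader; data/example.csv is a hypothetical file:

```python
from langchain_community.document_loaders import CSVLoader

loader = CSVLoader(file_path="data/example.csv")
docs = loader.load()
# Each row becomes one Document; metadata records the source file and row number.
print(docs[0].metadata)
```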

TXT

The following section covers the document loader for importing TXT files.
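A minimal sketch using TextLoader; data/example.txt is a hypothetical file:

```python
from langchain_community.document_loaders import TextLoader

loader = TextLoader("data/example.txt", encoding="utf-8")
docs = loader.load()
print(docs[0].page_content[:100])
```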

Load all files in the folder

Here is an example of loading all .txt files in the folder, and likewise all .pdf files, as sketched below.
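A sketch using DirectoryLoader with glob patterns, assuming the files live in the data folder; the loader class is swapped per file type:

```python
from langchain_community.document_loaders import (
    DirectoryLoader,
    PyPDFLoader,
    TextLoader,
)

# Load every .txt file in the folder with TextLoader...
txt_docs = DirectoryLoader("data", glob="*.txt", loader_cls=TextLoader).load()
# ...and every .pdf file with PyPDFLoader.
pdf_docs = DirectoryLoader("data", glob="*.pdf", loader_cls=PyPDFLoader).load()
print(len(txt_docs), len(pdf_docs))
```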

Python

The following is an example of loading .py files.
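A minimal sketch using PythonLoader; data/example.py is a hypothetical script:

```python
from langchain_community.document_loaders import PythonLoader

loader = PythonLoader("data/example.py")
docs = loader.load()
print(docs[0].page_content[:100])
```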


Step 2: Split Documents

This step splits the loaded documents into small chunks.

CharacterTextSplitter

This is the simplest method. It splits the text based on characters (default: "\n\n") and measures the chunk size by the number of characters.

  1. How the text is split : By a single character separator.

  2. How the chunk size is measured : By the len of characters.

Visualization example: https://chunkviz.up.railway.app/

The CharacterTextSplitter class provides functionality to split text into chunks of a specified size.

  • separator : the string used to separate chunks; two newline characters ("\n\n") are used in this case.

  • chunk_size : the maximum length of each chunk.

  • chunk_overlap : the number of overlapping characters between adjacent chunks.

  • length_function : the function used to calculate the length of a chunk; the default is len, which returns the length of the string.

  • is_separator_regex : a boolean value that determines whether the separator is interpreted as a regular expression.

The sketch below uses the create_documents method of the text_splitter object to split the given text (state_of_the_union) into multiple documents, storing the results in the texts variable, and then prints the first document. This is a typical first step for processing and analyzing text data, and is particularly useful for breaking large texts into manageable chunks.
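A sketch of these parameters in use, assuming state_of_the_union holds a long text read from a hypothetical local file:

```python
from langchain_text_splitters import CharacterTextSplitter

with open("data/example.txt", encoding="utf-8") as f:  # hypothetical file
    state_of_the_union = f.read()

text_splitter = CharacterTextSplitter(
    separator="\n\n",          # split on blank lines
    chunk_size=1000,           # maximum characters per chunk
    chunk_overlap=200,         # characters shared by adjacent chunks
    length_function=len,       # how chunk size is measured
    is_separator_regex=False,  # treat the separator as a literal string
)

texts = text_splitter.create_documents([state_of_the_union])
print(texts[0])
```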

RecursiveCharacterTextSplitter

This text splitter is recommended for general text.

  1. How the text is split : Based on a list of separators.

  2. How the chunk size is measured : By the len of characters.

The RecursiveCharacterTextSplitter class provides functionality to recursively split text. This class takes parameters such as chunk_size to specify the size of the chunks to be split, chunk_overlap to define the overlap size between adjacent chunks, length_function to calculate the length of the chunks, and is_separator_regex to indicate whether the separator is a regular expression.

In the sketch below, the chunk size is set to 100, the overlap size to 20, the length calculation function to len, and is_separator_regex to False to indicate that the separator is not a regular expression.

  • Attempts to split the given document sequentially using the specified list of separators.

  • Attempts splitting in order until the chunks are sufficiently small. The default list is ["\n\n", "\n", " ", ""].

  • This generally has the effect of keeping all paragraphs (and then sentences, and then words) together as long as possible, since these are typically the most semantically related pieces of text.
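A sketch with those settings, reusing the state_of_the_union text from above:

```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=100,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
)

texts = text_splitter.create_documents([state_of_the_union])
print(texts[0])
print(texts[1])
```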

Semantic Similarity

Text is split based on semantic similarity.

Source: SemanticChunker

At a high level, the process involves splitting the text into sentences, grouping them into sets of three, and then merging similar sentences in the embedding space.
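A minimal sketch, assuming the langchain_experimental package is installed:

```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

# Split where the embedding distance between sentence groups jumps.
text_splitter = SemanticChunker(OpenAIEmbeddings())
docs = text_splitter.create_documents([state_of_the_union])
print(docs[0].page_content)
```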

Step 3: Embedding

This tutorial uses OpenAI's embedding models, which are a paid service.

Below is a list of Embedding models supported by OpenAI :

The default model is text-embedding-ada-002 .

| MODEL | ROUGH PAGES PER DOLLAR | EXAMPLE PERFORMANCE ON MTEB EVAL |
| --- | --- | --- |
| text-embedding-3-small | 62,500 | 62.3% |
| text-embedding-3-large | 9,615 | 64.6% |
| text-embedding-ada-002 | 12,500 | 61.0% |
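A minimal sketch of embedding a query, with text-embedding-3-small chosen for its cost-to-quality ratio:

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vector = embeddings.embed_query("What is the European approach to AI?")
print(len(vector))  # 1536 dimensions for this model
```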

Free Open Source-Based Embeddings

  1. HuggingFaceEmbeddings (Default model: sentence-transformers/all-mpnet-base-v2)

  2. FastEmbedEmbeddings
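A sketch of the free local alternative, assuming the langchain-huggingface and sentence-transformers packages are installed:

```python
from langchain_huggingface import HuggingFaceEmbeddings

hf_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
vector = hf_embeddings.embed_query("What is the European approach to AI?")
print(len(vector))  # 768 dimensions for all-mpnet-base-v2
```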

Note

  • When using embeddings, make sure to verify that the language you are using is supported.

Step 4: Create Vectorstore

Create Vectorstore refers to the process of generating vector embeddings from documents and storing them in a database.
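A minimal sketch using FAISS (pip install faiss-cpu), reusing the texts chunks produced in Step 2; Chroma or any other LangChain VectorStore could be substituted:

```python
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

vectorstore = FAISS.from_documents(documents=texts, embedding=OpenAIEmbeddings())
```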

Step 5: Create Retriever

A Retriever is an interface that returns documents when given an unstructured query.

The Retriever does not need to store documents; it only returns (or retrieves) them.

The Retriever is created by calling the as_retriever() method on the generated VectorStore, and queried with the invoke() method.

Similarity Retrieval

  • The default setting is similarity , which uses cosine similarity.

The similarity_score_threshold search type returns only the results whose similarity score is at or above score_threshold.

Maximal Marginal Relevance (mmr) search balances relevance and diversity among the returned results.
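Sketches of the three retrieval modes, built from the vector store created in Step 4 (the threshold and k values are illustrative):

```python
# Default: plain similarity search.
retriever = vectorstore.as_retriever()

# Keep only results at or above the similarity threshold.
threshold_retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.8},
)

# MMR: trade off relevance against diversity among the returned chunks.
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 20},
)

docs = retriever.invoke("What is the European approach to AI?")
```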

Create a variety of queries

With MultiQueryRetriever, you can generate similar questions with equivalent meanings based on the original query. This helps diversify question expressions, which can enhance search performance.
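A minimal sketch, assuming an OpenAI chat model generates the query variants:

```python
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

multi_query_retriever = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(),
    llm=ChatOpenAI(model="gpt-4o-mini", temperature=0),
)
docs = multi_query_retriever.invoke("What is the European approach to AI?")
```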

Ensemble Retriever

BM25 Retriever + Embedding-based Retriever

  • BM25 retriever (Keyword Search, Sparse Retriever): Based on TF-IDF, considering term frequency and document length normalization.

  • Embedding-based retriever (Contextual Search, Dense Retriever): Transforms text into embedding vectors and retrieves documents based on vector similarity (e.g. cosine similarity, dot product). This reflects the semantic similarity of words.

  • Ensemble retriever : Combines the BM25 and embedding-based retrievers, blending the term frequency signal of keyword search with the semantic similarity of contextual search (see the sketch after the note below).

Note

TF-IDF (Term Frequency - Inverse Document Frequency) : Evaluates words that appear frequently in a specific document as highly important, while words that appear frequently across all documents are considered less important.
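A sketch of the ensemble, assuming the rank_bm25 package is installed and reusing the texts chunks and vector store from earlier steps (the weights are illustrative):

```python
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever

# Sparse retriever: keyword matching over the raw chunks.
bm25_retriever = BM25Retriever.from_documents(texts)
bm25_retriever.k = 5

# Dense retriever: embedding similarity over the vector store.
faiss_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

ensemble_retriever = EnsembleRetriever(
    retrievers=[bm25_retriever, faiss_retriever],
    weights=[0.5, 0.5],  # relative weight of each retriever's rankings
)
docs = ensemble_retriever.invoke("What is the European approach to AI?")
```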

Step 6: Create Prompt

Prompt engineering plays a crucial role in deriving the desired outputs based on the given data (context).

[TIP1]

  1. If important information is missing from the results provided by the retriever , you should modify the retriever logic.

  2. If the results from the retriever contain sufficient information but the LLM fails to extract the key information or doesn't produce the output in the desired format, you should adjust the prompt.

[TIP2]

  1. LangSmith's hub contains numerous verified prompts.

  2. Utilizing or slightly modifying these verified prompts can save both cost and time.

  • https://smith.langchain.com/hub/search?q=rag
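A sketch of both approaches: pulling the widely used rlm/rag-prompt from the hub, and writing a prompt directly (the wording of the custom prompt is illustrative):

```python
from langchain import hub
from langchain_core.prompts import ChatPromptTemplate

# A verified community prompt from the LangSmith hub.
prompt = hub.pull("rlm/rag-prompt")

# Or define one directly.
custom_prompt = ChatPromptTemplate.from_template(
    "Answer the question using only the following context. "
    "If the answer is not in the context, say you don't know.\n\n"
    "Context:\n{context}\n\nQuestion: {question}"
)
```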

Step 7: Create LLM

Select one of the OpenAI models:

  • gpt-4o : OpenAI GPT-4o model

  • gpt-4o-mini : OpenAI GPT-4o-mini model

For detailed pricing information, please refer to the OpenAI API Model List / Pricing

You can check token usage in the following way.
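A sketch of creating the model and inspecting token usage with the OpenAI callback:

```python
from langchain_community.callbacks.manager import get_openai_callback
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

with get_openai_callback() as cb:
    response = llm.invoke("Summarize the European approach to AI in one sentence.")
    # Token counts and estimated cost accumulated inside the block.
    print(cb.total_tokens, cb.prompt_tokens, cb.completion_tokens, cb.total_cost)
```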

Use Huggingface

You need a Hugging Face token to access LLMs on HuggingFace.

You can easily download and use open-source models available on HuggingFace.

You can also check the open-source model leaderboard, where performance improves daily.

Note

Hugging Face's free API has a 10GB size limit. For example, the microsoft/Phi-3-mini-4k-instruct model is 11GB, making it inaccessible via the free API.

Choose one of the options below:

  1. Use Hugging Face Inference Endpoints : Activate Inference Endpoints through a paid plan to perform large-scale model inference.

  2. Run the model locally : Use the transformers library to run the microsoft/Phi-3-mini-4k-instruct model in a local environment (GPU recommended), as sketched below.

  3. Use a smaller model : Reduce the model size to one supported by the free API and execute it.
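A minimal sketch of option 2, assuming langchain-huggingface and transformers are installed (the generation parameters are illustrative):

```python
# Run microsoft/Phi-3-mini-4k-instruct locally via transformers (GPU recommended).
from langchain_huggingface import HuggingFacePipeline

hf_llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    pipeline_kwargs={"max_new_tokens": 256},
)
print(hf_llm.invoke("What is retrieval-augmented generation?"))
```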

RAG Template Experiment

This template is a structure for implementing a Retrieval-Augmented Generation (RAG) workflow.
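Below is a minimal sketch of such a template over the practice PDF, tying together the components from the previous steps; every component can be swapped for the variants shown earlier (the question is illustrative):

```python
from langchain import hub
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Index the practice document.
docs = PyPDFLoader(
    "data/A European Approach to Artificial Intelligence - A Policy Perspective.pdf"
).load()
splits = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=100
).split_documents(docs)
retriever = FAISS.from_documents(splits, OpenAIEmbeddings()).as_retriever()

# Build the generation side.
prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

def format_docs(docs):
    # Concatenate retrieved chunks into a single context string.
    return "\n\n".join(d.page_content for d in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What are the key policy areas discussed in the document?"))
```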

Document: A European Approach to Artificial Intelligence - A Policy Perspective.pdf

  • LangSmith: https://smith.langchain.com/public/0951c102-de61-482b-b42a-6e7d78f02107/r

Document: A European Approach to Artificial Intelligence - A Policy Perspective.pdf

  • LangSmith: https://smith.langchain.com/public/c968bf7e-e22e-4eb1-a76a-b226eedc6c51/r

Ask a question unrelated to the document.

  • LangSmith: https://smith.langchain.com/public/d8a49d52-3a63-4206-9166-58605bd990a6/r
