Exploring RAG in LangChain
Author: Jaeho Kim
Peer Review:
Proofread: BokyungisaGod
This is a part of LangChain Open Tutorial

Overview
This tutorial explores the entire process of indexing, retrieval, and generation using LangChain's RAG framework. It provides a broad overview of a typical RAG application pipeline and demonstrates how to effectively retrieve and generate responses by using LangChain's key features, such as data loaders, vector databases, embedding, retrievers, and generators, structured in a modular design.
1. Question Processing
The question processing stage involves receiving a user's question, handling it, and finding relevant data. The following components are required for this process:
Data Source Connection : To find answers to the question, it is necessary to connect to various text data sources. LangChain helps you easily establish connections to various data sources.
Data Indexing and Retrieval : To efficiently find relevant information, the data must be indexed. LangChain automates the indexing process and provides tools to retrieve data related to the user's question.
2. Answer Generation
Once the relevant data is found, the next step is to generate an answer based on it. The following components are essential for this stage:
Answer Generation Model : LangChain uses advanced natural language processing (NLP) models to generate answers from the retrieved data. These models take the user's question and the retrieved data as input and generate an appropriate answer.
Architecture
This tutorial builds a typical RAG application as outlined in the Q&A Introduction. It consists of two main components:
Indexing : A pipeline that collects data from the source and indexes it. This process typically occurs offline.
Retrieval and Generation : The actual RAG chain processes user queries in real-time, retrieves relevant data from the index, and passes it to the model.
The entire workflow from raw data to generating an answer is as follows:
Indexing

Indexing Image Source: https://python.langchain.com/docs/tutorials/rag/
Load : The first step is to load the data. For this, we will use Document Loaders.
Split : Text splitters divide large Documents into smaller chunks. This is useful for indexing data and passing it to the model, as large chunks can be difficult to retrieve and may not fit within the model's limited context window.
Store : The split data needs to be stored and indexed in a location for future retrieval. This is often accomplished using a VectorStore and an Embeddings model.
Retrieval and Generation

Retrieval and Generation Image Source: https://python.langchain.com/docs/tutorials/rag/
Retrieval : When user input is provided, Retriever is used to retrieve relevant chunks from the data store.
Document Used for Practice
A European Approach to Artificial Intelligence - A Policy Perspective
Author: Digital Enlightenment Forum under the guidance of EIT Digital, supported by contributions from EIT Manufacturing, EIT Urban Mobility, EIT Health, and EIT Climate-KIC
Link : https://eit.europa.eu/news-events/news/european-approach-artificial-intelligence-policy-perspective
File Name: A European Approach to Artificial Intelligence - A Policy Perspective.pdf
Please copy the downloaded file into the data folder for practice.
Table of Contents
References
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for these tutorials. You can check out langchain-opentutorial for more details.
Environment variables have been set successfully.
Alternatively, you can set API keys such as OPENAI_API_KEY in a .env file and load them.
[Note] This is not necessary if you've already set the required API keys in previous steps.
Explore Each Module
The following are the modules used in this content.
Below is an example of a basic RAG pipeline that handles web pages with WebBaseLoader.
In each step, you can configure various options or apply new techniques.
If a warning is displayed because USER_AGENT is not set when using WebBaseLoader,
please add USER_AGENT=myagent to the .env file.
Step 1: Load Document
Web Page
WebBaseLoader uses bs4.SoupStrainer to parse only the necessary parts from a specified web page.
[Note]
bs4.SoupStrainer makes it convenient to extract only the desired elements from a web page.
(example)
Here is another example, a BBC news article. Try running it!
PDF
The following section covers the document loader for importing PDF files.
CSV
The following section covers the document loader for importing CSV files.
CSV data is indexed by row number instead of page number.
TXT
The following section covers the document loader for importing TXT files.
Load all files in the folder
Here is an example of loading all .txt files in the folder.
The following is an example of loading all .pdf files in the folder.
Python
The following is an example of loading .py files.
Step 2: Split Documents
It splits the document into small chunks.
CharacterTextSplitter
This is the simplest method. It splits the text based on characters (default: "\n\n") and measures the chunk size by the number of characters.
How the text is split : By single character units.
How the chunk size is measured : By the len of characters.
Visualization example: https://chunkviz.up.railway.app/
The CharacterTextSplitter class provides functionality to split text into chunks of a specified size.
separator : specifies the string used to separate chunks; two newline characters ("\n\n") are used in this case.
chunk_size : determines the maximum length of each chunk.
chunk_overlap : specifies the number of overlapping characters between adjacent chunks.
length_function : defines the function used to calculate the length of a chunk; the default is the len function, which returns the length of the string.
is_separator_regex : a boolean value that determines whether the separator is interpreted as a regular expression.
This function uses the create_documents method of the text_splitter object to split the given text (state_of_the_union) into multiple documents, storing the results in the texts variable. It then outputs the first document from texts. This process can be seen as an initial step for processing and analyzing text data, particularly useful for splitting large text data into manageable chunks.
RecursiveCharacterTextSplitter
This text splitter is recommended for general text.
How the text is split : Based on a list of separators.
How the chunk size is measured : By the len of characters.
The RecursiveCharacterTextSplitter class provides functionality to recursively split text. This class takes parameters such as chunk_size to specify the size of the chunks to be split, chunk_overlap to define the overlap size between adjacent chunks, length_function to calculate the length of the chunks, and is_separator_regex to indicate whether the separator is a regular expression.
In the example, the chunk size is set to 100, the overlap size to 20, the length calculation function to len , and is_separator_regex is set to False to indicate that the separator is not a regular expression.
It attempts to split the given document using the separators in order, moving to the next separator until the chunks are sufficiently small. The default list is ["\n\n", "\n", " ", ""].
This generally has the effect of keeping all paragraphs (and then sentences, and then words) together as long as possible, since those tend to be the most semantically related pieces of text.
Semantic Similarity
Text is split based on semantic similarity.
Source: SemanticChunker
At a high level, the process involves splitting the text into sentences, grouping them into sets of three, and then merging similar sentences in the embedding space.
Step 3: Embedding
Paid Embeddings (OpenAI)
It uses OpenAI's embedding model, which is a paid service.
Below is a list of Embedding models supported by OpenAI :
The default model is text-embedding-ada-002.

| Model | Pages per dollar | Performance on MTEB eval |
| --- | --- | --- |
| text-embedding-3-small | 62,500 | 62.3% |
| text-embedding-3-large | 9,615 | 64.6% |
| text-embedding-ada-002 | 12,500 | 61.0% |
Free Open Source-Based Embeddings
HuggingFaceEmbeddings (Default model: sentence-transformers/all-mpnet-base-v2)
FastEmbedEmbeddings
Note
When using embeddings, make sure to verify that the language you are using is supported.
Step 4: Create Vectorstore
Create Vectorstore refers to the process of generating vector embeddings from documents and storing them in a database.
Step 5: Create Retriever
A Retriever is an interface that returns documents when given an unstructured query.
The Retriever does not need to store documents; it only returns (or retrieves) them.
The Retriever is created by calling the as_retriever() method on the generated VectorStore, and queried with its invoke() method.
Similarity Retrieval
The default setting is similarity, which uses cosine similarity.
With similarity_score_threshold, only results whose similarity score is at or above score_threshold are returned.
Search using maximal marginal relevance (mmr).
Create a variety of queries
With MultiQueryRetriever, you can generate similar questions with equivalent meanings based on the original query. This helps diversify question expressions, which can enhance search performance.
Ensemble Retriever
BM25 Retriever + Embedding-based Retriever
BM25 retriever (Keyword Search, Sparse Retriever) : Based on TF-IDF, considering term frequency and document length normalization.
Embedding-based retriever (Contextual Search, Dense Retriever) : Transforms text into embedding vectors and retrieves documents based on vector similarity (e.g. cosine similarity, dot product). This reflects the semantic similarity of words.
Ensemble retriever : Combines the BM25 and embedding-based retrievers, pairing the term-frequency signal of keyword search with the semantic similarity of contextual search.
Note
TF-IDF (Term Frequency - Inverse Document Frequency) : TF-IDF evaluates words that frequently appear in a specific document as highly important, while words that frequently appear across all documents are considered less important.
Step 6: Create Prompt
Prompt engineering plays a crucial role in deriving the desired outputs from the given data (context).
[TIP1]
If important information is missing from the results provided by the retriever, you should modify the retriever logic.
If the results from the retriever contain sufficient information, but the LLM fails to extract the key information or doesn't produce the output in the desired format, you should adjust the prompt.
[TIP2]
LangSmith's hub contains numerous verified prompts.
Utilizing or slightly modifying these verified prompts can save both cost and time.
https://smith.langchain.com/hub/search?q=rag
Step 7: Create LLM
Select one of the OpenAI models:
gpt-4o : OpenAI GPT-4o model
gpt-4o-mini : OpenAI GPT-4o-mini model
For detailed pricing information, please refer to the OpenAI API Model List / Pricing
You can check token usage in the following way.
Use Huggingface
You need a Hugging Face token to access LLMs on HuggingFace.
You can easily download and use open-source models available on HuggingFace.
You can also check the open-source leaderboard, where rankings change daily as performance improves, at the link below:
Note
Hugging Face's free API has a 10GB size limit. For example, the microsoft/Phi-3-mini-4k-instruct model is 11GB, making it inaccessible via the free API.
Choose one of the options below:
Option: Use Hugging Face Inference Endpoints
Activate Inference Endpoints through a paid plan to perform large-scale model inference.
Option: Run the model locally
Use the transformers library to run the microsoft/Phi-3-mini-4k-instruct model in a local environment (GPU recommended).
Option: Use a smaller model.
Reduce the model size to one supported by the free API and execute it.
RAG Template Experiment
This template is a structure for implementing a Retrieval-Augmented Generation (RAG) workflow.
Document: A European Approach to Artificial Intelligence - A Policy Perspective.pdf
LangSmith: https://smith.langchain.com/public/0951c102-de61-482b-b42a-6e7d78f02107/r
Document: A European Approach to Artificial Intelligence - A Policy Perspective.pdf
LangSmith: https://smith.langchain.com/public/c968bf7e-e22e-4eb1-a76a-b226eedc6c51/r
Ask a question unrelated to the document.
LangSmith: https://smith.langchain.com/public/d8a49d52-3a63-4206-9166-58605bd990a6/r