VectorStore-backed Retriever
Author: Erika Park
Designer: Erika Park
Proofread: jishin86
This is part of the LangChain Open Tutorial
Overview
This tutorial provides a comprehensive guide to building and optimizing a VectorStore-backed retriever using LangChain. It covers the foundational steps of creating a vector store with FAISS (Facebook AI Similarity Search) and explores advanced retrieval strategies for improving search accuracy and efficiency.
A VectorStore-backed retriever is a document retrieval system that leverages a vector store to search for documents based on their vector representations. This approach enables efficient similarity-based search for handling unstructured data.
RAG (Retrieval-Augmented Generation) Workflow

The diagram above illustrates the document search and response generation workflow within a RAG system.
The steps include:
Document Loading: Importing raw documents.
Text Chunking: Splitting text into manageable chunks.
Vector Embedding: Converting the text into numerical vectors using an embedding model.
Store in Vector Database: Storing the generated embeddings in a vector database for efficient retrieval.
During the query phase:
Steps: User Query → Embedding → Search in VectorStore → Relevant Chunks Retrieved → LLM Generates Response
The user's query is transformed into an embedding vector using an embedding model.
This query embedding is compared against stored document vectors within the vector database to retrieve the most relevant results.
The retrieved chunks are passed to a Large Language Model (LLM), which generates a final response based on the retrieved information.
This tutorial aims to explore and optimize the VectorStore → Relevant Chunks Retrieved → LLM Generates Response stages. It will cover advanced retrieval techniques to improve the accuracy and relevance of the responses.
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides a set of easy-to-use environment setup, useful functions, and utilities for tutorials. You can check out langchain-opentutorial for more details.
You can alternatively set API keys such as OPENAI_API_KEY in a .env file and load them.
[Note] This is not necessary if you've already set the required API keys in previous steps.
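For example, a minimal sketch using the python-dotenv package (assuming a .env file containing OPENAI_API_KEY exists in the working directory):

```python
# Minimal sketch: load API keys from a local .env file.
# Assumes the python-dotenv package is installed (pip install python-dotenv).
from dotenv import load_dotenv

# Reads key-value pairs such as OPENAI_API_KEY from .env
# and exports them as environment variables for this session.
load_dotenv(override=True)
```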
Initializing and Using VectorStoreRetriever
This section demonstrates how to load documents, embed them with OpenAI embeddings, and create a vector database using FAISS.
Once the vector database is created, it can be loaded and queried using retrieval methods such as Similarity Search and Maximal Marginal Relevance (MMR) to search for relevant text within the vector store.
Creating a Vector Store (Using FAISS)
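A minimal sketch of the vector store creation step is shown below. The file path and chunking parameters are illustrative placeholders; substitute your own data source (this also assumes faiss-cpu, langchain-community, and langchain-openai are installed):

```python
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load the raw document (hypothetical path).
docs = TextLoader("./data/sample_document.txt").load()

# 2. Split the text into manageable chunks.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_documents(docs)

# 3. Embed each chunk and store the vectors in a FAISS index.
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(chunks, embeddings)
```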
1. Initializing and Using VectorStoreRetriever (as_retriever)
The as_retriever method allows you to convert a vector database into a retriever, enabling efficient document search and retrieval from the vector store.
How It Works:
The as_retriever() method transforms a vector store (like FAISS) into a retriever object, making it compatible with LangChain's retrieval workflows. This retriever can then be used directly in RAG pipelines or combined with Large Language Models (LLMs) to build intelligent search systems.
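A minimal sketch, reusing the db vector store built above:

```python
# Convert the vector store into a retriever object.
retriever = db.as_retriever()

# The retriever is a Runnable, so it can be queried directly
# or composed into a larger RAG chain.
docs = retriever.invoke("What is an embedding model?")  # illustrative query
```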
Advanced Retriever Configuration
The as_retriever method allows you to configure advanced retrieval strategies, such as similarity search, MMR (Maximal Marginal Relevance), and similarity score threshold-based filtering.
Parameters:
**kwargs: Keyword arguments passed to the retrieval function:
search_type: Specifies the search method.
"similarity": Returns the most relevant documents based on cosine similarity.
"mmr": Utilizes the Maximal Marginal Relevance algorithm, balancing relevance and diversity.
"similarity_score_threshold": Returns documents with a similarity score above a specified threshold.
search_kwargs: Additional search options for fine-tuning results:
k: Number of documents to return (default: 4).
score_threshold: Minimum similarity score for the "similarity_score_threshold" search type (e.g., 0.8).
fetch_k: Number of documents initially retrieved during an MMR search (default: 20).
lambda_mult: Controls diversity in MMR results (0 = maximum diversity, 1 = maximum relevance, default: 0.5).
filter: Metadata filtering for selective document retrieval.
Return Value:
VectorStoreRetriever: An initialized retriever object that can be directly queried for document search tasks.
Notes:
Supports multiple search strategies (similarity, mmr, similarity_score_threshold).
MMR improves result diversity while preserving relevance by reducing redundancy in results.
Metadata filtering enables selective document retrieval based on document properties.
The tags parameter can be used to label retrievers for better organization and easier identification.
Cautions:
Diversity Control with MMR:
Adjust both fetch_k (number of documents initially retrieved) and lambda_mult (diversity control factor) carefully for an optimal balance.
lambda_mult: Lower values (< 0.5) → prioritize diversity; higher values (> 0.5) → prioritize relevance.
Set fetch_k higher than k for effective diversity control.
Threshold Settings:
Using a high score_threshold (e.g., 0.95) can lead to zero results.
Metadata Filtering:
Ensure the metadata structure is well-defined before applying filters.
Balanced Configuration:
Maintain a proper balance between search_type and search_kwargs settings for optimal retrieval performance.
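The sketch below combines several of the options described above into one configuration. The metadata key "source" is hypothetical and must match fields actually present in your documents:

```python
retriever = db.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 4,              # documents returned after diversity filtering
        "fetch_k": 20,       # candidates fetched before MMR re-ranking
        "lambda_mult": 0.6,  # slightly favor relevance over diversity
        "filter": {"source": "./data/sample_document.txt"},  # hypothetical metadata filter
    },
)
```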
Retriever's invoke() Method
The invoke() method is the primary entry point for interacting with a retriever. It is used to search and retrieve relevant documents based on a given query.
How It Works:
Query Submission: A user query is provided as input.
Embedding Generation: The query is converted into a vector representation (if necessary).
Search Process: The retriever searches the vector database using the specified search strategy (similarity, MMR, etc.).
Results Return: The method returns a list of relevant document chunks.
Parameters:
input (Required): The query string provided by the user. The query is converted into a vector and compared with stored document vectors for similarity-based retrieval.
config (Optional): Allows for fine-grained control over the retrieval process. Can be used to specify tags, metadata insertion, and search strategies.
**kwargs (Optional): Enables direct passing of search_kwargs for advanced configuration. Example options include:
k: Number of documents to return.
score_threshold: Minimum similarity score for a document to be included.
fetch_k: Number of documents initially retrieved in MMR searches.
Return Value:
List[Document]: Returns a list of Document objects containing the retrieved text and metadata.
Each Document object includes:
page_content: The main content of the document.
metadata: Metadata associated with the document (e.g., source, tags).
Usage Example 1: Basic Usage (Synchronous Search)
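A sketch of a basic synchronous search, reusing the retriever from above (the query string is illustrative):

```python
query = "What is an embedding model?"
docs = retriever.invoke(query)

# Each result is a Document with page_content and metadata.
for doc in docs:
    print(doc.page_content[:100])
```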
Usage Example 2: Search with Options (search_kwargs)
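A sketch passing search options via search_kwargs at retriever creation time:

```python
# Return at most 2 documents per query.
retriever = db.as_retriever(search_kwargs={"k": 2})
docs = retriever.invoke("What is an embedding model?")
print(len(docs))  # at most 2
```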
Usage Example 3: Using config and **kwargs (Advanced Configuration)
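A sketch using config for tracing metadata and a per-call keyword override. Note that whether keyword arguments passed to invoke() override search_kwargs depends on your langchain-core version; treat this as an assumption to verify:

```python
docs = retriever.invoke(
    "What is an embedding model?",
    config={"tags": ["retriever-tutorial"], "metadata": {"user": "demo"}},  # tracing info
    k=2,  # per-call override of the number of returned documents (version-dependent)
)
```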
Max Marginal Relevance (MMR)
The Maximal Marginal Relevance (MMR) search method is a document retrieval algorithm designed to reduce redundancy by balancing relevance and diversity when returning results.
How MMR Works: Unlike basic similarity-based searches that return the most relevant documents based solely on similarity scores, MMR considers two critical factors:
Relevance: Measures how closely the document matches the user's query.
Diversity: Ensures the retrieved documents are distinct from each other to avoid repetitive results.
Key Parameters:
search_type="mmr": Activates the MMR retrieval strategy.k: The number of documents returned after applying diversity filtering(default:4).fetch_k: Number of documents initially retrieved before applying diversity filtering (default:20).lambda_mult: Diversity control factor (0 = max diversity,1 = max relevance, default:0.5).
Similarity Score Threshold Search
Similarity Score Threshold Search is a retrieval method where only documents exceeding a predefined similarity score are returned. This approach helps filter out low-relevance results, ensuring that the returned documents are highly relevant to the query.
Key Features:
Relevance Filtering: Returns only documents with a similarity score above the specified threshold.
Configurable Precision: The threshold is adjustable using the score_threshold parameter.
Search Type Activation: Enabled by setting search_type="similarity_score_threshold".
This search method is ideal for tasks requiring highly precise results, such as fact-checking or answering technical queries.
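A minimal sketch (the 0.8 threshold is an illustrative value to tune for your data):

```python
# Only documents whose similarity score exceeds 0.8 are returned.
# An overly strict threshold (e.g., 0.95) may return no results at all.
retriever = db.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.8},
)
docs = retriever.invoke("What is an embedding model?")
```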
Configuring top_k (Adjusting the Number of Returned Documents)
The parameter k specifies the number of documents returned during a vector search. It determines how many of the top-ranked documents (based on similarity score) will be retrieved from the vector database. The number of documents retrieved can be adjusted by setting the k value within search_kwargs. For example, setting k=1 will return only the single most relevant document.
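For example:

```python
# k=1 returns only the single most similar document.
retriever = db.as_retriever(search_kwargs={"k": 1})
docs = retriever.invoke("What is an embedding model?")
print(len(docs))  # 1
```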
Dynamic Configuration (Using ConfigurableField)
The ConfigurableField feature in LangChain allows for dynamic adjustment of search configurations, providing flexibility during query execution.
Key Features:
Runtime Search Configuration: Adjust search settings without modifying the core retriever setup.
Enhanced Traceability: Assign unique identifiers, names, and descriptions to each parameter for improved readability and debugging.
Flexible Control with config: Search configurations can be passed dynamically using the config parameter as a dictionary.
Use Cases:
Switching Search Strategies: Dynamically adjust the search type (e.g., "similarity", "mmr").
Real-Time Parameter Adjustments: Modify search parameters like k, score_threshold, and fetch_k during query execution.
Experimentation: Easily test different search strategies and parameter combinations without rewriting code.
The following examples demonstrate how to apply dynamic search settings using ConfigurableField in LangChain.
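A sketch under the assumption that the retriever exposes search_type and search_kwargs as configurable fields; the ids used here ("search_type", "search_kwargs") are arbitrary labels chosen for lookup at invocation time:

```python
from langchain_core.runnables import ConfigurableField

retriever = db.as_retriever(search_kwargs={"k": 4}).configurable_fields(
    search_type=ConfigurableField(
        id="search_type",
        name="Search Type",
        description="The search strategy to use (similarity, mmr, ...).",
    ),
    search_kwargs=ConfigurableField(
        id="search_kwargs",
        name="Search Kwargs",
        description="Keyword arguments for the search (k, score_threshold, fetch_k, ...).",
    ),
)

# Override the defaults at query time via the `configurable` section of config.
docs = retriever.invoke(
    "What is an embedding model?",
    config={"configurable": {"search_type": "mmr", "search_kwargs": {"k": 2, "fetch_k": 10}}},
)
```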
Using Separate Query & Passage Embedding Models
By default, a retriever uses the same embedding model for both queries and documents. However, certain scenarios can benefit from using different models tailored to the specific needs of queries and documents.
Why Use Separate Embedding Models?
Using different models for queries and documents can improve retrieval accuracy and search relevance by optimizing each model for its intended purpose:
Query Embedding Model: Fine-tuned for understanding short and concise search queries.
Document (Passage) Embedding Model: Optimized for longer text spans with richer context.
For instance, Upstage Embeddings provides the capability to use distinct models for:
Query Embeddings (solar-embedding-1-large-query)
Document (Passage) Embeddings (solar-embedding-1-large-passage)
In such cases, the query is embedded using the query embedding model, while the documents are embedded using the document embedding model.
How to Issue an Upstage API Key
Sign Up & Log In:
Visit Upstage and log in (sign up if you don't have an account).
Open API Key Page:
Go to the menu bar, select "Dashboards", then navigate to "API Keys".
Generate API Key:
Click "Create new key" β Enter name your key (e.g.,
LangChain-Tutorial)
Copy & Store Safely:
Copy the generated key and keep it secure.

The following example demonstrates the process of generating an Upstage embedding for a query, converting the query sentence into a vector, and conducting a vector similarity search.
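A sketch of this query/passage split, assuming langchain-upstage is installed, UPSTAGE_API_KEY is set, and chunks is the list of split documents from earlier:

```python
from langchain_community.vectorstores import FAISS
from langchain_upstage import UpstageEmbeddings

# Passage model: embeds the documents stored in the index.
passage_embeddings = UpstageEmbeddings(model="solar-embedding-1-large-passage")
db = FAISS.from_documents(chunks, passage_embeddings)

# Query model: embeds the incoming query for the similarity search.
query_embeddings = UpstageEmbeddings(model="solar-embedding-1-large-query")
query_vector = query_embeddings.embed_query("What is an embedding model?")

# Search the index directly with the precomputed query vector.
docs = db.similarity_search_by_vector(query_vector, k=2)
```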