Ensemble Retriever
Author: 3dkids
Proofread: jishin86
This is a part of LangChain Open Tutorial
Overview
This notebook explores the creation and use of an EnsembleRetriever in LangChain to improve information retrieval by combining multiple retrieval methods. The EnsembleRetriever integrates the strengths of sparse and dense retrieval algorithms, using weights and runtime configurations for tailored performance.
Key Features
Integrate multiple retrievers: takes different types of retrievers as input and combines their results.
Result re-ranking: uses the Reciprocal Rank Fusion algorithm to re-rank results.
Hybrid search: typically combines a sparse retriever (e.g. BM25) with a dense retriever (e.g. embedding similarity).
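To make the re-ranking step concrete, here is a minimal pure-Python sketch of weighted Reciprocal Rank Fusion. It is an illustration rather than LangChain's own code: the function name weighted_rrf, the sample document IDs, and the smoothing constant c = 60 (a commonly used default) are assumptions made for this example.

```python
from collections import defaultdict


def weighted_rrf(rankings, weights, c=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document receives weight / (rank + c) from every list it appears in,
    where rank starts at 1. The constant c is an assumed default of 60.
    """
    if len(rankings) != len(weights):
        raise ValueError("need exactly one weight per ranking")

    scores = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)

    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a sparse and a dense retriever.
sparse_hits = ["apple_inc", "apple_fruit", "banana"]
dense_hits = ["apple_fruit", "orange", "apple_inc"]

fused = weighted_rrf([sparse_hits, dense_hits], weights=[0.5, 0.5])
print(fused)
```

A document that ranks well in both lists (apple_fruit here) rises above one that ranks first in only a single list, which is the behavior hybrid search relies on.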
Advantages
Sparse retriever: effective for keyword-based searches
Dense retriever: effective for semantic similarity-based searches
Due to these complementary characteristics, EnsembleRetriever can provide improved performance in a variety of search scenarios.
For more information, please refer to the LangChain official documentation
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for tutorials. You can check out langchain-opentutorial for more details.
Creating and Configuring Ensemble Retrievers
Initializing an ensemble retriever
Ensemble retrievers combine two retrieval mechanisms:
Sparse search: Uses BM25Retriever for keyword-based matching.
Dense search: Uses FAISS with OpenAI embeddings for semantic similarity.
Initialize EnsembleRetriever to combine the BM25Retriever and FAISS retrievers, and set the weight for each retriever.
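The initialization described above might look like the following sketch. It assumes a LangChain release in which EnsembleRetriever is importable from langchain.retrievers (the import path may differ in other versions), the langchain-community, langchain-openai, rank_bm25, and faiss-cpu packages, and an OPENAI_API_KEY in the environment. The helper name build_ensemble_retriever, the default weights, and k are illustrative choices, not values taken from the tutorial. Imports are deferred into the function so the snippet loads even when the optional packages are missing.

```python
def build_ensemble_retriever(texts, weights=(0.7, 0.3), k=1):
    """Combine a BM25 retriever and a FAISS retriever built over the same texts."""
    # Deferred imports: these require the optional packages named above.
    from langchain.retrievers import EnsembleRetriever
    from langchain_community.retrievers import BM25Retriever
    from langchain_community.vectorstores import FAISS
    from langchain_openai import OpenAIEmbeddings

    # Sparse side: keyword matching with BM25, returning the top k documents.
    bm25_retriever = BM25Retriever.from_texts(texts)
    bm25_retriever.k = k

    # Dense side: embedding similarity search backed by a FAISS index.
    faiss_store = FAISS.from_texts(texts, OpenAIEmbeddings())
    faiss_retriever = faiss_store.as_retriever(search_kwargs={"k": k})

    # One weight per retriever, in the same order as the retrievers list.
    return EnsembleRetriever(
        retrievers=[bm25_retriever, faiss_retriever],
        weights=list(weights),
    )
```

Calling build_ensemble_retriever(["I like apples", "Apple is my favorite company"]) would then return a retriever whose results blend both rankings according to the given weights.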
Query Execution
Perform retrieval for a given query using ensemble_retriever and compare results across retrievers.
Call the get_relevant_documents() method of the ensemble_retriever object to retrieve relevant documents.
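A side-by-side comparison can be sketched with a small helper like the one below. It assumes each retriever exposes invoke(query), the generic entry point that LangChain retrievers provide alongside get_relevant_documents(), and that each returned document has a page_content attribute. The helper compare_retrievers and the StubRetriever class are hypothetical; the stub only stands in for real retrievers so the example runs without any external services.

```python
from types import SimpleNamespace


def compare_retrievers(query, retrievers):
    """Run one query through each named retriever and collect the hit texts."""
    results = {}
    for name, retriever in retrievers.items():
        docs = retriever.invoke(query)  # list of Document-like objects
        results[name] = [doc.page_content for doc in docs]
    return results


class StubRetriever:
    """Hypothetical stand-in that always returns the same documents."""

    def __init__(self, texts):
        self._docs = [SimpleNamespace(page_content=text) for text in texts]

    def invoke(self, query, config=None):
        return list(self._docs)


report = compare_retrievers(
    "apple",
    {
        "bm25": StubRetriever(["I like apples"]),
        "faiss": StubRetriever(["Apple is my favorite company"]),
    },
)
for name, texts in report.items():
    print(f"[{name}]", texts)
```

With real retrievers, the same dictionary could hold the ensemble retriever next to its BM25 and FAISS components to see how the fused ranking differs from each individual one.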
Change runtime config
You can also change the properties of a retriever at runtime. This is possible using the ConfigurableField class.
Define the weights parameter as a ConfigurableField object. Set the field's ID to "ensemble_weights".
Specify the search settings via the config parameter when searching. Set the ensemble_weights option to [1, 0] so that the search results are weighted entirely toward the BM25 retriever.
This time, weight the search results entirely toward the FAISS retriever instead (for example, with weights of [0, 1]).
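The runtime configuration steps can be sketched as follows. The sketch assumes a LangChain release in which EnsembleRetriever is importable from langchain.retrievers and ConfigurableField from langchain_core.runnables. The helper names weights_config and make_configurable, the default weights, and the field's name and description strings are illustrative assumptions; only the field ID "ensemble_weights" comes from the text above.

```python
def weights_config(weights):
    """Build the per-call config that overrides the ensemble weights."""
    return {"configurable": {"ensemble_weights": list(weights)}}


def make_configurable(retrievers, default_weights=(0.5, 0.5)):
    """Wrap retrievers in an EnsembleRetriever whose weights can change per call."""
    # Deferred imports so the snippet loads without LangChain installed.
    from langchain.retrievers import EnsembleRetriever
    from langchain_core.runnables import ConfigurableField

    ensemble = EnsembleRetriever(
        retrievers=list(retrievers),
        weights=list(default_weights),
    )
    # Expose the weights attribute under the ID "ensemble_weights".
    return ensemble.configurable_fields(
        weights=ConfigurableField(
            id="ensemble_weights",
            name="Ensemble Weights",
            description="Weight applied to each retriever, in order",
        )
    )


bm25_only = weights_config([1, 0])   # all weight on the first (BM25) retriever
faiss_only = weights_config([0, 1])  # all weight on the second (FAISS) retriever
print(bm25_only)
print(faiss_only)
```

Given a configurable retriever from make_configurable, a call such as retriever.invoke(query, config=bm25_only) applies the override for that call only, leaving the default weights in place for later calls.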