Self-querying
Author: Hye-yoon Jeong
Peer Review:
Proofread: Juni Lee
This is a part of LangChain Open Tutorial
Overview
SelfQueryRetriever is a retriever equipped with the capability to generate and resolve queries autonomously.
SelfQueryRetriever converts the user's natural language input into a structured query using a query-constructing LLM chain. This structured query is then used to retrieve documents from the vector store.
Through this process, SelfQueryRetriever goes beyond semantically comparing the user's query with the content of stored documents: it also extracts metadata filters from the query and applies them when retrieving relevant documents.
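As a rough illustration of this idea, the plain-Python sketch below applies a metadata filter first and then ranks the surviving documents against the query text. It is not LangChain's implementation: the product records, the `word_overlap` scoring function, and the hand-written rating filter are hypothetical stand-ins for real embeddings and an LLM-generated filter.

```python
# Toy illustration of "metadata filter first, then similarity ranking".
# All data and helper names here are hypothetical.

docs = [
    {"text": "hydrating cream for dry skin",
     "metadata": {"category": "Skincare", "user_rating": 4.7}},
    {"text": "matte lipstick with long wear",
     "metadata": {"category": "Makeup", "user_rating": 4.5}},
    {"text": "gentle cream cleanser for dry skin",
     "metadata": {"category": "Cleansers", "user_rating": 4.2}},
]


def word_overlap(query: str, text: str) -> int:
    """Crude stand-in for embedding similarity: count shared words."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def search(query: str, min_rating: float) -> list[str]:
    # Step 1: keep only documents whose metadata passes the filter.
    kept = [d for d in docs if d["metadata"]["user_rating"] >= min_rating]
    # Step 2: rank the remaining documents by similarity to the query text.
    kept.sort(key=lambda d: word_overlap(query, d["text"]), reverse=True)
    return [d["text"] for d in kept]


results = search("cream for dry skin", min_rating=4.5)
print(results)
```

The cleanser matches the query text well, but it never reaches the ranking step because its rating fails the filter. A purely semantic comparison cannot guarantee that.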
The list of self-querying retrievers supported by LangChain can be found here.
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for tutorials. Check out langchain-opentutorial for more details.
Sample Data
Let's build a vector store that enables similarity search based on the descriptions and metadata of some cosmetic products.
SelfQueryRetriever
To instantiate the retriever, you need to define metadata fields and a brief description of the document contents in advance using the AttributeInfo class.
In this example, the metadata for cosmetic products is defined as follows:
category: String type, represents the category of the cosmetic product and takes one of the following values: ['Skincare', 'Makeup', 'Cleansers', 'Sunscreen'].
year: Integer type, represents the year the cosmetic product was released.
user_rating: Float type, represents the user rating in the range of 1 to 5.
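The field definitions above can be written down as data. The sketch below uses plain dictionaries with `name`, `type`, and `description` keys so that it runs without any dependencies. In LangChain each entry would instead be an AttributeInfo object carrying the same three pieces of information. The `type_errors` helper and the sample records are hypothetical, added only to show how the declared types relate to actual metadata values.

```python
# Metadata field definitions as plain dicts (dependency-free sketch).
metadata_fields = [
    {
        "name": "category",
        "type": "string",
        "description": "The category of the cosmetic product. "
                       "One of ['Skincare', 'Makeup', 'Cleansers', 'Sunscreen']",
    },
    {
        "name": "year",
        "type": "integer",
        "description": "The year the cosmetic product was released",
    },
    {
        "name": "user_rating",
        "type": "float",
        "description": "A user rating for the cosmetic product, ranging from 1 to 5",
    },
]

# Map the declared type names to Python types for a simple sanity check.
PYTHON_TYPES = {"string": str, "integer": int, "float": float}


def type_errors(record: dict) -> list[str]:
    """Return the names of fields whose value does not match the declared type."""
    errors = []
    for field in metadata_fields:
        value = record.get(field["name"])
        if not isinstance(value, PYTHON_TYPES[field["type"]]):
            errors.append(field["name"])
    return errors


good = {"category": "Skincare", "year": 2024, "user_rating": 4.7}
bad = {"category": "Makeup", "year": "2023", "user_rating": 4}

print(type_errors(good))
print(type_errors(bad))
```

Metadata that matches the declared types matters because the filters generated from a query compare against these values directly.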
Create the retriever object with the SelfQueryRetriever.from_llm method, which takes the following arguments:
llm: Large language model
vectorstore: Vector store
document_contents: Description of the contents of the documents
metadata_field_info: Metadata field information
Now, let's test this SelfQueryRetriever with some example queries.
SelfQueryRetriever can also be used to retrieve items with two or more conditions.
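To make "two or more conditions" concrete, here is a small plain-Python sketch of how a composite filter can be evaluated against metadata. The nested-tuple representation, the `matches` helper, and the product records are all hypothetical. In practice the vector store evaluates the filter, and this sketch only mirrors the logic.

```python
import operator

# Comparison operators keyed by short names.
COMPARATORS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def matches(condition: tuple, metadata: dict) -> bool:
    """Evaluate a nested (operator, *arguments) condition against one record."""
    op, *args = condition
    if op == "and":
        return all(matches(c, metadata) for c in args)
    if op == "or":
        return any(matches(c, metadata) for c in args)
    # Leaf comparison: (comparator, field name, value).
    field, value = args
    return COMPARATORS[op](metadata[field], value)


products = [
    {"name": "A", "category": "Skincare", "year": 2023, "user_rating": 4.8},
    {"name": "B", "category": "Skincare", "year": 2024, "user_rating": 4.3},
    {"name": "C", "category": "Makeup", "year": 2023, "user_rating": 4.9},
]

# "Skincare products with a rating of at least 4.5" as a composite condition.
condition = ("and", ("eq", "category", "Skincare"), ("gte", "user_rating", 4.5))
hits = [p["name"] for p in products if matches(condition, p)]
print(hits)  # ['A']
```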
You can also cap the number of documents SelfQueryRetriever returns by setting k in search_kwargs.
Passing enable_limit=True to SelfQueryRetriever.from_llm additionally lets the query constructor pick up a limit stated in the query itself.
There are 3 products released in 2023, but by setting the value of k to 2, only 2 products are retrieved.
Alternatively, you can state the desired number of results directly in the query, without specifying search_kwargs in the code.
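One way to picture how a configured k and a limit stated in the query interact is the sketch below. It assumes that a limit parsed from the query, when present, overrides the configured k. `effective_search_kwargs` is a hypothetical helper written for illustration and is not part of LangChain's API.

```python
from typing import Optional


def effective_search_kwargs(search_kwargs: dict, query_limit: Optional[int]) -> dict:
    """Merge configured search kwargs with a limit extracted from the query."""
    overrides = {}
    if query_limit is not None:
        # Assumption: a limit found in the query takes precedence over k.
        overrides["k"] = query_limit
    return {**search_kwargs, **overrides}


print(effective_search_kwargs({"k": 2}, None))  # {'k': 2}
print(effective_search_kwargs({"k": 2}, 1))     # {'k': 1}
print(effective_search_kwargs({}, 3))           # {'k': 3}
```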
Query Constructor Chain
To see what happens internally and to have more custom control, we can construct a retriever from scratch.
First, we need to create a query_constructor chain that generates structured queries. Here, we use the get_query_constructor_prompt function to retrieve the prompt that helps construct queries.
To inspect the content of the prompt, call the prompt.format method with the string "dummy question" as the query parameter and print the result.
Call the query_constructor.invoke method to process the given query.
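To give a feel for what the query constructor has to do with the model's raw text, the sketch below parses a hand-written stand-in for LLM output. It assumes the model returns JSON with `query` and `filter` keys and that the filter uses function-call syntax such as `and(eq(...), gte(...))`. The exact format depends on the prompt, and LangChain's own parser is more complete than this toy version. `parse_filter` and `to_tuple` are hypothetical helper names.

```python
import ast
import json
import re

# Stand-in for the JSON text an LLM might return (hand-written, not real output).
raw_output = json.dumps({
    "query": "moisturizer",
    "filter": 'and(eq("category", "Skincare"), gte("user_rating", 4.5))',
})


def to_tuple(node: ast.AST):
    """Convert a parsed call expression into nested (name, *arguments) tuples."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        name = node.func.id.rstrip("_")  # undo the keyword renaming below
        return (name, *(to_tuple(arg) for arg in node.args))
    if isinstance(node, ast.Constant):
        return node.value
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def parse_filter(expression: str):
    # `and`, `or`, and `not` are Python keywords, so rename them before parsing.
    # Caveat: this simple regex would also rewrite matching text inside string values.
    safe = re.sub(r"\b(and|or|not)\(", r"\1_(", expression)
    return to_tuple(ast.parse(safe, mode="eval").body)


payload = json.loads(raw_output)
parsed = parse_filter(payload["filter"])
print(payload["query"])
print(parsed)
```

The result separates the text to search for semantically (`query`) from the conditions to apply to metadata (`filter`).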
The core component of a SelfQueryRetriever is the query constructor. To build an effective retrieval system, it is essential to ensure that the query constructor is well defined.
To achieve this, you need to adjust the prompt, the examples within the prompt, and the attribute descriptions.
Structured Query Translator
The retriever also needs a structured query translator.
The structured query translator converts a StructuredQuery object into metadata filters that follow the syntax of the target vector store.
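The translation step can be sketched in a few lines of plain Python. The example assumes a vector store whose filter syntax uses MongoDB-style operators such as `$and` and `$gte`, as Chroma's does. Other stores use different syntax, which is why each has its own translator. The nested-tuple input format and the `to_store_filter` helper are hypothetical.

```python
def to_store_filter(condition: tuple) -> dict:
    """Translate a nested (operator, *arguments) condition into a filter dict."""
    op, *args = condition
    if op in ("and", "or"):
        # Logical operators wrap a list of translated sub-conditions.
        return {f"${op}": [to_store_filter(c) for c in args]}
    # Leaf comparison: (comparator, field name, value).
    field, value = args
    return {field: {f"${op}": value}}


condition = ("and", ("eq", "category", "Skincare"), ("gte", "user_rating", 4.5))
store_filter = to_store_filter(condition)
print(store_filter)
# {'$and': [{'category': {'$eq': 'Skincare'}}, {'user_rating': {'$gte': 4.5}}]}
```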
Use the retriever.invoke method to retrieve the documents that match the given question.