Self-querying


Overview

SelfQueryRetriever is a retriever that can generate and execute structured queries on its own.

SelfQueryRetriever converts the natural language input provided by the user into a structured query using a query-constructing LLM chain. This structured query is then used to retrieve documents from the vector store.

Through this process, SelfQueryRetriever does more than semantically compare the user's input query with the content of stored documents. It also extracts metadata filters from the user's query and applies those filters when retrieving relevant documents.
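The idea of splitting a request into a semantic part and a metadata filter can be sketched in plain Python. This is only a conceptual illustration: QuerySketch, apply_filters, and the three records are hypothetical stand-ins, not LangChain classes or the tutorial's data, and the decomposition that an LLM would normally produce is written by hand here.

```python
from dataclasses import dataclass, field


@dataclass
class QuerySketch:
    """Hypothetical stand-in for a structured query (not a LangChain class)."""
    query: str                                   # text left over for semantic search
    filters: dict = field(default_factory=dict)  # exact-match metadata conditions


def apply_filters(documents, structured):
    """Keep only documents whose metadata satisfies every filter condition."""
    return [
        doc for doc in documents
        if all(doc["metadata"].get(key) == value
               for key, value in structured.filters.items())
    ]


# Illustrative records, not the tutorial's actual sample data.
documents = [
    {"text": "Lightweight daily sunscreen", "metadata": {"category": "Sunscreen", "year": 2024}},
    {"text": "Hydrating night cream", "metadata": {"category": "Skincare", "year": 2023}},
    {"text": "Matte lipstick", "metadata": {"category": "Makeup", "year": 2023}},
]

# "Recommend a skincare product released in 2023" could be decomposed as:
structured = QuerySketch(
    query="recommended product",
    filters={"category": "Skincare", "year": 2023},
)

candidates = apply_filters(documents, structured)
print([doc["text"] for doc in candidates])
```

In a real retriever, the semantic comparison would then run only over the documents that pass the filter.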

A list of the self-querying retrievers supported by LangChain is available in the LangChain documentation.


Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup helpers, functions, and utilities for tutorials.

  • You can check out langchain-opentutorial for more details.

Sample Data

Let's build a vector store that enables similarity search based on the descriptions and metadata of some cosmetic products.

SelfQueryRetriever

To instantiate the retriever, you need to define the metadata fields in advance using the AttributeInfo class and provide a brief description of the document contents.

In this example, the metadata for cosmetic products is defined as follows:

  • category: String type, represents the category of the cosmetic product and takes one of the following values: ['Skincare', 'Makeup', 'Cleansers', 'Sunscreen'].

  • year: Integer type, represents the year the cosmetic product was released.

  • user_rating: Float type, represents the user rating in the range of 1 to 5.
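To make the three field definitions above concrete, here is a minimal sketch that records them as data and checks a metadata dict against them. FieldSpec and validate_metadata are hypothetical helpers written for this illustration. They only mirror the name, description, and type information listed above and are not LangChain's AttributeInfo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical stand-in for a metadata field definition."""
    name: str
    description: str
    type: str


CATEGORIES = ["Skincare", "Makeup", "Cleansers", "Sunscreen"]

FIELD_SPECS = [
    FieldSpec("category", f"Category of the cosmetic product. One of {CATEGORIES}", "string"),
    FieldSpec("year", "Year the cosmetic product was released", "integer"),
    FieldSpec("user_rating", "User rating in the range of 1 to 5", "float"),
]

# Whole-number ratings such as 4 are accepted for the float field.
PYTHON_TYPES = {"string": str, "integer": int, "float": (int, float)}


def validate_metadata(metadata):
    """Return a list of problems found in one metadata dict (empty if valid)."""
    problems = []
    for spec in FIELD_SPECS:
        if spec.name not in metadata:
            problems.append(f"missing field: {spec.name}")
            continue
        value = metadata[spec.name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, PYTHON_TYPES[spec.type]):
            problems.append(f"{spec.name}: expected {spec.type}")

    category = metadata.get("category")
    if isinstance(category, str) and category not in CATEGORIES:
        problems.append(f"category: unknown value {category!r}")

    rating = metadata.get("user_rating")
    if (isinstance(rating, (int, float)) and not isinstance(rating, bool)
            and not 1 <= rating <= 5):
        problems.append("user_rating: must be between 1 and 5")
    return problems


print(validate_metadata({"category": "Sunscreen", "year": 2024, "user_rating": 4.7}))
print(validate_metadata({"category": "Perfume", "year": "2024", "user_rating": 6.0}))
```

Checking metadata like this before indexing helps keep the stored values consistent with the descriptions the LLM sees when it builds filters.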

Create the retriever object with the SelfQueryRetriever.from_llm method, which takes the following arguments:

  • llm: Large language model

  • vectorstore: Vector store

  • document_contents: Description of the contents of the documents

  • metadata_field_info: Metadata field information

Now, let's test this SelfQueryRetriever with some example queries.

SelfQueryRetriever can also be used to retrieve items with two or more conditions.
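As a rough illustration of what "two or more conditions" means once a query has been structured, the sketch below combines several comparisons with a logical AND. The comparator names (eq, gt, gte, lt, lte), the matches_all helper, and the product records are assumptions made for this example, not the retriever's internals or the tutorial's data.

```python
import operator

# Map comparator names to Python's comparison functions.
COMPARATORS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def matches_all(metadata, conditions):
    """True when the metadata satisfies every (attribute, comparator, value) triple."""
    return all(
        COMPARATORS[comparator](metadata[attribute], value)
        for attribute, comparator, value in conditions
    )


# Illustrative records only.
products = [
    {"name": "Sun Shield", "category": "Sunscreen", "year": 2024, "user_rating": 4.9},
    {"name": "Daily Block", "category": "Sunscreen", "year": 2022, "user_rating": 4.6},
    {"name": "Night Cream", "category": "Skincare", "year": 2023, "user_rating": 4.7},
    {"name": "Velvet Lip", "category": "Makeup", "year": 2023, "user_rating": 4.2},
]

# "Products released in 2023 or later with a rating of at least 4.5"
conditions = [("year", "gte", 2023), ("user_rating", "gte", 4.5)]

hits = [p["name"] for p in products if matches_all(p, conditions)]
print(hits)
```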

You can also cap the number of documents to retrieve with the argument k (set through search_kwargs) when using SelfQueryRetriever.

Passing enable_limit=True to the constructor additionally allows a limit stated in the query to be applied.

There are 3 products released in 2023, but by setting the value of k to 2, only 2 products are retrieved.

Alternatively, you can state the desired number of results directly in the query, without explicitly specifying search_kwargs in the code.
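The two ways of limiting results can be pictured with a small stand-alone sketch. It assumes, purely for illustration, that a limit taken from the query takes precedence over a default k set in code. LimitedQuery, retrieve, the similarity scores, and the records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LimitedQuery:
    """Hypothetical structured query carrying an optional result limit."""
    year: int
    limit: Optional[int] = None  # filled in when the query text states a number


def retrieve(products, structured, default_k=4):
    """Filter by year, rank by similarity score, then cut to the limit or default_k."""
    matching = [p for p in products if p["year"] == structured.year]
    matching.sort(key=lambda p: p["score"], reverse=True)
    k = structured.limit if structured.limit is not None else default_k
    return matching[:k]


# Three illustrative products from 2023, with made-up similarity scores.
products = [
    {"name": "Night Cream", "year": 2023, "score": 0.91},
    {"name": "Velvet Lip", "year": 2023, "score": 0.84},
    {"name": "Foam Cleanser", "year": 2023, "score": 0.78},
    {"name": "Sun Shield", "year": 2024, "score": 0.95},
]

# Limit set in code: three products match, but only two are returned.
top_two = retrieve(products, LimitedQuery(year=2023), default_k=2)

# Limit stated in the query, e.g. "Recommend one product released in 2023".
top_one = retrieve(products, LimitedQuery(year=2023, limit=1))

print([p["name"] for p in top_two])
print([p["name"] for p in top_one])
```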

Query Constructor Chain

To see what happens internally and to have more custom control, we can construct a retriever from scratch.

First, we need to create a query_constructor chain that generates structured queries. Here, we use the get_query_constructor_prompt function to retrieve the prompt that helps construct queries.

To check the content of the prompt, use the prompt.format method to pass the string "dummy question" to the query parameter and print the result.

Call the query_constructor.invoke method to process the given query.
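To get a feel for what the query constructor works with, the sketch below parses a raw model response by hand. It assumes the LLM answers with a JSON object holding a "query" string, a "filter" expression string (or "NO_FILTER" when nothing applies), and optionally a "limit". Printing the prompt as described above is the way to confirm the exact format in your version. The parse_constructor_output helper and the raw string are illustrative, not library code or real model output.

```python
import json


def parse_constructor_output(raw):
    """Parse an assumed JSON response into query text, filter expression, and limit."""
    payload = json.loads(raw)
    filter_expr = payload.get("filter")
    # Treat a missing, empty, or "NO_FILTER" value as "no metadata filter".
    if filter_expr in (None, "", "NO_FILTER"):
        filter_expr = None
    return {
        "query": payload["query"].strip(),
        "filter": filter_expr,
        "limit": payload.get("limit"),
    }


# Hand-written example response (an assumption, not captured model output).
raw = r'{"query": "sunscreen", "filter": "and(gte(\"year\", 2023), eq(\"category\", \"Sunscreen\"))"}'

parsed = parse_constructor_output(raw)
print(parsed["query"])
print(parsed["filter"])
```

The filter is still a string at this point. Turning it into comparison and operation objects is the job of the library's own output parser.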

The core component of a SelfQueryRetriever is the query constructor. To build an effective retrieval system, it is essential to ensure that the query constructor is well defined.

To achieve this, you need to adjust the prompt, examples within the prompt, and attribute descriptions.
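The three tuning knobs (prompt wording, examples, and attribute descriptions) all end up as text the LLM reads. The following sketch shows one hypothetical way they might be assembled. The template and the build_prompt helper are invented for illustration and do not reproduce the prompt that get_query_constructor_prompt returns.

```python
def build_prompt(document_contents, fields, examples, question):
    """Assemble a toy few-shot prompt from descriptions and worked examples."""
    lines = [f"Document contents: {document_contents}", "Filterable attributes:"]
    for name, type_name, description in fields:
        lines.append(f"- {name} ({type_name}): {description}")
    # Each example pairs a user query with the structured request we want back.
    for user_query, structured_request in examples:
        lines.append(f"User query: {user_query}")
        lines.append(f"Structured request: {structured_request}")
    lines.append(f"User query: {question}")
    lines.append("Structured request:")
    return "\n".join(lines)


fields = [
    ("category", "string", "One of ['Skincare', 'Makeup', 'Cleansers', 'Sunscreen']"),
    ("year", "integer", "Year the cosmetic product was released"),
    ("user_rating", "float", "User rating in the range of 1 to 5"),
]

# A single hand-written example; more precise examples usually help the model.
examples = [
    ("Sunscreens rated above 4.5",
     'and(eq("category", "Sunscreen"), gt("user_rating", 4.5))'),
]

prompt = build_prompt(
    "A brief summary of a cosmetic product", fields, examples, "dummy question"
)
print(prompt)
```

Vague attribute descriptions (for example, omitting the allowed category values) give the model less to work with, which is why refining them tends to improve the generated filters.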

Structured Query Translator

You can also build the retriever with a structured query translator.

The structured query translator converts a StructuredQuery object into metadata filters compatible with the syntax of the vector store.
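The translation step can be sketched as a small recursive function that walks a filter tree and emits a nested dictionary. The output shape shown here, with "$and" and "$gte" style keys, is modeled on the filter syntax used by stores such as Chroma and should be treated as an assumption. ComparisonNode, OperationNode, and translate are simplified stand-ins written for this example, not LangChain's own classes.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class ComparisonNode:
    """Simplified stand-in: one attribute compared against one value."""
    comparator: str  # e.g. "eq", "gt", "gte"
    attribute: str
    value: Any


@dataclass
class OperationNode:
    """Simplified stand-in: a logical operator over child nodes."""
    operator: str    # "and" or "or"
    arguments: List[Any]


def translate(node):
    """Convert a filter tree into a nested dict in an assumed Chroma-style syntax."""
    if isinstance(node, ComparisonNode):
        return {node.attribute: {f"${node.comparator}": node.value}}
    if isinstance(node, OperationNode):
        return {f"${node.operator}": [translate(arg) for arg in node.arguments]}
    raise TypeError(f"unsupported filter node: {node!r}")


filter_tree = OperationNode(
    operator="and",
    arguments=[
        ComparisonNode("gte", "year", 2023),
        ComparisonNode("eq", "category", "Sunscreen"),
    ],
)

store_filter = translate(filter_tree)
print(store_filter)
```

Each vector store has its own filter syntax, so a separate translator is needed per store while the structured query itself stays the same.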

Use the retriever.invoke method to retrieve documents for the given question.
