Long Context Reorder

Author: Minji
Peer Review:
Proofread: jishin86
This is a part of LangChain OpenTutorial

Overview

Regardless of the model's architecture, performance can degrade substantially when 10 or more retrieved documents are included in the context.

In short, models use information at the beginning and end of a long context most reliably. When the relevant information sits in the middle, they tend to overlook it, even though it is present in the provided documents.

For more details, see the paper "Lost in the Middle: How Language Models Use Long Contexts" (Liu et al., 2023):

https://arxiv.org/abs/2307.03172

To mitigate this, reorder the documents after retrieval so that the most relevant ones appear at the beginning and end of the context and the least relevant ones end up in the middle.

In this tutorial, you create a retriever that stores and searches text data with the Chroma vector store, use the retriever's invoke method to find the documents most relevant to a query, and then reorder those documents with LongContextReorder.
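To make "find the documents most relevant to a query" concrete, here is a minimal, dependency-free sketch of what a vector-store retriever does conceptually: score every stored text against the query and return the top k, most relevant first. It is an illustration under stated assumptions only. The hypothetical `embed` function uses lowercase word counts as a stand-in for OpenAIEmbeddings, cosine similarity is assumed as the relevance score, and `toy_retrieve` is a hypothetical name; Chroma's actual indexing and distance settings are not reproduced.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_retrieve(query: str, texts: list[str], k: int = 10) -> list[str]:
    """Return the k texts most similar to the query, most relevant first."""
    q = embed(query)
    # sorted() is stable, so ties keep their original order
    ranked = sorted(texts, key=lambda t: cosine(q, embed(t)), reverse=True)
    return ranked[:k]


texts = [
    "ChatGPT was developed by OpenAI and is continuously being improved.",
    "iPhone, iPad, MacBook are representative products released by Apple.",
    "The FIFA World Cup is held every four years and is the biggest event in international football.",
]
top = toy_retrieve("What can you tell me about ChatGPT?", texts, k=2)
print(top[0])  # the ChatGPT sentence ranks first
```

The property that matters for the rest of this tutorial is the ordering contract: results come back sorted from most to least relevant, which is the order LongContextReorder expects as input.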

Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

langchain-opentutorial is a package that provides easy-to-use environment setup tools, along with useful functions and utilities for the tutorials.
You can check out langchain-opentutorial for more details.

%%capture --no-stderr
!pip install langchain-opentutorial

# Configuration file for managing API keys as environment variables
from dotenv import load_dotenv

# Load API key information
load_dotenv(override=True)

True


from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain",
        "langchain_openai",
        "langchain_community",
        "langchain-chroma",
    ],
    verbose=False,
    upgrade=False,
)

from langchain_opentutorial import set_env

set_env(
    {
        # "OPENAI_API_KEY": "",
        # "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "04-LongContextReorder",
    }
)

Environment variables have been set successfully.

Create an instance of the LongContextReorder class named reordering.

First, embed a set of sample texts into a Chroma vector store, create a retriever from it, and enter a query for the retriever to search.

from langchain_community.document_transformers import LongContextReorder
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Create the embedding model used to vectorize the texts and the query
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

texts = [
    "This is just a random text I wrote.",
    "ChatGPT, an AI designed to converse with users, can answer various questions.",
    "iPhone, iPad, MacBook are representative products released by Apple.",
    "ChatGPT was developed by OpenAI and is continuously being improved.",
    "ChatGPT has learned from vast amounts of data to understand user questions and generate appropriate answers.",
    "Wearable devices like Apple Watch and AirPods are also part of Apple's popular product line.",
    "ChatGPT can be used to solve complex problems or suggest creative ideas.",
    "Bitcoin is also called digital gold and is gaining popularity as a store of value.",
    "ChatGPT's capabilities are continuously evolving through ongoing learning and updates.",
    "The FIFA World Cup is held every four years and is the biggest event in international football.",
]



# Store the texts in Chroma and create a retriever that returns the top 10 matches (k=10)
retriever = Chroma.from_texts(texts, embedding=embeddings).as_retriever(
    search_kwargs={"k": 10}
)

query = "What can you tell me about ChatGPT?"

# Retrieve relevant documents, sorted from most to least relevant
docs = retriever.invoke(query)
docs

[Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
 Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
 Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
 Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
 Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
 Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
 Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
 Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
 Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
 Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.')]

Next, create an instance of the LongContextReorder class.

Calling reordering.transform_documents(docs) reorders the document list: the less relevant documents are placed in the middle of the list, and the more relevant documents at the beginning and end.
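The reordering rule can be sketched in a few lines of plain Python. This is a minimal sketch, not LangChain's source: the function name `litm_reorder` is hypothetical, and it assumes the input is already sorted from most to least relevant, as retriever results are. Walking the list from least to most relevant, it alternately prepends and appends each item, so the most relevant items land at the two edges and the least relevant in the middle. It is written to reproduce the ordering seen in the reordered output of this section.

```python
def litm_reorder(items: list) -> list:
    """Reorder items (sorted most-relevant-first) so that the most relevant
    sit at both ends and the least relevant sit in the middle."""
    result = []
    # Visit items from least to most relevant without mutating the input.
    for i, item in enumerate(reversed(items)):
        if i % 2 == 1:
            result.append(item)     # odd steps go to the end
        else:
            result.insert(0, item)  # even steps go to the front
    return result


ranks = list(range(10))  # 0 = most relevant, 9 = least relevant
print(litm_reorder(ranks))  # → [1, 3, 5, 7, 9, 8, 6, 4, 2, 0]
```

In the retrieved list shown earlier, ranks 0 to 4 hold one sentence and ranks 5 to 9 another. This ordering moves the first sentence to positions 0, 1, 7, 8 and 9 and the second sentence into the middle, which is the pattern in the reordered output.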

# Reorder the documents
# Less relevant documents are positioned in the middle, more relevant elements at start/end
reordering = LongContextReorder()
reordered_docs = reordering.transform_documents(docs)

# Check that the most relevant documents now appear at the start and end
reordered_docs

[Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
 Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
 Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
 Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
 Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
 Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
 Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
 Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
 Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
 Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.')]

Creating Question-Answering Chain with Context Reordering

Next, build a question-answering (QA) chain that reorders the retrieved documents with LongContextReorder before passing them to the model, so that the most relevant context sits at the beginning and end, where the model is most likely to use it.

def format_docs(docs):
    # Join the page contents of the documents, one per line
    return "\n".join(doc.page_content for doc in docs)

print(format_docs(docs))

ChatGPT was developed by OpenAI and is continuously being improved.
ChatGPT was developed by OpenAI and is continuously being improved.
ChatGPT was developed by OpenAI and is continuously being improved.
ChatGPT was developed by OpenAI and is continuously being improved.
ChatGPT was developed by OpenAI and is continuously being improved.
ChatGPT, an AI designed to converse with users, can answer various questions.
ChatGPT, an AI designed to converse with users, can answer various questions.
ChatGPT, an AI designed to converse with users, can answer various questions.
ChatGPT, an AI designed to converse with users, can answer various questions.
ChatGPT, an AI designed to converse with users, can answer various questions.

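def format_docs(docs):
    # Number each document by its position and append a source label
    # (a fixed placeholder string, since the sample documents carry no metadata)
    return "\n".join(
        [
            f"[{i}] {doc.page_content} [source: teddylee777@gmail.com]"
            for i, doc in enumerate(docs)
        ]
    )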


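def reorder_documents(docs):
    # Reorder so the most relevant documents sit at the start and end
    reordering = LongContextReorder()
    reordered_docs = reordering.transform_documents(docs)
    # Format the reordered documents into a single context string
    combined = format_docs(reordered_docs)
    print(combined)
    return combined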

Print the reordered documents.

# Reorder and format the retrieved documents (reorder_documents also prints the result)
_ = reorder_documents(docs)

[0] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[1] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[2] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
[3] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
[4] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
[5] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
[6] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
[7] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[8] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[9] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]

from langchain_core.prompts import ChatPromptTemplate
from operator import itemgetter
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda

# Define prompt template
template = """Given these text excerpts:
{context}

-----
Please answer the following question:
{question}

Answer in the following language: {language}
"""

# Define prompt
prompt = ChatPromptTemplate.from_template(template)

# Define Chain
chain = (
    {
        # Retrieve documents for the question, then reorder and format them
        "context": itemgetter("question")
        | retriever
        | RunnableLambda(reorder_documents),
        "question": itemgetter("question"),  # Extract the question
        "language": itemgetter("language"),  # Extract the answer language
    }
    | prompt  # Fill the prompt template with the values above
    | ChatOpenAI(model="gpt-4o-mini")  # Send the prompt to the language model
    | StrOutputParser()  # Parse the model output as a string
)
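The dictionary at the head of the chain fans a single input out into the three prompt variables. The following plain-Python sketch imitates that data flow without LangChain or an API key, under stated assumptions: `fake_retrieve` is a hypothetical stub for the Chroma retriever, `reorder_and_format` is a simplified, hypothetical stand-in for reorder_documents, and the last step only fills the template with `str.format` instead of calling the model.

```python
from operator import itemgetter

TEMPLATE = """Given these text excerpts:
{context}

-----
Please answer the following question:
{question}

Answer in the following language: {language}
"""


def fake_retrieve(question: str) -> list[str]:
    """Hypothetical retriever stub: ignores the question and returns
    texts already sorted most-relevant-first."""
    return ["most relevant", "second", "third", "least relevant"]


def reorder_and_format(texts: list[str]) -> str:
    """Hypothetical stand-in for reorder_documents: reorder, then number each line."""
    reordered = []
    for i, text in enumerate(reversed(texts)):
        if i % 2 == 1:
            reordered.append(text)
        else:
            reordered.insert(0, text)
    return "\n".join(f"[{i}] {t}" for i, t in enumerate(reordered))


# Mirrors the chain's leading dict: each value derives one prompt variable from the input.
steps = {
    "context": lambda x: reorder_and_format(fake_retrieve(itemgetter("question")(x))),
    "question": itemgetter("question"),
    "language": itemgetter("language"),
}

inputs = {"question": "What can you tell me about ChatGPT?", "language": "English"}
variables = {key: fn(inputs) for key, fn in steps.items()}
prompt_text = TEMPLATE.format(**variables)
print(prompt_text)
```

In LangChain, a dict used this way is coerced into a RunnableParallel, which may run its branches concurrently; the sketch simply evaluates them one after another.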

Invoke the chain with the query in question and the response language in language.

Because reorder_documents prints its output, the reordered search results are displayed when the chain runs.

answer = chain.invoke(
    {"question": "What can you tell me about ChatGPT?", "language": "English"}
)

[0] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
[1] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
[2] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[3] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[4] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[5] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[6] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[7] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
[8] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
[9] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]

Print the response.

print(answer)

ChatGPT is an AI language model developed by OpenAI. Its capabilities are continuously evolving through ongoing learning and updates, which means it is regularly improved to enhance its performance and functionality. The model is designed to understand and generate human-like text, making it useful for a variety of applications such as conversational agents, content creation, and more.
