Regardless of the model's architecture, performance degrades significantly once more than about 10 retrieved documents are included in the context.
Simply put, when the relevant information sits in the middle of a long context, the model tends to overlook it.
For more details, please refer to the following paper:
https://arxiv.org/abs/2307.03172
To mitigate this degradation, reorder the documents after retrieval.
Create a retriever that can store and search text data using the Chroma vector store. Use the retriever's invoke method to search for highly relevant documents for a given query.
Create an instance of the LongContextReorder class named reordering.
Enter a query for the retriever to perform the search.
from langchain_core.prompts import PromptTemplate
from langchain_community.document_transformers import LongContextReorder
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Get embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

texts = [
    "This is just a random text I wrote.",
    "ChatGPT, an AI designed to converse with users, can answer various questions.",
    "iPhone, iPad, MacBook are representative products released by Apple.",
    "ChatGPT was developed by OpenAI and is continuously being improved.",
    "ChatGPT has learned from vast amounts of data to understand user questions and generate appropriate answers.",
    "Wearable devices like Apple Watch and AirPods are also part of Apple's popular product line.",
    "ChatGPT can be used to solve complex problems or suggest creative ideas.",
    "Bitcoin is also called digital gold and is gaining popularity as a store of value.",
    "ChatGPT's capabilities are continuously evolving through ongoing learning and updates.",
    "The FIFA World Cup is held every four years and is the biggest event in international football.",
]

# Create a retriever (Set K to 10)
retriever = Chroma.from_texts(texts, embedding=embeddings).as_retriever(
    search_kwargs={"k": 10}
)

query = "What can you tell me about ChatGPT?"

# Retrieves relevant documents sorted by relevance score.
docs = retriever.invoke(query)
docs
[Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.')]
Create an instance of the LongContextReorder class.
Call reordering.transform_documents(docs) to reorder the document list.
Less relevant documents are positioned in the middle of the list, while more relevant documents are positioned at the beginning and end.
# Reorder the documents
# Less relevant documents are positioned in the middle, more relevant elements at start/end
reordering = LongContextReorder()
reordered_docs = reordering.transform_documents(docs)

# Verify that 4 relevant documents are positioned at start and end
reordered_docs
[Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.')]
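The placement rule described above (most relevant documents at the two ends, least relevant in the middle) can be sketched in plain Python. The function name litm_reorder and the integer stand-ins for documents are assumptions made for illustration. The sketch shows one interleaving that produces this placement and is not presented as the library's exact implementation.

```python
def litm_reorder(items):
    """Reorder items that arrive sorted from most to least relevant.

    Walks the list from least to most relevant, alternately pushing each
    item to the front or the back, so the weakest items end up in the
    middle and the strongest at the two ends.
    """
    reordered = []
    for i, item in enumerate(reversed(items)):
        if i % 2 == 1:
            reordered.append(item)     # odd step: place at the end
        else:
            reordered.insert(0, item)  # even step: place at the start
    return reordered


# Integers stand in for documents: 0 is the most relevant, 9 the least.
ranked = list(range(10))
print(litm_reorder(ranked))
# [1, 3, 5, 7, 9, 8, 6, 4, 2, 0]
```

With ten inputs, the two top-ranked items land at the two ends and the lowest-ranked items meet in the middle, which is consistent with the pattern in the reordered output above.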
Creating Question-Answering Chain with Context Reordering
This chain is intended to improve question-answering (QA) performance by reordering the retrieved documents with LongContextReorder, which arranges the context so that the model can make better use of it.
def format_docs(docs):
    return "\n".join([doc.page_content for doc in docs])
print(format_docs(docs))
ChatGPT was developed by OpenAI and is continuously being improved.
ChatGPT was developed by OpenAI and is continuously being improved.
ChatGPT was developed by OpenAI and is continuously being improved.
ChatGPT was developed by OpenAI and is continuously being improved.
ChatGPT was developed by OpenAI and is continuously being improved.
ChatGPT, an AI designed to converse with users, can answer various questions.
ChatGPT, an AI designed to converse with users, can answer various questions.
ChatGPT, an AI designed to converse with users, can answer various questions.
ChatGPT, an AI designed to converse with users, can answer various questions.
ChatGPT, an AI designed to converse with users, can answer various questions.
[0] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[1] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[2] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
[3] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
[4] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
[5] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
[6] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
[7] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[8] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[9] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
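The numbered listing above carries an index and a source tag that the format_docs helper shown earlier does not emit. A formatter along the following lines would produce that layout. The function name format_docs_with_source, the SimpleDoc stand-in class, the sample texts, and the "unknown" fallback are all assumptions made for illustration and are not part of the original code.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDoc:
    """Hypothetical stand-in exposing the two attributes used here
    (page_content and metadata), mirroring a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs_with_source(docs):
    """Join documents as '[index] text [source: ...]' lines."""
    return "\n".join(
        f"[{i}] {doc.page_content} [source: {doc.metadata.get('source', 'unknown')}]"
        for i, doc in enumerate(docs)
    )


# Illustrative sample data only; "notes.txt" is a made-up source name.
sample = [
    SimpleDoc("ChatGPT was developed by OpenAI.", {"source": "notes.txt"}),
    SimpleDoc("Bitcoin is also called digital gold."),  # no source metadata
]
print(format_docs_with_source(sample))
```

Using metadata.get with a fallback avoids a KeyError for documents stored without a source, such as the ones created from plain texts earlier in this section.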
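from operator import itemgetter

from langchain_community.document_transformers import LongContextReorder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI


# NOTE: reorder_documents is used by the chain but is not defined in this
# section. The version below is an assumed reconstruction: it reorders the
# retrieved documents, prints them, and returns them as a single string.
def reorder_documents(docs):
    reordering = LongContextReorder()
    reordered_docs = reordering.transform_documents(docs)
    combined = format_docs(reordered_docs)
    print(combined)
    return combined


# Define prompt template
template = """Given this text extracts:
{context}

-----
Please answer the following question:
{question}

Answer in the following languages: {language}
"""

# Define prompt
prompt = ChatPromptTemplate.from_template(template)

# Define Chain
chain = (
    {
        "context": itemgetter("question")
        | retriever
        | RunnableLambda(reorder_documents),  # Search context based on question
        "question": itemgetter("question"),  # Extract question
        "language": itemgetter("language"),  # Extract answer language
    }
    | prompt  # Pass values to prompt template
    | ChatOpenAI(model="gpt-4o-mini")  # Pass prompt to language model
    | StrOutputParser()  # Parse model output as string
)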
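The first step of the chain routes fields of the input dictionary into the prompt variables with operator.itemgetter. The sketch below mimics only that routing with plain function calls, leaving LangChain out entirely. The names fake_context and build_prompt_inputs are hypothetical, and the retrieval and reordering step is replaced by a placeholder string.

```python
from operator import itemgetter


def fake_context(question):
    """Hypothetical placeholder for the retrieve-and-reorder step."""
    return f"<context retrieved for: {question}>"


def build_prompt_inputs(inputs):
    """Mimic the dictionary step of the chain with plain function calls."""
    return {
        "context": fake_context(itemgetter("question")(inputs)),
        "question": itemgetter("question")(inputs),
        "language": itemgetter("language")(inputs),
    }


# Same wording as the prompt template in this section.
template = (
    "Given this text extracts:\n{context}\n-----\n"
    "Please answer the following question:\n{question}\n"
    "Answer in the following languages: {language}"
)

inputs = {"question": "What can you tell me about ChatGPT?", "language": "English"}
prompt_text = template.format(**build_prompt_inputs(inputs))
print(prompt_text)
```

The question is used twice: once to fetch the context and once verbatim in the prompt, while language passes straight through.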
Pass the query as question and the response language as language.
Invoking the chain also prints the reordered search results.
answer = chain.invoke(
    {"question": "What can you tell me about ChatGPT?", "language": "English"}
)
[0] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
[1] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
[2] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[3] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[4] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[5] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[6] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
[7] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
[8] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
[9] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
Print the response.
print(answer)
ChatGPT is an AI language model developed by OpenAI. Its capabilities are continuously evolving through ongoing learning and updates, which means it is regularly improved to enhance its performance and functionality. The model is designed to understand and generate human-like text, making it useful for a variety of applications such as conversational agents, content creation, and more.