
Long Context Reorder


  • Author: Minji

  • Peer Review:

  • Proofread : jishin86

  • This is a part of LangChain OpenTutorial

Overview

Regardless of a model's architecture, performance degrades significantly when more than 10 retrieved documents are included in the context.

Simply put, when the relevant information sits in the middle of a long context, the model tends to overlook it.

For more details, please refer to the following paper:

  • https://arxiv.org/abs/2307.03172

To mitigate this issue, reorder the documents after retrieval so that the most relevant ones appear at the beginning and end of the context.

This tutorial creates a retriever that stores and searches text data using the Chroma vector store, then uses the retriever's invoke method to find the documents most relevant to a given query.

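Table of Contents

  • Overview
  • Environment Setup
  • Create an instance of the LongContextReorder class named reordering
  • Creating Question-Answering Chain with Context Reordering
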
Environment Setup

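Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for tutorials.

  • You can check out langchain-opentutorial for more details.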

%%capture --no-stderr
!pip install langchain-opentutorial
# Configuration file for managing API keys as environment variables
from dotenv import load_dotenv

# Load API key information
load_dotenv(override=True)
True

from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain",
        "langchain_openai",
        "langchain_community",
        "langchain-chroma",
    ],
    verbose=False,
    upgrade=False,
)
from langchain_opentutorial import set_env

set_env(
    {
        # "OPENAI_API_KEY": "",
        # "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "04-LongContextReorder",
    }
)
Environment variables have been set successfully.

Create an instance of the LongContextReorder class named reordering.

First, build a Chroma-backed retriever over a small set of sample texts, then enter a query for the retriever to search.

from langchain_community.document_transformers import LongContextReorder
from langchain_chroma import Chroma  # provided by the langchain-chroma package installed above
from langchain_openai import OpenAIEmbeddings

# Get embeddings
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

texts = [
    "This is just a random text I wrote.",
    "ChatGPT, an AI designed to converse with users, can answer various questions.",
    "iPhone, iPad, MacBook are representative products released by Apple.",
    "ChatGPT was developed by OpenAI and is continuously being improved.",
    "ChatGPT has learned from vast amounts of data to understand user questions and generate appropriate answers.",
    "Wearable devices like Apple Watch and AirPods are also part of Apple's popular product line.",
    "ChatGPT can be used to solve complex problems or suggest creative ideas.",
    "Bitcoin is also called digital gold and is gaining popularity as a store of value.",
    "ChatGPT's capabilities are continuously evolving through ongoing learning and updates.",
    "The FIFA World Cup is held every four years and is the biggest event in international football.",
]

# Create a retriever that returns the top 10 documents (k=10)
retriever = Chroma.from_texts(texts, embedding=embeddings).as_retriever(
    search_kwargs={"k": 10}
)
query = "What can you tell me about ChatGPT?"

# Retrieve the documents, sorted by relevance to the query
docs = retriever.invoke(query)
docs
[Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
     Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
     Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
     Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
     Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
     Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
     Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
     Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
     Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
     Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.')]
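
The retrieved list above contains repeated entries. If repeats like these are unwanted, one option is to drop exact duplicates before reordering. The sketch below is an illustration and not part of the tutorial's pipeline: `Doc` is a hypothetical stand-in for a LangChain `Document`, and `dedupe_by_content` is a helper name introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (only page_content is used)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def dedupe_by_content(docs):
    """Keep the first (highest-ranked) occurrence of each page_content."""
    seen = set()
    unique = []
    for doc in docs:
        if doc.page_content not in seen:
            seen.add(doc.page_content)
            unique.append(doc)
    return unique


docs_demo = [Doc("A"), Doc("A"), Doc("B"), Doc("A"), Doc("C")]
print([d.page_content for d in dedupe_by_content(docs_demo)])
# → ['A', 'B', 'C']
```

Because the helper only reads `page_content`, it should also work on the `docs` list returned by the retriever, though that is untested here.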

Create an instance of the LongContextReorder class named reordering.

  • Call reordering.transform_documents(docs) to reorder the document list.

  • Less relevant documents are positioned in the middle of the list, while more relevant documents are positioned at the beginning and end.

# Reorder the documents
# Less relevant documents are positioned in the middle, more relevant elements at start/end
reordering = LongContextReorder()
reordered_docs = reordering.transform_documents(docs)

# Verify that the more relevant documents are positioned at the start and end
reordered_docs
[Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
     Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
     Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
     Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
     Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
     Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
     Document(metadata={}, page_content='ChatGPT, an AI designed to converse with users, can answer various questions.'),
     Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
     Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.'),
     Document(metadata={}, page_content='ChatGPT was developed by OpenAI and is continuously being improved.')]
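
To make the ordering in the output above easier to follow, here is a plain-Python sketch of an alternating placement that reproduces the same pattern. It is an illustration written for this page, not the library's source code, and `lost_in_the_middle_order` is a name introduced here. The input is assumed to be sorted from most to least relevant.

```python
def lost_in_the_middle_order(items):
    """Place the most relevant items at both ends and the least relevant in the middle."""
    reordered = []
    # Walk from least to most relevant, alternating between the front and the back
    for i, item in enumerate(reversed(items)):
        if i % 2 == 1:
            reordered.append(item)
        else:
            reordered.insert(0, item)
    return reordered


ranks = list(range(1, 11))  # 1 = most relevant, 10 = least relevant
print(lost_in_the_middle_order(ranks))
# → [2, 4, 6, 8, 10, 9, 7, 5, 3, 1]
```

With ten inputs, ranks 1 and 2 end up at the two ends and ranks 9 and 10 in the middle, which matches the two-at-the-start, three-at-the-end split seen in the reordered documents.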

Creating Question-Answering Chain with Context Reordering

This section builds a chain that aims to improve QA (Question-Answering) performance by reordering the retrieved documents with LongContextReorder, so that the most relevant context sits where the model is most likely to use it.

def format_docs(docs):
    # Join the page contents, one document per line
    return "\n".join(doc.page_content for doc in docs)


print(format_docs(docs))
ChatGPT was developed by OpenAI and is continuously being improved.
    ChatGPT was developed by OpenAI and is continuously being improved.
    ChatGPT was developed by OpenAI and is continuously being improved.
    ChatGPT was developed by OpenAI and is continuously being improved.
    ChatGPT was developed by OpenAI and is continuously being improved.
    ChatGPT, an AI designed to converse with users, can answer various questions.
    ChatGPT, an AI designed to converse with users, can answer various questions.
    ChatGPT, an AI designed to converse with users, can answer various questions.
    ChatGPT, an AI designed to converse with users, can answer various questions.
    ChatGPT, an AI designed to converse with users, can answer various questions.
def format_docs(docs):
    return "\n".join(
        [
            f"[{i}] {doc.page_content} [source: teddylee777@gmail.com]"
            for i, doc in enumerate(docs)
        ]
    )


def reorder_documents(docs):
    # Reorder
    reordering = LongContextReorder()
    reordered_docs = reordering.transform_documents(docs)
    combined = format_docs(reordered_docs)
    print(combined)
    return combined

Print the reordered documents.

# Reorder the documents and print the combined text
_ = reorder_documents(docs)
[0] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
    [1] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
    [2] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
    [3] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
    [4] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
    [5] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
    [6] ChatGPT, an AI designed to converse with users, can answer various questions. [source: teddylee777@gmail.com]
    [7] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
    [8] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
    [9] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
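
The `format_docs` helper above attaches the same hard-coded source string to every document. As a variation, the sketch below reads the source from each document's metadata instead. It assumes a `source` key is present in the metadata, which is not the case for the sample texts in this tutorial, so a fallback value is used. `Doc` and `format_docs_with_source` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs_with_source(docs, default_source="unknown"):
    """Number each document and append its metadata source (or a fallback)."""
    lines = []
    for i, doc in enumerate(docs):
        source = doc.metadata.get("source", default_source)
        lines.append(f"[{i}] {doc.page_content} [source: {source}]")
    return "\n".join(lines)


demo = [
    Doc("ChatGPT was developed by OpenAI.", {"source": "notes.txt"}),
    Doc("Bitcoin is also called digital gold."),
]
print(format_docs_with_source(demo))
# → [0] ChatGPT was developed by OpenAI. [source: notes.txt]
# → [1] Bitcoin is also called digital gold. [source: unknown]
```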
from langchain_core.prompts import ChatPromptTemplate
from operator import itemgetter
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnableLambda

# Define prompt template
template = """Given these text extracts:
{context}

-----
Please answer the following question:
{question}

Answer in the following languages: {language}
"""

# Define prompt
prompt = ChatPromptTemplate.from_template(template)

# Define Chain
chain = (
    {
        "context": itemgetter("question")
        | retriever
        | RunnableLambda(reorder_documents),  # Search context based on question
        "question": itemgetter("question"),  # Extract question
        "language": itemgetter("language"),  # Extract answer language
    }
    | prompt  # Pass values to prompt template
    | ChatOpenAI(model="gpt-4o-mini")  # Pass prompt to language model
    | StrOutputParser()  # Parse model output as string
)
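
The first step of the chain above maps one input dictionary to the three prompt variables. The plain-Python sketch below mimics that data flow without LangChain, only to show which value feeds which placeholder. `fake_retrieve`, `reorder`, and `build_prompt` are hypothetical names introduced here, and the retriever is replaced by a fixed list.

```python
TEMPLATE = (
    "Given these text extracts:\n{context}\n\n-----\n"
    "Please answer the following question:\n{question}\n\n"
    "Answer in the following languages: {language}\n"
)


def fake_retrieve(question):
    # Hypothetical retriever: returns texts already sorted by relevance
    return ["most relevant", "second", "third", "least relevant"]


def reorder(texts):
    # Alternate placement so the top-ranked texts end up at both ends
    result = []
    for i, text in enumerate(reversed(texts)):
        if i % 2 == 1:
            result.append(text)
        else:
            result.insert(0, text)
    return result


def build_prompt(inputs):
    # Mirrors the mapping step: each prompt variable is computed from the input dict
    values = {
        "context": "\n".join(reorder(fake_retrieve(inputs["question"]))),
        "question": inputs["question"],
        "language": inputs["language"],
    }
    return TEMPLATE.format(**values)


print(build_prompt({"question": "What is ChatGPT?", "language": "English"}))
```

In the real chain, the resulting prompt is then passed to the chat model and the output parser.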

Pass the query as question and the response language as language.

The reordered search results are printed while the chain runs.

answer = chain.invoke(
    {"question": "What can you tell me about ChatGPT?", "language": "English"}
)
[0] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
    [1] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
    [2] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
    [3] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
    [4] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
    [5] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
    [6] ChatGPT was developed by OpenAI and is continuously being improved. [source: teddylee777@gmail.com]
    [7] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
    [8] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]
    [9] ChatGPT's capabilities are continuously evolving through ongoing learning and updates. [source: teddylee777@gmail.com]

Print the response.

print(answer)
ChatGPT is an AI language model developed by OpenAI. Its capabilities are continuously evolving through ongoing learning and updates, which means it is regularly improved to enhance its performance and functionality. The model is designed to understand and generate human-like text, making it useful for a variety of applications such as conversational agents, content creation, and more.
