Conversation Summaries with LangGraph


Overview

One of the most common use cases of conversation persistence is keeping track of a conversation's history. By summarizing and referencing past messages, we can maintain essential context without overloading the system with the entire conversation. This becomes especially important for long conversations, where a large context window can lead to increased computational costs and potential inaccuracies.

In this tutorial, we will explore how to summarize a conversation and integrate that summary into a new conversation state while removing older messages. This approach helps manage the conversation length within a limited context window, preventing inadvertent increases in cost or inference time.

Key Steps:

  1. Detect if a conversation is too long (e.g., based on the number of messages).

  2. If it exceeds a threshold, summarize the conversation so far.

  3. Remove older messages and store only the summary (plus the most recent messages).

This tutorial will guide you through setting up a conversation flow that automatically summarizes older messages and retains only the recent conversation turns and the summary.


Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup, along with useful functions and utilities for these tutorials.

  • You can check out langchain-opentutorial for more details.

You can alternatively set OPENAI_API_KEY in a .env file and load it.

[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps.
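What loading a .env file does can be sketched with the standard library alone; load_env_file below is a hypothetical helper written for illustration, not part of any package. In practice you would simply call load_dotenv() from the python-dotenv package.

```python
import os

# A minimal, stdlib-only sketch of what loading a .env file does.
# load_env_file is illustrative; python-dotenv's load_dotenv() is the
# usual tool for this.
def load_env_file(path: str = ".env") -> None:
    """Read KEY=VALUE lines from `path` into os.environ."""
    if not os.path.exists(path):
        return
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks and comments; keep any values already set.
            if line and not line.startswith("#") and "=" in line:
                key, _, value = line.partition("=")
                os.environ.setdefault(key.strip(), value.strip())

load_env_file()
```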

Checking the Conversation Length

Below, we set up our custom State to store both messages and summaries. We'll also define a helper function in a separate cell to determine if the conversation has exceeded a certain length.

  • The State class extends MessagesState, holding messages and summary.

  • We initialize a ChatOpenAI model with the name gpt-4o-mini.

Here is a separate code cell for the should_continue function, which checks if the conversation has more than 6 messages.

Summarizing and Managing the Conversation

When the conversation exceeds the threshold, we summarize it and remove older messages to preserve only a recent segment and the overall summary. We also create a node (ask_llm) to handle new messages, optionally including the existing summary.

  • ask_llm checks if a summary exists and prepends it as a SystemMessage if so.

  • summarize_conversation either creates or extends a summary and removes older messages.

Building Graph Workflow

Here, we construct a StateGraph, add our nodes, and compile the application. We also use visualize_graph(app) to see how the workflow is structured.

  • Use StateGraph to define nodes and edges.

  • Use visualize_graph(app) to see the workflow.

Below is an example code snippet that visualizes the current workflow graph. We define custom node styles for the graph and then use IPython utilities to display the rendered image inline. The NodeStyles data class customizes fill colors, stroke styles, and overall appearance of the nodes.


Running Workflow

You can interact with the application by sending messages. Once the conversation exceeds 6 messages, it automatically summarizes and shortens.

  • The app.stream method handles streaming of messages and triggers the nodes accordingly.

  • Check the internal state with app.get_state(config) to see how messages and summaries are updated.

Here is a helper function, print_update, which prints updates in real time during streaming.
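A sketch of such a helper: print_update here is a plain function over the {node_name: state_delta} dictionaries that stream_mode="updates" yields, and the bracketed output format is illustrative.

```python
def print_update(update: dict) -> None:
    """Print new messages and any summary from one streamed update."""
    for node_name, delta in update.items():
        for message in delta.get("messages", []):
            # LangChain messages carry .content; fall back to str().
            content = getattr(message, "content", str(message))
            print(f"[{node_name}] {content}")
        if "summary" in delta:
            print(f"[summary] {delta['summary']}")
```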

Below is a single code cell demonstrating how we handle user messages and process them through streaming mode. We import HumanMessage, configure the session with a thread ID, and send three user messages in sequence. After each message, the updates are streamed and printed in real time.

So far, no summary has been generated because there are only six messages. Once the conversation exceeds six messages, summarization will be triggered.

You can see that there are only six messages in the list, which is why no summary has been created yet. Let's send another message to exceed that threshold.

Because the conversation now has more than six messages, summarization will be triggered.

During this process, old messages are removed, and only the summary plus the last two messages remain in the conversation state.

As you can see, the summary has been added, and the older messages have been replaced by RemoveMessage actions. Only the most recent messages and the newly created summary remain.

You can now continue the conversation, and despite only having the last two messages visible, the system still retains the overall context through the summary.

Even though the older messages were removed from the conversation history, the summary contains the essential context. The model can respond accurately based on the stored summary.
