LangGraph Streaming Mode


Overview

This tutorial demonstrates LangGraph's streaming capabilities by building an AI news search system.

It covers three key streaming modes: values, updates, and messages, each serving different output monitoring needs.

The tutorial also explores advanced features including subgraphs and tag-based filtering for enhanced control over real-time AI outputs.



Environment Setup

Setting up your environment is the first step. See the Environment Setup guide for more details.

[Note]

The langchain-opentutorial package provides easy-to-use environment setup guidance, along with useful functions and utilities for these tutorials. Check out langchain-opentutorial for more details.

You can set API keys in a .env file or set them manually.

[Note] If you’re not using the .env file, no worries! Just enter the keys directly in the cell below, and you’re good to go.
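For example, a minimal sketch of both options, assuming OPENAI_API_KEY is the only key required (add others as needed):

```python
# Option 1: load keys from a .env file (requires the python-dotenv package)
from dotenv import load_dotenv

load_dotenv(override=True)

# Option 2: set keys manually
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = "sk-..."  # paste your key here
```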

Introduction to Streaming Modes

LangGraph supports multiple streaming modes. The main ones are:

  • values: This streaming mode streams back values of the graph. This is the full state of the graph after each node is called.

  • updates: This streaming mode streams back updates to the graph. This is the update to the state of the graph after each node is called.

  • messages: This streaming mode streams LLM tokens from nodes as they are produced.

Defining the Graph

We'll create a simple agent that can search news and process the results.

First, we'll define a class to fetch Google News search results:
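A minimal sketch of such a helper, built on the public Google News RSS feed (the feedparser dependency and the exact return fields are assumptions):

```python
import feedparser
from urllib.parse import quote

class GoogleNews:
    """Fetch Google News search results via the public RSS feed."""

    def search_by_keyword(self, keyword: str, k: int = 5) -> list[dict]:
        url = f"https://news.google.com/rss/search?q={quote(keyword)}"
        feed = feedparser.parse(url)
        # Keep only the title and link of the top-k entries
        return [{"title": e.title, "url": e.link} for e in feed.entries[:k]]
```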

Now that we have our news fetching functionality, let's build the graph structure using LangGraph.

We'll create states, define tools, and establish the connections between different components:
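A sketch of that structure, assuming an OpenAI chat model and the GoogleNews helper above (node and tool names are illustrative):

```python
from typing import Annotated
from typing_extensions import TypedDict

from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START
from langgraph.graph.message import add_messages
from langgraph.prebuilt import ToolNode, tools_condition

# State: a message list that accumulates via the add_messages reducer
class State(TypedDict):
    messages: Annotated[list, add_messages]

@tool
def search_keyword(query: str) -> list[dict]:
    """Look up the latest news for the given keyword."""
    return GoogleNews().search_by_keyword(query, k=5)

tools = [search_keyword]
llm = ChatOpenAI(model="gpt-4o-mini")
llm_with_tools = llm.bind_tools(tools)

# The chatbot node calls the LLM; any tool calls are routed to the ToolNode
def chatbot(state: State):
    return {"messages": [llm_with_tools.invoke(state["messages"])]}

builder = StateGraph(State)
builder.add_node("chatbot", chatbot)
builder.add_node("tools", ToolNode(tools))
builder.add_edge(START, "chatbot")
builder.add_conditional_edges("chatbot", tools_condition)
builder.add_edge("tools", "chatbot")
graph = builder.compile()
```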

Visualize the graph.
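In a notebook, this can be done with the Mermaid renderer (an IPython display environment is assumed):

```python
from IPython.display import Image, display

display(Image(graph.get_graph().draw_mermaid_png()))
```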


Step-by-step output of a node

Streaming mode

  • values: Output the current state value at each step

  • updates: Output only the state updates at each step (default)

  • messages: Output messages at each step

For the values and updates modes, streaming here does not mean token-by-token streaming of LLM output, but rather step-by-step output after each node runs.

Values Mode (stream_mode="values")

The values mode streams the complete state after each node execution.

Each chunk is the entire state dictionary; iterating over its items yields (key, value) pairs.

  • key: a key of the State

  • value: the value stored under that key in the State

Synchronous Streaming
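A sketch using the graph compiled above:

```python
inputs = {"messages": [("human", "Find the latest AI news")]}

for chunk in graph.stream(inputs, stream_mode="values"):
    # chunk is the full state after each node ran; show the latest message per key
    for state_key, state_value in chunk.items():
        print(f"[{state_key}]")
        state_value[-1].pretty_print()
```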

Asynchronous Streaming

The astream() method runs the graph through asynchronous stream processing, yielding chunked responses in values mode. Iterate over them with an async for statement.
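The same loop, written asynchronously:

```python
async for chunk in graph.astream(inputs, stream_mode="values"):
    for state_key, state_value in chunk.items():
        print(f"[{state_key}]")
        state_value[-1].pretty_print()
```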

If you only want to see the final result, do the following:
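Since values mode yields the full state at every step, you can simply keep the last chunk:

```python
final_state = None

async for chunk in graph.astream(inputs, stream_mode="values"):
    final_state = chunk  # keep overwriting; the last chunk is the final state

final_state["messages"][-1].pretty_print()
```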

Updates Mode (stream_mode="updates")

The updates mode streams only the changes to the state after each node execution.

The output is a dictionary with node names as keys and the corresponding state updates as values.

Iterating over a chunk's items yields (key, value) pairs.

  • key: the name of the node

  • value: that node's output for the step, itself a dictionary with one or more key-value pairs

Synchronous Streaming
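For example:

```python
for chunk in graph.stream(inputs, stream_mode="updates"):
    # chunk maps the node name to that node's state update
    for node_name, update in chunk.items():
        print(f"[{node_name}]")
        update["messages"][-1].pretty_print()
```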

Asynchronous Streaming
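And asynchronously:

```python
async for chunk in graph.astream(inputs, stream_mode="updates"):
    for node_name, update in chunk.items():
        print(f"[{node_name}]")
        update["messages"][-1].pretty_print()
```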

Messages Mode (stream_mode="messages")

The messages mode streams LLM tokens and messages from inside each node as they are produced.

A chunk is a tuple with two elements.

  • chunk_msg: real-time output message

  • metadata: Node information

Synchronous Streaming
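A sketch that prints LLM tokens as they arrive (empty chunks, such as tool-call scaffolding, are skipped):

```python
for chunk_msg, metadata in graph.stream(inputs, stream_mode="messages"):
    if chunk_msg.content:
        print(chunk_msg.content, end="", flush=True)
```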

Asynchronous Streaming
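The asynchronous variant:

```python
async for chunk_msg, metadata in graph.astream(inputs, stream_mode="messages"):
    if chunk_msg.content:
        print(chunk_msg.content, end="", flush=True)
```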

Advanced Streaming Features

Streaming output to a specific node

If you want to stream output from a specific node only, you can do so with stream_mode="messages".

When setting stream_mode="messages", you will receive messages in the form of (chunk_msg, metadata).

  • chunk_msg: the real-time output message

  • metadata: the node information

You can use metadata["langgraph_node"] to output only messages from a specific node.

You can inspect the node information by printing metadata.
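A sketch that keeps only tokens produced inside the chatbot node defined above:

```python
for chunk_msg, metadata in graph.stream(inputs, stream_mode="messages"):
    # metadata["langgraph_node"] identifies the node that produced this chunk
    if metadata["langgraph_node"] == "chatbot" and chunk_msg.content:
        print(chunk_msg.content, end="", flush=True)
```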


Filtering with Tags

If your graph invokes an LLM in multiple places, you may want to output messages from only one of them.

In this case, you can add tags to the LLM to select only the output you want.

Here's how to add tags to your LLM. Tags can be added as a list.
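For instance, tagging the model at construction time (the tag name is arbitrary; it just has to match the filter used below):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", tags=["WANT_TO_STREAM"])
```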

This allows you to filter events more precisely, keeping only the events produced by that model.

The example below outputs only if the WANT_TO_STREAM tag is present.
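In messages mode, the tags are available in the metadata, so the filter can look like this sketch:

```python
for chunk_msg, metadata in graph.stream(inputs, stream_mode="messages"):
    # Pass through only chunks produced by a model tagged WANT_TO_STREAM
    if "WANT_TO_STREAM" in metadata.get("tags", []) and chunk_msg.content:
        print(chunk_msg.content, end="", flush=True)
```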

Tool Call Streaming

  • AIMessageChunk: a message output in real time, in token-sized units.

  • tool_call_chunks: chunks of a tool call. If tool_call_chunks is present, the chunks are printed cumulatively as they accumulate; this property is how tool tokens are detected (see the sketch below).
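A sketch that merges chunks as they arrive (AIMessageChunk supports + for this):

```python
from langchain_core.messages import AIMessageChunk

accumulated = None

for chunk_msg, metadata in graph.stream(inputs, stream_mode="messages"):
    if isinstance(chunk_msg, AIMessageChunk):
        # Adding chunks merges their content and tool_call_chunks
        accumulated = chunk_msg if accumulated is None else accumulated + chunk_msg
        if accumulated.tool_call_chunks:
            # Tool tokens: print the cumulatively merged tool call so far
            print(accumulated.tool_call_chunks, flush=True)
        elif chunk_msg.content:
            print(chunk_msg.content, end="", flush=True)
```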

Working with Subgraphs

In this part, we'll learn how to structure your graph using Subgraphs.

A subgraph is a compiled graph used as a node inside another (parent) graph, letting you define parts of a workflow as reusable graphs in their own right.

Example Flow

  1. The subgraph reuses the existing ability to search for the latest news.

  2. The parent graph adds the ability to generate social media posts based on the retrieved news (a sketch follows below).
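A sketch of this structure, reusing the compiled news graph from above as a subgraph node (node names and the post-writing prompt are illustrative):

```python
from typing import Annotated
from typing_extensions import TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.graph.message import add_messages

# The parent state shares the same message list as the subgraph
class ParentState(TypedDict):
    messages: Annotated[list, add_messages]

def create_sns_post(state: ParentState):
    # Turn the retrieved news into a short social media post
    request = ("human", "Write a short social media post based on the news above.")
    return {"messages": [llm.invoke(state["messages"] + [request])]}

parent_builder = StateGraph(ParentState)
parent_builder.add_node("search_news", graph)  # the compiled graph acts as a subgraph
parent_builder.add_node("sns_post", create_sns_post)
parent_builder.add_edge(START, "search_news")
parent_builder.add_edge("search_news", "sns_post")
parent_builder.add_edge("sns_post", END)
parent_app = parent_builder.compile()
```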

We can visualize the subgraph flow using the xray option.
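For example:

```python
from IPython.display import Image, display

display(Image(parent_app.get_graph(xray=True).draw_mermaid_png()))
```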


Now let's see how our graph processes and outputs the results when searching for AI news.

Including Subgraph Output

You can also include the output of subgraphs by passing subgraphs=True to stream().

The output will then take the form (namespace, chunk).
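A sketch (without subgraphs=True, the namespace element is absent and subgraph steps stay hidden):

```python
for namespace, chunk in parent_app.stream(inputs, stream_mode="updates", subgraphs=True):
    # namespace is () for the parent graph and ("search_news:<id>",) inside the subgraph
    for node_name, update in chunk.items():
        print(f"namespace={namespace} node={node_name}")
        update["messages"][-1].pretty_print()
```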

Streaming LLM Output Token by Token Inside Subgraphs

In the event stream, kind (the event's "event" field) indicates the type of event.

See the LangChain astream_events() reference for all event types.
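A sketch using astream_events(), which also surfaces events from inside the subgraph:

```python
async for event in parent_app.astream_events(inputs, version="v2"):
    kind = event["event"]  # the event type, e.g. "on_chat_model_stream"
    if kind == "on_chat_model_stream":
        chunk = event["data"]["chunk"]
        if chunk.content:
            print(chunk.content, end="", flush=True)
```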

For streaming output of only specific tags

ONLY_STREAM_TAGS lets you list the tags whose output you want to stream.

Here we see that "WANT_TO_STREAM2" is excluded from the output and only "WANT_TO_STREAM" is output.
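A sketch, assuming the subgraph's model is tagged WANT_TO_STREAM and the parent's post-writing model is tagged WANT_TO_STREAM2:

```python
ONLY_STREAM_TAGS = ["WANT_TO_STREAM"]  # allowed tags; WANT_TO_STREAM2 is filtered out

async for event in parent_app.astream_events(inputs, version="v2"):
    if event["event"] == "on_chat_model_stream":
        # Stream only if the event carries at least one allowed tag
        if any(tag in ONLY_STREAM_TAGS for tag in event.get("tags", [])):
            print(event["data"]["chunk"].content, end="", flush=True)
```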
