Check Token Usage
Author: Haseom Shin
Proofread: Two-Jay
This is a part of the LangChain Open Tutorial.
Overview
This tutorial covers how to track and monitor token usage with LangChain and OpenAI API.
Token usage tracking is crucial for managing API costs and optimizing resource utilization.
In this tutorial, we will create a simple example to measure and monitor token consumption during OpenAI API calls using LangChain's CallbackHandler.

Table of Contents
Overview
Environment Setup
Implementing Check Token Usage
Monitoring Token Usage
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for tutorials. You can check out langchain-opentutorial for more details.
%%capture --no-stderr
# Install required packages
%pip install langchain-opentutorial
from langchain_opentutorial import package
package.install(
    [
        "langsmith",
        "langchain",
        "langchain_openai",
        "langchain_community",
    ],
    verbose=False,
    upgrade=False,
)
# Set environment variables
from langchain_opentutorial import set_env
set_env(
    {
        "OPENAI_API_KEY": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "04-CheckTokenUsage",
    }
)
You can alternatively set OPENAI_API_KEY in a .env file and load it.
[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps.
from dotenv import load_dotenv
load_dotenv()
True
Let's set up ChatOpenAI with the gpt-4o model.
from langchain_openai import ChatOpenAI
# Load the model
llm = ChatOpenAI(model_name="gpt-4o")
Implementing Check Token Usage
If you want to check token usage, you can use the get_openai_callback function.
# callback to track it
from langchain_community.callbacks.manager import get_openai_callback
with get_openai_callback() as cb:
    result = llm.invoke("where is the capital of United States?")
    print(cb)
Tokens Used: 28
    Prompt Tokens: 15
        Prompt Tokens Cached: 0
    Completion Tokens: 13
        Reasoning Tokens: 0
Successful Requests: 1
Total Cost (USD): $0.00016749999999999998
# callback to track it
with get_openai_callback() as cb:
    result = llm.invoke("where is the capital of United States?")
    print(f"Total tokens used: \t\t{cb.total_tokens}")
    print(f"Tokens used in prompt: \t\t{cb.prompt_tokens}")
    print(f"Tokens used in completion: \t{cb.completion_tokens}")
    print(f"Cost: \t\t\t\t${cb.total_cost}")
Total tokens used: 28
Tokens used in prompt: 15
Tokens used in completion: 13
Cost: $0.00016749999999999998
Monitoring Token Usage
Token usage monitoring is crucial for managing costs and resources when using the OpenAI API. LangChain provides an easy way to track this through get_openai_callback().
In this section, we'll explore token usage monitoring through three key scenarios:
Single Query Monitoring:
Track token usage for individual API calls
Distinguish between prompt and completion tokens
Calculate costs
Multiple Queries Monitoring:
Track cumulative token usage across multiple API calls
Analyze total costs
Note: Token usage monitoring with get_openai_callback is currently supported only for the OpenAI API.
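The Total Cost value the callback reports is a simple price-per-token calculation, so it can be reproduced by hand as a sanity check. The per-1M-token prices below are assumptions based on OpenAI's published gpt-4o rates at the time of writing; verify them against the current pricing page before relying on them.

```python
# Reproduce the callback's Total Cost by hand.
# Assumed gpt-4o rates (USD per 1M tokens) -- check OpenAI's
# pricing page, as these change over time.
PROMPT_PRICE_PER_1M = 2.50       # input (prompt) tokens
COMPLETION_PRICE_PER_1M = 10.00  # output (completion) tokens

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of a single gpt-4o call."""
    return (
        prompt_tokens * PROMPT_PRICE_PER_1M / 1_000_000
        + completion_tokens * COMPLETION_PRICE_PER_1M / 1_000_000
    )

# The first example reported 15 prompt tokens and 13 completion tokens.
print(f"${estimate_cost(15, 13)}")
```

With these rates, 15 prompt tokens and 13 completion tokens come to $0.0001675, matching the Total Cost shown by get_openai_callback above.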
# 1. Single Query Monitoring
print("1. Single Query Monitoring")
print("-" * 40)

with get_openai_callback() as cb:
    response = llm.invoke("What is the capital of France?")
    print(f"Response: {response.content}")
    print("-" * 40)
    print("Token Usage Statistics:")
    print(f"Prompt Tokens: \t\t{cb.prompt_tokens}")
    print(f"Completion Tokens: \t{cb.completion_tokens}")
    print(f"Total Tokens: \t\t{cb.total_tokens}")
    print(f"Cost: \t\t\t${cb.total_cost:.4f}\n")
1. Single Query Monitoring
----------------------------------------
Response: The capital of France is Paris.
----------------------------------------
Token Usage Statistics:
Prompt Tokens: 14
Completion Tokens: 8
Total Tokens: 22
Cost: $0.0001
# 2. Multiple Queries Monitoring
print("2. Multiple Queries Monitoring")
print("-" * 40)

with get_openai_callback() as cb:
    # First query
    response1 = llm.invoke("What is Python?")
    # Second query
    response2 = llm.invoke("What is JavaScript?")

    print(f"Response 1: {response1.content[:100]}...")
    print("-" * 40)
    print(f"Response 2: {response2.content[:100]}...")
    print("-" * 40)
    print("Cumulative Statistics:")
    print(f"Total Prompt Tokens: \t\t{cb.prompt_tokens}")
    print(f"Total Completion Tokens: \t{cb.completion_tokens}")
    print(f"Total Tokens: \t\t\t{cb.total_tokens}")
    print(f"Total Cost: \t\t\t${cb.total_cost:.4f}\n")
2. Multiple Queries Monitoring
----------------------------------------
Response 1: Python is a high-level, interpreted programming language known for its readability, simplicity, and ...
----------------------------------------
Response 2: JavaScript is a high-level, dynamic, untyped, and interpreted programming language that is widely us...
----------------------------------------
Cumulative Statistics:
Total Prompt Tokens: 23
Total Completion Tokens: 596
Total Tokens: 619
Total Cost: $0.0060
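Under the hood, get_openai_callback registers a callback handler that accumulates the usage numbers reported with each LLM response, which is why the statistics above cover both queries. The sketch below imitates that accumulation with a plain Python class; it is an illustration of the mechanism, not LangChain's actual implementation, and the per-call token counts are hypothetical splits of the totals shown above.

```python
# A simplified accumulator imitating how get_openai_callback
# aggregates usage across calls inside one `with` block.
# (Illustrative only -- LangChain's real handler also parses
# provider responses and looks up per-model prices.)
class UsageTracker:
    def __init__(self):
        self.prompt_tokens = 0
        self.completion_tokens = 0
        self.successful_requests = 0

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens

    def record(self, prompt_tokens, completion_tokens):
        # Called once per completed LLM request.
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens
        self.successful_requests += 1

tracker = UsageTracker()
tracker.record(9, 310)   # hypothetical split for "What is Python?"
tracker.record(14, 286)  # hypothetical split for "What is JavaScript?"
print(tracker.total_tokens, tracker.successful_requests)
```

Each record call corresponds to one completed request, so the totals match the cumulative statistics the callback printed for the two queries.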