Check Token Usage

Overview

This tutorial covers how to track and monitor token usage with LangChain and OpenAI API.

Token usage tracking is crucial for managing API costs and optimizing resource utilization.

In this tutorial, we will create a simple example to measure and monitor token consumption during OpenAI API calls using LangChain's CallbackHandler.


Table of Contents

  • Overview

  • Environment Setup

  • Implementing Check Token Usage

  • Monitoring Token Usage


Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup, along with useful functions and utilities for these tutorials.

  • You can check out langchain-opentutorial for more details.

%%capture --no-stderr
%pip install langchain-opentutorial
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain",
        "langchain_openai",
        "langchain_community",
    ],
    verbose=False,
    upgrade=False,
)
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "OPENAI_API_KEY": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "04-CheckTokenUsage",
    }
)

You can alternatively set OPENAI_API_KEY in a .env file and load it.

[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps.

from dotenv import load_dotenv

load_dotenv()
True

Let's set up ChatOpenAI with the gpt-4o model.

from langchain_openai import ChatOpenAI

# Load the model
llm = ChatOpenAI(model="gpt-4o")

Implementing Check Token Usage

To check token usage, use the get_openai_callback context manager.

# callback to track it
from langchain_community.callbacks.manager import get_openai_callback

with get_openai_callback() as cb:
    result = llm.invoke("where is the capital of United States?")
    print(cb)
Tokens Used: 28
    	Prompt Tokens: 15
    		Prompt Tokens Cached: 0
    	Completion Tokens: 13
    		Reasoning Tokens: 0
    Successful Requests: 1
    Total Cost (USD): $0.00016749999999999998
# callback to track it
with get_openai_callback() as cb:
    result = llm.invoke("where is the capital of United States?")
    print(f"Total tokens used: \t\t{cb.total_tokens}")
    print(f"Tokens used in prompt: \t\t{cb.prompt_tokens}")
    print(f"Tokens used in completion: \t{cb.completion_tokens}")
    print(f"Cost: \t\t\t\t${cb.total_cost}")
Total tokens used: 		28
    Tokens used in prompt: 		15
    Tokens used in completion: 	13
    Cost: 				$0.00016749999999999998
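The reported cost is simply the token counts multiplied by per-token prices. As a sanity check, the figure above is consistent with gpt-4o list prices of $2.50 per 1M input tokens and $10.00 per 1M output tokens (these prices are assumptions and change over time; this is an illustration, not LangChain's pricing table):

```python
# Reproduce the callback's cost figure from raw token counts.
# Prices are assumptions (USD per 1M tokens) and may be outdated.
INPUT_PRICE_PER_1M = 2.50
OUTPUT_PRICE_PER_1M = 10.00

def estimate_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one request in USD from its token counts."""
    return (
        prompt_tokens * INPUT_PRICE_PER_1M / 1_000_000
        + completion_tokens * OUTPUT_PRICE_PER_1M / 1_000_000
    )

# 15 prompt tokens + 13 completion tokens, as in the output above
print(estimate_cost(15, 13))  # ~0.0001675, matching the callback's figure
```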

Monitoring Token Usage

Token usage monitoring is crucial for managing costs and resources when using the OpenAI API. LangChain provides an easy way to track this through get_openai_callback().

In this section, we'll explore token usage monitoring through two key scenarios:

  1. Single Query Monitoring:

    • Track token usage for individual API calls

    • Distinguish between prompt and completion tokens

    • Calculate costs

  2. Multiple Queries Monitoring:

    • Track cumulative token usage across multiple API calls

    • Analyze total costs

Note: get_openai_callback currently reports token usage and cost only for the OpenAI API.
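Conceptually, get_openai_callback is a context manager that registers a callback handler whose counters grow with every LLM response made inside the block. A simplified, pure-Python sketch of that pattern (all names here are hypothetical, not LangChain's actual implementation):

```python
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class TokenStats:
    """Accumulates usage across all calls made inside the context."""
    prompt_tokens: int = 0
    completion_tokens: int = 0
    successful_requests: int = 0

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    def on_llm_end(self, usage: dict) -> None:
        # A handler method like this would receive each LLM result.
        self.prompt_tokens += usage.get("prompt_tokens", 0)
        self.completion_tokens += usage.get("completion_tokens", 0)
        self.successful_requests += 1


@contextmanager
def track_tokens():
    stats = TokenStats()
    # The real implementation registers `stats` as a callback handler here...
    yield stats
    # ...and unregisters it on exit.


with track_tokens() as cb:
    # Simulated per-call usage payloads instead of real API responses.
    cb.on_llm_end({"prompt_tokens": 15, "completion_tokens": 13})
    cb.on_llm_end({"prompt_tokens": 14, "completion_tokens": 8})
    print(cb.total_tokens)  # prints 50
```

This is why the counters are cumulative in the multiple-queries scenario below: every call inside the block feeds the same stats object.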

# 1. Single Query Monitoring
print("1. Single Query Monitoring")
print("-" * 40)

with get_openai_callback() as cb:
    response = llm.invoke("What is the capital of France?")
    print(f"Response: {response.content}")
    print("-" * 40)
    print("Token Usage Statistics:")
    print(f"Prompt Tokens: \t\t{cb.prompt_tokens}")
    print(f"Completion Tokens: \t{cb.completion_tokens}")
    print(f"Total Tokens: \t\t{cb.total_tokens}")
    print(f"Cost: \t\t\t${cb.total_cost:.4f}\n")
1. Single Query Monitoring
    ----------------------------------------
    Response: The capital of France is Paris.
    ----------------------------------------
    Token Usage Statistics:
    Prompt Tokens: 		14
    Completion Tokens: 	8
    Total Tokens: 		22
    Cost: 			$0.0001
    
# 2. Multiple Queries Monitoring
print("2. Multiple Queries Monitoring")
print("-" * 40)

with get_openai_callback() as cb:
    # First query
    response1 = llm.invoke("What is Python?")
    # Second query
    response2 = llm.invoke("What is JavaScript?")

    print(f"Response 1: {response1.content[:100]}...")
    print("-" * 40)
    print(f"Response 2: {response2.content[:100]}...")
    print("-" * 40)
    print("Cumulative Statistics:")
    print(f"Total Prompt Tokens: \t\t{cb.prompt_tokens}")
    print(f"Total Completion Tokens: \t{cb.completion_tokens}")
    print(f"Total Tokens: \t\t\t{cb.total_tokens}")
    print(f"Total Cost: \t\t\t${cb.total_cost:.4f}\n")
2. Multiple Queries Monitoring
    ----------------------------------------
    Response 1: Python is a high-level, interpreted programming language known for its readability, simplicity, and ...
    ----------------------------------------
    Response 2: JavaScript is a high-level, dynamic, untyped, and interpreted programming language that is widely us...
    ----------------------------------------
    Cumulative Statistics:
    Total Prompt Tokens: 		23
    Total Completion Tokens: 	596
    Total Tokens: 			619
    Total Cost: 			$0.0060
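Because the counters inside one get_openai_callback block only ever grow, per-query numbers can be recovered by snapshotting cb.total_tokens before and after each call. A pure-Python illustration of the snapshot pattern (Counter and fake_invoke are stand-ins for the callback object and llm.invoke, not LangChain APIs):

```python
class Counter:
    """Stand-in for the callback object: a monotonically growing total."""
    def __init__(self):
        self.total_tokens = 0


def fake_invoke(cb, tokens):
    # Stand-in for llm.invoke(...): the real call grows cb's counters.
    cb.total_tokens += tokens


cb = Counter()
per_query = []
for tokens in (28, 22):          # simulated usage of two queries
    before = cb.total_tokens     # snapshot before the call
    fake_invoke(cb, tokens)
    per_query.append(cb.total_tokens - before)  # this call's share

print(per_query)        # prints [28, 22]
print(cb.total_tokens)  # prints 50
```

The same before/after bookkeeping works unchanged with the real callback object, since only its attribute reads are involved.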
    

Last updated