You can alternatively set OPENAI_API_KEY in a .env file and load it.
[Note] This is not necessary if you've already set OPENAI_API_KEY in the previous steps.
from dotenv import load_dotenv
load_dotenv()
True
Let's set up ChatOpenAI with the gpt-4o model.
from langchain_openai import ChatOpenAI
# Load the model
llm = ChatOpenAI(model="gpt-4o")
Checking Token Usage
If you want to check token usage, you can use the get_openai_callback function.
# callback to track it
from langchain_community.callbacks.manager import get_openai_callback
with get_openai_callback() as cb:
    result = llm.invoke("where is the capital of United States?")
    print(cb)
# callback to track it
with get_openai_callback() as cb:
    result = llm.invoke("where is the capital of United States?")
    print(f"Total tokens used: \t\t{cb.total_tokens}")
    print(f"Tokens used in prompt: \t\t{cb.prompt_tokens}")
    print(f"Tokens used in completion: \t{cb.completion_tokens}")
    print(f"Cost: \t\t\t\t${cb.total_cost}")
Total tokens used: 28
Tokens used in prompt: 15
Tokens used in completion: 13
Cost: $0.00016749999999999998
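As a sanity check, the reported cost can be reproduced by hand from the token counts. The per-token rates below ($2.50 per 1M input tokens and $10.00 per 1M output tokens for gpt-4o) are an assumption based on OpenAI's published pricing at the time of writing and may change:

```python
# Manual cost check for the run above.
# Assumed gpt-4o rates: $2.50 / 1M input tokens, $10.00 / 1M output tokens.
# (These rates are an assumption; confirm on OpenAI's pricing page.)
prompt_tokens = 15
completion_tokens = 13

cost = prompt_tokens * 2.50 / 1_000_000 + completion_tokens * 10.00 / 1_000_000
print(f"{cost:.7f}")  # 0.0001675, matching cb.total_cost above
```

The small trailing digits in the printed cb.total_cost (…49999999…) are ordinary floating-point noise, not a billing discrepancy.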
Monitoring Token Usage
Token usage monitoring is crucial for managing costs and resources when using the OpenAI API. LangChain provides an easy way to track this through get_openai_callback().
In this section, we'll explore token usage monitoring through three key scenarios:
Single Query Monitoring:
Track token usage for individual API calls
Distinguish between prompt and completion tokens
Calculate costs
Multiple Queries Monitoring:
Track cumulative token usage across multiple API calls
Analyze total costs
Note: Token usage monitoring is currently only supported for the OpenAI API.
# 1. Single Query Monitoring
print("1. Single Query Monitoring")
print("-" * 40)
with get_openai_callback() as cb:
    response = llm.invoke("What is the capital of France?")
    print(f"Response: {response.content}")
    print("-" * 40)
    print("Token Usage Statistics:")
    print(f"Prompt Tokens: \t\t{cb.prompt_tokens}")
    print(f"Completion Tokens: \t{cb.completion_tokens}")
    print(f"Total Tokens: \t\t{cb.total_tokens}")
    print(f"Cost: \t\t\t${cb.total_cost:.4f}\n")
1. Single Query Monitoring
----------------------------------------
Response: The capital of France is Paris.
----------------------------------------
Token Usage Statistics:
Prompt Tokens: 14
Completion Tokens: 8
Total Tokens: 22
Cost: $0.0001
# 2. Multiple Queries Monitoring
print("2. Multiple Queries Monitoring")
print("-" * 40)
with get_openai_callback() as cb:
    # First query
    response1 = llm.invoke("What is Python?")
    # Second query
    response2 = llm.invoke("What is JavaScript?")

    print(f"Response 1: {response1.content[:100]}...")
    print("-" * 40)
    print(f"Response 2: {response2.content[:100]}...")
    print("-" * 40)
    print("Cumulative Statistics:")
    print(f"Total Prompt Tokens: \t\t{cb.prompt_tokens}")
    print(f"Total Completion Tokens: \t{cb.completion_tokens}")
    print(f"Total Tokens: \t\t\t{cb.total_tokens}")
    print(f"Total Cost: \t\t\t${cb.total_cost:.4f}\n")
2. Multiple Queries Monitoring
----------------------------------------
Response 1: Python is a high-level, interpreted programming language known for its readability, simplicity, and ...
----------------------------------------
Response 2: JavaScript is a high-level, dynamic, untyped, and interpreted programming language that is widely us...
----------------------------------------
Cumulative Statistics:
Total Prompt Tokens: 23
Total Completion Tokens: 596
Total Tokens: 619
Total Cost: $0.0060
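Note that the cumulative cost is rounded to four decimal places by the `:.4f` format specifier. Under the same assumed gpt-4o rates as before ($2.50 / 1M input tokens, $10.00 / 1M output tokens), the exact figure behind the displayed $0.0060 works out as:

```python
# Reproduce the cumulative cost above from the token counts.
# Assumed gpt-4o rates: $2.50 / 1M input, $10.00 / 1M output (may change).
prompt_tokens = 23
completion_tokens = 596

cost = prompt_tokens * 2.50 / 1_000_000 + completion_tokens * 10.00 / 1_000_000
print(f"${cost:.4f}")  # $0.0060 (exact value is $0.0060175 before rounding)
```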