The StructuredOutputParser is a valuable tool for formatting Large Language Model (LLM) responses into dictionary structures, enabling the return of multiple fields as key/value pairs.
While Pydantic and JSON parsers offer more robust schema enforcement, the StructuredOutputParser is particularly effective for less capable models, such as local models with fewer parameters, which often struggle to satisfy the stricter formats that advanced models like GPT or Claude handle easily.
By using the StructuredOutputParser, developers can maintain data integrity and consistency across LLM applications, even when working with smaller models.
Environment Setup
[Note]
langchain-opentutorial is a package that provides easy-to-use environment setup, along with useful functions and utilities, for these tutorials.
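A typical installation step, assuming the package is published on PyPI (the exact set of companion packages may vary by tutorial):
%pip install -qU langchain-opentutorial langchain langchain-openai python-dotenv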
You can alternatively set OPENAI_API_KEY in a .env file and load it.
[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps.
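For reference, a .env file is a plain-text file in your project root. A minimal example follows; the key value shown is a placeholder, not a real key:
OPENAI_API_KEY=sk-...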
from dotenv import load_dotenv
# Load environment variables (such as OPENAI_API_KEY) from a .env file;
# returns True if a .env file was found and loaded
load_dotenv(override=True)
False
Implementing the StructuredOutputParser
Using ResponseSchema with StructuredOutputParser
Define a response schema using the ResponseSchema class to include the answer to the user's question and a description of the source (website) used.
Initialize StructuredOutputParser with response_schemas to structure the output according to the defined response schema.
[Note]
When using local models, Pydantic-based parsers often fail to produce valid structured output. In such cases, the StructuredOutputParser is a good alternative (a sketch of this setup appears at the end of this tutorial).
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

# Define the response schemas: the answer to the user's question,
# plus the source (website URL) used to produce it
response_schemas = [
    ResponseSchema(name="answer", description="Answer to the user's question"),
    ResponseSchema(
        name="source",
        description="The `source` used to answer the user's question, which should be a `website URL`.",
    ),
]
# Initialize the structured output parser based on the response schemas
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
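To see what the parser will ask the model to produce, print its format instructions. The exact wording varies by LangChain version, but it instructs the model to return a markdown ```json code block containing the answer and source keys:
# Inspect the instructions the parser injects into the prompt
print(output_parser.get_format_instructions())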
Embedding Response Schemas into Prompts
Create a PromptTemplate to format user questions and embed parsing instructions for structured outputs.
from langchain_core.prompts import PromptTemplate

# Retrieve the parser's format instructions
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    # Set up the template to answer the user's question as well as possible
    template="answer the user's question as well as possible.\n{format_instructions}\n{question}",
    # Use 'question' as the input variable
    input_variables=["question"],
    # Inject 'format_instructions' as a partial variable
    partial_variables={"format_instructions": format_instructions},
)
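To verify what the model will actually receive, you can render the template with a sample question (the question below is only an illustration):
# Render the final prompt text with the format instructions filled in
print(prompt.format(question="What is the largest desert in the world?"))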
Integrating with ChatOpenAI and Running the Chain
Combine the PromptTemplate, ChatOpenAI model, and StructuredOutputParser into a chain. Finally, run the chain with a specific question to produce results.
from langchain_openai import ChatOpenAI
model = ChatOpenAI(temperature=0) # Initialize the ChatOpenAI model
chain = prompt | model | output_parser # Connect the prompt, model, and output parser
# Ask the question, "What is the largest desert in the world?"
chain.invoke({"question": "What is the largest desert in the world?"})
{'answer': 'The largest desert in the world is the Antarctic Desert.',
'source': 'https://www.worldatlas.com/articles/what-is-the-largest-desert-in-the-world.html'}
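Under the hood, the parser extracts the ```json code block from the raw model text and converts it into a Python dictionary. You can also call it directly on a raw string; the string below is a hand-written example, not actual model output:
# Parse a raw model response into a dictionary
raw_output = '```json\n{"answer": "The Antarctic Desert.", "source": "https://example.com"}\n```'
output_parser.parse(raw_output)
# -> {'answer': 'The Antarctic Desert.', 'source': 'https://example.com'}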
Using Streamed Outputs
Use the chain.stream method to receive a streaming response to the question, "How many players are on a soccer team?"
for s in chain.stream({"question": "How many players are on a soccer team?"}):
    # Stream the output
    print(s)
{'answer': 'A standard soccer team consists of 11 players on the field at a time.', 'source': 'https://www.fifa.com/who-we-are/news/what-are-the-rules-of-football-2040008'}
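As noted earlier, this parser is a good fit for smaller local models. The following is a minimal sketch of the same chain running against a local model; it assumes the langchain-ollama package is installed and an Ollama server is running with the llama3.1 model pulled (both are assumptions, not part of this tutorial's setup):
from langchain_ollama import ChatOllama

# Swap in a local model; the rest of the chain is unchanged
local_model = ChatOllama(model="llama3.1", temperature=0)  # assumed local model name
local_chain = prompt | local_model | output_parser
local_chain.invoke({"question": "What is the largest desert in the world?"})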