The StructuredOutputParser is a valuable tool for formatting Large Language Model (LLM) responses into dictionary structures, enabling multiple fields to be returned as key/value pairs. While Pydantic and JSON parsers offer more robust capabilities, the StructuredOutputParser is particularly effective for less powerful models, such as local models with fewer parameters, which often struggle to produce the strictly formatted output those parsers require. By using the StructuredOutputParser, developers can maintain data integrity and consistency across LLM applications, even when working with smaller models.
You can alternatively set OPENAI_API_KEY in a .env file and load it.
[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps.
from dotenv import load_dotenv

load_dotenv(override=True)
False
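Here load_dotenv() returned False, meaning no .env file was found. In that case, one alternative is to set the variable directly in the environment (the placeholder value below is illustrative; substitute your own key):

```python
import os

# Set the key only if it is not already present in the environment
os.environ.setdefault("OPENAI_API_KEY", "<your-openai-api-key>")
```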
Implementing Structured Output Parser
Using ResponseSchema with StructuredOutputParser
Define a response schema using the ResponseSchema class to include the answer to the user's question and a description of the source (website) used.
Initialize StructuredOutputParser with response_schemas to structure the output according to the defined response schema.
[Note] When using local models, Pydantic parsers may frequently fail to work properly. In such cases, using StructuredOutputParser can be a good alternative solution.
from langchain.output_parsers import ResponseSchema, StructuredOutputParser

# Response to the user's question
response_schemas = [
    ResponseSchema(name="answer", description="Answer to the user's question"),
    ResponseSchema(
        name="source",
        description="The `source` used to answer the user's question, which should be a `website URL`.",
    ),
]
# Initialize the structured output parser based on the response schemas
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
Embedding Response Schemas into Prompts
Create a PromptTemplate to format user questions and embed parsing instructions for structured outputs.
from langchain_core.prompts import PromptTemplate

# Parse the format instructions.
format_instructions = output_parser.get_format_instructions()
prompt = PromptTemplate(
    # Set up the template to answer the user's question as best as possible.
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    # Use 'question' as the input variable.
    input_variables=["question"],
    # Use 'format_instructions' as a partial variable.
    partial_variables={"format_instructions": format_instructions},
)
Integrating with ChatOpenAI and Running the Chain
Combine the PromptTemplate, ChatOpenAI model, and StructuredOutputParser into a chain. Finally, run the chain with a specific question to produce results.
from langchain_openai import ChatOpenAI

model = ChatOpenAI(temperature=0)  # Initialize the ChatOpenAI model

chain = prompt | model | output_parser  # Connect the prompt, model, and output parser

# Ask the question, "What is the largest desert in the world?"
chain.invoke({"question": "What is the largest desert in the world?"})
{'answer': 'The largest desert in the world is the Antarctic Desert.',
'source': 'https://www.worldatlas.com/articles/what-is-the-largest-desert-in-the-world.html'}
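Conceptually, the parser's format instructions tell the model to wrap its answer in a ```json code block, which the parser then extracts and deserializes into a dictionary. A minimal stdlib sketch of that parsing step (the regular expression and helper name here are illustrative, not LangChain's actual implementation):

```python
import json
import re

def parse_structured_output(text: str) -> dict:
    """Extract a ```json fenced block from model output and parse it."""
    match = re.search(r"```json\s*(\{.*?\})\s*```", text, re.DOTALL)
    if match is None:
        raise ValueError("No JSON code block found in model output")
    return json.loads(match.group(1))

# A raw model response that follows the format instructions
raw = """```json
{
    "answer": "The largest desert in the world is the Antarctic Desert.",
    "source": "https://www.worldatlas.com/articles/what-is-the-largest-desert-in-the-world.html"
}
```"""

result = parse_structured_output(raw)
print(result["answer"])
```

This is also why the approach suits weaker models: emitting a small JSON block is an easier formatting task than satisfying a full Pydantic schema.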
Using Streamed Outputs
Use the chain.stream method to receive a streaming response to the question, "How many players are on a soccer team?"
# Stream the output
for s in chain.stream({"question": "How many players are on a soccer team?"}):
    print(s)
{'answer': 'A standard soccer team consists of 11 players on the field at a time.', 'source': 'https://www.fifa.com/who-we-are/news/what-are-the-rules-of-football-2040008'}