This tutorial covers how to use PydanticOutputParser with Pydantic models.
PydanticOutputParser is a class that transforms the output of a language model into structured information. Instead of a free-form text response, it returns the information you need in a clear, organized form.
By using this class, you can make the output of your language model conform to a specific data model, which makes the information easier to process and use.
Main Methods
An output parser such as PydanticOutputParser is built around two core methods.
get_format_instructions(): Provide instructions that define the format of the information that the language model should output. For example, you can return instructions as a string that describes the fields of data that the language model should output and how they should be formatted. These instructions are very important for the language model to structure the output and transform it to fit your specific data model.
parse(): Takes the output of the language model (assumed to be a string) and analyzes and transforms it into a specific structure. Use a tool like Pydantic to validate the input string against a predefined schema and transform it into a data structure that follows that schema.
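To make the two-method contract concrete, here is a minimal, dependency-free sketch. It is not LangChain's implementation: the ToyOutputParser class and the Meeting dataclass are hypothetical names invented for this illustration, and the standard library stands in for Pydantic's validation.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Meeting:
    # Hypothetical target structure, used only for this illustration.
    organizer: str
    date: str


class ToyOutputParser:
    """Illustrative stand-in for an output parser (not LangChain's code)."""

    def __init__(self, target):
        self.target = target

    def get_format_instructions(self) -> str:
        # Describe the JSON keys the model is expected to produce.
        names = ", ".join(f'"{f.name}"' for f in fields(self.target))
        return f"Return a JSON object with exactly these string keys: {names}."

    def parse(self, text: str):
        # Convert the model's raw string into the target structure,
        # rejecting anything that does not match the expected shape.
        data = json.loads(text)
        expected = {f.name for f in fields(self.target)}
        if not isinstance(data, dict) or set(data) != expected:
            raise ValueError(f"expected a JSON object with keys {sorted(expected)}")
        for key, value in data.items():
            if not isinstance(value, str):
                raise ValueError(f"field {key!r} must be a string")
        return self.target(**data)


toy_parser = ToyOutputParser(Meeting)
print(toy_parser.get_format_instructions())

meeting = toy_parser.parse('{"organizer": "John", "date": "January 15th, 10:00 AM"}')
print(meeting.organizer, "|", meeting.date)
```

The instructions string is what gets placed in the prompt, and parse() is what runs on the reply. The real PydanticOutputParser follows the same division of labor, with a JSON schema generated from the Pydantic model and Pydantic performing the validation.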
Below is an example of an email conversation stored in the variable email_conversation.
email_conversation = """From: John (John@bikecorporation.me)
To: Kim (Kim@teddyinternational.me)
Subject: “ZENESIS” bike distribution cooperation and meeting schedule proposal

Dear Mr. Kim,

I am John, Senior Executive Director at Bike Corporation. I recently learned about your new bicycle model, "ZENESIS," through your press release. Bike Corporation is a company that leads innovation and quality in the field of bicycle manufacturing and distribution, with long-time experience and expertise in this field.

We would like to request a detailed brochure for the ZENESIS model. In particular, we need information on technical specifications, battery performance, and design aspects. This information will help us further refine our proposed distribution strategy and marketing plan.

Additionally, to discuss the possibilities for collaboration in more detail, I propose a meeting next Tuesday, January 15th, at 10:00 AM. Would it be possible to meet at your office to have this discussion?

Thank you.

Best regards,
John
Senior Executive Director
Bike Corporation"""
First, here is an example that does not use an output parser (PydanticOutputParser).
from langchain_core.messages import AIMessageChunk
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI  # provided by the langchain-openai package

prompt = PromptTemplate.from_template(
    "Please extract the important parts of the following email.\n\n{email_conversation}"
)

llm = ChatOpenAI(temperature=0, model_name="gpt-4o-mini")

chain = prompt | llm | StrOutputParser()

answer = chain.stream({"email_conversation": email_conversation})


# A function for real-time output (streaming)
def stream_response(response, return_output=False):
    """
    Streams the response from the AI model, processing and printing each chunk.

    This function iterates over each item in the 'response' iterable. If an item
    is an instance of AIMessageChunk, it extracts and prints the content. If the
    item is a string, it prints the string directly. Optionally, the function can
    return the concatenated string of all response chunks.

    Args:
    - response (iterable): An iterable of response chunks, which can be
      AIMessageChunk objects or strings.
    - return_output (bool, optional): If True, the function returns the
      concatenated response string. The default is False.

    Returns:
    - str: If `return_output` is True, the concatenated response string.
      Otherwise, nothing is returned.
    """
    answer = ""
    for token in response:
        if isinstance(token, AIMessageChunk):
            answer += token.content
            print(token.content, end="", flush=True)
        elif isinstance(token, str):
            answer += token
            print(token, end="", flush=True)
    if return_output:
        return answer


output = stream_response(answer, return_output=True)
**Important Parts of the Email:**
- **Sender:** John (Senior Executive Director, Bike Corporation)
- **Recipient:** Kim (Teddy International)
- **Subject:** ZENESIS bike distribution cooperation and meeting schedule proposal
- **Request:** Detailed brochure for the ZENESIS model, specifically information on:
  - Technical specifications
  - Battery performance
  - Design aspects
- **Purpose:** To refine distribution strategy and marketing plan for ZENESIS.
- **Proposed Meeting:**
  - Date: Tuesday, January 15th
  - Time: 10:00 AM
  - Location: Kim's office
- **Closing:** Thank you and best regards.
print(output)
**Important Parts of the Email:**
- **Sender:** John (Senior Executive Director, Bike Corporation)
- **Recipient:** Kim (Teddy International)
- **Subject:** ZENESIS bike distribution cooperation and meeting schedule proposal
- **Request:** Detailed brochure for the ZENESIS model, specifically information on:
  - Technical specifications
  - Battery performance
  - Design aspects
- **Purpose:** To refine distribution strategy and marketing plan for ZENESIS.
- **Proposed Meeting:**
  - Date: Tuesday, January 15th
  - Time: 10:00 AM
  - Location: Kim's office
- **Closing:** Thank you and best regards.
Use PydanticOutputParser
When provided with email content like the one above, we will parse the email information using the class defined in the Pydantic style below.
For reference, the description inside the Field serves as guidance for extracting key information from text-based responses. LLMs rely on this description to extract the required information. Therefore, it is crucial that this description is accurate and clear.
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field


class EmailSummary(BaseModel):
    person: str = Field(description="The sender of the email")
    email: str = Field(description="The email address of the sender")
    subject: str = Field(description="The subject of the email")
    summary: str = Field(description="A summary of the email content")
    date: str = Field(
        description="The meeting date and time mentioned in the email content"
    )


# Create PydanticOutputParser
parser = PydanticOutputParser(pydantic_object=EmailSummary)
# Print the format instructions.
print(parser.get_format_instructions())
The output should be formatted as a JSON instance that conforms to the JSON schema below.
As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
Here is the output schema:
```
{"properties": {"person": {"description": "The sender of the email", "title": "Person", "type": "string"}, "email": {"description": "The email address of the sender", "title": "Email", "type": "string"}, "subject": {"description": "The subject of the email", "title": "Subject", "type": "string"}, "summary": {"description": "A summary of the email content", "title": "Summary", "type": "string"}, "date": {"description": "The meeting date and time mentioned in the email content", "title": "Date", "type": "string"}}, "required": ["person", "email", "subject", "summary", "date"]}
```
Defining the prompt:
question: Receives the user's question.
email_conversation: Inputs the content of the email conversation.
format: Receives the format instructions generated by the parser.
prompt = PromptTemplate.from_template(
    """
You are a helpful assistant.

QUESTION:
{question}

EMAIL CONVERSATION:
{email_conversation}

FORMAT:
{format}
"""
)

# Pre-fill the format variable with the PydanticOutputParser instructions
prompt = prompt.partial(format=parser.get_format_instructions())
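The partial() call pre-fills one template variable so that only the remaining ones have to be supplied when the chain runs. The same idea can be sketched with the standard library alone; the shortened TEMPLATE string and the fill_prompt name below are illustrative assumptions, not part of LangChain.

```python
from functools import partial

# Shortened stand-in for the tutorial's template; the wording is illustrative.
TEMPLATE = "QUESTION:\n{question}\n\nFORMAT:\n{format}"

# Fix the `format` variable up front, similar in spirit to prompt.partial(format=...).
fill_prompt = partial(TEMPLATE.format, format="Return a JSON object.")

# Only the remaining variable is needed at call time.
rendered = fill_prompt(question="Extract the main content of the email.")
print(rendered)
```

Because the format instructions depend only on the parser and never on the user's input, fixing them once keeps each later call down to the variables that actually change.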
Next, create a Chain.
# Create a chain.
chain = prompt | llm
Execute the chain and review the results.
# Execute the chain and print the result.
response = chain.stream(
    {
        "email_conversation": email_conversation,
        "question": "Extract the main content of the email.",
    }
)

# The result is provided in JSON format.
output = stream_response(response, return_output=True)
```json
{
  "person": "John",
  "email": "John@bikecorporation.me",
  "subject": "ZENESIS bike distribution cooperation and meeting schedule proposal",
  "summary": "John from Bike Corporation requests a detailed brochure for the ZENESIS bike model, including technical specifications, battery performance, and design aspects. He also proposes a meeting on January 15th at 10:00 AM to discuss collaboration possibilities.",
  "date": "January 15th, 10:00 AM"
}
```
Finally, use the parser to parse the results and convert them into an EmailSummary object.
# Parse the results using PydanticOutputParser.
structured_output = parser.parse(output)
print(structured_output)
person='John' email='John@bikecorporation.me' subject='ZENESIS bike distribution cooperation and meeting schedule proposal' summary='John from Bike Corporation requests a detailed brochure for the ZENESIS bike model, including technical specifications, battery performance, and design aspects. He also proposes a meeting on January 15th at 10:00 AM to discuss collaboration possibilities.' date='January 15th, 10:00 AM'
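Note that the streamed reply was wrapped in a json code fence, yet parser.parse() still accepted it. The sketch below shows one way such a fence could be stripped before the JSON is decoded. The extract_json helper is hypothetical and written only for this illustration; it is not the code LangChain runs internally.

```python
import json
import re

# Three backticks, built programmatically so this example stays readable.
FENCE = "`" * 3


def extract_json(text: str) -> dict:
    """Hypothetical helper: decode JSON that may be wrapped in a code fence."""
    pattern = re.compile(FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE, re.DOTALL)
    match = pattern.search(text)
    # Fall back to the raw text when the reply is not fenced.
    payload = match.group(1) if match else text
    return json.loads(payload)


reply = f'{FENCE}json\n{{"person": "John", "date": "January 15th, 10:00 AM"}}\n{FENCE}'
data = extract_json(reply)
print(data["person"], "|", data["date"])
```

Once the text has been decoded into a dictionary, validating it against the model's fields is the step that Pydantic performs in the real parser.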
Create chain with parser
By adding the parser as the last step of the chain, you receive the output directly as an instance of the Pydantic class you defined.
# Reconstruct the entire chain by adding an output parser.
chain = prompt | llm | parser
# Execute the chain and print the results.
response = chain.invoke(
    {
        "email_conversation": email_conversation,
        "question": "Extract the main content of the email.",
    }
)

# The results are output in the form of an EmailSummary object.
print(response)
person='John' email='John@bikecorporation.me' subject='ZENESIS bike distribution cooperation and meeting schedule proposal' summary='John from Bike Corporation requests a detailed brochure for the ZENESIS bike model, including technical specifications, battery performance, and design aspects. He also proposes a meeting on January 15th at 10:00 AM to discuss collaboration possibilities.' date='January 15th, 10:00 AM'
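To follow how data moves through prompt | llm | parser without calling a model, the toy pipeline below may help. The Step class is a hypothetical stand-in for LangChain's runnables, and fake_llm returns a canned reply, so this is only a sketch of the composition pattern under those assumptions.

```python
import json


class Step:
    """Hypothetical stand-in for a runnable: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Compose left to right: the output of this step feeds the next one.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Each stage mirrors one part of the tutorial's chain.
prompt_step = Step(lambda inputs: f"Extract the sender from: {inputs['email']}")
fake_llm = Step(lambda prompt_text: '{"person": "John"}')  # canned reply, no model call
parser_step = Step(json.loads)

toy_chain = prompt_step | fake_llm | parser_step
result = toy_chain.invoke({"email": "From: John (John@bikecorporation.me)"})
print(result)
```

The parser sits last, so whatever invoke() returns is already structured data rather than a string that the caller has to parse.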
with_structured_output()
By calling .with_structured_output(Pydantic) on the model, you attach output parsing to the model itself and receive the result as a Pydantic object.
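# Bind the EmailSummary schema to the model defined earlier.
llm_with_structured = llm.with_structured_output(EmailSummary)

# Call `invoke()` and display the result.
answer = llm_with_structured.invoke(email_conversation)
answer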
EmailSummary(person='John', email='John@bikecorporation.me', subject='“ZENESIS” bike distribution cooperation and meeting schedule proposal', summary='John, Senior Executive Director at Bike Corporation, is interested in the ZENESIS bicycle model and requests a detailed brochure including technical specifications, battery performance, and design aspects. He proposes a meeting to discuss potential collaboration.', date='Tuesday, January 15th, at 10:00 AM')
Note
One thing to note is that the .with_structured_output() method does not support stream().