LangChain OpenTutorial
  • 🦜️🔗 The LangChain Open Tutorial for Everyone
  • 01-Basic
    • Getting Started on Windows
    • 02-Getting-Started-Mac
    • OpenAI API Key Generation and Testing Guide
    • LangSmith Tracking Setup
    • Using the OpenAI API (GPT-4o Multimodal)
    • Basic Example: Prompt+Model+OutputParser
    • LCEL Interface
    • Runnable
  • 02-Prompt
    • Prompt Template
    • Few-Shot Templates
    • LangChain Hub
    • Personal Prompts for LangChain
    • Prompt Caching
  • 03-OutputParser
    • PydanticOutputParser
    • PydanticOutputParser
    • CommaSeparatedListOutputParser
    • Structured Output Parser
    • JsonOutputParser
    • PandasDataFrameOutputParser
    • DatetimeOutputParser
    • EnumOutputParser
    • Output Fixing Parser
  • 04-Model
    • Using Various LLM Models
    • Chat Models
    • Caching
    • Caching VLLM
    • Model Serialization
    • Check Token Usage
    • Google Generative AI
    • Huggingface Endpoints
    • HuggingFace Local
    • HuggingFace Pipeline
    • ChatOllama
    • GPT4ALL
    • Video Q&A LLM (Gemini)
  • 05-Memory
    • ConversationBufferMemory
    • ConversationBufferWindowMemory
    • ConversationTokenBufferMemory
    • ConversationEntityMemory
    • ConversationKGMemory
    • ConversationSummaryMemory
    • VectorStoreRetrieverMemory
    • LCEL (Remembering Conversation History): Adding Memory
    • Memory Using SQLite
    • Conversation With History
  • 06-DocumentLoader
    • Document & Document Loader
    • PDF Loader
    • WebBaseLoader
    • CSV Loader
    • Excel File Loading in LangChain
    • Microsoft Word(doc, docx) With Langchain
    • Microsoft PowerPoint
    • TXT Loader
    • JSON
    • Arxiv Loader
    • UpstageDocumentParseLoader
    • LlamaParse
    • HWP (Hangeul) Loader
  • 07-TextSplitter
    • Character Text Splitter
    • 02. RecursiveCharacterTextSplitter
    • Text Splitting Methods in NLP
    • TokenTextSplitter
    • SemanticChunker
    • Split code with Langchain
    • MarkdownHeaderTextSplitter
    • HTMLHeaderTextSplitter
    • RecursiveJsonSplitter
  • 08-Embedding
    • OpenAI Embeddings
    • CacheBackedEmbeddings
    • HuggingFace Embeddings
    • Upstage
    • Ollama Embeddings With Langchain
    • LlamaCpp Embeddings With Langchain
    • GPT4ALL
    • Multimodal Embeddings With Langchain
  • 09-VectorStore
    • Vector Stores
    • Chroma
    • Faiss
    • Pinecone
    • Qdrant
    • Elasticsearch
    • MongoDB Atlas
    • PGVector
    • Neo4j
    • Weaviate
    • Faiss
    • {VectorStore Name}
  • 10-Retriever
    • VectorStore-backed Retriever
    • Contextual Compression Retriever
    • Ensemble Retriever
    • Long Context Reorder
    • Parent Document Retriever
    • MultiQueryRetriever
    • MultiVectorRetriever
    • Self-querying
    • TimeWeightedVectorStoreRetriever
    • TimeWeightedVectorStoreRetriever
    • Kiwi BM25 Retriever
    • Ensemble Retriever with Convex Combination (CC)
  • 11-Reranker
    • Cross Encoder Reranker
    • JinaReranker
    • FlashRank Reranker
  • 12-RAG
    • Understanding the basic structure of RAG
    • RAG Basic WebBaseLoader
    • Exploring RAG in LangChain
    • RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval
    • Conversation-With-History
    • Translation
    • Multi Modal RAG
  • 13-LangChain-Expression-Language
    • RunnablePassthrough
    • Inspect Runnables
    • RunnableLambda
    • Routing
    • Runnable Parallel
    • Configure-Runtime-Chain-Components
    • Creating Runnable objects with chain decorator
    • RunnableWithMessageHistory
    • Generator
    • Binding
    • Fallbacks
    • RunnableRetry
    • WithListeners
    • How to stream runnables
  • 14-Chains
    • Summarization
    • SQL
    • Structured Output Chain
    • StructuredDataChat
  • 15-Agent
    • Tools
    • Bind Tools
    • Tool Calling Agent
    • Tool Calling Agent with More LLM Models
    • Iteration-human-in-the-loop
    • Agentic RAG
    • CSV/Excel Analysis Agent
    • Agent-with-Toolkits-File-Management
    • Make Report Using RAG, Web searching, Image generation Agent
    • TwoAgentDebateWithTools
    • React Agent
  • 16-Evaluations
    • Generate synthetic test dataset (with RAGAS)
    • Evaluation using RAGAS
    • HF-Upload
    • LangSmith-Dataset
    • LLM-as-Judge
    • Embedding-based Evaluator(embedding_distance)
    • LangSmith Custom LLM Evaluation
    • Heuristic Evaluation
    • Compare experiment evaluations
    • Summary Evaluators
    • Groundedness Evaluation
    • Pairwise Evaluation
    • LangSmith Repeat Evaluation
    • LangSmith Online Evaluation
    • LangFuse Online Evaluation
  • 17-LangGraph
    • 01-Core-Features
      • Understanding Common Python Syntax Used in LangGraph
      • Title
      • Building a Basic Chatbot with LangGraph
      • Building an Agent with LangGraph
      • Agent with Memory
      • LangGraph Streaming Outputs
      • Human-in-the-loop
      • LangGraph Manual State Update
      • Asking Humans for Help: Customizing State in LangGraph
      • DeleteMessages
      • DeleteMessages
      • LangGraph ToolNode
      • LangGraph ToolNode
      • Branch Creation for Parallel Node Execution
      • Conversation Summaries with LangGraph
      • Conversation Summaries with LangGraph
      • LangGrpah Subgraph
      • How to transform the input and output of a subgraph
      • LangGraph Streaming Mode
      • Errors
      • A Long-Term Memory Agent
    • 02-Structures
      • LangGraph-Building-Graphs
      • Naive RAG
      • Add Groundedness Check
      • Adding a Web Search Module
      • LangGraph-Add-Query-Rewrite
      • Agentic RAG
      • Adaptive RAG
      • Multi-Agent Structures (1)
      • Multi Agent Structures (2)
    • 03-Use-Cases
      • LangGraph Agent Simulation
      • Meta Prompt Generator based on User Requirements
      • CRAG: Corrective RAG
      • Plan-and-Execute
      • Multi Agent Collaboration Network
      • Multi Agent Collaboration Network
      • Multi-Agent Supervisor
      • 08-LangGraph-Hierarchical-Multi-Agent-Teams
      • 08-LangGraph-Hierarchical-Multi-Agent-Teams
      • SQL-Agent
      • 10-LangGraph-Research-Assistant
      • LangGraph Code Assistant
      • Deploy on LangGraph Cloud
      • Tree of Thoughts (ToT)
      • Ollama Deep Researcher (Deepseek-R1)
      • Functional API
      • Reflection in LangGraph
  • 19-Cookbook
    • 01-SQL
      • TextToSQL
      • SpeechToSQL
    • 02-RecommendationSystem
      • ResumeRecommendationReview
    • 03-GraphDB
      • Movie QA System with Graph Database
      • 05-TitanicQASystem
      • Real-Time GraphRAG QA
    • 04-GraphRAG
      • Academic Search System
      • Academic QA System with GraphRAG
    • 05-AIMemoryManagementSystem
      • ConversationMemoryManagementSystem
    • 06-Multimodal
      • Multimodal RAG
      • Shopping QnA
    • 07-Agent
      • 14-MoARAG
      • CoT Based Smart Web Search
      • 16-MultiAgentShoppingMallSystem
      • Agent-Based Dynamic Slot Filling
      • Code Debugging System
      • New Employee Onboarding Chatbot
      • 20-LangGraphStudio-MultiAgent
      • Multi-Agent Scheduler System
    • 08-Serving
      • FastAPI Serving
      • Sending Requests to Remote Graph Server
      • Building a Agent API with LangServe: Integrating Currency Exchange and Trip Planning
    • 08-SyntheticDataset
      • Synthetic Dataset Generation using RAG
    • 09-Monitoring
      • Langfuse Selfhosting
Powered by GitBook
On this page
  • Overview
  • Main Method
  • Table of Contents
  • References
  • Environment Setup
  • Using the PydanticOutputParser
  • Creating a Chain Including the PydanticOutputParser
  • with_structured_output()
  1. 03-OutputParser

PydanticOutputParser

PreviousPydanticOutputParserNextCommaSeparatedListOutputParser

Last updated 28 days ago

  • Author:

  • Peer Review: ,

  • Proofread :

  • This is a part of

Overview

This tutorial covers how to perform PydanticOutputParser using pydantic.

The PydanticOutputParser is a class that helps transform the output of a language model into structured information . This class can provide the information you need in a clear and organized form instead of a simple text response.

By utilizing this class, you can transform the output of your language model to fit a specific data model, making it easier to process and utilize the information

Main Method

A PydanticOutputParser primarily requires the implementation of two core methods.

  1. get_format_instructions(): Provide instructions that define the format of the information that the language model should output. For example, you can return instructions as a string that describes the fields of data that the language model should output and how they should be formatted. These instructions are very important for the language model to structure the output and transform it to fit your specific data model.

  2. parse(): Takes the output of the language model (assumed to be a string) and analyzes and transforms it into a specific structure. Use a tool like Pydantic to validate the input string against a predefined schema and transform it into a data structure that follows that schema.

Table of Contents

References


Environment Setup

[Note]

  • langchain-opentutorial is a package that provides a set of easy-to-use environment setups, useful functions, and utilities for tutorials.

%%capture --no-stderr
%pip install langchain-opentutorial
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain",
        "langchain_core",
        "langchain_openai",
        "pydantic",
        "itertools",
    ],
    verbose=False,
    upgrade=False,
)
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "OPENAI_API_KEY": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "01-PydanticOuputParser",
    }
)
Environment variables have been set successfully.

Environment variables have been set successfully. You can alternatively set API keys, such as OPENAI_API_KEY in a .env file and load them.

[Note] This is not necessary if you've already set the required API keys in previous steps.

# Load API keys from .env file
from dotenv import load_dotenv

load_dotenv(override=True)
True
# Import the required libraries
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field


llm = ChatOpenAI(temperature=0, model_name="gpt-4o-mini")

Below is an example of an email conversation stored in the variable email_conversation .

email_conversation = """
From: John (John@bikecorporation.me)
To: Kim (Kim@teddyinternational.me)
Subject: “ZENESIS” bike distribution cooperation and meeting schedule proposal
Dear Mr. Kim,

I am John, Senior Executive Director at Bike Corporation. I recently learned about your new bicycle model, "ZENESIS," through your press release. Bike Corporation is a company that leads innovation and quality in the field of bicycle manufacturing and distribution, with long-time experience and expertise in this field.

We would like to request a detailed brochure for the ZENESIS model. In particular, we need information on technical specifications, battery performance, and design aspects. This information will help us further refine our proposed distribution strategy and marketing plan.

Additionally, to discuss the possibilities for collaboration in more detail, I propose a meeting next Tuesday, January 15th, at 10:00 AM. Would it be possible to meet at your office to have this discussion?

Thank you.

Best regards,
John
Senior Executive Director
Bike Corporation
"""

Let's create a chain that utilizes the basic StrOutputParser.

from itertools import chain
from langchain_core.prompts import PromptTemplate
from langchain_core.messages import AIMessageChunk
from langchain_core.output_parsers import StrOutputParser

prompt = PromptTemplate.from_template(
    "Please extract the important parts of the following email.\n\n{email_conversation}"
)

llm = ChatOpenAI(temperature=0, model_name="gpt-4o-mini")

chain = prompt | llm | StrOutputParser()

answer = chain.stream({"email_conversation": email_conversation})


#  A function for real-time output (streaming)
def stream_response(response, return_output=False):
    """
    Streams the response from the AI model, processing and printing each chunk.

    This function iterates over each item in the 'response' iterable. If an item is an instance of AIMessageChunk, it extracts and prints the content.
    If the item is a string, it prints the string directly.
    Optionally, the function can return the concatenated string of all response chunks.

    Args:
    - response (iterable): An iterable of response chunks, which can be AIMessageChunk objects or strings.
    - return_output (bool, optional): If True, the function returns the concatenated response string. The default is False.

    Returns:
    - str: If ```return_output``` is True, the concatenated response string. Otherwise, nothing is returned.
    """
    answer = ""
    for token in response:
        if isinstance(token, AIMessageChunk):
            answer += token.content
            print(token.content, end="", flush=True)
        elif isinstance(token, str):
            answer += token
            print(token, end="", flush=True)
    if return_output:
        return answer


output = stream_response(answer, return_output=True)
**Important Parts of the Email:**
    
    - **Sender:** John (Senior Executive Director, Bike Corporation)
    - **Recipient:** Kim (Teddy International)
    - **Subject:** ZENESIS bike distribution cooperation and meeting schedule proposal
    - **Request:** Detailed brochure for the ZENESIS model, specifically information on:
      - Technical specifications
      - Battery performance
      - Design aspects
    - **Purpose:** To refine distribution strategy and marketing plan for ZENESIS.
    - **Proposed Meeting:** 
      - Date: Tuesday, January 15th
      - Time: 10:00 AM
      - Location: Kim's office
    - **Closing:** Thank you and best regards.
answer = chain.invoke({"email_conversation": email_conversation})
print(answer)
**Important Parts of the Email:**
    
    - **Sender:** John (Senior Executive Director, Bike Corporation)
    - **Recipient:** Kim (Teddy International)
    - **Subject:** ZENESIS bike distribution cooperation and meeting schedule proposal
    - **Request:** Detailed brochure for the ZENESIS model, specifically information on:
      - Technical specifications
      - Battery performance
      - Design aspects
    - **Purpose:** To refine distribution strategy and marketing plan for ZENESIS.
    - **Proposed Meeting:** 
      - Date: Tuesday, January 15th
      - Time: 10:00 AM
      - Location: Kim's office
    - **Closing:** Thank you and best regards.
print(output)
**Important Parts of the Email:**
    
    - **Sender:** John (Senior Executive Director, Bike Corporation)
    - **Recipient:** Kim (Teddy International)
    - **Subject:** ZENESIS bike distribution cooperation and meeting schedule proposal
    - **Request:** Detailed brochure for the ZENESIS model, specifically information on:
      - Technical specifications
      - Battery performance
      - Design aspects
    - **Purpose:** To refine distribution strategy and marketing plan for ZENESIS.
    - **Proposed Meeting:** 
      - Date: Tuesday, January 15th
      - Time: 10:00 AM
      - Location: Kim's office
    - **Closing:** Thank you and best regards.

Using the PydanticOutputParser

When provided with email content like the one above, we can parse the email information using the class defined in the Pydantic style below.

For reference, the description inside the Field serves as guidance for extracting key information from text-based responses. LLMs rely on this description to extract the required information. Therefore, it is crucial that this description is accurate and clear.

class EmailSummary(BaseModel):
    person: str = Field(description="The sender of the email")
    email: str = Field(description="The email address of the sender")
    subject: str = Field(description="The subject of the email")
    summary: str = Field(description="A summary of the email content")
    date: str = Field(
        description="The meeting date and time mentioned in the email content"
    )


# Create an output parser using the PydanticOutputParser
parser = PydanticOutputParser(pydantic_object=EmailSummary)
# Print the instructions.
print(parser.get_format_instructions())
The output should be formatted as a JSON instance that conforms to the JSON schema below.
    
    As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
    the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.
    
    Here is the output schema:
    ```
    {"properties": {"person": {"description": "The sender of the email", "title": "Person", "type": "string"}, "email": {"description": "The email address of the sender", "title": "Email", "type": "string"}, "subject": {"description": "The subject of the email", "title": "Subject", "type": "string"}, "summary": {"description": "A summary of the email content", "title": "Summary", "type": "string"}, "date": {"description": "The meeting date and time mentioned in the email content", "title": "Date", "type": "string"}}, "required": ["person", "email", "subject", "summary", "date"]}
    ```

Let's define a prompt including the following variables:

  1. question: Receives the user's question.

  2. email_conversation: Inputs the content of the email conversation.

  3. format: Specifies the format.

prompt = PromptTemplate.from_template(
    """
You are a helpful assistant. 

QUESTION:
{question}

EMAIL CONVERSATION:
{email_conversation}

FORMAT:
{format}
"""
)


# Apply the partial method and fill in the ```format``` variable in advance
prompt = prompt.partial(format=parser.get_format_instructions())

Next, create a chain.

chain = prompt | llm

Execute the chain and review the results.

# Execute the chain and print the result.
response = chain.stream(
    {
        "email_conversation": email_conversation,
        "question": "Extract the main content of the email.",
    }
)

# The result is provided in JSON format.
output = stream_response(response, return_output=True)
```json
    {
      "person": "John",
      "email": "John@bikecorporation.me",
      "subject": "ZENESIS bike distribution cooperation and meeting schedule proposal",
      "summary": "John from Bike Corporation requests a detailed brochure for the ZENESIS bike model, including technical specifications, battery performance, and design aspects. He also proposes a meeting on January 15th at 10:00 AM to discuss collaboration possibilities.",
      "date": "January 15th, 10:00 AM"
    }
    ```

Finally, use the parser to parse the results and convert them into an EmailSummary object.

structured_output = parser.parse(output)
print(structured_output)
person='John' email='John@bikecorporation.me' subject='ZENESIS bike distribution cooperation and meeting schedule proposal' summary='John from Bike Corporation requests a detailed brochure for the ZENESIS bike model, including technical specifications, battery performance, and design aspects. He also proposes a meeting on January 15th at 10:00 AM to discuss collaboration possibilities.' date='January 15th, 10:00 AM'

Creating a Chain Including the PydanticOutputParser

You can generate the output as a Pydantic object that you define.

# Reconstruct the entire chain by adding an output parser
chain = prompt | llm | parser
# Execute the chain and print the results
response = chain.invoke(
    {
        "email_conversation": email_conversation,
        "question": "Extract the main content of the email.",
    }
)

# The results are output in the form of an EmailSummary object
print(response)
person='John' email='John@bikecorporation.me' subject='ZENESIS bike distribution cooperation and meeting schedule proposal' summary='John from Bike Corporation requests a detailed brochure for the ZENESIS bike model, including technical specifications, battery performance, and design aspects. He also proposes a meeting on January 15th at 10:00 AM to discuss collaboration possibilities.' date='January 15th, 10:00 AM'

with_structured_output()

By adding an output parser using .with_structured_output(Pydantic), you can convert the output into a Pydantic object.

llm_with_structured = ChatOpenAI(
    temperature=0, model_name="gpt-4o"
).with_structured_output(EmailSummary)
# Call the ```invoke()``` method and print the result
answer = llm_with_structured.invoke(email_conversation)
answer
EmailSummary(person='John', email='John@bikecorporation.me', subject='“ZENESIS” bike distribution cooperation and meeting schedule proposal', summary='John, Senior Executive Director at Bike Corporation, is interested in the ZENESIS bicycle model and requests a detailed brochure including technical specifications, battery performance, and design aspects. He proposes a meeting to discuss potential collaboration.', date='Tuesday, January 15th, at 10:00 AM')

Note

One thing to note is that the .with_structured_output() method does not support the stream() method.

Set up the environment. You may refer to for more details.

You can check out the for more details.

Pydantic Official Document
Environment Setup
langchain-opentutorial
Jaeho Kim
Donghak Lee
frimer
BokyungisaGod
LangChain Open Tutorial
Overview
Environment Setup
A Chain with the Basic StrOutputParser
Using the PydanticOutputParser