The CommaSeparatedListOutputParser is a LangChain output parser that converts a model's comma-separated text response into a Python list.
It is useful whenever you want a model to return a flat list of items, such as names, data points, or keywords, in a consistent format that downstream code can consume directly without extra string handling.
This tutorial demonstrates how to use the CommaSeparatedListOutputParser to:
Set up and initialize the parser for generating comma-separated lists
Integrate it with a prompt template and language model
Process structured outputs iteratively using streaming mechanisms
Alternatively, you can set OPENAI_API_KEY in a .env file and load it.
[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps.
from dotenv import load_dotenv
load_dotenv()
True
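Before running the chain, it can help to confirm that the key is actually visible to the process. The following is a minimal sketch using only the standard library; has_openai_key is a hypothetical helper introduced here for illustration, not part of LangChain or python-dotenv.

```python
import os
from typing import Mapping


def has_openai_key(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when OPENAI_API_KEY is present and non-empty.

    Hypothetical helper for illustration only.
    """
    return bool(env.get("OPENAI_API_KEY", "").strip())


if not has_openai_key():
    print("OPENAI_API_KEY is not set; add it to .env or export it first.")
```

Passing the environment as a parameter keeps the check easy to exercise with a plain dictionary.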
Implementing the Comma-Separated List Output Parser
If you need to generate outputs in the form of a comma-separated list, the CommaSeparatedListOutputParser from LangChain simplifies the process. Below is a step-by-step implementation:
Importing Required Modules
Start by importing the necessary modules and initializing the CommaSeparatedListOutputParser. Retrieve the formatting instructions from the parser to guide the output structure.
from langchain_core.output_parsers import CommaSeparatedListOutputParser
# Initialize the output parser
output_parser = CommaSeparatedListOutputParser()
# Retrieve format instructions for the output parser
format_instructions = output_parser.get_format_instructions()
print(format_instructions)
Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`
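To make the parsing step concrete, here is a minimal standard-library sketch of how a reply in the format above can be turned into a Python list. It is an illustration of the idea under the assumption that the reply is a single line of comma-separated values; parse_comma_separated is a hypothetical helper, not the parser's actual implementation.

```python
import csv
from io import StringIO


def parse_comma_separated(text: str) -> list[str]:
    """Split comma-separated text into a list of stripped items.

    Hypothetical helper for illustration; not part of LangChain.
    """
    # csv.reader keeps quoted items such as "Seoul, South Korea" intact;
    # skipinitialspace drops the blank that follows each comma.
    reader = csv.reader(StringIO(text), skipinitialspace=True)
    return [item.strip() for row in reader for item in row]


print(parse_comma_separated("foo, bar, baz"))  # ['foo', 'bar', 'baz']
print(parse_comma_separated("foo,bar,baz"))    # ['foo', 'bar', 'baz']
```

Both spellings mentioned in the format instructions produce the same list, which is why the instructions can allow either.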
Creating the Prompt Template
Define a PromptTemplate that asks the model for a list of items. The {subject} placeholder is replaced with the desired topic at run time, while {format_instructions} is filled in ahead of time as a partial variable.
from langchain_core.prompts import PromptTemplate
# Define the prompt template
prompt = PromptTemplate(
    template="List five {subject}.\n{format_instructions}",
    input_variables=["subject"],  # 'subject' will be dynamically replaced
    partial_variables={
        "format_instructions": format_instructions
    },  # Use parser's format instructions
)
print(prompt)
input_variables=['subject'] input_types={} partial_variables={'format_instructions': 'Your response should be a list of comma separated values, eg: `foo, bar, baz` or `foo,bar,baz`'} template='List five {subject}.\n{format_instructions}'
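The split between input variables and partial variables can be sketched with plain string formatting. This is only an approximation of the behavior, assuming the template and format instructions shown above; render_prompt is a hypothetical stand-in rather than a LangChain function.

```python
# The same template string used by the PromptTemplate above.
TEMPLATE = "List five {subject}.\n{format_instructions}"

# Text copied from the parser's format instructions printed earlier.
FORMAT_INSTRUCTIONS = (
    "Your response should be a list of comma separated values, "
    "eg: `foo, bar, baz` or `foo,bar,baz`"
)


def render_prompt(subject: str) -> str:
    """Hypothetical stand-in for filling the prompt with one input."""
    # format_instructions is fixed up front (the "partial" variable);
    # subject is supplied at call time (the "input" variable).
    return TEMPLATE.format(
        subject=subject, format_instructions=FORMAT_INSTRUCTIONS
    )


print(render_prompt("famous landmarks in South Korea"))
```

With the real object, calling prompt.format(subject="...") returns the filled-in string in the same way, so you can inspect exactly what the model will receive.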
Integrating with ChatOpenAI and Running the Chain
Combine the PromptTemplate, ChatOpenAI model, and CommaSeparatedListOutputParser into a chain. Finally, run the chain with a specific subject to produce results.
from langchain_openai import ChatOpenAI
# Initialize the ChatOpenAI model
model = ChatOpenAI(temperature=0)
# Combine the prompt, model, and output parser into a chain
chain = prompt | model | output_parser
# Run the chain with a specific subject
result = chain.invoke({"subject": "famous landmarks in South Korea"})
print(result)
['Gyeongbokgung Palace', 'N Seoul Tower', 'Bukchon Hanok Village', 'Seongsan Ilchulbong Peak', 'Haeundae Beach']
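The | operator reads as "feed the output of the left step into the right step". The sketch below imitates that idea with plain functions so it can run without an API key. Step and the fake_* names are hypothetical, the "model" returns a canned reply, and none of this reflects how LangChain implements its runnables internally.

```python
class Step:
    """Hypothetical wrapper that lets plain functions be chained with `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then passes its result to b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Stand-ins for the three stages; the "model" returns a canned reply
# instead of calling an API.
fake_prompt = Step(lambda inputs: f"List five {inputs['subject']}.")
fake_model = Step(lambda prompt_text: "red, green, blue, yellow, purple")
fake_parser = Step(lambda reply: [item.strip() for item in reply.split(",")])

fake_chain = fake_prompt | fake_model | fake_parser
print(fake_chain.invoke({"subject": "colors"}))
# ['red', 'green', 'blue', 'yellow', 'purple']
```

The value of this shape is that each stage only needs to agree with its neighbor on input and output types: dictionary to prompt text, prompt text to reply, reply to list.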
Accessing Data with Python Indexing
Because the CommaSeparatedListOutputParser returns a Python list rather than a raw string, you can access individual elements directly with indexing.
# Accessing specific elements using Python indexing
print("First Landmark:", result[0])
print("Second Landmark:", result[1])
print("Last Landmark:", result[-1])
First Landmark: Gyeongbokgung Palace
Second Landmark: N Seoul Tower
Last Landmark: Haeundae Beach
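Other list operations work the same way. As a small sketch, the example below hard-codes the list printed earlier (so it runs without calling the model) and shows slicing and numbering; the variable names are chosen here for illustration.

```python
# The list printed by the chain earlier in this tutorial.
landmarks = [
    "Gyeongbokgung Palace",
    "N Seoul Tower",
    "Bukchon Hanok Village",
    "Seongsan Ilchulbong Peak",
    "Haeundae Beach",
]

# Slicing returns a new list with the first three entries.
top_three = landmarks[:3]

# enumerate pairs each entry with a 1-based position.
numbered = [
    f"{position}. {name}"
    for position, name in enumerate(landmarks, start=1)
]

print(top_three)
print("\n".join(numbered))
```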
Using Streamed Outputs
For larger outputs or real-time feedback, you can process the results using the stream method. This allows you to handle data piece by piece as it is generated.
# Iterate through the streamed output for a subject
for output in chain.stream({"subject": "famous landmarks in South Korea"}):
    print(output)
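To see why a list can be streamed at all, consider that an item is complete as soon as the comma after it arrives. The following is a minimal sketch of that buffering idea using simulated text chunks; stream_items and fake_chunks are hypothetical names, and the shape of the values it yields is a choice made for this example, so print the real chain's chunks to see what your installed version emits.

```python
from typing import Iterable, Iterator


def stream_items(chunks: Iterable[str]) -> Iterator[str]:
    """Yield each comma-separated item as soon as it is complete.

    Hypothetical helper for illustration; not part of LangChain.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Everything before the last comma is complete;
        # the remainder may still grow with the next chunk.
        *complete, buffer = buffer.split(",")
        for item in complete:
            if item.strip():
                yield item.strip()
    # Whatever remains after the final chunk is the last item.
    if buffer.strip():
        yield buffer.strip()


# Simulated model output arriving in arbitrary pieces.
fake_chunks = ["Gyeong", "bokgung Palace, N Seo", "ul Tower, Haeundae", " Beach"]
for item in stream_items(fake_chunks):
    print(item)
```

Note that the final item is only emitted once the input ends, since there is no trailing comma to mark it as complete.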