PandasDataFrameOutputParser


Overview

This tutorial shows how to constrain LLM output to the pd.DataFrame format.

Pandas is a package for working with tabular data and is widely used by data scientists. It helps you explore, clean, and process data.

To learn more about pd.DataFrame and its capabilities, see the official pandas tutorial, 10 minutes to pandas.

According to the official API documentation, PandasDataFrameOutputParser parses output using the Pandas DataFrame format.

  • It can be used both to structure the output of an LLM query, which arrives as a string, and to structure the input to an LLM query.

    • For output, see the use of pydantic in 01-PydanticOutputParser and elsewhere in this learning guide.

    • For input, a pd.DataFrame dataset can be supplied so that the LLM interacts with the data.

Objective of this tutorial

  • Learn how PandasDataFrameOutputParser is used to interact with a pd.DataFrame.

Data Acquisition

  • Access figshare.com, where you can find academic research outputs in formats ranging from CSV to PDF.

Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup, along with useful functions and utilities for tutorials.

  • You can check out langchain-opentutorial for more details.

You can alternatively set API keys such as OPENAI_API_KEY in a .env file and load them.

[Note] This is not necessary if you've already set the required API keys in previous steps.
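
As a rough illustration of the .env approach, the sketch below reads KEY=VALUE lines into os.environ using only the standard library. It is a simplified stand-in for a dedicated loader such as python-dotenv, and load_env_file with its path and override parameters is a hypothetical helper introduced here, not part of any package mentioned in this tutorial.

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env", override: bool = True) -> Dict[str, str]:
    """Hypothetical helper: copy KEY=VALUE pairs from a .env-style file into os.environ."""
    loaded: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        # Nothing to load; keys may already be set in the environment.
        return loaded
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if override or key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded


loaded_keys = load_env_file(".env")
print(sorted(loaded_keys))  # prints only the key names, never the secret values
```

Printing only the key names keeps secrets such as OPENAI_API_KEY out of notebook output.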

Without model use

  • The format_parser_output function converts the parser output to dictionary form and prints it in a readable layout.

  • In this example the input is a pd.DataFrame, which can also be loaded from a file.
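
A minimal sketch of such a helper follows, assuming the parser output is a dictionary whose values are pandas objects. The small DataFrame reuses three columns from the sample rows in this section, the dictionary passed to the helper is a hypothetical parser output, and returning the converted dictionary in addition to printing it is a choice made here for illustration.

```python
import pprint
from typing import Any, Dict

import pandas as pd


def format_parser_output(parser_output: Dict[str, Any]) -> Dict[str, Any]:
    """Convert pandas values to plain dicts, pretty-print them, and return the result."""
    formatted: Dict[str, Any] = {}
    for key, value in parser_output.items():
        # Series and DataFrame both expose to_dict(); scalars pass through unchanged.
        formatted[key] = value.to_dict() if hasattr(value, "to_dict") else value
    pprint.pprint(formatted, width=60, compact=True)
    return formatted


# Three columns taken from the sample rows shown in this section.
df = pd.DataFrame(
    {
        "Operating Airline": ["ATA Airlines", "ATA Airlines", "ATA Airlines"],
        "Activity Type Code": ["Deplaned", "Enplaned", "Thru / Transit"],
        "Passenger Count": [31432, 31353, 2518],
    }
)

# Hypothetical parser output: a single column keyed by its name.
formatted = format_parser_output({"Passenger Count": df["Passenger Count"]})
```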

|   | Activity Period | Activity Period Start Date | Operating Airline | Operating Airline IATA Code | Published Airline | Published Airline IATA Code | GEO Summary | GEO Region | Activity Type Code | Price Category Code | Terminal | Boarding Area | Passenger Count | data_as_of | data_loaded_at |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 199907 | 1999/07/01 | ATA Airlines | TZ | ATA Airlines | TZ | Domestic | US | Deplaned | Low Fare | Terminal 1 | B | 31432 | 2023/11/20 07:01:34 AM | 2023/11/20 07:02:25 AM |
| 1 | 199907 | 1999/07/01 | ATA Airlines | TZ | ATA Airlines | TZ | Domestic | US | Enplaned | Low Fare | Terminal 1 | B | 31353 | 2023/11/20 07:01:34 AM | 2023/11/20 07:02:25 AM |
| 2 | 199907 | 1999/07/01 | ATA Airlines | TZ | ATA Airlines | TZ | Domestic | US | Thru / Transit | Low Fare | Terminal 1 | B | 2518 | 2023/11/20 07:01:34 AM | 2023/11/20 07:02:25 AM |
| 3 | 199907 | 1999/07/01 | Aeroflot Russian International Airlines | NaN | Aeroflot Russian International Airlines | NaN | International | Europe | Deplaned | Other | Terminal 2 | D | 1324 | 2023/11/20 07:01:34 AM | 2023/11/20 07:02:25 AM |
| 4 | 199907 | 1999/07/01 | Aeroflot Russian International Airlines | NaN | Aeroflot Russian International Airlines | NaN | International | Europe | Enplaned | Other | Terminal 2 | D | 1198 | 2023/11/20 07:01:34 AM | 2023/11/20 07:02:25 AM |

  • Example of looking up a value for a column.

  • Example of a row lookup.

  • Further Operations on Column
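
The three lookups listed above can be sketched directly in pandas, using values copied from the five sample rows shown earlier in this section. These correspond roughly to parser requests such as `column:Passenger Count`, `row:0`, and `mean:Passenger Count[0..2]`, though that request syntax is an assumption here and should be checked against the API documentation.

```python
import pandas as pd

# Values copied from the five sample rows shown earlier in this section.
df = pd.DataFrame(
    {
        "Operating Airline": [
            "ATA Airlines",
            "ATA Airlines",
            "ATA Airlines",
            "Aeroflot Russian International Airlines",
            "Aeroflot Russian International Airlines",
        ],
        "Activity Type Code": [
            "Deplaned",
            "Enplaned",
            "Thru / Transit",
            "Deplaned",
            "Enplaned",
        ],
        "Passenger Count": [31432, 31353, 2518, 1324, 1198],
    }
)

# Column lookup: every value in one column.
passenger_counts = df["Passenger Count"]

# Row lookup: one row selected by position.
first_row = df.iloc[0]

# Further operation on a column: mean over rows 0 through 2 (label slice, inclusive).
mean_first_three = df.loc[0:2, "Passenger Count"].mean()

print(passenger_counts.to_dict())
print(first_row.to_dict())
print(mean_first_three)
```

Note that `df.loc[0:2]` includes both endpoints because it slices by label, unlike positional slicing with `df.iloc[0:2]`.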

The next example uses value_counts().
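
As a small sketch of what value_counts() does, the snippet below counts the Activity Type Code values of the five sample rows from this section. The Series is built by hand for illustration; on the full dataset the same call would be made on the corresponding DataFrame column.

```python
import pandas as pd

# Activity Type Code values of the five sample rows in this section.
activity_types = pd.Series(
    ["Deplaned", "Enplaned", "Thru / Transit", "Deplaned", "Enplaned"],
    name="Activity Type Code",
)

# value_counts() returns a Series mapping each distinct value to its frequency.
counts = activity_types.value_counts()
print(counts.to_dict())
```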

  • Reflections

    • None of the examples above called the model.

With model
