Excel File Loading in LangChain
Author: Hwayoung Cha
Design:
Peer Review :
This is a part of LangChain Open Tutorial
This tutorial covers the process of loading and handling Microsoft Excel files in LangChain.
It focuses on two primary methods: UnstructuredExcelLoader for raw text extraction and DataFrameLoader for structured data processing.
The guide aims to help developers effectively integrate Excel data into their LangChain projects, covering both basic and advanced usage scenarios.
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for tutorials.
You can check out langchain-opentutorial for more details.
UnstructuredExcelLoader is used to load Microsoft Excel files.
This loader works with both .xlsx and .xls files.
When the loader is used in "elements" mode, an HTML representation of the Excel file is provided under the text_as_html key in the document metadata.
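The loading step described above can be sketched as follows. This is a minimal example rather than the tutorial's original code: the function names load_excel_elements and summarize are hypothetical helpers introduced here for illustration, and it assumes the langchain-community and unstructured packages are installed. The loader is imported inside the function so the helper module can be imported even when those optional packages are missing.

```python
from typing import Any, Dict, List


def load_excel_elements(path: str) -> List[Any]:
    """Load an Excel file with UnstructuredExcelLoader in "elements" mode.

    Hypothetical helper: `path` is whatever .xlsx or .xls file you want to load.
    """
    # Imported lazily so this module can be imported without the optional
    # dependencies (langchain-community and unstructured) being installed.
    from langchain_community.document_loaders import UnstructuredExcelLoader

    loader = UnstructuredExcelLoader(path, mode="elements")
    return loader.load()


def summarize(docs: List[Any]) -> Dict[str, int]:
    """Count the loaded documents and how many carry a text_as_html entry."""
    with_html = sum(1 for doc in docs if "text_as_html" in doc.metadata)
    return {"count": len(docs), "with_html": with_html}
```

Calling summarize(load_excel_elements("./data/titanic.xlsx")) would report how many documents the loader returned; the path here is a placeholder, not a file named by this tutorial.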
This confirms that one document has been loaded.
The page_content contains the data from each row, while the text_as_html in the metadata stores the data in HTML format.
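To show how the HTML copy in the metadata might be used, the sketch below turns an HTML table string back into a list of rows using only the Python standard library. It assumes the text_as_html value is a plain table built from tr, td, and th tags; the exact markup produced by the loader is not confirmed here, and TableRowParser and html_table_to_rows are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser
from typing import List


class TableRowParser(HTMLParser):
    """Collect the cell text of every <tr> in an HTML table."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            # Start a new row.
            self.rows.append([])
        elif tag in ("td", "th"):
            # Tolerate a cell that appears without an enclosing <tr>.
            if not self.rows:
                self.rows.append([])
            self.rows[-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._in_cell:
            self.rows[-1][-1] += data


def html_table_to_rows(html: str) -> List[List[str]]:
    """Return the table as a list of rows, each a list of stripped cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return [[cell.strip() for cell in row] for row in parser.rows]
```

With a loaded document, this could be applied as html_table_to_rows(docs[0].metadata["text_as_html"]), which keeps the row and column structure that the flat page_content text does not expose.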
Similar to CSV files, we can load an Excel file by using the pandas read_excel() function to create a DataFrame, and then loading that DataFrame with DataFrameLoader.
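A minimal sketch of this approach is shown below, assuming pandas, an Excel engine such as openpyxl, and langchain-community are installed. The function name load_excel_as_documents is a hypothetical helper, and the column passed as page_content_column must hold string values, since that column becomes the text of each document while the remaining columns become its metadata.

```python
from typing import Any, List


def load_excel_as_documents(path: str, page_content_column: str) -> List[Any]:
    """Read an Excel sheet into a DataFrame and load it with DataFrameLoader.

    Hypothetical helper: one document is produced per DataFrame row.
    """
    # Imported lazily so this module can be imported without the optional
    # dependencies (pandas, an Excel engine, langchain-community) installed.
    import pandas as pd
    from langchain_community.document_loaders import DataFrameLoader

    # read_excel() reads the first sheet of the workbook by default.
    df = pd.read_excel(path)

    # The chosen column becomes page_content; the other columns go to metadata.
    loader = DataFrameLoader(df, page_content_column=page_content_column)
    return loader.load()
```

For example, load_excel_as_documents("./data/titanic.xlsx", "Name") would use a Name column as the document text; both the path and the column name are placeholders for illustration.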