Excel File Loading in LangChain
Author: Hwayoung Cha
Peer Review :
Proofread : Youngjun cho
This is a part of LangChain Open Tutorial
Overview
This tutorial covers the process of loading and handling Microsoft Excel files in LangChain.
It focuses on two primary methods: UnstructuredExcelLoader for raw text extraction and DataFrameLoader for structured data processing.
The guide aims to help developers effectively integrate Excel data into their LangChain projects, covering both basic and advanced usage scenarios.
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials. You can check out langchain-opentutorial for more details.
UnstructuredExcelLoader
UnstructuredExcelLoader is used to load Microsoft Excel files.
This loader works with both .xlsx and .xls files.
When the loader is used in mode="elements", an HTML representation of the Excel file is provided under the text_as_html key in the document metadata.
Printing the number of loaded documents for the sample file confirms that one document has been loaded.
The page_content contains the data from each row, while the text_as_html in the metadata stores the data in HTML format.
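The behavior described above can be sketched as follows. This is a minimal example, not the tutorial's original cell: the helper name load_excel_elements and the path ./data/sample.xlsx are hypothetical, and it assumes the langchain-community and unstructured packages are installed. The number of documents you get depends on your own file.

```python
from pathlib import Path


def load_excel_elements(file_path: str):
    """Load an Excel file with UnstructuredExcelLoader in "elements" mode."""
    # Imported lazily so this snippet can be imported even when the optional
    # dependencies (langchain-community, unstructured) are not installed.
    from langchain_community.document_loaders import UnstructuredExcelLoader

    loader = UnstructuredExcelLoader(file_path, mode="elements")
    return loader.load()


# Hypothetical sample path (an assumption); point it at your own .xlsx or .xls file.
SAMPLE_PATH = Path("./data/sample.xlsx")

if SAMPLE_PATH.exists():
    docs = load_excel_elements(str(SAMPLE_PATH))
    print(len(docs))  # number of loaded documents

    if docs:
        first = docs[0]
        # Row data as plain text.
        print(first.page_content[:200])
        # HTML representation; .get() is used because the key may be absent
        # for elements that were not parsed as tables.
        print(first.metadata.get("text_as_html", "")[:200])
```

Using metadata.get() rather than direct indexing keeps the sketch from failing on elements that carry no HTML representation.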

DataFrameLoader
Similar to CSV files, we can load Excel files by using the read_excel() function to create a pandas.DataFrame, and then loading it with DataFrameLoader.
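A minimal sketch of this approach is shown here. The helper name excel_to_documents, the file path, and the column name "Name" are assumptions for illustration. The sketch also assumes pandas, an Excel engine such as openpyxl, and langchain-community are installed.

```python
from pathlib import Path


def excel_to_documents(file_path: str, content_column: str):
    """Read an Excel sheet into a DataFrame and turn each row into a Document."""
    # Lazy imports keep the snippet importable without the optional packages.
    import pandas as pd
    from langchain_community.document_loaders import DataFrameLoader

    # read_excel() reads the first sheet by default.
    df = pd.read_excel(file_path)

    # The chosen column becomes page_content; the remaining columns
    # are stored in each document's metadata.
    loader = DataFrameLoader(df, page_content_column=content_column)
    return loader.load()


# Hypothetical path and column name (assumptions); adjust them to your data.
SAMPLE_PATH = Path("./data/sample.xlsx")

if SAMPLE_PATH.exists():
    docs = excel_to_documents(str(SAMPLE_PATH), content_column="Name")
    if docs:
        print(docs[0].page_content)
        print(docs[0].metadata)
```

Unlike the raw text extraction above, this route keeps each row as a separate document with its other columns available as structured metadata, which is convenient for filtering.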