Excel File Loading in LangChain


Overview

This tutorial covers loading and handling Microsoft Excel files in LangChain.

It focuses on two primary methods: UnstructuredExcelLoader for raw text extraction and DataFrameLoader for structured data processing.

The guide aims to help developers effectively integrate Excel data into their LangChain projects, covering both basic and advanced usage scenarios.


Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials.

  • You can check out langchain-opentutorial for more details.
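Before running the loaders, it can help to confirm that the packages they depend on are importable. The following is a minimal sketch using only the standard library. The module list is an assumption based on the loaders covered in this tutorial (`unstructured` for UnstructuredExcelLoader, `pandas` and `openpyxl` for reading .xlsx files), and `find_missing` is a hypothetical helper name, not part of LangChain.

```python
import importlib.util
from typing import Iterable, List


def find_missing(modules: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


# Assumed dependency list for this tutorial; adjust it to your environment.
REQUIRED_MODULES = ["langchain_community", "unstructured", "pandas", "openpyxl"]

missing = find_missing(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```

Any module reported as missing can then be installed with your usual package manager.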

UnstructuredExcelLoader

UnstructuredExcelLoader is used to load Microsoft Excel files.

This loader works with both .xlsx and .xls files.

When the loader is used in mode="elements", an HTML representation of the Excel file is provided under the text_as_html key in the document metadata.
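A minimal sketch of loading a workbook in this mode might look as follows. It assumes `langchain-community` and `unstructured` are installed; `load_excel` is a hypothetical wrapper name and the file path in the comment is a placeholder, not a file from this tutorial.

```python
from typing import Any, List


def load_excel(path: str, mode: str = "elements") -> List[Any]:
    """Load an Excel file with UnstructuredExcelLoader and return its documents."""
    # Imported inside the function so that defining it does not require the
    # optional dependencies (assumed: langchain-community and unstructured).
    from langchain_community.document_loaders import UnstructuredExcelLoader

    loader = UnstructuredExcelLoader(path, mode=mode)
    return loader.load()


# Example usage (placeholder path, replace with your own workbook):
# docs = load_excel("./data/your_file.xlsx")
# print(len(docs))
```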

Checking the length of the returned list confirms that one document has been loaded.

The page_content contains the data from each row, while the text_as_html in the metadata stores the data in HTML format.
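To look at both views of a loaded document, a small helper like the one below can be used. This is a sketch: `summarize_documents` is a hypothetical helper, and the stand-in object only mimics the `page_content` and `metadata` attributes of a LangChain document so that the snippet runs without any Excel file.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def summarize_documents(docs: Iterable[Any], preview_chars: int = 80) -> List[Dict[str, Any]]:
    """Collect a short text preview and HTML preview for each document."""
    summaries = []
    for doc in docs:
        # text_as_html is only expected when the loader ran in mode="elements".
        html = doc.metadata.get("text_as_html")
        summaries.append(
            {
                "preview": doc.page_content[:preview_chars],
                "has_html": html is not None,
                "html_preview": (html or "")[:preview_chars],
            }
        )
    return summaries


# Stand-in document with made-up values, used here instead of real loader output.
example_doc = SimpleNamespace(
    page_content="Name Age Alice 30",
    metadata={"text_as_html": "<table><tr><td>Name</td></tr></table>"},
)
example_summary = summarize_documents([example_doc], preview_chars=8)
print(example_summary)
```

With real loader output, pass the list returned by the loader's load() method instead of the stand-in object.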


DataFrameLoader

  • As with CSV files, you can read an Excel file into a pandas.DataFrame with the read_excel() function and then pass that DataFrame to DataFrameLoader.
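The approach can be sketched as follows, assuming `pandas`, `openpyxl`, and `langchain-community` are installed. `load_excel_rows` and `split_row` are hypothetical names, and the column name is supplied by the caller. DataFrameLoader produces one document per row, taking the chosen column as page_content and the remaining columns as metadata. `split_row` is a plain-Python illustration of that mapping, not the library's own code.

```python
from typing import Any, Dict, List, Tuple


def load_excel_rows(path: str, page_content_column: str) -> List[Any]:
    """Read an Excel sheet into a DataFrame and load it row by row."""
    # Imported lazily; assumed installed: pandas, openpyxl, langchain-community.
    import pandas as pd
    from langchain_community.document_loaders import DataFrameLoader

    df = pd.read_excel(path)  # reads the first sheet by default
    loader = DataFrameLoader(df, page_content_column=page_content_column)
    return loader.load()


def split_row(row: Dict[str, Any], page_content_column: str) -> Tuple[Any, Dict[str, Any]]:
    """Illustrate how one row is divided into page content and metadata."""
    metadata = dict(row)  # copy so the caller's row is not modified
    text = metadata.pop(page_content_column)
    return text, metadata


# Made-up row used only to show the mapping.
text, metadata = split_row({"Name": "Alice", "Age": 30}, "Name")
print(text, metadata)
```

Choosing a text-heavy column as page_content_column tends to work best, since that column becomes the content that downstream steps search or embed.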
