HWP (Hangeul) Loader

Overview

HWP is the Hangeul Word Processor developed by Hancom, and it is Korea's leading office software.

Its documents use the .hwp file extension, and it is widely used in businesses, schools, government institutions, and more.

Therefore, if you're a developer in South Korea, you've likely had (or will have) experience dealing with .hwp documents.

Unfortunately, LangChain does not yet provide a built-in loader for .hwp files, so we'll use a custom-implemented HWPLoader with the langchain-teddynote and langchain-opentutorial packages.

In this tutorial, we'll use an HWPLoader that can load .hwp files and extract text from them.
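This tutorial does not show how the loader extracts text internally. The code below is a rough sketch that relies on several assumptions about the HWP 5.0 binary layout. It assumes an .hwp file is an OLE compound file whose body text sits in `BodyText/SectionN` streams. Each stream is raw-deflate compressed when the header's compression flag is set, and it consists of tagged records. Tag 67 (`HWPTAG_PARA_TEXT`) holds each paragraph's UTF-16LE text. Under those assumptions, the standard library alone can turn one section into plain text. This is not the langchain-teddynote implementation. The function names are hypothetical, and the control-code handling (single-WCHAR controls versus 8-WCHAR inline/extended controls) only approximates the format's control-character table.

```python
import struct
import zlib
from typing import Iterator, Tuple

# Assumed HWP 5.0 constants: HWPTAG_BEGIN = 0x10, PARA_TEXT = BEGIN + 51.
HWPTAG_BEGIN = 0x010
HWPTAG_PARA_TEXT = HWPTAG_BEGIN + 51  # 67

# Control codes assumed to occupy one WCHAR; every other code below 32
# (inline and extended controls) is assumed to occupy 8 WCHARs (16 bytes).
CHAR_CONTROLS = {0, 10, 13, 24, 25, 26, 27, 28, 29, 30, 31}


def iter_records(data: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (tag_id, level, payload) for each record in a section stream."""
    pos = 0
    while pos + 4 <= len(data):
        (header,) = struct.unpack_from("<I", data, pos)
        pos += 4
        tag_id = header & 0x3FF           # bits 0-9
        level = (header >> 10) & 0x3FF    # bits 10-19
        size = (header >> 20) & 0xFFF     # bits 20-31
        if size == 0xFFF:
            # Large records store their real size in the following DWORD.
            if pos + 4 > len(data):
                break
            (size,) = struct.unpack_from("<I", data, pos)
            pos += 4
        yield tag_id, level, data[pos:pos + size]
        pos += size


def decode_para_text(payload: bytes) -> str:
    """Decode a PARA_TEXT payload (UTF-16LE), skipping control codes."""
    kept = bytearray()
    n = len(payload) - len(payload) % 2
    i = 0
    while i < n:
        (code,) = struct.unpack_from("<H", payload, i)
        if code >= 32:
            # Ordinary text (surrogate pairs are kept intact for decoding).
            kept += payload[i:i + 2]
            i += 2
        elif code in CHAR_CONTROLS:
            if code == 10:            # line break
                kept += "\n".encode("utf-16-le")
            elif code in (30, 31):    # non-breaking / fixed-width space
                kept += " ".encode("utf-16-le")
            # Other single-WCHAR controls (e.g. 13, paragraph end) are dropped.
            i += 2
        else:
            if code == 9:             # tab is an inline control
                kept += "\t".encode("utf-16-le")
            i += 16                   # skip the whole 8-WCHAR control
    return kept.decode("utf-16-le", errors="replace")


def extract_section_text(stream: bytes, compressed: bool = True) -> str:
    """Return the plain text of one BodyText/SectionN stream."""
    if compressed:
        # Sections are assumed to be raw deflate (no zlib header): wbits=-15.
        stream = zlib.decompress(stream, -15)
    paragraphs = [
        decode_para_text(payload)
        for tag_id, _level, payload in iter_records(stream)
        if tag_id == HWPTAG_PARA_TEXT
    ]
    return "\n".join(paragraphs)
```

Passing `compressed=False` handles documents saved without compression. Reading the section streams out of the OLE container itself needs a separate library, for example olefile.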

Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup, along with useful functions and utilities for tutorials.

  • You can check out langchain-opentutorial for more details.

Instantiating the HWP Loader

You can instantiate an HWP loader with the HWPLoader class.

Loading Documents

You can load the document with the load method.
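The load step can also be sketched at the container level. This minimal version assumes the third-party olefile package for reading the OLE compound file. The import sits inside `load_hwp`, so the helpers work without olefile installed. The version also takes the per-section extractor as a parameter, which can be any function that turns one section stream (plus its compression flag) into text. `Document` is a stand-in dataclass with the same `page_content` and `metadata` fields as LangChain's document class. The helper names, the metadata keys, and returning a single Document per file are illustrative assumptions, not the behavior of langchain-teddynote's HWPLoader. The header offsets (version at byte 32, property flags at byte 36, bit 0 = compressed, bit 1 = password) also come from the HWP 5.0 format description and should be treated as assumptions.

```python
import re
import struct
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

# Assumed signature at the start of the 'FileHeader' stream (NUL-padded to 32 bytes).
HWP_SIGNATURE = b"HWP Document File"


@dataclass
class Document:
    """Stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


def parse_file_header(header: bytes) -> Dict[str, object]:
    """Read signature, version and property flags from 'FileHeader'."""
    if not header.startswith(HWP_SIGNATURE):
        raise ValueError("not an HWP 5.x document")
    version, flags = struct.unpack_from("<II", header, 32)
    return {
        # Version assumed packed as 0xMMnnPPrr (major.minor.build.revision).
        "version": ".".join(str((version >> s) & 0xFF) for s in (24, 16, 8, 0)),
        "compressed": bool(flags & 0x1),          # assumed bit 0
        "password_protected": bool(flags & 0x2),  # assumed bit 1
    }


def section_stream_names(entries: Sequence[Sequence[str]]) -> List[str]:
    """Pick 'BodyText/SectionN' streams and sort them by N, not lexically."""
    sections = [
        list(e) for e in entries
        if len(e) == 2 and e[0] == "BodyText" and re.fullmatch(r"Section\d+", e[1])
    ]
    sections.sort(key=lambda e: int(e[1][len("Section"):]))
    return ["/".join(e) for e in sections]


def load_hwp(path: str, extract_section: Callable[[bytes, bool], str]) -> List[Document]:
    """Hypothetical load(): one Document with every section's text, in order."""
    import olefile  # third-party (pip install olefile), imported lazily

    ole = olefile.OleFileIO(path)
    try:
        info = parse_file_header(ole.openstream("FileHeader").read())
        if info["password_protected"]:
            raise ValueError("password-protected HWP files are not supported")
        texts = [
            extract_section(ole.openstream(name).read(), info["compressed"])
            for name in section_stream_names(ole.listdir())
        ]
    finally:
        ole.close()
    return [Document("\n".join(texts), {"source": path, "version": info["version"]})]
```

Sorting section names numerically matters because a plain string sort would place `Section10` before `Section2` and scramble the document order.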
