HWP (Hangeul) Loader
Author: Sunyoung Park (architectyou)
Peer Review : Suhyun Lee, Kane
Proofread : JaeJun Shim
This is part of the LangChain Open Tutorial
Overview
HWP (Hangeul Word Processor) is a word processor developed by Hancom, and it is Korea's representative office software.
It uses the .hwp file extension and is widely used in businesses, schools, government institutions, and more.
Therefore, if you're a developer in South Korea, you've likely had (or will have) experience dealing with .hwp documents.
Unfortunately, HWP is not yet integrated with LangChain, so we'll need to use a custom-implemented HWPLoader with langchain-teddynote and langchain-opentutorial.
In this tutorial, we'll use an HWPLoader that can load .hwp files and extract text from them.
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for tutorials. You can check out
langchain-opentutorial for more details.
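As a rough sketch of the setup step, the snippet below checks whether the two packages named in this tutorial can be imported and prints a pip command for any that are missing. The import names (langchain_opentutorial, langchain_teddynote) are an assumption based on the usual hyphen-to-underscore convention, and missing_packages is a hypothetical helper, not part of either package.

```python
import importlib.util

# Assumption: pip package name -> import name, following the usual
# hyphen-to-underscore convention.
REQUIRED = {
    "langchain-opentutorial": "langchain_opentutorial",
    "langchain-teddynote": "langchain_teddynote",
}


def missing_packages(required):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


missing = missing_packages(REQUIRED)
if missing:
    # Print the command instead of running it, so nothing is installed silently.
    print("Install first: pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```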
HWP Loader Instantiate
You can instantiate the HWP loader with the HWPLoader class.
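A minimal sketch of the instantiation might look like the following. It assumes HWPLoader is importable from langchain_teddynote.document_loaders and accepts the file path as its first argument. The helper make_hwp_loader and the path data/sample.hwp are hypothetical and used only for illustration.

```python
from pathlib import Path


def make_hwp_loader(file_path):
    """Build an HWPLoader for file_path after a basic extension check."""
    path = Path(file_path)
    if path.suffix.lower() != ".hwp":
        raise ValueError(f"Expected a .hwp file, got: {path.name}")

    # Assumption: HWPLoader is provided by langchain-teddynote under this
    # module path. It is imported lazily so the check above works even
    # when the package is not installed.
    from langchain_teddynote.document_loaders import HWPLoader

    return HWPLoader(str(path))


# Usage (hypothetical path, requires langchain-teddynote):
# loader = make_hwp_loader("data/sample.hwp")
```

Checking the extension first gives a clearer error than letting the parser fail on an unrelated file type.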
Loader
You can load the document with the load method.
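The loading step can be sketched as below. The helper preview_first_document is hypothetical. It relies only on the loader exposing a load method that returns a list of documents, each with a page_content string, which is the usual LangChain loader convention and is assumed here to hold for HWPLoader as well.

```python
def preview_first_document(loader, max_chars=1000):
    """Load documents and return the start of the first one's text."""
    docs = loader.load()  # assumed to return a list of Document objects
    if not docs:
        return ""
    # Slice the text so a long document does not flood the output.
    return docs[0].page_content[:max_chars]


# Usage (hypothetical path, requires langchain-teddynote):
# from langchain_teddynote.document_loaders import HWPLoader
# loader = HWPLoader("data/sample.hwp")
# print(preview_first_document(loader))
```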