Arxiv Loader
Author: Sunyoung Park (architectyou)
Peer Review : ppakyeah
Proofread : JaeJun Shim
This is a part of LangChain Open Tutorial
Overview
arXiv is an open access archive for 2 million scholarly articles in the fields of physics,
mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems
science, and economics.
To access the Arxiv document loader, you need to install arxiv, PyMuPDF and langchain-community integration packages.
PyMuPDF converts PDF files downloaded from arxiv.org into text format.
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides easy-to-use environment setup helpers, functions, and utilities for tutorials. You can check out
langchain-opentutorial for more details.
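As a quick sanity check after installation, the sketch below reports which of the required packages cannot be imported. It uses only the standard library. The helper name `missing_packages` is made up for this example, and the import names are assumptions: PyMuPDF is checked under `fitz` and langchain-community under `langchain_community`.

```python
from importlib.util import find_spec

# Import names (not pip names). Assumption: PyMuPDF is importable as "fitz".
REQUIRED_MODULES = ("arxiv", "fitz", "langchain_community")


def missing_packages(modules=REQUIRED_MODULES):
    """Return the import names from `modules` that cannot be found."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("Missing:", missing if missing else "none")
```

An empty result suggests the environment is ready; any name in the list points to a package that still needs to be installed.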
Arxiv-Loader-Instantiate
You can create an ArxivLoader instance to load documents from arxiv.org.
Initialize it with a search query to find documents on arxiv.org.
It supports all arguments of ArxivAPIWrapper.
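A minimal sketch of instantiation might look like the following. The query string and the helper names `DEFAULT_KWARGS`, `loader_kwargs`, and `make_loader` are illustrative only; `load_max_docs` and `load_all_available_meta` are ArxivAPIWrapper arguments that the loader forwards. The import is deferred into the function so the snippet can be imported before the integration packages are installed.

```python
# Example settings; the query text is a placeholder, not from the tutorial.
DEFAULT_KWARGS = {
    "query": "retrieval augmented generation",
    "load_max_docs": 2,                # cap on the number of papers to load
    "load_all_available_meta": False,  # keep only the basic metadata fields
}


def loader_kwargs(**overrides):
    """Merge caller overrides into the default keyword arguments."""
    return {**DEFAULT_KWARGS, **overrides}


def make_loader(**overrides):
    """Create an ArxivLoader; requires arxiv, PyMuPDF and langchain-community."""
    # Deferred import: only needed when a loader is actually built.
    from langchain_community.document_loaders import ArxivLoader

    return ArxivLoader(**loader_kwargs(**overrides))
```

For example, `make_loader(query="quantum error correction", load_max_docs=1)` would build a loader for a different search while keeping the other defaults.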
Load
Use the load method of an ArxivLoader instance to load documents from arxiv.org.
If load_all_available_meta is False, only a subset of the metadata is returned rather than the complete metadata.
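The sketch below shows one way to inspect what load returns. It accepts any loader object with a `load()` method, so an ArxivLoader instance can be passed in directly. The helper names `preview` and `load_and_preview` are hypothetical, and the `Title` metadata key is assumed to be present (a fallback is used if it is not).

```python
def preview(docs, width=80):
    """Return a (title, snippet) pair for each loaded document."""
    return [
        (doc.metadata.get("Title", "<untitled>"), doc.page_content[:width])
        for doc in docs
    ]


def load_and_preview(loader):
    """Eagerly load the documents, then summarize them."""
    docs = loader.load()  # returns a list of Document objects
    return preview(docs)
```

Because load builds the whole list before returning, it is most convenient when only a handful of papers are requested.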
Lazy Load
When loading a large number of documents, if you can perform downstream tasks on a subset of the loaded documents, you can use lazy_load to load documents one at a time and minimize memory usage.
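As a rough illustration of the memory benefit, the following sketch consumes only the first few documents from `lazy_load()` and never asks the generator for the rest. The helper name `first_titles` is made up for this example, and the `Title` metadata key is an assumption.

```python
from itertools import islice


def first_titles(loader, limit):
    """Collect titles of at most `limit` documents from loader.lazy_load()."""
    # islice stops pulling from the generator once `limit` items are seen,
    # so later documents are never requested.
    return [
        doc.metadata.get("Title", "<untitled>")
        for doc in islice(loader.lazy_load(), limit)
    ]
```

Passing an ArxivLoader instance as `loader` would process papers one by one; only one document needs to be held in memory at a time.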
Asynchronous Load
Use the aload method to load documents from arxiv.org asynchronously.
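One situation where aload can pay off is running several searches at once. The sketch below is an assumption-laden illustration rather than the tutorial's own code: `load_many` is a hypothetical helper that awaits `aload()` on each loader concurrently with `asyncio.gather`.

```python
import asyncio


async def load_many(loaders):
    """Await aload() on every loader concurrently.

    Returns one list of documents per loader, in the same order.
    """
    return await asyncio.gather(*(loader.aload() for loader in loaders))
```

From a script, this could be driven with `asyncio.run(load_many([loader_a, loader_b]))`, where `loader_a` and `loader_b` are ArxivLoader instances built with different queries. In a notebook, where an event loop is already running, `await load_many([...])` would be used instead.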
Use Summaries of Articles as Docs
Use the get_summaries_as_docs method to get article summaries as documents.
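A small sketch of how the summaries might be used: the helper below, whose name `summaries_by_title` is invented for this example, maps each article title to its abstract. It assumes each returned document carries the abstract in `page_content` and a `Title` entry in its metadata.

```python
def summaries_by_title(loader):
    """Map each article title to its summary text."""
    return {
        doc.metadata.get("Title", "<untitled>"): doc.page_content
        for doc in loader.get_summaries_as_docs()
    }
```

Since only summaries are fetched, this is a lighter way to skim search results before deciding which papers to load in full.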