Arxiv Loader

Overview

arXiv is an open-access archive for 2 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.

API Documentation

To use the Arxiv document loader, you need to install the arxiv, PyMuPDF, and langchain-community packages.

PyMuPDF converts PDF files downloaded from arxiv.org into text format.
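
As a quick sanity check before continuing, the sketch below reports which of the three packages are not yet importable. The helper name `missing_packages` and the `REQUIRED` mapping are illustrative and not part of any library; the mapping assumes PyMuPDF is imported as `fitz` and langchain-community as `langchain_community`.

```python
from importlib.util import find_spec

# Map each pip distribution name to the module name it is imported as.
# (Assumption: PyMuPDF is importable as "fitz".)
REQUIRED = {
    "arxiv": "arxiv",
    "PyMuPDF": "fitz",
    "langchain-community": "langchain_community",
}


def missing_packages(required=REQUIRED):
    """Return the pip names whose import module cannot be found."""
    return [
        pip_name
        for pip_name, module in required.items()
        if find_spec(module) is None
    ]


missing = missing_packages()
if missing:
    # Print the install command instead of running pip from inside the script.
    print("Install with: pip install -qU " + " ".join(missing))
else:
    print("All required packages are installed.")
```

In a notebook, the printed command can be run in a cell prefixed with `%pip` instead of `pip`.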


Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup tools, helper functions, and utilities for tutorials.

  • You can check out langchain-opentutorial for more details.

Arxiv Loader Instantiation

You can create an ArxivLoader instance to load documents from arxiv.org.

Initialize it with a search query to find documents on arxiv.org. It supports all arguments of ArxivAPIWrapper.
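
A minimal sketch of the instantiation described above. The wrapper function `build_loader` and its parameter names are hypothetical; `load_max_docs` and `load_all_available_meta` are ArxivAPIWrapper arguments passed through the loader. The import sits inside the function only so the definition itself does not require the package.

```python
def build_loader(query, max_docs=2, all_meta=False):
    """Create an ArxivLoader; extra keyword arguments go to ArxivAPIWrapper."""
    # Imported lazily so this snippet can be defined without the package installed.
    from langchain_community.document_loaders import ArxivLoader

    return ArxivLoader(
        query=query,                       # free-text search query
        load_max_docs=max_docs,            # upper bound on the number of papers loaded
        load_all_available_meta=all_meta,  # False keeps only partial metadata
    )
```

For example, `loader = build_loader("retrieval augmented generation")` would prepare a loader for that query; no request should be sent to arxiv.org until one of the load methods is called.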

Load

Use the load method to load documents from arxiv.org with an ArxivLoader instance.

  • If load_all_available_meta is False, only a partial set of metadata is returned rather than the complete metadata.
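
The following sketch loads documents and prints their metadata next to a short content preview, which makes the effect of load_all_available_meta easy to compare. The helpers `load_papers` and `describe` are hypothetical names introduced for illustration.

```python
def describe(doc, preview_chars=200):
    """Return a short, printable description of one loaded document."""
    lines = [f"{key}: {value}" for key, value in doc.metadata.items()]
    lines.append("content: " + doc.page_content[:preview_chars])
    return "\n".join(lines)


def load_papers(query, max_docs=2, all_meta=False):
    """Load up to max_docs papers matching the query as a list of documents."""
    # Imported lazily so this snippet can be defined without the package installed.
    from langchain_community.document_loaders import ArxivLoader

    loader = ArxivLoader(
        query=query,
        load_max_docs=max_docs,
        load_all_available_meta=all_meta,
    )
    return loader.load()  # downloads the PDFs and converts them to text
```

Calling `print(describe(doc))` for each item in `load_papers("quantum computing")`, once with `all_meta=False` and once with `all_meta=True`, should show the difference in metadata fields.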

Lazy Load

When loading a large number of documents, if your downstream tasks only need a subset of them, you can use lazy_load to load documents one at a time and minimize memory usage.
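
One way to take advantage of lazy loading is to stop iterating as soon as enough documents have been found. The sketch below assumes a hypothetical keyword filter; `first_matching` and `lazy_papers` are illustrative names, not library functions.

```python
from itertools import islice


def first_matching(docs, keyword, limit=1):
    """Collect up to `limit` documents mentioning `keyword`, consuming `docs` lazily."""
    matches = (
        doc for doc in docs
        if keyword.lower() in doc.page_content.lower()
    )
    # islice stops pulling from the iterator once `limit` matches are found.
    return list(islice(matches, limit))


def lazy_papers(query, max_docs=10):
    """Return an iterator that yields documents one at a time."""
    # Imported lazily so this snippet can be defined without the package installed.
    from langchain_community.document_loaders import ArxivLoader

    return ArxivLoader(query=query, load_max_docs=max_docs).lazy_load()
```

With `first_matching(lazy_papers("graph neural networks"), "attention")`, documents after the first match are never materialized, so only one document at a time needs to be held in memory.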

Asynchronous Load

Use the aload method to load documents from arxiv.org asynchronously.
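
A minimal sketch of asynchronous loading, assuming the loader exposes the standard `aload` coroutine. The names `load_async` and `make_loader` are hypothetical helpers for this example.

```python
import asyncio


async def load_async(loader):
    """Await the loader's aload() and return the resulting list of documents."""
    docs = await loader.aload()
    return docs


def make_loader(query, max_docs=2):
    """Create an ArxivLoader for the given query."""
    # Imported lazily so this snippet can be defined without the package installed.
    from langchain_community.document_loaders import ArxivLoader

    return ArxivLoader(query=query, load_max_docs=max_docs)
```

In a script, this can be run with `asyncio.run(load_async(make_loader("large language models")))`. In a notebook, where an event loop is already running, use `await load_async(...)` directly instead.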

Use Summaries of Articles as Docs

Use the get_summaries_as_docs method to get article summaries as documents.
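
The sketch below fetches summaries and formats one line per article. It assumes each returned document keeps the summary in `page_content` and may carry a "Title" key in its metadata; the code falls back to a placeholder if that key is absent. `summary_table` and `load_summaries` are hypothetical helper names.

```python
def summary_table(docs, width=80):
    """Build one line per document: its title (if present) and the start of its summary."""
    rows = []
    for doc in docs:
        # Assumption: the title is stored under the "Title" metadata key.
        title = doc.metadata.get("Title", "<untitled>")
        # Collapse newlines and repeated spaces before truncating.
        snippet = " ".join(doc.page_content.split())[:width]
        rows.append(f"{title}: {snippet}")
    return rows


def load_summaries(query):
    """Return article summaries (not full texts) as documents."""
    # Imported lazily so this snippet can be defined without the package installed.
    from langchain_community.document_loaders import ArxivLoader

    return ArxivLoader(query=query).get_summaries_as_docs()
```

For instance, `print("\n".join(summary_table(load_summaries("diffusion models"))))` would list the matching articles with the first 80 characters of each summary.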
