Arxiv Loader


Overview

arXiv is an open-access archive for 2 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.

API Documentation

To use the Arxiv document loader, you need to install the arxiv, PyMuPDF, and langchain-community packages.

PyMuPDF converts PDF files downloaded from arxiv.org into text format.
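As a quick sanity check before running the rest of the tutorial, the sketch below reports which of the three packages are not yet importable. The helper name `missing_packages` and the `REQUIRED` mapping are illustrative, not part of any library; the mapping assumes PyMuPDF is imported under the module name `fitz`, which may differ in newer releases.

```python
from importlib.util import find_spec

# pip distribution name -> module name used in `import` statements.
# Assumption: PyMuPDF is importable as `fitz`.
REQUIRED = {
    "arxiv": "arxiv",
    "PyMuPDF": "fitz",
    "langchain-community": "langchain_community",
}


def missing_packages(required=REQUIRED):
    """Return the distribution names whose module cannot be found."""
    return [dist for dist, module in required.items() if find_spec(module) is None]


missing = missing_packages()
if missing:
    # Print a ready-to-copy install command for whatever is missing.
    print("pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```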


Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials.

  • You can check out langchain-opentutorial for more details.

Instantiate the Arxiv Loader

You can create an ArxivLoader instance to load documents from arxiv.org.

Initialize it with a search query to find documents on arXiv.org. It supports all arguments of ArxivAPIWrapper.
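A minimal sketch of the instantiation step follows. The wrapper function `build_arxiv_loader` and the example query are illustrative; `load_max_docs` and `load_all_available_meta` are two of the ArxivAPIWrapper arguments that the loader forwards, and other arguments can be passed the same way. The import sits inside the function only so that defining the helper does not require the package to be installed.

```python
def build_arxiv_loader(query, max_docs=2, full_meta=False):
    """Create an ArxivLoader for `query` (hypothetical convenience wrapper)."""
    # Imported lazily; requires langchain-community and arxiv at call time.
    from langchain_community.document_loaders import ArxivLoader

    return ArxivLoader(
        query=query,                        # search query sent to arXiv
        load_max_docs=max_docs,             # upper bound on documents to fetch
        load_all_available_meta=full_meta,  # False keeps only partial metadata
    )


# Usage (no network request is made until a load method is called):
# loader = build_arxiv_loader("ChatGPT", max_docs=2)
```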

Load

Use the load method of an ArxivLoader instance to load documents from arxiv.org.

  • If load_all_available_meta is False, each document carries only partial metadata rather than the complete set.
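To see the effect of load_all_available_meta, one option is to list the metadata keys attached to each loaded document. The helper `metadata_keys_by_title` below is a hypothetical name, and it assumes each document exposes a `metadata` dict containing a `"Title"` entry, which may vary between library versions.

```python
def metadata_keys_by_title(loader):
    """Load all documents and return (title, sorted metadata keys) pairs."""
    docs = loader.load()  # fetches and parses every matching document at once
    return [(doc.metadata.get("Title"), sorted(doc.metadata)) for doc in docs]


# Usage, assuming langchain-community, arxiv and PyMuPDF are installed:
# from langchain_community.document_loaders import ArxivLoader
# partial = ArxivLoader(query="ChatGPT", load_max_docs=1)
# full = ArxivLoader(query="ChatGPT", load_max_docs=1, load_all_available_meta=True)
# print(metadata_keys_by_title(partial))
# print(metadata_keys_by_title(full))  # expected to list more keys
```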

Lazy Load

When loading a large number of documents, if your downstream task only needs a subset of them, you can use lazy_load to load documents one at a time and minimize memory usage.
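The idea can be sketched as a loop that consumes the lazy_load iterator and stops as soon as enough documents have been collected, so the remaining documents are never fetched. The function `collect_matching` and its `predicate` and `limit` parameters are illustrative names, not part of the loader API.

```python
def collect_matching(loader, predicate, limit):
    """Pull documents one at a time and stop once `limit` of them match."""
    matches = []
    for doc in loader.lazy_load():  # yields one document per iteration
        if predicate(doc):
            matches.append(doc)
        if len(matches) >= limit:
            break  # later documents are never requested from the iterator
    return matches


# Usage, assuming `loader` is an ArxivLoader instance:
# long_docs = collect_matching(loader, lambda d: len(d.page_content) > 10_000, limit=3)
```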

Asynchronous Load

Use the aload method to load documents from arxiv.org asynchronously.
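A minimal sketch of awaiting aload from synchronous code, here for several loaders at once via asyncio.gather. The function name `load_all_async` is hypothetical, and how much real concurrency this yields depends on how the loader implements aload internally.

```python
import asyncio


async def load_all_async(loaders):
    """Await `aload()` on every loader and return one document list per loader."""
    results = await asyncio.gather(*(loader.aload() for loader in loaders))
    return list(results)


# Usage, assuming `loader_a` and `loader_b` are ArxivLoader instances:
# docs_a, docs_b = asyncio.run(load_all_async([loader_a, loader_b]))
# Inside a notebook, where an event loop is already running, use instead:
# docs_a, docs_b = await load_all_async([loader_a, loader_b])
```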

Use Summaries of Articles as Docs

Use the get_summaries_as_docs method to get article summaries as documents.
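As a sketch, the summaries can be gathered into a dictionary keyed by article title. The helper `summaries_by_title` is an illustrative name, and it assumes each returned document stores the summary text in `page_content` and the title under the `"Title"` metadata key.

```python
def summaries_by_title(loader):
    """Map each article title to its summary text."""
    return {
        doc.metadata.get("Title", "<untitled>"): doc.page_content
        for doc in loader.get_summaries_as_docs()  # summaries only, no full text
    }


# Usage, assuming `loader` is an ArxivLoader instance:
# for title, summary in summaries_by_title(loader).items():
#     print(title, summary[:200], sep="\n", end="\n\n")
```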
