JSON

Let's look at how to load files with the .json extension using a loader.


Overview

This tutorial demonstrates how to use LangChain's JSONLoader to load and process JSON files. We'll explore how to extract specific data from structured JSON files using jq-style queries.


References

  • https://python.langchain.com/docs/how_to/document_loader_json/


Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials.

  • You can check out the langchain-opentutorial for more details.

You can alternatively set OPENAI_API_KEY in a .env file and load it.

[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps.

Generate JSON Data


If you want to generate JSON data for this tutorial, you can create it with a short Python script.

If you already have your own JSON data, you can load that file instead.
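A minimal sketch using only the standard library. The records are hypothetical, with field names (name, age, city, contact.email) chosen to match the queries used later in this tutorial, and the file name people.json is an assumption.

```python
import json
from pathlib import Path

# Hypothetical sample records; only their shape matters for the later queries.
people = {
    "people": [
        {"name": "Alice", "age": 28, "city": "Seoul", "contact": {"email": "alice@example.com"}},
        {"name": "Bob", "age": 35, "city": "Busan", "contact": {"email": "bob@example.com"}},
        {"name": "Carol", "age": 42, "city": "Incheon", "contact": {"email": "carol@example.com"}},
    ]
}

# Write the data to people.json in the current working directory.
path = Path("people.json")
path.write_text(json.dumps(people, indent=2), encoding="utf-8")

# Loading your own JSON file back into Python works the same way.
with path.open(encoding="utf-8") as f:
    loaded = json.load(f)

print(len(loaded["people"]))  # 3
```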

JSONLoader


To extract the values of the content field under the message key of your JSON data, you can use JSONLoader with a jq_schema that selects exactly those values.

Basic Usage

This example shows how to load a JSON file with JSONLoader and print the documents it returns.

Loading Each Person as a Separate Document

We can load each person object in people.json as an individual document by setting jq_schema=".people[]".

Using content_key within jq_schema

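To take each document's content from a field selected by a jq expression, set content_key to that expression and pass is_content_key_jq_parsable=True. The content_key expression is applied to every object produced by jq_schema, so it must be valid jq (for example .name) and must match the structure of those objects.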

Extracting Metadata from people.json

Let's define a metadata_func to extract relevant information like name, age, and city from each person object.

Understanding JSON Query Syntax

Let's explore the basic syntax of jq-style queries used in JSONLoader:

Basic Selectors

  • . : Current object

  • .key : Access specific key in object

  • .[] : Iterate over array elements

Pipe Operator

  • | : Pass result of left expression as input to right expression

Object Construction

  • {key: value} : Create new object
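The selectors and operators above can be tried directly with the jq Python bindings, which JSONLoader relies on internally. A minimal sketch, assuming the jq package is installed; the sample document and the run helper are hypothetical.

```python
import jq

sample = {"people": [{"name": "Alice", "age": 28}, {"name": "Bob", "age": 35}]}


def run(query: str) -> list:
    """Compile a jq query, apply it to sample, and collect every output."""
    return jq.compile(query).input(sample).all()


print(run("."))                         # the whole input, unchanged
print(run(".people"))                   # the value under the "people" key
print(run(".people[]"))                 # one output per array element
print(run(".people[] | .name"))         # ['Alice', 'Bob']
print(run(".people[] | {who: .name}"))  # [{'who': 'Alice'}, {'who': 'Bob'}]
```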

Example JSON:
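A hypothetical document with the shape these query patterns assume (a people array whose items have a name and a nested contact.email) could look like this; the values are illustrative only.

```python
import json

# Hypothetical example document; only its shape matters for the queries.
example = {
    "people": [
        {"name": "Alice", "age": 28, "contact": {"email": "alice@example.com"}},
        {"name": "Bob", "age": 35, "contact": {"email": "bob@example.com"}},
    ]
}

print(json.dumps(example, indent=2))
```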

Common Query Patterns:

  • .people[] : Access each array element

  • .people[].name : Get all names

  • .people[] | {name: .name} : Create new object with name

  • .people[] | {name, email: .contact.email} : Extract nested data

[Note]

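  • Use text_content=False whenever your jq_schema (or content_key) yields values that are not strings.

  • With the default text_content=True, JSONLoader raises a ValueError when the selected content is not a string; text_content=False lets it convert non-string values (objects, arrays, numbers) into text.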

Advanced Queries

Here are examples of extracting specific information using different jq schemas:

These examples demonstrate the flexibility of jq queries in fetching data in various ways.
