LangChain OpenTutorial

JSON


Last updated 28 days ago

Let's look at how to load files with the .json extension using LangChain's JSONLoader.

  • Author:

  • Peer Review : ,

  • Proofread :

  • This is a part of

Overview

This tutorial demonstrates how to use LangChain's JSONLoader to load and process JSON files. We'll explore how to extract specific data from structured JSON files using jq-style queries.

Table of Contents

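  • Overview
  • Environment Setup
  • Generate JSON Data
  • JSONLoader
  • Basic Usage
  • Loading Each Person as a Separate Document
  • Using content_key within jq_schema
  • Extracting Metadata from people.json
  • Understanding JSON Query Syntax
  • Advanced Queries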

References

  • https://python.langchain.com/docs/how_to/document_loader_json/


Environment Setup

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup, along with useful functions and utilities for tutorials.

%%capture --no-stderr
%pip install langchain-opentutorial
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain",
        "langchain_community",
        "langchain_openai"
    ],
    verbose=False,
    upgrade=False,
)
# JSONLoader depends on the jq Python package
%pip install jq
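JSONLoader imports the jq package when it is created and raises an ImportError if the package is missing. The following sketch, which uses only the standard library (the helper name is_jq_available is hypothetical), checks that jq is importable before you build a loader:

```python
import importlib.util


def is_jq_available() -> bool:
    """Return True if the `jq` Python package can be imported."""
    return importlib.util.find_spec("jq") is not None


if is_jq_available():
    print("jq is installed; JSONLoader can be used.")
else:
    print("jq is missing; run `%pip install jq` and restart the kernel.")
```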
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "OPENAI_API_KEY": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "09-JSONLoader",
    }
)

Alternatively, you can set OPENAI_API_KEY in a .env file and load it.

[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps.

# Load environment variables
# Reload any variables that need to be overwritten from the previous cell

from dotenv import load_dotenv

load_dotenv(override=True)

Generate JSON Data


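If you want to generate sample JSON data, you can use the following code. It asks an OpenAI chat model to produce records for five fictional people and saves them to data/people.json.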

from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI
from pathlib import Path
from dotenv import load_dotenv
from pprint import pprint
import json

# Load .env file
load_dotenv()

# Initialize ChatOpenAI
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0.7,
    model_kwargs={"response_format": {"type": "json_object"}}
)

# Create prompt template
# JSON mode returns an object, so ask for the array under a "people" key
prompt = PromptTemplate(
    input_variables=[],
    template="""Generate a JSON object with a single key "people" whose value is an array containing detailed personal information for 5 people.
        Include various fields like name, age, contact details, address, personal preferences, and any other interesting information you think would be relevant."""
)

# Create and invoke runnable sequence using the new pipe syntax
response = (prompt | llm).invoke({})
generated_data = json.loads(response.content)

# Save to JSON file
current_dir = Path().absolute()
data_dir = current_dir / "data"
data_dir.mkdir(exist_ok=True)

file_path = data_dir / "people.json"
with open(file_path, "w", encoding="utf-8") as f:
    json.dump(generated_data, f, ensure_ascii=False, indent=2)

print("Generated and saved JSON data:")
pprint(generated_data)
Generated and saved JSON data:
    {'people': [{'address': {'city': 'Springfield',
                             'country': 'USA',
                             'state': 'IL',
                             'street': '123 Maple St',
                             'zip': '62701'},
                 'age': 28,
                 'contact_details': {'email': 'alice.johnson@example.com',
                                     'phone': '+1-555-123-4567'},
                 'interesting_information': {'pet': {'breed': 'Golden Retriever',
                                                     'name': 'Buddy',
                                                     'type': 'dog'},
                                             'travel_history': [{'country': 'Japan',
                                                                 'year': 2019},
                                                                {'country': 'Italy',
                                                                 'year': 2021}]},
                 'name': 'Alice Johnson',
                 'personal_preferences': {'favorite_food': 'sushi',
                                          'hobbies': ['reading',
                                                      'hiking',
                                                      'photography'],
                                          'music_genres': ['jazz',
                                                           'classical',
                                                           'indie']}},
                {'address': {'city': 'Denver',
                             'country': 'USA',
                             'state': 'CO',
                             'street': '456 Oak Ave',
                             'zip': '80202'},
                 'age': 34,
                 'contact_details': {'email': 'michael.smith@example.com',
                                     'phone': '+1-555-234-5678'},
                 'interesting_information': {'pet': {'breed': 'Siamese',
                                                     'name': 'Whiskers',
                                                     'type': 'cat'},
                                             'volunteering': {'organization': 'Local '
                                                                              'Food '
                                                                              'Bank',
                                                              'years_active': 5}},
                 'name': 'Michael Smith',
                 'personal_preferences': {'favorite_food': 'pizza',
                                          'hobbies': ['cycling',
                                                      'cooking',
                                                      'gaming'],
                                          'music_genres': ['rock',
                                                           'pop',
                                                           'hip-hop']}},
                {'address': {'city': 'Austin',
                             'country': 'USA',
                             'state': 'TX',
                             'street': '789 Pine Rd',
                             'zip': '73301'},
                 'age': 22,
                 'contact_details': {'email': 'emily.davis@example.com',
                                     'phone': '+1-555-345-6789'},
                 'interesting_information': {'pet': None,
                                             'study': {'graduation_year': 2024,
                                                       'major': 'Fine Arts',
                                                       'university': 'University '
                                                                     'of Texas'}},
                 'name': 'Emily Davis',
                 'personal_preferences': {'favorite_food': 'tacos',
                                          'hobbies': ['painting',
                                                      'traveling',
                                                      'yoga'],
                                          'music_genres': ['country',
                                                           'folk',
                                                           'dance']}},
                {'address': {'city': 'Seattle',
                             'country': 'USA',
                             'state': 'WA',
                             'street': '101 Birch Blvd',
                             'zip': '98101'},
                 'age': 45,
                 'contact_details': {'email': 'david.brown@example.com',
                                     'phone': '+1-555-456-7890'},
                 'interesting_information': {'career': {'job_title': 'Software '
                                                                     'Engineer',
                                                        'years_experience': 20},
                                             'pet': {'breed': 'Canary',
                                                     'name': 'Tweety',
                                                     'type': 'bird'}},
                 'name': 'David Brown',
                 'personal_preferences': {'favorite_food': 'steak',
                                          'hobbies': ['golf', 'reading', 'fishing'],
                                          'music_genres': ['blues',
                                                           'classic rock',
                                                           'jazz']}},
                {'address': {'city': 'Miami',
                             'country': 'USA',
                             'state': 'FL',
                             'street': '202 Cedar Ct',
                             'zip': '33101'},
                 'age': 39,
                 'contact_details': {'email': 'sophia.wilson@example.com',
                                     'phone': '+1-555-567-8901'},
                 'interesting_information': {'pet': {'breed': 'Bulldog',
                                                     'name': 'Max',
                                                     'type': 'dog'},
                                             'travel_history': [{'country': 'Spain',
                                                                 'year': 2018},
                                                                {'country': 'Brazil',
                                                                 'year': 2020}]},
                 'name': 'Sophia Wilson',
                 'personal_preferences': {'favorite_food': 'paella',
                                          'hobbies': ['dancing',
                                                      'gardening',
                                                      'cooking'],
                                          'music_genres': ['latin',
                                                           'pop',
                                                           'salsa']}}]}
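OpenAI's JSON mode guarantees syntactically valid JSON, but not a particular schema, so the generated object is not guaranteed to use the key people. The examples that follow assume a people array whose items are objects with a name field. A minimal check for that shape is sketched below; the helper check_people_schema is hypothetical and only covers what the later jq_schema=".people[]" and content_key="name" examples rely on.

```python
from typing import Any


def check_people_schema(data: Any) -> list[str]:
    """Return a list of problems found; an empty list means the shape matches.

    Hypothetical helper: checks only the structure that the later
    JSONLoader examples rely on (jq_schema=".people[]", content_key="name").
    """
    if not isinstance(data, dict):
        return [f"top level is {type(data).__name__}, expected an object"]
    people = data.get("people")
    if not isinstance(people, list):
        return ['missing a "people" array at the top level']
    problems: list[str] = []
    for i, person in enumerate(people):
        if not isinstance(person, dict):
            problems.append(f"people[{i}] is not an object")
        elif "name" not in person:
            problems.append(f'people[{i}] has no "name" field')
    return problems


# Small self-contained demo; in the notebook, pass generated_data instead.
sample = {"people": [{"name": "Alice Johnson", "age": 28}, {"age": 34}]}
print(check_people_schema(sample))  # ['people[1] has no "name" field']
```

If the list is non-empty, regenerating the data is usually simpler than adapting the jq expressions used later.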

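To inspect a JSON file of your own (or the one saved above) before loading it with JSONLoader, read it with Python's json module. Because the sample data is generated by an LLM with a nonzero temperature, the contents of your file will differ from the output shown here.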

import json
from pathlib import Path
from pprint import pprint


file_path = "data/people.json"
data = json.loads(Path(file_path).read_text(encoding="utf-8"))

pprint(data)
{'people': [{'address': {'city': 'Springfield',
                             'country': 'USA',
                             'state': 'IL',
                             'street': '123 Maple St',
                             'zip': '62704'},
                 'age': 28,
                 'contact': {'email': 'alice.johnson@example.com',
                             'phone': '+1-555-0123',
                             'social_media': {'linkedin': 'linkedin.com/in/alicejohnson',
                                              'twitter': '@alice_j'}},
                 'interesting_fact': 'Alice has traveled to over 15 countries and '
                                     'speaks 3 languages.',
                 'name': {'first': 'Alice', 'last': 'Johnson'},
                 'personal_preferences': {'favorite_food': 'Italian',
                                          'hobbies': ['Reading',
                                                      'Hiking',
                                                      'Cooking'],
                                          'music_genre': 'Jazz',
                                          'travel_destinations': ['Japan',
                                                                  'Italy',
                                                                  'Canada']}},
                {'address': {'city': 'Metropolis',
                             'country': 'USA',
                             'state': 'NY',
                             'street': '456 Oak Ave',
                             'zip': '10001'},
                 'age': 34,
                 'contact': {'email': 'bob.smith@example.com',
                             'phone': '+1-555-0456',
                             'social_media': {'linkedin': 'linkedin.com/in/bobsmith',
                                              'twitter': '@bobsmith34'}},
                 'interesting_fact': 'Bob is an avid gamer and has competed in '
                                     'several national tournaments.',
                 'name': {'first': 'Bob', 'last': 'Smith'},
                 'personal_preferences': {'favorite_food': 'Mexican',
                                          'hobbies': ['Photography',
                                                      'Cycling',
                                                      'Video Games'],
                                          'music_genre': 'Rock',
                                          'travel_destinations': ['Brazil',
                                                                  'Australia',
                                                                  'Germany']}},
                {'address': {'city': 'Gotham',
                             'country': 'USA',
                             'state': 'NJ',
                             'street': '789 Pine Rd',
                             'zip': '07001'},
                 'age': 45,
                 'contact': {'email': 'charlie.davis@example.com',
                             'phone': '+1-555-0789',
                             'social_media': {'linkedin': 'linkedin.com/in/charliedavis',
                                              'twitter': '@charliedavis45'}},
                 'interesting_fact': 'Charlie has a small farm where he raises '
                                     'chickens and grows organic vegetables.',
                 'name': {'first': 'Charlie', 'last': 'Davis'},
                 'personal_preferences': {'favorite_food': 'Barbecue',
                                          'hobbies': ['Gardening',
                                                      'Fishing',
                                                      'Woodworking'],
                                          'music_genre': 'Country',
                                          'travel_destinations': ['Canada',
                                                                  'New Zealand',
                                                                  'Norway']}},
                {'address': {'city': 'Star City',
                             'country': 'USA',
                             'state': 'CA',
                             'street': '234 Birch Blvd',
                             'zip': '90001'},
                 'age': 22,
                 'contact': {'email': 'dana.lee@example.com',
                             'phone': '+1-555-0111',
                             'social_media': {'linkedin': 'linkedin.com/in/danalee',
                                              'twitter': '@danalee22'}},
                 'interesting_fact': 'Dana is a dance instructor and has won '
                                     'several local competitions.',
                 'name': {'first': 'Dana', 'last': 'Lee'},
                 'personal_preferences': {'favorite_food': 'Thai',
                                          'hobbies': ['Dancing',
                                                      'Sketching',
                                                      'Traveling'],
                                          'music_genre': 'Pop',
                                          'travel_destinations': ['Thailand',
                                                                  'France',
                                                                  'Spain']}},
                {'address': {'city': 'Central City',
                             'country': 'USA',
                             'state': 'TX',
                             'street': '345 Cedar St',
                             'zip': '75001'},
                 'age': 31,
                 'contact': {'email': 'ethan.garcia@example.com',
                             'phone': '+1-555-0999',
                             'social_media': {'linkedin': 'linkedin.com/in/ethangarcia',
                                              'twitter': '@ethangarcia31'}},
                 'interesting_fact': 'Ethan runs a popular travel blog where he '
                                     'shares his adventures and culinary '
                                     'experiences.',
                 'name': {'first': 'Ethan', 'last': 'Garcia'},
                 'personal_preferences': {'favorite_food': 'Indian',
                                          'hobbies': ['Running',
                                                      'Travel Blogging',
                                                      'Cooking'],
                                          'music_genre': 'Hip-Hop',
                                          'travel_destinations': ['India',
                                                                  'Italy',
                                                                  'Mexico']}}]}
print(type(data))

JSONLoader


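JSONLoader converts a JSON file into LangChain Document objects. Its jq_schema argument is a jq expression that selects which parts of the file become documents, and arguments such as content_key and text_content control what each document's page_content contains.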

Basic Usage

The following example loads data/people.json, creates one document for each element of the people array, and prints the result.

from langchain_community.document_loaders import JSONLoader

# Create JSONLoader
loader = JSONLoader(
    file_path="data/people.json",
    jq_schema=".people[]",  # Access each item in the people array
    text_content=False,
)

# Load documents
docs = loader.load()
pprint(docs)
[Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 1}, page_content='{"name": "Alice Johnson", "age": 28, "contact_details": {"email": "alice.johnson@example.com", "phone": "+1-555-123-4567"}, "address": {"street": "123 Maple St", "city": "Springfield", "state": "IL", "zip": "62701", "country": "USA"}, "personal_preferences": {"hobbies": ["reading", "hiking", "photography"], "favorite_food": "sushi", "music_genres": ["jazz", "classical", "indie"]}, "interesting_information": {"pet": {"type": "dog", "name": "Buddy", "breed": "Golden Retriever"}, "travel_history": [{"country": "Japan", "year": 2019}, {"country": "Italy", "year": 2021}]}}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 2}, page_content='{"name": "Michael Smith", "age": 34, "contact_details": {"email": "michael.smith@example.com", "phone": "+1-555-234-5678"}, "address": {"street": "456 Oak Ave", "city": "Denver", "state": "CO", "zip": "80202", "country": "USA"}, "personal_preferences": {"hobbies": ["cycling", "cooking", "gaming"], "favorite_food": "pizza", "music_genres": ["rock", "pop", "hip-hop"]}, "interesting_information": {"pet": {"type": "cat", "name": "Whiskers", "breed": "Siamese"}, "volunteering": {"organization": "Local Food Bank", "years_active": 5}}}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 3}, page_content='{"name": "Emily Davis", "age": 22, "contact_details": {"email": "emily.davis@example.com", "phone": "+1-555-345-6789"}, "address": {"street": "789 Pine Rd", "city": "Austin", "state": "TX", "zip": "73301", "country": "USA"}, "personal_preferences": {"hobbies": ["painting", "traveling", "yoga"], "favorite_food": "tacos", "music_genres": ["country", "folk", "dance"]}, "interesting_information": {"pet": null, "study": {"major": "Fine Arts", "university": "University of Texas", "graduation_year": 2024}}}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 4}, page_content='{"name": "David Brown", "age": 45, "contact_details": {"email": "david.brown@example.com", "phone": "+1-555-456-7890"}, "address": {"street": "101 Birch Blvd", "city": "Seattle", "state": "WA", "zip": "98101", "country": "USA"}, "personal_preferences": {"hobbies": ["golf", "reading", "fishing"], "favorite_food": "steak", "music_genres": ["blues", "classic rock", "jazz"]}, "interesting_information": {"pet": {"type": "bird", "name": "Tweety", "breed": "Canary"}, "career": {"job_title": "Software Engineer", "years_experience": 20}}}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 5}, page_content='{"name": "Sophia Wilson", "age": 39, "contact_details": {"email": "sophia.wilson@example.com", "phone": "+1-555-567-8901"}, "address": {"street": "202 Cedar Ct", "city": "Miami", "state": "FL", "zip": "33101", "country": "USA"}, "personal_preferences": {"hobbies": ["dancing", "gardening", "cooking"], "favorite_food": "paella", "music_genres": ["latin", "pop", "salsa"]}, "interesting_information": {"pet": {"type": "dog", "name": "Max", "breed": "Bulldog"}, "travel_history": [{"country": "Spain", "year": 2018}, {"country": "Brazil", "year": 2020}]}}')]
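To make the result above concrete, here is a rough plain-Python approximation of what the loader produced. It is a sketch, not JSONLoader's implementation: it handles only the selector .people[], returns plain dicts instead of Document objects, and the function name emulate_people_loader is hypothetical. As in the output shown, each selected object is serialized with json.dumps, source is an absolute path, and seq_num counts from 1.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def emulate_people_loader(file_path: str) -> list[dict[str, Any]]:
    """Approximate JSONLoader(jq_schema=".people[]", text_content=False).

    Returns dicts with "page_content" and "metadata" keys so that it runs
    without langchain or jq installed.
    """
    path = Path(file_path).resolve()  # source is reported as an absolute path
    data = json.loads(path.read_text(encoding="utf-8"))
    docs = []
    # ".people[]" selects each element of the top-level "people" array
    for seq_num, person in enumerate(data["people"], start=1):
        docs.append(
            {
                # each object is serialized back into a JSON string
                "page_content": json.dumps(person),
                "metadata": {"source": str(path), "seq_num": seq_num},
            }
        )
    return docs


# Demo on a temporary two-person file
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "people.json"
    sample_path.write_text(
        json.dumps({"people": [{"name": "Alice Johnson"}, {"name": "Michael Smith"}]}),
        encoding="utf-8",
    )
    for doc in emulate_people_loader(str(sample_path)):
        print(doc["metadata"]["seq_num"], doc["page_content"])
```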

Loading Each Person as a Separate Document

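The same jq_schema=".people[]" loads each person object in people.json as an individual document. Because each selected item is a JSON object rather than a string, text_content=False is required; the loader then serializes each object with json.dumps and stores the result in page_content, while the metadata records the source file and a 1-based seq_num.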

loader = JSONLoader(
    file_path="data/people.json",
    jq_schema=".people[]",
    text_content=False,
)

data = loader.load()
data
[Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 1}, page_content='{"name": "Alice Johnson", "age": 28, "contact_details": {"email": "alice.johnson@example.com", "phone": "+1-555-123-4567"}, "address": {"street": "123 Maple St", "city": "Springfield", "state": "IL", "zip": "62701", "country": "USA"}, "personal_preferences": {"hobbies": ["reading", "hiking", "photography"], "favorite_food": "sushi", "music_genres": ["jazz", "classical", "indie"]}, "interesting_information": {"pet": {"type": "dog", "name": "Buddy", "breed": "Golden Retriever"}, "travel_history": [{"country": "Japan", "year": 2019}, {"country": "Italy", "year": 2021}]}}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 2}, page_content='{"name": "Michael Smith", "age": 34, "contact_details": {"email": "michael.smith@example.com", "phone": "+1-555-234-5678"}, "address": {"street": "456 Oak Ave", "city": "Denver", "state": "CO", "zip": "80202", "country": "USA"}, "personal_preferences": {"hobbies": ["cycling", "cooking", "gaming"], "favorite_food": "pizza", "music_genres": ["rock", "pop", "hip-hop"]}, "interesting_information": {"pet": {"type": "cat", "name": "Whiskers", "breed": "Siamese"}, "volunteering": {"organization": "Local Food Bank", "years_active": 5}}}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 3}, page_content='{"name": "Emily Davis", "age": 22, "contact_details": {"email": "emily.davis@example.com", "phone": "+1-555-345-6789"}, "address": {"street": "789 Pine Rd", "city": "Austin", "state": "TX", "zip": "73301", "country": "USA"}, "personal_preferences": {"hobbies": ["painting", "traveling", "yoga"], "favorite_food": "tacos", "music_genres": ["country", "folk", "dance"]}, "interesting_information": {"pet": null, "study": {"major": "Fine Arts", "university": "University of Texas", "graduation_year": 2024}}}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 4}, page_content='{"name": "David Brown", "age": 45, "contact_details": {"email": "david.brown@example.com", "phone": "+1-555-456-7890"}, "address": {"street": "101 Birch Blvd", "city": "Seattle", "state": "WA", "zip": "98101", "country": "USA"}, "personal_preferences": {"hobbies": ["golf", "reading", "fishing"], "favorite_food": "steak", "music_genres": ["blues", "classic rock", "jazz"]}, "interesting_information": {"pet": {"type": "bird", "name": "Tweety", "breed": "Canary"}, "career": {"job_title": "Software Engineer", "years_experience": 20}}}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 5}, page_content='{"name": "Sophia Wilson", "age": 39, "contact_details": {"email": "sophia.wilson@example.com", "phone": "+1-555-567-8901"}, "address": {"street": "202 Cedar Ct", "city": "Miami", "state": "FL", "zip": "33101", "country": "USA"}, "personal_preferences": {"hobbies": ["dancing", "gardening", "cooking"], "favorite_food": "paella", "music_genres": ["latin", "pop", "salsa"]}, "interesting_information": {"pet": {"type": "dog", "name": "Max", "breed": "Bulldog"}, "travel_history": [{"country": "Spain", "year": 2018}, {"country": "Brazil", "year": 2020}]}}')]

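Using content_key with jq_schema

When jq_schema yields objects, content_key chooses which field of each object becomes page_content. By default, content_key is a plain key name. In the example below, "name" is looked up in every object produced by .people[]. To use a jq expression instead, such as content_key=".name" or a path into a nested object, also set is_content_key_jq_parsable=True. The expression is then evaluated against each object that jq_schema yields, so it must be valid jq for that object's structure.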

loader = JSONLoader(
    file_path="data/people.json",
    jq_schema=".people[]",
    content_key="name",
    text_content=False
)

data = loader.load()
data
[Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 1}, page_content='Alice Johnson'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 2}, page_content='Michael Smith'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 3}, page_content='Emily Davis'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 4}, page_content='David Brown'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 5}, page_content='Sophia Wilson')]
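To make the role of content_key concrete, here is a rough, stdlib-only approximation of what the loader does with jq_schema=".people[]" and a plain content_key="name". Each object produced by the schema becomes one document whose page_content is that object's "name" value, and seq_num counts from 1. The helper load_by_content_key, the trimmed sample data, and the plain-dict output are hypothetical stand-ins for illustration. JSONLoader itself evaluates the schema with the jq package and returns Document objects.

```python
import json

# A trimmed-down stand-in for data/people.json with the same top-level shape
RAW = """
{"people": [
  {"name": "Alice Johnson", "age": 28},
  {"name": "Michael Smith", "age": 34}
]}
"""


def load_by_content_key(raw: str, content_key: str, source: str = "data/people.json") -> list:
    """Hypothetical helper: mimics jq_schema=".people[]" plus a plain (non-jq) content_key."""
    data = json.loads(raw)
    docs = []
    # .people[] yields one object per array element; seq_num starts at 1
    for seq_num, record in enumerate(data["people"], start=1):
        docs.append(
            {
                "metadata": {"source": source, "seq_num": seq_num},
                # A plain content_key is a simple key lookup on each object
                "page_content": record[content_key],
            }
        )
    return docs


docs = load_by_content_key(RAW, "name")
for doc in docs:
    print(doc["metadata"]["seq_num"], doc["page_content"])
```

A missing key raises KeyError here, so every object yielded by the schema is expected to contain the chosen content_key.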

Extracting Metadata from people.json

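Let's define a metadata_func that extracts the name, age, and city from each person object. JSONLoader calls metadata_func once for every object produced by jq_schema, passing that object together with the default metadata (source and seq_num). The dictionary it returns becomes the Document's metadata.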

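def metadata_func(record: dict, metadata: dict) -> dict:
    # record: the JSON object produced by jq_schema for one Document
    # metadata: the default metadata (source, seq_num), extended in place
    metadata["name"] = record.get("name")
    metadata["age"] = record.get("age")
    # "or {}" also covers an address that is present but null
    metadata["city"] = (record.get("address") or {}).get("city")
    return metadata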

loader = JSONLoader(
    file_path="data/people.json",
    jq_schema=".people[]",
    content_key="name",
    metadata_func=metadata_func,
    text_content=False
)

data = loader.load()
data
[Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 1, 'name': 'Alice Johnson', 'age': 28, 'city': 'Springfield'}, page_content='Alice Johnson'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 2, 'name': 'Michael Smith', 'age': 34, 'city': 'Denver'}, page_content='Michael Smith'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 3, 'name': 'Emily Davis', 'age': 22, 'city': 'Austin'}, page_content='Emily Davis'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 4, 'name': 'David Brown', 'age': 45, 'city': 'Seattle'}, page_content='David Brown'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 5, 'name': 'Sophia Wilson', 'age': 39, 'city': 'Miami'}, page_content='Sophia Wilson')]
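Because metadata_func is an ordinary function, it can be checked without running the loader at all. The sketch below calls a self-contained copy of it on Michael Smith's record (values taken from the output above), together with the kind of default metadata the loader passes in. The copy guards against an address stored as null, which would otherwise raise AttributeError. The trimmed record, the relative source path, and the "No Address" record are illustrative assumptions.

```python
def metadata_func(record: dict, metadata: dict) -> dict:
    metadata["name"] = record.get("name")
    metadata["age"] = record.get("age")
    # Guard against both a missing "address" key and "address": null
    metadata["city"] = (record.get("address") or {}).get("city")
    return metadata


# One record as jq_schema=".people[]" would produce it (trimmed to the fields used here)
record = {"name": "Michael Smith", "age": 34, "address": {"street": "456 Oak Ave", "city": "Denver"}}

# Default metadata the loader starts from; pass a copy so the original stays unchanged
default_metadata = {"source": "data/people.json", "seq_num": 2}
result = metadata_func(record, dict(default_metadata))
print(result)

# A hypothetical record whose address is null still yields metadata instead of failing
print(metadata_func({"name": "No Address", "address": None}, {"seq_num": 99}))
```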

Understanding jq Query Syntax

JSONLoader evaluates jq_schema with the jq Python package, so the schema is written in standard jq syntax. Here are the basics:

Basic Selectors

  • . : Current object

  • .key : Access a specific key in an object

  • .[] : Iterate over array elements

Pipe Operator

  • | : Pass the result of the left expression as input to the right expression

Object Construction

  • {key: value} : Create a new object
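For readers more comfortable with Python than jq, each building block above has a rough Python counterpart. The sketch below is only an analogy on a small in-memory dict (an assumption, not the loader's code). jq actually produces a stream of results, which a Python list stands in for here.

```python
import json

data = {"people": [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]}

# .  -> the current value, unchanged
current = data

# .people  -> access one key of an object
people = current["people"]

# .people[]  -> iterate over the array; jq emits one result per element
stream = [person for person in people]

# .people[] | {name: .name}
# The pipe feeds each element of the stream into the right-hand expression,
# and {name: .name} builds a new object from every element it receives.
projected = [{"name": person["name"]} for person in stream]

print(json.dumps(projected))
```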

Example JSON:

{
  "people": [
    {"name": "Alice", "age": 30, "contactDetails": {"email": "alice@example.com", "phone": "123-456-7890"}},
    {"name": "Bob", "age": 25, "contactDetails": {"email": "bob@example.com", "phone": "098-765-4321"}}
  ]
}

Common Query Patterns:

  • .people[] : Access each array element

  • .people[].name : Get all names

  • .people[] | {name: .name} : Create a new object containing only the name

  • .people[] | {name, email: .contactDetails.email} : Extract nested data ({name} is jq shorthand for {name: .name})
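The path-style patterns above, such as .people[] and .people[].name, can be mimicked by a tiny evaluator. The run_path helper below is hypothetical and supports only .key and [] steps; it is not part of LangChain or jq. It does reproduce two jq behaviors worth knowing: [] fans one input out into many results, and looking up a missing key (or any key on null) yields null, here None, rather than an error.

```python
import re
from typing import Any

# The example JSON from above
DATA = {
    "people": [
        {"name": "Alice", "age": 30, "contactDetails": {"email": "alice@example.com", "phone": "123-456-7890"}},
        {"name": "Bob", "age": 25, "contactDetails": {"email": "bob@example.com", "phone": "098-765-4321"}},
    ]
}

# Matches ".key" steps and "[]" steps, e.g. ".people[].name" -> people, [], name
STEP = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|(\[\])")


def run_path(data: Any, path: str) -> list:
    """Hypothetical helper: evaluate a jq-like path made only of .key and [] steps."""
    results = [data]  # "." alone leaves the input unchanged
    for key, iterate in STEP.findall(path):
        next_results = []
        for value in results:
            if iterate:
                # [] emits every element of an array (or every value of an object)
                items = value.values() if isinstance(value, dict) else value
                next_results.extend(items)
            elif value is None:
                next_results.append(None)  # like jq: null | .key -> null
            else:
                next_results.append(value.get(key))  # missing key -> None, like jq's null
        results = next_results
    return results


print(run_path(DATA, ".people[].name"))
print(run_path(DATA, ".people[].contactDetails.email"))
print(run_path(DATA, ".people[].contact.email"))
```

The last call shows why a misspelled key is easy to miss: nothing fails, and every result is simply None.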

[Note]

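  • Set text_content=False whenever the value selected by jq_schema (or content_key) is not a plain string, for example an object, an array, or a number

  • With the default text_content=True, JSONLoader raises a ValueError for such values; with text_content=False, objects and arrays are serialized as JSON text and other values are converted with str()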
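The effect of text_content can be approximated in a few lines. The hypothetical to_page_content below is a sketch of how the selected value turns into page_content, based on the loader's documented behavior rather than its source. With text_content=True only strings (or null) are accepted; with text_content=False, objects and arrays are serialized with json.dumps and other values with str().

```python
import json
from typing import Any


def to_page_content(content: Any, text_content: bool = True) -> str:
    """Hypothetical approximation of how a selected JSON value becomes page_content."""
    if text_content and content is not None and not isinstance(content, str):
        # Mirrors the loader rejecting non-string content when text_content=True
        raise ValueError(f"Expected page_content to be a string, got {type(content)}")
    if isinstance(content, str):
        return content
    if isinstance(content, (dict, list)):
        # Objects and arrays become JSON text; empty ones become ""
        return json.dumps(content) if content else ""
    # Numbers and booleans fall back to str(); null becomes ""
    return "" if content is None else str(content)


print(to_page_content("Alice Johnson"))
print(to_page_content({"name": "Alice Johnson", "contact": None}, text_content=False))
print(to_page_content(28, text_content=False))

try:
    to_page_content({"name": "Alice Johnson"})  # text_content defaults to True
except ValueError as err:
    print("ValueError:", err)
```

The serialized form matches the page_content strings in the outputs of this section, such as '{"name": "Alice Johnson", "contact": null}'.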

Advanced Queries

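Here are examples of extracting specific information with different jq schemas. Note that every query below returns null for its nested field. people.json uses snake_case keys (contact_details, personal_preferences, interesting_information), while these queries use camelCase keys (contactDetails, personalPreferences, interestingFacts). In jq, selecting a key that does not exist yields null rather than an error, so use the exact key names from the file (for example, .contact_details) to get the actual values: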

# Extract only contact details
contact_loader = JSONLoader(
    file_path="data/people.json",
    jq_schema=".people[] | {name: .name, contact: .contactDetails}",
    text_content=False
)

docs = contact_loader.load()
docs
[Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 1}, page_content='{"name": "Alice Johnson", "contact": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 2}, page_content='{"name": "Michael Smith", "contact": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 3}, page_content='{"name": "Emily Davis", "contact": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 4}, page_content='{"name": "David Brown", "contact": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 5}, page_content='{"name": "Sophia Wilson", "contact": null}')]
# Extract nested data
hobbies_loader = JSONLoader(
    file_path="data/people.json",
    jq_schema=".people[] | {name: .name, hobbies: .personalPreferences.hobbies}",
    text_content=False
)

docs = hobbies_loader.load()
docs
[Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 1}, page_content='{"name": "Alice Johnson", "hobbies": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 2}, page_content='{"name": "Michael Smith", "hobbies": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 3}, page_content='{"name": "Emily Davis", "hobbies": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 4}, page_content='{"name": "David Brown", "hobbies": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 5}, page_content='{"name": "Sophia Wilson", "hobbies": null}')]
# Get all interesting facts
facts_loader = JSONLoader(
    file_path="data/people.json",
    jq_schema=".people[] | {name: .name, facts: .interestingFacts}",
    text_content=False
)

docs = facts_loader.load()
docs
[Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 1}, page_content='{"name": "Alice Johnson", "facts": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 2}, page_content='{"name": "Michael Smith", "facts": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 3}, page_content='{"name": "Emily Davis", "facts": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 4}, page_content='{"name": "David Brown", "facts": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 5}, page_content='{"name": "Sophia Wilson", "facts": null}')]
# Extract email and phone together
contact_info = JSONLoader(
    file_path="data/people.json",
    jq_schema='.people[] | {name: .name, email: .contactDetails.email, phone: .contactDetails.phone}',
    text_content=False
)

docs = contact_info.load()
docs
[Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 1}, page_content='{"name": "Alice Johnson", "email": null, "phone": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 2}, page_content='{"name": "Michael Smith", "email": null, "phone": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 3}, page_content='{"name": "Emily Davis", "email": null, "phone": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 4}, page_content='{"name": "David Brown", "email": null, "phone": null}'),
     Document(metadata={'source': '/Users/leejungbin/Downloads/LangChain-OpenTutorial/06-DocumentLoader/data/people.json', 'seq_num': 5}, page_content='{"name": "Sophia Wilson", "email": null, "phone": null}')]

These examples demonstrate the flexibility of jq queries in fetching data in various ways.
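When a query unexpectedly returns null, as in the examples above, it helps to inspect the actual keys of one record before writing the jq_schema. The sketch below does this with the standard json module on a trimmed copy of Michael Smith's record (values taken from the earlier output of people.json). It then emulates the contact query with dict.get, which returns None for a missing key much as jq returns null. The trimming and the emulation are assumptions for illustration; the real query is evaluated by jq inside JSONLoader.

```python
import json

# Trimmed copy of one record from data/people.json (see the earlier output)
record = json.loads(
    """
{"name": "Michael Smith",
 "contact_details": {"email": "michael.smith@example.com", "phone": "+1-555-234-5678"},
 "personal_preferences": {"hobbies": ["cycling", "cooking", "gaming"]}}
"""
)

# Step 1: list the keys that actually exist at this level of the record
print(sorted(record))

# Step 2: emulate {name: .name, contact: .contactDetails} versus the snake_case key
camel = {"name": record.get("name"), "contact": record.get("contactDetails")}
snake = {"name": record.get("name"), "contact": record.get("contact_details")}

print(json.dumps(camel))
print(json.dumps(snake))
```

Following the same reasoning, jq_schema=".people[] | {name: .name, contact: .contact_details}" should return the contact objects when passed to JSONLoader, although that output is not shown here.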

Set up the environment. You may refer to Environment Setup for more details.

You can check out the langchain-opentutorial package for more details.

leebeanbin
syshin0116
Teddy Lee
JaeJun Shim
LangChain Open Tutorial