RecursiveJsonSplitter
Author: HeeWung Song(Dan)
Peer Review : BokyungisaGod, Chaeyoon Kim
Proofread : Chaeyoon Kim
This is a part of LangChain Open Tutorial
Overview
This JSON splitter generates smaller JSON chunks by performing a depth-first traversal of JSON data.
The splitter aims to keep nested JSON objects intact as much as possible. However, to ensure chunk sizes remain within the min_chunk_size and max_chunk_size, it will split objects if needed. Note that very large string values (those not containing nested JSON) are not subject to splitting.
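The traversal described above can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the names `json_size`, `set_nested`, and `split_json_sketch` are hypothetical, `min_chunk_size` is ignored, and size is assumed to be the length of the `json.dumps` output.

```python
import json


def json_size(data):
    """Size of a JSON value, measured as the length of its serialized string."""
    return len(json.dumps(data))


def set_nested(target, path, value):
    """Store value in target under the given key path, creating dicts as needed."""
    for key in path[:-1]:
        target = target.setdefault(key, {})
    target[path[-1]] = value


def split_json_sketch(data, max_chunk_size, path=(), chunks=None):
    """Depth-first split of a dict into smaller dicts (simplified illustration)."""
    if chunks is None:
        chunks = [{}]
    if isinstance(data, dict):
        for key, value in data.items():
            new_path = path + (key,)
            remaining = max_chunk_size - json_size(chunks[-1])
            if json_size({key: value}) < remaining:
                # The whole subtree fits: keep the nested object intact.
                set_nested(chunks[-1], new_path, value)
            else:
                # It does not fit: start a fresh chunk and descend into the value.
                if chunks[-1]:
                    chunks.append({})
                split_json_sketch(value, max_chunk_size, new_path, chunks)
    else:
        # Leaves (strings, numbers, lists) are stored whole, however large.
        set_nested(chunks[-1], path, data)
    return chunks


sample = {"a": {"x": "1" * 20, "y": "2" * 20}, "b": {"z": "3" * 20}}
chunks = split_json_sketch(sample, max_chunk_size=40)
for chunk in chunks:
    print(json_size(chunk), chunk)
```

In this sample the object under `"a"` is too large for one chunk, so it is split by key, while each chunk keeps the full key path back to the root.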
If precise control over chunk size is required, you can use a recursive text splitter on the chunks this splitter creates.
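One way to apply that second pass is sketched below, assuming the `langchain-text-splitters` package is installed. The helper name `refine_chunks` and the default of 300 characters are illustrative choices, not part of the tutorial.

```python
def refine_chunks(json_text_chunks, chunk_size=300):
    """Re-split oversized JSON string chunks with a recursive text splitter."""
    # Imported lazily so the helper can be defined without the package installed.
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=0
    )
    refined = []
    for chunk in json_text_chunks:
        if len(chunk) <= chunk_size:
            # Already small enough: keep the valid JSON string untouched.
            refined.append(chunk)
        else:
            # Oversized: fall back to character-based splitting.
            refined.extend(text_splitter.split_text(chunk))
    return refined
```

Pieces produced by the character-based pass are plain text fragments and are generally no longer valid JSON on their own, so this trade-off is worth weighing before applying it.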
Splitting Criteria
Text splitting method: Based on JSON values
Chunk size: Determined by character count
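As a small illustration of the character-count criterion, the sketch below assumes a chunk's size is the length of its serialized JSON string. The second example shows that the count depends on how non-ASCII characters are serialized.

```python
import json

chunk = {"a": 1}
serialized = json.dumps(chunk)
print(serialized)       # {"a": 1}
print(len(serialized))  # 8

# Non-ASCII characters are escaped by default, which increases the count.
accented = {"k": "é"}
print(len(json.dumps(accented)))                      # 15
print(len(json.dumps(accented, ensure_ascii=False)))  # 10
```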
Environment Setup
Setting up your environment is the first step. See the Environment Setup guide for more details.
[Note]
langchain-opentutorial is a package of easy-to-use environment setup guidance, useful functions, and utilities for tutorials. Check out langchain-opentutorial for more details.
Alternatively, you can set and load OPENAI_API_KEY from a .env file.
[Note] This is only necessary if you haven't already set OPENAI_API_KEY in previous steps.
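The idea behind loading a key from a .env file can be sketched with the standard library alone. In practice a package such as python-dotenv (its `load_dotenv` function) is the usual tool; the helper `load_env_file` and the variable name `DEMO_OPENAI_API_KEY` below are hypothetical stand-ins used only for illustration.

```python
import os
import tempfile


def load_env_file(path):
    """Minimal stand-in for a .env loader: read KEY=VALUE lines into os.environ."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Values already present in the environment are left unchanged.
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


# Demonstration with a throwaway file and a placeholder value.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as f:
        f.write('DEMO_OPENAI_API_KEY="placeholder-value"\n')
    load_env_file(env_path)

print(os.environ["DEMO_OPENAI_API_KEY"])
```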
Basic JSON Splitting
Let's explore the basic methods of splitting JSON data using the RecursiveJsonSplitter.
JSON data preparation
RecursiveJsonSplitter configuration
Three splitting methods (split_json, create_documents, and split_text)
Chunk size verification
Here is an example of splitting JSON data with the RecursiveJsonSplitter.
Use the splitter.split_json() method to recursively split JSON data.
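A minimal sketch of this step, assuming the `langchain-text-splitters` package is installed. The helper name `split_and_measure` and the limit of 300 are illustrative, and `json_data` stands for any dict you have already loaded.

```python
import json


def split_and_measure(json_data, max_chunk_size=300):
    """Split a dict with RecursiveJsonSplitter and report each chunk's size."""
    # Imported lazily so the helper can be defined without the package installed.
    from langchain_text_splitters import RecursiveJsonSplitter

    splitter = RecursiveJsonSplitter(max_chunk_size=max_chunk_size)
    # split_json returns a list of smaller dicts.
    json_chunks = splitter.split_json(json_data=json_data)
    # Verify chunk sizes by measuring the serialized length of each chunk.
    sizes = [len(json.dumps(chunk)) for chunk in json_chunks]
    return json_chunks, sizes
```

Calling `split_and_measure(json_data)` returns the chunks together with their character counts, which makes it easy to spot any chunk above the configured limit.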
A splitter object (such as an instance of RecursiveJsonSplitter) offers two further methods for splitting JSON data: splitter.create_documents() converts JSON data into Document objects, and splitter.split_text() splits JSON data into a list of strings.
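Both methods can be sketched as follows, assuming the `langchain-text-splitters` package is installed. The helper name `to_documents_and_texts` is hypothetical, and `json_data` again stands for an already loaded dict.

```python
def to_documents_and_texts(json_data, max_chunk_size=300):
    """Return Document objects and JSON strings for the same input dict."""
    # Imported lazily so the helper can be defined without the package installed.
    from langchain_text_splitters import RecursiveJsonSplitter

    splitter = RecursiveJsonSplitter(max_chunk_size=max_chunk_size)
    # create_documents takes a list of dicts and returns Document objects.
    docs = splitter.create_documents(texts=[json_data])
    # split_text takes a single dict and returns a list of JSON strings.
    texts = splitter.split_text(json_data=json_data)
    return docs, texts
```

Each returned Document carries its chunk as a JSON string in `page_content`, so `docs[0].page_content` and `texts[0]` can be compared directly.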
Handling JSON Structure
Let's explore how the RecursiveJsonSplitter handles different JSON structures and its limitations.
Verification of list object size
Parsing JSON structures
Using the convert_lists parameter for list transformation
By examining texts[2] (one of the larger chunks), we can confirm it contains a list object.
This chunk exceeds the size limit (300) because it contains a list.
The RecursiveJsonSplitter is designed not to split list objects.
You can parse the chunk at index 2 using the json module.
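The parsing step might look like the sketch below. The string `chunk_text` is a made-up stand-in for `texts[2]`, since the tutorial's actual data is not shown here.

```python
import json

# Hypothetical chunk string standing in for texts[2].
chunk_text = '{"paths": {"/items": {"get": {"tags": ["items", "read", "public"]}}}}'

parsed = json.loads(chunk_text)
tags = parsed["paths"]["/items"]["get"]["tags"]

print(type(tags).__name__)  # list
print(len(tags))            # 3
```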
Setting the convert_lists parameter to True transforms JSON lists into key:value pairs (formatted as index:item).
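The effect of this transformation can be illustrated with a small standalone function. This is a sketch of the idea rather than the library's code: the name `lists_to_dicts` is hypothetical, and string indices are assumed as keys.

```python
import json


def lists_to_dicts(data):
    """Recursively replace every list with a dict keyed by item index."""
    if isinstance(data, dict):
        return {key: lists_to_dicts(value) for key, value in data.items()}
    if isinstance(data, list):
        return {str(index): lists_to_dicts(item) for index, item in enumerate(data)}
    return data


sample = {"tags": ["a", "b"], "meta": {"ids": [1, 2]}}
converted = lists_to_dicts(sample)
print(json.dumps(converted))
# {"tags": {"0": "a", "1": "b"}, "meta": {"ids": {"0": 1, "1": 2}}}
```

Once lists become dicts, their items are ordinary key:value pairs, so a splitter that only descends into dicts can distribute them across chunks.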
You can access specific documents within the docs list using their index.