
Neo4j


  • Author: Jongho Lee

  • Peer Review: HyeonJong Moon, Haseom Shin, Sohyeon Yim

  • This is a part of LangChain Open Tutorial

Overview

This tutorial covers how to use Neo4j with LangChain.

Neo4j is a graph database with built-in vector store support, and it can be deployed locally or in the cloud.

To fully utilize Neo4j, you need to learn about Cypher, its declarative query language.

This tutorial walks you through CRUD operations with Neo4j: storing, updating, and deleting documents, and performing similarity-based retrieval.

Table of Contents

  • Overview
  • Environment Setup
  • Setup Neo4j
  • What is Neo4j?
  • Prepare Data
  • Setting up Neo4j
  • Document Manager

References

  • Cypher
  • Neo4j Docker Installation
  • Neo4j Official Installation guide
  • Neo4j Python SDK document
  • Neo4j document
  • LangChain Neo4j document

Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for tutorials.

  • You can check out langchain-opentutorial for more details.

%%capture --no-stderr
%pip install langchain-opentutorial
# Install necessary package
%pip install -qU neo4j
Note: you may need to restart the kernel to use updated packages.
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langsmith",
        "langchain",
        "langchain_core",
        "langchain_community",
        "langchain_openai",
        "neo4j",
        "nltk",
    ],
    verbose=False,
    upgrade=False,
)
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "OPENAI_API_KEY": "Your OPENAI API KEY",
        "LANGCHAIN_API_KEY": "Your LangChain API KEY",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "Neo4j",
        "NEO4J_URI": "Your Neo4j Aura URI",
        "NEO4J_USERNAME": "Your Neo4j Aura Username",
        "NEO4J_PASSWORD": "Your Neo4j Aura Password",
    }
)
Environment variables have been set successfully.

You can alternatively set API keys such as OPENAI_API_KEY in a .env file and load them.

[Note] This is not necessary if you've already set the required API keys in previous steps.

from dotenv import load_dotenv

load_dotenv(override=True)
True

Setup Neo4j

We have two options to start with: cloud or local deployment.

In this tutorial, we will use the cloud service called Aura, provided by Neo4j.

We will also describe how to deploy Neo4j using Docker.

  • Getting started with Aura

    You can create a new Neo4j Aura account on the official website. Visit the website and click "Get Started Free" at the top right.

    Once you have signed in, you will see a Create instance button; after creating an instance, you will see your username and password.

    To get your API key, click Download and continue to download a .txt file that contains the API key needed to connect to your Neo4j Aura instance.

  • Getting started with Docker

    Here is how to run Neo4j using Docker.

    To run a Neo4j container, use the following command.

    docker run \
        -itd \
        --publish=7474:7474 --publish=7687:7687 \
        --volume=$HOME/neo4j/data:/data \
        --env=NEO4J_AUTH=none \
        --name neo4j \
        neo4j

    You can visit the Neo4j Docker installation reference for more detailed information.

[NOTE]

  • Neo4j also supports native deployment on macOS, Windows, and Linux. Visit the Neo4j Official Installation guide for more details.

  • The Neo4j Community Edition supports only one database.

What is Neo4j?

Neo4j is a native graph database, which means it represents data as nodes and edges.

  • Nodes

    • label: a tag that represents the node's role in a domain.

    • property: key-value pairs, e.g., name: John.

  • Edges

    • Represent relationships between two nodes.

    • Directional: each edge has a start node and an end node.

    • property: like nodes, edges can have properties.

  • NoSQL

    • Neo4j does not require a predefined schema, allowing flexible data modeling.

  • Cypher

    • Neo4j uses Cypher, a declarative query language, to interact with the database.

    • Cypher expressions resemble how humans think about relationships (see the sketch below).
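
As a quick illustration of Cypher, here is a minimal sketch that runs a query through the official Python driver. The Person label and KNOWS relationship are purely hypothetical examples, and the connection assumes the NEO4J_* environment variables set later in this tutorial.

import os
from neo4j import GraphDatabase

# Connect with the credentials configured in Environment Setup
driver = GraphDatabase.driver(
    os.getenv("NEO4J_URI"),
    auth=(os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD")),
)

# MATCH finds graph patterns: Person nodes linked by a KNOWS edge (hypothetical schema)
query = "MATCH (a:Person)-[:KNOWS]->(b:Person) RETURN a.name AS a, b.name AS b LIMIT 5"

with driver.session() as session:
    for record in session.run(query):
        print(record["a"], "knows", record["b"])

driver.close()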

Prepare Data

This section guides you through the data preparation process.

This section includes the following components:

  • Introduce Data

  • Preprocessing Data

Introduce Data

In this tutorial, we will use the fairy tale 📗 The Little Prince in PDF format as our data.

This material complies with the Apache 2.0 license.

The data is used in a text (.txt) format converted from the original PDF.

You can view the data at the link below.

Preprocessing Data

In this tutorial section, we will preprocess the text data from The Little Prince and convert it into a list of LangChain Document objects with metadata.

Each document chunk will include a title field in the metadata, extracted from the first line of each section.

from langchain.schema import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
import re
from typing import List


def preprocessing_data(content: str) -> List[Document]:
    # 1. Split the text by double newlines to separate sections
    blocks = content.split("\n\n")

    # 2. Initialize the text splitter
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,  # Maximum number of characters per chunk
        chunk_overlap=50,  # Overlap between chunks to preserve context
        separators=["\n\n", "\n", " "],  # Order of priority for splitting
    )

    documents = []

    # 3. Loop through each section
    for block in blocks:
        lines = block.strip().splitlines()
        if not lines:
            continue

        # Extract title from the first line using square brackets [ ]
        first_line = lines[0]
        title_match = re.search(r"\[(.*?)\]", first_line)
        title = title_match.group(1).strip() if title_match else None

        # Remove the title line from content
        body = "\n".join(lines[1:]).strip()
        if not body:
            continue

        # 4. Chunk the section using the text splitter
        chunks = text_splitter.split_text(body)

        # 5. Create a LangChain Document for each chunk with the same title metadata
        for chunk in chunks:
            documents.append(Document(page_content=chunk, metadata={"title": title}))

    print(f"Generated {len(documents)} chunked documents.")

    return documents
# Load the entire text file
with open("./data/the_little_prince.txt", "r", encoding="utf-8") as f:
    content = f.read()

# Preprocessing Data

docs = preprocessing_data(content=content)
Generated 262 chunked documents.
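
As a quick sanity check (not part of the original flow), you can peek at one of the generated chunks to confirm the title metadata was extracted:

# Inspect a sample chunk and its metadata
print(docs[0].metadata)             # e.g. {'title': ...}
print(docs[0].page_content[:100])   # first 100 characters of the chunk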

Setting up Neo4j

This part walks you through the initial setup of Neo4j.

This section includes the following components:

  • Load Embedding Model

  • Load Neo4j Client

  • Create Index

Load Embedding Model

In the Load Embedding Model section, you'll learn how to load an embedding model.

This tutorial uses OpenAI's text-embedding-3-large model, which requires an OpenAI API key.

💡 If you prefer to use another embedding model, see the instructions below.

import os
from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(model="text-embedding-3-large")
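
For example, here is a hedged sketch of swapping in a local HuggingFace model instead. It assumes the langchain-huggingface package is installed, and the model name is only an illustration; note that the vector index created below must match the model's embedding dimension.

# Hypothetical alternative: a local HuggingFace embedding model
# from langchain_huggingface import HuggingFaceEmbeddings
#
# embedding = HuggingFaceEmbeddings(
#     model_name="sentence-transformers/all-MiniLM-L6-v2"  # 384-dimensional
# )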

Load Neo4j Client

In the Load Neo4j Client section, we cover how to load the database client object using the Python SDK for Neo4j.

import neo4j


# Create Database Client Object Function
def get_db_client(uri, username, password):
    """
    Initializes and returns a VectorStore client instance.

    This function loads configuration (e.g., API key, host) from environment
    variables or default values and creates a client object to interact
    with the Neo4j Python SDK.

    Returns:
        client: ClientType - An instance of the Neo4j client.

    Raises:
        ValueError: If required configuration is missing.
    """
    client = neo4j.GraphDatabase.driver(uri=uri, auth=(username, password))
    return client
# Get DB Client Object
uri = os.getenv("NEO4J_URI")
username = os.getenv("NEO4J_USERNAME")
password = os.getenv("NEO4J_PASSWORD")

client = get_db_client(uri, username, password)
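
Before moving on, you can optionally confirm that the driver can reach the database; verify_connectivity() raises an exception if the URI or credentials are wrong. This check is a small addition, not part of the original notebook.

# Optional: fail fast if the connection details are wrong
client.verify_connectivity()
print("Connected to Neo4j successfully.")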

Create Index

If you are successfully connected to Neo4j Aura, some basic indexes are already created.

But in this tutorial, we will create a new index using the Neo4jIndexManager class.

from utils.neo4j import Neo4jIndexManager

# Create IndexManager object
index_manager = Neo4jIndexManager(client)

# Create a new index
index_name = "tutorial_index"
node_label = "tutorial_node"

try:
    tutorial_index = index_manager.create_index(
        embedding, index_name=index_name, metric="cosine", node_label=node_label
    )
except Exception as e:
    print("Index creation failed due to")
    print(type(e))
    print(str(e))
Index with name tutorial_index already exists.
    Returning Neo4jDBManager object.
    Created index information
    ('Index name: tutorial_index', 'Node label: tutorial_node', 'Similarity metric: COSINE', 'Embedding dimension: 3072', 'Embedding node property: embedding', 'Text node property: text')
    Index creation successful. Return Neo4jDBManager object.
    Index creation failed due to
    
    name 'Neo4jCRUDManager' is not defined
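
For reference, a Neo4j vector index like the one above corresponds to a single Cypher statement. Below is a minimal sketch using the raw driver, assuming Neo4j 5.13 or later and the 3072-dimensional embeddings used in this tutorial; the helper class above remains the recommended path.

# Illustrative raw-Cypher equivalent of the index created above
create_index_query = """
CREATE VECTOR INDEX tutorial_index IF NOT EXISTS
FOR (n:tutorial_node) ON (n.embedding)
OPTIONS {indexConfig: {
    `vector.dimensions`: 3072,
    `vector.similarity_function`: 'cosine'
}}
"""

with client.session() as session:
    session.run(create_index_query)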

Document Manager

To support LangChain-OpenTutorial, we implemented a custom set of CRUD functionalities for VectorDBs.

The following operations are included:

  • upsert : Update existing documents or insert if they don’t exist

  • upsert_parallel : Perform upserts in parallel for large-scale data

  • similarity_search : Search for similar documents based on embeddings

  • delete : Remove documents based on filter conditions

Each of these features is implemented as class methods specific to each VectorDB.

In this tutorial, you can easily utilize these methods to interact with your VectorDB.

We plan to continuously expand the functionality by adding more common operations in the future.

Create Instance

First, we create an instance of the Neo4j helper class to use its CRUD functionalities.

This class is initialized with the Neo4j Python SDK client instance, the index name, and the embedding model instance, all of which were defined in the previous sections.

from utils.neo4j import Neo4jDocumentManager

crud_manager = Neo4jDocumentManager(
    client=client, index_name="tutorial_index", embedding=embedding
)

Now you can use the following CRUD operations with the crud_manager instance.

This instance allows you to easily manage documents in your Neo4j database.

Upsert Document

Update existing documents or insert if they don’t exist

✅ Args

  • texts : Iterable[str] – List of text contents to be inserted/updated.

  • metadatas : Optional[List[Dict]] – List of metadata dictionaries for each text (optional).

  • ids : Optional[List[str]] – Custom IDs for the documents. If not provided, IDs will be auto-generated.

  • **kwargs : Extra arguments for the underlying vector store.

🔄 Return

  • None

from uuid import uuid4

# Create ID for each document
ids = [str(uuid4()) for _ in docs]

args = {
    "texts": [doc.page_content for doc in docs[:2]],
    "metadatas": [doc.metadata for doc in docs[:2]],
    "ids": ids[:2],
    # Add additional parameters if you need
}

crud_manager.upsert(**args)

Upsert Parallel Document

Perform upserts in parallel for large-scale data

✅ Args

  • texts : Iterable[str] – List of text contents to be inserted/updated.

  • metadatas : Optional[List[Dict]] – List of metadata dictionaries for each text (optional).

  • ids : Optional[List[str]] – Custom IDs for the documents. If not provided, IDs will be auto-generated.

  • batch_size : int – Number of documents per batch (default: 32).

  • workers : int – Number of parallel workers (default: 10).

  • **kwargs : Extra arguments for the underlying vector store.

🔄 Return

  • None

from uuid import uuid4

args = {
    "texts": [doc.page_content for doc in docs],
    "metadatas": [doc.metadata for doc in docs],
    "ids": ids,
    # Add additional parameters if you need
}

crud_manager.upsert_parallel(**args)

Similarity Search

Search for similar documents based on embeddings.

This method uses "cosine similarity".

✅ Args

  • query : str – The text query for similarity search.

  • k : int – Number of top results to return (default: 10).

  • **kwargs : Additional search options (e.g., filters).

🔄 Return

  • results : List[Document] – A list of LangChain Document objects ranked by similarity.

# Search by query
results = crud_manager.search(query="What is essential is invisible to the eye.", k=3)
for idx, result in enumerate(results):
    print(f"Rank {idx+1}")
    print(f"Contents : {result['text']}")
    print(f"Metadata: {result['metadata']}")
    print(f"Similarity Score: {result['score']}")
    print()
Rank 1
    Contents : And he went back to meet the fox. 
    "Goodbye," he said. 
    "Goodbye," said the fox. "And now here is my secret, a very simple secret: It is only with the heart that one can see rightly; what is essential is invisible to the eye." 
    "What is essential is invisible to the eye," the little prince repeated, so that he would be sure to remember.
    "It is the time you have wasted for your rose that makes your rose so important."
    Metadata: {'id': '148b9b3f-2231-4ebd-86d8-6aa841c4ac1b', 'title': 'Chapter 21', 'embedding': None}
    Similarity Score: 0.755
    
    Rank 2
    Contents : "Yes," I said to the little prince. "The house, the stars, the desert-- what gives them their beauty is something that is invisible!" 
    "I am glad," he said, "that you agree with my fox."
    Metadata: {'id': '62df5e3c-2668-4f5c-96ea-23c5b7a38351', 'title': 'Chapter 24', 'embedding': None}
    Similarity Score: 0.748
    
    Rank 3
    Contents : "The men where you live," said the little prince, "raise five thousand roses in the same garden-- and they do not find in it what they are looking for." 
    "They do not find it," I replied. 
    "And yet what they are looking for could be found in one single rose, or in a little water." 
    "Yes, that is true," I said. 
    And the little prince added: 
    "But the eyes are blind. One must look with the heart..."
    Metadata: {'id': 'ff93762d-6bde-44f4-b3d2-c6dc466b46a8', 'title': 'Chapter 25', 'embedding': None}
    Similarity Score: 0.711
    

Delete Document

Remove documents based on filter conditions

✅ Args

  • ids : Optional[List[str]] – List of document IDs to delete. If None, deletion is based on filter.

  • filters : Optional[Dict] – Dictionary specifying filter conditions (e.g., metadata match).

  • **kwargs : Any additional parameters.

🔄 Return

  • Boolean

# Delete by ids
ids = ids[:5]  # The 'ids' value you want to delete
crud_manager.delete(ids=ids)
True
# Delete by filters
filters = {"title": "chapter 6"}  # The filter condition for deletion
crud_manager.delete(filters=filters)
True
# Delete All
crud_manager.delete()
True
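
Once you are done, it is good practice to close the driver connection. This final step is a small addition, not part of the original notebook.

# Close the Neo4j driver when finished
client.close()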
