MongoDB Atlas


  • Author: Ivy Bae

  • Peer Review : Haseom Shin, ro__o_jun

  • This is a part of LangChain Open Tutorial

Overview

This tutorial covers the initial setup process for users who are new to MongoDB Atlas.

If you're already familiar with MongoDB Atlas, you can skip the Initialization section.

All examples run on a free cluster, and once you add a collection to your database, you'll be ready to start.

You'll learn how to preprocess data to preserve document structure after loading The Little Prince from a text file, how to add and delete documents in a collection, and how to manage the vector store.

Once the documents are added, you'll learn how to query your data using semantic search, index updates for filtering, and MQL operators.

By the end of this tutorial, you'll be able to integrate PyMongo with LangChain and use VectorStore.

Table of Contents

  • Overview
  • Environment Setup
  • Initialization
  • Atlas Vector Search Indexes
  • Vector Store
  • Load Data
  • Data Preprocessing
  • Manage vector store
  • Query vector store
  • CRUD Operations with PyMongo

References

  • Get Started with Atlas
  • Deploy a Free Cluster
  • Connection Strings
  • Atlas Search and Vector Search Indexes
  • Review Atlas Search Index Syntax
  • JSON and BSON
  • Write Data to MongoDB
  • Read Data from MongoDB
  • Query Filter Documents
  • Update Operators
  • Comparison Query Operators
  • Integrate Atlas Vector Search with LangChain
  • Get Started with the LangChain Integration

Environment Setup

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup, along with useful functions and utilities for these tutorials.

%%capture --no-stderr
%pip install langchain-opentutorial
# Install required packages
from langchain_opentutorial import package

package.install(
    [
        "langchain_openai",
        "langsmith",
        "langchain_core",
        "langchain_community",
        "langchain-mongodb",
        "pymongo",
        "certifi",
    ],
    verbose=False,
    upgrade=False,
)
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "OPENAI_API_KEY": "",
        "LANGCHAIN_API_KEY": "",
        "MONGODB_ATLAS_CLUSTER_URI": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "07-MongoDB-Atlas",
    }
)
Environment variables have been set successfully.

You can alternatively set API keys such as OPENAI_API_KEY in a .env file and load them.

[Note] This is not necessary if you've already set the required API keys in previous steps.

If you are already using MongoDB Atlas, you can set the cluster connection string to MONGODB_ATLAS_CLUSTER_URI in your .env file.

# Load API keys from .env file
from dotenv import load_dotenv

load_dotenv(override=True)
True

Initialization

MongoDB Atlas is a multi-cloud database service that provides an easy way to host and manage your data in the cloud. After you register with and log in to Atlas, you can create a Free cluster.

Atlas can be managed with the Atlas CLI or the Atlas UI. The Atlas CLI can be difficult to use if you're not used to working with development tools, so this tutorial walks you through the Atlas UI.

Deploy a cluster

Please select the appropriate project in your Organization. If the project doesn't exist, you'll need to create it.

Once you've selected a project, you can create a cluster.

Follow the procedure below to deploy a cluster:

  • Select Cluster: choose the M0 Free cluster option.

Note: You can deploy only one Free cluster per Atlas project.

  • Select Provider: M0 is available on AWS, GCP, and Azure.

  • Select Region.

  • Create a database user and add your IP address to the access list.

After you deploy a cluster, you can see the cluster you deployed as shown in the image below.

Connect to your cluster

Click Get connection string in the image above to get the cluster URI and set the value of MONGODB_ATLAS_CLUSTER_URI in the .env file.

The connection string resembles the following example:

mongodb+srv://[databaseUser]:[databasePassword]@[clusterName].[hostName].mongodb.net/?retryWrites=true&w=majority

Initialize MongoDBAtlas and MongoDBAtlasDocumentManager

MongoDBAtlas manages MongoDB collections and the vector store. Internally, it connects to the cluster using PyMongo, the MongoDB Python driver.

  • You can also create a vector store that integrates Atlas Vector Search and LangChain.

MongoDBAtlasDocumentManager handles document processing and CRUD operations in MongoDB Atlas.
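
You can verify the underlying connection directly with PyMongo (a minimal sketch, assuming the MONGODB_ATLAS_CLUSTER_URI environment variable set earlier; certifi , installed above, supplies the CA certificates for TLS):

import os

import certifi
from pymongo import MongoClient

# Connect to the Atlas cluster with PyMongo, using certifi's CA bundle for TLS
client = MongoClient(os.environ["MONGODB_ATLAS_CLUSTER_URI"], tlsCAFile=certifi.where())

# A successful ping confirms the cluster is reachable -> {'ok': 1}
print(client.admin.command("ping"))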

Initialize MongoDB database and collection

  • A MongoDB database stores collections of documents.

from utils.mongodb_atlas import MongoDBAtlas, MongoDBAtlasDocumentManager

DB_NAME = "langchain-opentutorial-db"
COLLECTION_NAME = "little-prince"

atlas = MongoDBAtlas(DB_NAME, COLLECTION_NAME)
document_manager = MongoDBAtlasDocumentManager(atlas=atlas)

You can browse collections to see the little-prince collection you just created and the sample data provided by Atlas.

In this tutorial, we will use the little-prince collection in the langchain-opentutorial-db database.

Atlas Vector Search Indexes

When performing vector search in Atlas, you must create an Atlas Vector Search Index.

Create a Search Index or Vector Search Index

You can define an Atlas Search Index or Atlas Vector Search Index using a SearchIndexModel object.

  • definition : defines the Search Index.

  • name : the name used to query the Search Index.

from pymongo.operations import SearchIndexModel

TEST_SEARCH_INDEX_NAME = "test_search_index"
TEST_VECTOR_SEARCH_INDEX_NAME = "test_vector_index"

search_index = SearchIndexModel(
    definition={
        "mappings": {"dynamic": True},
    },
    name=TEST_SEARCH_INDEX_NAME,
)

vector_index = SearchIndexModel(
    definition={
        "fields": [
            {
                "type": "vector",
                "numDimensions": 1536,
                "path": "embedding",
                "similarity": "cosine",
            }
        ]
    },
    name=TEST_VECTOR_SEARCH_INDEX_NAME,
    type="vectorSearch",
)
  • create_index : create a single Atlas Search Index or Atlas Vector Search Index. Internally checks whether a Search Index with the same name already exists.

atlas.create_index(TEST_SEARCH_INDEX_NAME, search_index)
atlas.create_index(TEST_VECTOR_SEARCH_INDEX_NAME, vector_index)

Click the Atlas Search tab to see the search indexes that you created.

Update a Search Index

  • update_index : update an Atlas Search Index or Atlas Vector Search Index.

new_vector_index = {
    "fields": [
        {
            "type": "vector",
            "numDimensions": 1536,
            "path": "embedding",
            "similarity": "euclidean",
        }
    ]
}

atlas.update_index(TEST_VECTOR_SEARCH_INDEX_NAME, definition=new_vector_index)

If the update is successful, click test_vector_index in the Index Name list on the Atlas Search tab to see more information.

You can see that the Similarity Method for the Vector Field has changed to euclidean.

You can also click the Edit Index Definition button on the right side of the Atlas UI to update it.

Delete a Search Index

  • delete_index : remove an Atlas Search Index or Atlas Vector Search Index.

atlas.delete_index(TEST_SEARCH_INDEX_NAME)
atlas.delete_index(TEST_VECTOR_SEARCH_INDEX_NAME)

Vector Store

  • create_vector_store : create a vector store using MongoDBAtlasVectorSearch .

    • embedding : embedding model to use.

    • index_name : index to use when querying the vector store.

    • relevance_score_fn : the similarity function used for the index. You can choose from euclidean, cosine, and dotProduct.

from langchain_openai import OpenAIEmbeddings

embedding = OpenAIEmbeddings(model="text-embedding-3-small")
TUTORIAL_VECTOR_SEARCH_INDEX_NAME = "langchain-opentutorial-index"

atlas.create_vector_store(
    embedding=embedding,
    index_name=TUTORIAL_VECTOR_SEARCH_INDEX_NAME,
    relevance_score_fn="cosine",
)

Create an Index

  • create_vector_search_index : an alternative to the Create a Search Index or Vector Search Index section above that creates a Vector Search Index.

atlas.create_vector_search_index(dimensions=1536)

Click the Atlas Search tab to see the search index langchain-opentutorial-index that you created.

Load Data

LangChain provides Document loaders that can load a variety of data sources.

Document loaders

  • get_documents : use TextLoader to load data from the_little_prince.txt in the data directory into the little-prince collection.

documents = document_manager.get_documents(
    file_path="./data/the_little_prince.txt", encoding="utf-8"
)

The get_documents method returns a List[Document]. In this case, page_content holds all the text in the file, as the quick check below shows.

  • metadata : data associated with the content

  • page_content : the text content as a string
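
For instance, you can inspect the loaded result (a quick sketch; since the loader reads the whole file, a single Document is expected):

# Inspect the loaded documents: TextLoader returns one Document for the entire file
print(len(documents))                  # expected: 1
print(documents[0].metadata)           # e.g. {'source': './data/the_little_prince.txt'}
print(documents[0].page_content[:60])  # the beginning of the book text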

Data Preprocessing

Preserving text file structure

  • split_by_chapter

    • To preserve the structure of the text file, let's split the file into chapters.

    • the_little_prince.txt uses [ Chapter X ] as a delimiter to separate chapters.

from typing import List


def split_by_chapter(text: str) -> List[str]:
    # Split on the "[ Chapter " marker; the text before the first marker
    # (the preface) is kept at index 0
    chapters = text.split("[ Chapter ")
    # Drop the "X ]" chapter-number remainder and strip surrounding whitespace
    return [chapter.split(" ]", 1)[-1].strip() for chapter in chapters]
  • split_documents : split documents by chapter

    • Add doc_index to metadata

split_chapters = document_manager.split_documents(
    documents, split_condition=split_by_chapter, split_index_name="doc_index"
)

If you compare the documents to split_chapters , you can see that page_content is split by chapter .

first_chapter = split_chapters[1]
print(f"{first_chapter.page_content[:30]}, metadata: {first_chapter.metadata}")
- we are introduced to the nar, metadata: {'doc_index': 1}

Text splitters

Splitting a Document into appropriately sized chunks lets you process text data more efficiently.

To split a Document while preserving paragraph and sentence structure, use RecursiveCharacterTextSplitter .

  • chunk_size : sets the maximum size of a chunk

  • chunk_overlap : sets the number of overlapping characters between chunks

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=200)
split_documents = document_manager.split_documents_by_splitter(
    text_splitter, split_chapters
)

Add metadata

Splitting the document into chunk_size -sized chunks increases the number of documents.

Add a chunk_index key to the metadata to identify each document, since there is no longer one Document per chapter.

for index, doc in enumerate(split_documents):
    doc.metadata.update({"chunk_index": index})

The chunk_index has been added to the metadata.

You can also see that some of the page_content text overlaps between adjacent Documents.
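
A quick way to see both is to inspect two adjacent chunks (a sketch; the overlap appears whenever a chapter spans more than one chunk):

# Adjacent chunks from the same chapter share up to 200 characters (chunk_overlap)
print(split_documents[0].metadata)            # contains doc_index and chunk_index
print(split_documents[0].page_content[-80:])  # tail of the first chunk...
print(split_documents[1].page_content[:80])   # ...reappears at the head of the second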

Manage vector store

Now that you've initialized the vector_store and loaded the data, you can add Documents to and delete them from the little-prince collection.

Add

  • add_documents : add documents to the vector_store and return a list of IDs for the added documents.

ids = atlas.add_documents(documents=split_documents)

The delete function lets you specify the Document IDs to delete, so ids stores the IDs of the added documents.

Check the first document ID. The number of IDs matches the number of documents, and each ID is a unique value.
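
For example (a quick sketch using the ids list returned above):

print(ids[0])                            # ID of the first added document
print(len(ids) == len(split_documents))  # one ID per added document -> True
print(len(set(ids)) == len(ids))         # every ID is unique -> True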

In the image below, after adding documents, the STORAGE SIZE of the collection increases, and you can see the documents corresponding to each ID, such as ids[0] .

The embedding field is a vector representation of the text data. It is used to determine similarity to the query vector for vector search.
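
To see what such a vector looks like, you can embed a short text with the same model (a sketch using the embedding model initialized in the Vector Store section):

vec = embedding.embed_query("What does it mean to be tamed?")
print(len(vec))  # 1536, matching numDimensions in the vector index
print(vec[:3])   # the first few components of the embedding vector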

Query Filter

Create a Document object and add it to the collection.

from langchain_core.documents import Document

sample_document = Document(
    page_content="I am leveraging my experience as a developer to provide development education and nurture many new developers.",
    metadata={"source": "linkedin"},
)
sample_id = atlas.add_documents([sample_document])

TOTAL DOCUMENTS has increased from 167 to 168.

On the last page, you can see the page_content of sample_document .

Alternatively, you can add a query filter on the source field (for example, { "source": "linkedin" }) to view only the matching documents.

Delete

You can specify the document IDs to delete as arguments to the delete_documents function, such as sample_id .

atlas.delete_documents(ids=sample_id)
True

If True is returned, the deletion succeeded.

You can see that TOTAL DOCUMENTS has decreased from 168 to 167 and that sample_document has been deleted.

Query vector store

Make a query related to the content of The Little Prince and see whether the vector_store returns similar documents.

The query is based on the most well-known passage about the relationship between the Little Prince and the fox.

query = "What does it mean to be tamed according to the fox?"

Semantic Search

The similarity_search method performs a basic semantic search.

The k parameter in the example below specifies the number of documents to return.

It returns a List[Document] ranked by relevance.

atlas.similarity_search(query=query, k=1)
[Document(metadata={'_id': '67b07b9602e46738df0bbb2e', 'doc_index': 21, 'chunk_index': 122}, page_content='The fox gazed at the little prince, for a long time. \n(picture)\n"Please-- tame me!" he said. \n"I want to, very much," the little prince replied. "But I have not much time. I have friends to discover, and a great many things to understand." \n"One only understands the things that one tames," said the fox. "Men have no more time to understand anything. They buy things all ready made at the shops. But there is no shop anywhere where one can buy friendship, and so men have no friends any more. If you want a friend, tame me..." \n"What must I do, to tame you?" asked the little prince.')]

Semantic Search with Score

The similarity_search_with_score method also performs a semantic search.

The difference from the similarity_search method is that it also returns a relevance score between 0 and 1 for each document.

atlas.similarity_search_with_score(query=query, k=3)
[(Document(metadata={'_id': '67b07b9602e46738df0bbb2e', 'doc_index': 21, 'chunk_index': 122}, page_content='The fox gazed at the little prince, for a long time. \n(picture)\n"Please-- tame me!" he said. \n"I want to, very much," the little prince replied. "But I have not much time. I have friends to discover, and a great many things to understand." \n"One only understands the things that one tames," said the fox. "Men have no more time to understand anything. They buy things all ready made at the shops. But there is no shop anywhere where one can buy friendship, and so men have no friends any more. If you want a friend, tame me..." \n"What must I do, to tame you?" asked the little prince.'),
      0.8047155141830444),
     (Document(metadata={'_id': '67b07b9602e46738df0bbb2a', 'doc_index': 21, 'chunk_index': 118}, page_content='"No," said the little prince. "I am looking for friends. What does that mean-- ‘tame‘?" \n"It is an act too often neglected," said the fox. It means to establish ties." \n"\'To establish ties\'?"\n"Just that," said the fox. "To me, you are still nothing more than a little boy who is just like a hundred thousand other little boys. And I have no need of you. And you, on your part, have no need of me. To you, I am nothing more than a fox like a hundred thousand other foxes. But if you tame me, then we shall need each other. To me, you will be unique in all the world. To you, I shall be unique in all the world..." \n"I am beginning to understand," said the little prince. "There is a flower... I think that she has tamed me..."'),
      0.7951536178588867),
     (Document(metadata={'_id': '67b07b9602e46738df0bbb29', 'doc_index': 21, 'chunk_index': 117}, page_content='"What does that mean-- ‘tame‘?" \n"You do not live here," said the fox. "What is it that you are looking for?" \n"I am looking for men," said the little prince. "What does that mean-- ‘tame‘?" \n"Men," said the fox. "They have guns, and they hunt. It is very disturbing. They also raise chickens. These are their only interests. Are you looking for chickens?" \n"No," said the little prince. "I am looking for friends. What does that mean-- ‘tame‘?" \n"It is an act too often neglected," said the fox. It means to establish ties." \n"\'To establish ties\'?"'),
      0.7918769717216492)]

Semantic Search with Filtering

MongoDB Atlas supports pre-filtering your data using MongoDB Query Language (MQL) operators.

You must update the index definition using update_vector_search_index .

atlas.update_vector_search_index(dimensions=1536, filters=["chunk_index"])

Notice that chunk_index has been added to the Index Fields and that Documents have been added as well.

There are comparison query operators that find values matching a condition.

For example, the $eq operator finds documents that match a specified value.

Now you can add a pre_filter condition, using the $lte operator, that matches documents whose chunk_index is less than or equal to 120.

atlas.similarity_search_with_score(
    query=query, k=3, pre_filter={"chunk_index": {"$lte": 120}}
)
[(Document(metadata={'_id': '67b07b9602e46738df0bbb2a', 'doc_index': 21, 'chunk_index': 118}, page_content='"No," said the little prince. "I am looking for friends. What does that mean-- ‘tame‘?" \n"It is an act too often neglected," said the fox. It means to establish ties." \n"\'To establish ties\'?"\n"Just that," said the fox. "To me, you are still nothing more than a little boy who is just like a hundred thousand other little boys. And I have no need of you. And you, on your part, have no need of me. To you, I am nothing more than a fox like a hundred thousand other foxes. But if you tame me, then we shall need each other. To me, you will be unique in all the world. To you, I shall be unique in all the world..." \n"I am beginning to understand," said the little prince. "There is a flower... I think that she has tamed me..."'),
      0.7951536178588867),
     (Document(metadata={'_id': '67b07b9602e46738df0bbb29', 'doc_index': 21, 'chunk_index': 117}, page_content='"What does that mean-- ‘tame‘?" \n"You do not live here," said the fox. "What is it that you are looking for?" \n"I am looking for men," said the little prince. "What does that mean-- ‘tame‘?" \n"Men," said the fox. "They have guns, and they hunt. It is very disturbing. They also raise chickens. These are their only interests. Are you looking for chickens?" \n"No," said the little prince. "I am looking for friends. What does that mean-- ‘tame‘?" \n"It is an act too often neglected," said the fox. It means to establish ties." \n"\'To establish ties\'?"'),
      0.7918769717216492),
     (Document(metadata={'_id': '67b07b9602e46738df0bbb2c', 'doc_index': 21, 'chunk_index': 120}, page_content='"My life is very monotonous," the fox said. "I hunt chickens; men hunt me. All the chickens are just alike, and all the men are just alike. And, in consequence, I am a little bored. But if you tame me, it will be as if the sun came to shine on my life . I shall know the sound of a step that will be different from all the others. Other steps send me hurrying back underneath the ground. Yours will call me, like music, out of my burrow. And then look: you see the grain-fields down yonder? I do not ea t bread. Wheat is of no use to me. The wheat fields have nothing to say to me. And that is sad. But you have hair that is the colour of gold. Think how wonderful that will be when you have tamed me! The grain, which is also golden, will bring me bac k the thought of you. And I shall love to'),
      0.7739419937133789)]

CRUD Operations with PyMongo

Let's use a PyMongo Collection instead of MongoDBAtlasVectorSearch for our Document CRUD operations.

Setting up with an empty collection

Delete all documents in the vector_store and start with an empty collection.

  • delete_documents : if you don't specify IDs, all documents in the collection are deleted.

atlas.delete_documents()
True

If True is returned, the deletion succeeded.

You can see that TOTAL DOCUMENTS has decreased to 0.

Upsert

Split a list of documents into page_content and metadata , then upsert them.

  • upsert_parallel : update documents that match the filter or insert new documents.

Internally, each Document is converted to a RawBSONDocument .

  • RawBSONDocument : represents a BSON document using the raw bytes.

    • BSON, the binary representation of JSON, is primarily used internally by MongoDB.
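
For illustration, here is how a Python dict round-trips through raw BSON (a minimal sketch using the bson package bundled with pymongo ):

from bson import encode
from bson.raw_bson import RawBSONDocument

# Encode a dict to raw BSON bytes and wrap it; field access decodes from the bytes
raw = RawBSONDocument(encode({"page_content": "Hello", "metadata": {"source": "book"}}))
print(raw["page_content"])  # Hello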

texts, metadatas = zip(*[(doc.page_content, doc.metadata) for doc in split_documents])
document_manager.upsert_parallel(texts=texts, metadatas=list(metadatas))

Read with Evaluation Operators

To compare equality, use a <field> : <value> expression.

For example, the $regex operator returns documents that match a regular expression.

  • fox_query_filter : finds all documents that include the string fox in the page_content field.

  • find_one_by_filter : retrieves the first document that matches the condition.

fox_query_filter = {"page_content": {"$regex": "fox"}}

find_result = document_manager.find_one_by_filter(filter=fox_query_filter)
print(find_result["page_content"])
- the little prince befriends the fox
    It was then that the fox appeared.
    "Good morning," said the fox. 
    "Good morning," the little prince responded politely, although when he turned around he saw nothing. 
    "I am right here," the voice said, "under the apple tree." 
    (picture)
    "Who are you?" asked the little prince, and added, "You are very pretty to look at." 
    "I am a fox," said the fox. 
    "Come and play with me," proposed the little prince. "I am so unhappy." 
    "I cannot play with you," the fox said. "I am not tamed." 
    "Ah! Please excuse me," said the little prince. 
    But, after some thought, he added: 
    "What does that mean-- ‘tame‘?" 
    "You do not live here," said the fox. "What is it that you are looking for?" 
    "I am looking for men," said the little prince. "What does that mean-- ‘tame‘?"
  • find : find all documents that match the condition. Passing an empty filter returns all documents, as the sketch after this example shows.

cursor = document_manager.find(filter=fox_query_filter)

fox_story_documents = []
for doc in cursor:
    fox_story_documents.append(doc)
len(fox_story_documents)
19
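
As noted above, passing an empty filter matches everything (a quick sketch):

# An empty filter returns every document in the collection
all_docs = list(document_manager.find(filter={}))
print(len(all_docs))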

Update with query filter

For example, the $set operator sets the value of a field in a document.

  • preface_query_filter : finds all documents with the value 0 in the metadata.doc_index field.

  • update_operation : updates 0 in the document's metadata.doc_index to -1 .

preface_query_filter = {"metadata.doc_index": 0}
update_operation = {"$set": {"metadata.doc_index": -1}}
  • update_one_by_filter : updates the first document that matches the condition.

  • update_many_by_filter : updates all documents that match the condition.

updateOneResult = document_manager.update_one_by_filter(
    preface_query_filter, update_operation
)
updateManyResult = document_manager.update_many_by_filter(
    preface_query_filter, update_operation
)

update_one and update_many return an UpdateResult object that contains the properties below:

  • matched_count : The number of documents that matched the query filter.

  • modified_count : The number of documents modified.

print(
    f"matched: {updateOneResult.matched_count}, modified: {updateOneResult.modified_count}"
)
print(
    f"matched: {updateManyResult.matched_count}, modified: {updateManyResult.modified_count}"
)
matched: 1, modified: 1
    matched: 5, modified: 5

Upsert option

If you set upsert to True in an update operation, a new document is inserted when no document matches the query filter.

  • source_query_filter : finds all documents with the value facebook in the metadata.source field.

  • upsert_operation : updates facebook in the document's metadata.source to book .

source_query_filter = {"metadata.source": "facebook"}
upsert_operation = {"$set": {"metadata.source": "book"}}
upsertResult = document_manager.upsert_many_by_filter(
    source_query_filter, upsert_operation
)
print(
    f"matched: {upsertResult.matched_count}, modified: {upsertResult.modified_count}, upserted_id: {upsertResult.upserted_id}"
)
matched: 0, modified: 0, upserted_id: 67b07ce6fbff5980ceb32fa2

Delete with query filter

  • delete_one_by_filter : deletes the first document that matches the condition and returns a DeleteResult object.

  • deleted_count : The number of documents deleted.

deleteOneResult = document_manager.delete_one_by_filter(
    fox_query_filter, comment="Deleting the first document containing fox"
)

print(f"deleted: {deleteOneResult.deleted_count}")
deleted: 1
  • delete : deletes all documents that match the condition.

document_manager.delete(filters=fox_query_filter)
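
You can verify the result with the find helper shown earlier (a quick sketch):

# After the delete, no documents containing "fox" should remain
remaining = list(document_manager.find(filter=fox_query_filter))
print(len(remaining))  # expected: 0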
