MongoDB Atlas
Author: Ivy Bae
Peer Review : Haseom Shin, ro__o_jun
This is a part of LangChain Open Tutorial
Overview
This tutorial covers the initial setup process for users who are new to MongoDB Atlas.
If you're already familiar with MongoDB Atlas, you can skip the Initialization section.
All examples run on a free cluster, and once you add a collection to your database, you'll be ready to start.
You’ll learn preprocessing to preserve document structure after loading data from a The Little Prince file, how to add and delete documents to a collection, and manage vector store.
Once the documents added, you can learn how to query your data using semantic search, index updates for filtering, and MQL operators.
By the end of this tutorial, you'll be able to integrate PyMongo with LangChain and use VectorStore.
Table of Contents
References
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial
is a package that provides a set of easy-to-use environment setup, useful functions and utilities for tutorials.You can checkout the
langchain-opentutorial
for more details.
%%capture --no-stderr
%pip install langchain-opentutorial
# Install required packages
from langchain_opentutorial import package
package.install(
[
"langchain_openai",
"langsmith",
"langchain_core",
"langchain_community",
"langchain-mongodb",
"pymongo",
"certifi",
],
verbose=False,
upgrade=False,
)
[notice] A new release of pip is available: 24.1 -> 25.0.1
[notice] To update, run: pip install --upgrade pip
# Set environment variables
from langchain_opentutorial import set_env
set_env(
{
"OPENAI_API_KEY": "",
"LANGCHAIN_API_KEY": "",
"MONGODB_ATLAS_CLUSTER_URI": "",
"LANGCHAIN_TRACING_V2": "true",
"LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
"LANGCHAIN_PROJECT": "07-MongoDB-Atlas",
}
)
Environment variables have been set successfully.
You can alternatively set API keys such as OPENAI_API_KEY
in a .env
file and load them.
[Note] This is not necessary if you've already set the required API keys in previous steps.
MONGODB_ATLAS_CLUSTER_URI
is required to use MongoDB Atlas and is explained in the Connect to your cluster.
If you are already using MongoDB Atlas, you can set the cluster connection string to MONGODB_ATLAS_CLUSTER_URI
in your .env
file.
# Load API keys from .env file
from dotenv import load_dotenv
load_dotenv(override=True)
True
Initialization
MongoDB Atlas is a multi-cloud database service that provides an easy way to host and manage your data in the cloud.
After you register with and log in to Atlas, you can create a Free cluster.
Atlas can be started with Atlas CLI or Atlas UI.
Atlas CLI can be difficult to use if you're not used to working with development tools, so this tutorial will walk you through how to use Atlas UI.
Deploy a cluster
Please select the appropriate project in your Organization. If the project doesn't exist, you'll need to create it.
If you select a project, you can create a cluster.

Follow the procedure below to deploy a cluster
select Cluster: M0 Free cluster option
Note: You can deploy only one Free cluster per Atlas project
select Provider: M0 on AWS, GCP, and Azure
select Region
create a database user and add your IP address settings.
After you deploy a cluster, you can see the cluster you deployed as shown in the image below.

Connect to your cluster
Click Get connection string in the image above to get the cluster URI and set the value of MONGODB_ATLAS_CLUSTER_URI
in the .env
file.
The connection string resembles the following example:
mongodb+srv://[databaseUser]:[databasePassword]@[clusterName].[hostName].mongodb.net/?retryWrites=true&w=majority
Then go back to the Environment Setup and run the load_dotenv
function again.
Initialize MongoDBAtlas and MongoDBAtlasDocumentManager
MongoDBAtlas
manages MongoDB collections and vector store.
Internally, it connects to the cluster using PyMongo, the MongoDB python driver.
You can also create a vector store that integrates Atlas Vector Search and Langchain.
MongoDBAtlasDocumentManager
that handles document processing and CRUD operations in MongoDB Atlas.
Initialize MongoDB database and collection
A MongoDB database stores a collections of documents.
from utils.mongodb_atlas import MongoDBAtlas, MongoDBAtlasDocumentManager
DB_NAME = "langchain-opentutorial-db"
COLLECTION_NAME = "little-prince"
atlas = MongoDBAtlas(DB_NAME, COLLECTION_NAME)
document_manager = MongoDBAtlasDocumentManager(atlas=atlas)
You can browse collections to see the little-prince collection you just created and the sample data provided by Atlas.

In this tutorial, we will use the little-prince collection in the langchain-opentutorial-db database.
Atlas Vector Search Indexes
When performing vector search in Atlas, you must create an Atlas Vector Search Index.
Create a Search Index or Vector Search Index
You can define Atlas Search Index or Atlas Vector Search Index using SearchIndexModel
object.
definition
: define the Search Index.name
: query the Search Index by name.
To learn more about definition
of SearchIndexModel
, see Review Atlas Search Index Syntax.
from pymongo.operations import SearchIndexModel
TEST_SEARCH_INDEX_NAME = "test_search_index"
TEST_VECTOR_SEARCH_INDEX_NAME = "test_vector_index"
search_index = SearchIndexModel(
definition={
"mappings": {"dynamic": True},
},
name=TEST_SEARCH_INDEX_NAME,
)
vector_index = SearchIndexModel(
definition={
"fields": [
{
"type": "vector",
"numDimensions": 1536,
"path": "embedding",
"similarity": "cosine",
}
]
},
name=TEST_VECTOR_SEARCH_INDEX_NAME,
type="vectorSearch",
)
create_index
: create a single Atlas Search Index or Atlas Vector Search Index. Checks internally if a Search Index with the same name exists.
atlas.create_index(TEST_SEARCH_INDEX_NAME, search_index)
atlas.create_index(TEST_VECTOR_SEARCH_INDEX_NAME, vector_index)
Click the Atlas Search tab to see the search indexes that you created.

Update a Search Index
update_index
: update an Atlas Search Index or Atlas Vector Search Index.
new_vector_index = {
"fields": [
{
"type": "vector",
"numDimensions": 1536,
"path": "embedding",
"similarity": "euclidean",
}
]
}
atlas.update_index(TEST_VECTOR_SEARCH_INDEX_NAME, definition=new_vector_index)
If the update is successful, click test_vector_index in the list of Index Name on the Atlas Search tab to see more information.
You can see that the Similarity Method for the Vector Field has changed to euclidean.

You can also click the Edit Index Definition button on the right side of the Atlas UI to update it.
Delete a Search Index
delete_index
: remove an Atlas Search Index or Atlas Vector Search Index.
atlas.delete_index(TEST_SEARCH_INDEX_NAME)
atlas.delete_index(TEST_VECTOR_SEARCH_INDEX_NAME)
Vector Store
create_vector_store
: create a vector store usingMongoDBAtlasVectorSearch
.embedding
: embedding model to use.index_name
: index to use when querying the vector store.relevance_score_fn
: similarity score used for the index. You can choose from euclidean, cosine, and dotProduct.
from langchain_openai import OpenAIEmbeddings
embedding = OpenAIEmbeddings(model="text-embedding-3-small")
TUTORIAL_VECTOR_SEARCH_INDEX_NAME = "langchain-opentutorial-index"
atlas.create_vector_store(
embedding=embedding,
index_name=TUTORIAL_VECTOR_SEARCH_INDEX_NAME,
relevance_score_fn="cosine",
)
Create a Index
create_vector_search_index
: Alternative to the above Create a Search Index or Vector Search Index section that creates a Vector Search Index.
atlas.create_vector_search_index(dimensions=1536)
Click the Atlas Search tab to see the search index langchain-opentutorial-index that you created.
Load Data
LangChain provides Document loaders that can load a variety of data sources.
Document loaders
get_documents
: useTextLoader
to add data from the the_little_prince.txt in the data directory to the little-prince collection.
documents = document_manager.get_documents(
file_path="./data/the_little_prince.txt", encoding="utf-8"
)
The get_documents
method returns List[Document].
metadata
: data associated with contentpage_content
: string text
Data Preprocessing
Preserving text file structure
In the Document loaders section above, page_content
has all the text in the file assigned to it.
split_by_chapter
To preserve the structure of the text file, let's modify it to split the file into chapters.
the_little_prince.txt used [ Chapter X ] as a delimiter to separate the chapters.
from typing import List
def split_by_chapter(text: str) -> List[str]:
chapters = text.split("[ Chapter ")
return [chapter.split(" ]", 1)[-1].strip() for chapter in chapters]
split_documents
: split documents by chapterAdd
doc_index
to metadata
split_chapters = document_manager.split_documents(
documents, split_condition=split_by_chapter, split_index_name="doc_index"
)
If you compare the documents
to split_chapters
, you can see that page_content
is split by chapter
.
first_chapter = split_chapters[1]
print(f"{first_chapter.page_content[:30]}, metadata: {first_chapter.metadata}")
- we are introduced to the nar, metadata: {'doc_index': 1}
Text splitters
Splitting a Document
into appropriately sized chunks allows you to process text data more efficiently.
To split a Document
while preserving paragraph and sentence structure, use RecursiveCharacterTextSplitter
.
chunk_size
: setting the maximum size of chunkschunk_overlap
: setting the character overlap size between chunks
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=200)
split_documents = document_manager.split_documents_by_splitter(
text_splitter, split_chapters
)
Add metadata
Splitting the document into chunk_size
increases the number of documents.
Add an chunk_index
key to the metadata to identify the document index, since it is not split into one Document
per chapter.
for index, doc in enumerate(split_documents):
doc.metadata.update({"chunk_index": index})
The chunk_index
has been added to the metadata.
You can see that some of the page_content
text in the Document
overlaps.
Manage vector store
Now that you've initialized the vector_store
and loaded the data, you can add and delete Documents to the little-prince collection.
Add
add_documents
: Add documents to thevector_store
and returns a List of IDs for the added documents.
ids = atlas.add_documents(documents=split_documents)
delete
function allow specify the Document IDs to delete, so ids
store the IDs of the added documents.
Check the first document ID. The number of IDs matches the number of documents, and each ID is a unique value.
In the image below, after adding documents the STORAGE SIZE of the collection increases and you can see the documents corresponding to each ID, such as ids[0]
.

The embedding
field is a vector representation of the text data. It is used to determine similarity to the query vector for vector search.
Query Filter
Create a Document
object, add it to a collection.
from langchain_core.documents import Document
sample_document = Document(
page_content="I am leveraging my experience as a developer to provide development education and nurture many new developers.",
metadata={"source": "linkedin"},
)
sample_id = atlas.add_documents([sample_document])
TOTAL DOCUMENTS has increased from 167 to 168.
On the last page, you can see the page_content
of sample_document
.
Alternatively, you can add query filter, such as the source
field, to view the search results.

Delete
You can specify the document IDs to delete as arguments to the delete_documents
function, such as sample_id
.
atlas.delete_documents(ids=sample_id)
True
If True
returns, the deletion is successful.
You can see that TOTAL DOCUMENTS has decreasesd from 168 to 167 and that sample_document
has been deleted.
Query vector store
Make a query
related to the content of The Little Prince and see if the vector_store
returns results from a search for similar documents.
The query
is based on the most well-known story about the relationship between the Little Prince and the Fox.
query = "What does it mean to be tamed according to the fox?"
Semantic Search
similarity_search
method performs a basic semantic search
The k
parameter in the example below specifies the number of documents.
It returns a List[Document] ranked by relevance.
atlas.similarity_search(query=query, k=1)
[Document(metadata={'_id': '67b07b9602e46738df0bbb2e', 'doc_index': 21, 'chunk_index': 122}, page_content='The fox gazed at the little prince, for a long time. \n(picture)\n"Please-- tame me!" he said. \n"I want to, very much," the little prince replied. "But I have not much time. I have friends to discover, and a great many things to understand." \n"One only understands the things that one tames," said the fox. "Men have no more time to understand anything. They buy things all ready made at the shops. But there is no shop anywhere where one can buy friendship, and so men have no friends any more. If you want a friend, tame me..." \n"What must I do, to tame you?" asked the little prince.')]
Semantic Search with Score
similarity_search_with_score
method also performs a semantic search.
The difference with the similarity_search
method is that it returns a relevance score of documents between 0 and 1.
atlas.similarity_search_with_score(query=query, k=3)
[(Document(metadata={'_id': '67b07b9602e46738df0bbb2e', 'doc_index': 21, 'chunk_index': 122}, page_content='The fox gazed at the little prince, for a long time. \n(picture)\n"Please-- tame me!" he said. \n"I want to, very much," the little prince replied. "But I have not much time. I have friends to discover, and a great many things to understand." \n"One only understands the things that one tames," said the fox. "Men have no more time to understand anything. They buy things all ready made at the shops. But there is no shop anywhere where one can buy friendship, and so men have no friends any more. If you want a friend, tame me..." \n"What must I do, to tame you?" asked the little prince.'),
0.8047155141830444),
(Document(metadata={'_id': '67b07b9602e46738df0bbb2a', 'doc_index': 21, 'chunk_index': 118}, page_content='"No," said the little prince. "I am looking for friends. What does that mean-- ‘tame‘?" \n"It is an act too often neglected," said the fox. It means to establish ties." \n"\'To establish ties\'?"\n"Just that," said the fox. "To me, you are still nothing more than a little boy who is just like a hundred thousand other little boys. And I have no need of you. And you, on your part, have no need of me. To you, I am nothing more than a fox like a hundred thousand other foxes. But if you tame me, then we shall need each other. To me, you will be unique in all the world. To you, I shall be unique in all the world..." \n"I am beginning to understand," said the little prince. "There is a flower... I think that she has tamed me..."'),
0.7951536178588867),
(Document(metadata={'_id': '67b07b9602e46738df0bbb29', 'doc_index': 21, 'chunk_index': 117}, page_content='"What does that mean-- ‘tame‘?" \n"You do not live here," said the fox. "What is it that you are looking for?" \n"I am looking for men," said the little prince. "What does that mean-- ‘tame‘?" \n"Men," said the fox. "They have guns, and they hunt. It is very disturbing. They also raise chickens. These are their only interests. Are you looking for chickens?" \n"No," said the little prince. "I am looking for friends. What does that mean-- ‘tame‘?" \n"It is an act too often neglected," said the fox. It means to establish ties." \n"\'To establish ties\'?"'),
0.7918769717216492)]
Semantic Search with Filtering
MongoDB Atlas supports pre-filtering your data using MongoDB Query Language(MQL) Operators.
You must update the index definition using update_vector_search_index
.
atlas.update_vector_search_index(dimensions=1536, filters=["chunk_index"])
Compare the image below to when you first created the index in Vector Store.
Notice that chunk_index
have been added to the Index Fields and Documents have been added as well.

There are comparison query operators that find values that match a condition.
For example, the $eq
operator finds documents that match a specified value.
Now you can add a pre_filter
condition that documents chunk_index are lower than or equal to 120 using the $lte
operator.
atlas.similarity_search_with_score(
query=query, k=3, pre_filter={"chunk_index": {"$lte": 120}}
)
[(Document(metadata={'_id': '67b07b9602e46738df0bbb2a', 'doc_index': 21, 'chunk_index': 118}, page_content='"No," said the little prince. "I am looking for friends. What does that mean-- ‘tame‘?" \n"It is an act too often neglected," said the fox. It means to establish ties." \n"\'To establish ties\'?"\n"Just that," said the fox. "To me, you are still nothing more than a little boy who is just like a hundred thousand other little boys. And I have no need of you. And you, on your part, have no need of me. To you, I am nothing more than a fox like a hundred thousand other foxes. But if you tame me, then we shall need each other. To me, you will be unique in all the world. To you, I shall be unique in all the world..." \n"I am beginning to understand," said the little prince. "There is a flower... I think that she has tamed me..."'),
0.7951536178588867),
(Document(metadata={'_id': '67b07b9602e46738df0bbb29', 'doc_index': 21, 'chunk_index': 117}, page_content='"What does that mean-- ‘tame‘?" \n"You do not live here," said the fox. "What is it that you are looking for?" \n"I am looking for men," said the little prince. "What does that mean-- ‘tame‘?" \n"Men," said the fox. "They have guns, and they hunt. It is very disturbing. They also raise chickens. These are their only interests. Are you looking for chickens?" \n"No," said the little prince. "I am looking for friends. What does that mean-- ‘tame‘?" \n"It is an act too often neglected," said the fox. It means to establish ties." \n"\'To establish ties\'?"'),
0.7918769717216492),
(Document(metadata={'_id': '67b07b9602e46738df0bbb2c', 'doc_index': 21, 'chunk_index': 120}, page_content='"My life is very monotonous," the fox said. "I hunt chickens; men hunt me. All the chickens are just alike, and all the men are just alike. And, in consequence, I am a little bored. But if you tame me, it will be as if the sun came to shine on my life . I shall know the sound of a step that will be different from all the others. Other steps send me hurrying back underneath the ground. Yours will call me, like music, out of my burrow. And then look: you see the grain-fields down yonder? I do not ea t bread. Wheat is of no use to me. The wheat fields have nothing to say to me. And that is sad. But you have hair that is the colour of gold. Think how wonderful that will be when you have tamed me! The grain, which is also golden, will bring me bac k the thought of you. And I shall love to'),
0.7739419937133789)]
CRUD Operations with PyMongo
Let's use PyMongo Collection instead of MongoDBAtlasVectorSearch
for our Document CRUD Operations.
Setting up with an empty collection
Delete all documents in vector_store
and start with an empty collection.
delete_documents
: If you don't specify an ID, all documents added to the collection are deleted.
atlas.delete_documents()
True
If True
returns, the deletion is successful.
You can see that TOTAL DOCUMENTS has decreasesd to 0.
Upsert
Splits a list of documents into page_content
and metadata
, then upsert them.
upsert_parallel
: update documents that match the filter or insert new documents.
Internally, Document
is converted to RawBSONDocument
.
RawBSONDocument
: represent BSON document using the raw bytes.BSON, the binary representation of JSON, is primarily used internally by MongoDB.
texts, metadatas = zip(*[(doc.page_content, doc.metadata) for doc in split_documents])
document_manager.upsert_parallel(texts=texts, metadatas=list(metadatas))
Read with Evaluation Operators
To compare the equality, use <field>
: <value>
expression .
You can also use evaluation operators to perform operations.
For example, $regex
operator returns documents that match a regular expression.
fox_query_filter
: find all documents inclues the stringfox
in thepage_content
field.find_one_by_filter
: retrieve the first document that matches the condition.
fox_query_filter = {"page_content": {"$regex": "fox"}}
find_result = document_manager.find_one_by_filter(filter=fox_query_filter)
print(find_result["page_content"])
- the little prince befriends the fox
It was then that the fox appeared.
"Good morning," said the fox.
"Good morning," the little prince responded politely, although when he turned around he saw nothing.
"I am right here," the voice said, "under the apple tree."
(picture)
"Who are you?" asked the little prince, and added, "You are very pretty to look at."
"I am a fox," said the fox.
"Come and play with me," proposed the little prince. "I am so unhappy."
"I cannot play with you," the fox said. "I am not tamed."
"Ah! Please excuse me," said the little prince.
But, after some thought, he added:
"What does that mean-- ‘tame‘?"
"You do not live here," said the fox. "What is it that you are looking for?"
"I am looking for men," said the little prince. "What does that mean-- ‘tame‘?"
find
: find all documents that match the condition. Passing an empty filter will return all documents.
cursor = document_manager.find(filter=fox_query_filter)
fox_story_documents = []
for doc in cursor:
fox_story_documents.append(doc)
len(fox_story_documents)
19
Update with query filter
You can use update operators to perform operations.
For example, $set
operator sets the value of a field in a document.
preface_query_filter
: find all documents with the value0
in themetadata.doc_index
field.update_operation
: updates0
in the document'smetadata.doc_index
to-1
.
preface_query_filter = {"metadata.doc_index": 0}
update_operation = {"$set": {"metadata.doc_index": -1}}
update_one_by_filter
: updates the first document that matches the condition.update_many_by_filter
: updates all documents that match the condition.
updateOneResult = document_manager.update_one_by_filter(
preface_query_filter, update_operation
)
updateManyResult = document_manager.update_many_by_filter(
preface_query_filter, update_operation
)
update_one
and update_many
return UpdateResult
object that contains the properties below:
matched_count
: The number of documents that matched the query filter.modified_count
: The number of documents modified.
print(
f"matched: {updateOneResult.matched_count}, modified: {updateOneResult.modified_count}"
)
print(
f"matched: {updateManyResult.matched_count}, modified: {updateManyResult.modified_count}"
)
matched: 1, modified: 1
matched: 5, modified: 5
Upsert option
If you set the upsert
to True
in update operation, inserts a new document if no document matches the query filter.
source_query_filter
: find all documents with the valuefacebook
in themetadata.source
field.upsert_operation
: updatesfacebook
in the document'smetadata.source
tobook
.
source_query_filter = {"metadata.source": "facebook"}
upsert_operation = {"$set": {"metadata.source": "book"}}
upsertResult = document_manager.upsert_many_by_filter(
source_query_filter, upsert_operation
)
print(
f"matched: {upsertResult.matched_count}, modified: {upsertResult.modified_count}, upserted_id: {upsertResult.upserted_id}"
)
matched: 0, modified: 0, upserted_id: 67b07ce6fbff5980ceb32fa2
Delete with query filter
delete_one_by_filter
: deletes the first document that matches the condition and returnsDeleteResult
object.deleted_count
: The number of documents deleted.
deleteOneResult = document_manager.delete_one_by_filter(
fox_query_filter, comment="Deleting the first document containing fox"
)
print(f"deleted: {deleteOneResult.deleted_count}")
deleted: 1
delete
: deletes all documents that match the condition.
document_manager.delete(filters=fox_query_filter)
Last updated