MongoDB Atlas
Author: Ivy Bae
Peer Review : Haseom Shin, ro__o_jun
This is a part of LangChain Open Tutorial
Overview
This tutorial covers the initial setup process for users who are new to MongoDB Atlas.
If you're already familiar with MongoDB Atlas, you can skip the Initialization section.
All examples run on a free cluster, and once you add a collection to your database, you'll be ready to start.
You'll learn how to preprocess data loaded from The Little Prince text file so that the document structure is preserved, how to add documents to and delete them from a collection, and how to manage a vector store.
Once the documents are added, you'll learn how to query your data with semantic search, update the index for filtering, and use MQL operators.
By the end of this tutorial, you'll be able to integrate PyMongo with LangChain and use a vector store.
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for tutorials.
You can check out langchain-opentutorial for more details.
You can alternatively set API keys such as OPENAI_API_KEY in a .env file and load them.
[Note] This is not necessary if you've already set the required API keys in previous steps.
MONGODB_ATLAS_CLUSTER_URI is required to use MongoDB Atlas and is explained in the Connect to your cluster section.
If you are already using MongoDB Atlas, you can set the cluster connection string to MONGODB_ATLAS_CLUSTER_URI in your .env file.
Initialization
MongoDB Atlas is a multi-cloud database service that provides an easy way to host and manage your data in the cloud.
After you register with and log in to Atlas, you can create a Free cluster.
You can get started with Atlas using either the Atlas CLI or the Atlas UI.
Atlas CLI can be difficult to use if you're not used to working with development tools, so this tutorial will walk you through how to use Atlas UI.
Deploy a cluster
Please select the appropriate project in your Organization. If the project doesn't exist, you'll need to create it.
If you select a project, you can create a cluster.

Follow the procedure below to deploy a cluster:
Select Cluster: choose the M0 Free cluster option
Note: You can deploy only one Free cluster per Atlas project
Select Provider: M0 is available on AWS, GCP, and Azure
Select Region
Create a database user and configure your IP address settings.
After you deploy a cluster, you can see the cluster you deployed as shown in the image below.

Connect to your cluster
Click Get connection string in the image above to get the cluster URI and set the value of MONGODB_ATLAS_CLUSTER_URI in the .env file.
The connection string resembles the following example:
mongodb+srv://[databaseUser]:[databasePassword]@[clusterName].[hostName].mongodb.net/?retryWrites=true&w=majority
Then go back to the Environment Setup and run the load_dotenv function again.
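If the database user name or password contains reserved characters such as @ or :, they need to be percent-encoded before they go into the connection string. The following is a minimal sketch using only the standard library. build_cluster_uri is a hypothetical helper, and the user, password, and host are placeholders rather than real cluster values.

```python
from urllib.parse import quote_plus


def build_cluster_uri(user: str, password: str, host: str) -> str:
    """Build an Atlas SRV connection string with escaped credentials."""
    # quote_plus percent-encodes reserved characters such as '@' and ':'.
    return (
        f"mongodb+srv://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/?retryWrites=true&w=majority"
    )


# Placeholder values for illustration only; use your own cluster's values.
uri = build_cluster_uri("demo_user", "p@ss:word", "cluster0.example.mongodb.net")
print(uri)
```

The resulting string is what you would store as MONGODB_ATLAS_CLUSTER_URI in the .env file.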
Initialize MongoDBAtlas and MongoDBAtlasDocumentManager
MongoDBAtlas manages MongoDB collections and the vector store.
Internally, it connects to the cluster using PyMongo, the MongoDB Python driver.
You can also create a vector store that integrates Atlas Vector Search with LangChain.
MongoDBAtlasDocumentManager handles document processing and CRUD operations in MongoDB Atlas.
Initialize MongoDB database and collection
A MongoDB database stores collections of documents.
You can browse collections to see the little-prince collection you just created and the sample data provided by Atlas.

In this tutorial, we will use the little-prince collection in the langchain-opentutorial-db database.
Atlas Vector Search Indexes
When performing vector search in Atlas, you must create an Atlas Vector Search Index.
Create a Search Index or Vector Search Index
You can define an Atlas Search Index or an Atlas Vector Search Index using a SearchIndexModel object.
definition: define the Search Index.
name: query the Search Index by name.
To learn more about the definition of SearchIndexModel, see Review Atlas Search Index Syntax.
create_index: create a single Atlas Search Index or Atlas Vector Search Index. Checks internally if a Search Index with the same name exists.
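As a rough illustration of what a Vector Search Index definition looks like, the sketch below builds one as a plain dictionary. vector_index_definition is a hypothetical helper, and the 1536 dimensions are an assumed value that has to match your embedding model. The commented lines show how such a dictionary could be handed to PyMongo, assuming a PyMongo version whose SearchIndexModel accepts the type argument.

```python
ALLOWED_SIMILARITIES = {"euclidean", "cosine", "dotProduct"}


def vector_index_definition(num_dimensions, similarity="cosine",
                            path="embedding", filter_paths=()):
    """Return an Atlas Vector Search index definition as a dict."""
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity}")
    fields = [{
        "type": "vector",
        "path": path,                     # field that stores the embedding
        "numDimensions": num_dimensions,  # must match the embedding model
        "similarity": similarity,
    }]
    # Optional fields that can later be used for pre-filtering.
    fields += [{"type": "filter", "path": p} for p in filter_paths]
    return {"fields": fields}


definition = vector_index_definition(num_dimensions=1536)

# With PyMongo installed and a connected collection, roughly:
#   from pymongo.operations import SearchIndexModel
#   model = SearchIndexModel(definition=definition,
#                            name="test_vector_index", type="vectorSearch")
#   collection.create_search_index(model)
```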
Click the Atlas Search tab to see the search indexes that you created.

Update a Search Index
update_index: update an Atlas Search Index or Atlas Vector Search Index.
If the update is successful, click test_vector_index in the list of Index Name on the Atlas Search tab to see more information.
You can see that the Similarity Method for the Vector Field has changed to euclidean.

You can also click the Edit Index Definition button on the right side of the Atlas UI to update it.
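The change to euclidean described above amounts to sending a new definition for the same index name. The sketch below assumes the index was created with cosine similarity and 1536 dimensions, both illustrative values, and with_similarity is a hypothetical helper. The final comment shows the PyMongo call that would apply the updated definition to a connected collection.

```python
import copy


def with_similarity(definition: dict, similarity: str) -> dict:
    """Return a copy of the definition with a new similarity on vector fields."""
    updated = copy.deepcopy(definition)  # leave the original definition intact
    for field in updated["fields"]:
        if field["type"] == "vector":
            field["similarity"] = similarity
    return updated


current = {
    "fields": [{
        "type": "vector",
        "path": "embedding",
        "numDimensions": 1536,  # assumed value for illustration
        "similarity": "cosine",
    }]
}
updated = with_similarity(current, "euclidean")

# With a connected PyMongo collection, roughly:
#   collection.update_search_index("test_vector_index", updated)
```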
Delete a Search Index
delete_index: remove an Atlas Search Index or Atlas Vector Search Index.
Vector Store
create_vector_store: create a vector store using MongoDBAtlasVectorSearch.
embedding: embedding model to use.
index_name: index to use when querying the vector store.
relevance_score_fn: similarity score used for the index. You can choose from euclidean, cosine, and dotProduct.
Create an Index
create_vector_search_index: creates a Vector Search Index, as an alternative to the Create a Search Index or Vector Search Index section above.
Click the Atlas Search tab to see the search index langchain-opentutorial-index that you created.
Load Data
LangChain provides Document loaders that can load a variety of data sources.
Document loaders
get_documents: use TextLoader to add data from the_little_prince.txt in the data directory to the little-prince collection.
The get_documents method returns List[Document].
metadata: data associated with the content
page_content: string text
Data Preprocessing
Preserving text file structure
In the Document loaders section above, page_content has all the text in the file assigned to it.
split_by_chapter: to preserve the structure of the text file, let's split the file into chapters.
the_little_prince.txt uses [ Chapter X ] as a delimiter to separate the chapters.
split_documents: split documents by chapter
Add doc_index to metadata
If you compare the documents to split_chapters, you can see that page_content is split by chapter.
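The chapter split can be sketched with a regular expression on the [ Chapter X ] delimiter. This is not the tutorial's own implementation: split_text_by_chapter is a hypothetical name, plain dictionaries stand in for Document objects, and X is assumed to be a number. The text before the first delimiter is treated as a preface with doc_index 0.

```python
import re

# Assumes the delimiter looks like "[ Chapter 12 ]" with a numeric X.
CHAPTER_PATTERN = re.compile(r"\[ Chapter (\d+) \]")


def split_text_by_chapter(text: str) -> list[dict]:
    """Split text on chapter markers and tag each piece with doc_index."""
    # With one capture group, split() returns:
    # [preface, "1", chapter 1 text, "2", chapter 2 text, ...]
    parts = CHAPTER_PATTERN.split(text)
    docs = []
    preface = parts[0].strip()
    if preface:
        docs.append({"page_content": preface, "metadata": {"doc_index": 0}})
    for number, body in zip(parts[1::2], parts[2::2]):
        docs.append({
            "page_content": body.strip(),
            "metadata": {"doc_index": int(number)},
        })
    return docs


sample = (
    "Preface text.\n"
    "[ Chapter 1 ]\nOnce when I was six.\n"
    "[ Chapter 2 ]\nSo I lived my life alone."
)
chapters = split_text_by_chapter(sample)
```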
Text splitters
Splitting a Document into appropriately sized chunks allows you to process text data more efficiently.
To split a Document while preserving paragraph and sentence structure, use RecursiveCharacterTextSplitter.
chunk_size: the maximum size of each chunk
chunk_overlap: the number of characters that overlap between adjacent chunks
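To make the two parameters concrete, here is a deliberately simplified fixed-window splitter. It is not how RecursiveCharacterTextSplitter works internally (that splitter prefers paragraph and sentence boundaries); it only illustrates how chunk_size and chunk_overlap interact. chunk_text is a hypothetical helper.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, which is the overlap that keeps context from being cut at a chunk boundary.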
Add metadata
Splitting the documents into chunks of at most chunk_size characters increases the number of documents.
Add a chunk_index key to the metadata to identify each chunk, since there is no longer one Document per chapter.
The chunk_index has been added to the metadata.
You can see that some of the page_content text in the Document overlaps.
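Adding the index can be as simple as enumerating the chunks. The sketch below uses plain dictionaries in place of Document objects and assumes a running index that starts at 0 across all chapters; add_chunk_index is a hypothetical helper.

```python
def add_chunk_index(chunks: list[dict]) -> list[dict]:
    """Return copies of the chunks with a running chunk_index in metadata."""
    return [
        # Copy the metadata so existing keys such as doc_index are kept.
        {**chunk, "metadata": {**chunk["metadata"], "chunk_index": index}}
        for index, chunk in enumerate(chunks)
    ]


chunks = [
    {"page_content": "Once when I was six", "metadata": {"doc_index": 1}},
    {"page_content": "I was six years old", "metadata": {"doc_index": 1}},
    {"page_content": "So I lived my life alone", "metadata": {"doc_index": 2}},
]
indexed = add_chunk_index(chunks)
```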
Manage vector store
Now that you've initialized the vector_store and loaded the data, you can add Documents to and delete them from the little-prince collection.
Add
add_documents: adds documents to the vector_store and returns a List of IDs for the added documents.
The delete function lets you specify the Document IDs to delete, so ids stores the IDs of the added documents.
Check the first document ID. The number of IDs matches the number of documents, and each ID is a unique value.
In the image below, after adding documents the STORAGE SIZE of the collection increases and you can see the documents corresponding to each ID, such as ids[0].

The embedding field is a vector representation of the text data. It is used to determine similarity to the query vector for vector search.
Query Filter
Create a Document object and add it to the collection.
TOTAL DOCUMENTS has increased from 167 to 168.
On the last page, you can see the page_content of sample_document.
Alternatively, you can add a query filter, such as one on the source field, to view the search results.

Delete
You can specify the document IDs to delete, such as sample_id, as arguments to the delete_documents function.
If it returns True, the deletion was successful.
You can see that TOTAL DOCUMENTS has decreased from 168 to 167 and that sample_document has been deleted.
Query vector store
Make a query related to the content of The Little Prince and see if the vector_store returns results from a search for similar documents.
The query is based on the most well-known story about the relationship between the Little Prince and the Fox.
Semantic Search
The similarity_search method performs a basic semantic search.
The k parameter in the example below specifies the number of documents to return.
It returns a List[Document] ranked by relevance.
Semantic Search with Score
The similarity_search_with_score method also performs a semantic search.
Unlike similarity_search, it also returns a relevance score between 0 and 1 for each document.
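One way a score ends up between 0 and 1: for cosine similarity, the raw value lies in [-1, 1] and can be rescaled as (1 + cosine) / 2. The sketch below assumes that normalization, as described in the Atlas Vector Search documentation, and uses tiny hand-written vectors in place of real embeddings. Both function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def normalized_cosine_score(a: list[float], b: list[float]) -> float:
    """Rescale cosine similarity from [-1, 1] to [0, 1] (assumed formula)."""
    return (1 + cosine_similarity(a, b)) / 2


same = normalized_cosine_score([1.0, 0.0], [1.0, 0.0])        # same direction
orthogonal = normalized_cosine_score([1.0, 0.0], [0.0, 1.0])  # unrelated
opposite = normalized_cosine_score([1.0, 0.0], [-1.0, 0.0])   # opposite
```

Under this assumption, identical directions score 1, unrelated directions score 0.5, and opposite directions score 0.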
Semantic Search with Filtering
MongoDB Atlas supports pre-filtering your data using MongoDB Query Language (MQL) operators.
To filter on a field, you must first update the index definition using update_vector_search_index.
Compare the image below to when you first created the index in Vector Store.
Notice that chunk_index has been added to the Index Fields and that Documents have been added as well.

There are comparison query operators that find values that match a condition.
For example, the $eq operator finds documents that match a specified value.
Now you can add a pre_filter condition that keeps only documents whose chunk_index is less than or equal to 120, using the $lte operator.
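The filter itself is an ordinary MQL dictionary. To show what $eq and $lte mean, the sketch below evaluates such a filter locally against plain dictionaries; matches is a hypothetical helper that only emulates these two operators and is not part of PyMongo or LangChain. The last comment shows how the same dictionary could be passed to the vector store, assuming the pre_filter argument of MongoDBAtlasVectorSearch.similarity_search.

```python
import operator

# Local emulation of two MQL comparison operators, for illustration only.
OPERATORS = {"$eq": operator.eq, "$lte": operator.le}


def matches(document: dict, query_filter: dict) -> bool:
    """Return True if the document satisfies every condition in the filter."""
    for field, condition in query_filter.items():
        if field not in document:
            return False
        for op_name, expected in condition.items():
            if not OPERATORS[op_name](document[field], expected):
                return False
    return True


pre_filter = {"chunk_index": {"$lte": 120}}

docs = [{"chunk_index": 5}, {"chunk_index": 120}, {"chunk_index": 121}]
kept = [doc for doc in docs if matches(doc, pre_filter)]

# With an initialized vector store, roughly:
#   vector_store.similarity_search(query, k=3, pre_filter=pre_filter)
```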
CRUD Operations with PyMongo
Let's use a PyMongo Collection instead of MongoDBAtlasVectorSearch for our Document CRUD operations.
Setting up with an empty collection
Delete all documents in vector_store and start with an empty collection.
delete_documents: if you don't specify an ID, all documents added to the collection are deleted.
If it returns True, the deletion was successful.
You can see that TOTAL DOCUMENTS has decreased to 0.
Upsert
Split a list of documents into page_content and metadata, then upsert them.
upsert_parallel: update documents that match the filter or insert new documents.
Internally, Document is converted to RawBSONDocument.
RawBSONDocument: represents a BSON document using its raw bytes.
BSON, a binary representation of JSON-like documents, is primarily used internally by MongoDB.
Read with Evaluation Operators
To test for equality, use a <field>: <value> expression.
You can also use evaluation operators to perform operations.
For example, the $regex operator returns documents that match a regular expression.
fox_query_filter: find all documents that include the string fox in the page_content field.
find_one_by_filter: retrieve the first document that matches the condition.
find: find all documents that match the condition. Passing an empty filter will return all documents.
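A $regex filter is again just a dictionary. The sketch below shows the filter and emulates its effect locally with Python's re module on a few made-up documents. For a literal pattern like fox the behavior should be the same, although MongoDB uses its own regular expression engine. Note that the match is case-sensitive by default. The comments show the PyMongo calls that would run the same filter against a connected collection.

```python
import re

fox_query_filter = {"page_content": {"$regex": "fox"}}

# Made-up documents for illustration only.
documents = [
    {"page_content": "It was then that the fox appeared."},
    {"page_content": "The rose was proud of her thorns."},
    {"page_content": "Fox is capitalized here, so it does not match."},
]

# Local emulation: an unanchored, case-sensitive search for the pattern.
pattern = re.compile(fox_query_filter["page_content"]["$regex"])
matching = [doc for doc in documents if pattern.search(doc["page_content"])]

# With a connected PyMongo collection, roughly:
#   first = collection.find_one(fox_query_filter)
#   every = list(collection.find(fox_query_filter))
#   all_docs = list(collection.find({}))  # an empty filter matches everything
```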
Update with query filter
You can use update operators to perform operations.
For example, the $set operator sets the value of a field in a document.
preface_query_filter: find all documents with the value 0 in the metadata.doc_index field.
update_operation: updates 0 in the document's metadata.doc_index to -1.
update_one_by_filter: updates the first document that matches the condition.
update_many_by_filter: updates all documents that match the condition.
update_one and update_many return an UpdateResult object that contains the properties below:
matched_count: the number of documents that matched the query filter.
modified_count: the number of documents modified.
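Expressed directly against PyMongo, the filter and update described above might look like the sketch below. The two functions are hypothetical wrappers, not the tutorial's own helpers, and they expect a connected PyMongo Collection as their argument.

```python
# Filter and update documents use dotted paths to reach nested fields.
preface_query_filter = {"metadata.doc_index": 0}
update_operation = {"$set": {"metadata.doc_index": -1}}


def update_first(collection) -> tuple[int, int]:
    """Update the first matching document; return (matched, modified)."""
    result = collection.update_one(preface_query_filter, update_operation)
    return result.matched_count, result.modified_count


def update_all(collection) -> tuple[int, int]:
    """Update every matching document; return (matched, modified)."""
    result = collection.update_many(preface_query_filter, update_operation)
    return result.matched_count, result.modified_count
```

matched_count can be larger than modified_count when a matched document already holds the target value.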
Upsert option
If you set upsert to True in an update operation, a new document is inserted when no document matches the query filter.
source_query_filter: find all documents with the value facebook in the metadata.source field.
upsert_operation: updates facebook in the document's metadata.source to book.
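With PyMongo, the upsert behavior is switched on by passing upsert=True to update_one. In this sketch, upsert_source is a hypothetical wrapper that expects a connected PyMongo Collection.

```python
source_query_filter = {"metadata.source": "facebook"}
upsert_operation = {"$set": {"metadata.source": "book"}}


def upsert_source(collection):
    """Update a matching document, or insert a new one if none matches."""
    result = collection.update_one(
        source_query_filter, upsert_operation, upsert=True
    )
    # upserted_id is None when an existing document was updated instead.
    return result.upserted_id
```

A non-None return value therefore signals that a new document was inserted.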
Delete with query filter
delete_one_by_filter: deletes the first document that matches the condition and returns a DeleteResult object.
deleted_count: the number of documents deleted.
delete: deletes all documents that match the condition.
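The corresponding PyMongo calls are delete_one and delete_many. The functions below are hypothetical wrappers that expect a connected PyMongo Collection, shown only to make the return values concrete.

```python
def delete_first(collection, query_filter: dict) -> int:
    """Delete the first matching document; return how many were deleted."""
    return collection.delete_one(query_filter).deleted_count


def delete_all_matching(collection, query_filter: dict) -> int:
    """Delete every matching document; return how many were deleted."""
    # Caution: an empty filter {} matches, and deletes, every document.
    return collection.delete_many(query_filter).deleted_count
```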