# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "OPENAI_API_KEY": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "HuggingFace Embeddings",  # Set this to match the tutorial title
        "HUGGINGFACEHUB_API_TOKEN": "",
    }
)
Environment variables have been set successfully.
You can alternatively set OPENAI_API_KEY in a .env file and load it.
[Note]
This is not necessary if you've already set OPENAI_API_KEY in previous steps.
from dotenv import load_dotenv

load_dotenv(override=True)
True
# Automatically select the appropriate device
import torch
import platform


def get_device():
    if platform.system() == "Darwin":  # macOS specific
        if hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
            print("✅ Using MPS (Metal Performance Shaders) on macOS")
            return "mps"
    if torch.cuda.is_available():
        print("✅ Using CUDA (NVIDIA GPU)")
        return "cuda"
    else:
        print("✅ Using CPU")
        return "cpu"


# Set the device
device = get_device()
print("🖥️ Current device in use:", device)
✅ Using MPS (Metal Performance Shaders) on macOS
🖥️ Current device in use: mps
# Embedding Model Local Storage Path
import os
import warnings

# Ignore warnings
warnings.filterwarnings("ignore")

# Set the download path to ./cache/
os.environ["HF_HOME"] = "./cache/"
Data Preparation for Embedding-Based Search Tutorial
To perform embedding-based search, we prepare both a Query and Documents.
Query
Write a key question that will serve as the basis for the search.
# Query
q = "Please tell me more about LangChain."
Documents
Prepare multiple documents (texts) that will serve as the target for the search.
Each document will be embedded to enable semantic search capabilities.
# Documents for Text Embedding
docs = [
    "Hi, nice to meet you.",
    "LangChain simplifies the process of building applications with large language models.",
    "The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively.",
    "LangChain simplifies the process of building applications with large-scale language models.",
    "Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.",
]
Which Text Embedding Model Should You Use?
Leverage the MTEB leaderboard and free embedding models to confidently select and utilize the best-performing text embedding models for your projects! 🚀
🚀 What is MTEB (Massive Text Embedding Benchmark)?
MTEB is a benchmark designed to systematically and objectively evaluate the performance of text embedding models.
Purpose: To fairly compare the performance of embedding models.
Evaluation Tasks: Includes tasks like Classification, Retrieval, Clustering, and Semantic Similarity.
Supported Models: A wide range of text embedding models available on Hugging Face.
Results: Displayed as scores, with top-performing models ranked on the leaderboard.
This tutorial works with the following three free embedding models.
1️⃣ multilingual-e5-large-instruct
Offers strong multilingual support with consistent results.
2️⃣ multilingual-e5-large
A powerful multilingual embedding model.
3️⃣ bge-m3
Optimized for large-scale text processing, excelling in retrieval and semantic similarity tasks.
Similarity Calculation
Similarity Calculation Using Vector Dot Product
Similarity is determined using the dot product of vectors.
Similarity Calculation Formula:
$$\text{similarity}(\mathbf{q}, \mathbf{d}) = \mathbf{q} \cdot \mathbf{d}$$
Here, $\mathbf{q}$ is the query embedding and $\mathbf{d}$ is a document embedding; in code this is computed for all documents at once as embedded_query @ embedded_documents.T.
📐 Mathematical Significance of the Vector Dot Product
Definition of Vector Dot Product
The dot product of two vectors, $\mathbf{a}$ and $\mathbf{b}$, is mathematically defined as:
$$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$$
Relationship with Cosine Similarity
The dot product also relates to cosine similarity and follows this property:
$$\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}| \, |\mathbf{b}| \cos \theta$$
Where:
$|\mathbf{a}|$ and $|\mathbf{b}|$ represent the magnitudes (norms, specifically Euclidean norms) of vectors $\mathbf{a}$ and $\mathbf{b}$.
$\theta$ is the angle between the two vectors.
$\cos \theta$ represents the cosine similarity between the two vectors.
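As a quick worked example with made-up 3-dimensional vectors (illustrative numbers only, not real embeddings):
$$\mathbf{a} = (1, 2, 2), \qquad \mathbf{b} = (2, 3, 6)$$
$$\mathbf{a} \cdot \mathbf{b} = 1 \cdot 2 + 2 \cdot 3 + 2 \cdot 6 = 20, \qquad |\mathbf{a}| = 3, \qquad |\mathbf{b}| = 7$$
$$\cos \theta = \frac{20}{3 \cdot 7} = \frac{20}{21} \approx 0.95$$
The cosine value close to 1 shows that the two vectors point in nearly the same direction.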
🔍 Interpretation of Vector Dot Product in Similarity
When the dot product value is large (a large positive value):
The magnitudes ($|\mathbf{a}|$ and $|\mathbf{b}|$) of the two vectors are large.
The angle ($\theta$) between the two vectors is small ($\cos \theta$ approaches 1).
This indicates that the two vectors point in a similar direction and are more semantically similar, especially when their magnitudes are also large.
📏 Calculation of Vector Magnitude (Norm)
Definition of Euclidean Norm
For a vector $\mathbf{a} = [a_1, a_2, \ldots, a_n]$, the Euclidean norm $|\mathbf{a}|$ is calculated as:
$$|\mathbf{a}| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}$$
This magnitude represents the length or size of the vector in multi-dimensional space.
Understanding these mathematical foundations helps ensure precise similarity calculations, enabling better performance in tasks like semantic search, retrieval systems, and recommendation engines. 🚀
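To make this concrete, here is a small NumPy sketch using the same toy vectors as the worked example above (illustrative numbers, not real embeddings). It also shows why dot product and cosine similarity coincide once vectors are normalized to unit length, which is what normalize_embeddings=True does later in this tutorial.

import numpy as np

# Toy vectors standing in for a query embedding and one document embedding (illustrative only)
a = np.array([1.0, 2.0, 2.0])
b = np.array([2.0, 3.0, 6.0])

dot = a @ b                                              # dot product
cosine = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # dot product divided by the norms

print(dot)     # 20.0
print(cosine)  # ~0.952

# After L2-normalization, the dot product equals the cosine similarity
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
print(a_unit @ b_unit)  # ~0.952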
Similarity calculation between embedded_query and embedded_documents
embed_documents : For embedding multiple texts (documents)
embed_query : For embedding a single text (query)
We've implemented a method to search for the most relevant documents using text embeddings.
Let's use search_similar_documents(q, docs, hf_embeddings) to find the most relevant documents.
import numpy as np


def search_similar_documents(q, docs, hf_embeddings):
    """
    Search for the most relevant documents based on a query using text embeddings.

    Args:
        q (str): The query string for which relevant documents are to be found.
        docs (list of str): A list of document strings to compare against the query.
        hf_embeddings: An embedding model object with `embed_query` and `embed_documents` methods.

    Returns:
        tuple:
            - embedded_query (numpy.ndarray): The embedding vector of the query.
            - embedded_documents (numpy.ndarray): The embedding matrix of the documents.

    Workflow:
        1. Embed the query string into a numerical vector using `embed_query`.
        2. Embed each document into numerical vectors using `embed_documents`.
        3. Calculate similarity scores between the query and documents using the dot product.
        4. Sort the documents based on their similarity scores in descending order.
        5. Print the query and display the sorted documents by their relevance.
        6. Return the query and document embeddings for further analysis if needed.
    """
    # Embed the query and documents using the embedding model
    embedded_query = hf_embeddings.embed_query(q)
    embedded_documents = hf_embeddings.embed_documents(docs)

    # Calculate similarity scores using dot product
    similarity_scores = np.array(embedded_query) @ np.array(embedded_documents).T

    # Sort documents by similarity scores in descending order
    sorted_idx = similarity_scores.argsort()[::-1]

    # Display the results
    print(f"[Query] {q}\n" + "=" * 40)
    for i, idx in enumerate(sorted_idx):
        print(f"[{i}] {docs[idx]}")
        print()

    # Return embeddings for potential further processing or analysis
    return embedded_query, embedded_documents
HuggingFaceEndpointEmbeddings Overview
HuggingFaceEndpointEmbeddings is a feature in the LangChain library that leverages Hugging Face’s Inference API endpoint to generate text embeddings seamlessly.
📚 Key Concepts
Hugging Face Inference API
Access pre-trained embedding models via Hugging Face’s API.
No need to download models locally; embeddings are generated directly through the API.
LangChain Integration
Easily integrate embedding results into LangChain workflows using its standardized interface.
Use Cases
Text-query and document similarity calculation
Search and recommendation systems
Natural Language Understanding (NLU) applications
⚙️ Key Parameters
model : The Hugging Face model ID (e.g., BAAI/bge-m3 )
task : The task to perform (usually "feature-extraction" )
huggingfacehub_api_token : Your Hugging Face API token
model_kwargs : Additional model configuration parameters
💡 Advantages
No Local Model Download: Instant access via API.
Scalability: Supports a wide range of pre-trained Hugging Face models.
Seamless Integration: Effortlessly integrates embeddings into LangChain workflows.
⚠️ Caveats
API Support: Not all models support API inference.
Speed & Cost: Free APIs may have slower response times and usage limitations.
With HuggingFaceEndpointEmbeddings, you can easily integrate Hugging Face’s powerful embedding models into your LangChain workflows for efficient and scalable NLP solutions. 🚀
Let’s use the intfloat/multilingual-e5-large-instruct model via the API to search for the most relevant documents using text embeddings.
from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings

model_name = "intfloat/multilingual-e5-large-instruct"

hf_endpoint_embeddings = HuggingFaceEndpointEmbeddings(
    model=model_name,
    task="feature-extraction",
    huggingfacehub_api_token=os.environ["HUGGINGFACEHUB_API_TOKEN"],
)
Search for the most relevant documents based on a query using text embeddings.
%%time
# Embed the query and documents using the embedding model
embedded_query = hf_endpoint_embeddings.embed_query(q)
embedded_documents = hf_endpoint_embeddings.embed_documents(docs)
CPU times: user 7.18 ms, sys: 2.32 ms, total: 9.5 ms
Wall time: 1.21 s
# Calculate similarity scores using dot product
similarity_scores = np.array(embedded_query) @ np.array(embedded_documents).T

# Sort documents by similarity scores in descending order
sorted_idx = similarity_scores.argsort()[::-1]
# Display the results
print(f"[Query] {q}\n" + "=" * 40)
for i, idx in enumerate(sorted_idx):
    print(f"[{i}] {docs[idx]}")
    print()
[Query] Please tell me more about LangChain.
========================================
[0] LangChain simplifies the process of building applications with large language models.
[1] LangChain simplifies the process of building applications with large-scale language models.
[2] The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively.
[3] Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.
[4] Hi, nice to meet you.
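The same search can also be run through the search_similar_documents helper defined earlier; the timed output below corresponds to a call along these lines:

%%time
embedded_query, embedded_documents = search_similar_documents(q, docs, hf_endpoint_embeddings)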
[Query] Please tell me more about LangChain.
========================================
[0] LangChain simplifies the process of building applications with large language models.
[1] LangChain simplifies the process of building applications with large-scale language models.
[2] The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively.
[3] Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.
[4] Hi, nice to meet you.
CPU times: user 7.25 ms, sys: 3.26 ms, total: 10.5 ms
Wall time: 418 ms
HuggingFaceEmbeddings Overview
HuggingFaceEmbeddings is a feature in the LangChain library that enables the conversion of text data into vectors using Hugging Face embedding models.
This class downloads and operates Hugging Face models locally for efficient processing.
📚 Key Concepts
Hugging Face Pre-trained Models
Leverages pre-trained embedding models provided by Hugging Face.
Downloads models locally for direct embedding operations.
LangChain Integration
Seamlessly integrates with LangChain workflows using its standardized interface.
Use Cases
Text-query and document similarity calculation
Search and recommendation systems
Natural Language Understanding (NLU) applications
⚙️ Key Parameters
model_name : The Hugging Face model ID (e.g., sentence-transformers/all-MiniLM-L6-v2 )
model_kwargs : Additional model configuration parameters (e.g., GPU/CPU device settings)
encode_kwargs : Extra settings for embedding generation
💡 Advantages
Local Embedding Operations: Perform embeddings locally without requiring an internet connection (after the initial model download).
High Performance: Utilize GPU settings for faster embedding generation.
Model Variety: Supports a wide range of Hugging Face models.
⚠️ Caveats
Local Storage Requirement: Pre-trained models must be downloaded locally.
Environment Configuration: Performance may vary depending on GPU/CPU device settings.
With HuggingFaceEmbeddings, you can efficiently leverage Hugging Face's powerful embedding models in a local environment, enabling flexible and scalable NLP solutions. 🚀
Let's download the embedding model locally, perform embeddings, and search for the most relevant documents.
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

model_name = "intfloat/multilingual-e5-large-instruct"

hf_embeddings_e5_instruct = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs={"device": device},  # mps, cuda, cpu
    encode_kwargs={"normalize_embeddings": True},
)
%%time
embedded_query, embedded_documents = search_similar_documents(q, docs, hf_embeddings_e5_instruct)
[Query] Please tell me more about LangChain.
========================================
[0] LangChain simplifies the process of building applications with large language models.
[1] LangChain simplifies the process of building applications with large-scale language models.
[2] The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively.
[3] Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.
[4] Hi, nice to meet you.
CPU times: user 326 ms, sys: 120 ms, total: 446 ms
Wall time: 547 ms
print(f"Model: \t\t{model_name}")print(f"Document Dimension: \t{len(embedded_documents[0])}")print(f"Query Dimension: \t{len(embedded_query)}")
Model: 		intfloat/multilingual-e5-large-instruct
Document Dimension: 	1024
Query Dimension: 	1024
intfloat/multilingual-e5-large
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

model_name = "intfloat/multilingual-e5-large"

hf_embeddings_e5_large = HuggingFaceEmbeddings(
    model_name=model_name,
    model_kwargs={"device": device},  # mps, cuda, cpu
    encode_kwargs={"normalize_embeddings": True},
)
%%time
embedded_query, embedded_documents = search_similar_documents(q, docs, hf_embeddings_e5_large)

[Query] Please tell me more about LangChain.
========================================
[0] LangChain simplifies the process of building applications with large-scale language models.
[1] LangChain simplifies the process of building applications with large language models.
[2] The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively.
[3] Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.
[4] Hi, nice to meet you.
CPU times: user 84.1 ms, sys: 511 ms, total: 595 ms
Wall time: 827 ms

print(f"Model: \t\t{model_name}")
print(f"Document Dimension: \t{len(embedded_documents[0])}")
print(f"Query Dimension: \t{len(embedded_query)}")

Model: 		intfloat/multilingual-e5-large
Document Dimension: 	1024
Query Dimension: 	1024

BAAI/bge-m3

from langchain_huggingface import HuggingFaceEmbeddings

model_name = "BAAI/bge-m3"
model_kwargs = {"device": device}  # mps, cuda, cpu
encode_kwargs = {"normalize_embeddings": True}

hf_embeddings_bge_m3 = HuggingFaceEmbeddings(
    model_name=model_name, model_kwargs=model_kwargs, encode_kwargs=encode_kwargs
)

%%time
embedded_query, embedded_documents = search_similar_documents(q, docs, hf_embeddings_bge_m3)

[Query] Please tell me more about LangChain.
========================================
[0] LangChain simplifies the process of building applications with large language models.
[1] LangChain simplifies the process of building applications with large-scale language models.
[2] The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively.
[3] Hi, nice to meet you.
[4] Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.
CPU times: user 81.1 ms, sys: 1.29 s, total: 1.37 s
Wall time: 1.5 s

print(f"Model: \t\t{model_name}")
print(f"Document Dimension: \t{len(embedded_documents[0])}")
print(f"Query Dimension: \t{len(embedded_query)}")

Model: 		BAAI/bge-m3
Document Dimension: 	1024
Query Dimension: 	1024

FlagEmbedding Usage Guide
FlagEmbedding is an advanced embedding framework developed by BAAI (Beijing Academy of Artificial Intelligence).
It supports various embedding approaches and is primarily used with the BGE (BAAI General Embedding) model.
FlagEmbedding excels in tasks such as semantic search, natural language processing (NLP), and recommendation systems.
📚 Core Concepts of FlagEmbedding
1️⃣ Dense Embedding
Definition: Represents the overall meaning of a text as a single high-density vector.
Advantages: Effectively captures semantic similarity.
Use Cases: Semantic search, document similarity computation.
2️⃣ Lexical Embedding
Definition: Breaks text into word-level components, emphasizing word matching.
Advantages: Ensures precise matching of specific words or phrases.
Use Cases: Keyword-based search, exact word matching.
3️⃣ Multi-Vector Embedding
Definition: Splits a document into multiple vectors for representation.
Advantages: Allows more granular representation of lengthy texts or diverse topics.
Use Cases: Complex document structure analysis, detailed topic matching.
FlagEmbedding offers a flexible and powerful toolkit for leveraging embeddings across a wide range of NLP tasks and semantic search applications. 🚀
The following code controls tokenizer parallelism in Hugging Face's transformers library:
TOKENIZERS_PARALLELISM = "true" → Optimized for speed, suitable for large-scale data processing.
TOKENIZERS_PARALLELISM = "false" → Ensures stability, prevents conflicts and race conditions.

import os

os.environ["TOKENIZERS_PARALLELISM"] = "true"  # "false"

# install FlagEmbedding
%pip install -qU FlagEmbedding

⚙️ Key Parameters
BGEM3FlagModel
model_name : The Hugging Face model ID (e.g., BAAI/bge-m3 ).
use_fp16 : When set to True, reduces memory usage and improves encoding speed.
bge_embeddings.encode
batch_size : Defines the number of documents to process at once.
max_length : Sets the maximum token length for encoding documents.
Increase for longer documents to ensure full content encoding.
Excessively large values may degrade performance.
return_dense : When set to True, returns Dense Vectors only.
return_sparse : When set to True, returns Sparse Vectors.
return_colbert_vecs : When set to True, returns ColBERT-style vectors.
1️⃣ Dense Vector Embedding Example
Definition: Represents the overall meaning of a text as a single high-density vector.
Advantages: Effectively captures semantic similarity.
Use Cases: Semantic search, document similarity computation.

from FlagEmbedding import BGEM3FlagModel

model_name = "BAAI/bge-m3"

bge_embeddings = BGEM3FlagModel(
    model_name,
    use_fp16=True,  # Enabling fp16 improves encoding speed with minimal precision trade-off.
)

# Encode documents with specified parameters
embedded_documents_dense_vecs = bge_embeddings.encode(
    sentences=docs,
    batch_size=12,
    max_length=8192,  # Reduce this value if your documents are shorter to speed up encoding.
)["dense_vecs"]

# Query Encoding
embedded_query_dense_vecs = bge_embeddings.encode(
    sentences=[q],
    batch_size=12,
    max_length=8192,  # Reduce this value if your documents are shorter to speed up encoding.
)["dense_vecs"]
[00:00<?, ?B/s]imgs/bm25.jpg: 0%| | 0.00/132k [00:00<?, ?B/s]imgs/miracl.jpg: 0%| | 0.00/576k [00:00<?, ?B/s]imgs/nqa.jpg: 0%| | 0.00/158k [00:00<?, ?B/s].gitattributes: 0%| | 0.00/1.63k [00:00<?, ?B/s]colbert_linear.pt: 0%| | 0.00/2.10M [00:00<?, ?B/s]imgs/others.webp: 0%| | 0.00/21.0k [00:00<?, ?B/s]long.jpg: 0%| | 0.00/127k [00:00<?, ?B/s]onnx/Constant_7_attr__value: 0%| | 0.00/65.6k [00:00<?, ?B/s]onnx/config.json: 0%| | 0.00/698 [00:00<?, ?B/s]model.onnx: 0%| | 0.00/725k [00:00<?, ?B/s]model.onnx_data: 0%| | 0.00/2.27G [00:00<?, ?B/s]onnx/tokenizer_config.json: 0%| | 0.00/1.17k [00:00<?, ?B/s]tokenizer.json: 0%| | 0.00/17.1M [00:00<?, ?B/s]sparse_linear.pt: 0%| | 0.00/3.52k [00:00<?, ?B/s]You're using a XLMRobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.embedded_documents_dense_vecsarray([[-0.0271 , 0.003561, -0.0506 , ..., 0.00911 , -0.04565 , 0.02028 ], [-0.02242 , -0.01398 , -0.00946 , ..., 0.01851 , 0.01907 , -0.01917 ], [ 0.01386 , -0.02118 , 0.01807 , ..., -0.01463 , 0.04373 , -0.011856], [-0.02365 , -0.008675, -0.000806, ..., 0.01537 , 0.01438 , -0.02342 ], [-0.01289 , -0.007313, -0.0121 , ..., -0.00561 , 0.03787 , 0.006016]], dtype=float16)embedded_query_dense_vecsarray([[-0.02156 , -0.01993 , -0.01706 , ..., -0.01994 , 0.0318 , -0.003395]], dtype=float16)# docs embedding dimensionembedded_documents_dense_vecs.shape(5, 1024)# query embedding dimensionembedded_query_dense_vecs.shape(1, 1024)# Calculating Similarity Between Documents and Queryfrom sklearn.metrics.pairwise import cosine_similaritysimilarities = cosine_similarity( embedded_query_dense_vecs, embedded_documents_dense_vecs)most_similar_idx = similarities.argmax()# Display the Most Similar Documentprint(f"Question: {q}")print(f"Most similar document: {docs[most_similar_idx]}")Question: Please tell me more about LangChain. Most similar document: LangChain simplifies the process of building applications with large language models.from FlagEmbedding import BGEM3FlagModelmodel_name = "BAAI/bge-m3"bge_embeddings = BGEM3FlagModel( model_name, use_fp16=True, # Enabling fp16 improves encoding speed with minimal precision trade-off.)# Encode documents with specified parametersembedded_documents_dense_vecs_default = bge_embeddings.encode( sentences=docs, return_dense=True)["dense_vecs"]# Query Encodingembedded_query_dense_vecs_default = bge_embeddings.encode( sentences=[q], return_dense=True)["dense_vecs"]Fetching 30 files: 0%| | 0/30 [00:00You're using a XLMRobertaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.# Calculating Similarity Between Documents and Queryfrom sklearn.metrics.pairwise import cosine_similaritysimilarities = cosine_similarity( embedded_query_dense_vecs_default, embedded_documents_dense_vecs_default)most_similar_idx = similarities.argmax()# Display the Most Similar Documentprint(f"Question: {q}")print(f"Most similar document: {docs[most_similar_idx]}")Question: Please tell me more about LangChain. 
Most similar document: LangChain simplifies the process of building applications with large language models.

2️⃣ Sparse (Lexical) Vector Embedding Example
Sparse Embedding (Lexical Weight)
Sparse embedding is an embedding method that utilizes high-dimensional vectors where most values are zero.
The approach using lexical weight generates embeddings by considering the importance of each word.
How It Works
Calculate the lexical weight for each word. Techniques like TF-IDF or BM25 can be used.
For each word in a document or query, assign a value to the corresponding dimension of the sparse vector based on its lexical weight.
As a result, documents and queries are represented as high-dimensional vectors where most values are zero.
Advantages
Directly reflects the importance of words.
Enables precise matching of specific words or phrases.
Faster computation compared to dense embeddings.

from FlagEmbedding import BGEM3FlagModel

model_name = "BAAI/bge-m3"

bge_embeddings = BGEM3FlagModel(
    model_name,
    use_fp16=True,  # Enabling fp16 improves encoding speed with minimal precision trade-off.
)

# Encode documents with specified parameters
embedded_documents_sparse_vecs = bge_embeddings.encode(
    sentences=docs, return_sparse=True
)

# Query Encoding
embedded_query_sparse_vecs = bge_embeddings.encode(sentences=[q], return_sparse=True)

lexical_scores_0 = bge_embeddings.compute_lexical_matching_score(
    embedded_query_sparse_vecs["lexical_weights"][0],
    embedded_documents_sparse_vecs["lexical_weights"][0],
)
lexical_scores_1 = bge_embeddings.compute_lexical_matching_score(
    embedded_query_sparse_vecs["lexical_weights"][0],
    embedded_documents_sparse_vecs["lexical_weights"][1],
)
lexical_scores_2 = bge_embeddings.compute_lexical_matching_score(
    embedded_query_sparse_vecs["lexical_weights"][0],
    embedded_documents_sparse_vecs["lexical_weights"][2],
)
lexical_scores_3 = bge_embeddings.compute_lexical_matching_score(
    embedded_query_sparse_vecs["lexical_weights"][0],
    embedded_documents_sparse_vecs["lexical_weights"][3],
)
lexical_scores_4 = bge_embeddings.compute_lexical_matching_score(
    embedded_query_sparse_vecs["lexical_weights"][0],
    embedded_documents_sparse_vecs["lexical_weights"][4],
)

# Collect the per-document scores in document order
lexical_scores = [
    lexical_scores_0,
    lexical_scores_1,
    lexical_scores_2,
    lexical_scores_3,
    lexical_scores_4,
]

print(f"question: {q}")
print("====================")
for doc, score in zip(docs, lexical_scores):
    print(doc, f": {score}")

question: Please tell me more about LangChain.
====================
Hi, nice to meet you. : 0.0118865966796875
LangChain simplifies the process of building applications with large language models. : 0.2313995361328125
The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively. : 0.18797683715820312
LangChain simplifies the process of building applications with large-scale language models. : 0.2268962860107422
Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses. : 0.002368927001953125
3️⃣ Multi-Vector (ColBERT) Embedding Example
ColBERT (Contextualized Late Interaction over BERT) is an efficient approach for document retrieval.
This method uses a multi-vector strategy to represent both documents and queries with multiple vectors.
How It Works
Generate a separate vector for each token in a document, resulting in multiple vectors per document.
Similarly, generate a separate vector for each token in a query.
During retrieval, calculate the similarity between each query token vector and all document token vectors.
Aggregate these similarity scores to produce a final retrieval score.
Advantages
Enables fine-grained token-level matching.
Captures contextual embeddings effectively.
Performs efficiently even with long documents.

from FlagEmbedding import BGEM3FlagModel

model_name = "BAAI/bge-m3"

bge_embeddings = BGEM3FlagModel(
    model_name,
    use_fp16=True,  # Enabling fp16 improves encoding speed with minimal precision trade-off.
)

# Encode documents with specified parameters
embedded_documents_colbert_vecs = bge_embeddings.encode(
    sentences=docs, return_colbert_vecs=True
)

# Query Encoding
embedded_query_colbert_vecs = bge_embeddings.encode(
    sentences=[q], return_colbert_vecs=True
)

colbert_scores_0 = bge_embeddings.colbert_score(
    embedded_query_colbert_vecs["colbert_vecs"][0],
    embedded_documents_colbert_vecs["colbert_vecs"][0],
)
colbert_scores_1 = bge_embeddings.colbert_score(
    embedded_query_colbert_vecs["colbert_vecs"][0],
    embedded_documents_colbert_vecs["colbert_vecs"][1],
)
colbert_scores_2 = bge_embeddings.colbert_score(
    embedded_query_colbert_vecs["colbert_vecs"][0],
    embedded_documents_colbert_vecs["colbert_vecs"][2],
)
colbert_scores_3 = bge_embeddings.colbert_score(
    embedded_query_colbert_vecs["colbert_vecs"][0],
    embedded_documents_colbert_vecs["colbert_vecs"][3],
)
colbert_scores_4 = bge_embeddings.colbert_score(
    embedded_query_colbert_vecs["colbert_vecs"][0],
    embedded_documents_colbert_vecs["colbert_vecs"][4],
)

# Collect the per-document scores in document order
colbert_scores = [
    colbert_scores_0,
    colbert_scores_1,
    colbert_scores_2,
    colbert_scores_3,
    colbert_scores_4,
]

print(f"question: {q}")
print("====================")
for doc, score in zip(docs, colbert_scores):
    print(doc, f": {score}")

question: Please tell me more about LangChain.
====================
Hi, nice to meet you. : 0.509117841720581
LangChain simplifies the process of building applications with large language models. : 0.7039894461631775
The LangChain English tutorial is structured based on LangChain's official documentation, cookbook, and various practical examples to help users utilize LangChain more easily and effectively. : 0.6632840037345886
LangChain simplifies the process of building applications with large-scale language models. : 0.7057777643203735
Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses. : 0.38082367181777954
💡 Advantages of FlagEmbedding
Diverse Embedding Options: Supports the Dense, Lexical, and Multi-Vector approaches.
High-Performance Models: Utilizes powerful pre-trained models like BGE.
Flexibility: Choose the optimal embedding method based on your use case.
Scalability: Capable of performing embeddings on large-scale datasets.
⚠️ Considerations
Model Size: Some models may require significant storage capacity.
Resource Requirements: GPU usage is recommended for large-scale vector computations.
Configuration Needs: Optimal performance may require parameter tuning.
📊 FlagEmbedding Vector Comparison
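As a wrap-up, the snippet below lines up the three score types for the same query and documents. It is a minimal sketch that assumes the dense, sparse, and ColBERT encodings from the previous cells are still in memory; it is not a required part of the pipeline.

# Side-by-side view of dense, lexical, and ColBERT scores for each document.
# Assumes embedded_*_dense_vecs, embedded_*_sparse_vecs, and embedded_*_colbert_vecs
# from the cells above are still defined.
from sklearn.metrics.pairwise import cosine_similarity

dense_scores = cosine_similarity(embedded_query_dense_vecs, embedded_documents_dense_vecs)[0]

print(f"question: {q}")
print("=" * 40)
for i, doc in enumerate(docs):
    lexical = bge_embeddings.compute_lexical_matching_score(
        embedded_query_sparse_vecs["lexical_weights"][0],
        embedded_documents_sparse_vecs["lexical_weights"][i],
    )
    colbert = bge_embeddings.colbert_score(
        embedded_query_colbert_vecs["colbert_vecs"][0],
        embedded_documents_colbert_vecs["colbert_vecs"][i],
    )
    print(
        f"dense={dense_scores[i]:.4f} | "
        f"lexical={lexical:.4f} | "
        f"colbert={float(colbert):.4f} | {doc}"
    )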