Upstage
Author: Sun Hyoung Lee
Peer Review: Pupba, DoWoung Kong
Proofread: Youngjun Cho
This is a part of LangChain Open Tutorial
Overview
'Upstage' is a Korean startup specializing in artificial intelligence (AI) technology, particularly in large language models (LLMs) and document AI.
Table of Contents
- Overview
- Environment Setup
- Check Supported Embedding Models
- Embed the query
- Embed the document
References
- Upstage Embeddings API documentation: https://developers.upstage.ai/docs/apis/embeddings
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note] langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for these tutorials. You can check out langchain-opentutorial for more details.
API Key Configuration
To use UpstageEmbeddings, you need to obtain an Upstage API key. Once you have your API key, set it as the value of the variable UPSTAGE_API_KEY.
%%capture --no-stderr
%pip install langchain-opentutorial
# Install required packages
from langchain_opentutorial import package

package.install(
    ["langchain_community", "langchain_upstage"],
    verbose=False,
    upgrade=False,
)
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "UPSTAGE_API_KEY": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "CH08-Embeddings-UpstageEmbeddings",
    }
)
Environment variables have been set successfully.
Alternatively, you can set UPSTAGE_API_KEY in a .env file and load it.
[Note] This is not necessary if you've already set UPSTAGE_API_KEY in the previous step.
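For reference, a .env file placed in the project root might contain entries like the following (the values shown here are placeholders, not real keys):

UPSTAGE_API_KEY=your-upstage-api-key
LANGCHAIN_API_KEY=your-langchain-api-key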
from dotenv import load_dotenv
load_dotenv(override=True)
True
# Sample texts to embed and compare against the query
texts = [
    "Hello, nice to meet you.",
    "LangChain simplifies the process of building applications with large language models",
    "The LangChain Korean tutorial is designed to help users utilize LangChain more easily and effectively based on LangChain's official documentation, cookbook, and various practical examples.",
    "LangChain simplifies the process of building applications with large-scale language models.",
    "Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.",
]
Check Supported Embedding Models
The supported embedding models are listed in the Upstage developer documentation: https://developers.upstage.ai/docs/apis/embeddings
Model Information

Model | Release Date | Context Length | Description
embedding-query | 2024-05-10 | 4000 tokens | A Solar-base Query Embedding model with a 4k context limit. This model is optimized for embedding user queries in information retrieval tasks such as search and re-ranking.
embedding-passage | 2024-05-10 | 4000 tokens | A Solar-base Passage Embedding model with a 4k context limit. This model is optimized for embedding documents or texts for retrieval purposes.
from langchain_upstage import UpstageEmbeddings
# Query-Only Embedding Model
query_embeddings = UpstageEmbeddings(model="embedding-query")
# Passage-Only Embedding Model
passage_embeddings = UpstageEmbeddings(model="embedding-passage")
Embed the query
# Query Embedding
embedded_query = query_embeddings.embed_query(
" Please provide detailed information about LangChain. "
)
# Print embedding dimension
len(embedded_query)
4096
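The embedding is returned as a plain Python list of floats. If you want to see what the vector actually looks like, you can print its first few components (an optional inspection step, not required for the rest of the tutorial):

# Preview the first five values of the query embedding
print(embedded_query[:5])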
Embed the document
# Document Embedding
embedded_documents = passage_embeddings.embed_documents(texts)
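As an optional sanity check, you can confirm that one vector was produced per input text and that each vector has the same dimensionality as the query embedding:

# Verify the number of document embeddings and their dimensionality
print(len(embedded_documents))     # 5 input texts -> 5 vectors
print(len(embedded_documents[0]))  # dimensionality of each vector, expected to match the query embedding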
Calculate the similarity between the embedded query and each embedded document, then display the results sorted in descending order of similarity.
import numpy as np

# Question (embedded_query): Tell me about LangChain.
similarity = np.array(embedded_query) @ np.array(embedded_documents).T

# Sort indices by similarity in descending order
sorted_idx = similarity.argsort()[::-1]

# Display results
print("[Query] Tell me about LangChain.\n====================================")
for i, idx in enumerate(sorted_idx):
    print(f"[{i}] Similarity: {similarity[idx]:.3f} | {texts[idx]}")
    print()
[Query] Tell me about LangChain.
====================================
[0] Similarity: 0.535 | LangChain simplifies the process of building applications with large-scale language models.
[1] Similarity: 0.519 | LangChain simplifies the process of building applications with large language models
[2] Similarity: 0.509 | The LangChain Korean tutorial is designed to help users utilize LangChain more easily and effectively based on LangChain's official documentation, cookbook, and various practical examples.
[3] Similarity: 0.230 | Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.
[4] Similarity: 0.158 | Hello, nice to meet you.
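The scores above are raw dot products between the query vector and each document vector. If you prefer a scale-independent score, you can normalize the vectors and compute cosine similarity instead. The sketch below uses only NumPy and the embeddings created earlier; the helper name cosine_similarity_matrix is illustrative, not part of any library used in this tutorial.

import numpy as np

def cosine_similarity_matrix(query_vec, doc_vecs):
    # Normalize the query and each document vector to unit length,
    # then take dot products so every score lies in [-1, 1].
    q = np.array(query_vec)
    d = np.array(doc_vecs)
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    return d @ q

cos_sim = cosine_similarity_matrix(embedded_query, embedded_documents)
for i, idx in enumerate(cos_sim.argsort()[::-1]):
    print(f"[{i}] Cosine similarity: {cos_sim[idx]:.3f} | {texts[idx]}")

Because the ranking depends only on the relative order of scores, the normalized version should rank the documents the same way as the dot-product version above when the vectors have similar magnitudes.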