Upstage
Author: Sun Hyoung Lee
Peer Review: Pupba, DoWoung Kong
Proofread: Youngjun Cho
This is a part of LangChain Open Tutorial
Overview
'Upstage' is a Korean startup specializing in artificial intelligence (AI) technology, particularly in large language models (LLMs) and document AI.
Table of Contents
- Overview
- Environment Setup
- Check Supported Embedding Models
- Embed the query
- Embed the document
References
- Upstage Embeddings API documentation: https://developers.upstage.ai/docs/apis/embeddings
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note] langchain-opentutorial is a package that provides easy-to-use environment setup, useful functions, and utilities for these tutorials. You can check out langchain-opentutorial for more details.
API Key Configuration
To use UpstageEmbeddings, you need to obtain an Upstage API key. Once you have your API key, set it as the value of the variable UPSTAGE_API_KEY.
%%capture --no-stderr
%pip install langchain-opentutorial
# Install required packages
from langchain_opentutorial import package

package.install(
    ["langchain_community", "langchain_upstage"],
    verbose=False,
    upgrade=False,
)
# Set environment variables
from langchain_opentutorial import set_env

set_env(
    {
        "UPSTAGE_API_KEY": "",
        "LANGCHAIN_API_KEY": "",
        "LANGCHAIN_TRACING_V2": "true",
        "LANGCHAIN_ENDPOINT": "https://api.smith.langchain.com",
        "LANGCHAIN_PROJECT": "CH08-Embeddings-UpstageEmbeddings",
    }
)
Environment variables have been set successfully.
Alternatively, you can set UPSTAGE_API_KEY in a .env file and load it.
[Note] This is not necessary if you've already set UPSTAGE_API_KEY in the previous step.
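For reference, a .env file placed in the project root might contain entries like the following (the values shown here are placeholders, not real keys):

UPSTAGE_API_KEY=your-upstage-api-key
LANGCHAIN_API_KEY=your-langchain-api-key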
from dotenv import load_dotenv
load_dotenv(override=True)
True
# Sample texts to embed and compare against the query
texts = [
    "Hello, nice to meet you.",
    "LangChain simplifies the process of building applications with large language models",
    "The LangChain Korean tutorial is designed to help users utilize LangChain more easily and effectively based on LangChain's official documentation, cookbook, and various practical examples.",
    "LangChain simplifies the process of building applications with large-scale language models.",
    "Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.",
]
Check Supported Embedding Models
The supported embedding models are listed in the Upstage developer documentation: https://developers.upstage.ai/docs/apis/embeddings
Model Information

Model | Release Date | Context Length | Description
embedding-query | 2024-05-10 | 4000 tokens | A Solar-base Query Embedding model with a 4k context limit. This model is optimized for embedding user queries in information retrieval tasks such as search and re-ranking.
embedding-passage | 2024-05-10 | 4000 tokens | A Solar-base Passage Embedding model with a 4k context limit. This model is optimized for embedding documents or texts for retrieval purposes.
from langchain_upstage import UpstageEmbeddings
# Query-Only Embedding Model
query_embeddings = UpstageEmbeddings(model="embedding-query")
# Passage-Only Embedding Model
passage_embeddings = UpstageEmbeddings(model="embedding-passage")
Embed the query
# Query Embedding
embedded_query = query_embeddings.embed_query(
" Please provide detailed information about LangChain. "
)
# Print embedding dimension
len(embedded_query)
4096
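The embedding is returned as a plain Python list of floats. If you want to see what the vector actually looks like, you can print its first few components (an optional inspection step, not required for the rest of the tutorial):

# Preview the first five values of the query embedding
print(embedded_query[:5])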
Embed the document
# Document Embedding
embedded_documents = passage_embeddings.embed_documents(texts)
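As an optional sanity check, you can confirm that one vector was produced per input text and that each vector has the same dimensionality as the query embedding:

# Verify the number of document embeddings and their dimensionality
print(len(embedded_documents))     # 5 input texts -> 5 vectors
print(len(embedded_documents[0]))  # dimensionality of each vector, expected to match the query embedding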
Calculate the similarity between the embedded query and each embedded document, then display the results sorted in descending order of similarity.
import numpy as np

# Question (embedded_query): Tell me about LangChain.
similarity = np.array(embedded_query) @ np.array(embedded_documents).T

# Sort indices by similarity in descending order
sorted_idx = similarity.argsort()[::-1]

# Display results
print("[Query] Tell me about LangChain.\n====================================")
for i, idx in enumerate(sorted_idx):
    print(f"[{i}] Similarity: {similarity[idx]:.3f} | {texts[idx]}")
    print()
[Query] Tell me about LangChain.
====================================
[0] Similarity: 0.535 | LangChain simplifies the process of building applications with large-scale language models.
[1] Similarity: 0.519 | LangChain simplifies the process of building applications with large language models
[2] Similarity: 0.509 | The LangChain Korean tutorial is designed to help users utilize LangChain more easily and effectively based on LangChain's official documentation, cookbook, and various practical examples.
[3] Similarity: 0.230 | Retrieval-Augmented Generation (RAG) is an effective technique for improving AI responses.
[4] Similarity: 0.158 | Hello, nice to meet you.
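The scores above are raw dot products between the query vector and each document vector. If you prefer a scale-independent score, you can normalize the vectors and compute cosine similarity instead. The sketch below uses only NumPy and the embeddings created earlier; the helper name cosine_similarity_matrix is illustrative, not part of any library used in this tutorial.

import numpy as np

def cosine_similarity_matrix(query_vec, doc_vecs):
    # Normalize the query and each document vector to unit length,
    # then take dot products so every score lies in [-1, 1].
    q = np.array(query_vec)
    d = np.array(doc_vecs)
    q = q / np.linalg.norm(q)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    return d @ q

cos_sim = cosine_similarity_matrix(embedded_query, embedded_documents)
for i, idx in enumerate(cos_sim.argsort()[::-1]):
    print(f"[{i}] Cosine similarity: {cos_sim[idx]:.3f} | {texts[idx]}")

Because the ranking depends only on the relative order of scores, the normalized version should rank the documents the same way as the dot-product version above when the vectors have similar magnitudes.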