Chat Models
Author: PangPangGod
Peer Review: YooKyung Jeon
This is a part of the LangChain Open Tutorial.
This tutorial covers various chat models (OpenAI, Anthropic, etc.) along with brief usage examples.
Set up the environment. You may refer to Environment Setup for more details.
[Note] `langchain-opentutorial` is a package that provides easy-to-use environment setup, useful functions, and utilities for these tutorials. You can check out the `langchain-opentutorial` package for more details.
If you want automated tracing of your model calls, you can also set your LangSmith API key:
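A minimal sketch of that setup; the API key and project name are placeholders you would replace with your own values (LangSmith reads these environment variables automatically):

```python
import os

# Enabling tracing sends run data for each model call to LangSmith.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"  # placeholder
os.environ["LANGCHAIN_PROJECT"] = "<your-project-name>"       # placeholder
```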
OpenAI is an AI research and deployment company based in San Francisco, dedicated to ensuring that artificial general intelligence benefits all of humanity. Its models include the GPT series of language models, such as GPT-4 and GPT-4o, as well as the DALL·E series for image generation.
| Model | Description | Context Window | Max Output Tokens | Training Data |
|---|---|---|---|---|
| `gpt-4o` | A versatile, high-intelligence flagship model, cheaper and faster than GPT-4 Turbo. | 128,000 tokens | 16,384 tokens | Up to October 2023 |
| `chatgpt-4o-latest` | The continuously updated version of GPT-4o used in ChatGPT. | 128,000 tokens | 16,384 tokens | Continuously updated |
| `gpt-4o-mini` | A smaller, faster model with better performance than GPT-3.5 Turbo. | 128,000 tokens | 16,384 tokens | Up to October 2023 |
| `gpt-4-turbo` | The latest GPT-4 Turbo model with vision, JSON mode, and function calling capabilities. | 128,000 tokens | 4,096 tokens | Up to December 2023 |
| `gpt-4o-realtime` | A beta model optimized for real-time API use with audio and text inputs. | 128,000 tokens | 4,096 tokens | Up to October 2023 |
| `gpt-4o-audio` | A beta model capable of handling audio inputs and outputs via the Chat Completions API. | 128,000 tokens | 16,384 tokens | Up to October 2023 |
| `gpt-3.5-turbo` | Optimized for chat and non-chat tasks with natural language and code generation capabilities. | 16,385 tokens | 4,096 tokens | Up to September 2021 |
OpenAI offers a variety of model options. A detailed specification of these models can be found at the following link: OpenAI Model Specifications
The basic API options are as follows:
- `model_name` (`str`): Selects the model to use; can be aliased as `model`.
- `temperature` (`float`, default `0.7`): Sets the sampling temperature. Values can range between 0 and 2; higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic.
- `max_tokens` (`int | None`, default `None`): The maximum number of tokens to generate in the chat completion. This controls the length of text the model can generate in one call.
Detailed information about the available API options can be found here.
The code provided assumes that your `OPENAI_API_KEY` is set in your environment variables. If you would like to specify your API key manually and choose a different model, you can use the following code:
Anthropic is an AI safety and research company based in San Francisco, dedicated to building reliable, interpretable, and steerable AI systems. Their primary offering is the Claude family of large language models, including Claude 3.5 Sonnet and Claude 3.5 Haiku, designed for various applications such as reasoning, coding, and multilingual tasks.
| Model | Description | Context Window | Max Output Tokens | Training Data |
|---|---|---|---|---|
| Claude 3.5 Sonnet | The most intelligent model in the Claude family. | 200,000 tokens | 8,192 tokens | Up to April 2024 |
| Claude 3.5 Haiku | The fastest model with blazing speed. | 200,000 tokens | 8,192 tokens | Up to July 2024 |
| Claude 3 Opus | Powerful model for highly complex tasks. | 200,000 tokens | 4,096 tokens | Up to August 2023 |
| Claude 3 Sonnet | Balanced model offering strong utility for scaled deployments. | 200,000 tokens | 4,096 tokens | Up to August 2023 |
| Claude 3 Haiku | Fastest and most compact model for near-instant responsiveness. | 200,000 tokens | 4,096 tokens | Up to August 2023 |
A detailed specification of these models can be found at the following link: Anthropic Model Specifications
The basic API options are as follows:
- `model_name` (`str`): Selects the model to use; can be aliased as `model`.
- `temperature` (`float`, default `0.7`): Sets the sampling temperature. For Anthropic models, values range between 0 and 1; higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic.
- `max_tokens` (`int | None`, default `None`): The maximum number of tokens to generate in the chat completion. This controls the length of text the model can generate in one call.
Detailed information about the available API options can be found here.
The code provided assumes that your `ANTHROPIC_API_KEY` is set in your environment variables. If you would like to specify your API key manually and choose a different model, you can use the following code:
Perplexity AI is a conversational search engine that integrates advanced large language models (LLMs) to provide direct answers to user queries with source citations. Their platform supports the following models, optimized for chat completion tasks, with extended context capabilities:
| Model | Parameter Count | Context Length | Model Type |
|---|---|---|---|
| `llama-3.1-sonar-small-128k-online` | 8B | 127,072 tokens | Chat Completion |
| `llama-3.1-sonar-large-128k-online` | 70B | 127,072 tokens | Chat Completion |
| `llama-3.1-sonar-huge-128k-online` | 405B | 127,072 tokens | Chat Completion |
A detailed specification of these models can be found at the following link: Perplexity Model Cards
The basic API options are as follows:
- `model` (`str`): Specifies the language model to use (e.g., `"llama-3.1-sonar-small-128k-online"`). This determines the performance and capabilities of the response.
- `temperature` (`float`, default `0.7`): Controls the randomness of responses. A value of 0 is deterministic, while 1 allows the most random outputs.
- `max_tokens` (`int | None`, default `None`): The maximum number of tokens to generate in the chat completion. This controls the length of text the model can generate in one call.
For more detailed information about the available API options, visit Perplexity API Reference.
The code provided assumes that your `PPLX_API_KEY` is set in your environment variables. If you would like to specify your API key manually and choose a different model, you can use the following code:
Together AI is a San Francisco-based company specializing in decentralized cloud services for training and deploying generative AI models. Founded in 2022, they offer a cloud platform that enables researchers, developers, and organizations to train, fine-tune, and run AI models efficiently at scale. Their services include GPU clusters featuring NVIDIA GB200, H200, and H100, and they contribute to open-source AI research, models, and datasets to advance the field.
- Offers the fastest inference stack in the industry, up to 4x faster than vLLM.
- Operates at 11x lower cost compared to GPT-4 when using Llama-3 70B.
- Features auto-scaling capabilities that adjust capacity based on API request volume.
- Supports customized AI model training and fine-tuning.
- Incorporates cutting-edge optimization technologies like FlashAttention-3.
- Ensures full ownership of trained models.
- Runs a proprietary inference engine integrating FlashAttention-3 kernels and custom kernels.
- Implements speculative decoding algorithms like Medusa and SpecExec.
- Employs unique quantization techniques for maximum accuracy and performance.
- Does not use customer data to train new models without explicit consent.
- Provides users with complete control over data storage.
- Supports over 200 open-source models, including Google Gemma, Meta's Llama 3.3, Qwen2.5, and Mistral/Mixtral from Mistral AI.
- Enables multimodal AI models to process various types of data.
A detailed specification of these models can be found at the following link: Together AI Models
The basic API options are as follows:
- `model_name` (`str`): Selects the model to use; can be aliased as `model`.
- `temperature` (`float`, default `0.7`): Sets the sampling temperature. Values can range between 0 and 2; higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic.
- `max_tokens` (`int | None`, default `None`): The maximum number of tokens to generate in the chat completion. This controls the length of text the model can generate in one call.
Detailed information about the available API options can be found here.
The code provided assumes that your `TOGETHER_API_KEY` is set in your environment variables. If you would like to specify your API key manually and choose a different model, you can use the following code:
Cohere is a leading AI company specializing in enterprise AI solutions, enabling businesses to easily adopt and utilize AI technologies through advanced large language models (LLMs). Their platform is tailored for natural language processing tasks, providing scalable and efficient tools for real-world applications.
- Founded: 2020
- Key Investors: Inovia Capital, NVIDIA, Oracle, Salesforce Ventures
- Series C Funding: Raised $270 million
- Mission: To provide an AI platform tailored for enterprise needs
| Model | Description | Context Window | Max Output Tokens |
|---|---|---|---|
| `command-r-7b` | A small, fast update of the Command R+ model, excelling at RAG, tool use, agents, and multi-step reasoning. | 128,000 tokens | 4,000 tokens |
| `command-r-plus` | An instruction-following conversational model excelling in RAG, tool use, and multi-step reasoning. | 128,000 tokens | 4,000 tokens |
| `command-r` | A conversational model designed for high-quality language tasks and complex workflows like RAG and coding. | 128,000 tokens | 4,000 tokens |
| `command` | An instruction-following conversational model for high-quality tasks, more reliable and with longer context than base models. | 4,000 tokens | 4,000 tokens |
| `command-nightly` | Nightly version of the command model with experimental and regularly updated features. Not for production use. | 128,000 tokens | 4,000 tokens |
| `command-light` | A smaller, faster version of the command model, maintaining near-equal capability with improved speed. | 4,000 tokens | 4,000 tokens |
| `command-light-nightly` | Nightly version of the command-light model, experimental and regularly updated. Not for production use. | 4,000 tokens | 4,000 tokens |
| `c4ai-aya-expanse-8b` | A highly performant 8B multilingual model serving 23 languages, designed for superior monolingual performance. | 8,000 tokens | 4,000 tokens |
| `c4ai-aya-expanse-32b` | A highly performant 32B multilingual model serving 23 languages, designed for superior monolingual performance. | 128,000 tokens | 4,000 tokens |
A detailed specification of Cohere's models can be found at the following link: Cohere Models
The basic API options are as follows:
- `model_name` (`str`): Selects the model to use; can be aliased as `model`.
- `temperature` (`float`, default `0.7`): Sets the sampling temperature. Values can range between 0 and 2; higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic.
- `max_tokens` (`int | None`, default `None`): The maximum number of tokens to generate in the chat completion. This controls the length of text the model can generate in one call.
Detailed information about the available API options can be found here.
The code provided assumes that your `COHERE_API_KEY` is set in your environment variables. If you would like to specify your API key manually and choose a different model, you can use the following code:
Upstage is a South Korean startup specializing in artificial intelligence (AI) technologies, particularly large language models (LLMs) and document AI. Their solutions are designed to deliver cost-efficient, high-performance AI capabilities across various industries.
Key products:

- **Solar LLM**: Upstage's flagship large language model, known for its speed, efficiency, and scalability. Utilizing Depth-Up Scaling (DUS) technology, Solar LLM maximizes performance and is seamlessly integrated into platforms like Amazon SageMaker JumpStart via API.
- **Document AI Pack**: A comprehensive document-processing solution powered by advanced OCR technology. This tool accurately extracts and digitizes essential information from complex documents.
- **AskUp Seargest**: An upgraded version of the AskUp chatbot, offering personalized search and recommendation services, building upon its integration with ChatGPT.
Upstage provides cutting-edge tools for enterprises to enhance automation, streamline workflows, and deliver AI-powered insights.
| Model | Description | Context Window | Training Data |
|---|---|---|---|
| `solar-pro` | An enterprise-grade LLM designed for exceptional instruction-following and processing structured formats like HTML and Markdown. It excels in multilingual performance in English, Korean, and Japanese, with domain expertise in finance, healthcare, and legal. | 32,768 tokens | Up to May 2024 |
| `solar-mini` | A compact 10.7B-parameter LLM for businesses seeking AI solutions. Solar Mini is an instruction-following conversational model supporting English, Korean, and Japanese. It excels at fine-tuning, providing seamless integration and high-quality language processing. | 32,768 tokens | Up to December 2023 |
| `solar-mini-ja` | Solar Mini with enhanced Japanese language capabilities. It is an instruction-following conversational model supporting Japanese as well as English and Korean. | 32,768 tokens | Up to December 2023 |
A detailed specification of Upstage's models can be found at the following link: Upstage Models
The basic API options are as follows:
- `model_name` (`str`): Selects the model to use; can be aliased as `model`.
- `temperature` (`float`, default `0.7`): Sets the sampling temperature. Values can range between 0 and 2; higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic.
- `max_tokens` (`int | None`, default `None`): The maximum number of tokens to generate in the chat completion. This controls the length of text the model can generate in one call.
Detailed information about the available API options can be found here.
The code provided assumes that your `UPSTAGE_API_KEY` is set in your environment variables. If you would like to specify your API key manually and choose a different model, you can use the following code:
The Open LLM Leaderboard is a community-driven platform hosted by Hugging Face that tracks, ranks, and evaluates open-source large language models (LLMs) and chatbots. It provides datasets, score results, queries, and collections for various models, enabling users to compare performance across different benchmarks.
By utilizing the Open LLM Leaderboard, you can identify high-performing models suitable for integration with platforms like Hugging Face and Ollama, facilitating seamless deployment and interaction with LLMs.
For more information, visit the Open LLM Leaderboard.
The Vellum LLM Leaderboard is a platform that compares leading commercial and open-source large language models (LLMs) based on capabilities, pricing, and context window sizes. It provides insights into model performance across various tasks, including multitask reasoning, coding, and mathematics, assisting users in selecting the most suitable LLM for their specific needs.
For more information, visit the Vellum LLM Leaderboard.