Chat Models
Author: PangPangGod
Peer Review: YooKyung Jeon
Proofread: Two-Jay
This is a part of LangChain Open Tutorial
Overview
This tutorial explains various chat models (OpenAI, Anthropic, etc.) and provides brief usage examples for each.
Table of Contents
References
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
`langchain-opentutorial` is a package that provides a set of easy-to-use environment setup tools, useful functions, and utilities for tutorials. You can check out `langchain-opentutorial` for more details.
If you want automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the code below:
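A minimal sketch of that setup, using the standard LangSmith environment variables (the key value is a placeholder):

```python
import os

# Enable LangSmith tracing. The key below is a placeholder; replace it
# with your own LangSmith API key before running.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-langsmith-api-key"
```

Once these variables are set, subsequent model calls made through LangChain are traced automatically.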
OpenAI
OpenAI is an AI research and deployment company based in San Francisco, dedicated to ensuring that artificial general intelligence benefits all of humanity. Models include the GPT series of language models, such as GPT-4 and GPT-4o, as well as the DALL·E series for image generation.
| Model | Description | Context Window | Max Output Tokens | Knowledge Cutoff |
| --- | --- | --- | --- | --- |
| gpt-4o | A versatile, high-intelligence flagship model, cheaper and faster than GPT-4 Turbo. | 128,000 tokens | 16,384 tokens | Up to October 2023 |
| chatgpt-4o-latest | The continuously updated version of GPT-4o used in ChatGPT. | 128,000 tokens | 16,384 tokens | Continuously updated |
| gpt-4o-mini | A smaller, faster model with better performance than GPT-3.5 Turbo. | 128,000 tokens | 16,384 tokens | Up to October 2023 |
| gpt-4-turbo | The latest GPT-4 Turbo model with vision, JSON mode, and function calling capabilities. | 128,000 tokens | 4,096 tokens | Up to December 2023 |
| gpt-4o-realtime | A beta model optimized for real-time API use with audio and text inputs. | 128,000 tokens | 4,096 tokens | Up to October 2023 |
| gpt-4o-audio | A beta model capable of handling audio inputs and outputs via the Chat Completions API. | 128,000 tokens | 16,384 tokens | Up to October 2023 |
| gpt-3.5-turbo | Optimized for chat and non-chat tasks with natural language and code generation capabilities. | 16,385 tokens | 4,096 tokens | Up to September 2021 |
OpenAI offers a variety of model options. A detailed specification of these models can be found at the following link: OpenAI Model Specifications
Basic Model Options
The basic API options are as follows:
- `model_name` (`str`): Selects the model to use; can be aliased as `model`.
- `temperature` (`float`, default 0.7): Sets the sampling temperature. Values can range between 0 and 2. Higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic.
- `max_tokens` (`int | None`, default `None`): The maximum number of tokens to generate in the chat completion, which controls the length of text the model can generate in one call.
Detailed information about the available API options can be found here.
The code below assumes that your OPENAI_API_KEY is set in your environment variables. If you would like to specify your API key manually or choose a different model, uncomment the following section before running the code:
Anthropic
Anthropic is an AI safety and research company based in San Francisco, dedicated to building reliable, interpretable, and steerable AI systems. Their primary offering is the Claude family of large language models, including Claude 3.5 Sonnet and Claude 3.5 Haiku, designed for various applications such as reasoning, coding, and multilingual tasks.
| Model | Description | Context Window | Max Output Tokens | Knowledge Cutoff |
| --- | --- | --- | --- | --- |
| Claude 3.5 Sonnet | The most intelligent model in the Claude family. | 200,000 tokens | 8,192 tokens | Up to April 2024 |
| Claude 3.5 Haiku | The fastest model with blazing speed. | 200,000 tokens | 8,192 tokens | Up to July 2024 |
| Claude 3 Opus | Powerful model for highly complex tasks. | 200,000 tokens | 4,096 tokens | Up to August 2023 |
| Claude 3 Sonnet | Balanced model offering strong utility for scaled deployments. | 200,000 tokens | 4,096 tokens | Up to August 2023 |
| Claude 3 Haiku | Fastest and most compact model for near-instant responsiveness. | 200,000 tokens | 4,096 tokens | Up to August 2023 |
A detailed specification of these models can be found at the following link: Anthropic Model Specifications
Basic Model Options
The basic API options are as follows:
- `model_name` (`str`): Selects the model to use; can be aliased as `model`.
- `temperature` (`float`, default 0.7): Sets the sampling temperature. For Anthropic models, values range between 0 and 1. Higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic.
- `max_tokens` (`int | None`, default `None`): The maximum number of tokens to generate in the chat completion, which controls the length of text the model can generate in one call.
Detailed information about the available API options can be found here.
The code below assumes that your ANTHROPIC_API_KEY is set in your environment variables. If you would like to specify your API key manually or choose a different model, uncomment the following section before running the code:
Perplexity
Perplexity AI is a conversational search engine that integrates advanced large language models (LLMs) to provide direct answers to user queries with source citations. Their platform supports the following models, optimized for chat completion tasks, with extended context capabilities:
Supported Models
| Model | Parameter Count | Context Length | Model Type |
| --- | --- | --- | --- |
| llama-3.1-sonar-small-128k-online | 8B | 127,072 tokens | Chat Completion |
| llama-3.1-sonar-large-128k-online | 70B | 127,072 tokens | Chat Completion |
| llama-3.1-sonar-huge-128k-online | 405B | 127,072 tokens | Chat Completion |
A detailed specification of these models can be found at the following link: Perplexity Model Cards
Basic Model Options
The basic API options are as follows:
- `model` (`str`): Specifies the language model to use (e.g., `llama-3.1-sonar-small-128k-online`). This determines the performance and capabilities of the response.
- `temperature` (`float`, default 0.7): Controls the randomness of responses. A value of 0 is deterministic, while 1 allows for the most random outputs.
- `max_tokens` (`int | None`, default `None`): The maximum number of tokens to generate in the chat completion, which controls the length of text the model can generate in one call.
For more detailed information about the available API options, visit Perplexity API Reference.
The code below assumes that your PPLX_API_KEY is set in your environment variables. If you would like to specify your API key manually or choose a different model, uncomment the following section before running the code:
Together AI
Together AI is a San Francisco-based company specializing in decentralized cloud services for training and deploying generative AI models. Founded in 2022, they offer a cloud platform that enables researchers, developers, and organizations to train, fine-tune, and run AI models efficiently at scale. Their services include GPU clusters featuring NVIDIA GB200, H200, and H100, and they contribute to open-source AI research, models, and datasets to advance the field.
Together Inference
- Offers the fastest inference stack in the industry, up to 4x faster than vLLM.
- Operates at 11x lower cost compared to GPT-4 when using Llama-3 70B.
- Features auto-scaling capabilities that adjust capacity based on API request volume.

Together Custom Models
- Supports customized AI model training and fine-tuning.
- Incorporates cutting-edge optimization technologies like FlashAttention-3.
- Ensures full ownership of trained models.

Performance Optimization
- Proprietary inference engine integrating FlashAttention-3 kernels and custom kernels.
- Implements speculative decoding algorithms like Medusa and SpecExec.
- Employs unique quantization techniques for maximum accuracy and performance.

Security and Privacy
- User data is not used for training new models without explicit consent.
- Provides users with complete control over data storage.

Supported Models
- Supports over 200 open-source models, including Google Gemma, Meta's Llama 3.3, Qwen2.5, and Mistral/Mixtral from Mistral AI.
- Enables multimodal AI models to process various types of data.
A detailed specification of these models can be found at the following link: Together AI Models
Basic Model Options
The basic API options are as follows:
- `model_name` (`str`): Selects the model to use; can be aliased as `model`.
- `temperature` (`float`, default 0.7): Sets the sampling temperature. Values can range between 0 and 2. Higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic.
- `max_tokens` (`int | None`, default `None`): The maximum number of tokens to generate in the chat completion, which controls the length of text the model can generate in one call.
Detailed information about the available API options can be found here.
The code below assumes that your TOGETHER_API_KEY is set in your environment variables. If you would like to specify your API key manually or choose a different model, uncomment the following section before running the code:
Cohere
Cohere is a leading AI company specializing in enterprise AI solutions, enabling businesses to easily adopt and utilize AI technologies through advanced large language models (LLMs). Their platform is tailored for natural language processing tasks, providing scalable and efficient tools for real-world applications.
- Founded: 2020
- Key Investors: Inovia Capital, NVIDIA, Oracle, Salesforce Ventures
- Series C Funding: Raised $270 million
- Mission: To provide an AI platform tailored for enterprise needs
Supported Models
| Model | Description | Context Window | Max Output Tokens |
| --- | --- | --- | --- |
| command-r-7b | A small, fast update of the Command R+ model, excelling at RAG, tool use, agents, and multi-step reasoning. | 128,000 tokens | 4,000 tokens |
| command-r-plus | An instruction-following conversational model excelling in RAG, tool use, and multi-step reasoning. | 128,000 tokens | 4,000 tokens |
| command-r | A conversational model designed for high-quality language tasks and complex workflows like RAG and coding. | 128,000 tokens | 4,000 tokens |
| command | An instruction-following conversational model for high-quality tasks, more reliable and with longer context than base models. | 4,000 tokens | 4,000 tokens |
| command-nightly | Nightly version of the command model with experimental and regularly updated features. Not for production use. | 128,000 tokens | 4,000 tokens |
| command-light | A smaller, faster version of the command model, maintaining near-equal capability with improved speed. | 4,000 tokens | 4,000 tokens |
| command-light-nightly | Nightly version of the command-light model, experimental and regularly updated. Not for production use. | 4,000 tokens | 4,000 tokens |
| c4ai-aya-expanse-8b | A highly performant 8B multilingual model serving 23 languages, designed for superior monolingual performance. | 8,000 tokens | 4,000 tokens |
| c4ai-aya-expanse-32b | A highly performant 32B multilingual model serving 23 languages, designed for superior monolingual performance. | 128,000 tokens | 4,000 tokens |
A detailed specification of Cohere's models can be found at the following link: Cohere Models
Basic Model Options
The basic API options are as follows:
- `model_name` (`str`): Selects the model to use; can be aliased as `model`.
- `temperature` (`float`, default 0.7): Sets the sampling temperature. Higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic.
- `max_tokens` (`int | None`, default `None`): The maximum number of tokens to generate in the chat completion, which controls the length of text the model can generate in one call.
Detailed information about the available API options can be found here.
The code below assumes that your COHERE_API_KEY is set in your environment variables. If you would like to specify your API key manually or choose a different model, uncomment the following section before running the code:
Upstage
Upstage is a South Korean startup specializing in artificial intelligence (AI) technologies, particularly large language models (LLMs) and document AI. Their solutions are designed to deliver cost-efficient, high-performance AI capabilities across various industries.
- Solar LLM (key product): Upstage's flagship large language model, known for its speed, efficiency, and scalability. Using Depth-Up Scaling (DUS) technology, Solar LLM maximizes performance and is seamlessly integrated into platforms like Amazon SageMaker JumpStart via API.
- Document AI Pack: A comprehensive document processing solution powered by advanced OCR technology that accurately extracts and digitizes essential information from complex documents.
- AskUp Seargest: An upgraded version of the AskUp chatbot, offering personalized search and recommendation services, building on the integration with ChatGPT.
Upstage provides cutting-edge tools for enterprises to enhance automation, streamline workflows, and deliver AI-powered insights.
Supported Models
| Model | Description | Context Window | Knowledge Cutoff |
| --- | --- | --- | --- |
| solar-pro | An enterprise-grade LLM designed for exceptional instruction-following and processing structured formats like HTML and Markdown. It excels in multilingual performance in English, Korean, and Japanese, with domain expertise in Finance, Healthcare, and Legal. | 32,768 tokens | Up to May 2024 |
| solar-mini | A compact 10.7B parameter LLM for businesses seeking AI solutions. Solar Mini is an instruction-following conversational model supporting English, Korean, and Japanese. It excels at fine-tuning, providing seamless integration and high-quality language processing. | 32,768 tokens | Up to December 2023 |
| solar-mini-ja | Solar Mini with enhanced capabilities in Japanese language processing. It is an instruction-following conversational model supporting Japanese as well as English and Korean. | 32,768 tokens | Up to December 2023 |
A detailed specification of Upstage's models can be found at the following link: Upstage Models
Basic Model Options
The basic API options are as follows:
- `model_name` (`str`): Selects the model to use; can be aliased as `model`.
- `temperature` (`float`, default 0.7): Sets the sampling temperature. Values can range between 0 and 2. Higher values (e.g., 0.8) make the output more random, while lower values (e.g., 0.2) make it more focused and deterministic.
- `max_tokens` (`int | None`, default `None`): The maximum number of tokens to generate in the chat completion, which controls the length of text the model can generate in one call.
Detailed information about the available API options can be found here.
The code below assumes that your UPSTAGE_API_KEY is set in your environment variables. If you would like to specify your API key manually or choose a different model, uncomment the following section before running the code:
Open LLM Leaderboard
The Open LLM Leaderboard is a community-driven platform hosted by Hugging Face that tracks, ranks, and evaluates open-source large language models (LLMs) and chatbots. It provides datasets, score results, queries, and collections for various models, enabling users to compare performance across different benchmarks.
By utilizing the Open LLM Leaderboard, you can identify high-performing models suitable for integration with platforms like Hugging Face and Ollama, facilitating seamless deployment and interaction with LLMs.
For more information, visit the Open LLM Leaderboard.
Vellum LLM Leaderboard
The Vellum LLM Leaderboard is a platform that compares leading commercial and open-source large language models (LLMs) based on capabilities, pricing, and context window sizes. It provides insights into model performance across various tasks, including multitask reasoning, coding, and mathematics, assisting users in selecting the most suitable LLM for their specific needs.
By consulting the Vellum LLM Leaderboard, you can weigh capability against pricing and context window size when choosing between commercial and open-source LLMs for your application.
For more information, visit the Vellum LLM Leaderboard.