Fallbacks
Author: Haseom Shin
Peer Review:
Proofread: Chaeyoon Kim
This is part of the LangChain Open Tutorial
Overview
This tutorial covers how to implement fallback mechanisms in LangChain applications to gracefully handle various types of failures and errors.
Fallbacks are crucial for building robust LLM applications that can handle API errors, rate limits, and other potential failures without disrupting the user experience.
In this tutorial, we will explore different fallback strategies and implement practical examples using multiple LLM providers.
Table of Contents
Key Concepts
Fundamentals of Fallbacks
Core concepts of fallback mechanisms
Setting up basic fallback configurations
Understanding error handling patterns
Implementation of simple fallback chains
API Error Management
Effectively handling rate limit errors
Managing API downtime scenarios
Implementing retry strategies
Simulating errors through mock testing
Advanced Fallback Patterns
Configuring multiple fallback models
Setting up custom exception handling
Sequential fallback execution
Context-aware model switching
Model-specific prompt templating
Practical Implementation
Integration with OpenAI and Anthropic models
Building resilient chains with fallbacks
Real-world usage patterns and best practices
Performance optimization techniques
References
Environment Setup
Set up the environment. You may refer to Environment Setup for more details.
[Note]
langchain-opentutorial is a package that provides a set of easy-to-use environment setup methods, useful functions, and utilities for tutorials. You can check out langchain-opentutorial for more details.
You can alternatively set OPENAI_API_KEY in .env file and load it.
[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps.
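If you prefer the .env route, the sketch below shows one way to load it, assuming the python-dotenv package is installed:

```python
# A minimal sketch using python-dotenv (pip install python-dotenv).
# It loads OPENAI_API_KEY (and, for the fallback examples later,
# ANTHROPIC_API_KEY) from a local .env file into the environment.
from dotenv import load_dotenv

load_dotenv(override=True)
```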
What are Fallbacks?
In LLM applications, various errors or failures can occur, such as LLM API issues, degradation in model output quality, and other integration-related problems. Fallbacks let you gracefully handle and isolate these issues.
Note that fallbacks can be applied both to an individual LLM call and to an entire runnable chain.
Handling LLM API Errors
Handling LLM API errors is one of the most common use cases for fallbacks.
API requests can fail for various reasons: the API might be down, you might have reached usage rate limits, or something else might go wrong. Implementing fallbacks protects your application against these kinds of problems.
Important: By default, many LLM wrappers capture errors and retry. When using fallbacks, it is advisable to disable this default behavior; otherwise, the first wrapper will keep retrying and prevent the fallback from triggering.
Introduction to Rate Limit Testing
First, let's run a mock test for the RateLimitError that can occur with OpenAI. A RateLimitError is raised when you exceed the OpenAI API usage limits.
Why Handle Rate Limit Errors?
A RateLimitError blocks API requests for a certain period, so applications need to handle it appropriately. Mock testing verifies how the application behaves and whether its error-handling logic works correctly when a RateLimitError occurs.
Benefits of Mock Testing
Mock testing helps prevent potential production issues and ensures stable service delivery.
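As a first step, we construct a RateLimitError instance to inject in the mock tests that follow. This is a minimal sketch assuming the openai>=1.x SDK, whose error classes require an httpx response object; the exact constructor may differ across SDK versions.

```python
import httpx
from openai import RateLimitError

# Recent openai SDKs require a request/response pair to build an error object.
request = httpx.Request("GET", "/")
response = httpx.Response(200, request=request)
error = RateLimitError("rate limit", response=response, body="")
```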
Setting up LLM Fallback Configuration
Create a ChatOpenAI object and assign it to openai_llm, setting max_retries=0 to prevent retry attempts that might occur due to API call limits or restrictions.
Use with_fallbacks to configure anthropic_llm as the fallback LLM and assign this configuration to llm.
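A minimal sketch of this configuration; the model names are illustrative and assume the langchain-openai and langchain-anthropic packages are installed.

```python
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

# max_retries=0 disables the wrapper's own retries so the fallback can fire
# on the first failure instead of waiting out repeated retry attempts.
openai_llm = ChatOpenAI(model="gpt-4o-mini", max_retries=0)
anthropic_llm = ChatAnthropic(model="claude-3-5-sonnet-20240620")

# If openai_llm raises an error, anthropic_llm handles the request instead.
llm = openai_llm.with_fallbacks([anthropic_llm])
```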
Testing API Rate Limits with Fallback Models
In this example, we'll simulate OpenAI API rate-limit errors and test how the system behaves when they occur.
When the OpenAI GPT model encounters an error, the Anthropic fallback model successfully takes over and performs the inference instead.
When a fallback model, configured with with_fallbacks(), executes successfully, the RateLimitError is not raised, ensuring continuous operation of your application.
💡 This demonstrates LangChain's fallback mechanism, which provides resilience against API limitations and ensures continued application function even when the primary model is unavailable.
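One way to simulate this, reusing the error object constructed earlier, is to patch the OpenAI SDK's chat-completion call so it always raises. The patched path matches openai>=1.x and may vary across SDK versions.

```python
from unittest.mock import patch

# Force every OpenAI chat call to raise the mocked RateLimitError.
with patch(
    "openai.resources.chat.completions.Completions.create", side_effect=error
):
    try:
        print(llm.invoke("Why did the chicken cross the road?"))
    except RateLimitError:
        # Not reached: the Anthropic fallback answers instead.
        print("Hit error")
```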
A model configured with with_fallbacks() behaves like any other Runnable, so it can be composed into chains.
The chain below also does not raise an error, because the fallback model executes successfully.
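For instance, composing the fallback-wrapped llm into a simple prompt chain (a sketch; the prompt wording is illustrative):

```python
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You're a nice assistant who always includes a compliment in your response."),
        ("human", "Why did the {animal} cross the road?"),
    ]
)
chain = prompt | llm  # llm already carries the Anthropic fallback

with patch(
    "openai.resources.chat.completions.Completions.create", side_effect=error
):
    try:
        print(chain.invoke({"animal": "kangaroo"}))
    except RateLimitError:
        print("Hit error")  # not printed: the fallback succeeds
```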
Specifying Exceptions to Trigger Fallbacks
You can precisely define when a fallback should trigger, allowing for more granular control over the fallback mechanism's behavior.
For example, you can specify certain exception classes or error codes to trigger the fallback logic, reducing unnecessary calls and improving efficiency in error handling.
The example below prints an "error" message because exceptions_to_handle is configured to trigger the fallback only for KeyboardInterrupt. The fallback will not trigger for other exceptions.
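A sketch of that configuration, reusing the objects defined above:

```python
# Only KeyboardInterrupt triggers the fallback here, so the mocked
# RateLimitError propagates and the except branch prints "error".
llm_with_fallback = openai_llm.with_fallbacks(
    [anthropic_llm], exceptions_to_handle=(KeyboardInterrupt,)
)
chain = prompt | llm_with_fallback

with patch(
    "openai.resources.chat.completions.Completions.create", side_effect=error
):
    try:
        print(chain.invoke({"animal": "kangaroo"}))
    except RateLimitError:
        print("error")
```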
Specifying Multiple Fallback Models Sequentially
You can specify multiple fallback models, not just one; they are tried in order until one succeeds.
Create two chains: one that causes an error and one that works normally.
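A sketch along those lines: the invalid model name forces the first chain to fail, and the second serves as its fallback. Additional fallbacks could be appended to the list and would be tried in order.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template(
    "Instructions: Always include a compliment in your response.\n"
    "Question: Why did the {animal} cross the road?"
)

# This chain fails: the model name is intentionally invalid.
bad_chain = prompt | ChatOpenAI(model="gpt-fake", max_retries=0) | StrOutputParser()

# This chain works normally.
good_chain = prompt | ChatAnthropic(model="claude-3-5-sonnet-20240620") | StrOutputParser()

# Fallbacks also compose at the chain level, tried in list order.
chain = bad_chain.with_fallbacks([good_chain])
print(chain.invoke({"animal": "turtle"}))
```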
Using Different Prompt Templates for Each Model
You can use a different prompt template tailored to each model's characteristics. For example, GPT-4 handles complex instructions well, while GPT-3.5 works better with simpler ones.
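A sketch of the pattern; the prompts and model names are illustrative assumptions.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# A detailed prompt for the stronger model, a simpler one for the fallback.
complex_prompt = ChatPromptTemplate.from_template(
    "Analyze the text below step by step and justify each conclusion:\n{text}"
)
simple_prompt = ChatPromptTemplate.from_template("Summarize this text:\n{text}")

gpt4_chain = complex_prompt | ChatOpenAI(model="gpt-4", max_retries=0) | StrOutputParser()
gpt35_chain = simple_prompt | ChatOpenAI(model="gpt-3.5-turbo") | StrOutputParser()

# Each fallback chain carries a prompt tuned to its model's capabilities.
chain = gpt4_chain.with_fallbacks([gpt35_chain])
print(chain.invoke({"text": "LangChain fallbacks keep applications resilient."}))
```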
Automatic Model Switching Based on Context Length
For long inputs, you can automatically switch to a model with a larger context window when the primary model's token limit is exceeded.
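One way to get this behavior is to let the context-length error itself trigger the fallback, as in this sketch (the model names and input length are illustrative; gpt-4o stands in for any larger-context model):

```python
from langchain_openai import ChatOpenAI

short_llm = ChatOpenAI(model="gpt-3.5-turbo", max_retries=0)
long_llm = ChatOpenAI(model="gpt-4o")  # larger context window

# If the input overflows the short model's context window, the API raises
# an error and the long-context model handles the request instead.
llm = short_llm.with_fallbacks([long_llm])

inputs = "What is the next number: " + ", ".join(["one", "two"] * 5000)
print(llm.invoke(inputs))
```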