Fallbacks

Overview

This tutorial covers how to implement fallback mechanisms in LangChain applications to gracefully handle various types of failures and errors.

Fallbacks are crucial for building robust LLM applications that can handle API errors, rate limits, and other potential failures without disrupting the user experience.

In this tutorial, we will explore different fallback strategies and implement practical examples using multiple LLM providers.

Table of Contents

Key Concepts

  1. Fundamentals of Fallbacks

    • Core concepts of fallback mechanisms

    • Setting up basic fallback configurations

    • Understanding error handling patterns

    • Implementation of simple fallback chains

  2. API Error Management

    • Effectively handling rate limit errors

    • Managing API downtime scenarios

    • Implementing retry strategies

    • Simulating errors through mock testing

  3. Advanced Fallback Patterns

    • Configuring multiple fallback models

    • Setting up custom exception handling

    • Sequential fallback execution

    • Context-aware model switching

    • Model-specific prompt templating

  4. Practical Implementation

    • Integration with OpenAI and Anthropic models

    • Building resilient chains with fallbacks

    • Real-world usage patterns and best practices

    • Performance optimization techniques

Environment Setup

Set up the environment. You may refer to Environment Setup for more details.

[Note]

  • langchain-opentutorial is a package that provides easy-to-use environment setup, along with useful functions and utilities for these tutorials.

  • You can check out the langchain-opentutorial repository for more details.

Alternatively, you can set OPENAI_API_KEY in a .env file and load it.

[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps.
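
Here is a minimal setup sketch. It assumes the python-dotenv package for loading keys from a .env file; the langchain-opentutorial helper package provides equivalent utilities.

```python
# Minimal setup sketch (assumes python-dotenv; the langchain-opentutorial
# package offers equivalent helpers).
# %pip install -qU langchain-openai langchain-anthropic python-dotenv

from dotenv import load_dotenv

# Loads OPENAI_API_KEY (and ANTHROPIC_API_KEY) from a local .env file.
load_dotenv(override=True)
```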

What are Fallbacks?

In LLM applications, various errors or failures can occur, such as LLM API issues, degradation in model output quality, and other integration-related problems. The fallback feature gracefully handles and isolates these issues.

Note that fallbacks can be applied both at the level of individual LLM calls and at the level of an entire runnable chain.

Handling LLM API Errors

Handling LLM API errors is one of the most common use cases for fallbacks.

API requests can fail for various reasons: the API might be down, you might have hit usage rate limits, or other issues may arise. By implementing fallbacks, you can protect your application against these types of problems.

Important: By default, many LLM wrappers capture errors and retry. When using fallbacks, it is advisable to disable this default behavior; otherwise, the first wrapper will keep retrying and prevent the fallback from triggering.

Introduction to Rate Limit Testing

First, let's perform a mock test for the RateLimitError that can occur with OpenAI. A RateLimitError occurs when you exceed your OpenAI API usage limits.

Why Handle Rate Limit Errors?

A RateLimitError blocks further API requests for a certain period, so applications need to handle it appropriately. Mock testing lets you verify how the application behaves, and whether its error-handling logic works correctly, when a RateLimitError occurs.

Benefits of Mock Testing

Mock testing helps prevent potential production issues and ensures stable service delivery.

Setting up LLM Fallback Configuration

Create a ChatOpenAI object and assign it to openai_llm, setting max_retries=0 so the client does not retry on its own when it hits rate limits or other API restrictions.

Use with_fallbacks to configure anthropic_llm as the fallback LLM and assign this configuration to llm.
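
A minimal sketch of this configuration (the specific model names are illustrative assumptions):

```python
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

# max_retries=0 disables the client's built-in retries, so an API error
# surfaces immediately and the fallback can take over.
openai_llm = ChatOpenAI(model="gpt-4o-mini", max_retries=0)  # illustrative model
anthropic_llm = ChatAnthropic(model="claude-3-5-sonnet-20241022")  # illustrative model

# If openai_llm raises an exception, anthropic_llm is tried instead.
llm = openai_llm.with_fallbacks([anthropic_llm])
```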

Testing API Rate Limits with Fallback Models

In this example, we'll simulate an OpenAI API rate limit error and observe how the system behaves when the error occurs.
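
A sketch of the mock test, following the patching approach used in the LangChain documentation; it reuses the llm object configured above.

```python
from unittest.mock import patch

import httpx
from openai import RateLimitError

# The openai SDK's RateLimitError constructor requires an httpx response.
request = httpx.Request("GET", "/")
response = httpx.Response(429, request=request)
error = RateLimitError("rate limit", response=response, body="")

# Patch the OpenAI client so every chat completion call raises the error.
with patch(
    "openai.resources.chat.completions.Completions.create", side_effect=error
):
    try:
        print(llm.invoke("Why did the chicken cross the road?").content)
    except RateLimitError:
        print("Hit error")
```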

When the OpenAI GPT model encounters an error, the Anthropic fallback model successfully takes over and performs the inference instead.

When a fallback model, configured with with_fallbacks(), executes successfully, the RateLimitError is not raised, ensuring continuous operation of your application.

💡 This demonstrates LangChain's fallback mechanism, which provides resilience against API limitations and ensures continued application function even when the primary model is unavailable.

A model configured with llm.with_fallbacks() behaves like any other Runnable, so you can compose it into chains as usual.

The code below also does not throw an error because the fallback model performed successfully.
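
For example, the fallback-wrapped model can be piped into a prompt just like a plain model (this sketch reuses the patch target and error object from the mock test above):

```python
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You're a helpful assistant who answers concisely."),
        ("human", "{question}"),
    ]
)

# The fallback-wrapped model composes like any other Runnable.
chain = prompt | llm

with patch(
    "openai.resources.chat.completions.Completions.create", side_effect=error
):
    try:
        # The OpenAI call fails; the Anthropic fallback answers instead.
        print(chain.invoke({"question": "Why did the chicken cross the road?"}).content)
    except RateLimitError:
        print("Hit error")
```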

Specifying Exceptions to Trigger Fallbacks

You can precisely define when a fallback should trigger, allowing for more granular control over the fallback mechanism's behavior.

For example, you can specify certain exception classes or error codes to trigger the fallback logic, reducing unnecessary calls and improving efficiency in error handling.

The example below prints an "error" message because exceptions_to_handle is configured to trigger the fallback only for KeyboardInterrupt. The fallback will not trigger for other exceptions.
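
A sketch of this configuration, reusing the models, prompt, and simulated error from the earlier examples:

```python
llm_with_exceptions = openai_llm.with_fallbacks(
    [anthropic_llm],
    # Fall back only on KeyboardInterrupt; every other exception propagates.
    exceptions_to_handle=(KeyboardInterrupt,),
)

chain = prompt | llm_with_exceptions

with patch(
    "openai.resources.chat.completions.Completions.create", side_effect=error
):
    try:
        print(chain.invoke({"question": "Why did the chicken cross the road?"}))
    except RateLimitError:
        # RateLimitError is not in exceptions_to_handle, so the fallback
        # never runs and the exception reaches us here.
        print("Hit error")
```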

Specifying Multiple Fallback Models Sequentially

You are not limited to a single fallback model; if you specify several, they are tried in order until one succeeds.

Create two chains: one that causes an error and one that works normally.
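
A sketch under the same assumptions as before; the deliberately invalid model name guarantees that the first chain fails:

```python
# This chain always fails: the model name does not exist.
bad_llm = ChatOpenAI(model="gpt-fake", max_retries=0)
bad_chain = prompt | bad_llm

# This chain works normally.
good_chain = prompt | anthropic_llm

# Fallbacks are tried in order. One fallback suffices here, but you can
# pass several, e.g. bad_chain.with_fallbacks([chain_a, chain_b]).
chain = bad_chain.with_fallbacks([good_chain])
print(chain.invoke({"question": "Why did the chicken cross the road?"}).content)
```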

Using Different Prompt Templates for Each Model

You can use different prompt templates tailored to each model's characteristics. For example, GPT-4 can follow complex, detailed instructions, while GPT-3.5 works better with simpler ones.
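
One way to sketch this is to fall back at the chain level, so each model keeps its own prompt (model names are illustrative):

```python
from langchain_core.output_parsers import StrOutputParser

# A detailed, multi-step prompt for the stronger model.
complex_prompt = ChatPromptTemplate.from_template(
    "Analyze the following text step by step and justify each conclusion:\n{text}"
)
# A simpler prompt for the lighter model.
simple_prompt = ChatPromptTemplate.from_template("Summarize this text:\n{text}")

gpt4_chain = (
    complex_prompt | ChatOpenAI(model="gpt-4o", max_retries=0) | StrOutputParser()
)
gpt35_chain = simple_prompt | ChatOpenAI(model="gpt-3.5-turbo") | StrOutputParser()

# Falling back at the chain level lets each model keep its own prompt.
chain = gpt4_chain.with_fallbacks([gpt35_chain])
print(chain.invoke({"text": "Fallbacks make LLM applications more resilient."}))
```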

Automatic Model Switching Based on Context Length

When an input exceeds a model's token limit, you can automatically fall back to a model with a larger context window.
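
A minimal sketch: try a smaller-context model first, and let the API's token-limit error trigger the fallback to a larger-context model (the model names and the oversized input are illustrative assumptions):

```python
# The first model has the smaller context window; the fallback has a larger one.
short_context_llm = ChatOpenAI(model="gpt-3.5-turbo", max_retries=0)
long_context_llm = ChatOpenAI(model="gpt-4o", max_retries=0)

llm = short_context_llm.with_fallbacks([long_context_llm])

# An input long enough to exceed the first model's token limit; the resulting
# API error triggers the switch to the larger-context model.
inputs = "What is the next number in the sequence: " + ", ".join(["one", "two"] * 30000)
print(llm.invoke(inputs).content)
```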
