SQL

Open in ColabOpen in GitHub

Overview

This tutorial covers how to use create_sql_query_chain to generate SQL queries, execute them, and derive answers.

Additionally, we'll explore the differences between this method and the SQL Agent.

sql-chain-work-flow

Table of Contents

References


Environment Setup

Setting up your environment is the first step. See the Environment Setup guide for more details.

[Note]

The langchain-opentutorial is a package of easy-to-use environment setup guidance, useful functions and utilities for tutorials. Check out the langchain-opentutorial for more details.

You can alternatively set OPENAI_API_KEY in .env file and load it.

[Note] This is not necessary if you've already set OPENAI_API_KEY in previous steps.

Loading SQL Databases

Usage methods for various databases and required library list:

Database

Required Library

Code Example

PostgreSQL

psycopg2-binary

db = SQLDatabase.from_uri("postgresql://:@:/")

MySQL

pymysql

db = SQLDatabase.from_uri("mysql+pymysql://:@:/")

SQLite

Included in standard lib

db = SQLDatabase.from_uri("sqlite:///path/to/your_database.db")

Oracle

cx_Oracle

db = SQLDatabase.from_uri("oracle+cx_oracle://:@:/")

Example for PostgreSQL :

Load and verify the sample database data.

SQL Query Chain

create_sql_query_chain generates a chain for creating SQL queries based on natural language input.

It leverages LLMs to translate natural language into SQL statements.

[RECOMMEDED] Create an LLM object and generate a chain by providing the LLM and DB as parameters.

Since changing the model may cause unexpected behavior, this tutorial will proceed with gpt-3.5-turbo .

What if the latest model is used?

Using the latest version of OpenAI's LLM may cause issues with the output.

Unnecessary information, such as 'SQLQuery: ' , is included in the output along with the query.

(Optional) You can specify the prompt directly using the method below.

When writing it yourself, you can include table_info along with descriptive column descriptions for better explanation.

Executing the chain generates queries based on the database.

How to use the get_prompts method

The chain.get_prompt method allows you to retrieve the current prompt template used in a LangChain chain.

This prompt contains the instructions given to the LLM, including the input structure, expected variables, and contextual guidelines.

Key Features

  1. Prompt Retrieval:

  • Fetches the active prompt template to inspect or debug the chain's behavior.

  1. Dynamic Variable Substitution:

  • Displays how variables are dynamically substituted within the template.

  1. Customizability:

  • Enables users to modify parts of the prompt dynamically.

check the get_prompts 's contents

There are various elements:

  • input_variables

  • input_types

  • partial_variables

  • template

Modify the variable values and check the results.

You can specify variables, including dialect, when invoking create_sql_query_chain.

QuerySQLDatabaseTool

  1. Executing SQL Queries:

  • Executes the provided SQL query on the connected database and retrieves the results.

  • Encapsulates the functionality for interacting with the database, promoting code reusability.

  1. Integration with LangChain Agents:

  • LangChain agents are used to convert natural language into SQL queries.

  • This tool performs the execution step by running the agent-generated SQL query and returning the results.

  1. Database Abstraction:

  • Supports various types of databases, such as MySQL, PostgreSQL, SQLite, and more.

  • Handles direct database operations internally, reducing the dependency of user code on database-specific details.

Let's verify if the generated query executes correctly.

Enhance and generate answers using the LLM

Using the chain created in the previous step results in short, concise answers. This can be adjusted using an LCEL-style chain to provide more natural and detailed responses.

Using SQL Query Chain with an Agent

What is agent_toolkits ?

agent_toolkits in LangChain is a collection of tools designed to simplify the creation and use of Agents optimized for specific domains or use cases.

Each toolkit encapsulates the functionality and workflows needed for specific tasks (e.g., SQL query processing, file system operations, API calls), enabling developers to perform complex tasks with ease.

create_sql_agent

create_sql_agent is a specialized function within the agent_toolkits library that simplifies the process of interacting with SQL databases.

It is designed to streamline SQL query generation and execution by leveraging LangChain’s agent capabilities. Developers can integrate this tool to enable agents to:

  • Connect to SQL databases seamlessly.

  • Automatically generate SQL queries based on natural language input.

  • Retrieve and format results for easy consumption.

This functionality is particularly useful for scenarios requiring dynamic database interactions, such as reporting, analytics, or user-facing applications that need query-based responses.

Using an agent, you can generate SQL queries and output the results as answers.

Agents work well with models like gpt-4o and gpt-4o-mini , in contrast to the issues encountered with chains when changing the model.

Differences Between create_sql_query_chain and SQL Agent

  1. create_sql_query_chain :

    • Translates user input into a single SQL query and executes it directly.

    • Best for simple, direct query execution.

  2. SQL Agent :

    • Handles more complex workflows, involving multiple queries and reasoning steps.

    • Ideal for dynamic or multi-step tasks.

  3. Conclusion: It is recommended to use create_sql_query_chain for simple queries, while SQL Agent is suggested for complex or iterative processes.


Appendix : Chain with gpt-4o and a Post-Processing Function

As observed earlier, gpt-4o output can be inconsistent.

To improve this, a chain can be constructed with a post-processing function applied.

This part consists of the following procedure:

  • Natural language question input → SQL query generation by LLM → Post-processing of the generated query → Query execution on the database → Natural language answer output by LLM

Load and verify the sample database data.

Create an LLM object and use a custom template to generate a chain by providing the LLM and DB as parameters.

Check the execution result

Define a function for post-processing the SQL query.

Create a tool to execute the generated query.

Combine the chains: SQL query generation → Post-processing of the generated query → Query execution on the database

Combine the chains: SQL query generation → Post-processing of the generated query → Query execution on the database → Natural language answer output by LLM

Last updated