NL2SQL with LangChain and Azure SQL Database

Brian Spendolini

Image Picture1 copy

LangChain is an open-source framework for creating applications that use and are powered by language models (LLM/MLM/SML). In this post, basic LangChain components (toolkits, chains, agents) will be used to create a natural language to SQL prompt that will allow interactions with an Azure SQL Database; just ask the database what you want as if speaking to another person.

Getting Started

Where's the code?


All the code that is used in this post is available in the Azure SQL AI samples GitHub repository located here. The notebook outlined in this post is located here.

Looking at the python notebook used for this sample, you may notice in the repository there are 2. One is using OpenAI and the other is using Azure OpenAI. This post will be using the OpenAI notebook. The differences are very minor with only change being a reference to where the Azure OpenAI endpoints are located.

You can run this example in GitHub Codespace.

The first section in the notebook sets up our python environment by installing the necessary libraries via a requirements.txt file.

pip install -r requirements.txt

Next, install the SQL ODBC driver and referenced in the notebook are links for Windows, MacOS, and Linux. Included in the repository is the script for Ubuntu because that’s the OS that is used with GitHub codespace; the easiest place to run this sample.

Remember to fill out the values in the .env file. For this example, all that needs to be filled out is the OPENAI_API_KEY and connection information for the py-connectionString variable.

AZURE_OPENAI_API_KEY=""
AZURE_OPENAI_ENDPOINT=""
OPENAI_API_KEY=""
py-connectionString="mssql+pyodbc://USERNAME:PASSWORD@SERVER_NAME.database.windows.net/DATABASE_NAME?driver=ODBC+Driver+18+for+SQL+Server"
You can get your OpenAI API key here once logged into the platform.

The last setup section for the LangChain with Azure SQL Database example is the table creation script.

When using NL2SQL agents, its best to connect with a user with as few roles/privileges as possible for maximum security.

create table [dbo].[langtable] (id int Identity, username nvarchar(100))
GO

insert into [dbo].[langtable] (username) values('sammy')
insert into [dbo].[langtable] (username) values('mary')
insert into [dbo].[langtable] (username) values('jane')
insert into [dbo].[langtable] (username) values('fred')
insert into [dbo].[langtable] (username) values('billy')
insert into [dbo].[langtable] (username) values('jonny')
insert into [dbo].[langtable] (username) values('kenny')
insert into [dbo].[langtable] (username) values('dan')
insert into [dbo].[langtable] (username) values('frank')
insert into [dbo].[langtable] (username) values('jenny')
GO

select * from [dbo].[langtable]
GO

The Notebook

Select Kernel


Be sure to select a kernel for the python notebook by use the Select Kernel button in the upper right of the notebook.

Image Screenshot 2024 02 16 at 11 28 19 AM

If using VS Code/Codespace, it will install the Python and Jupyter extensions and result in the kernel being set.

Image Screenshot 2024 02 16 at 11 33 18 AM

 

The first stop in the notebook is setting up the libraries to be used in the python script. The section also loads any environment variables we set in the .env file to be used later.

import pyodbc
import os
from dotenv import load_dotenv
from langchain.agents import create_sql_agent
from langchain.agents.agent_types import AgentType
from langchain.sql_database import SQLDatabase
from langchain_community.agent_toolkits import SQLDatabaseToolkit
from langchain_openai import OpenAI
load_dotenv()
Once this section is run, you will be creating and testing the database connection. Using SQL Alchemy, a database engine is created. This engine is then added to a db variable that initializes the LangChain SQL Database Toolkit integration. The connection is then tested by printing out the results of 3 tests which include listing all tables the agent has access to and a SQL statement.
from sqlalchemy import create_engine

connectionString=os.environ["py-connectionString"]

db_engine = create_engine(connectionString)

db = SQLDatabase(db_engine, view_support=True, schema="dbo")

# test the connection
print(db.dialect)
print(db.get_usable_table_names())
db.run("select convert(varchar(25), getdate(), 120)")

The next section utilizes the LangChain SQL Database Toolkit to create a LangChain SQL agent that uses OpenAI as the LLM and the database connection.

toolkit = SQLDatabaseToolkit(db=db, llm=OpenAI(temperature=0))

agent_executor = create_sql_agent(
llm=OpenAI(temperature=0),
toolkit=toolkit,
verbose=True,
agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
)
Finally, once all the components are set up, we can execute the prompt with natural language asking the database to count the rows in our test table.
agent_executor.invoke("count the rows in the langtable table.")
with the results being similar to the following output:
> Entering new AgentExecutor chain... 
I need to use a SQL query to count the rows in the langtable table. 
Action: sql_db_query 
Action Input: SELECT COUNT(*) FROM langtable[(10,)]10 rows were counted in the langtable table. 
Final Answer: 10 
> Finished chain.
{'input': 'count the rows in the langtable table.', 'output': '10'}

Summary

You have just created your first LangChain with Azure SQL Database application using NL2SQL agents. As AI becomes more important in our development environments, being able to use structured data with an LLM is key to creating prompts that can answers the questions users ask more accurately.

2 comments

Discussion is closed. Login to edit/delete existing comments.

  • Jay Warshaw 0

    Brian, i’m trying to do the same thing but using the Azure AI Assistant and code interpreter, but there are no data connections to SQL (yet..?)any ideas if its possible and if so, how to accomplish that?

    • Brian SpendoliniMicrosoft employee 0

      We are looking into this and how best to have Azure SQL be present in AI Studio. Stay tuned!

Feedback usabilla icon