Guest Post
Valentina is a Microsoft Specialist focusing on Azure OpenAI. She has been working with Azure OpenAI and Azure SQL lately, and in this post she shares how to start using Azure OpenAI with Azure SQL or SQL Server to build next-gen solutions. You can find more of her articles on her blog: https://medium.com/@valentinaalto. Thanks, Valentina!
In January 2023, Microsoft announced the General Availability of the Azure OpenAI Service (AOAI), which allows Azure customers to access OpenAI models directly within their Azure subscription and with their own capacity.
There are four models families available at the moment:
- GPT: Generative Pre-trained Transformers are powerful generative models, best suited for understanding and generating natural language.
- Codex: a fine-tuned version of GPT, specialized in understanding and generating programming languages.
- DALL-E: a model that generates images from natural-language descriptions.
- ChatGPT: the engine behind the ChatGPT product. As of now, the ChatGPT API is available as gpt-3.5-turbo and gpt-4.
In this post, we will see how you can use these models to query your SQL tables. The idea is to generate SQL queries with the Azure OpenAI models’ API from Python code.
Prerequisites
- An Azure subscription – Create one for free.
- Access granted to Azure OpenAI in the desired Azure subscription. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access.
- An Azure OpenAI resource with a model deployed. For more information about model deployment, see the resource deployment guide.
- An Azure SQL Database – you can follow the instructions here.
- Python 3.7.1 or later versions.
- LangChain library installed (you can do so via pip install langchain)
Quickstart Demo
The first thing we want to do is import one of our SQL tables into a pandas dataframe. To do so, we can use the pyodbc library in Python, which you can easily install via pip install pyodbc. To connect to my Azure SQL DB, I used an ODBC connection. You can find the connection information under the “Connection Strings” tab of your SQL DB instance:
import pyodbc
import pandas as pd

# Connection details -- replace with your own values
server = 'servername'
database = 'AdventureWorks'
username = 'yourusername'
password = 'yourpassword'

cnxn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    f"Server=tcp:{server},1433;"
    f"Database={database};"
    f"Uid={username};"
    f"Pwd={password};"
    "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
)
cursor = cnxn.cursor()

# Select rows from the SQL table into a dataframe and show the first 10
query = "select * from [SalesLT].[Address];"
df = pd.read_sql(query, cnxn)
print(df.head(10))
The result will be something like:
Note: while creating my Azure SQL DB, I’ve also created the sample database AdventureWorks. You can use any SQL table you wish.
The next thing we need to do is initialize our Azure OpenAI model. To do so, we will use LangChain, a Python library that makes it easier to combine the power of Large Language Models (LLMs) with the logic of an application.
To initialize your model, you also need to provide the keys and endpoint for your Azure OpenAI resource (you can find them under the “Keys and Endpoint” tab in your Azure OpenAI resource), as well as the name of your deployment and model. You can find all this information in the Azure OpenAI playground when exporting your code. Click on the “View Code” link on the top right:
And then get the needed information from the sample code:
For this demo, I’ve used a text-davinci-003 model.
Let’s have a look at the code:
# Import Azure OpenAI
from langchain.llms import AzureOpenAI
import openai
import os
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_KEY"] = "xxxx"
os.environ["OPENAI_API_BASE"] = "xxxx"
os.environ["OPENAI_API_VERSION"] = "2022-12-01"
llm = AzureOpenAI(deployment_name="text-davinci-003", model_name="text-davinci-003")
Finally, we can use the pandas DataFrame agent in LangChain and start asking questions against our table:
from langchain.agents import create_pandas_dataframe_agent
agent = create_pandas_dataframe_agent(llm, df, verbose=True)
agent.run("how many rows are there?")
As you can see, the agent, powered by Azure OpenAI models, provides the right output. Thanks to the LangChain agent, it also shows its thought process and the queries it used to retrieve the answer.
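Under the hood, the pandas dataframe agent gives the model a Python REPL tool and lets it execute pandas expressions against the dataframe. A minimal sketch of the kind of expression the row-count question reduces to, using a small hypothetical dataframe in place of the Address table:

```python
import pandas as pd

# Hypothetical stand-in for the [SalesLT].[Address] dataframe
df = pd.DataFrame({
    "City": ["Bothell", "Bothell", "Portland"],
    "CountryRegion": ["United States", "United States", "United States"],
})

# For "how many rows are there?", the agent's generated code
# typically boils down to a pandas expression like this:
row_count = len(df)
print(row_count)
```

The agent's "Action Input" you see in the verbose trace is exactly this kind of one-line pandas snippet, which it then reads back to phrase the final answer.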
Let’s try other queries. For example, let’s check “what is the frequency of each country region”:
Another interesting query – just to check how “clever” the AI is – is to ask for the distinct values of the city names available in our database:
Or ask for the count of unique items – states and provinces in this sample:
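Each of these questions maps to a simple pandas operation. A sketch of what the agent effectively computes, again using a hypothetical mini-dataframe standing in for the Address table:

```python
import pandas as pd

# Hypothetical mini version of the Address table
df = pd.DataFrame({
    "City": ["Bothell", "Bothell", "Portland", "Toronto"],
    "StateProvince": ["Washington", "Washington", "Oregon", "Ontario"],
    "CountryRegion": ["United States", "United States", "United States", "Canada"],
})

# "what is the frequency of each country region"
freq = df["CountryRegion"].value_counts()
print(freq)

# "distinct values of city names"
cities = df["City"].unique()
print(cities)

# "count of unique states and provinces"
n_states = df["StateProvince"].nunique()
print(n_states)
```

Seeing these equivalents makes the verbose agent traces easier to read: the "Action Input" lines are usually one of these expressions.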
Azure OpenAI models are powerful generative AI tools that can be infused into custom applications, thanks to their APIs. LangChain makes those APIs easier to manage and allows for richer interaction with context, using any data stored in Azure SQL tables.
Try this
from dotenv import dotenv_values
import openai

config = dotenv_values(".env")
openai.api_type = config["OPENAI_API_TYPE"]
openai.api_base = config["OPENAI_API_BASE"]
openai.api_key = config["OPENAI_API_KEY"]
openai.api_version = config["OPENAI_API_VERSION"]

from langchain.llms import AzureOpenAI
from langchain import SQLDatabase, SQLDatabaseChain

mysql = SQLDatabase.from_uri("mysql+pymysql://zhouboyang:*****@172.16.0.6:3306/students")
llm = AzureOpenAI(temperature=0, openai_api_key=openai.api_key, deployment_name='dv')
db_chain = SQLDatabaseChain(llm=llm, database=mysql, verbose=True)
db_chain.run('how many students are there?')
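The snippet above targets MySQL; for the Azure SQL database used in the article, SQLDatabase.from_uri would take an mssql+pyodbc connection URI instead. A sketch of building such a URI (server, user, password, and database names are placeholders):

```python
from urllib.parse import quote_plus

# Placeholder credentials -- replace with your own
username = "yourusername"
password = "yourpassword"
server = "yourserver.database.windows.net"
database = "AdventureWorks"

# SQLAlchemy-style URI for the mssql+pyodbc dialect; quote_plus
# escapes any special characters in the password
uri = (
    f"mssql+pyodbc://{username}:{quote_plus(password)}@{server}:1433/{database}"
    "?driver=ODBC+Driver+18+for+SQL+Server"
)
print(uri)

# Then, as in the MySQL example:
# db = SQLDatabase.from_uri(uri)
# db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)
```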
Hi,
I’m trying to use LangChain’s AzureOpenAI as below but getting this error.
Do you know how I can fix this?
Setting the environment variables before importing langchain seems to have fixed this for me –
Hi Hui,
I've run into the issue as well, and I successfully tackled it by creating a virtual environment.
D:\Users\2303906\gitApps\20_Build-Your-Own-AutoGPT-Apps-with-LangChain> python -m venv env
D:\Users\2303906\gitApps\20_Build-Your-Own-AutoGPT-Apps-with-LangChain> env\Scripts\Activate.ps1
(env) PS D:\Users\2303906\gitApps\20_Build-Your-Own-AutoGPT-Apps-with-LangChain> python .\app.py
=> Rome.
After the env was set up, my app could read my Azure API key. I am using PowerShell, so the command line might differ for Bash, Linux, etc.
Other way of creating the virtual environment:
2. Create a Python environment
Python 3.6 or higher using venv or conda.
Using venv:
cd langchain-experiments
python3 -m venv env
source env/bin/activate
Using conda:
cd...
How to get the URL details for the OPENAI_API_BASE?
Hi Muthu,
1. Go to Azure Portal
2. Select Azure OpenAI Service you created
3. On the left hand panel, under “Resource Management”, select “Keys and Endpoint”
4. You will find the endpoint: “https://xxxxx.openai.azure.com/”
5. Replace the xxxxx part with the name of your service
6. For example: “https://aoaino1.openai.azure.com/”
llm = AzureOpenAI(
    openai_api_type="azure",
    deployment_name="text-davinci-003",
    openai_api_key=os.environ["OPENAI_API_KEY"])
The above should resolve your problem.
Same here. Anyone able to resolve this InvalidRequestError?
I am also receiving this exact error.
I’ve confirmed that the base URL is correct and is usable for chat completion.
Very interested in a fix for this as well.
Really nice article Valentina. One thing I’ve been wondering is what happens “under the hood”. Is create_pandas_dataframe_agent splitting up the data and creating embeddings? How does one control memory when using this agent? Will this work for large dataframes? Isn’t there going to be a significant cost associated with chatting with a large dataset? How does this compare to just entering the data in chunks in a regular Chat Agent? Can one use chat templates with this agent?