Guest Post
Valentina is a Microsoft Specialist focusing on Azure OpenAI. She has been working with Azure OpenAI and Azure SQL lately, and in this post she shares how to start using Azure OpenAI with Azure SQL or SQL Server to build next-gen solutions. You can find more of her articles on her blog: https://medium.com/@valentinaalto. Thanks, Valentina!
In January 2023, Microsoft announced the General Availability of the Azure OpenAI Service (AOAI), which allows Azure customers to access OpenAI models directly within their Azure subscription and with their own capacity.
There are four models families available at the moment:
- GPT: Generative Pre-trained Transformers are powerful generative models, best suited for understanding and generating natural language.
- Codex: a fine-tuned version of GPT, specialized in understanding and generating programming languages.
- DALL-E: a model that generates images from natural-language descriptions.
- ChatGPT: the engine behind the ChatGPT product. As of now, the ChatGPT API is available as gpt-3.5-turbo and gpt-4.
In this post, we will see how you can use these models to query your SQL tables. The idea is to generate SQL queries with the Azure OpenAI models’ API from Python code.
Prerequisites
- An Azure subscription – Create one for free.
- Access granted to Azure OpenAI in the desired Azure subscription. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access.
- An Azure OpenAI resource with a model deployed. For more information about model deployment, see the resource deployment guide.
- An Azure SQL Database – you can follow the instructions here.
- Python 3.7.1 or later versions.
- LangChain library installed (you can do so via pip install langchain)
Quickstart Demo
The first thing we want to do is import one of our SQL tables into a pandas dataframe. To do so, we can use the pyodbc library in Python, which you can easily install via pip install pyodbc. To connect to my Azure SQL DB, I used an ODBC connection. You can find the connection information under the “Connection Strings” tab of your SQL DB instance:
import pyodbc
import pandas as pd

# Connection details -- replace with your own values
server = 'servername'
database = 'AdventureWorks'
username = 'yourusername'
password = 'yourpassword'

cnxn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    f"Server=tcp:{server},1433;"
    f"Database={database};"
    f"Uid={username};"
    f"Pwd={password};"
    "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
)
cursor = cnxn.cursor()

# Select rows from the SQL table into a dataframe and show the first 10
query = "select * from [SalesLT].[Address];"
df = pd.read_sql(query, cnxn)
print(df.head(10))
The result will be something like:
Note: while creating my Azure SQL DB, I’ve also created the sample database AdventureWorks. You can use any SQL table you wish.
The next thing we need to do is initialize our Azure OpenAI model. To do so, we will use LangChain, a Python library that makes it easier to combine the power of Large Language Models (LLMs) with the logic of an application.
To initialize your model, you also need to provide the keys and endpoint for your Azure OpenAI resource (you can find them under the “Keys and Endpoint” tab in your Azure OpenAI resource), as well as the name of your deployment and model. You can find all this information in the Azure OpenAI playground when exporting your code. Click on the “View Code” link on the top right:
And then get the needed information from the sample code:
For this demo, I’ve used a text-davinci-003 model.
Let’s have a look at the code:
# Import Azure OpenAI
from langchain.llms import AzureOpenAI
import openai
import os
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_KEY"] = "xxxx"
os.environ["OPENAI_API_BASE"] = "xxxx"
os.environ["OPENAI_API_VERSION"] = "2022-12-01"
llm = AzureOpenAI(deployment_name="text-davinci-003", model_name="text-davinci-003")
Finally, we can use the pandas DataFrame agent in LangChain and start asking questions against our table:
from langchain.agents import create_pandas_dataframe_agent
agent = create_pandas_dataframe_agent(llm, df, verbose=True)
agent.run("how many rows are there?")
As you can see, the agent, powered by Azure OpenAI models, provides the right output. Thanks to the LangChain agent, it also shows its thought process and the queries it used to retrieve the answer.
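Under the hood, the pandas dataframe agent gives the model a Python REPL tool and lets it execute pandas expressions against the dataframe. A minimal sketch of the kind of expression the row-count question reduces to, using a small hypothetical dataframe in place of the Address table:

```python
import pandas as pd

# Hypothetical stand-in for the [SalesLT].[Address] dataframe
df = pd.DataFrame({
    "City": ["Bothell", "Bothell", "Portland"],
    "CountryRegion": ["United States", "United States", "United States"],
})

# For "how many rows are there?", the agent's generated code
# typically boils down to a pandas expression like this:
row_count = len(df)
print(row_count)
```

The agent's "Action Input" you see in the verbose trace is exactly this kind of one-line pandas snippet, which it then reads back to phrase the final answer.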
Let’s try other queries. For example, let’s check “what is the frequency of each country region”:
Another interesting query – just to check how “clever” the AI is – is to ask for the distinct values of the city names available in our database:
Or ask for the count of unique items – states and provinces in this sample:
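Each of these questions maps to a simple pandas operation. A sketch of what the agent effectively computes, again using a hypothetical mini-dataframe standing in for the Address table:

```python
import pandas as pd

# Hypothetical mini version of the Address table
df = pd.DataFrame({
    "City": ["Bothell", "Bothell", "Portland", "Toronto"],
    "StateProvince": ["Washington", "Washington", "Oregon", "Ontario"],
    "CountryRegion": ["United States", "United States", "United States", "Canada"],
})

# "what is the frequency of each country region"
freq = df["CountryRegion"].value_counts()
print(freq)

# "distinct values of city names"
cities = df["City"].unique()
print(cities)

# "count of unique states and provinces"
n_states = df["StateProvince"].nunique()
print(n_states)
```

Seeing these equivalents makes the verbose agent traces easier to read: the "Action Input" lines are usually one of these expressions.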
Azure OpenAI models are powerful generative AI tools that can be infused into custom applications, thanks to their APIs. LangChain makes those APIs easier to manage and allows for richer interaction with context, using any data stored in Azure SQL tables.
Try this
from dotenv import dotenv_values
import openai

config = dotenv_values(".env")
openai.api_type = config["OPENAI_API_TYPE"]
openai.api_base = config["OPENAI_API_BASE"]
openai.api_key = config["OPENAI_API_KEY"]
openai.api_version = config["OPENAI_API_VERSION"]

from langchain.llms import AzureOpenAI
from langchain import SQLDatabase, SQLDatabaseChain

mysql = SQLDatabase.from_uri("mysql+pymysql://zhouboyang:*****@172.16.0.6:3306/students")
llm = AzureOpenAI(temperature=0, openai_api_key=openai.api_key, deployment_name='dv')
db_chain = SQLDatabaseChain(llm=llm, database=mysql, verbose=True)
db_chain.run('how many students are there?')
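The snippet above targets MySQL; for the Azure SQL database used in the article, SQLDatabase.from_uri would take an mssql+pyodbc connection URI instead. A sketch of building such a URI (server, user, password, and database names are placeholders):

```python
from urllib.parse import quote_plus

# Placeholder credentials -- replace with your own
username = "yourusername"
password = "yourpassword"
server = "yourserver.database.windows.net"
database = "AdventureWorks"

# SQLAlchemy-style URI for the mssql+pyodbc dialect; quote_plus
# escapes any special characters in the password
uri = (
    f"mssql+pyodbc://{username}:{quote_plus(password)}@{server}:1433/{database}"
    "?driver=ODBC+Driver+18+for+SQL+Server"
)
print(uri)

# Then, as in the MySQL example:
# db = SQLDatabase.from_uri(uri)
# db_chain = SQLDatabaseChain(llm=llm, database=db, verbose=True)
```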
Hi,
I’m trying to use LangChain’s AzureOpenAI as below but getting this error.
Do you know how I can fix this?
Setting the environment variables before importing langchain seems to have fixed this for me –
Hi Hui,
I've run into the issue as well, and I successfully tackled it by creating a virtual environment.
D:\Users\2303906\gitApps\20_Build-Your-Own-AutoGPT-Apps-with-LangChain> python -m venv env
D:\Users\2303906\gitApps\20_Build-Your-Own-AutoGPT-Apps-with-LangChain> env\Scripts\Activate.ps1
(env) PS D:\Users\2303906\gitApps\20_Build-Your-Own-AutoGPT-Apps-with-LangChain> python .\app.py
=> Rome.
After the env was set up, my app could read my Azure API key. I am using PowerShell, so the command line might differ for Bash, Linux, etc.
Other way of creating the virtual environment:
2. Create a Python environment
Python 3.6 or higher using venv or conda.
Using venv:
cd langchain-experiments
python3 -m venv env
source env/bin/activate
Using conda:
cd...
How to get the URL details for the OPENAI_API_BASE?
Hi Muthu,
1. Go to Azure Portal
2. Select Azure OpenAI Service you created
3. On the left hand panel, under “Resource Management”, select “Keys and Endpoint”
4. You will find the endpoint: “https://xxxxx.openai.azure.com/”
5. Replace the xxxxx part with the name of your service
6. For example: “https://aoaino1.openai.azure.com/”
llm = AzureOpenAI(
    openai_api_type="azure",
    deployment_name="text-davinci-003",
    openai_api_key=os.environ["OPENAI_API_KEY"])
The above should resolve your problem.
Same here. Anyone able to resolve this InvalidRequestError?
I am also receiving this exact error.
I’ve confirmed that the base URL is correct and is usable for chat completion.
Very interested in a fix for this as well.
Really nice article Valentina. One thing I’ve been wondering is what happens “under the hood”. Is create_pandas_dataframe_agent splitting up the data and creating embeddings? How does one control memory when using this agent? Will this work for large dataframes? Isn’t there going to be a significant cost associated with chatting with a large dataset? How does this compare to just entering the data in chunks in a regular Chat Agent? Can one use chat templates with this agent?