April 14th, 2023

Make data more human with Azure OpenAI and Azure SQL

Valentina Alto
Azure Specialist Data&AI

Guest Post

Valentina is a Microsoft Specialist, focusing on Azure OpenAI. She has been working with Azure OpenAI and Azure SQL lately and in this post she’s sharing how to start using Azure OpenAI and Azure SQL or SQL Server to build next-gen solutions. You can find more of her articles in her blog: https://medium.com/@valentinaalto. Thanks Valentina!

In January 2023, Microsoft announced the General Availability of the Azure OpenAI Service (AOAI), which allows Azure customers to access OpenAI models directly within their Azure subscription and with their own capacity.

Image Screenshot 2023 04 05 203209

There are four models families available at the moment:

  • GPT: Generative Pre-trained Transformers are powerful generative models which are best suited for understanding and generating natural language.
  • Codex: those are a fine-tuned version of GPT which has a specific verticalization on understanding and generating programming languages.
  • DALL-E: this is the model which is able to generate images starting from natural language.
  • ChatGPT: this is the engine behind ChatGPT. As for now, the ChatGPT API is available as GPT-3.5-turbo and GPT-4.

 

Image Screenshot 2023 04 05 203817

In this post, we will see how you can use those models to query your SQL tables. The idea is that of generating SQL queries using Azure OpenAI models’ API with Python code.

Prerequisites

  • An Azure subscription – Create one for free.
  • Access granted to Azure OpenAI in the desired Azure subscription. You can apply for access to Azure OpenAI by completing the form at https://aka.ms/oai/access.
  • An Azure OpenAI resource with a model deployed. For more information about model deployment, see the resource deployment guide.
  • An Azure SQL Database – you can follow the instructions here.
  • Python 3.7.1 or later versions.
  • LangChain library installed (you can do so via pip install langchain)

Quickstart Demo

The first thing we want to do is import one of our SQL tables into a pandas dataframe. To do so, we can use the pyodbc library in Python, which you can easily install via pip install pyodc. To connect with my Azure SQL DB, I used an ODBC connection. You can find the information endpoints under the “Connection Strings” tab of your SQL DB instance:

Image Screenshot 2023 04 05 210810

import pyodbc
import pandas as pd
server = 'servername' 
database = 'AdventureWorks' 
username = 'yourusername' 
password = 'databasename' 

cnxn = pyodbc.connect("Driver={ODBC Driver 18 for SQL Server};Server=tcp:server,1433;Database=database;Uid=username;Pwd=password;Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;")
cursor = cnxn.cursor()
# select 10 rows from SQL table to insert in dataframe.
query = "select * from [SalesLT].[Address];"
df = pd.read_sql(query, cnxn)
print(df.head(10))

The result will be something like:

Image Screenshot 2023 04 06 085152

Note: while creating my Azure SQL DB, I’ve also created the sample database AdventureWorks. You can use any SQL table you wish.

The next thing we need to do is initialize our Azure OpenAI model. To do so, we will use LangChain, a Python library that makes it easier to combine the power of Large Language Models (LLMs) with the logic of an application.

To initialize your model, you also need to provide your keys and endpoint for your Azure OpenAI models’ (you can find them under the “Keys and Endpoints” tab in your Azure OpenAI resource) as well as the name of your deployment and model. You can find all this information from the Azure OpenAI playground when exporting your code. Click on the  “View Code” link on the top right:

Image Screenshot 2023 04 05 211525

And then get the needed informations from the sample code:

Image Screenshot 2023 04 05 211602

For this demo, I’ve used a text-davinci-003 model.

Let’s have a look at the code:

# Import Azure OpenAI
from langchain.llms import AzureOpenAI
import openai
import os
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_KEY"] = "xxxx"
os.environ["OPENAI_API_BASE"] = "xxxx"
os.environ["OPENAI_API_VERSION"] = "2022-12-01"

llm = AzureOpenAI(deployment_name="text-davinci-003", model_name="text-davinci-003") 

Finally, we can use the pandas DataFrame agent in LangChain and start asking questions against our table:

from langchain.agents import create_pandas_dataframe_agent
agent = create_pandas_dataframe_agent(llm, df, verbose=True)
agent.run("how many rows are there?")

Image Screenshot 2023 04 06 085801

As you can see, the agent, powered by Azure OpenAI models, is able to provide the right output. Also, thanks to the LangChain Agent, it is also able to show which is the thought process behind and which are the queries that it used to retrieve the answer.

Let’s try with other queries. For example, let’s check “what is the frequency of each country region”:

Image Screenshot 2023 04 06 091557

Another interesting query – just to check how “clever” the AI is, is to ask for the distinct values of city names available in our database:

Image Screenshot 2023 04 06 091331

Or maybe ask for the count the unique items. State and Provinces in this sample:

Image Screenshot 2023 04 06 091231

Azure OpenAI models are powerful generative AI tools that can be infused into custom applications, thanks to their APIs. With LangChain, the framework to manage those APIs is easier and allows for better interaction with the context, using any data stored in Azure SQL tables.

Category
Azure SQL

Author

Valentina Alto
Azure Specialist Data&AI

After an academic journey focused on Data Science, I started my career as an Azure Specialist with a strong passion for and focus on Machine Learning and AI in all their forms. I love to envision together with customers the limitless possibilities AI developments are enabling in the market.

10 comments

Discussion is closed. Login to edit/delete existing comments.

  • Boyang ZhouMicrosoft employee

    Try this

    from dotenv import dotenv_values
    import openai
    config = dotenv_values(“.env”)
    openai.api_type=config[“OPENAI_API_TYPE”]
    openai.api_base=config[“OPENAI_API_BASE”]
    openai.api_key=config[“OPENAI_API_KEY”]
    openai.api_version=config[“OPENAI_API_VERSION”]
    from langchain.llms import AzureOpenAI 
    from langchain import SQLDatabase,SQLDatabaseChain

    mysql=SQLDatabase.from_uri(“mysql+pymysql://zhouboyang:*****@172.16.0.6:3306/students”)
    llm=AzureOpenAI(temperature=0,openai_api_key=openai.api_key,deployment_name=’dv’)
    db_chain=SQLDatabaseChain(llm=llm,database=mysql,verbose=True)
    db_chain.run(‘how many students are there ?’)

  • Hui Sheng

    Hi,

    I’m trying to use LangChain’s AzureOpenAI as below but getting this error.
    Do you know how can I fix this?

    openai.error.InvalidRequestError: Resource not found
    # Import Azure OpenAI
    from langchain.llms import AzureOpenAI
    import openai
    import os
    
    os.environ["OPENAI_API_TYPE"] = "azure"
    os.environ["OPENAI_API_KEY"] = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    os.environ["OPENAI_API_BASE"] = "https://XXXXXX-openai.openai.azure.com/"
    os.environ["OPENAI_API_VERSION"] = "2022-12-01"
    
    llm = AzureOpenAI(
        openai_api_type="azure",
        deployment_name="text-davinci-003", 
        model_name="text-davinci-003") 
    
    print(llm("hi"))
    • Paul Lopez · Edited

      Setting the environment variables before the importing langchain seems to have fixed this for me –

      import os
      
      os.environ["OPENAI_API_TYPE"] = "azure"
      os.environ["OPENAI_API_KEY"] = "xxxxxxxxxxxxxxxxxxxxxxx"
      os.environ["OPENAI_API_BASE"] = "https://xxxxxxxxxxxx.openai.azure.com/"
      os.environ["OPENAI_API_VERSION"] = "2022-12-01"
      
      import pyodbc 
      import pandas as pd
      from langchain.llms import AzureOpenAI
      import openai
    • Lien, Kuanchieh

      Hi Hui,

      I've met the issue as well, and I've been successfully tackled the issue by creating a virtual environment.

      D:\Users\2303906\gitApps\20_Build-Your-Own-AutoGPT-Apps-with-LangChain> python -m venv env 23.1.2
      D:\Users\2303906\gitApps\20_Build-Your-Own-AutoGPT-Apps-with-LangChain> env\Scripts\Activate.ps1
      (env) PS D:\Users\2303906\gitApps\20_Build-Your-Own-AutoGPT-Apps-with-LangChain> python .\app.py
      => Rome.

      After the env was set up, my app could read my Azure API Key. I am using powershell, so commnad line might differ from BASH, Linux...etc.

      Other way of creating the virtual environment:

      2. Create a Python environment
      Python 3.6 or higher using venv or conda.

      Using venv:
      cd langchain-experiments
      python3 -m venv env
      source env/bin/activate

      Using conda:
      cd...

      Read more
    • Muthu Kasi

      How to get the URL details for the OPENAI_API_BASE?

      • Lien, Kuanchieh

        Hi Muthu,

        1. Go to Azure Portal
        2. Select Azure OpenAI Service you created
        3. On the left hand panel, under “Resource Management”, select “Keys and Endpoint”
        4. You will find the endpoint: “https://xxxxx.openai.azure.com/”
        5. Replace the xxxxx part with the name of your service
        6. For example: “https://aoaino1.openai.azure.com/”

    • Boyang ZhouMicrosoft employee

      llm = AzureOpenAI(
      openai_api_type=”azure”,
      deployment_name=”text-davinci-003″,
      openai_api_key=os.environ[“OPENAI_API_KEY”])

      Like above good resolve your problem

      • Dwarakanath TarabanahalliMicrosoft employee

        Same here. Anyone able to resolve this InvalidRequestError?

    • Chris Miller

      I am also recieving this exact error.
      I’ve confirmed that the base URL is correct and is usable for chat completion.
      Very interested in a fix for this as well.

  • David Comfort

    Really nice article Valentina. One thing I’ve been wondering is what happens “under-the-hood”? Is create_pandas_dataframe_agent splitting up the data and create embeddings? How does one control memory when using this agent? Will this work for large dataframes? Isn’t there going to be a significant cost associated with chatting with a large dataset? How does this compare to just entering the data in chunks in a regular Chat Agent? Can one use chat templates using this agent?