Vanna.ai and Azure SQL Database

Brian Spendolini

Vanna.ai and Microsoft SQL

Vanna.ai, in partnership with Azure SQL Database Product Management, bring you a collaboration for using Vanna’s natural language to SQL (NL2SQL) agent with the Azure SQL Database.

How does Vanna.ai work?

First, Vanna is an open-source Python RAG (Retrieval-Augmented Generation) framework for SQL generation/natural language to SQL (NL2SQL) and provides a fully hosted platform for application development. Get started for free here.

(From the documentation)

At its core, Vanna is a Python package that uses retrieval augmentation to help you generate accurate SQL queries for your database using LLMs.

How Vanna Works

Vanna works in two easy steps – train a RAG “model” on your data, and then ask questions which will return SQL queries that can be set up to automatically run on your database.

BYOLLM

Already have an LLM and want to bring your own? Or have a vector DB for storage? Or both? No problem! Easily configure an environment here and out comes the python notebook needed to get started (Vanna has created notebooks for EVERY combination!).

Choosing a database, LLM and vector DB on vanna.ai

Vanna.ai has always been simple to use, but now, with the addition of Microsoft SQL Server/Azure SQL as a database connecter, it’s even easier.

Why use AI to chat with your database in the first place? 

To start, it allows you spend less time writing SQL and more time generating insights on your data. You no longer have to be an expert in SQL to get information from the database. And if you are an expert, save some time writing a query when you can just ask for it. In fact, everyone benefits by being able to ask the database about business questions and getting the answer; have Vanna find the relevant tables and columns across your data estate.

And why Vanna? 

With it’s retrieval framework and feedback loop, allows you to quickly train up an “AI Analyst” in much the same way that you would onboard a human analyst by supervising it. While using it, you can give instant feedback if the generated SQL was correct and have those answers help train the AI for future questions. And as will be seen in the blog, Vanna can be used as a standalone web app, in Jupyter, as part of an embedded widget, or as part of a MS Teams bot. See examples of this at https://github.com/vanna-ai.

Vanna x SQL x Codespace

This blog post will go over creating a NL2SQL agent using Vanna.ai and SQL Server/Azure SQL using a python notebook and a GitHub codespace. The notebook will be based off of the one hosted by Vanna here which uses Vanna’s hosted LLM and vector database with SQL Server/Azure SQL. The codespace will also be using a local free instance of SQL Server using docker in docker loaded with the Adventure Works sample data.

Get started by:

  1. Going to this GitHub repository
  2. Clicking on the green Code button
  3. Clicking the 3 dots
  4. Selecting + New with options.

Image Screenshot 2024 03 15 at 9 20 23 AM

Then on the following page:

  1. Click the Default project configuration button
  2. Select Python (we will be adding many more configurations as we add to this repository)
  3. Click the green Create codespace button

Image Screenshot 2024 03 15 at 9 22 47 AM

Once the codespace has started, use the terminal to run the following code to install sqlcmd:

curl https://packages.microsoft.com/keys/microsoft.asc | sudo tee /etc/apt/trusted.gpg.d/microsoft.asc
sudo add-apt-repository "$(wget -qO- https://packages.microsoft.com/config/ubuntu/20.04/prod.list)"
sudo apt-get update
sudo apt-get install sqlcmd

then, create the database with the following command:

sqlcmd create mssql --accept-eula --using https://aka.ms/AdventureWorksLT.bak

Image Screenshot 2024 03 15 at 10 00 09 AM

and once the database is finished creating, the connection string for ODBC will be needed for the notebook, run the following command to get the connection string:

sqlcmd config connection-strings

Image Screenshot 2024 03 15 at 9 59 46 AM

sss

Security Check


In the connect string, there is TrustServerCertificate=yes. For local development environments, this is OK but NEVER use this setting in live/production environments. More on this here

One of the files included in this directory is the odbcDriverInstallUbuntu.txt file. This file contains the commands for installing the Microsoft ODBC driver for SQL Server on Ubuntu (Linux). Run the contents of this file in the terminal to install the driver. Once installed, run the following command for checking the version to ensure it has been installed successfully:

more /etc/odbcinst.ini

Image Screenshot 2024 03 15 at 10 18 46 AM

Open your notebook to…

Now that the database is created with the Adventure Works data, open the vanna_and_sql.ipynb notebook in the code editor by clicking on it.

Image Screenshot 2024 03 15 at 10 22 18 AM

The first Setup step installs the vanna and pyodbc python libraries to be used in the notebook. Click on the Execute Cell button to run this command.

Image Screenshot 2024 03 15 at 10 24 50 AM

Choose a Kernel


You may see a dialog box asking you to choose a Python Kernal for the notebook. Choose python environment for the first choice and then the python version installed in this codespace.

Next, execute the following cell that prepares the libraries to be used for the python commands that follow:

Image Screenshot 2024 03 15 at 10 28 13 AM

The last cell in the Setup section of the notebook sets the Vanna API key and model name to be used for the Vanna hosted LLM and vector database. Remember, you can sign up for these free services here. Once you enter this information, execute the cell with the run button.

Image Screenshot 2024 03 15 at 10 31 10 AM

The next step in the notebook defines the database connection. Vanna has help out MS SQL/Azure SQL users here with vn.connect_to_mssql. All that needs to be passed to this function is the ODBC string that was shown previously by running sqlcmd config connection-strings. Copy and paste the ODBC connection string into this cell of the notebook then run it.

Image Screenshot 2024 03 15 at 10 41 18 AM

Training Grounds

Moving to the training section, if the previous sections were filled out correctly, executing this cell should result in seeing all the tables in the database that the user in the connection string allows (which should be all of Adventure Works tables and views).

Image Screenshot 2024 03 15 at 10 43 03 AM

The next cell uses some training to remind the NL2SQL agent to use tables with their schema names. As before, execute this cell:

Image Screenshot 2024 03 15 at 10 55 49 AM

Free Apps!

Part of the Vanna python package is a flask app that can be used as a chat session. Execute this cell to start the flask application:

Image Screenshot 2024 03 15 at 10 59 45 AM

Once the application has started, a dialog box will appear at the bottom of the code space:

Image Screenshot 2024 03 15 at 11 00 25 AM

Click the Open in Browser button to try out the application.

Using the chat box, ask the application a question about the Adventure Work data. For example, “What are the top 10 products sold and their colors? Format any currency columns as dollars.”.

The result is the SQL query itself, a table with the data as asked:

Image Screenshot 2024 03 15 at 11 05 24 AM

as well as a nice chart!

Image Screenshot 2024 03 15 at 11 07 28 AM

If the agent doesn’t add the schema to the table name and you get an error, just give it a quick reminder in the question. Ask “What are the top 10 products sold and their colors? Format any currency columns as dollars. Use the schema and table name”

Wrap-up

Again, a huge thank you to Vanna for adding the simple, one line of code function for accessing SQL databases. It makes an already easy to use NL2SQL tool even easier (if that was possible!). The notebook sample is in the Azure SQL AI Samples repository here for anyone to try out and start using with their applications. As always, tell us how it goes, and we appreciate any feedback you may have.

Resources

0 comments

Leave a comment

Feedback usabilla icon