{"id":2290,"date":"2023-08-25T12:42:35","date_gmt":"2023-08-25T19:42:35","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/azure-sql\/?p=2290"},"modified":"2023-08-25T12:48:00","modified_gmt":"2023-08-25T19:48:00","slug":"building-your-own-db-copilot-for-azure-sql-with-azure-openai-gpt-4","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/azure-sql\/building-your-own-db-copilot-for-azure-sql-with-azure-openai-gpt-4\/","title":{"rendered":"Building your own DB Copilot for Azure\u00a0SQL with Azure OpenAI GPT-4"},"content":{"rendered":"<p><img decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*pMgvjfupl9VypmPDVRFnXw.png\" \/><\/p>\n<p class=\"graf graf--p\" style=\"text-align: justify;\">Large Language Models (LLMs) have shown exceptional capabilities in chatting with unstructured text data. But what if we want to interact with structured data?<\/p>\n<p class=\"graf graf--p\" style=\"text-align: justify;\">To do so, we will need to build an LLM-powered Agent that is able to reason over structured data, in our specific case an Azure SQL DB. Luckily for us, we don\u2019t have to build this connection from scratch.<\/p>\n<p class=\"graf graf--p\" style=\"text-align: justify;\">LangChain, a lightweight library which makes it easy to embed and orchestrate LLMs within applications, provides a set of toolkits that make Agents capable of connecting to and executing actions over external tools. In my latest article, we\u2019ve seen how, among the toolkits LangChain provides out of the box, LLM-powered agents are able to connect to and use Azure Cognitive Services skills. 
In this article, we will use the same framework to connect to an Azure SQL DB, into which we will upload the Titanic dataset (you can download it from Kaggle <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/www.kaggle.com\/c\/titanic\/data\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.kaggle.com\/c\/titanic\/data\">here<\/a>).<\/p>\n<h4 class=\"graf graf--h4\">Loading Data Into an Azure\u00a0SQL Database<\/h4>\n<p class=\"graf graf--p\" style=\"text-align: justify;\">The first thing we need is our structured DB. For this purpose, I will be using an Azure SQL Database, a fully managed DB engine running on Azure. To create your Azure SQL DB, you can follow the tutorial <a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/azure-sql\/database\/single-database-create-quickstart?view=azuresql&amp;tabs=azure-portal\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/azure-sql\/database\/single-database-create-quickstart?view=azuresql&amp;tabs=azure-portal\">here<\/a>.<\/p>\n<p class=\"graf graf--p\" style=\"text-align: justify;\">Once we have a DB up and running, we need to populate it with our data. To do so, I will use Python to connect to my DB via ODBC and push the data into it. 
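<\/p>\n<p class=\"graf graf--p\" style=\"text-align: justify;\">The pyodbc connection we will open below is driven by a single connection string that combines the driver name with the server, port, database and credentials. As a minimal sketch of how that string is assembled (the values here are placeholders, not real credentials):<\/p>

```python
import os

# Placeholder values for illustration only; in practice, take these from the
# Azure SQL DB pane in the Azure Portal
os.environ["SQL_SERVER_USERNAME"] = "username"
os.environ["SQL_SERVER_ENDPOINT"] = "endpoint"
os.environ["SQL_SERVER_PASSWORD"] = "password"
os.environ["SQL_SERVER_DATABASE"] = "database"

driver = "{ODBC Driver 18 for SQL Server}"
server = os.environ["SQL_SERVER_ENDPOINT"]
database = os.environ["SQL_SERVER_DATABASE"]
username = os.environ["SQL_SERVER_USERNAME"]
password = os.environ["SQL_SERVER_PASSWORD"]

# The string pyodbc.connect() expects: driver, network location, credentials
conn_str = f"DRIVER={driver};SERVER={server};PORT=1433;DATABASE={database};UID={username};PWD={password}"
print(conn_str)
```
<p class=\"graf graf--p\" style=\"text-align: justify;\">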
Thus, we first need to gather our Azure SQL DB credentials and set them as environment variables:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">import os  \r\n  \r\nos.environ[\"SQL_SERVER_USERNAME\"] = \"username\" \r\nos.environ[\"SQL_SERVER_ENDPOINT\"] = \"endpoint\"\r\nos.environ[\"SQL_SERVER_PASSWORD\"] = \"password\"  \r\nos.environ[\"SQL_SERVER_DATABASE\"] = \"database\"<\/code><\/pre>\n<p class=\"graf graf--p\">You can retrieve this information from the Azure SQL DB pane within the Azure Portal, under the tab \u201cconnection\u201d:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*MVCd24QcFoR5X2QLAjGjzg.png\" \/><\/p>\n<p class=\"graf graf--p\">Next, let\u2019s load our csv file (which I\u2019ve saved into the \u2018data\u2019 folder) into a pandas dataframe:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">import pandas as pd\r\ntitanic = pd.read_csv(\"data\/titanic.csv\").fillna(value=0)<\/code><\/pre>\n<p class=\"graf graf--p\">Finally, let\u2019s insert our dataframe into the Azure SQL DB using the pyodbc library for connectivity (building the connection string from the environment variables set before).<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\"># Connect to the Azure SQL DB via ODBC\r\nimport pyodbc\r\n\r\ndriver = '{ODBC Driver 18 for SQL Server}'\r\nserver = os.environ[\"SQL_SERVER_ENDPOINT\"]\r\ndatabase = os.environ[\"SQL_SERVER_DATABASE\"]\r\nusername = os.environ[\"SQL_SERVER_USERNAME\"]\r\npassword = os.environ[\"SQL_SERVER_PASSWORD\"]\r\n\r\ncnxn = pyodbc.connect(f'DRIVER={driver};SERVER={server};PORT=1433;DATABASE={database};UID={username};PWD={password}')  \r\ncursor = cnxn.cursor()  \r\n\r\n# Define the create table statement\r\ncreate_table_query = \"\"\"\r\nCREATE TABLE Titanic (\r\n    PassengerId int PRIMARY KEY,\r\n    Survived bit NOT NULL,\r\n    Pclass int NOT NULL,\r\n    Name varchar(100) NOT NULL,\r\n    Sex varchar(10) NOT NULL,\r\n    Age float NULL,\r\n    SibSp int NOT NULL,\r\n    Parch int NOT NULL,\r\n    Ticket varchar(20) NOT NULL,\r\n    Fare float NOT NULL,\r\n    Cabin varchar(20) NULL,\r\n    Embarked char(1) NULL\r\n);\r\n\"\"\"\r\n\r\n# Execute the statement using 
cursor.execute\r\ncursor.execute(create_table_query)\r\n\r\n# Insert the dataframe into the table row by row\r\nfor index, row in titanic.iterrows():\r\n    cursor.execute(\r\n        \"INSERT INTO dbo.[Titanic] (PassengerId, Survived, Pclass, Name, Sex, Age, SibSp, Parch, Ticket, Fare, Cabin, Embarked) values(?,?,?,?,?,?,?,?,?,?,?,?)\",\r\n        row.PassengerId,\r\n        row.Survived,\r\n        row.Pclass,\r\n        row.Name,\r\n        row.Sex,\r\n        row.Age,\r\n        row.SibSp,\r\n        row.Parch,\r\n        row.Ticket,\r\n        row.Fare,\r\n        row.Cabin,\r\n        row.Embarked\r\n    )\r\ncnxn.commit()\r\ncursor.close()<\/code><\/pre>\n<p class=\"graf graf--p\">Let\u2019s see whether our dataframe has been uploaded correctly:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">cnxn = pyodbc.connect(f'DRIVER={driver};SERVER={server};PORT=1433;DATABASE={database};UID={username};PWD={password}')  \r\ncursor = cnxn.cursor()  \r\ncursor.execute(\"select * from dbo.[Titanic]\")\r\nresults = cursor.fetchall()\r\n\r\nfor row in results:\r\n    print(row)<\/code><\/pre>\n<p><img decoding=\"async\" src=\"https:\/\/cdn-images-1.medium.com\/max\/800\/1*r6APc-pqgVdJapXlQpK2qA.png\" \/><\/p>\n<p class=\"graf graf--p\">Nice! Now we can go ahead and initialize our agent with the SQL connection toolkit.<\/p>\n<h4 class=\"graf graf--h4\">Initializing the LangChain Agent with SQL\u00a0toolkit<\/h4>\n<p class=\"graf graf--p\">To create our LangChain Agent, we need three ingredients:<\/p>\n<ul class=\"postList\">\n<li class=\"graf graf--li\">A Database object. To do so, we will use some functions from the <a class=\"markup--anchor markup--li-anchor\" href=\"https:\/\/pypi.org\/project\/SQLAlchemy\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/pypi.org\/project\/SQLAlchemy\/\">SQLAlchemy<\/a> library. 
We will also re-use the environment variables initialized in the previous section.<\/li>\n<\/ul>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">from sqlalchemy.engine.url import URL\r\nfrom langchain.sql_database import SQLDatabase\r\n\r\ndb_config = {  \r\n    'drivername': 'mssql+pyodbc',  \r\n    'username': os.environ[\"SQL_SERVER_USERNAME\"] + '@' + os.environ[\"SQL_SERVER_ENDPOINT\"],  \r\n    'password': os.environ[\"SQL_SERVER_PASSWORD\"],  \r\n    'host': os.environ[\"SQL_SERVER_ENDPOINT\"],  \r\n    'port': 1433,  \r\n    'database': os.environ[\"SQL_SERVER_DATABASE\"],  \r\n    'query': {'driver': 'ODBC Driver 18 for SQL Server'}  \r\n}  \r\n\r\ndb_url = URL.create(**db_config)\r\ndb = SQLDatabase.from_uri(db_url)<\/code><\/pre>\n<ul class=\"postList\">\n<li class=\"graf graf--li\">A Large Language Model. To do so, we will use Azure OpenAI GPT-4 (you can retrieve your secrets under the tab \u201cKeys and Endpoints\u201d of your Azure OpenAI instance).<\/li>\n<\/ul>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">from langchain.chat_models import AzureChatOpenAI\r\n\r\n# setting Azure OpenAI env variables\r\n\r\nos.environ[\"OPENAI_API_TYPE\"] = \"azure\"\r\nos.environ[\"OPENAI_API_VERSION\"] = \"2023-03-15-preview\"\r\nos.environ[\"OPENAI_API_BASE\"] = \"xxx\"\r\nos.environ[\"OPENAI_API_KEY\"] = \"xxx\"\r\n\r\nllm = AzureChatOpenAI(deployment_name=\"gpt-4\", temperature=0, max_tokens=4000)<\/code><\/pre>\n<ul class=\"postList\">\n<li class=\"graf graf--li\">A SQL toolkit so that our Agent can chat with our Azure SQL DB instance:<\/li>\n<\/ul>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">from langchain.agents.agent_toolkits import SQLDatabaseToolkit\r\n\r\ntoolkit = SQLDatabaseToolkit(db=db, llm=llm)<\/code><\/pre>\n<p class=\"graf graf--p\">Now that we have all the elements, we can initialize our Agent 
as follows:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">from langchain.agents import create_sql_agent\r\nfrom langchain.agents.agent_types import AgentType\r\n\r\nagent_executor = create_sql_agent(\r\n    llm=llm,\r\n    toolkit=toolkit,\r\n    verbose=True,\r\n    agent_type=AgentType.ZERO_SHOT_REACT_DESCRIPTION,\r\n)<\/code><\/pre>\n<h4 class=\"graf graf--h4\">Start chatting with your\u00a0data<\/h4>\n<p class=\"graf graf--p\">To start chatting with our Azure SQL DB, we can directly input our question into the agent we previously initialized. Let\u2019s start with something simple:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">agent_executor.run(\"how many rows are there in the titanic table?\")<\/code><\/pre>\n<p class=\"graf graf--p\">Below is the response:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">&gt; Entering new AgentExecutor chain...\r\nAction: list_tables_sql_db\r\nAction Input: \r\nObservation: covidtracking, BuildVersion, ErrorLog, Titanic\r\nThought:I see the Titanic table in the list of tables. I will now write a query to count the number of rows in the Titanic table.\r\nAction: query_checker_sql_db\r\nAction Input: SELECT COUNT(*) FROM Titanic\r\nObservation: SELECT COUNT(*) FROM Titanic\r\nThought:The query is correct. 
I will now execute the query to get the number of rows in the Titanic table.\r\nAction: query_sql_db\r\nAction Input: SELECT COUNT(*) FROM Titanic\r\nObservation: [(891,)]\r\nThought:I now know the final answer\r\nFinal Answer: There are 891 rows in the Titanic table.\r\n\r\n&gt; Finished chain.\r\n'There are 891 rows in the Titanic table.'<\/code><\/pre>\n<p class=\"graf graf--p\">Ok, that one was easy\u2026let\u2019s now start challenging our model a bit more:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">agent_executor.run(\"what is the name of the oldest survivor of titanic?\")<\/code><\/pre>\n<p>Output:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">&gt; Entering new AgentExecutor chain...\r\nAction: list_tables_sql_db\r\nAction Input: \r\nObservation: covidtracking, BuildVersion, ErrorLog, Titanic\r\nThought:I should check the schema of the Titanic table to see what columns are available.\r\nAction: schema_sql_db\r\nAction Input: Titanic\r\nObservation: \r\nCREATE TABLE [Titanic] (\r\n [PassengerId] INTEGER NOT NULL, \r\n [Survived] BIT NOT NULL, \r\n [Pclass] INTEGER NOT NULL, \r\n [Name] VARCHAR(100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, \r\n [Sex] VARCHAR(10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, \r\n [Age] FLOAT(53) NULL, \r\n [SibSp] INTEGER NOT NULL, \r\n [Parch] INTEGER NOT NULL, \r\n [Ticket] VARCHAR(20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, \r\n [Fare] FLOAT(53) NOT NULL, \r\n [Cabin] VARCHAR(20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, \r\n [Embarked] CHAR(1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, \r\n CONSTRAINT [PK__Titanic__88915FB0203E0CED] PRIMARY KEY ([PassengerId])\r\n)\r\n...\r\n\r\nSELECT TOP 1 Name, Age FROM Titanic WHERE Survived = 1 ORDER BY Age DESC\r\nThought:The query is correct. 
Now I will execute it to find the oldest survivor of the Titanic.\r\nAction: query_sql_db\r\nAction Input: SELECT TOP 1 Name, Age FROM Titanic WHERE Survived = 1 ORDER BY Age DESC\r\nRetrying langchain.chat_models.openai.ChatOpenAI.completion_with_retry.&lt;locals&gt;._completion_with_retry in 1.0 seconds as it raised RateLimitError: Requests to the Creates a completion for the chat message Operation under Azure OpenAI API version 2023-03-15-preview have exceeded token rate limit of your current OpenAI S0 pricing tier. Please retry after 12 seconds. Please go here: https:\/\/aka.ms\/oai\/quotaincrease if you would like to further increase the default rate limit..\r\n\r\nObservation: [('Barkworth, Mr. Algernon Henry Wilson', 80.0)]\r\n\r\nI now know the final answer.\r\nFinal Answer: The oldest survivor of the Titanic is Barkworth, Mr. Algernon Henry Wilson, who was 80 years old.\r\n&gt; Finished chain.\r\n\r\n'The oldest survivor of the Titanic is Barkworth, Mr. 
Algernon Henry Wilson, who was 80 years old.'<\/code><\/pre>\n<p class=\"graf graf--p\">Let\u2019s go with a final question:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">agent_executor.run(\"which kind of machine learning model should I use to predict the likelihood of survival in titanic?\")<\/code><\/pre>\n<p>Output:<\/p>\n<pre class=\"prettyprint language-default\"><code class=\"language-default\">&gt; Entering new AgentExecutor chain...\r\nAction: list_tables_sql_db\r\nAction Input: \r\nObservation: covidtracking, BuildVersion, ErrorLog, Titanic\r\nThought:I see a Titanic table, I should check its schema to see if it has relevant information.\r\nAction: schema_sql_db\r\nAction Input: Titanic\r\nObservation: \r\nCREATE TABLE [Titanic] (\r\n [PassengerId] INTEGER NOT NULL, \r\n [Survived] BIT NOT NULL, \r\n [Pclass] INTEGER NOT NULL, \r\n [Name] VARCHAR(100) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, \r\n [Sex] VARCHAR(10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, \r\n [Age] FLOAT(53) NULL, \r\n [SibSp] INTEGER NOT NULL, \r\n [Parch] INTEGER NOT NULL, \r\n [Ticket] VARCHAR(20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL, \r\n [Fare] FLOAT(53) NOT NULL, \r\n [Cabin] VARCHAR(20) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, \r\n [Embarked] CHAR(1) COLLATE SQL_Latin1_General_CP1_CI_AS NULL, \r\n CONSTRAINT [PK__Titanic__88915FB0203E0CED] PRIMARY KEY ([PassengerId])\r\n)\r\n...\r\nThought:I now know the final answer\r\nFinal Answer: I can't directly recommend a specific machine learning model, but you can use the Titanic dataset to train a model. The dataset includes features such as Pclass, Sex, Age, SibSp, Parch, Fare, and Embarked, along with the survival status of each passenger. 
You can try various classification models like logistic regression, decision trees, or random forests to predict the likelihood of survival.\r\n\r\n&gt; Finished chain.\r\n\r\n\"I can't directly recommend a specific machine learning model, but you can use the Titanic dataset to train a model. The dataset includes features such as Pclass, Sex, Age, SibSp, Parch, Fare, and Embarked, along with the survival status of each passenger. You can try various classification models like logistic regression, decision trees, or random forests to predict the likelihood of survival.\"<\/code><\/pre>\n<p class=\"graf graf--p\" style=\"text-align: justify;\">Well, even though our Agent doesn\u2019t want to take a position, it was still able to frame the problem (binary classification) and suggest some models to address it (decision trees, logistic regression, random forests).<\/p>\n<h4 class=\"graf graf--h4\">Conclusion<\/h4>\n<p class=\"graf graf--p\" style=\"text-align: justify;\">LLMs are paving the way to a new style of software development, with the introduction of the Copilot concept. Thanks to the analytical capabilities of GPT-4, we were able to directly chat with our SQL DB using a natural language interface. 
Now imagine this concept applied to your whole data estate\u200a\u2014\u200aboth structured and unstructured: it\u2019s really a game changer in the way we interact with and extract value from data.<\/p>\n<h4 class=\"graf graf--h4\">References<\/h4>\n<ul class=\"postList\">\n<li class=\"graf graf--li\"><a class=\"markup--anchor markup--li-anchor\" href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/azure-sql\/database\/single-database-create-quickstart?view=azuresql&amp;tabs=azure-portal\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/azure-sql\/database\/single-database-create-quickstart?view=azuresql&amp;tabs=azure-portal\">Create a single database\u200a\u2014\u200aAzure SQL Database | Microsoft Learn<\/a><\/li>\n<li class=\"graf graf--li\"><a class=\"markup--anchor markup--li-anchor\" href=\"https:\/\/pypi.org\/project\/SQLAlchemy\/\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/pypi.org\/project\/SQLAlchemy\/\">https:\/\/pypi.org\/project\/SQLAlchemy\/<\/a><\/li>\n<li class=\"graf graf--li\"><a class=\"markup--anchor markup--li-anchor\" href=\"https:\/\/azure.microsoft.com\/en-us\/products\/ai-services\/openai-service\" target=\"_blank\" rel=\"nofollow noopener\" data-href=\"https:\/\/azure.microsoft.com\/en-us\/products\/ai-services\/openai-service\">https:\/\/azure.microsoft.com\/en-us\/products\/ai-services\/openai-service<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Large Language Models (LLMs) have proven exceptional capabilities in chatting with unstructured, text data. But what if we want to interact with structured data? To do so, we will need to build an LLM-powered Agent which is able to reason over structured data, in our specific case an Azure SQL DB. 
Luckily for us, we [&hellip;]<\/p>\n","protected":false},"author":115964,"featured_media":81,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[579,407],"class_list":["post-2290","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure-sql","tag-langchain","tag-python"],"acf":[],"blog_post_summary":"<p>Large Language Models (LLMs) have proven exceptional capabilities in chatting with unstructured, text data. But what if we want to interact with structured data? To do so, we will need to build an LLM-powered Agent which is able to reason over structured data, in our specific case an Azure SQL DB. Luckily for us, we [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts\/2290","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/users\/115964"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/comments?post=2290"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts\/2290\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/media\/81"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/media?parent=2290"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/categories?post=2290"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/tags?post=2290"}],"curies
":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}