{"id":3705,"date":"2024-10-22T11:18:14","date_gmt":"2024-10-22T18:18:14","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/azure-sql\/?p=3705"},"modified":"2024-11-08T09:18:15","modified_gmt":"2024-11-08T17:18:15","slug":"soccer-analytics-copilot-with-azure-sql-and-openai","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/azure-sql\/soccer-analytics-copilot-with-azure-sql-and-openai\/","title":{"rendered":"Soccer Analytics Copilot with Azure SQL and OpenAI"},"content":{"rendered":"<p><div class=\"alert alert-info\"><p class=\"alert-divider\"><i class=\"fabric-icon fabric-icon--Info\"><\/i><strong>Guest Post<\/strong><\/p>Eladio Rinc\u00f3n is a seasoned data technology professional, currently serving as Data &amp; Cloud Director at Verne Tech. He specializes in SQL Server, data analysis, cloud solutions, and focuses on optimizing database performance, high-availability solutions, and cloud migrations. You can find more about him on LinkedIn:\u00a0<a href=\"https:\/\/www.linkedin.com\/in\/erincon\">https:\/\/www.linkedin.com\/in\/erincon<\/a>. Thanks Eladio!\u00a0<\/div><\/p>\n<p>The <a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/Screenshot-2024-10-22-083906.png\">Football (aka Soccer in US \ud83d\ude00)<\/a> Analisys Copilot provides an intuitive interface for users to interact with complex football data without needing advanced technical skills. By utilizing natural language processing, users can ask questions and retrieve detailed insights from vast datasets, including competitions, matches, teams, players, and events. 
This makes data exploration accessible to analysts, coaches, and fans who may not be familiar with coding or database queries, enabling them to gain valuable information through simple conversations with the chatbot.<\/p>\n<p>Through this interactive platform, users can perform a wide range of analyses, such as querying specific match events, examining team performance, analyzing player statistics, and exploring tactical dynamics. The chatbot employs a Retrieval Augmented Generation (RAG) pattern, which searches the database for relevant embeddings based on the user&#8217;s query, retrieves pertinent information, and generates coherent summaries using OpenAI&#8217;s models. This process allows for advanced data analysis and real-time feedback, supporting decision-making and providing deeper insights into football analytics.<\/p>\n<p><div  class=\"d-flex justify-content-left\"><a class=\"cta_button_link btn-primary mb-24\" href=\"https:\/\/github.com\/erincon01\/RAG-Challenge\/tree\/main\" target=\"_blank\">Get the code<\/a><\/div><\/p>\n<p>Currently, the chatbot is designed to analyze game actions within football matches, offering insights based on in-game events. Users can explore detailed aspects such as the number of successful passes each team completed during specific time frames, identify players with the highest number of carries and passes, or examine defensive actions like tackles and blocks executed by teams. For instance, questions like &#8220;Count the successful passes each team completed in the first half (minutes 41-47) and the second half (minutes 45-50), and what impact did these passes have on their game performance?&#8221; or &#8220;Which players recorded the highest number of carries and passes in both halves, and how did their performances influence the overall strategies of the teams?&#8221; are supported. 
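<\/p>
<p>To make the flow concrete, here is a minimal, self-contained Python sketch of the RAG loop just described: embed the question, run a vector search, pull in adjacent rows for context, and summarize. All names (<code>ROWS<\/code>, <code>embed<\/code>, <code>rag_answer<\/code>) and the toy 2-d embeddings are hypothetical stand-ins for the Azure SQL and Azure OpenAI calls shown later in the article.<\/p>

```python
# Toy stand-ins for the RAG pattern: a tiny in-memory "table" of
# 15-second prose summaries with fake 2-d embeddings.
ROWS = {
    1: ("Pass by Foden.", (1.0, 0.0)),
    2: ("Shot on goal.", (0.8, 0.6)),
    3: ("Goalkeeper save.", (0.0, 1.0)),
}

def embed(text):
    # Stand-in for the real embedding call (which returns 1536 dimensions).
    return (1.0, 0.1)

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return 1.0 - dot / (norm_a * norm_b)

def rag_answer(question, top_n=1):
    q = embed(question)
    # 1) Vector search: rank rows by cosine distance to the question.
    ranked = sorted(ROWS, key=lambda i: cosine_distance(q, ROWS[i][1]))
    hits = ranked[:top_n]
    # 2) Add adjacent rows (previous/next id) for extra context.
    ids = sorted({j for i in hits for j in (i - 1, i, i + 1)} & ROWS.keys())
    context = " ".join(ROWS[i][0] for i in ids)
    # 3) Stand-in for the chat-completion call that writes the answer.
    return f"Context: {context} Question: {question}"

print(rag_answer("Who passed the ball?"))
```

<p>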
This focused approach allows users to delve into specific match dynamics and player performances based on the event data available.<\/p>\n<h2>Data Source<\/h2>\n<p>The StatsBomb Open Data offers comprehensive football match information in JSON format across key tables: Competitions (identifying competitions and seasons), Matches (detailed match data including teams, scores, and metadata), Players (player details and lineups), Teams (team information and management details), and Events (in-game actions with attributes like event type, timestamp, and location). For this POC, most of the data has been loaded, but only game results and game events have been used.<\/p>\n<p>This is a fragment of the sample data where player id <code>4354<\/code>, <code>Phil Foden<\/code>, received a <code>ground pass<\/code> with his <code>right foot<\/code>:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-sample-data.png\"><img decoding=\"async\" class=\"size-full wp-image-3707 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-sample-data.png\" alt=\"Image image sample data\" width=\"682\" height=\"611\" srcset=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-sample-data.png 682w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-sample-data-300x269.png 300w\" sizes=\"(max-width: 682px) 100vw, 682px\" \/><\/a><\/p>\n<h2>Data loading and processing architecture<\/h2>\n<p>The diagram shows the technological architecture used for this POC:<\/p>\n<ul>\n<li>Open data from StatsBomb &#8211; football data provider &#8211; hosted on GitHub.<\/li>\n<li>Python application.<\/li>\n<li>Azure SQL Database.<\/li>\n<li>Azure OpenAI Service.<\/li>\n<\/ul>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-arq-01.png\"><img 
decoding=\"async\" class=\"size-full wp-image-3706 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-arq-01.png\" alt=\"Image image arq 01\" width=\"889\" height=\"493\" srcset=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-arq-01.png 889w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-arq-01-300x166.png 300w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-arq-01-768x426.png 768w\" sizes=\"(max-width: 889px) 100vw, 889px\" \/><\/a><\/p>\n<p>These are the steps to load the data into Azure SQL Database:<\/p>\n<ol>\n<li>Download the JSON data from GitHub to local storage.<\/li>\n<li>Insert the data into Azure SQL Database.<\/li>\n<li>Convert the JSON data, in 15-second chunks, to prose descriptions using the SQL Server <code>get_chat_completion<\/code> stored procedure described below.<\/li>\n<li>Convert the 15-second prose descriptions to vector embeddings using the SQL Server <code>get_embeddings<\/code> stored procedure described below.<\/li>\n<li>Steps 1 and 2 are detailed in the repo: <a href=\"https:\/\/github.com\/erincon01\/rag-challenge\">https:\/\/github.com\/erincon01\/rag-challenge<\/a><\/li>\n<\/ol>\n<p>To avoid exposing API keys in stored procedures, I suggest using <code>database credentials<\/code>, following this diagram:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-sec-arq.png\"><img decoding=\"async\" class=\"aligncenter wp-image-3713 size-large\" src=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-sec-arq-1024x486.png\" alt=\"Image image sec arq\" width=\"1024\" height=\"486\" srcset=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-sec-arq-1024x486.png 1024w, 
https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-sec-arq-300x142.png 300w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-sec-arq-768x364.png 768w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-sec-arq.png 1503w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><\/p>\n<p>By doing this, you can call the <code>sp_invoke_external_rest_endpoint<\/code> stored procedure referencing the credential instead of putting the API key in plain text.<\/p>\n<p>Let's focus on the Azure SQL Database section:<\/p>\n<div>\n<h3>Stored procedure to convert the JSON data to prose summary<\/h3>\n<p>The stored procedure definition takes as input the parameters that configure the call to Azure OpenAI, and returns the result of the call as <code>OUTPUT<\/code>. Additionally, it includes a retry policy for throttling scenarios:<\/p>\n<pre class=\"prettyprint language-sql\"><code class=\"language-sql\">CREATE PROCEDURE [dbo].[get_chat_completion]\r\n(\r\n    @model VARCHAR(MAX),\r\n    @system_message NVARCHAR(MAX),\r\n    @user_prompt NVARCHAR(MAX),\r\n    @temperature FLOAT,\r\n    @max_tokens INT,\r\n    @max_attempts INT = 3,\r\n    @chat_completion NVARCHAR(MAX) OUTPUT\r\n)<\/code><\/pre>\n<p>This is the section where the payload and the Azure OpenAI call are prepared:<\/p>\n<pre class=\"prettyprint language-sql\"><code class=\"language-sql\">SET @url = N'https:\/\/&lt;endpoint&gt;.openai.azure.com\/openai\/deployments\/' + @model + '\/chat\/completions?api-version=2023-05-15';\r\n\r\nSET @payload = JSON_OBJECT('messages': JSON_ARRAY(\r\n    JSON_OBJECT('role': 'system', 'content': @system_message),\r\n    JSON_OBJECT('role': 'user', 'content': @user_prompt)),\r\n    'temperature': @temperature,\r\n    'max_tokens': @max_tokens);<\/code><\/pre>\n<p>And this is the call to <code>sp_invoke_external_rest_endpoint<\/code> and the response processing:<\/p>\n<pre 
class=\"prettyprint language-sql\"><code class=\"language-sql\">EXEC dbo.sp_invoke_external_rest_endpoint\r\n    @url = @url,\r\n    @method = 'POST',   \r\n    @payload = @payload,   \r\n    @credential = 'https:\/\/&lt;endpoint&gt;.openai.azure.com',\r\n    @response = @response OUTPUT;\r\n\r\nSET @chat_completion = JSON_VALUE(@response, '$.result.choices[0].message.content');<\/code><\/pre>\n<p>If the Azure OpenAI response includes a 429 error, it means you are reaching the limits of your model. We have implemented logic that retries after the number of seconds the error suggests. Additionally, up to 3 retries are attempted for cases where the Azure OpenAI model is being used concurrently by other users.<\/p>\n<pre class=\"prettyprint language-sql\"><code class=\"language-sql\">IF JSON_VALUE(@response, '$.result.error.code') = '429'\r\nBEGIN\r\n    DECLARE @retryAfter NVARCHAR(10);\r\n    SET @retryAfter = JSON_VALUE(@response, '$.response.headers.\"Retry-After\"');\r\n\r\n    -- Convert the Retry-After value to an integer and add 1 second\r\n    DECLARE @retryAfterSeconds INT;\r\n    SET @retryAfterSeconds = CAST(@retryAfter AS INT) + 1;\r\n\r\n    -- Calculate hours, minutes, and seconds for the HH:MM:SS format\r\n    DECLARE @hours NVARCHAR(2), @minutes NVARCHAR(2), @seconds NVARCHAR(2);\r\n\r\n    SET @hours = RIGHT('0' + CAST(@retryAfterSeconds \/ 3600 AS NVARCHAR), 2);\r\n    SET @minutes = RIGHT('0' + CAST((@retryAfterSeconds % 3600) \/ 60 AS NVARCHAR), 2);\r\n    SET @seconds = RIGHT('0' + CAST(@retryAfterSeconds % 60 AS NVARCHAR), 2);\r\n\r\n    DECLARE @delay NVARCHAR(8);\r\n    SET @delay = @hours + ':' + @minutes + ':' + @seconds;\r\n\r\n    PRINT 'Rate limit error detected. Retry-After: ' + @delay + '. 
Attempt ' + CAST(@attempts AS NVARCHAR(10)) + '.';\r\n    WAITFOR DELAY @delay;\r\nEND<\/code><\/pre>\n<p>The complete stored procedure code can be downloaded from the repo.<\/p>\n<h3>Stored procedure to convert text summary to vector embedding<\/h3>\n<p>To convert the prose summary to a vector embedding, <code>sp_invoke_external_rest_endpoint<\/code> is used as well.<\/p>\n<p>This is the code we have used:<\/p>\n<pre class=\"prettyprint language-sql\"><code class=\"language-sql\">CREATE PROCEDURE [dbo].[get_embeddings]\r\n(\r\n    @model VARCHAR(MAX),\r\n    @text NVARCHAR(MAX),\r\n    @embedding VECTOR(1536) OUTPUT\r\n)\r\nAS\r\nBEGIN\r\n\r\n    SET NOCOUNT ON;\r\n\r\n    DECLARE @retval INT, @response NVARCHAR(MAX);\r\n    DECLARE @url VARCHAR(MAX);\r\n    -- Both models used here return 1536-dimension vectors by default\r\n    DECLARE @payload NVARCHAR(MAX) = JSON_OBJECT('input': @text);\r\n\r\n    -- Set the @url variable with proper concatenation before the EXEC statement\r\n    SET @url = 'https:\/\/&lt;endpoint&gt;.openai.azure.com\/openai\/deployments\/' + @model + '\/embeddings?api-version=2023-03-15-preview';\r\n\r\n    EXEC dbo.sp_invoke_external_rest_endpoint \r\n        @url = @url,\r\n        @method = 'POST',   \r\n        @payload = @payload,   \r\n        @credential = [https:\/\/&lt;endpoint&gt;.openai.azure.com],\r\n        -- @headers = '{\"Content-Type\":\"application\/json\", \"api-key\":\"&lt;api-key&gt;\"}', \r\n        @response = @response OUTPUT;\r\n\r\n    -- Use JSON_QUERY to extract the embedding array directly\r\n    DECLARE @jsonArray NVARCHAR(MAX) = JSON_QUERY(@response, '$.result.data[0].embedding');\r\n    \r\n    SET @embedding = CAST(@jsonArray as VECTOR(1536));\r\nEND<\/code><\/pre>\n<p>Note that we are using the <code>database credential<\/code> created earlier. 
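<\/p>
<p>The JSON path used above, <code>$.result.data[0].embedding<\/code>, is easy to test outside the database. Here is a small Python sketch of the same extraction, run against a mocked response (the payload shape and values below are illustrative, not a captured Azure OpenAI response):<\/p>

```python
import json

# Mocked sp_invoke_external_rest_endpoint response in the shape the
# stored procedure parses; the embedding values are made up.
response = json.dumps({
    "result": {
        "data": [
            {"index": 0, "embedding": [0.0123, -0.0456, 0.0789]}
        ],
        "model": "text-embedding-3-small",
    }
})

def extract_embedding(raw):
    # Equivalent of JSON_QUERY(@response, '$.result.data[0].embedding').
    return json.loads(raw)["result"]["data"][0]["embedding"]

print(extract_embedding(response))  # [0.0123, -0.0456, 0.0789]
```

<p>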
The <code>@headers<\/code> argument is commented out in case you need to use a plain-text API key.<\/p>\n<h3>Updating the summary and the embeddings columns in the tables<\/h3>\n<p>To update the <code>events_details__15secs_agg<\/code> table, we have created a very simple stored procedure that, for a given <code>match_id<\/code>, calculates all the prose summaries.<\/p>\n<p>This is the definition:<\/p>\n<pre class=\"prettyprint language-sql\"><code class=\"language-sql\">CREATE PROCEDURE dbo.add_summary_to_json_in_events_details__15secs_agg\r\n    @system_message NVARCHAR(MAX),\r\n    @model VARCHAR(MAX),\r\n    @temperature FLOAT,\r\n    @max_tokens INT,\r\n    @max_attempts INT = 3,\r\n    @match_id INT = NULL<\/code><\/pre>\n<p>It iterates over each row of the match; if a game lasts 90 minutes, there are approximately 90 * 4 = 360 rows per match. This is the core section:<\/p>\n<pre class=\"prettyprint language-sql\"><code class=\"language-sql\">-- convert json to summary\/script\r\nEXEC dbo.get_chat_completion @model = @model, @system_message = @system_message, @user_prompt = @json, \r\n    @temperature = @temperature, @max_tokens = @max_tokens,\r\n    @max_attempts = @max_attempts,  @chat_completion = @summary OUTPUT;\r\n\r\n-- calculate embeddings\r\nEXEC dbo.get_embeddings @model = 'text-embedding-ada-002', @text = @summary, @embedding = @embedding_ada_002 OUTPUT;\r\nEXEC dbo.get_embeddings @model = 'text-embedding-3-small', @text = @summary, @embedding = @embedding_3_small OUTPUT;\r\n\r\nIF (NOT @summary IS NULL AND @summary &lt;&gt; '')\r\nBEGIN\r\n    UPDATE events_details__15secs_agg\r\n    SET\r\n        summary = @summary,\r\n        embedding_ada_002 = @embedding_ada_002,\r\n        embedding_3_small = @embedding_3_small\r\n    WHERE\r\n        match_id = @_match_id AND\r\n        period = @period AND\r\n        minute = @minute AND\r\n        _15secs = @_15secs;\r\nEND<\/code><\/pre>\n<p>Note that for this POC, for testing purposes, we have calculated 
embeddings for two models: <code>text-embedding-ada-002<\/code> and <code>text-embedding-3-small<\/code>.<\/p>\n<h3>Querying the data<\/h3>\n<p>Data is organized around the matches table. For a given match, there are multiple rows in the lineups, players, events, events_details, and events_details__xxx_agg tables:<\/p>\n<ul>\n<li>The matches table includes the teams involved, referee data, competition name, season name, date of the game, each team's coach, and the result.<\/li>\n<li>In football, the lineup is 11 players per team, excluding substitutions.<\/li>\n<li>The game has multiple events, like passes, shots on goal, goalkeeper saves, fouls, goals, penalties, etc.<\/li>\n<li>The events_details table includes all the actions that happened in a given game.<\/li>\n<li>For this POC, we have tested two splits:\n<ul>\n<li>In the minute aggregations, events_details are split per minute.<\/li>\n<li>In the 15-second aggregations, events_details are split into 4 groups per minute.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h3>Vector search pattern in Azure SQL Database<\/h3>\n<p>Azure SQL databases include native functions for vector search with Euclidean (L2), cosine, and negative inner product distances: <a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/announcing-eap-native-vector-support-in-azure-sql-database\/#vector_distance\">https:\/\/devblogs.microsoft.com\/azure-sql\/announcing-eap-native-vector-support-in-azure-sql-database\/#vector_distance<\/a><\/p>\n<pre class=\"prettyprint language-sql\"><code class=\"language-sql\">DECLARE @v1 VARBINARY(8000) = JSON_ARRAY_TO_VECTOR('[1,1]')\r\nDECLARE @v2 VARBINARY(8000) = JSON_ARRAY_TO_VECTOR('[-1,-1]')\r\n\r\nSELECT \r\n    VECTOR_DISTANCE('euclidean', @v1, @v2) AS euclidean,\r\n    VECTOR_DISTANCE('cosine', @v1, @v2) AS cosine,\r\n    VECTOR_DISTANCE('dot', @v1, @v2) AS negative_dot_product;\r\n<\/code><\/pre>\n<p>Using the <code>text-embedding-3-small<\/code> embedding model, the query pattern for the three types of vector searches would look like 
this:<\/p>\n<pre class=\"prettyprint language-sql\"><code class=\"language-sql\">DECLARE @e_3small VECTOR(1536);\r\nDECLARE @string nvarchar(max) = \r\n    'Which players recorded the highest number of carries and passes in both halves, \r\n    and how did their performances influence the overall strategies of the teams?'\r\n\r\nEXEC dbo.get_embeddings @model = 'text-embedding-3-small', @text = @string, @embedding = @e_3small OUTPUT;\r\n\r\nSELECT TOP (10) \r\n    VECTOR_DISTANCE('cosine', @e_3small, embedding_3_small) AS distance_cosine_3_small, *\r\nFROM events_details__15secs_agg\r\nWHERE match_id = 3943043 \r\nORDER BY 1;\r\n\r\nSELECT TOP (10) \r\n    VECTOR_DISTANCE('euclidean', @e_3small, embedding_3_small) AS distance_euclidean_3_small, *\r\nFROM events_details__15secs_agg\r\nWHERE match_id = 3943043 \r\nORDER BY 1;\r\n\r\nSELECT TOP (10) \r\n    VECTOR_DISTANCE('dot', @e_3small, embedding_3_small) AS distance_dot_3_small, *\r\nFROM events_details__15secs_agg\r\nWHERE match_id = 3943043 \r\nORDER BY 1 DESC;<\/code><\/pre>\n<p>If you need to encapsulate all three vector searches in a single pattern, the result would look like this:<\/p>\n<pre class=\"prettyprint language-sql\"><code class=\"language-sql\">DECLARE @e VECTOR(1536);\r\nDECLARE @string nvarchar(max) = '&lt;search_string&gt;'\r\n\r\nEXEC dbo.get_embeddings @model = '&lt;embedding_model&gt;', @text = @string, @embedding = @e OUTPUT;\r\n\r\nSELECT TOP (&lt;top_n&gt;) \r\n    VECTOR_DISTANCE('&lt;search_type&gt;', @e, &lt;embedding_column_name&gt;) AS distance, *\r\nFROM &lt;table&gt;\r\nORDER BY 1 &lt;ASC|DESC&gt;;<\/code><\/pre>\n<p>The tricky part is the <code>ORDER BY<\/code> clause, which can be addressed with this logic in Python:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">k_search = \"#\"\r\nk_search_sqlazure = \"cosine\"\r\norder_by = \" ASC\"\r\nif search_type.lower() == \"cosine\":\r\n    k_search = \"=\"\r\n    k_search_sqlazure = \"cosine\"\r\n    order_by = \" 
ASC\"\r\nif search_type.lower() == \"innerp\":\r\n    k_search = \"#\"\r\n    k_search_sqlazure = \"dot\"\r\n    order_by = \" DESC\"\r\nif search_type.lower() == \"l1\":\r\n    k_search = \"+\"\r\n    k_search_sqlazure = \"euclidean\"\r\n    order_by = \" ASC\"\r\nif search_type.lower() == \"l2\":\r\n    k_search = \"-\"\r\n    k_search_sqlazure = \"euclidean\"\r\n    order_by = \" ASC\"<\/code><\/pre>\n<p>The Transact-SQL query coded in Python would look like this:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">if source == \"azure-sql\":\r\n    table_name = \"events_details__15secs_agg\"\r\n    query = f\"\"\"\r\n        DECLARE @e VECTOR(1536);\r\n        EXEC dbo.get_embeddings @model = '{model_name}', @text = '{search_term}', @embedding = @e OUTPUT;\r\n\r\n        SELECT TOP {top_n} id, {summary}\r\n            , VECTOR_DISTANCE ('{k_search_sqlazure}', @e, {column_name_sql}) AS Distance\r\n        FROM {table_name} \r\n        WHERE match_id = {match_id}\r\n        ORDER BY Distance {order_by};\r\n    \"\"\"\r\n\r\ndf1 = pd.read_sql(query, conn)\r\nrowCount = df1.shape[0]<\/code><\/pre>\n<h3>RAG pattern in Python<\/h3>\n<p>Once we have the rows that match the search criteria, we need to add the adjacent rows. The reason is to add more context to the data before calling <code>chat_completion<\/code>. In our scenario, the <code>id<\/code> column is an <code>IDENTITY<\/code> column, which for our testing purposes was sufficient to preserve the load order of the data. In Python, we have created some code to find the previous and next rows of the matched rows. 
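<\/p>
<p>As a design note, the <code>search_type<\/code> if-chain above can also be written as a lookup table, which makes each metric's sort direction explicit. This sketch maps <code>l2<\/code> to <code>euclidean<\/code> with ascending order, and carries the single-character <code>k_search<\/code> tokens over unchanged from the original code:<\/p>

```python
# Lookup-table version of the search_type mapping:
# search_type -> (k_search token, Azure SQL distance metric, sort direction)
SEARCH_TYPES = {
    "cosine": ("=", "cosine", " ASC"),
    "innerp": ("#", "dot", " DESC"),
    "l1": ("+", "euclidean", " ASC"),
    "l2": ("-", "euclidean", " ASC"),
}

def resolve_search(search_type):
    # Fall back to the cosine mapping for unrecognized values.
    return SEARCH_TYPES.get(search_type.lower(), SEARCH_TYPES["cosine"])

k_search, k_search_sqlazure, order_by = resolve_search("InnerP")
print(k_search_sqlazure + order_by)  # dot DESC
```

<p>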
The code includes:<\/p>\n<ul>\n<li>Removing duplicate rows.<\/li>\n<li>Querying the Azure SQL database for the text summaries using the <code>get_dataframe_from_ids<\/code> function.<\/li>\n<li>Merging and ordering the data.<\/li>\n<\/ul>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\"># get the ids from the search (id is the primary key)\r\nids = df1['id'].values\r\nextended_ids = []\r\n\r\n# for each id, add the previous and next id\r\nfor i in ids:\r\n    extended_ids.append(i-1)\r\n    extended_ids.append(i+1)\r\n\r\n# remove duplicates and order by id\r\nids = list(set(extended_ids))\r\nids.sort()\r\n\r\n# add to the dataframe the previous and next rows\r\ndf2 = get_dataframe_from_ids(source, table_name, summary, ids)\r\n\r\n# concatenate results\r\ndf = pd.concat([df1, df2], ignore_index=True)\r\ndf = df.drop_duplicates(subset='id', keep='first')\r\ndf = df.sort_values(by='id')\r\n# remove the id column from df\r\ndf = df.drop(columns=['id'])<\/code><\/pre>\n<p>Additionally, we have included some <code>business data<\/code>, like game results; we also did some testing adding game lineups, but it was not relevant for our tests.<\/p>\n<p>Then we build the prompt that is passed to the <code>get_chat_completion_from_azure_open_ai<\/code> Python function:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">prompt=\"\"\r\nprompt += f\"### EVENTS\\n\" + df.to_string(index=False, justify='left')\r\n\r\nif add_match_info.lower() == \"yes\":\r\n    d_match_info = get_game_result_data(source, match_id)\r\n    prompt += f\"\\n\\n### GAME RESULT\\n\" + d_match_info\r\n\r\nprompt += f\"\\n\\n### PROMPT\\n{search_term}\"<\/code><\/pre>\n<p>The <code>get_chat_completion_from_azure_open_ai<\/code> Python function is as simple as this:<\/p>\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">def get_chat_completion_from_azure_open_ai(system_message, user_prompt, temperature, tokens):\r\n    \"\"\"\r\n    Retrieves a chat completion from Azure OpenAI API.\r\n   
 Args:\r\n        system_message (str): The system message.\r\n        user_prompt (str): The user prompt.\r\n        temperature (float): The temperature value for generating chat completions.\r\n        tokens (int): The maximum number of tokens for generating chat completions.\r\n    Returns:\r\n        str: The generated chat completion.\r\n    \"\"\"\r\n    client = AzureOpenAI(\r\n        azure_endpoint=os.getenv('OPENAI_ENDPOINT'),\r\n        api_key=os.getenv('OPENAI_KEY'),\r\n        api_version=\"2023-05-15\"\r\n    )\r\n\r\n    response = client.chat.completions.create(\r\n        model=os.getenv('OPENAI_MODEL'),\r\n        messages=[\r\n            {\"role\": \"system\", \"content\": system_message},\r\n            {\"role\": \"user\", \"content\": user_prompt},\r\n        ],\r\n        temperature=temperature,\r\n        max_tokens=tokens,\r\n    )\r\n\r\n    output = response.choices[0].message.content\r\n\r\n    output = output.replace('\\n\\n', '\\n').replace('\\n\\n', '\\n')\r\n\r\n    return output<\/code><\/pre>\n<h2>Conclusion: Explore the Power of RAG with Azure SQL Databases<\/h2>\n<p>Throughout this article, we have demonstrated how easily complex data retrieval and analysis can be orchestrated using Retrieval Augmented Generation (RAG) patterns with Azure SQL Databases. By showcasing a business case that\u2019s simple to understand, we aimed to illustrate the potential of this technology in a real-world scenario.<\/p>\n<p>Now it\u2019s your turn to adapt these capabilities to your own needs. Whether you&#8217;re working in finance, healthcare, retail, or any other industry, the principles we\u2019ve used here can be applied to your unique business cases. 
The flexibility and ease of SQL Server PaaS combined with RAG patterns empower you to extract actionable insights from your data with minimal effort.<\/p>\n<p>The GitHub repository includes all the necessary code to deploy this solution on Azure SQL Databases, along with step-by-step instructions. You\u2019ll learn how to:<\/p>\n<ol>\n<li>Load your data into SQL Server PaaS.<\/li>\n<li>Generate embeddings using OpenAI models for efficient retrieval.<\/li>\n<li>Activate queries that combine SQL and AI to extract meaningful insights using vector search.<\/li>\n<\/ol>\n<p>Discover how you can seamlessly integrate this solution into your data workflows and unlock the potential of your business insights.<\/p>\n<table style=\"border-collapse: collapse; width: 100%;\">\n<tbody>\n<tr>\n<td style=\"width: 33.3333%;\"><a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-02.png\"><img decoding=\"async\" class=\"wp-image-3718 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-02-300x277.png\" alt=\"Image image 02\" width=\"200\" height=\"185\" srcset=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-02-300x277.png 300w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-02.png 692w\" sizes=\"(max-width: 200px) 100vw, 200px\" \/><\/a><\/td>\n<td style=\"width: 33.3333%;\"><a href=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-03.png\"><img decoding=\"async\" class=\"wp-image-3717 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-03-227x300.png\" alt=\"Image image 03\" width=\"200\" height=\"264\" srcset=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-03-227x300.png 227w, 
https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-03.png 659w\" sizes=\"(max-width: 200px) 100vw, 200px\" \/><\/a><\/td>\n<td style=\"width: 33.3333%;\"><img decoding=\"async\" class=\"wp-image-3716 aligncenter\" src=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-04-270x300.png\" alt=\"Image image 04\" width=\"200\" height=\"222\" srcset=\"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-04-270x300.png 270w, https:\/\/devblogs.microsoft.com\/azure-sql\/wp-content\/uploads\/sites\/56\/2024\/10\/image-04.png 660w\" sizes=\"(max-width: 200px) 100vw, 200px\" \/><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p><div  class=\"d-flex justify-content-left\"><a class=\"cta_button_link btn-primary mb-24\" href=\"https:\/\/github.com\/erincon01\/RAG-Challenge\/tree\/main\" target=\"_blank\">Get the code<\/a><\/div><\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>The Football (aka Soccer in US \ud83d\ude00) Analisys Copilot provides an intuitive interface for users to interact with complex football data without needing advanced technical skills. By utilizing natural language processing, users can ask questions and retrieve detailed insights from vast datasets, including competitions, matches, teams, players, and events. 
This makes data exploration accessible to [&hellip;]<\/p>\n","protected":false},"author":24720,"featured_media":3729,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1,582,411,615],"tags":[590,602,632,407,633,591],"class_list":["post-3705","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-azure-sql","category-openai","category-python","category-vectors","tag-ai","tag-azure-openai","tag-genai","tag-python","tag-soccer","tag-vector-search"],"acf":[],"blog_post_summary":"<p>The Football (aka Soccer in US \ud83d\ude00) Analisys Copilot provides an intuitive interface for users to interact with complex football data without needing advanced technical skills. By utilizing natural language processing, users can ask questions and retrieve detailed insights from vast datasets, including competitions, matches, teams, players, and events. This makes data exploration accessible to 
[&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts\/3705","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/users\/24720"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/comments?post=3705"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/posts\/3705\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/media\/3729"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/media?parent=3705"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/categories?post=3705"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/azure-sql\/wp-json\/wp\/v2\/tags?post=3705"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}