{"id":3372,"date":"2023-08-10T11:00:47","date_gmt":"2023-08-10T18:00:47","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/surface-duo\/?p=3372"},"modified":"2024-01-03T16:22:52","modified_gmt":"2024-01-04T00:22:52","slug":"android-openai-chatgpt-13","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-13\/","title":{"rendered":"Dynamic Sqlite queries with OpenAI chat functions"},"content":{"rendered":"<p>\n  Hello prompt engineers,\n<\/p>\n<p>\n  Previous blogs explained how to add droidcon session favorites to a database and also cache the embedding vectors in a database \u2013 but what if we stored everything in a database and then let the model query it directly?\n<\/p>\n<p>\n  The OpenAI Cookbook examples repo includes a section on <a href=\"https:\/\/github.com\/openai\/openai-cookbook\/blob\/main\/examples\/How_to_call_functions_with_chat_models.ipynb\">how to call functions with model generated arguments<\/a>, which includes a python demo of a function that understands a database schema and generates SQL that is executed to answer questions from the chat. There\u2019s also a <a href=\"https:\/\/platform.openai.com\/examples\/default-sql-translate\">natural language to SQL<\/a> demo that demonstrates the model\u2019s understanding of SQL.\n<\/p>\n<p>\n  In this post, we\u2019ll implement the model generated arguments function call in Kotlin, using the droidcon SF 2023 conference schedule data as the source.\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-7.png\" class=\"wp-image-3373\" alt=\"Phone screenshot with AI chat about droidcon sessions\" width=\"450\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-7.png 1895w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-7-195x300.png 195w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-7-665x1024.png 665w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-7-768x1183.png 768w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-7-998x1536.png 998w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-7-1330x2048.png 1330w\" sizes=\"(max-width: 1895px) 100vw, 1895px\" \/><br\/><em>Figure 1: JetchatAI droidcon SF demo answering questions with dynamically generated and executed SQL<\/em>\n<\/p>\n<h2>SQLite on Android<\/h2>\n<p>\n  Just like the previous posts on storing favorites and vectors, we\u2019ll use the Android Sqlite APIs described in the the <a href=\"https:\/\/developer.android.com\/training\/data-storage\/sqlite\">SQLite training documentation<\/a>. Unlike those functions, however, we\u2019ll be using the <code>rawQuery()<\/code> function rather than building up \u201cstrongly typed\u201d queries because the SQL will be generated by the model.\n<\/p>\n<p>\n  As the <a href=\"https:\/\/github.com\/openai\/openai-cookbook\/blob\/main\/examples\/How_to_call_functions_with_chat_models.ipynb\">example<\/a> explains, this can be dangerous in production applications, as the model is not perfectly reliable at generating valid SQL, so there\u2019s a possibility that it\u2019ll generate code that won\u2019t run (or an even more remote possibility that it could generate destructive queries). It could also expose data you don\u2019t want the user to see (from system tables, internal tables, or data they shouldn\u2019t have access to).\n<\/p>\n<p>\n  In our use case, we\u2019ll only be executing against a read-only database API, so there\u2019s no risk of data loss, and there\u2019s only a single-user\u2019s favorites stored so no privacy concerns either.\n<\/p>\n<h2>Creating the database<\/h2>\n<p>\n  See the <a href=\"https:\/\/github.com\/conceptdev\/droidcon-sf-23\/blob\/conceptd3v\/sqlquery-test\/Jetchat\/app\/src\/main\/java\/com\/example\/compose\/jetchat\/data\/DroidconDatabase.kt\"><strong>data\/DroidconDatabase.kt<\/strong><\/a> file for the complete database schema and data loading functionality. Since this is a sample focused on AI and not SQL, you\u2019ll notice the schema and code is somewhat simplistic. In a production application, you would spend more time on a robust data strategy.\n<\/p>\n<h2>Creating an SQL query function<\/h2>\n<p>\n  We\u2019ll start by creating the call function class, using the template from the <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-11\/\">save favorites<\/a> blog post. You can view the result in the <a href=\"https:\/\/github.com\/conceptdev\/droidcon-sf-23\/blob\/conceptd3v\/sqlquery-test\/Jetchat\/app\/src\/main\/java\/com\/example\/compose\/jetchat\/functions\/AskDatabaseFunction.kt\"><strong>functions\/AskDatabaseFunction.kt<\/strong><\/a> file, and the descriptive properties are shown in Figure 2:\n<\/p>\n<pre>class AskDatabaseFunction {\r\n    companion object {\r\n        fun name(): String {\r\n            return \"askDatabase\"\r\n        }\r\n        fun description(): String {\r\n            return \"Answer user questions about conference sessions like what room sessions are presented in. Output should be a fully formed SQL query.\"\r\n        }\r\n        \/\/ ...<\/pre>\n<p><em>Figure 2: Starting point for the <code>AskDatabaseFunction<\/code><\/em>\n<\/p>\n<p>\n  Now we need to figure out what the parameters are, and then how to handle the function when called.\n<\/p>\n<h2>Teaching the model about a schema<\/h2>\n<p>\n  Exposing the table schema to the chat model via a function can enable it to generate its own queries. The ideal way to achieve this is by inferring the actual database schema and feeding it to the model as a function parameter. Because we have a simple data structure in the droidcon demo, it was quick and easy to build a \u2018proof of concept\u2019 by hardcoding the schema.\n<\/p>\n<p>\n  Figure 3 shows the hardcoded schema description along with prompt prefix and suffix that guides the model in how to interpret it.\n<\/p>\n<pre>fun params(): Parameters {\r\n    val params = Parameters.buildJsonObject {\r\n        put(\"type\", \"object\")\r\n        putJsonObject(\"properties\") {\r\n            putJsonObject(\"query\") {\r\n                put(\"type\", \"string\")\r\n                put(\"\"\"\r\n  SQL query extracting info to answer the user's question.\r\n  SQL should be written using this database schema:\r\n  Table: sessions\r\n  Columns: session_id, speaker, role, location_id, date, time, subject, description\r\n  Table: favorites\r\n  Columns: session_id, is_favorite\r\n  Table: locations\r\n  Columns: location_id, description\r\n  The query should be returned in plain text, not in JSON.\"\"\".trimIndent()\r\n                )\r\n            }\r\n        }\r\n        putJsonArray(\"required\") {\r\n            add(\"query\")\r\n        }\r\n    }\r\n    return params\r\n}<\/pre>\n<p><em>Figure 3: Create a hardcoded schema to test the function and the SQL generated by the model<\/em>\n<\/p>\n<p>\n  The class is complete enough to do some debugging \u2013 add the following code to the <code>chatCompletionRequest<\/code> and you can use <em>logcat<\/em> or debugging to view the SQL that the model generates in response to your questions.\n<\/p>\n<pre>function {\r\n      name = AskDatabaseFunction.name()\r\n      description = AskDatabaseFunction.description()\r\n      parameters = AskDatabaseFunction.params()\r\n}<\/pre>\n<p><em>Figure 4: Adding the hardcoded function params allows us to test the model\u2019s responses<\/em>\n<\/p>\n<p>\n  If you plan to manage your database schema carefully, you could probably get away with shipping a hardcoded schema like this. It would also make it easy to omit some tables or columns, preventing the model from exposing some data.\n<\/p>\n<h2>Teaching the model with the latest schema<\/h2>\n<p>\n  To fully implement the OpenAI cookbook example we need to dynamically generate the schema using system tables. This will make the code less fragile and prevent a hardcoded schema from getting \u2018out of sync\u2019 with any changes made in <strong>DroidconDatabase.kt<\/strong>. By dynamically generating the tables and columns each time, the model will always be getting the latest schema that reflects exactly what\u2019s present in Sqlite.\n<\/p>\n<p>\n  To generate the schema, we need to query <code>sqlite_master<\/code> and then the <code>table_info<\/code> to get the table and column names, then format them into a structure like the hardcoded version shown above. Figure 3 shows the function that uses Sqlite system tables and outputs the schema shape as text:\n<\/p>\n<pre>fun generateSimpleSchema(): String {\r\n    val db = readableDatabase\r\n    var out = \"\"\r\n    val tableCursor = db.rawQuery(\"SELECT name FROM sqlite_master WHERE type='table'\", null)\r\n    with(tableCursor) {\r\n        while (moveToNext()) {\r\n            val tableName = getString(0)  \/\/ table name\r\n            out += \"Table: $tableName\\nColumns: \"\r\n            val columnCursor = db.rawQuery(\"PRAGMA table_info('$tableName');\", null)\r\n            var needComma = false\r\n            with(columnCursor) {\r\n                while (moveToNext()) {\r\n                    val columnName = getString(1)  \/\/ column name\r\n                    if (needComma) out += \", \" else needComma = true\r\n                    out += \"$columnName\"\r\n                }\r\n            }\r\n            columnCursor.close()\r\n            out += \"\\n\\n\"\r\n        }\r\n    }\r\n    tableCursor.close()\r\n    return out\r\n}<\/pre>\n<p><em>Figure 5: Use the system tables to generate the database schema<\/em>\n<\/p>\n<p>\n  To use the dynamically generated schema, call <code>db.generateSimpleSchema()<\/code> in  the <code>params<\/code> method and use the result in the query.\n<\/p>\n<p>\n  Now that the model has access to the latest database schema, all that remains is to execute the queries that it generates.\n<\/p>\n<h2>Dynamic SQL results<\/h2>\n<p>\n  As noted above, we\u2019re showing a lot of trust in the model by executing its SQL queries directly in our database. The code is shown in Figure 6 \u2013 notice the first line creates a read-only access to the database and then we execute the raw query directly. The rest of the function is just string formatting to return the columns in a specific format \u2013 in this case mimicking the OpenAI demo which uses a layout like this: <code>[('abc', 1),('def', 2)]<\/code>:\n<\/p>\n<pre>fun function(dbHelper: DroidconDbHelper, query: String): String {\r\n    val db = dbHelper.readableDatabase\r\n    val cursor = db.rawQuery(query,null)  \/\/ execute the query!\r\n    var rowCount = 0\r\n    var out = \"\"\r\n    var needOuterComma = false\r\n    with(cursor) {\r\n        out += \"[\"\r\n        var needComma = false\r\n        while (moveToNext()) {\r\n            if (needOuterComma) out += \",\" else needOuterComma = true\r\n            out += \"(\"\r\n            for (i in 0 until cursor.columnCount) {\r\n                if (needComma) out += \",\" else needComma = true\r\n                out += when (getType(i)) {\r\n                    FIELD_TYPE_INTEGER -&gt;\r\n                        getString(i)\r\n                    else -&gt; {\r\n                        \"'${getString(i)}'\"\r\n                    }\r\n                }\r\n            }\r\n            out += \")\"\r\n            rowCount++\r\n        }\r\n        out += \"]\"\r\n    }\r\n    cursor.close()\r\n    return if (out == \"\") {\r\n        \"0 rows affected\"\r\n    } else {\r\n        out\r\n    }\r\n}<\/pre>\n<p><em>Figure 6: Chat function class that can execute SQL directly on the local database<\/em>\n<\/p>\n<p>\n  The model seems to have no trouble understanding the output format and matching the tuples with the query that it created, so it can generate responses that correctly interpret complex return datasets.\n<\/p>\n<h2>How the model uses SQL querying capabilities<\/h2>\n<p>\n  The droidcon SF chat demo already has a couple of mechanisms for interacting with conference data:\n<\/p>\n<ul>\n<li>\n    Conference metadata is hardcoded in the system prompt\n  <\/li>\n<li>\n    Embeddings are used to for retrieval augmented generation\n  <\/li>\n<li>\n    There\u2019s a data-driven function to return sessions by time\n  <\/li>\n<li>\n    The favorites feature has its own insert\/select\/delete functionality over the data\n  <\/li>\n<\/ul>\n<p>\n  When we add the <code>SqlQueryFunction<\/code> to the mix, the model might not always choose the most effective way to answer user queries. When all these options are available, the <code>AskDatabaseFunction<\/code> gets called mostly for queries like \u201chow many sessions is craig presenting?\u201d or \u201chow many sessions are in Robertson 1?\u201d, e.g.:\n<\/p>\n<p>\n  <img decoding=\"async\" width=\"1761\" height=\"972\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-8.png\" class=\"wp-image-3374\" alt=\"Chat answers about numbers of sessions, using generated SQL\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-8.png 1761w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-8-300x166.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-8-1024x565.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-8-768x424.png 768w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-8-1536x848.png 1536w\" sizes=\"(max-width: 1761px) 100vw, 1761px\" \/>\n<\/p>\n<p><em>Figure 7a: the types of questions that could get routed to the SQL querying function<\/em>\n<\/p>\n<pre>SELECT COUNT(*) FROM sessions WHERE speaker = 'CRAIG DUNN'<\/pre>\n<p><em>Figure 7b: example query from logcat<\/em>\n<\/p>\n<p>\n  For a query like \u201chow many sessions are in Robertson 1?\u201d, even though the database query would produce a better result, the model calls the <code>SessionsByTimeFunction<\/code> which gives a partial answer (only the sessions in Robertson 1 within the default time parameters, so in this case it answers \u201c2 sessions\u201d).\n<\/p>\n<p>\n  Other questions that are possible include \u201cwhat rooms are there?\u201d and \u201chow do I get to fisher east\u201d:\n<\/p>\n<p>\n  <img decoding=\"async\" width=\"1768\" height=\"1036\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-9.png\" class=\"wp-image-3375\" alt=\"Chat answers about the rooms available generated by SQL\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-9.png 1768w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-9-300x176.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-9-1024x600.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-9-768x450.png 768w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-9-1536x900.png 1536w\" sizes=\"(max-width: 1768px) 100vw, 1768px\" \/>\n<\/p>\n<p><em>Figure 8a: more questions that can be answered by the AskDatabaseFunction<\/em>\n<\/p>\n<p>\n  Which result in the following queries and responses:\n<\/p>\n<pre>SELECT DISTINCT location_id FROM locations\r\n[('Fisher East'),(,'Fisher West'),(,'Robertson 1'),(,'Robertson 2')]\r\n\r\nSELECT directions FROM locations WHERE location_id = 'Fisher East'\r\n[('Outside off the courtyard')]\r\n<\/pre>\n<p><em>Figure 8b: the generated SQL and query result for the examples in Figure 9a<\/em>\n<\/p>\n<p>\n  With the ability to generate and execute SQL the model can answer many different types of questions beyond the <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-7\/\">embeddings-based version<\/a> that was originally presented at droidcon SF 2023.\n<\/p>\n<h2>Resources and feedback<\/h2>\n<p>\n  The Jetchat sample with the droidcon schedule is available at <a href=\"https:\/\/github.com\/conceptdev\/droidcon-sf-23\">github.com\/conceptdev\/droidcon-sf-23<\/a>. \n<\/p>\n<p>\n  We\u2019d love your feedback on this post &#8211; if you have any thoughts or questions, use the\u00a0<a href=\"http:\/\/aka.ms\/SurfaceDuoSDK-Feedback\" target=\"_blank\" rel=\"noopener\">feedback forum<\/a>\u00a0or message us on\u00a0<a href=\"https:\/\/twitter.com\/surfaceduodev\" target=\"_blank\" rel=\"noopener\">Twitter @surfaceduodev<\/a>.\n<\/p>\n<p>\n  There will be no livestream this week, but you can check out the\u00a0<a href=\"https:\/\/youtube.com\/c\/surfaceduodev\" target=\"_blank\" rel=\"noopener\">archives on YouTube<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello prompt engineers, Previous blogs explained how to add droidcon session favorites to a database and also cache the embedding vectors in a database \u2013 but what if we stored everything in a database and then let the model query it directly? The OpenAI Cookbook examples repo includes a section on how to call functions [&hellip;]<\/p>\n","protected":false},"author":570,"featured_media":3375,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[741],"tags":[734,733],"class_list":["post-3372","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-chatgpt","tag-openai"],"acf":[],"blog_post_summary":"<p>Hello prompt engineers, Previous blogs explained how to add droidcon session favorites to a database and also cache the embedding vectors in a database \u2013 but what if we stored everything in a database and then let the model query it directly? The OpenAI Cookbook examples repo includes a section on how to call functions [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3372","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/users\/570"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/comments?post=3372"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3372\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media\/3375"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media?parent=3372"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/categories?post=3372"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/tags?post=3372"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}