{"id":3380,"date":"2023-08-03T11:00:54","date_gmt":"2023-08-03T18:00:54","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/surface-duo\/?p=3380"},"modified":"2024-01-03T16:23:05","modified_gmt":"2024-01-04T00:23:05","slug":"android-openai-chatgpt-12","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-12\/","title":{"rendered":"Embedding vector caching (redux)"},"content":{"rendered":"<p>\n  Hello prompt engineers,\n<\/p>\n<p>\n  Earlier this year I tried to create a <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-8\/\">hardcoded cache of embedding vectors<\/a>, only to be thwarted by the limitations of Kotlin (the combined size of the arrays of numbers exceeded Kotlin\u2019s maximum function size). Now that we\u2019ve added Sqlite to the solution to <a href=\"#post-3380-last-post\">support memory and querying<\/a>, we can use that infrastructure to <em>also<\/em> cache the embedding vectors.\n<\/p>\n<p>\n  Note that the version of Sqlite we\u2019ll use on Android does not have any special \u201cvector database\u201d features \u2013 instead, the embedding vectors will just be serialized\/deserialized and stored in a <code>TEXT<\/code> column. Embedding vector similarity comparisons will continue to be done with the <code>dot<\/code> function defined in <a href=\"https:\/\/github.com\/conceptdev\/droidcon-sf-23\/blob\/conceptdev\/db-test\/Jetchat\/app\/src\/main\/java\/com\/example\/compose\/jetchat\/DroidconEmbeddingsWrapper.kt\"><strong>DroidconEmbeddingsWrapper.kt<\/strong><\/a>.\n<\/p>\n<p>\n  The demo app we\u2019re working on is the <strong>JetchatAI<\/strong> sample that allows you to query the conference schedule for the <em>droidcon SF 2023<\/em> event. The sample code is available at <a href=\"https:\/\/github.com\/conceptdev\/droidcon-sf-23\/blob\/main\/Jetchat\">github.com\/conceptdev\/droidcon-sf-23<\/a>. Don\u2019t forget to choose the <strong>droidcon-chat<\/strong> from the top-left menu to test out these features.\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-10.png\" class=\"wp-image-3381\" alt=\"Chat using embeddings\" width=\"400\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-10.png 757w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/07\/a-screenshot-of-a-chat-description-automatically-10-244x300.png 244w\" sizes=\"(max-width: 757px) 100vw, 757px\" \/>\n<\/p>\n<p><em>Figure 1: Embeddings with similarity comparisons enable retrieval augmented generation (RAG) chat interactions<\/em>\n<\/p>\n<h2>Configure the database<\/h2>\n<p>\n  The database setup uses the same pattern that we used for the <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-11\/\">favorites feature<\/a>. The schema and create\/delete scripts for an embeddings table are defined in the <a href=\"https:\/\/github.com\/conceptdev\/droidcon-sf-23\/blob\/conceptdev\/db-test\/Jetchat\/app\/src\/main\/java\/com\/example\/compose\/jetchat\/data\/DroidconDatabase.kt\"><strong>data\/DroidconDatabase.kt<\/strong><\/a> file:\n<\/p>\n<pre>object EmbeddingEntry : BaseColumns {\r\n      const val TABLE_NAME = \"embedding\"\r\n      const val COLUMN_NAME_SESSIONID = \"session_id\"\r\n      const val COLUMN_NAME_VECTOR = \"vector\"\r\n}\r\nprivate const val SQL_CREATE_EMBEDDING_ENTRIES =\r\n    \"CREATE TABLE ${DroidconContract.EmbeddingEntry.TABLE_NAME} (\" +\r\n            \"${DroidconContract.EmbeddingEntry.COLUMN_NAME_SESSIONID} TEXT PRIMARY KEY,\" +\r\n            \"${DroidconContract.EmbeddingEntry.COLUMN_NAME_VECTOR} TEXT)\"\r\nprivate const val SQL_DELETE_EMBEDDING_ENTRIES = \"DROP TABLE IF EXISTS ${DroidconContract.EmbeddingEntry.TABLE_NAME}\"<\/pre>\n<p><em>Figure 2: database schema declaration<\/em>\n<\/p>\n<p>\n  The <code>DroidconDbHelper<\/code> then takes care of creating the table (don\u2019t forget to update the <code>DATABASE_VERSION<\/code> whenever you make schema changes).\n<\/p>\n<h2>Store vectors with SQLite<\/h2>\n<p>\n  There is already an <code>initVectorCache<\/code> function in the <strong>DroidconEmbeddingsWrapper.kt<\/strong> file which generates all the vectors in a loop over the <code>droidconSessions<\/code> collection and stores the vectors in-memory in the <code>vectorCache<\/code> variable. The updated code leverages this existing loop to <em>also<\/em> insert each vector into the local database.\n<\/p>\n<p>\n  As mentioned above, there are no special vector-handling features in the existing version of Sqlite, so the code will treat the vector as a string for data storage and retrieval.\n<\/p>\n<p>\n  In Figure 2, you can see the first half of the code snippet has not changed \u2013 it still loops through the conference schedule and generates an embedding for each session via the web API. The second half of the snippet adds the database customization \u2013 first the vector (which is an array of <code>Double<\/code> values) gets serialized, and then it is inserted into the database using the <code>dbHelper<\/code> class that has been added to the function:\n<\/p>\n<pre>\r\n\/\/ Gets the data repository in write mode\r\nval db = dbHelper.writableDatabase\r\n    for (session in DroidconSessionData.droidconSessions) {\r\n        val embeddingRequest = EmbeddingRequest(\r\n            model = ModelId(Constants.OPENAI_EMBED_MODEL),\r\n            input = listOf(session.value)\r\n        )\r\n        val embedding = openAI.embeddings(embeddingRequest)\r\n        val vector = embedding.embeddings[0].embedding.toDoubleArray()\r\n        \/\/ add to in-memory cache\r\n        vectorCache[session.key] = vector\r\n        \/\/ serialize the vector to a string - removed for clarity\r\n        \/\/ Create a new map of values, where column names are the keys\r\n        val values = ContentValues().apply {\r\n            put(DroidconContract.EmbeddingEntry.COLUMN_NAME_SESSIONID, session.key)\r\n            put(DroidconContract.EmbeddingEntry.COLUMN_NAME_VECTOR, vectorString)\r\n        }\r\n        \/\/ Insert the new row, returning the primary key value of the row (would be -1 if error)\r\n        val newRowId =\r\n            db?.insert(DroidconContract.EmbeddingEntry.TABLE_NAME, null, values)\r\n    }<\/pre>\n<p><em>Figure 3: code that generates embedding vectors then inserts into a local database<\/em>\n<\/p>\n<p>\n  The function also has a new check <code>if (vectorCache.isEmpty())<\/code> so that once the data has been loaded once, it does not need to be re-loaded.\n<\/p>\n<h2>Retrieve vectors with SQLite<\/h2>\n<p>\n  Once the database has been loaded with embeddings, the existing <code>initVectorCache<\/code> function should now be able to use the locally stored embeddings rather than regenerating from the web API. The code to load the embedding vectors is encapsulated in a new <code>loadVectorCache<\/code> function:\n<\/p>\n<pre>private fun loadVectorCache(dbHelper: DroidconDbHelper): Int {\r\n    var rowCount = 0\r\n    val db = dbHelper.readableDatabase\r\n    val projection = arrayOf(\r\n        DroidconContract.EmbeddingEntry.COLUMN_NAME_SESSIONID,\r\n        DroidconContract.EmbeddingEntry.COLUMN_NAME_VECTOR)\r\n    val cursor = db.query(\r\n        DroidconContract.EmbeddingEntry.TABLE_NAME,   \/\/ The table to query\r\n        projection,             \/\/ The array of columns to return \r\n        null, null, null, null, null  \/\/ unused customizations\r\n    )\r\n    with(cursor) {\r\n        while (moveToNext()) {\r\n            val sessionId = getString(getColumnIndexOrThrow(DroidconContract.EmbeddingEntry.COLUMN_NAME_SESSIONID))\r\n            val vectorString = getString(getColumnIndexOrThrow(DroidconContract.EmbeddingEntry.COLUMN_NAME_VECTOR))\r\n            \/\/ deserialize vector - removed for clarity\r\n            \/\/ add to in-memory cache\r\n            vectorCache[sessionId] = vector.toDoubleArray()\r\n            rowCount++\r\n        }\r\n    }\r\n    cursor.close()\r\n    return rowCount\r\n}<\/pre>\n<p><em>Figure 4: read the embedding vectors from the database and store in-memory cache<\/em>\n<\/p>\n<p>\n  The <code>initVectorCache<\/code> is responsible for loading via the most efficient method \u2013 it will check in-memory first, if the embeddings aren\u2019t cached it will attempt to load from the database. If the database is empty, it will generate new embeddings from the web API and update the database for next time.\n<\/p>\n<h2>Profit!<\/h2>\n<p>\n  The old version of the code needed to get embeddings each time the app was started. There are around 70 sessions in the droidcon schedule, so that was 70 web API calls every time! Now the web API calls happen <em>once<\/em> and the results are stored in the database, available in each subsequent app restart. The only time they\u2019ll be regenerated is if the database schema changes \u2013 but you can optimize that too by spending some time on database migration strategies (beyond the scope of this post).\n<\/p>\n<p>\n  Not only is this cheaper (no more unnecessary embeddings API calls), but it\u2019ll also make the initial conversation startup time much faster.\n<\/p>\n<h2>Resources and feedback<\/h2>\n<p>\n  We\u2019d love your feedback on this post, including any tips or tricks you\u2019ve learning from playing around with ChatGPT prompts.\n<\/p>\n<p>\n  If you have any thoughts or questions, use the\u00a0<a href=\"http:\/\/aka.ms\/SurfaceDuoSDK-Feedback\" target=\"_blank\" rel=\"noopener\">feedback forum<\/a>\u00a0or message us on\u00a0<a href=\"https:\/\/twitter.com\/surfaceduodev\" target=\"_blank\" rel=\"noopener\">Twitter @surfaceduodev<\/a>.\n<\/p>\n<p>\n  There will be no livestream this week, but you can check out the\u00a0<a href=\"https:\/\/youtube.com\/c\/surfaceduodev\" target=\"_blank\" rel=\"noopener\">archives on YouTube<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello prompt engineers, Earlier this year I tried to create a hardcoded cache of embedding vectors, only to be thwarted by the limitations of Kotlin (the combined size of the arrays of numbers exceeded Kotlin\u2019s maximum function size). Now that we\u2019ve added Sqlite to the solution to support memory and querying, we can use that [&hellip;]<\/p>\n","protected":false},"author":570,"featured_media":3318,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[741],"tags":[734,733],"class_list":["post-3380","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-chatgpt","tag-openai"],"acf":[],"blog_post_summary":"<p>Hello prompt engineers, Earlier this year I tried to create a hardcoded cache of embedding vectors, only to be thwarted by the limitations of Kotlin (the combined size of the arrays of numbers exceeded Kotlin\u2019s maximum function size). Now that we\u2019ve added Sqlite to the solution to support memory and querying, we can use that [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3380","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/users\/570"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/comments?post=3380"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3380\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media\/3318"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media?parent=3380"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/categories?post=3380"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/tags?post=3380"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}