{"id":3284,"date":"2023-06-15T10:57:45","date_gmt":"2023-06-15T17:57:45","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/surface-duo\/?p=3284"},"modified":"2024-01-03T16:25:24","modified_gmt":"2024-01-04T00:25:24","slug":"android-openai-chatgpt-7","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-7\/","title":{"rendered":"JetchatAI gets smarter with embeddings"},"content":{"rendered":"<p>\n  Hello prompt engineers,\n<\/p>\n<p>\n  A few weeks ago we finished a series of posts building an <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-6\/\">AI chatbot using the Jetchat sample<\/a> with OpenAI. The sample uses the chat and image endpoints, but has the same limitation as many LLMs, which is that its knowledge is limited to the training data (for example, anything after September 2021 is not included). A common requirement for extending these models is to respond with newer data, or internal corporate data, that isn\u2019t part of the model. 
Re-training isn\u2019t an option, so other patterns have emerged to incorporate additional datasets into chat conversations.\n<\/p>\n<p>\n  I presented a <a href=\"https:\/\/sf.droidcon.com\/craig-dunn\/\">session<\/a> at <a href=\"https:\/\/sf.droidcon.com\/\">droidcon SF<\/a> last week, so it seemed appropriate to take our existing JetchatAI sample and enable it to answer questions about the <a href=\"https:\/\/sf.droidcon.com\/agenda\/\">conference schedule<\/a>.\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-1.png\" class=\"wp-image-3285\" width=\"400\" alt=\"Android device screenshot showing a chat application with a conversation about droidcon SF sessions\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-1.png 688w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-1-243x300.png 243w\" sizes=\"(max-width: 688px) 100vw, 688px\" \/><br\/><em>Figure 1: JetchatAI with droidcon embeddings<\/em>\n<\/p>\n<h2>It\u2019s RAG time<\/h2>\n<p><a href=\"https:\/\/www.promptingguide.ai\/techniques\/rag\">RAG<\/a> is short for Retrieval Augmented Generation (see also this <a href=\"https:\/\/arxiv.org\/pdf\/2005.11401.pdf\">paper from Meta<\/a>) and describes a pattern for querying LLMs where we first examine the user\u2019s input and try to determine what data they\u2019re interested in, and then if needed pre-fetch related information to include as \u2018context\u2019 in the actual request to the model. \n<\/p>\n<p>\n  This additional context could come from the internet, a database, or any other query-able source, including internal documentation or personal content.\n<\/p>\n<p>\n  The hard part is figuring out, from the user\u2019s input, exactly what information do we need to pre-fetch? 
There could be a number of approaches, from a custom model trained to extract query intent for a given use-case, to something simple like extracting keywords and conducting a search. Some different approaches for generating responses are discussed in this blog about using <a href=\"https:\/\/techcommunity.microsoft.com\/t5\/ai-applied-ai-blog\/revolutionize-your-enterprise-data-with-chatgpt-next-gen-apps-w\/ba-p\/3762087\">OpenAI with Azure Cognitive Search<\/a>.\n<\/p>\n<blockquote><p>NOTE: A common question on this approach is \u201conce you\u2019ve pre-fetched information, why not just show that to the user? Why send the augmented query to the LLM at all?\u201d The answer lies in the model\u2019s ability to summarize the collected information, hopefully ignoring irrelevant data, and craft a text response that most directly answers the original question.<\/p><\/blockquote>\n<p>\n  This diagram from the Azure blog helps to visualize the process:\n<\/p>\n<ol>\n<li>\n  User enters their question into a chat session.\n<\/li>\n<li>\n  The orchestrator looks at the query and decides if more information is required.\n<\/li>\n<li>\n  The orchestrator \u201cretrieves\u201d relevant content from its data sources. Information should be relevant and concise (there are size limits on LLM queries).\n<\/li>\n<li>\n  The final prompt is constructed by concatenating (\u201caugmenting\u201d!) the user question with the pre-fetched data. Additional \u201cmeta-prompt\u201d text may be added to instruct the model to use the additional data to answer the question.\n<\/li>\n<li>\n  The LLM will incorporate the pre-fetched data into its answer, appearing to have knowledge not included in its training dataset! 
\n<\/li>\n<\/ol>\n<p>\n  <img decoding=\"async\" width=\"999\" height=\"475\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/thumbnail-image-1-of-blog-post-titled.png\" class=\"wp-image-3286\" alt=\"Revolutionize your Enterprise Data with ChatGPT: Next-gen Apps w\/ Azure OpenAI and Cognitive Search\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/thumbnail-image-1-of-blog-post-titled.png 999w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/thumbnail-image-1-of-blog-post-titled-300x143.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/thumbnail-image-1-of-blog-post-titled-768x365.png 768w\" sizes=\"(max-width: 999px) 100vw, 999px\" \/><br\/><em>Figure 2: Architecture diagram showing cognitive search being used to augment a ChatGPT prompt<\/em><\/p>\n<p>\n  For the droidcon demo I extracted the session information into a collection of text chunks to serve as the data source. To determine whether there were any relevant conference sessions in a given user question, I used another LLM feature: embeddings.\n<\/p>\n<h2>What are embeddings?<\/h2>\n<p>\n  The OpenAI blog has a great <a href=\"https:\/\/openai.com\/blog\/introducing-text-and-code-embeddings\">introduction to embeddings<\/a>. An embedding is a numerical representation of content within an LLM\u2019s conceptual space, a vector with hundreds or thousands of dimensions. OpenAI has an embedding endpoint that will return the vector for any text input.\n<\/p>\n<p>\n  This visualization from the OpenAI blog shows how text snippets with related concepts cluster together in the embedding space. The number of dimensions in the vectors has been mathematically reduced from 2,048 to 3 to make it easier to read. 
Each colored point on the chart represents one or two sentences, such as \u201cPhil Gilbert (born 15 November 1969) is an Australian rules footballer. He played for both the Melbourne and Freemantle Football clubs in the Australian Football League\u201d or \u201cTSS Olga was a steam turbine cargo vessel operated by the London and North Western Railway from 1887 to 1908\u201d. The chart shows how the embeddings (the vector, or coordinates) for similar concepts are clustered together in the embedding space, and color coded to show how text about different categories is distributed.\n<\/p>\n<p>\n  <img decoding=\"async\" width=\"659\" height=\"49\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-3.png\" class=\"wp-image-3287\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-3.png 659w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-3-300x22.png 300w\" sizes=\"(max-width: 659px) 100vw, 659px\" \/><br\/><img decoding=\"async\" width=\"1153\" height=\"879\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-4.png\" class=\"wp-image-3288\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-4.png 1153w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-4-300x229.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-4-1024x781.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-4-768x585.png 768w\" sizes=\"(max-width: 1153px) 100vw, 1153px\" \/><br\/><em>Figure 3: simplified visualization of different embeddings (source: <a 
href=\"https:\/\/openai.com\/blog\/introducing-text-and-code-embeddings\">OpenAI.com\/blog<\/a>)<\/em>\n<\/p>\n<p>\n  This property of embeddings \u2013 where similar concepts are \u201cclose\u201d to each other \u2013 can be used evaluate similarities between two chunks of text using vector operations like dot product. For additional chunks of text, we can create an embedding vector, \u201cmap\u201d it in the visualization, and determine whether that text fits into one of these five categories based on how close it is to other embeddings.\n<\/p>\n<h2>Use embeddings to augment chat completions<\/h2>\n<p>\n  To extend JetchatAI to be able to answer questions about the droidcon schedule:\n<\/p>\n<ol>\n<li>\n  Create a \u201cdatabase\u201d of the droidcon session information\n<\/li>\n<li>\n  Generate an embedding vector for each session\n<\/li>\n<li>\n  For each chat input message, generate an embedding vector, and then compare it (via dot product) with each of the sessions\n<\/li>\n<li>\n  Construct LLM prompt including relevant session information (determined by dot product similarity scores)\n<\/li>\n<li>\n  Show LLM response in chat\n<\/li>\n<\/ol>\n<h3>1. Session \u201cdatabase\u201d<\/h3>\n<p>\n  Although you can use vector databases to efficiently store embeddings and their associated content, this simple demo just stores the data in memory using a <code>Map<\/code>. An example of the raw session data is shown below. 
Note that the data has been semi-structured as key-value pairs \u2013 this helps the LLM understand context when formulating its final response.\n<\/p>\n<pre>  val droidconSessions: Map&lt;String, String&gt; = <em>mapOf<\/em>(\r\n  \"craig-dunn\" <em>to <\/em>\"\"\"Speaker: CRAIG DUNN\r\n  Role: Software Engineer at Microsoft\r\n  Location: Robertson 1\r\n  Date: 2023-06-09\r\n  Time: 16:30\r\n  Subject: AI for Android on- and off-device\r\n  Description: AI and ML bring powerful new features to app developers, for processing text, images, audio, video, and more. In this session we\u2019ll compare and contrast the opportunities available with on-device models using ONNX and the ChatGPT model running in the cloud.\"\"\"\r\n  \/\/...<\/pre>\n<p>\n  There are about 70 sessions in the example file <strong><a href=\"https:\/\/github.com\/conceptdev\/droidcon-sf-23\/blob\/main\/Jetchat\/app\/src\/main\/java\/com\/example\/compose\/jetchat\/data\/DroidconSessionData.kt\">DroidconSessionData.kt<\/a><\/strong>.\n<\/p>\n<h3>2. Generate embeddings<\/h3>\n<p>\n  The code to generate embeddings is in the <code>initVectorCache<\/code> method in <a href=\"https:\/\/github.com\/conceptdev\/droidcon-sf-23\/blob\/main\/Jetchat\/app\/src\/main\/java\/com\/example\/compose\/jetchat\/DroidconEmbeddingsWrapper.kt#L135\"><strong>DroidconEmbeddingsWrapper.kt<\/strong><\/a>. Once again, in a production app you would pre-calculate and store these in a database of some kind. 
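<\/p>\n<p>\n  As a rough sketch of what that could look like (the <code>saveVector<\/code>\/<code>loadVector<\/code> helpers below are hypothetical, not part of the sample), each vector could be serialized to local storage so the embedding endpoint only has to be called once per session:\n<\/p>\n<pre>  import java.io.File\r\n  import java.nio.ByteBuffer\r\n\r\n  \/\/ Hypothetical cache helper: persist one embedding vector to disk\r\n  fun saveVector(file: File, vector: DoubleArray) {\r\n      val buffer = ByteBuffer.allocate(vector.size * 8) \/\/ 8 bytes per Double\r\n      vector.forEach { buffer.putDouble(it) }\r\n      file.writeBytes(buffer.array())\r\n  }\r\n\r\n  \/\/ ...and read it back without calling the embedding endpoint again\r\n  fun loadVector(file: File): DoubleArray {\r\n      val buffer = ByteBuffer.wrap(file.readBytes())\r\n      return DoubleArray(buffer.remaining() \/ 8) { buffer.double }\r\n  }<\/pre>\n<p>\n  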
For this demo, they are calculated on-the-fly.\n<\/p>\n<p>\n  This code uses the OpenAI embedding endpoint to loop through all the sessions and create an embedding vector to be stored in memory:\n<\/p>\n<pre>  for (session in DroidconSessionData.droidconSessions) {\r\n      val embeddingRequest = EmbeddingRequest(\r\n          model = ModelId(\"text-embedding-ada-002\"),\r\n          input = <em>listOf<\/em>(session.value)\r\n      )\r\n      val embeddingResult = openAI.embeddings(embeddingRequest)\r\n      val vector = embeddingResult.embeddings[0].embedding.<em>toDoubleArray<\/em>()\r\n      vectorCache[session.key] = vector\r\n  }<\/pre>\n<h3>3. Process new chat messages<\/h3>\n<p>\n  Creating an embedding for each message is done in the <code>grounding<\/code> method using the same OpenAI endpoint. The message vector is then compared against the embeddings for each session \u2013 similar to an inefficient index lookup in a database. The results are sorted so that the best matches can be easily extracted:\n<\/p>\n<pre>\r\n  for (session in vectorCache) {\r\n      val v = messageVector <em>dot <\/em>session.value\r\n      sortedVectors[v] = session.key\r\n  }\r\n<\/pre>\n<h3>4. 
Construct augmented prompt<\/h3>\n<p>\n  When the similarity (calculated by the dot product) is above a certain threshold (0.8 for this demo), the session text is included in the prompt that will be sent to the LLM.\n<\/p>\n<p>\n  This code builds the augmented prompt, adding additional context and instructions at the end.\n<\/p>\n<pre>  if (sortedVectors.lastKey() &gt; 0.8) { <em>\/\/ arbitrary match threshold<\/em>\r\n      messagePreamble =\r\n          \"Following are some talks\/sessions scheduled for the droidcon San Francisco conference in June 2023:\\n\\n\"\r\n      for (dpKey in sortedVectors.tailMap(0.8)) {\r\n          messagePreamble += DroidconSessionData.droidconSessions[dpKey.value] + \"\\n\\n\"\r\n      }\r\n      messagePreamble += \"\\n\\nUse the above information to answer the following question. Summarize and provide date\/time and location if appropriate.\\n\\n\"\r\n  }<\/pre>\n<p>\n  The initial chat message will be added to the end of this prompt before being appended to the <code>conversation<\/code> data structure and sent to the model. The augmented prompt is never shown in the app.\n<\/p>\n<h3>5. Show response<\/h3>\n<p>\n  The response from the LLM is displayed directly in the chat. 
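<\/p>\n<p>\n  (A quick note on the infix <code>dot<\/code> function used for the similarity comparisons in step 3: it isn\u2019t part of the Kotlin standard library, so the sample defines its own extension on <code>DoubleArray<\/code>. A minimal implementation might look like the sketch below; because OpenAI embedding vectors are normalized to unit length, this dot product is equivalent to cosine similarity.)\n<\/p>\n<pre>  \/\/ Infix extension so two vectors can be compared as: a dot b\r\n  infix fun DoubleArray.dot(other: DoubleArray): Double {\r\n      var sum = 0.0\r\n      for (i in indices) {\r\n          sum += this[i] * other[i]\r\n      }\r\n      return sum\r\n  }<\/pre>\n<p>\n  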
If the prompt was augmented with additional data, the response will probably include some of that information, although the model may also decide to ignore information that it doesn\u2019t consider relevant.\n<\/p>\n<p>\n  The additional instructions added to the augmented prompt \u2013 <code>\"Summarize and provide date\/time and location if appropriate\"<\/code> \u2013 help the model give better answers by ensuring this information is included each time.\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-6.png\" class=\"wp-image-3290\" width=\"300\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-6.png 703w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-6-244x300.png 244w\" sizes=\"(max-width: 703px) 100vw, 703px\" \/> \n <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-5.png\" class=\"wp-image-3289\" width=\"300\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-5.png 698w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/06\/word-image-3284-5-243x300.png 243w\" sizes=\"(max-width: 698px) 100vw, 698px\" \/>  \n<\/p>\n<p>You can download and try out the sample from the <a href=\"https:\/\/github.com\/conceptdev\/droidcon-sf-23\/\">droidcon-sf-23 repo<\/a> by adding your own <a href=\"https:\/\/platform.openai.com\/account\/api-keys\">OpenAI API key<\/a>.<\/p>\n<blockquote><p>NOTE: A number of shortcuts were taken to keep this demo simple \u2013 such as calculating embeddings on-the-fly, including the session metadata for embedding, picking an arbitrary 0.8 cutoff to test similarity, and probably other hacks. 
While it shows how easy it is to bring additional data to LLM responses, please look for better solutions if you take the next step and start incorporating LLM chat into your production apps.<\/p><\/blockquote>\n<h2>Resources and feedback<\/h2>\n<p>\n  The code for this sample and the others that were presented at droidcon SF 2023 is available on <a href=\"https:\/\/github.com\/conceptdev\/droidcon-sf-23\/\">GitHub<\/a>.\n<\/p>\n<p>\n  If you have any questions, use the <a href=\"http:\/\/aka.ms\/SurfaceDuoSDK-Feedback\">feedback forum<\/a> or message us on <a href=\"https:\/\/twitter.com\/surfaceduodev\">Twitter @surfaceduodev<\/a>.\n<\/p>\n<p>\n  There will be no livestream this week, but you can check out the <a href=\"https:\/\/youtube.com\/c\/surfaceduodev\">archives on YouTube<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello prompt engineers, A few weeks ago we finished a series of posts building an AI chatbot using the Jetchat sample with OpenAI. The sample uses the chat and image endpoints, but has the same limitation as many LLMs, which is that its knowledge is limited to the training data (for example, anything after September [&hellip;]<\/p>\n","protected":false},"author":570,"featured_media":3295,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[741],"tags":[734,692,729,733],"class_list":["post-3284","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-chatgpt","tag-jetpack-compose","tag-machine-learning","tag-openai"],"acf":[],"blog_post_summary":"<p>Hello prompt engineers, A few weeks ago we finished a series of posts building an AI chatbot using the Jetchat sample with OpenAI. 
The sample uses the chat and image endpoints, but has the same limitation as many LLMs, which is that its knowledge is limited to the training data (for example, anything after September [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3284","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/users\/570"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/comments?post=3284"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3284\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media\/3295"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media?parent=3284"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/categories?post=3284"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/tags?post=3284"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}