{"id":3582,"date":"2023-11-03T20:28:10","date_gmt":"2023-11-04T03:28:10","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/surface-duo\/?p=3582"},"modified":"2024-01-03T16:05:19","modified_gmt":"2024-01-04T00:05:19","slug":"android-openai-chatgpt-24","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-24\/","title":{"rendered":"Document chat with OpenAI on Android"},"content":{"rendered":"<p>\n  Hello prompt engineers,\n<\/p>\n<p>\n  In last week\u2019s discussion on improving embedding efficiency, we mentioned the concept of \u201cchunking\u201d. Chunking is the process of breaking up a longer document (ie. too big to fit under a model\u2019s token limit) into smaller pieces of text, which will be used to generate embeddings for vector similarity comparisons with user queries (just like the <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-7\/\">droidcon conference session data<\/a>). \n<\/p>\n<p>\n  Inspired by this <a href=\"https:\/\/github.com\/azure-samples\/azure-search-openai-demo\">Azure Search OpenAI demo<\/a>, and also the fact that <a href=\"https:\/\/chat.openai.com\/\">ChatGPT<\/a> itself released a PDF-ingestion feature this week, we\u2019ve added a \u201cdocument chat\u201d feature to the <a href=\"https:\/\/github.com\/conceptdev\/droidcon-sf-23\/tree\/main\/Jetchat\">JetchatAI Android sample<\/a> app. 
To access the document chat demo, open <em>JetchatAI<\/em> and use the navigation panel to change to the <strong>#document-chat<\/strong> conversation:\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/11\/a-screenshot-of-a-chat-description-automatically.png\" class=\"wp-image-3583\" alt=\"Screenshot of JetchatAI on Android, showing the slide-out navigation panel\" width=\"300\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/11\/a-screenshot-of-a-chat-description-automatically.png 667w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/11\/a-screenshot-of-a-chat-description-automatically-297x300.png 297w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/11\/a-screenshot-of-a-chat-description-automatically-150x150.png 150w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/11\/a-screenshot-of-a-chat-description-automatically-24x24.png 24w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/11\/a-screenshot-of-a-chat-description-automatically-48x48.png 48w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/11\/a-screenshot-of-a-chat-description-automatically-96x96.png 96w\" sizes=\"(max-width: 667px) 100vw, 667px\" \/><br\/><em>Figure 1: access the #document-chat<\/em>\n<\/p>\n<p>\n  To build the <strong>#document-chat<\/strong> we re-used a lot of code and added some PDF document content from an Azure <a href=\"https:\/\/github.com\/azure-samples\/azure-search-openai-demo\">chat sample<\/a>.\n<\/p>\n<h2>Code foundations<\/h2>\n<p>\n  In the <a href=\"https:\/\/github.com\/conceptdev\/droidcon-sf-23\/pull\/20\/files\">pull-request<\/a> for this feature, you\u2019ll see a number of new files that were cloned from existing code to create the <strong>#document-chat 
<\/strong>channel:\n<\/p>\n<ul>\n<li>\n    <code>DocumentChatWrapper<\/code> \u2013 sets the system prompt to guide the model to answer only \u201cContoso employee\u201d questions\n  <\/li>\n<li>\n    <code>DocumentDatabase<\/code> \u2013 functions to store the text chunks and embeddings in Sqlite so they are persisted across app restarts\n  <\/li>\n<li>\n    <code>AskDocumentFunction<\/code> \u2013 an SQL-generating function that attempts searches on the text chunks in the database. Ideally, we would provide a semantic full-text search backend, but in this example only basic SQL text matching is supported.\n  <\/li>\n<\/ul>\n<p>\n  The bulk of this code is identical to the <em>droidcon conference<\/em> chat demo, except that instead of a hardcoded database of session details, we needed to write new code to parse and store the content from PDF documents. This new code exists mainly in the <code>loadVectorCache<\/code> and <code>initVectorCache<\/code> functions (as well as a new column in the embeddings Sqlite database to hold the corresponding content).\n<\/p>\n<h2>Reading the source documents<\/h2>\n<p>\n  To create the data store, we used the test data associated with the Azure Search demo on <a href=\"https:\/\/github.com\/Azure-Samples\/azure-search-openai-demo\/tree\/main\/data\">GitHub<\/a>: six documents that describe the fictitious <em>Contoso<\/em> company\u2019s employee handbook and benefits. These are provided as PDFs, but to keep our demo simple we manually copied the text into .txt files, which are added to the JetchatAI <code>raw<\/code> resources folder. 
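The persisted table might look something like the following. This is an illustrative sketch only (column names and types are assumptions, not the sample's exact schema); the point is the extra content column that stores each chunk's text alongside its embedding so both survive app restarts:

```kotlin
// Hypothetical schema sketch for the embeddings table used by DocumentDatabase.
// The `content` column is the addition described above: it keeps the chunk text
// next to its embedding so matched chunks can be returned as grounding.
val createEmbeddingTable = """
    CREATE TABLE IF NOT EXISTS embedding (
        id TEXT PRIMARY KEY,   -- composite documentId-sentenceId key
        vector BLOB NOT NULL,  -- serialized embedding values
        content TEXT NOT NULL  -- the chunk text, returned as grounding
    )
""".trimIndent()
```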
This means we don\u2019t have to worry about PDF file format parsing, but can still play around with different ways of chunking the content.\n<\/p>\n<p>\n  The code to load these documents from the resources folder is shown in Figure 2:\n<\/p>\n<pre>  var documentId = -1\r\n  val rawResources = listOf(R.raw.benefit_options) \/\/ R.raw.employee_handbook, R.raw.perks_plus, R.raw.role_library, R.raw.northwind_standard_benefits_details, R.raw.northwind_health_plus_benefits_details\r\n  for (resId in rawResources) {\r\n      documentId++\r\n      val inputStream = context.resources.openRawResource(resId)\r\n      val documentText = inputStream.bufferedReader().use { it.readText() }<\/pre>\n<p><em>Figure 2: loading the source document contents<\/em>\n<\/p>\n<p>\n  Once we\u2019ve loaded the contents of each document, we need to break it up before creating embeddings that can be used to match against user queries (and ultimately answer their questions with retrieval augmented generation). \n<\/p>\n<h2>Chunking the documents<\/h2>\n<p>\n  This explanation of <a href=\"https:\/\/www.pinecone.io\/learn\/chunking-strategies\/\">chunking strategies<\/a> outlines some of the considerations and methods for breaking up text to use for RAG-style LLM interactions. 
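One of the common alternatives those articles describe, in contrast to the per-sentence split used below, is fixed-size chunks with overlap, so that context isn't lost at chunk boundaries. A minimal sketch (character-based for simplicity; production chunkers usually count tokens instead):

```kotlin
// Sketch of fixed-size chunking with overlap, an alternative to splitting on
// sentences. Each chunk repeats the last `overlap` characters of the previous
// chunk so context spanning a boundary appears in both.
fun chunkWithOverlap(text: String, chunkSize: Int = 500, overlap: Int = 100): List<String> {
    require(overlap < chunkSize) { "Overlap must be smaller than the chunk size" }
    val chunks = mutableListOf<String>()
    var start = 0
    while (start < text.length) {
        val end = minOf(start + chunkSize, text.length)
        chunks.add(text.substring(start, end))
        if (end == text.length) break
        start = end - overlap // step back to create the overlap window
    }
    return chunks
}
```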
For our initial implementation we are going to take a very simplistic approach, which is to create an embedding for each sentence:\n<\/p>\n<pre>   val documentSentences = documentText.split(Regex(\"[.!?]\\\\s*\"))\r\n   var sentenceId = -1\r\n   for (sentence in documentSentences){\r\n       if (sentence.isNotEmpty()){\r\n           sentenceId++\r\n           val embeddingRequest = EmbeddingRequest(\r\n               model = ModelId(Constants.OPENAI_EMBED_MODEL),\r\n               input = listOf(sentence)\r\n           )\r\n           val embedding = openAI.embeddings(embeddingRequest)\r\n           val vector = embedding.embeddings[0].embedding.toDoubleArray()\r\n           \/\/ add to in-memory cache\r\n           vectorCache[\"$documentId-$sentenceId\"] = vector\r\n           documentCache[\"$documentId-$sentenceId\"] = sentence<\/pre>\n<p><em>Figure 3: uses regex to break into sentences and creates\/stores an embedding vector for each sentence<\/em>\n<\/p>\n<p>\n  Although this is the simplest chunking method, there are some drawbacks:\n<\/p>\n<ul>\n<li>\n    Headings and short sentences probably don\u2019t have enough information to make useful prompt grounding.\n  <\/li>\n<li>\n    Longer sentences might still lack context that would help the model answer questions accurately.\n  <\/li>\n<\/ul>\n<p>\n  Even so, short embeddings like this can be functional, as shown in the next section.\n<\/p>\n<blockquote><p>NOTE: The app needs to parse and generate embeddings for ALL the documents before it can answer any user queries. Generating the embeddings can take a few minutes because of the large number of embedding API requests required. Be prepared to wait the first time you use the demo if parsing all six source files. Alternatively, changing the <code>rawResources<\/code> array to only load a single document (like <code>R.raw.benefit_options<\/code>) will start faster and still be able to answer basic questions (as shown in the examples below). 
The app saves the embeddings to Sqlite so subsequent executions will be faster (unless the Sqlite schema is changed or the app is deleted and re-installed).<\/p><\/blockquote>\n<h2>Document answers from embeddings and SQL search<\/h2>\n<p>\n  With just this relatively minor change to our existing chat code (and adding the embedded files), we can ask fictitious employee questions (similar to those shown in the <a href=\"https:\/\/github.com\/azure-samples\/azure-search-openai-demo\">Azure Search OpenAI demo<\/a>):\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/11\/a-screenshot-of-a-chat-description-automatically-1.png\" class=\"wp-image-3584\" alt=\"Screenshot of JetchatAI with questions and answers about loaded documents\" width=\"600\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/11\/a-screenshot-of-a-chat-description-automatically-1.png 1895w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/11\/a-screenshot-of-a-chat-description-automatically-1-243x300.png 243w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/11\/a-screenshot-of-a-chat-description-automatically-1-831x1024.png 831w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/11\/a-screenshot-of-a-chat-description-automatically-1-768x946.png 768w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/11\/a-screenshot-of-a-chat-description-automatically-1-1247x1536.png 1247w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/11\/a-screenshot-of-a-chat-description-automatically-1-1662x2048.png 1662w\" sizes=\"(max-width: 1895px) 100vw, 1895px\" \/><br\/><em>Figure 4: Ask questions about documents in JetchatAI<\/em>\n<\/p>\n<p>\n  These two example queries are discussed below, showing the text chunks that are 
used for grounding. \n<\/p>\n<h3>\u201cdoes my plan cover annual eye exams\u201d<\/h3>\n<p>\n  The first test user query returns ten chunks where the vector similarity score was above the arbitrary <code>0.8<\/code> threshold. Figure 5 shows a selection of the matches (some removed for space), but you can also see that the grounding prompt has the introduction <code>The following information is extract from Contoso employee handbooks and health plans:<\/code> and the instruction <code>Use the above information to answer the following question:<\/code> to guide the model when the chunks are included in the prompt:\n<\/p>\n<pre>The following information is extract from Contoso employee handbooks and health plans:\r\nComparison of Plans\r\nBoth plans offer coverage for routine physicals, well-child visits, immunizations, and other preventive care services\r\n\r\nThis plan also offers coverage for preventive care services, as well as prescription drug coverage\r\n\r\nNorthwind Health Plus offers coverage for vision exams, glasses, and contact lenses, as well as dental exams, cleanings, and fillings\r\n\r\nNorthwind Standard only offers coverage for vision exams and glasses\r\n\r\nBoth plans offer coverage for vision and dental services, as well as medical services\r\n\r\nUse the above information to answer the following question:<\/pre>\n<p><em>Figure 5: the grounding information for the user query \u201cdoes my plan cover annual eye exams\u201d<\/em>\n<\/p>\n<p>\n  Because we have also registered the <code>AskDocumentFunction<\/code>, an SQL query (Figure 6) is also generated; however, the exact phrase \u201cannual eye 
exam\u201d has no matches, so no additional grounding is provided by the function call.\n<\/p>\n<pre>SELECT DISTINCT content FROM embedding WHERE content LIKE '%annual eye exams%'<\/pre>\n<p><em>Figure 6: the text search is too specific and returns zero results<\/em>\n<\/p>\n<p>\n  The grounding in Figure 5 is enough for the model to answer the question with <strong>\u201cYes your plan covers annual eye exams\u201d<\/strong>.\n<\/p>\n<blockquote><p>Note that the user query mentioned \u201cmy plan\u201d, and the model\u2019s response asserts that \u201cyour plan covers\u2026\u201d, probably because in the grounding data the statements include \u201cBoth plans offer coverage\u2026\u201d. We have not provided any grounding on which plan the user is signed up for, but that could be another improvement (perhaps in the system prompt) that would help answer more accurately.<\/p><\/blockquote>\n<h3>\u201cwhat about dental\u201d<\/h3>\n<p>\n  The second test query only returns three chunks with a vector similarity score above <code>0.8<\/code>, shown in Figure 7:\n<\/p>\n<pre>The following information is extract from Contoso employee handbooks and health plans:\r\nNorthwind Health Plus offers coverage for vision exams, glasses, and contact lenses, as well as dental exams, cleanings, and fillings\r\n\r\nBoth plans offer coverage for vision and dental services, as well as medical services\r\n\r\nBoth plans offer coverage for vision and dental services\r\n\r\nUse the above information to answer the following question:<\/pre>\n<p><em>Figure 7: the grounding information for the user query \u201cwhat about 
dental\u201d<\/em>\n<\/p>\n<p>\n  The model once again triggers the dynamic SQL function to perform a text search for \u201c%dental%\u201d, which returns the five matches shown in Figure 8.\n<\/p>\n<pre>SELECT DISTINCT content FROM embedding WHERE content LIKE '%dental%'\r\n -------\r\n\r\n[('Northwind Health Plus\r\nNorthwind Health Plus is a comprehensive plan that provides comprehensive coverage for medical, vision, and dental services')\r\n,('Northwind Standard Northwind Standard is a basic plan that provides coverage for medical, vision, and dental services')\r\n,('Both plans offer coverage for vision and dental services')\r\n,('Northwind Health Plus offers coverage for vision exams, glasses, and contact lenses, as well as dental exams, cleanings, and fillings')\r\n,('Both plans offer coverage for vision and dental services, as well as medical services')]<\/pre>\n<p><em>Figure 8: SQL function results for the user query \u201cwhat about dental?\u201d<\/em>\n<\/p>\n<p>\n  The chunks returned from the SQL query mostly overlap with the embeddings matches. The model uses this information to generate the response <strong>\u201cBoth plans offer coverage for dental services, including dental exams, cleanings, and fillings.\u201d<\/strong>\n<\/p>\n<blockquote><p>If you look closely at the grounding data, there\u2019s only evidence that the \u201cHealth Plus\u201d plan covers fillings (there is no explicit mention that the \u201cStandard\u201d plan offers anything beyond \u201cdental services\u201d). This means that the answer given <em>could<\/em> be misleading about fillings being covered by both plans \u2013 it may be a reasonable assumption given the grounding, or it could fall into the \u2018hallucination\u2019 category. 
If the chunks were larger, the model might have more context to understand which features are associated with which plan.<\/p><\/blockquote>\n<p>\n  This example uses the simplest possible chunking strategy, and while some questions can be answered, it\u2019s likely that a more sophisticated chunking strategy would support more accurate responses. In addition, including more information about the user could result in more personalized responses.\n<\/p>\n<h2>Resources and feedback<\/h2>\n<p>\n  Some additional samples that demonstrate building document chat services with more sophisticated search support:\n<\/p>\n<ul>\n<li><a href=\"https:\/\/techcommunity.microsoft.com\/t5\/azure-ai-services-blog\/revolutionize-your-enterprise-data-with-chatgpt-next-gen-apps-w\/ba-p\/3762087\">Revolutionize your Enterprise Data with ChatGPT: Next-gen Apps w\/ Azure OpenAI and Cognitive Search<\/a> is the blog post that introduces the Azure demo mentioned above.\n  <\/li>\n<li><a href=\"https:\/\/techcommunity.microsoft.com\/t5\/educator-developer-blog\/teach-chatgpt-to-answer-questions-using-azure-cognitive-search\/ba-p\/3969713\">Teach ChatGPT to Answer Questions: Using Azure Cognitive Search &amp; Azure OpenAI Services<\/a> shows how to work with large files and large numbers of files as input for a ChatGPT question-answering service.\n  <\/li>\n<li><a href=\"https:\/\/techcommunity.microsoft.com\/t5\/startups-at-microsoft\/build-a-chatbot-to-query-your-documentation-using-langchain-and\/ba-p\/3833134\">Build a chatbot to query your documentation using Langchain and Azure OpenAI<\/a> is an example using LangChain.\n  <\/li>\n<\/ul>\n<p>\n  We\u2019d love your feedback on this post, including any tips or tricks you\u2019ve learned from playing around with ChatGPT prompts.\n<\/p>\n<p>\n  If you have any thoughts or questions, use the <a href=\"http:\/\/aka.ms\/SurfaceDuoSDK-Feedback\">feedback forum<\/a> or message us on <a href=\"https:\/\/twitter.com\/surfaceduodev\">Twitter 
@surfaceduodev<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello prompt engineers, In last week\u2019s discussion on improving embedding efficiency, we mentioned the concept of \u201cchunking\u201d. Chunking is the process of breaking up a longer document (ie. too big to fit under a model\u2019s token limit) into smaller pieces of text, which will be used to generate embeddings for vector similarity comparisons with user [&hellip;]<\/p>\n","protected":false},"author":570,"featured_media":3584,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[741],"tags":[734,733],"class_list":["post-3582","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-chatgpt","tag-openai"],"acf":[],"blog_post_summary":"<p>Hello prompt engineers, In last week\u2019s discussion on improving embedding efficiency, we mentioned the concept of \u201cchunking\u201d. Chunking is the process of breaking up a longer document (ie. 
too big to fit under a model\u2019s token limit) into smaller pieces of text, which will be used to generate embeddings for vector similarity comparisons with user [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3582","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/users\/570"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/comments?post=3582"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3582\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media\/3584"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media?parent=3582"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/categories?post=3582"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/tags?post=3582"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}