{"id":3632,"date":"2023-12-10T18:12:58","date_gmt":"2023-12-11T02:12:58","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/surface-duo\/?p=3632"},"modified":"2024-01-03T16:04:35","modified_gmt":"2024-01-04T00:04:35","slug":"android-openai-chatgpt-28","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-28\/","title":{"rendered":"OpenAI Assistant code interpreter on Android"},"content":{"rendered":"<p>\n  Hello prompt engineers,\n<\/p>\n<p>\n  Over the last few weeks, we\u2019ve looked at different aspects of the new OpenAI Assistant API, both <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/openai-assistants\/\">prototyping in the playground<\/a> and using <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-27\/\">Kotlin in the JetchatAI sample<\/a>. In this post we\u2019re going to add the <a href=\"https:\/\/platform.openai.com\/docs\/assistants\/tools\/code-interpreter\">Code Interpreter<\/a> feature which allows the Assistants API to write and run Python code in a sandboxed execution environment. By using the code interpreter, chat interactions can solve complex math problems, code problems, read and parse data files, and output formatted data files and charts.\n<\/p>\n<p>\n  To keep with the theme of the last few examples, we are going to test the code interpreter with a simple math problem related to the fictitious Contoso health plans used in earlier posts.\n<\/p>\n<h2>Enabling the code interpreter<\/h2>\n<p>\n  The code interpreter is just a setting to be enabled, either in code or via the playground (depending on where you have set up your assistant). 
Figure 1 shows the Kotlin for creating an assistant using the code interpreter \u2013 including setting a very basic system prompt\/meta prompt\/instructions:\n<\/p>\n<pre>val assistant = openAI.assistant(\r\n    request = AssistantRequest(\r\n        name = \"doc chat\",\r\n        instructions = \"answer questions about health plans\",\r\n        tools = listOf(AssistantTool.CodeInterpreter), \/\/ enables the code interpreter\r\n        model = ModelId(\"gpt-4-1106-preview\")\r\n    )\r\n)<\/pre>\n<p><em>Figure 1: enabling the code interpreter in Kotlin when creating an <code>assistant<\/code><\/em>\n<\/p>\n<p>\n  As discussed in the first <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-27\/\">Kotlin assistant post<\/a>, <em>JetchatAI<\/em> loads an assistant definition that was configured in the OpenAI playground, so it\u2019s even easier to enable the <strong>Code interpreter<\/strong> by flipping this switch:\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/word-image-3632-1.png\" class=\"wp-image-3633\" width=\"400\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/word-image-3632-1.png 658w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/word-image-3632-1-300x117.png 300w\" sizes=\"(max-width: 658px) 100vw, 658px\" \/>\n<\/p>\n<p><em>Figure 2: enabling the code interpreter in the OpenAI playground<\/em>\n<\/p>\n<h2>Extending JetchatAI #assistant-chat<\/h2>\n<p>\n  With the code interpreter enabled, we can test it both interactively in the playground and in the <em>JetchatAI<\/em> app. 
The test question is <strong>\u201cif the health plan costs $1000 a month, what is deducted weekly from my paycheck?\u201d<\/strong>.\n<\/p>\n<p>\n  The playground output includes the code that was generated and executed in the interpreter \u2013 in Figure 3 you can see it creates variables from the values mentioned in the query and then performs a calculation that returns a value to the model, which is incorporated into the response to the user.\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/screenshot-of-the-openai-assistant-playground-show.png\" class=\"wp-image-3634\" alt=\"Screenshot of the OpenAI Assistant playground showing a user query about how much $1000 a month is in weekly payments, along with the output from the code interpreter and the model's answer of approx $230 a week.\" width=\"500\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/screenshot-of-the-openai-assistant-playground-show.png 1292w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/screenshot-of-the-openai-assistant-playground-show-300x202.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/screenshot-of-the-openai-assistant-playground-show-1024x688.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/screenshot-of-the-openai-assistant-playground-show-768x516.png 768w\" sizes=\"(max-width: 1292px) 100vw, 1292px\" \/><br\/><em>Figure 3: Testing the code interpreter in the playground with a simple math question (using GPT-4)<\/em>\n<\/p>\n<p>\n  Notice that the code interpreter just returns a numeric answer, and the model decides to \u2018round up\u2019 to \u201c<strong>$230.77<\/strong>\u201d and format it as a dollar amount.\n<\/p>\n<p>\n  When implementing the Assistants API in an app, the code interpreter step (or steps) would not be 
rendered (in the same way that you don\u2019t render function call responses), although they are available in the run\u2019s <code>step_details<\/code> data structure, so you could still retrieve and display them to the user, or use them for logging\/telemetry or some other purpose in your app.\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-chat-description-automatically.png\" class=\"wp-image-3635\" alt=\"\" width=\"500\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-chat-description-automatically.png 1895w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-chat-description-automatically-300x194.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-chat-description-automatically-1024x661.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-chat-description-automatically-768x496.png 768w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-chat-description-automatically-1536x991.png 1536w\" sizes=\"(max-width: 1895px) 100vw, 1895px\" \/><br\/><em>Figure 4: Testing the same query on Android (also using GPT-4)<\/em><\/p>\n<p>No code changes were made to the <a href=\"https:\/\/github.com\/conceptdev\/droidcon-sf-23\/tree\/main\/Jetchat\">JetchatAI Android sample<\/a> for this testing; it just needs the Assistant&#8217;s configuration updated as shown.<\/p>\n<h2>Why add the code interpreter?<\/h2>\n<p>\n  One of the reasons the code interpreter option exists is that LLMs \u201con their own\u201d can be terribly <a href=\"https:\/\/arxiv.org\/abs\/2305.18618\">bad at math<\/a>. 
Here are some examples using the same prompt on different models without a code interpreter to help:\n<\/p>\n<ul>\n<li>\n    GPT 3.5 Turbo playground \u2013 answers <strong>$250<\/strong>\n  <\/li>\n<li>\n    GPT 4 playground \u2013 answers <strong>$231.17<\/strong>\n  <\/li>\n<li>\n    ChatGPT \u2013 answers <strong>$231.18 <\/strong>\n  <\/li>\n<\/ul>\n<p>\n  The full response for each of these is shown below \u2013 I included the ChatGPT response separately because although it\u2019s likely using models similar to those I have access to in the playground, it has its own system prompt\/meta prompt which can affect how it approaches these types of problems. All three of these examples appear to be following some sort of <a href=\"https:\/\/arxiv.org\/abs\/2201.11903\">chain of thought<\/a> prompting, although they take different approaches. While some of these answers are close to the code interpreter result, \u2018close\u2019 is probably not good enough when money\u2019s involved! How chat applications respond to queries with real-world implications (like financial advice, for example) is something that should be evaluated through the lens of responsible AI \u2013 remember this is just a fictitious example to test the LLM\u2019s math skills.\n<\/p>\n<h3>GPT 3.5 Turbo playground (without code interpreter)<\/h3>\n<p>\n  GPT 3.5 Turbo attempts to \u2018walk through\u2019 the calculation, but doesn\u2019t seem to understand that all months will not necessarily contain exactly two pay days. 
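<\/p>\n<p>\n  The $250 answer is consistent with the common simplification that every month contains exactly four weeks. A small, hypothetical Kotlin snippet (not part of the JetchatAI sample) contrasts that shortcut with the annualized calculation the code interpreter produced:\n<\/p>\n<pre>val monthlyCost = 1000.0\r\n\r\n\/\/ simplification: assume exactly 4 weeks in every month\r\nval fourWeekApprox = monthlyCost \/ 4   \/\/ 250.0\r\n\r\n\/\/ annualize first, then divide by 52 weeks (the code interpreter\u2019s approach)\r\nval annualized = monthlyCost * 12 \/ 52 \/\/ 230.769...<\/pre>\n<p>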
At least the response includes a disclaimer to consult HR or payroll to verify!\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-close-up-of-a-calculator-description-automatica.png\" class=\"wp-image-3636\" alt=\"\" width=\"500\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-close-up-of-a-calculator-description-automatica.png 1142w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-close-up-of-a-calculator-description-automatica-300x148.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-close-up-of-a-calculator-description-automatica-1024x506.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-close-up-of-a-calculator-description-automatica-768x379.png 768w\" sizes=\"(max-width: 1142px) 100vw, 1142px\" \/><br\/><em>Figure 5: output from gpt-3.5-turbo for the same query<\/em>\n<\/p>\n<p>\n  It\u2019s worth noting that <em>when<\/em> the code interpreter is enabled with gpt-3.5-turbo in the playground, it returns $230.77 (just like gpt-4 with the interpreter does in Figures 3 and 4).\n<\/p>\n<h3>GPT 4 playground (without code interpreter)<\/h3>\n<p>\n  Using the GPT 4 model results in a different set of solution steps: unlike the code interpreter, which multiplies out the total cost first, this solution calculates the average number of weeks in a month. The first mistake is that this value is 4.3 repeating (52 \/ 12 = 4.333\u2026), so rounding it to 4.33 produces a slightly different answer. The second mistake is that $1000 \/ 4.33 = 230.9468822170901; however, it returns a result of $231.17, which is about 22 cents away from what the rounded answer should be. 
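<\/p>\n<p>\n  To see how much that rounding matters, a hypothetical Kotlin snippet (not part of the JetchatAI sample) compares the rounded divisor with the exact average of 52 \/ 12 weeks per month:\n<\/p>\n<pre>val monthlyCost = 1000.0\r\n\r\nval weeksPerMonthExact = 52.0 \/ 12     \/\/ 4.333...\r\nval weeksPerMonthRounded = 4.33\r\n\r\nval weeklyExact = monthlyCost \/ weeksPerMonthExact     \/\/ 230.769... (matches the code interpreter)\r\nval weeklyRounded = monthlyCost \/ weeksPerMonthRounded \/\/ 230.946... (rounds to $230.95)<\/pre>\n<p>\n  Neither path produces the $231.17 the model actually returned, suggesting it approximated the final division rather than computing it exactly.\n<\/p>\n<p>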
It also includes a disclaimer and advice to confirm with HR or payroll.\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-questionnaire-description-autom.png\" class=\"wp-image-3637\" alt=\"\" width=\"500\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-questionnaire-description-autom.png 1172w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-questionnaire-description-autom-300x139.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-questionnaire-description-autom-1024x474.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-questionnaire-description-autom-768x355.png 768w\" sizes=\"(max-width: 1172px) 100vw, 1172px\" \/><br\/><em>Figure 6: output from gpt-4 for the same query<\/em>\n<\/p>\n<h3>ChatGPT (public chat)<\/h3>\n<p>\n  The public ChatGPT follows similar logic to the GPT-4 playground, although it has better math <em>rendering<\/em> skills. 
It makes the same two mistakes, first rounding an intermediate value in the calculation, and then still failing to divide 1000\/4.33 accurately.\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-calculator-description-automati.png\" class=\"wp-image-3638\" alt=\"\" width=\"500\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-calculator-description-automati.png 1118w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-calculator-description-automati-282x300.png 282w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-calculator-description-automati-964x1024.png 964w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/12\/a-screenshot-of-a-calculator-description-automati-768x816.png 768w\" sizes=\"(max-width: 1118px) 100vw, 1118px\" \/><br\/><em>Figure 7: output from ChatGPT for the same query<\/em>\n<\/p>\n<h2>Summary<\/h2>\n<p>\n  OpenAI models without the code interpreter feature seem to have some trouble with mathematical questions, returning different answers for the same question depending on the model and possibly the system prompt and context. 
In the simple testing above, the code interpreter feature does a better job of calculating a reasonable answer, and can do so more consistently on both gpt-4 and gpt-3.5-turbo models.\n<\/p>\n<h2>Resources and feedback<\/h2>\n<p>\n  Refer to the <a href=\"https:\/\/openai.com\/blog\/new-models-and-developer-products-announced-at-devday\">OpenAI blog<\/a> for more details on the Dev Day announcements, and the <a href=\"https:\/\/github.com\/aallam\/openai-kotlin\/blob\/main\/guides\/Assistants.md\">openai-kotlin repo<\/a> for updates on support for the new features like the Assistant API.\u00a0\n<\/p>\n<p>\n  We\u2019d love your feedback on this post, including any tips or tricks you\u2019ve learned from playing around with ChatGPT prompts.\u00a0\n<\/p>\n<p>\n  If you have any thoughts or questions, use the <a href=\"http:\/\/aka.ms\/SurfaceDuoSDK-Feedback\">feedback forum<\/a> or message us on <a href=\"https:\/\/twitter.com\/surfaceduodev\">Twitter @surfaceduodev<\/a>.\u00a0<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello prompt engineers, Over the last few weeks, we\u2019ve looked at different aspects of the new OpenAI Assistant API, both prototyping in the playground and using Kotlin in the JetchatAI sample. 
In this post we\u2019re going to add the Code Interpreter feature which allows the Assistants API to write and run Python code in a [&hellip;]<\/p>\n","protected":false},"author":570,"featured_media":3634,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[741],"tags":[734,733],"class_list":["post-3632","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-chatgpt","tag-openai"],"acf":[],"blog_post_summary":"<p>Hello prompt engineers, Over the last few weeks, we\u2019ve looked at different aspects of the new OpenAI Assistant API, both prototyping in the playground and using Kotlin in the JetchatAI sample. In this post we\u2019re going to add the Code Interpreter feature which allows the Assistants API to write and run Python code in a [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3632","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/users\/570"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/comments?post=3632"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3632\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media\/3634"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media?parent=3632"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/categories?post=3632"},{"
taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/tags?post=3632"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}