{"id":3524,"date":"2023-10-05T08:58:05","date_gmt":"2023-10-05T15:58:05","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/surface-duo\/?p=3524"},"modified":"2024-01-03T16:17:32","modified_gmt":"2024-01-04T00:17:32","slug":"android-openai-chatgpt-21","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-21\/","title":{"rendered":"Android tokenizer for OpenAI"},"content":{"rendered":"<p>\n  Hello prompt engineers,\n<\/p>\n<p>\n  The past few weeks we\u2019ve been extending JetchatAI\u2019s <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-16\/\">sliding window<\/a> which manages the size of the chat API calls to stay under the model\u2019s token limit. The code we\u2019ve written so far has used a VERY rough estimate for determining the number of tokens being used in our LLM requests:\n<\/p>\n<pre>val tokens = text.length \/ 4 \/\/ hack!<\/pre>\n<p>\n  This very simple approximation is used to calculate prompt sizes to support the sliding window and history summarization functions. Because it\u2019s not an accurate result, it\u2019s either inefficient or risks still exceeding the prompt token limit.\n<\/p>\n<p>\n  Turns out that there is an Android-compatible open-source tokenizer suitable for OpenAI &#8211; <a href=\"https:\/\/github.com\/knuddelsgmbh\/jtokkit\">JTokkit<\/a> &#8211; so this week we\u2019ll update the <code>Tokenizer<\/code> class to use this more accurate approach.\n<\/p>\n<h2>More accurate token counts<\/h2>\n<p>\n  Here are some strings from past blog posts, evaluated with the Java tokenizer and OpenAI\u2019s online version. The first example is shown here is a screenshot from the <a href=\"https:\/\/platform.openai.com\/tokenizer\">OpenAI website tokenizer<\/a> \u2013 it\u2019s a response from the LLM for the query <em>\u201ctell me about the golden gate bridge\u201d <\/em>with the tokens highlighted in alternating colors:\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/10\/a-screenshot-of-a-computer-description-automatica.png\" class=\"wp-image-3525\" alt=\"Example token breakdown of a string\" width=\"600\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/10\/a-screenshot-of-a-computer-description-automatica.png 1056w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/10\/a-screenshot-of-a-computer-description-automatica-300x130.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/10\/a-screenshot-of-a-computer-description-automatica-1024x444.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/10\/a-screenshot-of-a-computer-description-automatica-768x333.png 768w\" sizes=\"(max-width: 1056px) 100vw, 1056px\" \/>\n<\/p>\n<p>\n  Here\u2019s a table comparing the original \u201cdivide by 4\u201d estimate to the JTokkit open-source package for a few different context strings:\n<\/p>\n<table>\n<tr>\n<td>\n<p><strong>Content<\/strong><\/p>\n<\/td>\n<td>\n<p><strong>Original \u201c\/4\u201d estimate<\/strong><\/p>\n<\/td>\n<td>\n<p><strong>JTokkit GPT-3.5<\/strong><\/p>\n<\/td>\n<\/tr>\n<tr>\n<td>\n<code>The Golden Gate Bridge is an iconic suspension bridge located in San Francisco, California. It spans the Golden Gate Strait, the entrance to the San Francisco Bay from the Pacific Ocean. The bridge is known for its distinctive orange-red color and its majestic views of the city skyline, the bay, and the Pacific Ocean. It is considered one of the modern wonders of the world and is a popular tourist attraction. The bridge is also an important transportation route, connecting San Francisco to Marin County. It is approximately 1.7 miles long and can be crossed by vehicles, bicycles, and pedestrians.<\/code>\n<\/td>\n<td align=\"right\">\n<p>\n  150\n<\/p>\n<\/td>\n<td align=\"right\">\n<p>\n  117\n<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td><code>What's the weather in San Diego<\/code><\/td>\n<td align=\"right\">\n<p>\n  7\n<\/p>\n<\/td>\n<td align=\"right\">\n<p>\n  8\n<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td><code>Some popular places in San Francisco are:\n  1. Golden Gate Bridge\n  2. Alcatraz Island\n  3. Fisherman's Wharf\n  4. Lombard Street\n  5. Union Square\n  6. Chinatown\n  7. Golden Gate Park\n  8. Painted Ladies (Victorian houses)\n  9. Pier 39\n  10. Cable Cars\n<\/cde><\/td>\n<td align=\"right\">\n<p>\n  58\n<\/p>\n<\/td>\n<td align=\"right\">\n<p>\n  71\n<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td><code>currentWeather{\n    \"latitude\": \"37.773972\",\n    \"longitude\": \"-122.431297\"\n  }\n<\/code><\/td>\n<td align=\"right\">\n<p>\n  18\n<\/p>\n<\/td>\n<td align=\"right\">\n<p>\n  24\n<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td><code>{latitude:\"37.773972\",longitude:\"-122.431297\",temperature:\"68\",unit:\"F\",forecast:\"[This Afternoon, Sunny. High near 68, with temperatures falling to around 64 in the afternoon. West southwest wind around 14 mph.]\"}<\/code><\/td>\n<td align=\"right\">\n<p>\n  53\n<\/p>\n<\/td>\n<td align=\"right\">\n<p>\n  57\n<\/p>\n<\/td>\n<\/tr>\n<\/table>\n<p>\n  These examples show the JTokkit package calculation result often varies greatly from the previous code, and being based on the algorithm provided by OpenAI should be much more accurate.\n<\/p>\n<h2>Adding to the Jetchat-based sample<\/h2>\n<p>\n  In theory, we should be able to substitute our existing \u2018rough estimate\u2019 function with the more accurate Java package and not impact the functionality of the JetchatAI demo. Here\u2019s the <a href=\"https:\/\/github.com\/conceptdev\/droidcon-sf-23\/pull\/17\">PR with the code changes<\/a> \u2013 adding the package and then updating the <code>Tokenizer<\/code> class.\n<\/p>\n<p>\n  First add the package in <strong>build.gradle.kts<\/strong>:\n<\/p>\n<pre>implementation(\"com.knuddels:jtokkit:0.6.1\")\r\n<\/pre>\n<p>\n  Then follow the instructions in the <a href=\"https:\/\/jtokkit.knuddels.de\/docs\/getting-started\/usage\">JTokkit getting started docs<\/a> to update the <code>Tokenizer<\/code> class \u2013 both counting tokens and truncating a string based on token length:\n<\/p>\n<pre>class Tokenizer {\r\n    companion object {\r\n        var registry: EncodingRegistry = Encodings.newLazyEncodingRegistry()\r\n        fun countTokensIn (text: String?): Int {\r\n          if (text == null) return 0\r\n            val encoding: Encoding = registry.getEncodingForModel(ModelType.GPT_3_5_TURBO)\r\n            return encoding.countTokens(text)\r\n        }\r\n        fun trimToTokenLimit (text: String?, tokenLimit: Int): String? {\r\n            val encoding = registry.getEncoding(ModelType.GPT_3_5_TURBO)\r\n            val encoded = encoding.encodeOrdinary(text, tokenLimit)\r\n            if (encoded.isTruncated) {\r\n                return encoding.decode(encoded.tokens)\r\n            }\r\n            return text \/\/ wasn't truncated\r\n        }\r\n    }\r\n}\r\n<\/pre>\n<p>\n  The more accurate token count should improve the efficiency of the sliding window algorithm, so that more context is included in each API request without the risk of exceeding the token count. \n<\/p>\n<h2>Pricing<\/h2>\n<p>\n  Beside avoiding exceeding the token limit for requests, the other reason you might want to count tokens is to track usage \u2013 OpenAI APIs are charged per thousand tokens (see the <a href=\"https:\/\/openai.com\/pricing\">pricing page<\/a>). For example (in USD), the cheapest GPT-4 API costs 3 cents for 1K input tokens and 6 cents for 1K output tokens. The cheapest GPT-3.5 Turbo averages less than 1\/5<sup>th<\/sup> of a cent per 1K tokens \u2013 twenty to thirty times cheaper than GPT-4!\n<\/p>\n<p>\n  You might choose to calculate, log, and track token usage just during your research, development, and testing to get a sense of how much your service is going to cost. With appropriate permissions and safeguards you might also log and track token usage in production, where it could be used to create a per-user quota or rate-limiting system to prevent excessive costs or abuse.\n<\/p>\n<h2>Not <em>quite<\/em> there\u2026<\/h2>\n<p>\n  While we\u2019re now able to more accurately calculate the tokens in our user- and LLM-generated queries and responses, we cannot determine the <em>exact size<\/em> of the payload being processed by the model due to the specific formatting that occurs when a chat completion request is constructed. This <a href=\"https:\/\/jtokkit.knuddels.de\/docs\/getting-started\/recipes\/chatml\">JTokkit recipe<\/a> contains some tips on how to more accurately determine the total token count for a request if you\u2019d like to get even more accurate results.\n<\/p>\n<h2>Resources and feedback<\/h2>\n<p>\n  See the <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/ai-services\/\">Azure OpenAI documentation<\/a> for more information on the wide variety of services available for your apps.\n<\/p>\n<p>\n  We\u2019d love your feedback on this post, including any tips or tricks you\u2019ve learning from playing around with ChatGPT prompts.\n<\/p>\n<p>\n  If you have any thoughts or questions, use the <a href=\"http:\/\/aka.ms\/SurfaceDuoSDK-Feedback\">feedback forum<\/a> or message us on <a href=\"https:\/\/twitter.com\/surfaceduodev\">Twitter @surfaceduodev<\/a>.\n<\/p>\n<p>\n  There will be no livestream this week, but you can check out the <a href=\"https:\/\/youtube.com\/c\/surfaceduodev\">archives on YouTube<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello prompt engineers, The past few weeks we\u2019ve been extending JetchatAI\u2019s sliding window which manages the size of the chat API calls to stay under the model\u2019s token limit. The code we\u2019ve written so far has used a VERY rough estimate for determining the number of tokens being used in our LLM requests: val tokens [&hellip;]<\/p>\n","protected":false},"author":570,"featured_media":3526,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[741],"tags":[734,733],"class_list":["post-3524","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-chatgpt","tag-openai"],"acf":[],"blog_post_summary":"<p>Hello prompt engineers, The past few weeks we\u2019ve been extending JetchatAI\u2019s sliding window which manages the size of the chat API calls to stay under the model\u2019s token limit. The code we\u2019ve written so far has used a VERY rough estimate for determining the number of tokens being used in our LLM requests: val tokens [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3524","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/users\/570"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/comments?post=3524"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3524\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media\/3526"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media?parent=3524"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/categories?post=3524"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/tags?post=3524"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}