{"id":3428,"date":"2023-08-24T11:12:18","date_gmt":"2023-08-24T18:12:18","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/surface-duo\/?p=3428"},"modified":"2024-01-03T16:22:00","modified_gmt":"2024-01-04T00:22:00","slug":"android-openai-chatgpt-15","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-15\/","title":{"rendered":"OpenAI tokens and limits"},"content":{"rendered":"<p>\n  Hello prompt engineers,\n<\/p>\n<p>\n  The Jetchat demo that we\u2019ve been <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-13\/\">covering in this blog series<\/a> uses the OpenAI Chat API, and in each blog post where we add new features, it supports conversations with a reasonable number of replies. However, just like any LLM request API, there are limits to the number of tokens that can be processed, and the APIs are stateless meaning that all context needed for a given request must be included in the prompt.\n<\/p>\n<p>\n  This means that each chat request and response gets added to the conversation history, and the whole history is sent to the API after each new input so that the context can be used to give the best answer. Eventually the number of tokens in the combined chat history will exceed your model\u2019s limit (eg. ChatGPT 3.5 originally had a 4,096 token limit)\u2026 note that the limit for a given API request is the combination of the prompt AND the completion, so if the prompt (including chat history) is 3,000 tokens, the completion cannot be more than around 1000 tokens.\n<\/p>\n<p>\n  Even if your model has a higher limit, eventually an ongoing conversation is going to run out of tokens. 
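<\/p>\n<p>\n  To make that budget arithmetic concrete, here is a minimal sketch in Python (assuming the original 4,096 token limit; this is an illustration, not code from the Jetchat app):\n<\/p>

```python
# Sketch: a single request budget is shared between the prompt and the completion.
MODEL_TOKEN_LIMIT = 4096  # e.g. the original gpt-3.5-turbo limit

def max_completion_tokens(prompt_tokens, model_limit=MODEL_TOKEN_LIMIT):
    # Tokens left for the completion once the prompt (including chat history) is counted
    return max(model_limit - prompt_tokens, 0)

print(max_completion_tokens(3000))  # a 3,000 token prompt leaves 1,096 for the completion
```

<p>\n  The same subtraction explains why a growing chat history steadily squeezes the space available for each new answer.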
\n<\/p>\n<p>\n  In the next few weeks we\u2019ll discuss strategies for continuing a chat \u201cconversation\u201d beyond the token limit, starting with a discussion of tokens in this post.\n<\/p>\n<h2>What are tokens?<\/h2>\n<p>\n  To understand what \u201ctokens\u201d are, read <a href=\"https:\/\/help.openai.com\/en\/articles\/4936856-what-are-tokens-and-how-to-count-them\">what are tokens and how to count them<\/a> in the OpenAI documentation. You can think of tokens as roughly analogous to a single word in English, although that\u2019s more accurate for simple\/common words; other words might consist of multiple tokens. Tokens may also be punctuation and could include spaces. Tokens can also be non-English characters.\n<\/p>\n<p>\n  OpenAI model token limits are shown on the <a href=\"https:\/\/platform.openai.com\/docs\/models\/overview\">model overview<\/a>. The default for <code>gpt-3.5<\/code> was originally 4,096 tokens; however, newer models are available with up to 32,768 tokens. 
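<\/p>\n<p>\n  For quick budgeting, a rough count can be sketched with the common rule of thumb that one token is about four characters of English text; the tokenizer gives the authoritative count. This helper is an illustration, not part of the Jetchat code:\n<\/p>

```python
def estimate_tokens(text):
    # Rough heuristic: ~4 characters per English token (per OpenAI guidance).
    # Only useful for budgeting; use the real tokenizer for exact counts.
    return max(1, len(text) // 4)

print(estimate_tokens('Hello prompt engineers'))  # estimates 5 tokens
```

<p>\n  Estimates like this are handy for pruning decisions, but billing and limits are always based on the tokenizer count.\n<\/p>\n<p>\n  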
You\u2019ll also see that pricing is based on token usage, so more tokens cost more money!\n<\/p>\n<h2>Visualizing tokens<\/h2>\n<p>\n  The <a href=\"https:\/\/platform.openai.com\/tokenizer\">OpenAI tokenizer<\/a> can help you to visualize how your prompt is broken down into tokens \u2013 for the most accurate count you should represent the prompt syntax (including your system prompt) exactly as the API expects (including any JSON formatting).\n<\/p>\n<p>\n  As an example, here is a chat interaction with the Jetchat demo app where the model\u2019s response is grounded in the system prompt:\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-a-question-about-sessio.png\" class=\"wp-image-3429\" alt=\"Screenshot of Jetchat with a question about sessions that is answered from the system prompt\" width=\"450\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-a-question-about-sessio.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-a-question-about-sessio-300x146.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-a-question-about-sessio-768x375.png 768w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/>\n<\/p>\n<p>\n  For reference, this is what the underlying chat API structure would be:\n<\/p>\n<pre>{\r\n\"messages\": [\r\n{\"role\": \"system\", \"content\": \"You are a personal assistant called JetchatAI.\r\nYou will answer questions about the speakers and sessions at the droidcon SF conference.\r\nThe conference is on June 8th and 9th, 2023 on the UCSF campus in Mission Bay. 
It starts at 8am and finishes by 6pm.\r\nYour answers will be short and concise, since they will be required to fit on a mobile device display.\r\nWhen showing session information, always include the subject, speaker, location, and time. \r\nONLY show the description when responding about a single session. Only use the functions you have been provided with.\"},\r\n{\"role\": \"user\", \"content\": \"what sessions are on now?\"},\r\n]\r\n}\r\n<\/pre>\n<p>\n  Visualized on the tokenizer, the chat prompt is only 166 tokens (the bulk of which is the system prompt):\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us.png\" class=\"wp-image-3430\" alt=\"Tokenizer visualization for the JSON system and user message that makes up the earlier screenshot for the question &quot;what sessions are on now&quot;. The data uses 166 tokens.\" width=\"550\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us.png 1532w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-300x201.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-1024x685.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-768x514.png 768w\" sizes=\"(max-width: 1532px) 100vw, 1532px\" \/>\n<\/p>\n<p>\n  And the model\u2019s completion response is only 30 tokens.\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/word-image-3428-3.png\" class=\"wp-image-3431\" width=\"550\" 
srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/word-image-3428-3.png 1546w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/word-image-3428-3-300x58.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/word-image-3428-3-1024x199.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/word-image-3428-3-768x150.png 768w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/word-image-3428-3-1536x299.png 1536w\" sizes=\"(max-width: 1546px) 100vw, 1546px\" \/>\n<\/p>\n<p>\n  If subsequent user inputs and responses were roughly the same size (around 60 tokens), the chat could contain 65 interactions with the user before the 4,096 token limit was reached and the API would no longer respond. While 65 questions seems like a lot (!), enhancing your chat with embeddings or functions can consume tokens too meaning the chat can &#8220;end&#8221; much sooner than expected.\n<\/p>\n<h2>Tokens and embeddings<\/h2>\n<p>\n  Using embeddings is a great way to include custom data in a model\u2019s response (or even just data more up-to-date than the model\u2019s training).\n<\/p>\n<p>\n  In the <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-7\/\">JetchatAI gets smarter with embeddings<\/a> blog post we showed how to add context to a prompt by matching the embedding for a user\u2019s question against an up-to-date dataset (in the demo, a list of conference sessions). 
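<\/p>\n<p>\n  The matching step can be sketched as a cosine-similarity comparison between the embedding of the question and pre-computed embeddings for each session. The vectors and session names below are made-up placeholders (real embedding vectors have on the order of a thousand dimensions), so this is only an illustration of the technique:\n<\/p>

```python
from math import sqrt

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical pre-computed session embeddings (toy 3-dimensional vectors)
session_embeddings = {
    'AI for Android on- and off-device': [0.9, 0.1, 0.2],
    'Gradle build performance': [0.1, 0.8, 0.3],
}

def best_matching_session(query_embedding):
    return max(session_embeddings,
               key=lambda s: cosine_similarity(session_embeddings[s], query_embedding))

print(best_matching_session([0.8, 0.2, 0.1]))
```

<p>\n  The text of the best match (session details and description) is then pasted into the user message, which is where the extra tokens come from.\n<\/p>\n<p>\n  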
Here\u2019s an example query that uses embeddings to generate the correct result:\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-a-question-are-there-a.png\" class=\"wp-image-3432\" alt=\"Screenshot of Jetchat with a question &quot;are there any sessions on AI&quot; that requires embeddings to answer\" width=\"450\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-a-question-are-there-a.png 1061w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-a-question-are-there-a-300x170.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-a-question-are-there-a-1024x579.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-a-question-are-there-a-768x434.png 768w\" sizes=\"(max-width: 1061px) 100vw, 1061px\" \/>\n<\/p>\n<p>\n  Although the user\u2019s question is only a few words, notice how the API call below contains the details of the conference session that was retrieved by comparing the embedding value of the query with all the conference sessions:\n<\/p>\n<pre>{\r\n\"messages\": [\r\n{\"role\": \"system\", \"content\": \"You are a personal assistant called JetchatAI.\r\nYou will answer questions about the speakers and sessions at the droidcon SF conference.\r\nThe conference is on June 8th and 9th, 2023 on the UCSF campus in Mission Bay. It starts at 8am and finishes by 6pm.\r\nYour answers will be short and concise, since they will be required to fit on a mobile device display.\r\nWhen showing session information, always include the subject, speaker, location, and time. \r\nONLY show the description when responding about a single session. 
Only use the functions you have been provided with.\"},\r\n{\"role\": \"user\", \"content\": \"Following are some talks\/sessions scheduled for the droidcon San Francisco conference in June 2023:\r\n                                                                                                \r\nSpeaker: CRAIG DUNN\r\nRole: Software Engineer at Microsoft\r\nLocation: Robertson 1\r\nDate: 2023-06-09\r\nTime: 16:30\r\nSubject: AI for Android on- and off-device\r\nDescription: AI and ML bring powerful new features to app developers, for processing text, images, audio, video, and more. In this session we\u2019ll compare and contrast the opportunities available with on-device models using ONNX and the ChatGPT model running in the cloud.\r\n\r\nUse the above information to answer the following question. Summarize and provide date\/time and location if appropriate.\r\n\r\nare there any sessions on AI?\"}\r\n]}<\/pre>\n<p>\n  The tokenizer visualization is shown below \u2013 this single interaction is now 511 tokens for the request and 77 for the response: 588 tokens in total. \n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-1.png\" class=\"wp-image-3433\" alt=\"Tokenizer visualization for the JSON system and user message that makes up the earlier screenshot. 
The data uses 511 tokens.\" width=\"550\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-1.png 1399w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-1-300x230.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-1-1024x785.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-1-768x588.png 768w\" sizes=\"(max-width: 1399px) 100vw, 1399px\" \/>\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/word-image-3428-6.png\" class=\"wp-image-3434\" width=\"550\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/word-image-3428-6.png 1407w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/word-image-3428-6-300x89.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/word-image-3428-6-1024x303.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/word-image-3428-6-768x227.png 768w\" sizes=\"(max-width: 1407px) 100vw, 1407px\" \/>\n<\/p>\n<p>\n  While 588 tokens still seems small compared to a 4,096 token limit, tokens could be \u201cused up\u201d a lot more quickly if there are multiple embeddings matching the user\u2019s question. 
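<\/p>\n<p>\n  One way to keep multiple matches under control (a sketch of the general idea, not the approach used in the demo) is to add matches in order of similarity until a token budget reserved for context is spent:\n<\/p>

```python
def select_context(matches, budget_tokens):
    # matches: list of (session_text, similarity_score) pairs, e.g. produced
    # by an embeddings comparison; the texts and scores here are illustrative.
    selected, used = [], 0
    for text, score in sorted(matches, key=lambda m: m[1], reverse=True):
        cost = max(1, len(text) // 4)  # rough ~4-chars-per-token estimate
        if used + cost > budget_tokens:
            break
        selected.append(text)
        used += cost
    return selected

matches = [('Session A details ' * 20, 0.91),
           ('Session B details ' * 20, 0.88),
           ('Session C details ' * 20, 0.70)]
print(len(select_context(matches, budget_tokens=200)))  # only 2 matches fit the budget
```

<p>\n  Capping the context this way trades answer completeness for headroom in the rest of the conversation.\n<\/p>\n<p>\n  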
For example, the query <strong>\u201cis there a talk about gradle\u201d<\/strong> returns <em>six<\/em> embedding matches, including the session details and descriptions.\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-the-question-is-ther-a.png\" class=\"wp-image-3435\" alt=\"Screenshot of Jetchat with the question &quot;is ther a talk about gradle&quot; which will require embeddings to answer\" width=\"300\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-the-question-is-ther-a.png 428w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-the-question-is-ther-a-300x86.png 300w\" sizes=\"(max-width: 428px) 100vw, 428px\" \/>\n<\/p>\n<p>\n  The full API request is omitted for clarity, but it\u2019s much larger than the previous examples: at nearly 2,000 tokens, it\u2019s almost half the original 4,096 token limit!\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-2.png\" class=\"wp-image-3436\" alt=\"Tokenizer visualization for the JSON system and user message that makes up the earlier screenshot. 
Including the embeddings data uses up 1,967 tokens.\" width=\"550\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-2.png 1409w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-2-300x227.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-2-1024x776.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-2-768x582.png 768w\" sizes=\"(max-width: 1409px) 100vw, 1409px\" \/>\n<\/p>\n<p>\n  While newer OpenAI models can have limits up to 32k tokens, queries that require a lot of embedding context will quickly fill up the token limit after just a few interactions.\n<\/p>\n<h2>Tokens and functions<\/h2>\n<p>\n  Declaring functions as part of your OpenAI chat API uses up tokens in a different way, more like the system prompt. 
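<\/p>\n<p>\n  A quick way to see that overhead (a sketch only, reusing the rough ~4-characters-per-token estimate; the declaration below is abbreviated from the weather example in this section) is to serialize the declarations and measure them:\n<\/p>

```python
import json

# Abbreviated, hypothetical version of a function declaration
functions = [{
    'name': 'currentWeather',
    'description': 'Get the current weather in a given location',
    'parameters': {
        'type': 'object',
        'properties': {
            'latitude': {'type': 'string'},
            'longitude': {'type': 'string'},
        },
        'required': ['latitude', 'longitude'],
    },
}]

def estimated_declaration_tokens(functions):
    # Declarations travel with every request, so this cost is paid on each turn
    return max(1, len(json.dumps(functions)) // 4)

print(estimated_declaration_tokens(functions))
```

<p>\n  In the real request below, the full declaration (with parameter descriptions and the unit enum) comes to 274 tokens, paid on every single exchange.\n<\/p>\n<p>\n  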
The blog post <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-9-functions\/\">OpenAI chat functions on Android<\/a> shows how to add functions in Kotlin using a client library to abstract away the JSON syntax, but under the hood it is still adding tokens to your chat API requests.\n<\/p>\n<p>\n  Here is an example user query that triggers the weather function:\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-the-question-what-is-t.png\" class=\"wp-image-3437\" alt=\"Screenshot of Jetchat with the question &quot;what is the weather like in SF&quot; which will require a function to answer\" width=\"400\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-the-question-what-is-t.png 1226w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-the-question-what-is-t-300x93.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-the-question-what-is-t-1024x317.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/screenshot-of-jetchat-with-the-question-what-is-t-768x238.png 768w\" sizes=\"(max-width: 1226px) 100vw, 1226px\" \/>\n<\/p>\n<p>\n  And here is an example of the kind of API request that\u2019s sent:\n<\/p>\n<pre>{\r\n\"messages\": [\r\n{\"role\": \"system\", \"content\": \"You are a personal assistant called JetchatAI.\r\nYour answers will be short and concise, since they will be required to fit on a mobile device display.\r\nOnly use the functions you have been provided with.\"},\r\n{\"role\": \"user\", \"content\": \"what's the weather like in SF\"}\r\n],\r\n\"functions\":[\r\n{\r\n\"name\": \"currentWeather\",\r\n\"description\": \"Get the current weather in a given 
location\",\r\n\"parameters\": {\r\n    \"type\": \"object\",\r\n    \"properties\": {\r\n        \"latitude\": {\r\n            \"type\": \"string\",\r\n            \"description\": \"The latitude of the requested location, e.g. 37.773972 for San Francisco, CA\"\r\n        },\r\n        \"longitude\": {\r\n            \"type\": \"string\",\r\n            \"description\": \"The longitude of the requested location, e.g. -122.431297 for San Francisco, CA\"\r\n        },\r\n        \"unit\": {\"type\": \"string\", \"enum\": [\"celsius\", \"fahrenheit\"]}\r\n    },\r\n    \"required\": [\"latitude\",\"longitude\"]\r\n}\r\n}\r\n],\r\n\"function_call\": \"auto\"\r\n}<\/pre>\n<p>\n  The functions JSON uses 274 tokens of the 360 total tokens in this request:\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-3.png\" class=\"wp-image-3438\" alt=\"Tokenizer visualization for the JSON system and user message that makes up the earlier screenshot - the functions declaration means the data uses 360 tokens.\" width=\"550\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-3.png 1405w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-3-300x230.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-3-1024x784.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-3-768x588.png 768w\" sizes=\"(max-width: 1405px) 100vw, 1405px\" \/>\n<\/p>\n<p>\n  Functions also use tokens in another way \u2013 there are <a 
href=\"https:\/\/openai.com\/blog\/function-calling-and-other-api-updates\">two more interactions<\/a> with the model \u2013 where it \u201ccalls\u201d the function and your code responds with the result before the final model completion that\u2019s displayed to the user.\n<\/p>\n<pre>{\"role\": \"assistant\", \"content\": null, \"function_call\": {\"name\": \"currentWeather\", \"arguments\": \"{ \\\"latitude\\\": \\\"37.773972\\\", \\\"longitude\\\": \\\"-122.431297\\\"}\"}},\r\n{\"role\": \"function\", \"name\": \"currentWeather\", \"content\": \"{\\\"temperature\\\": \\\"73\\\", \\\"unit\\\": \\\"fahrenheit\\\", \\\"description\\\": \\\"Mostly sunny. High near 73, with temperatures falling to around 68 in the afternoon.\\\"}\"}<\/pre>\n<p>\n  For the weather result, these messages are short \u2013 only 113 tokens \u2013 but as with the embeddings example above, if the function returns a large chunk of text as a result, it will eat into your token limit.\n<\/p>\n<p>\n  <img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-4.png\" class=\"wp-image-3439\" alt=\"Tokenizer visualization for the JSON system and user message that makes up the earlier screenshot. 
The extra function messages use an additional 113 tokens.\" width=\"550\" srcset=\"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-4.png 1403w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-4-300x101.png 300w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-4-1024x346.png 1024w, https:\/\/devblogs.microsoft.com\/surface-duo\/wp-content\/uploads\/sites\/53\/2023\/08\/tokenizer-visualization-for-the-json-system-and-us-4-768x259.png 768w\" sizes=\"(max-width: 1403px) 100vw, 1403px\" \/>\n<\/p>\n<p>\n  Finally, you may declare more than one function, such as in the blog post <a href=\"https:\/\/devblogs.microsoft.com\/surface-duo\/android-openai-chatgpt-10\/\">Combining OpenAI function calls with embeddings<\/a>. The examples in that blog post use tokens for both function declarations, as well as potential embeddings matches, and the function results can be verbose since they\u2019ll contain wordy session descriptions.\n<\/p>\n<h2>Building an infinite chat\u2026<\/h2>\n<p>\n  While simple LLM queries can be supported for a large number of chat interactions, grounding responses with embeddings or functions can quickly use up the token limits for your model. 
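<\/p>\n<p>\n  One simple mitigation, sketched below in plain Python (an illustration of the technique, not the Jetchat implementation), is a first in, first out sliding window that always keeps the system message but drops the oldest turns until the history fits a token budget:\n<\/p>

```python
def prune_history(messages, budget_tokens):
    # messages: [{'role': ..., 'content': ...}, ...] with the system message first.
    # Token counts use the rough ~4-chars-per-token estimate; a real
    # implementation would use the model tokenizer instead.
    def cost(msgs):
        return sum(max(1, len(m['content']) // 4) for m in msgs)
    system, rest = messages[0], list(messages[1:])
    while rest and cost([system] + rest) > budget_tokens:
        rest.pop(0)  # first in, first out: drop the oldest turn
    return [system] + rest

history = [{'role': 'system', 'content': 'You are JetchatAI.' * 10},
           {'role': 'user', 'content': 'question ' * 40},
           {'role': 'assistant', 'content': 'answer ' * 40},
           {'role': 'user', 'content': 'question ' * 40}]
print(len(prune_history(history, budget_tokens=200)))  # oldest turns are dropped
```

<p>\n  Summarization and embeddings-based retrieval trade that simplicity for better retention of older context.\n<\/p>\n<p>\n  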
Once the request size exceeds the limit, the chat will not return further responses.\n<\/p>\n<p>\n  To get around the limit, we need strategies to give as much context as possible to the model without just blindly sending the entire chat history each time.\n<\/p>\n<p>\n  Approaches to solve this include:\n<\/p>\n<ul>\n<li>\n    Sliding window (first in, first out) \n  <\/li>\n<li>\n    Summarization\n  <\/li>\n<li>\n    Embeddings\n  <\/li>\n<\/ul>\n<p>\n  Over the coming weeks we\u2019ll dive deeper into these approaches.\n<\/p>\n<h2>Resources and feedback<\/h2>\n<p>\n  Here are some links to discussions about chat API usage on the OpenAI developer community forum:\n<\/p>\n<ul>\n<li><a href=\"https:\/\/community.openai.com\/t\/i-wish-that-when-using-the-gpt-api-it-would-be-possible-to-have-a-contextual-conversation-like-chatgpt\/141785\/10\">I wish that when using the GPT API, it would be possible to have a contextual conversation like chatGPT<\/a> \n  <\/li>\n<li><a href=\"https:\/\/community.openai.com\/t\/multi-turn-conversation-best-practice\/282349\/13\">Multi-turn conversation best practice<\/a>\n  <\/li>\n<li><a href=\"https:\/\/community.openai.com\/t\/openai-api-chat-completion-pruning-methods\/85237\">OpenAI API: chat completion pruning methods<\/a>\n  <\/li>\n<\/ul>\n<p>\n  Note that in all the JSON above the <code>\"model\": \"gpt-3.5-turbo-0613\"<\/code> model specification argument has been omitted for clarity.\n<\/p>\n<p>\n  We\u2019d love your feedback on this post, including any tips or tricks you\u2019ve learned from playing around with ChatGPT prompts.\n<\/p>\n<p>\n  If you have any thoughts or questions, use the <a href=\"http:\/\/aka.ms\/SurfaceDuoSDK-Feedback\">feedback forum<\/a> or message us on <a href=\"https:\/\/twitter.com\/surfaceduodev\">Twitter @surfaceduodev<\/a>.\n<\/p>\n<p>\n  There will be no livestream this week, but you can check out the <a href=\"https:\/\/youtube.com\/c\/surfaceduodev\">archives on 
YouTube<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello prompt engineers, The Jetchat demo that we\u2019ve been covering in this blog series uses the OpenAI Chat API, and in each blog post where we add new features, it supports conversations with a reasonable number of replies. However, just like any LLM request API, there are limits to the number of tokens that can [&hellip;]<\/p>\n","protected":false},"author":570,"featured_media":3430,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[741],"tags":[734,733],"class_list":["post-3428","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-chatgpt","tag-openai"],"acf":[],"blog_post_summary":"<p>Hello prompt engineers, The Jetchat demo that we\u2019ve been covering in this blog series uses the OpenAI Chat API, and in each blog post where we add new features, it supports conversations with a reasonable number of replies. 
However, just like any LLM request API, there are limits to the number of tokens that can [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3428","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/users\/570"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/comments?post=3428"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/posts\/3428\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media\/3430"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/media?parent=3428"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/categories?post=3428"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/surface-duo\/wp-json\/wp\/v2\/tags?post=3428"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}