Surface Duo Blog

Responsible AI and content safety

Hello prompt engineers, This week we’re taking a break from code samples to highlight the general availability of Azure AI Content Safety. In this blog series we’ve touched briefly on using prompt engineering to restrict the types of responses an LLM will provide, such as using the system prompt to set boundaries on what ...
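
As a sketch of the prompt-level guardrail mentioned above, here is roughly what a bounded system prompt looks like. This assumes the openai-kotlin community client (com.aallam.openai) used by the Jetchat samples, and the prompt wording is just an illustration:

```kotlin
import com.aallam.openai.api.chat.ChatCompletionRequest
import com.aallam.openai.api.chat.ChatMessage
import com.aallam.openai.api.chat.ChatRole
import com.aallam.openai.api.model.ModelId
import com.aallam.openai.client.OpenAI

// Prompt-level guardrail only: a service like Azure AI Content Safety
// adds a separate moderation layer on top of whatever the prompt says.
suspend fun guardedChat(openAI: OpenAI, userInput: String): String? {
    val request = ChatCompletionRequest(
        model = ModelId("gpt-3.5-turbo"),
        messages = listOf(
            ChatMessage(
                role = ChatRole.System,
                content = "You are a helpful assistant for an Android chat sample. " +
                    "Politely decline requests that are harmful, hateful, or off-topic."
            ),
            ChatMessage(role = ChatRole.User, content = userInput)
        )
    )
    return openAI.chatCompletion(request).choices.first().message.content
}
```

Keep in mind a system prompt is a soft constraint; a moderation service provides an enforcement layer that doesn’t depend on the model’s cooperation.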

“Search the web” for up-to-date OpenAI chat responses

Hello prompt engineers, Over the course of this blog series, we have investigated different ways of augmenting the information available to an LLM when answering user queries. However, there is still a challenge getting the model to answer with up-to-date “general information” (for example, if...
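
To make the pattern concrete, here is a minimal “search, then answer” sketch. searchWeb and askModel are hypothetical placeholders (wire them up to a real search API and chat call), not code from the post:

```kotlin
// Run a web search first, then paste the top results into the prompt
// as grounding context so the model can answer with current information.
data class SearchResult(val title: String, val snippet: String, val url: String)

suspend fun searchWeb(query: String): List<SearchResult> {
    TODO("call a real search backend here, e.g. Bing Web Search")
}

suspend fun answerWithFreshContext(
    question: String,
    askModel: suspend (String) -> String
): String {
    val results = searchWeb(question).take(3)
    val context = results.joinToString("\n") { "- ${it.title}: ${it.snippet} (${it.url})" }
    val prompt = """
        Answer the question using only the search results below.
        If they don't contain the answer, say you don't know.

        Search results:
        $context

        Question: $question
    """.trimIndent()
    return askModel(prompt)
}
```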

Android tokenizer for OpenAI

Hello prompt engineers, The past few weeks we’ve been extending JetchatAI’s sliding window, which manages the size of the chat API calls to stay under the model’s token limit. The code we’ve written so far has used a VERY rough estimate to determine the number of tokens used in our LLM requests: This very ...
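
For reference, the rough estimate in question is the common “about four characters per token” heuristic. The sketch below shows it next to a real BPE count; JTokkit is one JVM tokenizer option here, an assumption rather than necessarily the library the post uses:

```kotlin
import com.knuddels.jtokkit.Encodings
import com.knuddels.jtokkit.api.EncodingType

// Rough heuristic: English text averages roughly 4 characters per token.
fun estimateTokens(text: String): Int = (text.length / 4) + 1

// More accurate: a real BPE tokenizer. cl100k_base is the encoding
// used by gpt-3.5-turbo and gpt-4.
val encoding = Encodings.newDefaultEncodingRegistry()
    .getEncoding(EncodingType.CL100K_BASE)

fun countTokens(text: String): Int = encoding.countTokens(text)
```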

Speech-to-speech conversing with OpenAI on Android

Hello prompt engineers, Just this week, OpenAI announced that their chat app and website can now ‘hear and speak’. In a huge coincidence (originally inspired by this Azure OpenAI speech-to-speech doc), we’ve added similar functionality to our Jetpack Compose LLM chat sample based on Jetchat. The screenshot below shows ...
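
A minimal sketch of the loop on Android might look like the following, pairing the platform SpeechRecognizer intent with TextToSpeech around the chat call. The actual sample was inspired by the Azure OpenAI speech-to-speech doc, so treat this as an illustration rather than the sample’s code:

```kotlin
import android.content.Context
import android.content.Intent
import android.speech.RecognizerIntent
import android.speech.tts.TextToSpeech
import java.util.Locale

// Speech loop sketch: recognize speech in, send the transcript to the
// chat model, speak the reply out loud.
class SpeechLoop(context: Context, private val chat: suspend (String) -> String) {

    // "Speak" half; a real app should check status == TextToSpeech.SUCCESS
    // in this init callback before calling speak().
    private val tts = TextToSpeech(context) { status -> /* init check */ }

    // "Hear" half; launch with an activity result API and read
    // RecognizerIntent.EXTRA_RESULTS from the returned data.
    fun recognizeIntent(): Intent =
        Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
            putExtra(
                RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM
            )
        }

    suspend fun onTranscript(transcript: String) {
        val reply = chat(transcript)
        tts.language = Locale.US
        tts.speak(reply, TextToSpeech.QUEUE_FLUSH, null, "jetchat-reply")
    }
}
```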

Infinite chat with history embeddings

Hello prompt engineers, The last few posts have been about the different ways to create an ‘infinite chat’, where the conversation between the user and an LLM is not limited by the model’s token limit and as much historical context as possible can be used to answer future queries. We previously covered: ...
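
The heart of the embeddings approach is scoring stored messages against the new query and keeping only the closest matches. Here is a minimal sketch; the embedding call itself (for example, OpenAI’s text-embedding-ada-002) is assumed to happen elsewhere and just produce the vectors:

```kotlin
import kotlin.math.sqrt

// A past chat message plus its precomputed embedding vector.
data class StoredMessage(val text: String, val vector: DoubleArray)

fun cosineSimilarity(a: DoubleArray, b: DoubleArray): Double {
    var dot = 0.0; var normA = 0.0; var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Score every stored message against the query vector and return the
// top matches above a similarity threshold, best first.
fun relevantHistory(
    queryVector: DoubleArray,
    history: List<StoredMessage>,
    topK: Int = 5,
    threshold: Double = 0.8
): List<StoredMessage> =
    history.map { it to cosineSimilarity(queryVector, it.vector) }
        .filter { it.second >= threshold }
        .sortedByDescending { it.second }
        .take(topK)
        .map { it.first }
```

The threshold and topK values are illustrative; tuning them trades recall against token cost.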

“Infinite” chat with history summarization

Hello prompt engineers, A few weeks ago we talked about token limits on LLM chat APIs and how they prevent an infinite amount of history from being remembered as context. A sliding window can limit the overall context size, and making the sliding window more efficient can help maximize the amount of context sent with each new chat query...
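
A minimal sketch of the summarization step is below; summarize stands in for a chat call that condenses the older messages (“summarize this conversation, preserving important facts”) and is a hypothetical helper, not the post’s code:

```kotlin
data class Message(val role: String, val content: String)

// When the conversation exceeds the token budget, collapse the older
// half of the history into a single model-generated summary message.
suspend fun compactHistory(
    history: MutableList<Message>,
    tokenBudget: Int,
    countTokens: (String) -> Int,
    summarize: suspend (List<Message>) -> String
) {
    val total = history.sumOf { countTokens(it.content) }
    if (total <= tokenBudget) return
    // Keep the newer half verbatim; fold the older half into one summary.
    val splitAt = history.size / 2
    val older = history.subList(0, splitAt).toList()
    val summary = Message("system", "Summary of earlier conversation: " + summarize(older))
    repeat(splitAt) { history.removeAt(0) }
    history.add(0, summary)
}
```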

De-duplicating context in the chat sliding window

Hello prompt engineers, Last week’s post discussed the concept of a sliding window to keep recent context while preventing LLM chat prompts from exceeding the model’s token limit. The approach involved adding context to the prompt until we reached the maximum number of tokens the model can accept, then ignoring any remaining ...
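
The de-duplication step itself can be as simple as the sketch below, which keeps the first occurrence of each normalized message body (exact-match only, for illustration; an embeddings comparison could also catch near-duplicates):

```kotlin
data class Message(val role: String, val content: String)

// HashSet.add returns false for items already seen, so filter keeps
// only the first occurrence of each normalized message body.
fun dedupeContext(messages: List<Message>): List<Message> {
    val seen = HashSet<String>()
    return messages.filter { seen.add(it.content.trim().lowercase()) }
}
```

Running it over the reversed history instead would keep the most recent copy of each duplicate, which is usually what you want in a chat window.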

Infinite chat using a sliding window

Hello prompt engineers, There are a number of different strategies to support an ‘infinite chat’ using an LLM; these are needed because large language models do not store ‘state’ across API requests and there is a limit to how large a single request can be. In this OpenAI community question on token limit differences in API ...
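
The basic window is easy to sketch: walk the history newest-first and stop when the token budget is spent. countTokens is a placeholder for whatever token counter you use:

```kotlin
data class Message(val role: String, val content: String)

// Walk newest-to-oldest so the most recent turns always make the cut,
// then return the kept messages in chronological order.
fun slidingWindow(
    history: List<Message>,
    tokenBudget: Int,
    countTokens: (String) -> Int
): List<Message> {
    val window = ArrayDeque<Message>()
    var used = 0
    for (msg in history.asReversed()) {
        val cost = countTokens(msg.content)
        if (used + cost > tokenBudget) break
        window.addFirst(msg) // restores chronological order as we go
        used += cost
    }
    return window.toList()
}
```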

OpenAI tokens and limits

Hello prompt engineers, The Jetchat demo we’ve been covering in this blog series uses the OpenAI Chat API, and as we’ve added new features in each blog post, it has continued to support conversations with a reasonable number of replies. However, just like any LLM request API, there are limits to the number of tokens that can be processed, and ...
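
As a rough sketch of budgeting against those limits (the values below were the published context sizes for these models at the time of writing; newer models differ, so check the current docs):

```kotlin
// Context limits are per model and shared between the prompt and the
// completion, so reserve room for the reply when sizing the prompt.
val contextLimits = mapOf(
    "gpt-3.5-turbo" to 4_096,
    "gpt-3.5-turbo-16k" to 16_384,
    "gpt-4" to 8_192
)

fun maxPromptTokens(model: String, reservedForReply: Int = 500): Int {
    val limit = contextLimits[model] ?: error("Unknown model: $model")
    return limit - reservedForReply
}
```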

Prompt engineering tips

Hello prompt engineers, We’ve been sharing a lot of OpenAI content over the last few months, and because each blog post typically focuses on a specific feature or API, there are often smaller learnings or discoveries that don’t get mentioned or highlighted. In this blog we’re sharing a few little tweaks that we discovered ...