Surface Duo Blog

Infinite chat with history embeddings

Hello prompt engineers, The last few posts have been about the different ways to create an ‘infinite chat’, where the conversation between the user and an LLM is not limited by the model’s token limit and as much historical context as possible can be used to answer future queries. We previously covered: ...
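As a rough sketch of how the retrieval step can work, the snippet below ranks cached history by cosine similarity to the new query’s embedding and keeps the best matches that fit a token budget. This is a minimal illustration, not the JetchatAI implementation: the ChatTurn class and the characters-per-token heuristic are placeholders for the real app’s logic.

```kotlin
import kotlin.math.sqrt

// Hypothetical record of a past chat turn and its cached embedding vector.
data class ChatTurn(val text: String, val embedding: DoubleArray)

// Cosine similarity between two embedding vectors of equal length.
fun cosineSimilarity(a: DoubleArray, b: DoubleArray): Double {
    var dot = 0.0; var normA = 0.0; var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Rank stored history by similarity to the query embedding, then keep the
// best matches that still fit a token budget (tokens approximated here by
// a characters/4 heuristic rather than a real tokenizer).
fun selectRelevantHistory(
    queryEmbedding: DoubleArray,
    history: List<ChatTurn>,
    maxTokens: Int
): List<ChatTurn> {
    val ranked = history.sortedByDescending { cosineSimilarity(queryEmbedding, it.embedding) }
    val selected = mutableListOf<ChatTurn>()
    var budget = maxTokens
    for (turn in ranked) {
        val estimatedTokens = turn.text.length / 4
        if (estimatedTokens > budget) continue
        selected += turn
        budget -= estimatedTokens
    }
    return selected
}
```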

“Infinite” chat with history summarization

Hello prompt engineers, A few weeks ago we talked about token limits on LLM chat APIs and how they prevent an infinite amount of history from being remembered as context. A sliding window can limit the overall context size, and making the sliding window more efficient can help maximize the amount of context sent with each new chat query...
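A minimal sketch of the summarization idea, assuming a summarize() call that would really go back to the chat API (stubbed out here), might look like this:

```kotlin
// Chat message with a role of "system", "user", or "assistant".
data class Message(val role: String, val content: String)

// Placeholder: a real implementation would send the old turns back to the
// chat API with a "summarize this conversation" instruction.
fun summarize(text: String): String =
    "Earlier in the conversation: " + text.take(200)

// Keep the most recent `windowSize` messages verbatim, and collapse
// everything older into a single system message carrying the summary.
fun compressHistory(history: List<Message>, windowSize: Int): List<Message> {
    if (history.size <= windowSize) return history
    val older = history.dropLast(windowSize)
    val summary = summarize(older.joinToString("\n") { "${it.role}: ${it.content}" })
    return listOf(Message("system", summary)) + history.takeLast(windowSize)
}
```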

De-duplicating context in the chat sliding window

Hello prompt engineers, Last week’s post discussed the concept of a sliding window to keep recent context while preventing LLM chat prompts from exceeding the model’s token limit. The approach involved adding context to the prompt until we reach the maximum number of tokens the model can accept, then ignoring any remaining ...
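The de-duplication idea can be sketched as a filter applied while the window is being filled. The normalized-string duplicate key and the characters/4 token estimate below are simplifications, not the app’s actual logic:

```kotlin
data class Message(val role: String, val content: String)

// Walk the history newest-first, skipping any message whose normalized text
// has already been included, until the token budget is exhausted; then
// restore chronological order for the prompt.
fun buildDedupedWindow(history: List<Message>, maxTokens: Int): List<Message> {
    val seen = mutableSetOf<String>()
    val window = mutableListOf<Message>()
    var budget = maxTokens
    for (message in history.asReversed()) {
        val key = message.content.trim().lowercase() // crude duplicate detection
        if (key in seen) continue
        val estimatedTokens = message.content.length / 4
        if (estimatedTokens > budget) break
        seen += key
        window += message
        budget -= estimatedTokens
    }
    return window.asReversed()
}
```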

Infinite chat using a sliding window

Hello prompt engineers, There are a number of different strategies to support an ‘infinite chat’ using an LLM, required because large language models do not store ‘state’ across API requests and there is a limit to how large a single request can be. In this OpenAI community question on token limit differences in API ...
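In its simplest form, the sliding window just walks the history backwards until the budget runs out. A minimal sketch, assuming a plain Message type and a characters-based token estimate:

```kotlin
data class Message(val role: String, val content: String)

// Because the chat API is stateless, every request must resend any context
// the model should "remember". This keeps only the most recent messages
// that fit the token budget (characters/4 used as a stand-in tokenizer).
fun slidingWindow(history: List<Message>, maxTokens: Int): List<Message> {
    val window = mutableListOf<Message>()
    var budget = maxTokens
    for (message in history.asReversed()) {
        val estimatedTokens = message.content.length / 4
        if (estimatedTokens > budget) break
        window += message
        budget -= estimatedTokens
    }
    return window.asReversed()
}
```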

OpenAI tokens and limits

Hello prompt engineers, The JetchatAI demo that we’ve been covering in this blog series uses the OpenAI Chat API, and as we add new features in each blog post it supports conversations with a reasonable number of replies. However, just like any LLM request API, there are limits to the number of tokens that can be processed, and ...
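For a back-of-the-envelope check before sending a request, something like the following helps. The ~4 characters-per-token rule and the reply reserve are rough assumptions, not the tokenizer the API actually uses:

```kotlin
// Very rough token estimate (~4 characters per token for English text);
// real counts come from a model-specific tokenizer such as cl100k_base.
fun estimateTokens(text: String): Int = (text.length + 3) / 4

// Guard a request before sending it: the prompt plus the reply must fit
// the model's context window. The 4096 figure matches gpt-3.5-turbo's
// original limit; the reserve for the reply is an arbitrary choice here.
fun checkPromptFits(prompt: String, contextWindow: Int = 4096, replyReserve: Int = 500) {
    val promptTokens = estimateTokens(prompt)
    require(promptTokens + replyReserve <= contextWindow) {
        "Prompt is ~$promptTokens tokens; only ${contextWindow - replyReserve} available"
    }
}
```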

Prompt engineering tips

Hello prompt engineers, We’ve been sharing a lot of OpenAI content over the last few months, and because each blog post typically focuses on a specific feature or API, there are often smaller learnings or discoveries that don’t get mentioned or highlighted. In this blog we’re sharing a few little tweaks that we discovered ...

Dynamic SQLite queries with OpenAI chat functions

Hello prompt engineers, Previous blogs explained how to store droidcon session favorites in a database and also cache the embedding vectors there – but what if we stored everything in a database and then let the model query it directly? The OpenAI Cookbook examples repo includes a section on how to call functions with...
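The app-side handler for such a function might look roughly like this. The askDatabase name, the droidcon.db path, and the sqlite-jdbc driver are assumptions for the sketch; restricting input to SELECT statements guards against the model writing to the database:

```kotlin
import java.sql.DriverManager

// Sketch of the handler for a hypothetical "ask_database" chat function:
// the model supplies a SELECT statement as the function-call argument and
// we run it against a local SQLite file (requires the sqlite-jdbc driver
// on the classpath). Rows are flattened to text for the chat reply.
fun askDatabase(sql: String, dbPath: String = "droidcon.db"): String {
    require(sql.trim().lowercase().startsWith("select")) {
        "Only read-only SELECT queries are allowed"
    }
    DriverManager.getConnection("jdbc:sqlite:$dbPath").use { conn ->
        conn.createStatement().executeQuery(sql).use { rows ->
            val columns = rows.metaData.columnCount
            val output = StringBuilder()
            while (rows.next()) {
                output.appendLine((1..columns).joinToString(" | ") { rows.getString(it) ?: "" })
            }
            return output.toString()
        }
    }
}
```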

Embedding vector caching (redux)

Hello prompt engineers, Earlier this year I tried to create a hardcoded cache of embedding vectors, only to be thwarted by the limitations of Kotlin (the combined size of the arrays of numbers exceeded Kotlin’s maximum function size). Now that we’ve added SQLite to the solution to support memory and querying, we can use that ...
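A minimal cache along those lines, storing vectors as comma-separated text keyed by the exact input string (sqlite-jdbc assumed on the classpath; the schema is illustrative):

```kotlin
import java.sql.Connection
import java.sql.DriverManager

// Minimal embedding cache backed by SQLite: repeated inputs can skip the
// embeddings API call entirely by reading the stored vector instead.
class EmbeddingCache(dbPath: String = "embeddings.db") {
    private val conn: Connection = DriverManager.getConnection("jdbc:sqlite:$dbPath").also {
        it.createStatement().execute(
            "CREATE TABLE IF NOT EXISTS embeddings (input TEXT PRIMARY KEY, vector TEXT)"
        )
    }

    // Return the cached vector for this input, or null on a cache miss.
    fun get(input: String): DoubleArray? =
        conn.prepareStatement("SELECT vector FROM embeddings WHERE input = ?").use { stmt ->
            stmt.setString(1, input)
            val rows = stmt.executeQuery()
            if (rows.next()) rows.getString(1).split(",").map(String::toDouble).toDoubleArray()
            else null
        }

    // Store (or overwrite) the vector for this input.
    fun put(input: String, vector: DoubleArray) {
        conn.prepareStatement("INSERT OR REPLACE INTO embeddings VALUES (?, ?)").use { stmt ->
            stmt.setString(1, input)
            stmt.setString(2, vector.joinToString(","))
            stmt.executeUpdate()
        }
    }
}
```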

Chat memory with OpenAI functions

Hello prompt engineers, We first introduced OpenAI chat functions with a weather service and then a time-based conference sessions query. Both of those examples work well for ‘point in time’ queries or questions about a static set of data (e.g., the conference schedule). But each time the JetchatAI app is opened, it has ...
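Conceptually, the app-side dispatch for a pair of hypothetical memory functions could be as small as the sketch below; a real app would persist the facts (for example in SQLite) so memory survives app restarts:

```kotlin
// Sketch of dispatch for two hypothetical chat functions the model can
// call: "remember" stores a fact, "recall" returns everything stored.
object ChatMemory {
    private val facts = mutableListOf<String>()

    fun handleFunctionCall(name: String, argument: String): String = when (name) {
        "remember" -> { facts += argument; "Saved." }
        "recall"   -> if (facts.isEmpty()) "Nothing stored yet." else facts.joinToString("\n")
        else       -> "Unknown function: $name"
    }
}
```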

Combining OpenAI function calls with embeddings

Hello prompt engineers, Last week’s post introduced OpenAI chat function calling to implement a live weather response. This week, we’ll look at how to use function calling to enhance responses when using embeddings to retrieve data isn’t appropriate. The starting point will be the droidcon SF sample we’ve covered ...
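One simple way to combine the two techniques is a threshold test: if no stored embedding matches the query well, send the request with functions enabled so the model can fetch live data instead. A sketch, with an illustrative threshold:

```kotlin
// If the best embedding similarity for a query falls below the threshold,
// the stored data probably can't answer it, so the chat request should be
// sent with functions enabled. The 0.8 value is a placeholder to tune.
fun shouldUseFunctions(similarities: List<Double>, threshold: Double = 0.8): Boolean =
    (similarities.maxOrNull() ?: 0.0) < threshold
```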