We interviewed SK team member Brian Krabach on the emerging concept of “Personas.” The idea was born from a need to support longer chat interactions, as most models forget the early parts of conversations as they progress. What did I say again? <smile> A unique concept developed for SK Personas is “synthetic memories,” where the AI creates short-term and long-term “memories” from the chat history. To start, this gives the AI more context for generating responses. And it opens the door to even more adventures in memory-land.
SK: Semantic Kernel lets you build and test Plugins, which in turn are used by Planners. There’s also the “Personas” concept. Can you share a bit about how you came up with this “persona” idea in Semantic Kernel?
Brian Krabach: Sure. It started out as an exploration into how we could support long running chats. When using LLMs, like the Azure OpenAI GPT models, there is a limit to how much data you can pass in a prompt or list of messages. Since most chat experiences are built by providing chat history as the context for the model to “complete” a response as a bot, this means you can only pass so much chat history before you hit those limits. This is fine for casual conversation with an agent (just pass what you need and drop the rest), but it also means you cannot ask the agent about something that happened earlier in the conversation that no longer fits in that chat history “window”.
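To make the chat history “window” concrete, here is a minimal Python sketch of the "pass what you need and drop the rest" approach. It is an illustration only, not Semantic Kernel code: the `fit_window` and `count_tokens` names are hypothetical, and the word-count tokenizer is a crude stand-in for a real model tokenizer.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, text)


def count_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    # A real system would use the model's own tokenizer.
    return len(text.split())


def fit_window(history: List[Message], budget: int) -> List[Message]:
    """Keep the most recent messages whose combined size fits the budget."""
    kept: List[Message] = []
    used = 0
    # Walk backwards from the newest message and stop at the first
    # message that would push the total over the budget.
    for role, text in reversed(history):
        cost = count_tokens(text)
        if used + cost > budget:
            break
        kept.append((role, text))
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    ("user", "My dog is named Biscuit"),
    ("bot", "Nice name"),
    ("user", "Suggest a weekend hike"),
    ("bot", "Try the ridge trail"),
]
window = fit_window(history, budget=10)
```

With a budget of 10, the oldest message (the one that names the dog) falls out of the window, so an agent built only on this window could no longer answer a question about the dog's name.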
SK: Is there an analogy that can help more readers get your gist?
BK: Absolutely. I like to think about this in the context of the movie Memento. The main character had an accident after which he could no longer make new memories. He could remember things up until his accident, but in a new conversation, if he talked long enough, he forgot where the conversation started or even how he got there. Similarly, agent experiences built using just this chat history “window” approach are aware of content from the model’s base training and the “recent” stuff in the history, but if you talk long enough, the agent forgets what was said earlier.
SK: Isn’t that what is addressed today by what we call “RAG” or “Retrieval Augmented Generation”?
BK: Yes, but it didn’t have a name until recently <laughter>. The next place most folks turn to is to create the ability to retrieve past chat history messages to provide relevant information or context, along with some recent chat history, to allow for response generation. And this is indeed now called retrieval-augmented generation, or RAG for short. This may be done with traditional search methods or by extracting embeddings for those history messages. By storing embeddings for chat history messages, and then obtaining an embedding for each new chat request, the relatedness between them can be compared to determine which messages to bring back.
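The embedding comparison described above can be sketched in a few lines of Python. This is a hedged illustration rather than the team's implementation: `embed` here is a toy word-count vector standing in for a real embedding model, `top_related` is a hypothetical helper name, and cosine similarity is assumed as the relatedness measure.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def embed(text: str) -> Vector:
    # Toy stand-in for a real embedding model: lowercase word counts.
    return {word: float(n) for word, n in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity between two sparse vectors; 0.0 if either is empty.
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_related(query: str, store: List[Tuple[str, Vector]], k: int = 2) -> List[str]:
    """Return the k stored messages most similar to the query."""
    query_vec = embed(query)
    scored = [(cosine(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Embeddings are computed once, when each message is stored.
messages = [
    "my dog is named biscuit",
    "suggest a weekend hike",
    "the ridge trail is steep",
]
store = [(text, embed(text)) for text in messages]

hits = top_related("what is my dog called", store, k=1)
```

Here the query shares the most words with the first stored message, so that message is the one brought back. A real embedding model would match on meaning rather than on shared words.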
SK: How have you tweaked the RAG recipe to do more interesting things?
BK: Conventional RAG-ing works reasonably well when the user request has enough relevant content to perform the embedding search, but what if the user responds back with “sure” – about a question the agent just asked? One trick we’ve explored is using a call to the model to extract the user’s intent from the recent chat history. This allows us to convert a “sure” to something like “the user would like to brainstorm more ideas about cognitive architecture” or whatever the intent is. This provides much more to match against.
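A rough Python sketch of this intent-extraction trick follows. The prompt wording, the `extract_intent` helper, and the `complete` callable are all assumptions made for illustration; `fake_complete` is a stub that returns a canned answer so the example runs without calling any model.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, text)

# Hypothetical prompt; the real wording used by the team is not given in the interview.
INTENT_PROMPT = (
    "Rewrite the user's last message as a standalone statement of intent, "
    "using the conversation for context.\n\n{chat}\n\nIntent:"
)


def extract_intent(recent: List[Message], complete: Callable[[str], str]) -> str:
    """Ask the model to turn a terse reply into a searchable statement."""
    chat = "\n".join(f"{role}: {text}" for role, text in recent)
    return complete(INTENT_PROMPT.format(chat=chat)).strip()


seen_prompts: List[str] = []


def fake_complete(prompt: str) -> str:
    # Stub standing in for a real model call; records the prompt it received.
    seen_prompts.append(prompt)
    return " The user would like to brainstorm more ideas about cognitive architecture. "


recent = [
    ("bot", "Want to brainstorm more ideas about cognitive architecture?"),
    ("user", "sure"),
]
search_query = extract_intent(recent, fake_complete)
```

The resulting `search_query` is then what gets embedded and matched against stored history, instead of the bare “sure”.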
SK: Sure <laughter>
BK: As you can imagine, this starts to really improve the agent’s ability to retrieve the portions of chat history most related to the current request. That said, the chat history messages that come back create a view that is somewhat like doing a keyword search across all your emails, seeing only the snippets surrounding the keywords, and having to work out how they all fit together across all of the emails (or whether they are unrelated to each other). Now imagine doing this but looking at someone else’s emails – where you truly may not know the context of each snippet. Since the models are stateless and each completion performed by the model only has access to its training data and whatever you pass in a request, its “view” of the data is closer to that scenario. Even in this state, the models do a great job of filling in the blanks and providing an answer.
SK: That must feel like magic.
BK: Yeah, but while that can feel magical, it’s also one of the challenges. Since the model likes to provide “an” answer, it doesn’t always provide the “right” answer. More context can help, but then we’re back to our original issue of being limited on how much content we can pass. So, what do we do?
Think about our own recollection when it comes to conversations we’ve had with others. When we have just talked about something we may recall the exact words. The longer since the conversation, the more the details may become “fuzzy” – remembering ideas more than specific wording, etc. The exception is that certain verbatim words/phrases may be important enough that we remember them as-is, but in general, we start to consolidate those memories. What if we give our agents a similar capability?
SK: So that’s why “Personas” are so tied to memories.
BK: Exactly. One idea we decided to explore was to create “synthetic memories” from the chat history. We started by exploring using the model to extract “short term” and “long term” memories from recent chat history. In our first approach, we gave it very specific details to extract. It did this very well. Over time, we wanted to explore other variants on data to extract, so we used our own agent to help brainstorm those ideas. When we realized that it was better at determining the appropriate details than we were (and most importantly, what might be more/less important to that specific conversation), we changed our strategy. Instead of directing it as to what details we wanted to extract as memories, we crafted the prompts to inform the model of how we planned to later use these memories and to give it more autonomy to do what it felt was the right thing.
SK: What do you then do with these memories?
BK: Well, just like with chat history, we can search over them for relevant items and use them to build the context for our response generation. The difference, however, is now we have multiple altitudes of data from the chat history. In addition to the verbatim chat messages, we have these snippets that also capture some of the bigger picture ideas – things that span multiple chat messages, that start to summarize ideas, but also extracted, specific details. When all of this is used together (related memories, related chat history, and recent chat history), we now have a much more complete context that better connects the dots for the model to use for responses with less “fill in the gaps”.
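One way to picture how the three sources come together is the small Python sketch below. It is an assumption-laden illustration: the `build_context` name, the section labels, and the ordering (memories, then related history, then recent history) are hypothetical choices, and trimming to a token budget is left out for brevity.

```python
from typing import List


def build_context(memories: List[str], related: List[str], recent: List[str]) -> str:
    """Combine memories, related history, and recent history into one prompt context."""
    # A retrieved message may also be in the recent window; include it only once.
    recent_set = set(recent)
    related_only = [msg for msg in related if msg not in recent_set]

    sections = [
        ("Memories", memories),
        ("Related earlier messages", related_only),
        ("Recent messages", recent),
    ]
    lines: List[str] = []
    for title, items in sections:
        if not items:
            continue  # skip empty sections entirely
        lines.append(f"[{title}]")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


context = build_context(
    memories=["The user has a dog named Biscuit."],
    related=["user: My dog is named Biscuit", "user: Any dog friendly trails?"],
    recent=["user: Any dog friendly trails?", "bot: The ridge trail allows dogs."],
)
```

The memory line carries the bigger-picture fact, the related message carries the verbatim wording, and the recent messages carry the current thread, which mirrors the "multiple altitudes" described above.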
SK: What’s the net outcome of this approach?
BK: Building a system with this approach makes the initial chat history “window” approach feel very broken by comparison, once you’ve had a conversation longer than a few dozen back-and-forth interactions and it forgets what either of you said before, contradicts its prior statements, etc. There are still lots of additional areas to explore to take this idea further, but this is a great step to consider and provides a foundation for so much more. That’s where short-term and long-term memory come in. Short-term and long-term memory “stores” provide certain “perspectives” on the conversation – what happens if you also do the same with “associative”, “episodic”, “procedural”, and other types of memories? Stay tuned to future releases of Semantic Kernel to find out!
About Brian Krabach
Brian has spent most of his career building startups, primarily in tech, but also in other areas such as gaming, ministry, tattooing, and airbrushing. He is passionate about exploring and inventing new ideas, and is a challenge-driven, creative problem solver. Brian has been working with OpenAI models for the past 2+ years. He worked through their earliest limitations by creating systems or chains of calls to solve more complex prompts, and he now leverages that experience to build larger systems that solve even more challenging scenarios.