{"id":794,"date":"2023-07-20T13:52:28","date_gmt":"2023-07-20T20:52:28","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/semantic-kernel\/?p=794"},"modified":"2023-07-20T14:47:41","modified_gmt":"2023-07-20T21:47:41","slug":"semantic-kernel-personas-an-interview-with-sk-team-member-brian-krabach","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/agent-framework\/semantic-kernel-personas-an-interview-with-sk-team-member-brian-krabach\/","title":{"rendered":"Semantic Kernel Personas: An Interview with SK Team Member Brian Krabach"},"content":{"rendered":"<p>We interviewed SK team member Brian Krabach on the emerging concept of &#8220;Personas.&#8221; The idea was born from a need to support longer chat interactions, as most models forget the early parts of conversations as they progress. What did I say again? &lt;smile&gt; A unique concept developed in SK Personas is &#8220;synthetic memories&#8221; \u2014 where the AI creates short- and long-term &#8220;memories&#8221; from the chat history. For starters, this provides more context for the AI to generate responses. And it opens the door to even more adventures in memory-land.<\/p>\n<hr \/>\n<p><strong>SK:<\/strong> Semantic Kernel lets you build and test Plugins, which in turn are used by Planners. There&#8217;s also the &#8220;Personas&#8221; concept. Can you share a bit about how you came up with this &#8220;persona&#8221; idea in Semantic Kernel?<\/p>\n<p><strong>Brian Krabach:<\/strong> Sure. It started out as an exploration into how we could support long-running chats. 
When using LLMs, like the Azure OpenAI GPT models, there is a limit to how much data you can pass in a prompt or list of messages.\u00a0 Since most chat experiences are built by providing chat history as the context for the model to \u201ccomplete\u201d a response as a bot, this means you can only pass so much chat history before you hit those limits.\u00a0 This is fine for casual conversation with an agent (just pass what you need and drop the rest), but it also means you cannot ask the agent about something that happened earlier in the conversation that no longer fits in that chat history \u201cwindow\u201d.<\/p>\n<p><strong>SK:<\/strong> Is there an analogy that can help more readers get your gist?<\/p>\n<p><strong>BK:<\/strong> Absolutely. I like to think about this in the context of the movie Memento.\u00a0 The main character had an accident where he could no longer make new memories.\u00a0 He could remember things up until his accident, but in a new conversation, if he talked long enough, he forgot where the conversation started or even how he got there.\u00a0 Similarly, agent experiences built using just this chat history \u201cwindow\u201d approach are aware of content from the model\u2019s base training and the \u201crecent\u201d stuff in the history, but if you talk long enough, they forget what was said earlier.<\/p>\n<p><strong>SK:<\/strong> Isn&#8217;t that what&#8217;s resolved today with what we call &#8220;RAG&#8221; or &#8220;Retrieval Augmented Generation&#8221;?<\/p>\n<p><strong>BK:<\/strong> Yes, but it didn&#8217;t have a name until recently &lt;laughter&gt;. 
The next place most folks turn is retrieving past chat history messages to provide relevant information or context, along with some recent chat history, for response generation.\u00a0 And this is indeed now called retrieval-augmented generation, or RAG for short.\u00a0 This may be done with traditional search methods or by extracting embeddings for those history messages.\u00a0 By storing embeddings for chat history messages, and then obtaining embeddings for a new chat request, the relatedness can be compared to determine which messages to bring back.<\/p>\n<p><strong>SK:<\/strong> How have you tweaked the RAG recipe to do more interesting things?<\/p>\n<p><strong>BK:<\/strong> Conventional RAG-ing works reasonably well when the user request has enough relevant content to perform the embedding search, but what if the user responds with \u201csure\u201d \u2013 about a question the agent just asked?\u00a0 One trick we\u2019ve explored is using a call to the model to extract the user\u2019s intent from the recent chat history.\u00a0 This allows us to convert a \u201csure\u201d to something like \u201cthe user would like to brainstorm more ideas about cognitive architecture\u201d or whatever the intent is.\u00a0 This provides much more to match against.<\/p>\n<p><strong>SK:<\/strong> <em>Sure<\/em> &lt;laughter&gt;<\/p>\n<p><strong>BK:<\/strong> As you can imagine, this starts to really improve the agent\u2019s ability to retrieve the portions of chat history most related to the current request.\u00a0 That said, the chat history messages that come back create a view that is somewhat like doing a keyword search across all your emails and only seeing the snippets surrounding the keywords, having to understand how it all fits together across all of the emails (or whether they are unrelated to each other).\u00a0 Now imagine doing this but looking at someone else\u2019s emails \u2013 where you truly may not know the context of each snippet.\u00a0 Since 
the models are stateless and each completion performed by the model <em>only<\/em> has access to its training data and whatever you pass in a request, its \u201cview\u201d of the data is closer to that scenario.\u00a0 Even in this state, the models do a great job of filling in the blanks and providing an answer.<\/p>\n<p><strong>SK:<\/strong> That must feel like magic.<\/p>\n<p><strong>BK:<\/strong> Yeah, but while that can feel magical, it\u2019s also one of the challenges.\u00a0 Since the model likes to provide \u201can\u201d answer, it doesn\u2019t always provide the \u201cright\u201d answer.\u00a0 More context can help, but then we\u2019re back to our original issue of being limited in how much content we can pass.\u00a0 So, what do we do?<\/p>\n<p>Think about our own recollection when it comes to conversations we\u2019ve had with others.\u00a0 When we have just talked about something, we may recall the exact words.\u00a0 The longer it has been since the conversation, the more the details may become \u201cfuzzy\u201d &#8211; remembering ideas more than specific wording, etc.\u00a0 The exception is that certain verbatim words\/phrases may be important enough that we remember them as-is, but in general, we start to consolidate those memories.\u00a0 What if we give our agents a similar capability?<\/p>\n<p><strong>SK:<\/strong> So that&#8217;s why &#8220;Personas&#8221; are so tied to memories.<\/p>\n<p><strong>BK:<\/strong> Exactly. 
One idea we decided to explore was to create \u201csynthetic memories\u201d from the chat history.\u00a0 We started by using the model to extract \u201cshort-term\u201d and \u201clong-term\u201d memories from recent chat history.\u00a0 In our first approach, we gave it very specific details to extract.\u00a0 It did this very well.\u00a0 Over time, we wanted to explore other variants of data to extract, so we used our own agent to help brainstorm those ideas.\u00a0 When we realized that it was better at determining the appropriate details than we were (and most importantly, what might be more\/less important to that specific conversation), we changed our strategy.\u00a0 Instead of dictating which details we wanted it to extract as memories, we crafted the prompts to inform the model of how we planned to <em>later<\/em> use these memories and to give it more autonomy to do what it felt was the right thing.<\/p>\n<p><strong>SK:<\/strong> What do you then do with these memories?<\/p>\n<p><strong>BK:<\/strong> Well, just like with chat history, we can search over them for relevant items and use them to build the context for our response generation.\u00a0 The difference, however, is now we have multiple altitudes of data from the chat history.\u00a0 In addition to the verbatim chat messages, we have these snippets that also capture some of the bigger-picture ideas \u2013 things that span multiple chat messages and start to summarize ideas, but also specific, extracted details.\u00a0 When all of this is used <em>together<\/em> (related memories, related chat history, and recent chat history), we now have a much more complete context that better connects the dots for the model to use for responses with less \u201cfill in the gaps\u201d.<\/p>\n<p><strong>SK:<\/strong> What&#8217;s the net outcome of this approach?<\/p>\n<p><strong>BK:<\/strong> Building a system with this approach makes the initial chat history \u201cwindow\u201d approach feel very broken 
by comparison, once you\u2019ve had a conversation longer than a few dozen back-and-forth interactions and it forgets what either of you said before, contradicts its prior statements, etc. There are still lots of areas to explore to take this idea further, but this is a great step and provides a foundation for so much more. That&#8217;s where short-term and long-term memory come in. Short-term and long-term memory \u201cstores\u201d provide certain \u201cperspectives\u201d on the conversation \u2013 what happens if you also do the same with \u201cassociative\u201d, \u201cepisodic\u201d, \u201cprocedural\u201d, and other types of memories<a href=\"https:\/\/www.psychologytoday.com\/us\/basics\/memory\/types-memory\">**<\/a>? Stay tuned to future releases of Semantic Kernel to find out!<\/p>\n<h3>About Brian Krabach<\/h3>\n<p>Brian has spent most of his career building startups, primarily in tech, but also in other areas such as gaming, ministry, tattooing, and airbrushing. He is passionate about exploring and inventing new ideas, and is a challenge-driven, creative problem solver. Brian has been working with OpenAI models for the past 2+ years, from working through their earliest limitations by creating systems or chains of calls to solve more complex prompts, to now leveraging that experience to build larger systems to solve even more challenging scenarios.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We interviewed SK team member Brian Krabach on the emerging concept of &#8220;Personas.&#8221; The idea was born from a need to support longer chat interactions, as most models forget the early parts of conversations as they progress. What did I say again? 
&lt;smile&gt; In SK Personas, a unique concept that&#8217;s been developed is &#8220;synthetic memories&#8221; [&hellip;]<\/p>\n","protected":false},"author":111267,"featured_media":819,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[15],"tags":[],"class_list":["post-794","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-interviews"],"acf":[],"blog_post_summary":"<p>We interviewed SK team member Brian Krabach on the emerging concept of &#8220;Personas.&#8221; The idea was born from a need to support longer chat interactions, as most models forget the early parts of conversations as they progress. What did I say again? &lt;smile&gt; In SK Personas, a unique concept that&#8217;s been developed is &#8220;synthetic memories&#8221; [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/posts\/794","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/users\/111267"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/comments?post=794"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/posts\/794\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/media\/819"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/media?parent=794"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/
categories?post=794"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/tags?post=794"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}