{"id":4273,"date":"2025-03-03T12:02:05","date_gmt":"2025-03-03T20:02:05","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/semantic-kernel\/?p=4273"},"modified":"2025-03-03T12:02:05","modified_gmt":"2025-03-03T20:02:05","slug":"guest-blog-llmagentops-toolkit-for-semantic-kernel","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/agent-framework\/guest-blog-llmagentops-toolkit-for-semantic-kernel\/","title":{"rendered":"Guest Blog: LLMAgentOps Toolkit for Semantic Kernel"},"content":{"rendered":"<p>Today the Semantic Kernel team is excited to welcome a guest author, Prabal Deb, to share his work.<\/p>\n<p><a href=\"https:\/\/github.com\/Azure-Samples\/llm-agent-ops-toolkit-sk\"><span data-contrast=\"none\">LLMAgentOps\u202fToolkit<\/span><\/a><span data-contrast=\"auto\"> is a repository that contains the basic structure of an LLM Agent based application built on top of the Python version of Semantic Kernel. The toolkit is designed to be a starting point for data scientists and developers to take their own LLM Agent based applications from experimentation through evaluation and finally to production deployment.<\/span><span data-ccp-props=\"{}\">\u00a0<\/span><\/p>\n<h3 aria-level=\"1\"><span data-contrast=\"none\">Architecture<\/span><span data-ccp-props=\"{&quot;134245418&quot;:true,&quot;134245529&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:320,&quot;335559739&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h3>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/semantic-kernel\/wp-content\/uploads\/sites\/78\/2025\/02\/Screenshot-2025-02-25-153758.png\"><img decoding=\"async\" class=\"alignnone wp-image-4326 size-large\" src=\"https:\/\/devblogs.microsoft.com\/semantic-kernel\/wp-content\/uploads\/sites\/78\/2025\/02\/Screenshot-2025-02-25-153758-1024x431.png\" alt=\"Image Screenshot 2025 02 25 153758\" width=\"1024\" height=\"431\" 
srcset=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/02\/Screenshot-2025-02-25-153758-1024x431.png 1024w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/02\/Screenshot-2025-02-25-153758-300x126.png 300w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/02\/Screenshot-2025-02-25-153758-768x323.png 768w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/02\/Screenshot-2025-02-25-153758.png 1413w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/a><\/p>\n<p>The\u00a0LLMAgentOps\u00a0architecture can be constructed from the following key components, divided into development and deployment phases in the same way as DevOps \/ MLOps \/ LLMOps:<\/p>\n<ul>\n<li><strong>LLM Agent Development Phase (inner loop)<\/strong>:\n<ul>\n<li>Agent Architecture: Designing the agent architecture for the LLM Agent based solution. For this sample we have used the\u00a0Semantic Kernel\u00a0SDK with the\u00a0Python\u00a0programming language.<\/li>\n<li>Experimentation &amp; Evaluation: Experimentation and evaluation of the LLM Agent based solution, where experimentation is done in\u00a0console,\u00a0UI, or\u00a0batch\u00a0mode and evaluation is done using\u00a0LLM as Judge\u00a0and\u00a0Human Evaluation.<\/li>\n<\/ul>\n<\/li>\n<li><strong>LLM Agent Deployment Phase (outer loop)<\/strong>:\n<ul>\n<li>GitHub Actions: Continuous Integration, Evaluation and Deployment of the LLM Agent based solution, with the addition of\u00a0<strong>Continuous Security<\/strong>\u00a0for security checks of the LLM Agents.<\/li>\n<li>Deployment: Deployment of the LLM Agent based solution in a\u00a0local\u00a0or\u00a0cloud\u00a0environment.<\/li>\n<li>Monitoring: Monitoring the LLM Agent based solution for data collection, performance, and other metrics.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<h1>Source Code Structure<\/h1>\n<p>The\u00a0source code\u00a0of an LLMAgentOps application can be structured so that data scientists and developers can easily develop and maintain it together, following the key concept of dividing the code into two parts &#8211;\u00a0core\u00a0and\u00a0ops:<\/p>\n<ul>\n<li><strong>Core<\/strong>: The LLM Agent core implementation code.\n<ul>\n<li><strong>Agents Base Class<\/strong>: The\u00a0base class\u00a0for the agents.<\/li>\n<li><strong>Agents<\/strong>: All the\u00a0agents\u00a0with their specific prompts and descriptions. 
Example:\u00a0Observe Agent.<\/li>\n<li><strong>Code Execute Agent (optional)<\/strong>: The\u00a0code execute agent\u00a0is an agent that can join the group of agents, but it executes code and returns the result, instead of using an LLM to generate a response like the other agents.<\/li>\n<li><strong>Group Chat Selection Logic<\/strong>: The\u00a0group chat selection logic\u00a0is used to select the appropriate next agent based on the current state of the conversation.<\/li>\n<li><strong>Group Chat Termination Logic<\/strong>: The\u00a0group chat termination logic\u00a0is used to terminate the conversation based on the current state of the conversation or a maximum number of turns.<\/li>\n<li><strong>Group Chat<\/strong>: The\u00a0group chat\u00a0contains the group chat client that can serve the conversation between the user and the agents.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Ops<\/strong>: The operational code for the LLM Agent based solution.\n<ul>\n<li><strong>Observability<\/strong>: The\u00a0observability code\u00a0contains the code for logging and monitoring the agents. OpenTelemetry\u00a0can be used for logging and monitoring.<\/li>\n<li><strong>Application Specific Code (optional)<\/strong>: There could be application specific code, such as code for interacting with a database or integrating with other systems.<\/li>\n<li><strong>Deployment<\/strong>: The deployment code contains the code for deploying the agents in a local or cloud environment. In this sample the code is provided for deploying the agents in Azure Web App Service. 
The deployment code consists of:\n<ul>\n<li>Source Module: core implementation of the agents and group chat.<\/li>\n<li>REST API Based App: REST API based app for calling the agents and getting the response.<\/li>\n<li>Dockerfile: for building the image of the entire application.<\/li>\n<li>Requirements\u00a0file: for the dependencies.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li><strong>Experimentation: <\/strong>The code related to performing experimentation in\u00a0Console,\u00a0User Interface, or\u00a0Batch\u00a0mode.<\/li>\n<li><strong>Evaluation: <\/strong>The evaluation related code for\u00a0LLM as Judge\u00a0and\u00a0Human Evaluation.<\/li>\n<li><strong>Security:<\/strong> The code for the security checks of the LLM Agent based solution.<\/li>\n<\/ul>\n<h1>Experimentation<\/h1>\n<p>The\u00a0experimentation\u00a0setup could be more complex and granular than shown here. The experimentation process may involve starting from defining the problem =&gt; data collection =&gt; LLM agent design =&gt; experiments. 
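<\/p>
<p>Before looking at the modes, note that the group chat selection and termination logic described in the source code structure above amounts to a small state machine. A framework-agnostic sketch in Python (the agent names and transition rules here are illustrative, not the toolkit&#8217;s actual code):<\/p>

```python
# Hypothetical sketch of group chat selection / termination logic.
# Agent names and transition rules are illustrative only.

MAX_TURNS = 10

# StateFlow-style transitions: which agent speaks after which.
NEXT_AGENT = {
    None: 'Observe',      # the conversation starts with the Observe agent
    'Observe': 'Select',
    'Select': 'Verify',
    'Verify': 'Observe',  # loop back until termination
}

def select_next_agent(last_agent):
    # Selection logic: pick the next agent from the current state.
    return NEXT_AGENT[last_agent]

def should_terminate(last_message, turn):
    # Termination logic: stop on a sentinel keyword or after max turns.
    return 'TERMINATE' in last_message or turn >= MAX_TURNS

# Minimal dry run with canned messages instead of real LLM calls.
agent, turn, message = None, 0, ''
history = []
while not should_terminate(message, turn):
    agent = select_next_agent(agent)
    message = 'TERMINATE' if turn >= 2 else agent + ' done'
    history.append((agent, message))
    turn += 1
```

<p>In the toolkit, these two functions correspond to the custom selection and termination strategies plugged into the Semantic Kernel group chat.<\/p>
<p>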
In this toolkit we have demonstrated how experiments can be done using the following three modes:<\/p>\n<ol>\n<li><strong>Experimentation with Console<\/strong><\/li>\n<\/ol>\n<p>The\u00a0Console\u00a0based experimentation involves running the LLM Agent in the console and interacting with it using a text-based interface.<\/p>\n<p><strong>Note<\/strong>: The console-based experimentation is for the initial\u00a0exploration\u00a0stage and does not store any data for evaluation.<\/p>\n<p>Sample experiment:\n<a href=\"https:\/\/devblogs.microsoft.com\/semantic-kernel\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-164956.png\"><img decoding=\"async\" class=\"alignnone wp-image-4328 size-full\" src=\"https:\/\/devblogs.microsoft.com\/semantic-kernel\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-164956.png\" alt=\"Image Screenshot 2025 02 25 164956\" width=\"2028\" height=\"434\" srcset=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-164956.png 2028w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-164956-300x64.png 300w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-164956-1024x219.png 1024w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-164956-768x164.png 768w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-164956-1536x329.png 1536w\" sizes=\"(max-width: 2028px) 100vw, 2028px\" \/><\/a><\/p>\n<p>2. <strong>Experimentation with User Interface (UI)<\/strong><\/p>\n<p>User interface driven experimentation can be achieved using\u00a0<a href=\"https:\/\/github.com\/Chainlit\/chainlit\">Chainlit<\/a>, where the LLM Agents can be interacted with through a conversational interface. 
Chainlit provides a more interactive and user-friendly experience for the LLM Agents based solution. It not only stores the conversation data for evaluation but also provides a way to give\u00a0Human Feedback\u00a0on the overall conversation and individual agents.<\/p>\n<p><strong>Note<\/strong>: The UI based experimentation is for the\u00a0experiment\u00a0stage, and it can be provided to a\u00a0Human Evaluator\u00a0for feedback. The collected data can be viewed by opening the\u00a0SQLite file in an SQLite browser.<\/p>\n<p>Sample experiment:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/semantic-kernel\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-170449.jpg\"><img decoding=\"async\" class=\"alignnone wp-image-4329 size-full\" src=\"https:\/\/devblogs.microsoft.com\/semantic-kernel\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-170449.jpg\" alt=\"Image Screenshot 2025 02 25 170449\" width=\"768\" height=\"800\" srcset=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-170449.jpg 768w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-170449-288x300.jpg 288w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-170449-24x24.jpg 24w\" sizes=\"(max-width: 768px) 100vw, 768px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/semantic-kernel\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-170553.jpg\"><img decoding=\"async\" class=\"alignnone wp-image-4330 size-full\" src=\"https:\/\/devblogs.microsoft.com\/semantic-kernel\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-170553.jpg\" alt=\"Image Screenshot 2025 02 25 170553\" width=\"645\" height=\"800\" srcset=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-170553.jpg 645w, 
https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-170553-242x300.jpg 242w\" sizes=\"(max-width: 645px) 100vw, 645px\" \/><\/a><\/p>\n<p>3. <strong>Experimentation in Batch<\/strong><\/p>\n<p>Batch-based experimentation involves running the LLM Agent in batch mode to process many queries and evaluate the solution&#8217;s performance. The batch experimentation can be used to evaluate the accuracy, efficiency, and scalability of the LLM Agents based solution.<\/p>\n<p><strong>Note<\/strong>: The batch-based experimentation is for the\u00a0continuous evaluation\u00a0stage, and it can be used to evaluate the performance of the LLM Agents based solution.<\/p>\n<h1>Evaluation<\/h1>\n<p>Evaluation is the process of measuring the performance of the LLM Agents based solution, which helps in making decisions about it. In this toolkit we have demonstrated how evaluation can be done in the following two modes:<\/p>\n<ol>\n<li><strong>Human Evaluation<\/strong><\/li>\n<\/ol>\n<p>Human Evaluation\u00a0is the process of evaluating the performance of the LLM Agents based solution by providing a conversational interface to a\u00a0Human Evaluator. 
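<\/p>
<p>Conceptually, the feedback collected this way reduces to a few records per conversation and per agent. A schematic in Python (the record shape is invented for illustration; the toolkit itself persists feedback through Chainlit into SQLite):<\/p>

```python
# Schematic of human feedback records and their aggregation.
# The record shape is illustrative; the toolkit stores feedback via Chainlit in SQLite.
from collections import defaultdict

feedback = [
    {'conversation': 'c1', 'agent': 'Observe', 'score': 1},  # thumbs up
    {'conversation': 'c1', 'agent': 'Select', 'score': 0},   # thumbs down
    {'conversation': 'c2', 'agent': 'Observe', 'score': 1},
]

def average_by_agent(records):
    # Group scores per agent and average them.
    grouped = defaultdict(list)
    for record in records:
        grouped[record['agent']].append(record['score'])
    return {agent: sum(scores) / len(scores) for agent, scores in grouped.items()}

per_agent = average_by_agent(feedback)
```

<p>Aggregates like these are what turn individual thumbs-up \/ thumbs-down clicks into a per-agent signal.<\/p>
<p>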
The\u00a0Human Evaluator\u00a0will interact with the LLM Agents using a conversational interface and provide feedback on the overall conversation and individual agents.<\/p>\n<p>This can be achieved by running the experiment in UI mode and providing the\u00a0Chainlit\u00a0based interface to the\u00a0Human Evaluator.<\/p>\n<p>Sample Evaluation:\n<a href=\"https:\/\/devblogs.microsoft.com\/semantic-kernel\/wp-content\/uploads\/sites\/78\/2025\/02\/Picture7.png\"><img decoding=\"async\" class=\"alignnone wp-image-4278 size-full\" src=\"https:\/\/devblogs.microsoft.com\/semantic-kernel\/wp-content\/uploads\/sites\/78\/2025\/02\/Picture7.png\" alt=\"Image Picture7\" width=\"653\" height=\"477\" srcset=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/02\/Picture7.png 653w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/02\/Picture7-300x219.png 300w\" sizes=\"(max-width: 653px) 100vw, 653px\" \/><\/a><\/p>\n<p>2. <strong>LLM as Judge<\/strong><\/p>\n<p>LLM as Judge is the process of evaluating the performance of the LLM Agents using another LLM Agent as a judge. For this evaluation, the Azure AI Foundry service can be used. 
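<\/p>
<p>The idea can be sketched with a stubbed judge standing in for a real model. In this sketch the scoring rule and the sample batch are invented for illustration; in the toolkit the judge is an Azure AI Foundry evaluator, not this stub:<\/p>

```python
# Sketch of LLM as Judge over a batch run; the judge here is a stub.
# A real judge would prompt another LLM to rate each response, e.g. on a 1-5 scale.

def stub_judge(query, response):
    # Invented rule standing in for an LLM call.
    return 5 if 'SELECT' in response else 2

batch = [
    {'query': 'list all customers', 'response': 'SELECT * FROM customers;'},
    {'query': 'hello', 'response': 'I can only answer MySQL questions.'},
]

scores = [stub_judge(item['query'], item['response']) for item in batch]
average = sum(scores) / len(scores)
```

<p>Running the judge over a whole batch and averaging the scores is what makes this mode suitable for continuous evaluation.<\/p>
<p>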
For more details, refer to the <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/ai-studio\/how-to\/develop\/evaluate-sdk\">documentation<\/a>.<\/p>\n<p>This can be achieved by running the experiment in batch mode.<\/p>\n<p>Sample Azure AI Foundry Evaluation Result:<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/semantic-kernel\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-170318.jpg\"><img decoding=\"async\" class=\"alignnone wp-image-4331 size-full\" src=\"https:\/\/devblogs.microsoft.com\/semantic-kernel\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-170318.jpg\" alt=\"Image Screenshot 2025 02 25 170318\" width=\"800\" height=\"334\" srcset=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-170318.jpg 800w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-170318-300x125.jpg 300w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Screenshot-2025-02-25-170318-768x321.jpg 768w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<h1>Security Scanning<\/h1>\n<p>Security Scanning is the process of ensuring the security of the LLM Agents based solution. 
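<\/p>
<p>The per-agent scoring shown in the sample result later in this section can be reproduced schematically as follows. The individual scan scores here are invented; in the toolkit they come from LLM Guard scanners:<\/p>

```python
# Schematic aggregation of per-prompt security scan scores into per-agent averages.
# Individual scores are invented; the toolkit obtains them from LLM Guard scanners.

scan_scores = {
    'Error': [0.5, 0.7],
    'Observe': [1.0, 1.0],
    'Verify': [0.7, 0.9],
    'Select': [1.0, 1.0],
}

per_agent = {agent: round(sum(s) / len(s), 2) for agent, s in scan_scores.items()}
overall = round(sum(per_agent.values()) / len(per_agent), 2)

for agent, avg in per_agent.items():
    print('Agent ' + agent + ' avg score for scan BanTopics: ' + str(avg))
print('Overall avg score for scan BanTopics: ' + str(overall))
```

<p>Averaging per agent and then overall makes it easy to gate a CI pipeline on a minimum security score.<\/p>
<p>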
Agents are going to write \/ execute code, browse the web, and interact with databases, hence security is a key concern and must be designed and implemented from the beginning.<\/p>\n<p>The security scan of LLM Agents can be performed using the following tools:<\/p>\n<ul>\n<li><a href=\"https:\/\/azure.github.io\/PyRIT\/\">Python Risk Identification Tool for generative AI (PyRIT)<\/a><\/li>\n<li><a href=\"https:\/\/llm-guard.com\/\">LLM Guard &#8211; The Security Toolkit for LLM Interactions<\/a><\/li>\n<\/ul>\n<p>In this toolkit we have demonstrated how to enable <strong>Continuous Security Scan<\/strong> using LLM Guard.<\/p>\n<p>Sample Security Scanning Result:<\/p>\n<p><em>==================Summary=====================<\/em><\/p>\n<p><em>Agent Error avg score for scan BanTopics: 0.6<\/em><\/p>\n<p><em>Agent Observe avg score for scan BanTopics: 1.0<\/em><\/p>\n<p><em>Agent Verify avg score for scan BanTopics: 0.8<\/em><\/p>\n<p><em>Agent Select avg score for scan BanTopics: 1.0<\/em><\/p>\n<p><em>Overall avg score for scan BanTopics: 0.85<\/em><\/p>\n<p><em>===============================================<\/em><\/p>\n<h1>Engineering Fundamentals<\/h1>\n<p>The following engineering fundamentals must be considered while designing and developing the LLM Agent based solution:<\/p>\n<ol>\n<li><a href=\"https:\/\/github.com\/features\/actions\">GitHub Actions<\/a>: for continuous integration, continuous evaluation, continuous deployment, and continuous security scanning of the LLM Agent based solution.<\/li>\n<li><a href=\"https:\/\/code.visualstudio.com\/docs\/devcontainers\/containers\">Dev Containers<\/a>: to enable a full-featured development environment for LLM Agents, where all required tools are installed and configured to perform rapid experimentation, evaluation, and testing.<\/li>\n<li><a href=\"https:\/\/docs.python.org\/3\/library\/unittest.html\">Unit Testing of Agents<\/a>: for testing the Python code locally with unit test 
cases.<\/li>\n<\/ol>\n<h1>Toolkit Repository<\/h1>\n<p>The toolkit repository <a href=\"https:\/\/github.com\/Azure-Samples\/llm-agent-ops-toolkit-sk\">Azure-Samples\/llm-agent-ops-toolkit-sk<\/a> contains a sample use case of\u00a0<strong>MySQL Copilot<\/strong>, where users can interact with the solution to retrieve data from a MySQL database by providing a natural language query. The solution uses an agentic approach, where LLM Agents process the user query, generate SQL queries, execute the queries on the MySQL database, and return the results to the user.<\/p>\n<p>This has been implemented using the concept of\u00a0<strong>StateFlow<\/strong>\u00a0(a Finite State Machine (FSM) based LLM workflow) using\u00a0<a href=\"https:\/\/learn.microsoft.com\/en-us\/semantic-kernel\/overview\/\">Semantic Kernel<\/a>\u00a0agents. This implementation is equivalent to the\u00a0<a href=\"https:\/\/microsoft.github.io\/autogen\/stable\/user-guide\/agentchat-user-guide\/selector-group-chat.html#custom-selector-function\">AutoGen Selector Group Chat pattern with a custom selector function<\/a>.<\/p>\n<p>For more details on\u00a0StateFlow, refer to the research paper &#8211;\u00a0<a href=\"https:\/\/arxiv.org\/abs\/2403.11322\">StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows<\/a>.<\/p>\n<h1>Extending the Toolkit<\/h1>\n<p>This toolkit can be used by forking the repository and replacing the\u00a0MySQL Copilot\u00a0with any other LLM Agent based solution, or it can be further enhanced for a specific use case.<\/p>\n<h2>Conclusion<\/h2>\n<p>From the Semantic Kernel team, we\u2019d like to thank Prabal for his time and all of his great work. Please reach out if you have any questions or feedback through our <a href=\"https:\/\/github.com\/microsoft\/semantic-kernel\/discussions\/categories\/general\" target=\"_blank\" rel=\"noopener\">Semantic Kernel GitHub Discussion Channel<\/a>. 
We look forward to hearing from you!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today the Semantic Kernel team is excited to welcome a guest author, Prabal Deb to share his work. LLMAgentOps\u202fToolkit is repository that contains basic structure of LLM Agent based application built on top of the Semantic Kernel Python version. The toolkit is designed to be a starting point for data scientists and developers for experimentation [&hellip;]<\/p>\n","protected":false},"author":149071,"featured_media":2302,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[117,1],"tags":[48,63,9],"class_list":["post-4273","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-guest-blog","category-semantic-kernel","tag-ai","tag-microsoft-semantic-kernel","tag-semantic-kernel"],"acf":[],"blog_post_summary":"<p>Today the Semantic Kernel team is excited to welcome a guest author, Prabal Deb to share his work. LLMAgentOps\u202fToolkit is repository that contains basic structure of LLM Agent based application built on top of the Semantic Kernel Python version. 
The toolkit is designed to be a starting point for data scientists and developers for experimentation [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/posts\/4273","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/users\/149071"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/comments?post=4273"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/posts\/4273\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/media\/2302"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/media?parent=4273"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/categories?post=4273"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/tags?post=4273"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}