{"id":4513,"date":"2025-03-18T20:29:43","date_gmt":"2025-03-19T03:29:43","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/semantic-kernel\/?p=4513"},"modified":"2025-03-18T20:29:43","modified_gmt":"2025-03-19T03:29:43","slug":"accelerating-agentic-workflows-with-nvidia-agentiq-azure-ai-foundry-and-semantic-kernel","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/agent-framework\/accelerating-agentic-workflows-with-nvidia-agentiq-azure-ai-foundry-and-semantic-kernel\/","title":{"rendered":"Accelerating Agentic Workflows with NVIDIA AgentIQ, Azure AI Foundry and Semantic Kernel"},"content":{"rendered":"<p>Today, we&#8217;re excited to announce our collaboration with NVIDIA. We&#8217;ve integrated NVIDIA NIM microservices and the NVIDIA AgentIQ toolkit into <a href=\"https:\/\/azure.microsoft.com\/en-us\/products\/ai-foundry\" target=\"_blank\" rel=\"noreferrer noopener\">Azure AI Foundry<\/a>, unlocking unprecedented efficiency, performance, and cost optimization for your AI projects. Read more on the announcement <a href=\"https:\/\/azure.microsoft.com\/en-us\/blog\/accelerating-agentic-workflows-with-azure-ai-foundry-nvidia-nim-and-nvidia-agentiq\/?msockid=36fead99b61a6c2133cbbf4fb7a06d94\">here<\/a>.<\/p>\n<h2 id=\"optimizing-performance-with-nvidia-agentiq\" class=\"wp-block-heading\">Optimizing performance with NVIDIA AgentIQ and Semantic Kernel<\/h2>\n<p class=\"wp-block-paragraph\">Once your NVIDIA NIM microservices are deployed, <a href=\"http:\/\/github.com\/NVIDIA\/AgentIQ\">NVIDIA AgentIQ<\/a> takes center stage. This\u00a0<a href=\"https:\/\/github.com\/microsoft\/semantic-kernel\/tree\/main\/python\/semantic_kernel\/connectors\/ai\/nvidia\" target=\"_blank\" rel=\"noreferrer noopener\">open-source toolkit<\/a>\u00a0is designed to seamlessly connect, profile, and optimize teams of AI agents, enabling your systems to run at peak performance. 
AgentIQ delivers:<\/p>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><strong>Profiling and optimization:<\/strong>\u00a0Leverage real-time telemetry to fine-tune AI agent placement, reducing latency and compute overhead.<\/li>\n<\/ul>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><strong>Dynamic inference enhancements:<\/strong>\u00a0Continuously collect and analyze metadata\u2014such as predicted output tokens per call, estimated time to next inference, and expected token lengths\u2014to dynamically improve agent performance.<\/li>\n<\/ul>\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><strong>Integration with Semantic Kernel:<\/strong>\u00a0Direct integration with Azure AI Foundry Agent Service further empowers your agents with enhanced semantic reasoning and task execution capabilities.<\/li>\n<\/ul>\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/devblogs.microsoft.com\/semantic-kernel\/wp-content\/uploads\/sites\/78\/2025\/03\/Blog-Diagram.png\"><img decoding=\"async\" class=\"alignnone wp-image-4519 size-full\" src=\"https:\/\/devblogs.microsoft.com\/semantic-kernel\/wp-content\/uploads\/sites\/78\/2025\/03\/Blog-Diagram.png\" alt=\"Image Blog Diagram\" width=\"3445\" height=\"2210\" srcset=\"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Blog-Diagram.png 2500w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Blog-Diagram-300x192.png 300w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Blog-Diagram-1024x657.png 1024w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Blog-Diagram-768x493.png 768w, https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Blog-Diagram-1536x985.png 1536w, 
https:\/\/devblogs.microsoft.com\/agent-framework\/wp-content\/uploads\/sites\/78\/2025\/03\/Blog-Diagram-2048x1314.png 2048w\" sizes=\"(max-width: 3445px) 100vw, 3445px\" \/><\/a><\/figure>\n<p class=\"wp-block-paragraph\">This intelligent profiling not only reduces compute costs but also boosts accuracy and responsiveness, so that every part of your agentic AI workflow is optimized for success.<\/p>\n<p class=\"wp-block-paragraph\">In addition, we will soon be integrating the NVIDIA Llama Nemotron Reason open reasoning model. NVIDIA Llama Nemotron Reason is a powerful AI model family designed for advanced reasoning. According\u00a0to NVIDIA, Nemotron excels at coding, complex math, and scientific reasoning while understanding user intent and seamlessly calling tools like search and translations to accomplish tasks.<\/p>\n<h2 class=\"wp-block-paragraph\">Ready to dive in?<\/h2>\n<p class=\"wp-block-paragraph\">Deploy NVIDIA NIM microservices and\u00a0<a href=\"https:\/\/github.com\/microsoft\/semantic-kernel\/tree\/main\/python\/semantic_kernel\/connectors\/ai\/nvidia\" target=\"_blank\" rel=\"noreferrer noopener\">optimize your AI agents<\/a>. Below are details to get up and running. Details about the NVIDIA Text Embedding Connector can be found <a href=\"https:\/\/github.com\/microsoft\/semantic-kernel\/blob\/main\/python\/semantic_kernel\/connectors\/ai\/nvidia\/README.md\">here<\/a>.<\/p>\n<p dir=\"auto\">This connector enables integration with NVIDIA&#8217;s NIM API for text embeddings. 
It allows you to use NVIDIA&#8217;s embedding models within the Semantic Kernel SDK.<\/p>\n<div class=\"markdown-heading\" dir=\"auto\">\n<h2 class=\"heading-element\" dir=\"auto\" tabindex=\"-1\">Quick start<\/h2>\n<\/div>\n<div class=\"markdown-heading\" dir=\"auto\">\n<h3 class=\"heading-element\" dir=\"auto\" tabindex=\"-1\">Initialize the kernel<\/h3>\n<\/div>\n<div class=\"highlight highlight-source-python notranslate position-relative overflow-auto\" dir=\"auto\">\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">from semantic_kernel import Kernel\r\nkernel = Kernel()<\/code><\/pre>\n<\/div>\n<div class=\"markdown-heading\" dir=\"auto\">\n<h3 class=\"heading-element\" dir=\"auto\" tabindex=\"-1\">Add NVIDIA text embedding service<\/h3>\n<\/div>\n<p dir=\"auto\">You can provide your API key directly or through the NVIDIA_API_KEY environment variable.<\/p>\n<div class=\"highlight highlight-source-python notranslate position-relative overflow-auto\" dir=\"auto\">\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">from semantic_kernel.connectors.ai.nvidia import NvidiaTextEmbedding\r\n\r\nembedding_service = NvidiaTextEmbedding(\r\n    ai_model_id=\"nvidia\/nv-embedqa-e5-v5\",  # Default model if not specified\r\n    api_key=\"your-nvidia-api-key\",  # Can also use NVIDIA_API_KEY env variable\r\n    service_id=\"nvidia-embeddings\",  # Optional service identifier\r\n)<\/code><\/pre>\n<\/div>\n<div class=\"markdown-heading\" dir=\"auto\">\n<h3 class=\"heading-element\" dir=\"auto\" tabindex=\"-1\">Add the embedding service to the kernel<\/h3>\n<\/div>\n<div class=\"highlight highlight-source-python notranslate position-relative overflow-auto\" dir=\"auto\">\n<pre class=\"prettyprint language-py\"><code class=\"language-py\">kernel.add_service(embedding_service)<\/code><\/pre>\n<\/div>\n<div class=\"markdown-heading\" dir=\"auto\">\n<h3 class=\"heading-element\" dir=\"auto\" tabindex=\"-1\">Generate embeddings for text<\/h3>\n<pre 
class=\"prettyprint language-py\"><code class=\"language-py\">texts = [\"Hello, world!\", \"Semantic Kernel is awesome\"] \r\nembeddings = await kernel.get_service(\"nvidia-embeddings\").generate_embeddings(texts)<\/code><\/pre>\n<\/div>\n<div class=\"highlight highlight-source-python notranslate position-relative overflow-auto\" dir=\"auto\">\n<p>Now that you have generated the embeddings, you can seamlessly integrate them into your application to enhance its AI capabilities.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Today, we&#8217;re excited to announce our collaboration with NVIDIA. In Azure AI Foundry, we&#8217;ve integrated NVIDIA NIM microservices and the NVIDIA AgentIQ toolkit into Azure AI Foundry\u2014unlocking unprecedented efficiency, performance, and cost optimization for your AI projects. Read more on the announcement here. Optimizing performance with NVIDIA AgentIQ and Semantic Kernel Once your NVIDIA NIM [&hellip;]<\/p>\n","protected":false},"author":149071,"featured_media":4519,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[47,17],"tags":[48,82,63,131,132,9],"class_list":["post-4513","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-announcement","category-announcements","tag-ai","tag-announcement","tag-microsoft-semantic-kernel","tag-nvidia","tag-nvidia-agentiq","tag-semantic-kernel"],"acf":[],"blog_post_summary":"<p>Today, we&#8217;re excited to announce our collaboration with NVIDIA. In Azure AI Foundry, we&#8217;ve integrated NVIDIA NIM microservices and the NVIDIA AgentIQ toolkit into Azure AI Foundry\u2014unlocking unprecedented efficiency, performance, and cost optimization for your AI projects. Read more on the announcement here. 
Optimizing performance with NVIDIA AgentIQ and Semantic Kernel Once your NVIDIA NIM [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/posts\/4513","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/users\/149071"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/comments?post=4513"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/posts\/4513\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/media\/4519"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/media?parent=4513"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/categories?post=4513"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/agent-framework\/wp-json\/wp\/v2\/tags?post=4513"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}