What to expect from v1 and beyond for Semantic Kernel.

Matthew Bolanos

Semantic Kernel v1.0 has shipped, and the contents of this blog entry are now out of date.


In a previous article, we announced the beta launch of Semantic Kernel v1. In that article, we shared the initial breaking changes we made for v1: 1) renaming skills to plugins, 2) making Semantic Kernel AI-service agnostic while retaining first-class support for Azure OpenAI and OpenAI, and 3) consolidating our implementation of planners.

These are by no means the only changes we have planned for v1, though. Several other changes are necessary before we can confidently say that we have a simple-to-use API that can provide a reliable foundation for current and future applications.

As we make changes, we are following an obsolescence strategy with guidance on how to move to the new API. This doesn’t work well for all scenarios, however. We’re learning that the community has been using Semantic Kernel in novel and exciting ways, so we need your help to let us know if our v1 proposal accidentally breaks any existing scenarios.

Because of this, we’re excited to share our overall proposal for the v1 interface so we can begin to collect feedback from you, our community. In this blog post, we’ll share the changes we’re making and provide samples that demonstrate what it will be like to build an AI-powered app with Semantic Kernel in the future. Naturally, both the proposal and the samples will change as we collect feedback from you.

To share everything we want to cover, the following blog post is broken into three sections:

  1. The changes we’re planning to make (and why).
  2. Where to find samples using the proposed v1 API.
  3. And most importantly, how you can give us feedback.

As you read through our proposal, remember that it’s just that: a proposal. It can and likely will change, but we want to share it with you now so you can let us know if we’re doing anything that will negatively impact your scenarios.

Also note that this list is long. Not everything here will land in v1.0.0, but at the very least, we’ll try to set up our interfaces so we can support these features as non-breaking changes in the future.

The proposed changes coming to v1.

The Semantic Kernel team had four goals for our v1 release of the SDK:

  1. Simplify the core of Semantic Kernel.
  2. Expose the full power of LLMs through semantic functions.
  3. Improve the effectiveness of Semantic Kernel planners.
  4. Provide a compelling reason to use the kernel.

In the sections below, we describe the community challenges we wanted to address for each goal and how we propose to fix them.

01. Simplifying the core of Semantic Kernel.

As Semantic Kernel has matured, it has become increasingly complex. This has caused confusion for new and existing users alike. Much of this stems from the many concepts we’ve added to Semantic Kernel. What’s an SKContext? How is it different from ContextVariables? When should I use a function, a plugin, or a memory connector?

These concepts make getting started difficult and often artificially constrain the power of Semantic Kernel. For example, today’s ContextVariables can only hold strings, whereas a Dictionary<string, object> would be simpler to understand and would allow developers to use any data type they desire.

01.01. ContextVariables will become a Dictionary<string, object> – You will no longer be limited to storing variables as strings. With native object support in the kernel, you’ll be able to pass complex objects into and out of any of your functions.

For example, when calling Kernel.RunAsync(), you will be able to pass in an arbitrary dictionary with complex objects (like all your chat messages).

// Start the chat
ChatHistory chatHistory = gpt35Turbo.CreateNewChat();
while(true)
{
    Console.Write("User > ");
    chatHistory.AddUserMessage(Console.ReadLine()!);

    // Run the simple chat
    var result = await kernel.RunAsync(
        chatFunction,
        variables: new() {{ "messages", chatHistory }}
    );

    Console.WriteLine("Assistant > " + result);
    chatHistory.AddAssistantMessage(result.GetValue<string>()!);
}

Elsewhere, you can define native functions that consume and return complex objects. The following example shows how you could return an array of search results instead of a string representation of them.

[SKFunction, Description("Searches Bing for the given query")]
public async Task<List<string>> SearchAsync(
    [Description("The search query"), SKName("query")] string query
)
{
    var results = await this._bingConnector.SearchAsync(query, 10);

    return results.ToList();
}

01.02. SKContext will be replaced with IKernel and the variables dictionary – Most of the information available in SKContext can also be found in IKernel. To simplify the API (and to give developers more power), we will provide an entire IKernel instance along with a variables dictionary wherever SKContext is used today.

For example, invoking a function will now look like the following.

var results = await function.InvokeAsync(kernel, variables);

With the kernel instance, you can then access all of the available AI services and functions from within your own function.
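For example, a native function could accept the kernel as a parameter and use it to invoke other registered functions. The following is a rough sketch of the idea; the kernel.Plugins lookup shown here is an assumption, not a confirmed API.

// Sketch: a native function that receives the kernel at invocation time.
// The kernel.Plugins["Writer"]["Condense"] lookup is hypothetical.
public class SummarizePlugin
{
    [SKFunction, Description("Summarizes the provided text")]
    public async Task<string> SummarizeAsync(
        IKernel kernel,
        [Description("The text to summarize")] string input)
    {
        // Reuse another function that was registered on the same kernel
        ISKFunction condense = kernel.Plugins["Writer"]["Condense"];
        var result = await kernel.RunAsync(condense, variables: new() {{ "input", input }});
        return result.GetValue<string>()!;
    }
}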

01.03. Memory will be modeled like any other plugin – We’ve gotten feedback that the existing memory abstractions are too limiting because they don’t offer the full power of their underlying services. Meanwhile, Semantic Kernel has taken a big bet on plugins, which allow developers to expose any arbitrary API to LLMs.

This means we will be removing the Memory property of IKernel and working with the contributors of the existing memory connectors to turn them into plugins so they can unleash the full power of their services.

At a minimum, turning an existing memory service into a plugin requires just two functions: SaveMemory() and RecallMemory(). You can then import these into the kernel like any other plugin.
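A minimal sketch of such a plugin might look like the following; IVectorStore is a hypothetical stand-in for a real memory service client (e.g., a vector database SDK), and the two methods correspond to the SaveMemory and RecallMemory functions described above.

// Sketch of a memory plugin; IVectorStore is a placeholder for the
// underlying service's own client, not a Semantic Kernel type.
public class MemoryPlugin
{
    private readonly IVectorStore _store;

    public MemoryPlugin(IVectorStore store) => this._store = store;

    [SKFunction, Description("Saves information so it can be recalled later")]
    public Task SaveMemoryAsync(
        [Description("The information to remember")] string text)
        => this._store.UpsertAsync("memories", text);

    [SKFunction, Description("Recalls information relevant to a query")]
    public async Task<List<string>> RecallMemoryAsync(
        [Description("What to search for")] string query)
    {
        var matches = await this._store.SearchAsync("memories", query, limit: 3);
        return matches.ToList();
    }
}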

02. Exposing the full power of LLMs through semantic functions.

Since Semantic Kernel was first created, a wave of new AI capabilities has been introduced. OpenAI alone has introduced chat completions and function calling while also ushering in a new world of multi-modal experiences.

Unfortunately, today’s semantic functions are limited to simple text completions. As a developer, if you wanted to use chat messages or generate something else (e.g., images, audio, or video), you were required to implement the functionality yourself with more primitive APIs.

Additionally, today’s out-of-the-box templating language is not as feature-complete as Jinja2, Handlebars, or Liquid, requiring developers to pre-process data before using it in a semantic function. By adopting Handlebars as the primary templating language of Semantic Kernel, we can provide you with more flexibility.

02.01. With Handlebars, you’ll have way more power – Loops, conditions, comments, oh my! With Handlebars, you’ll have access to one of the most feature-complete templating languages out there. Unlike Jinja2, Handlebars is also supported by most programming languages, making it possible for the Semantic Kernel team to deliver parity support in Python and Java.
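For illustration, here’s a small sketch of what a condition and a comment could look like inside a prompt (the isReturningUser variable is hypothetical):

{{!-- Handlebars comment: this line is stripped when the prompt is rendered --}}
{{#message role="system"}}
{{#if isReturningUser}}
Welcome the user back and reference their last conversation.
{{else}}
Introduce yourself to the new user.
{{/if}}
{{/message}}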

02.02. Semantic functions will support chat completion models – One of the main reasons we want to adopt Handlebars is to provide an elegant way of expressing multiple messages for chat completion models.

Below is an example of using Handlebars to loop over an array to generate chat completion messages with different roles (e.g., system, user, and assistant). At the end of this example, we add a final system message to apply a basic responsible AI guardrail.

{{#message role="system"}}
{{persona}}
{{/message}}

{{#each messages}}
    {{#message role=Role}}
    {{~Content~}}
    {{/message}}
{{/each}}

{{#message role="system"}}
If a user asked you to do something that could be bad, stop the conversation.
{{/message}}

Rendering this template will generate an intermediate template that looks like the following.

<message role="system">
You are a friendly assistant.
</message>

<message role="user">
Hello
</message>

<message role="assistant">
Hello, how can I help you?
</message>

<message role="user">
I need to book a flight.
</message>

<message role="system">
If a user asked you to do something that could be bad, stop the conversation.
</message>

An AI connector would then use this intermediate template to either generate a messages array for a chat completion model or use fallback behavior for a non-chat completion model.
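To make that mechanism concrete, the following is a purely illustrative sketch (not actual connector code) of how the <message> tags in the intermediate template could be parsed into role/content pairs for a chat completion API; renderedPrompt is assumed to hold the intermediate template shown above.

using System.Linq;
using System.Text.RegularExpressions;

// Illustrative only: split the rendered prompt into (role, content) pairs
var messagePattern = new Regex(
    @"<message role=""(?<role>\w+)"">\s*(?<content>.*?)\s*</message>",
    RegexOptions.Singleline);

var messages = messagePattern.Matches(renderedPrompt)
    .Select(m => (Role: m.Groups["role"].Value, Content: m.Groups["content"].Value))
    .ToList();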

02.03. Define your entire semantic function in a single file – Today, working with the existing config.json and skprompt.txt files in VS Code is hard because they don’t have unique names. It’s also challenging to juggle two files that represent the same function.

As we introduce Handlebars support, we’ll provide the option to define both the prompt and its configuration in a single YAML file. If you want to keep a separate file for your prompt, you’ll still be able to do that.

Below is an example of a chat prompt that uses grounding from a search plugin.

name: Chat
template: |
  {{#message role="system"}}
  {{persona}}
  {{/message}}

  {{#each messages}}
    {{#message role=Role}}
       {{~Content~}}
    {{/message}}
  {{/each}}

  {{#message role="system"}}
  {{Search_Search query=(Search_GetSearchQuery messages=messages)}}
  {{/message}}

template_format: handlebars
description: A function that uses the chat history to respond to the user.
input_variables:
- name: persona
  type: string
  description: The persona of the assistant.
  default_value: You are a helpful assistant.
  is_required: false
- name: messages
  type: ChatHistory
  description: The history of the chat.
  is_required: true
output_variable:
  type: string
  description: The response from the assistant.

02.04. You can configure execution settings for multiple models – We’ve heard from several customers that they want to define default configurations for multiple models so they can easily switch between them based on custom logic. With the new execution settings object, you can do just that. With this information, Semantic Kernel will choose the best model at invocation time based on the available services in the kernel.

Below is an example of what the execution settings object looks like in the new prompt YAML file. Here, we define different temperatures for gpt-4 and gpt-3.5-turbo. Because gpt-4 is listed first, Semantic Kernel will try to use it first if it’s available in the kernel.

request:
- model_id: gpt-4
  temperature: 1.0
- model_id: gpt-3.5-turbo
  temperature: 0.7

02.05. In the future, we’ll also support function calling and other modalities from within semantic functions – We know that developers will want to use semantic functions to send and receive other message types, such as function calls, images, videos, and audio. We’ve designed the prompt template syntax so it can support these features in the future.

For example, in the future you could include a message with a video inside of it so a model could describe it to the user.

{{#message role="system"}}
You are a helpful assistant that describes videos.
{{/message}}

{{#message role="user"}}
Can you describe this video for me?
{{/message}}

{{#message role="user"}}
{{video title="Video title" description="Video description" url="https://www.example.com/video.mp4"}}
{{/message}}

03. Improving the effectiveness of Semantic Kernel planners.

Today’s planners (Action, Sequential, and Stepwise) only have access to a limited set of information about functions (i.e., name, description, and input parameters). Based on research performed by the Semantic Kernel team, planners can perform much better if they’re also given the expected output, examples (both good and bad), and descriptions of complex types.

As part of this effort, we’d also like to incorporate other learnings from Microsoft research into both new and existing planners.

03.01 There will be additional ways to semantically describe functions – With new attributes, you’ll be able to describe the output of a function as well as provide good and bad examples.

[SKFunction]
[Description("Adds two numbers.")]
[SKOutputDescription("The summation of the numbers.")]
[SKGoodSample(
    inputs: "{\"number1\":1, \"number2\":2}",
    output:"3"
)]
[SKBadSample(
    inputs: "{\"number1\":\"one\", \"number2\":\"two\"}",
    error: "The value \"one\" is not a valid number."
)]
public static double Add(
    [Description("The first number to add")] double number1,
    [Description("The second number to add")] double number2
)
{
    return number1 + number2;
}

We will also use reflection to get the structure of any complex types you may use. If you already use System.Text.Json attributes, we’ll use those to better describe the objects to planners.
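For example, a planner could receive the full shape of a complex parameter type like the hypothetical FlightRequest below, including the JSON property names and descriptions that reflection would surface:

using System;
using System.ComponentModel;
using System.Text.Json.Serialization;

// Hypothetical complex type; its structure and attributes would be
// surfaced to planners via reflection.
public class FlightRequest
{
    [JsonPropertyName("origin")]
    [Description("The IATA code of the departure airport")]
    public string Origin { get; set; } = string.Empty;

    [JsonPropertyName("departure_date")]
    [Description("The departure date in ISO 8601 format")]
    public DateTime DepartureDate { get; set; }
}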

03.02 OpenAI functions will help power our planners for increased accuracy – Most of the planners in Semantic Kernel were built before OpenAI introduced function calling. We believe that leveraging function calling in our existing planners will help them yield better results.

03.03 Other research will be baked into our planners – Within Microsoft, we have other initiatives identifying the best strategies to create planners that are fast, reliable, and cheap (i.e., use fewer tokens on cheaper models). The results of this research will also be included in our planners.

04. Providing a compelling reason to use the kernel.

Lastly, we wanted to make sure the namesake of Semantic Kernel, IKernel, actually aided the developer experience instead of detracting from it. Today, creating and managing a kernel is too onerous, so many users opt to simply invoke functions without the kernel.

With the changes below, we believe we can both increase the value of the kernel and make it easier to use.

04.01. Use a function with multiple kernels – Today, semantic functions are tied 1-to-1 with a kernel. This means that whenever you create a new kernel, you need to reinitialize all of your functions as well. As part of v1, we will break this relationship, allowing you to instantiate your functions once as singletons and import them into multiple kernels.

Not only will this create cleaner code, but it will also make your applications more performant because fewer resources will need to be recreated during kernel instantiation.
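As a sketch (reusing the constructor and helper shapes shown later in this post; the file name is hypothetical), the same function instance could be shared across two kernels like this:

// Create the function once...
ISKFunction chatFunction = SemanticFunction.GetFunctionFromYaml(
    currentDirectory + "/Plugins/ChatPlugin/SimpleChat.prompt.yaml");

// ...and use it with two differently configured kernels
IKernel kernelA = new Kernel(aiServices: new () { gpt35Turbo });
IKernel kernelB = new Kernel(aiServices: new () { gpt4 });

var resultA = await kernelA.RunAsync(chatFunction, variables: new() {{ "messages", chatHistory }});
var resultB = await kernelB.RunAsync(chatFunction, variables: new() {{ "messages", chatHistory }});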

04.02. Introducing plugins to the kernel – Today, the kernel only has a collection of functions. This means the kernel is unable to store information at the plugin level (e.g., plugin description). This is helpful contextual information that can be used by planners.

To create a plugin, you’ll just need to provide its name and a list of its functions. You can optionally provide other information like the plugin description, a logo, and “learn more” URLs.

// Create math plugin with both semantic and native functions
List<ISKFunction> mathFunctions = NativeFunction.GetFunctionsFromObject(new Math());
mathFunctions.Add(SemanticFunction.GetFunctionFromYaml(currentDirectory + "/Plugins/MathPlugin/GenerateMathProblem.prompt.yaml"));

Plugin mathPlugin = new(
    "Math",
    functions: mathFunctions
);

Afterwards, you can add the plugin to the kernel using the new kernel constructor (next section).

04.03. Simplifying the creation of a kernel – Most users today use the KernelBuilder to create new kernels, but this often requires a lot of code and makes it difficult to use dependency injection. For v1, the primary way of creating a kernel will be through the kernel constructor.

In the example below, we demonstrate just how easy it will be to pass in a list of AI services and plugins into a kernel.

// Create new kernel
IKernel kernel = new Kernel(
    aiServices: new () { gpt35Turbo },
    plugins: new () { intentPlugin, mathPlugin }
);

04.04. Stream functions from the kernel – Perhaps the main reason customers cannot use RunAsync() on the kernel today is the lack of streaming support. This will be available with v1.

var result = await kernel.RunAsync(
    chatFunction,
    variables: new() {{ "messages", chatHistory.TakeLast(20) }},
    streaming: true
);
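The exact shape of the streaming API hasn’t been finalized; as a purely hypothetical sketch, consuming a streamed result might look something like this (GetStreamingValue is an assumed method name, not a confirmed API):

// Hypothetical: iterate over tokens as they arrive
await foreach (string token in result.GetStreamingValue<string>())
{
    Console.Write(token);
}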

04.05. Evaluate your AI by running the same scenario across different kernels – Stacked together, these changes make it possible for you to easily set up multiple kernels with different configurations. When used with a product like Prompt flow, this allows you to pick the best setup by running batch evaluations and A/B tests against different kernels.

For even more control, we will also allow users to manually override request settings when instantiating the kernel and when using the RunAsync() and InvokeAsync() methods.

// Create a new kernel with overrides
IKernel kernel = new Kernel(
    aiServices: new () { gpt35Turbo, gpt4 },
    requestSettings: new () {
        {"SimpleChat", new () { ModelId = "gpt-4" }}
    }
);
// Send overrides with the RunAsync() method
var result = await kernel.RunAsync(
    chatFunction,
    variables: new() {{ "messages", chatHistory }},
    requestSettings: new () {
        {"SimpleChat", new () { ModelId = "gpt-3.5-turbo" }}
    }
);

Get a sneak peek at v1 of the SDK

To validate our design decisions, the Semantic Kernel team has created a repo with samples demonstrating what coding with v1 will look like. You can find it by navigating to the sk-v1-proposal repo on GitHub and going to the /dotnet/samples folder.

We currently have four scenarios that capture the most common apps built by customers:

  • Simple chat
  • Persona chat (i.e., with meta prompt)
  • Simple RAG (i.e., with grounding)
  • Dynamic RAG (i.e., with planner-based grounding)

To get the samples to work, several temporary “hack” extensions were built in the dotnet/src/extensions folder. The goal for v1 is to get Semantic Kernel to the point where no extensions are required to run the samples in the /dotnet/samples folder. The way the extensions are written is not indicative of how they will be written for v1.

We will also add samples in the Python and Java flavors of Semantic Kernel to get additional feedback on those languages for v1.

Tell us what you think!

We’re sharing the proposal for v1 now so we can course correct if necessary. This content will also be used by the contributors of the Python and Java flavors of Semantic Kernel as they go on a similar v1 journey.

To centralize feedback on our v1 proposal, please connect with us on our discussion board on GitHub. There, we’ve created a dedicated discussion where you can provide us with feedback.

1 comment


  • Sławek Rosiek

    What about documentation? Right now the documentation is quite limited. Below are just some of the topics that I would like to see explained:
    * What’s the difference between text completion and chat completion?
    * How to implement RAG – using Azure Cognitive Search or alternative vector databases. Also include OpenAI with your data.
    * How to utilize ChatHistory
    * What are the differences between the different planners? How and when to use them?

    On GitHub there are more than 70 samples. It would be great if they were more than just samples and were explained in the documentation. Right now it’s rather hard to browse and understand them.
