March 6th, 2025

Talk to your agents! Introducing the Realtime APIs in Semantic Kernel

Eduard van Valkenburg
Senior Software Engineer

Introducing Realtime Agents in Semantic Kernel for Python

With release 1.23.0 of the Python version of Semantic Kernel, we are introducing a new set of clients for interacting with the realtime multi-modal APIs of OpenAI and Azure OpenAI. They provide an abstracted approach to connecting to those services, adding your tools, and running apps that leverage these powerful agents.

Experimental

These connectors are experimental while we learn what is needed to support these kinds of models from different providers. The underlying APIs are also in preview, so breaking changes may come from the services as well.

The key addition that Semantic Kernel brings when you connect to these models is that it makes function calling very easy. Create a Kernel and add your plugins as you are used to doing with Semantic Kernel; you can even pass in just your plugins and we create the kernel for you. Next, add the FunctionChoiceBehavior class to the settings and pass both to the Realtime Client, which handles serializing the function definitions to the API. When you use FunctionChoiceBehavior.Auto with auto_invoke turned on (the default), we execute the function, pass the result to the API, and ask it to create a response.
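To make the auto-invoke flow concrete, here is a minimal, library-free sketch of the sequence the client runs on your behalf. Every name in it (FunctionCall, FakeService, auto_invoke) is a hypothetical stand-in for illustration and not part of Semantic Kernel's API. It only shows the order of steps: look up the requested function, execute it, send the result back, and ask for a response.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FunctionCall:
    """Hypothetical stand-in for a function call requested by the model."""

    name: str
    arguments: str  # JSON-encoded arguments


@dataclass
class FakeService:
    """Hypothetical stand-in for the realtime service; records what it is sent."""

    sent: list[tuple[str, str]] = field(default_factory=list)

    def send(self, kind: str, payload: str = "") -> None:
        self.sent.append((kind, payload))


def get_weather(location: str) -> str:
    """Toy function that the model is allowed to call."""
    return f"The weather in {location} is sunny."


def auto_invoke(
    call: FunctionCall,
    functions: dict[str, Callable[..., str]],
    service: FakeService,
) -> None:
    """Execute the requested function, return its result, then request a response."""
    result = functions[call.name](**json.loads(call.arguments))
    service.send("function_result", result)
    service.send("response_create")


service = FakeService()
auto_invoke(
    FunctionCall("get_weather", '{"location": "Amsterdam"}'),
    {"get_weather": get_weather},
    service,
)
print(service.sent)
```

With the real clients you do not write this loop yourself; registering the functions and setting FunctionChoiceBehavior.Auto is enough.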

Another important thing that we have done with these clients is to abstract away the underlying protocols as much as possible, so that you can easily switch models and providers while maintaining the same codebase.

Get started

First you need to install Semantic Kernel with the realtime extra:

pip install semantic-kernel[realtime]

Next, create your functions and a Kernel, and add the functions to it. You can also wrap these functions in a class and pass that as one item in a list of plugins:


import logging
from datetime import datetime
from random import randint

from semantic_kernel import Kernel
from semantic_kernel.functions import kernel_function

logger = logging.getLogger(__name__)

@kernel_function
def get_weather(location: str) -> str:
    """Get the weather for a location."""
    weather_conditions = ("sunny", "hot", "cloudy", "raining", "freezing", "snowing")
    weather = weather_conditions[randint(0, len(weather_conditions) - 1)]  # nosec
    logger.info(f"@ Getting weather for {location}: {weather}")
    return f"The weather in {location} is {weather}."


@kernel_function
def get_date_time() -> str:
    """Get the current date and time."""
    logger.info("@ Getting current datetime")
    return f"The current date and time is {datetime.now().isoformat()}."


@kernel_function
def goodbye():
    """When the user is done, say goodbye and then call this function."""
    logger.info("@ Goodbye has been called")
    raise KeyboardInterrupt

kernel = Kernel()
kernel.add_functions(plugin_name="helpers", functions=[goodbye, get_weather, get_date_time])

Next, create a Realtime Client. There are currently three types of clients available: AzureRealtimeWebsocket, OpenAIRealtimeWebsocket and OpenAIRealtimeWebRTC (all available from the semantic_kernel.connectors.ai.open_ai namespace):

from semantic_kernel.connectors.ai import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import (
    AzureRealtimeExecutionSettings,
    AzureRealtimeWebsocket,
    TurnDetection,
)

realtime_agent = AzureRealtimeWebsocket()
settings = AzureRealtimeExecutionSettings(
    instructions="""
    You are a chat bot. Your name is Mosscap and
    you have one goal: figure out what people need.
    Your full name, should you need to know it, is
    Splendid Speckled Mosscap. You communicate
    effectively, but you tend to answer with long
    flowery prose.
    """,
    voice="alloy",
    turn_detection=TurnDetection(type="server_vad", create_response=True, silence_duration_ms=800, threshold=0.8),
    function_choice_behavior=FunctionChoiceBehavior.Auto(),
)

Then we can start the session. The settings, chat_history, and kernel or plugins can be added here, or they can be passed to the constructor above.

This then starts receiving events from the service. Those events are audio events (RealtimeAudioEvent) and text events (RealtimeTextEvent), both subclasses of RealtimeEvent, as well as events of the base RealtimeEvent type that denote other activities of the API, such as responses being created, items being added, and updates to the session itself:

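from semantic_kernel.contents import ChatHistory

chat_history = ChatHistory()  # start from an empty history, or pass in a prepared one

async with realtime_agent(
    settings=settings,
    chat_history=chat_history,
    kernel=kernel,
    create_response=True,
):
    async for event in realtime_agent.receive():
        ...  # event handling code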
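As a rough illustration of what the event handling code might look like, the sketch below dispatches on the event type inside an async loop. To keep it runnable without a service connection, it uses simplified stand-in classes that only borrow the names RealtimeEvent, RealtimeAudioEvent and RealtimeTextEvent, and a hypothetical fake_receive() generator in place of realtime_agent.receive(). The real Semantic Kernel events carry richer content objects, so treat the field names here as assumptions.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class RealtimeEvent:
    """Stand-in base event; service_type names the underlying service event."""

    service_type: str = ""


@dataclass
class RealtimeAudioEvent(RealtimeEvent):
    """Stand-in audio event carrying raw bytes (simplified)."""

    audio: bytes = b""


@dataclass
class RealtimeTextEvent(RealtimeEvent):
    """Stand-in text event carrying a plain string (simplified)."""

    text: str = ""


async def fake_receive() -> AsyncIterator[RealtimeEvent]:
    """Hypothetical replacement for realtime_agent.receive(): yields fixed events."""
    yield RealtimeEvent(service_type="session.updated")
    yield RealtimeTextEvent(text="Hello")
    yield RealtimeAudioEvent(audio=b"\x00\x01")


async def handle_events() -> list[str]:
    log: list[str] = []
    async for event in fake_receive():
        # Check the specific subclasses first, then fall back to the base type.
        if isinstance(event, RealtimeAudioEvent):
            log.append(f"audio: {len(event.audio)} bytes")  # e.g. hand to an audio player
        elif isinstance(event, RealtimeTextEvent):
            log.append(f"text: {event.text}")  # e.g. print the transcript
        else:
            log.append(f"other: {event.service_type}")  # e.g. log session updates
    return log


handled = asyncio.run(handle_events())
print(handled)
```

Checking the subclasses before the base type matters because every audio and text event is also a RealtimeEvent.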

At the same time, you can send events to the service: audio and text inputs, as well as updates to how the session runs, such as which functions are available.

For instance, if you want the service to create a response, you can do this:

await realtime_agent.send(RealtimeEvent(service_type=SendEvents.RESPONSE_CREATE))

Learn more

To learn more about these new features, see our documentation and samples. Finally, we have a more complete demo app that uses Azure Communication Services to allow you to have calls with your data and other tools.

Happy talking!
