Hybrid model orchestration is a technique that lets AI applications select and switch between multiple models based on various criteria, all while remaining transparent to the calling code. Models can be chosen by factors such as the prompt’s input token size relative to each model’s min/max token capacity, or by data sensitivity, where sensitive inference runs against local models and everything else against cloud models. The orchestrator can then return the fastest response, the most relevant response, or the first available model’s response. The technique also provides a robust fallback mechanism: if one model fails, another can seamlessly take over. In this blog post, we will explore the fallback mechanism, which is just one implementation of the technique, and demonstrate its application through a practical example.
Benefits of Hybrid Model Orchestration
- Enhanced Flexibility: The application can dynamically choose the best model based on the current context or requirements.
- Seamless Integration: The consumer code does not need to be aware that it is interacting with an orchestrator. This transparency simplifies integration and reduces complexity for developers.
- Enhanced Reliability: By having multiple models available, the application can continue to function smoothly even if one model fails, ensuring continuous operation.
Example Implementation
Let’s look at an example implementation of hybrid model orchestration. The following code demonstrates how to use a FallbackChatClient to perform chat completion, falling back to an available model when the primary model is unavailable.
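using System;
using System.ClientModel;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.AI;
using Microsoft.SemanticKernel;

// Excerpt from a sample class: CreateUnavailableOpenAIChatClient, CreateAzureOpenAIChatClient,
// and Output are assumed to be helpers defined elsewhere in the sample.
public async Task FallbackToAvailableModelAsync()
{
    // Create a chat client that always fails with a 503 Service Unavailable HTTP status code
    IChatClient unavailableChatClient = CreateUnavailableOpenAIChatClient();

    // Create an available cloud chat client
    IChatClient availableChatClient = CreateAzureOpenAIChatClient();

    // Create a fallback chat client that falls back to the available chat client
    // when the request to the unavailable chat client fails
    IChatClient fallbackChatClient = new FallbackChatClient([unavailableChatClient, availableChatClient]);

    // Expose GetWeather to the model as a tool
    ChatOptions chatOptions = new() { Tools = [AIFunctionFactory.Create(GetWeather)] };

    var result = await fallbackChatClient.GetResponseAsync("Do I need an umbrella?", chatOptions);

    Output.WriteLine(result);

    [Description("Gets the weather")]
    string GetWeather() => "It's sunny";
}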
internal sealed class FallbackChatClient : IChatClient
{
    private readonly IList<IChatClient> _chatClients;

    // Server-side failures that trigger a fallback by default
    private static readonly List<HttpStatusCode> s_defaultFallbackStatusCodes = new()
    {
        HttpStatusCode.InternalServerError,
        HttpStatusCode.NotImplemented,
        HttpStatusCode.BadGateway,
        HttpStatusCode.ServiceUnavailable,
        HttpStatusCode.GatewayTimeout
    };

    public FallbackChatClient(IList<IChatClient> chatClients)
    {
        this._chatClients = chatClients?.Any() == true ? chatClients : throw new ArgumentException("At least one chat client must be provided.", nameof(chatClients));
    }

    // Optional override for the status codes that trigger a fallback
    public List<HttpStatusCode>? FallbackStatusCodes { get; set; }

    public async Task<ChatResponse> GetResponseAsync(IList<ChatMessage> chatMessages, ChatOptions? options = null, CancellationToken cancellationToken = default)
    {
        for (int i = 0; i < this._chatClients.Count; i++)
        {
            var chatClient = this._chatClients[i];

            try
            {
                return await chatClient.GetResponseAsync(chatMessages, options, cancellationToken).ConfigureAwait(false);
            }
            catch (Exception ex)
            {
                // Move on only for a fallback status code and only if another client remains
                if (this.ShouldFallbackToNextClient(ex, i, this._chatClients.Count))
                {
                    continue;
                }

                throw;
            }
        }

        // Reached only if the list was emptied after construction; otherwise the loop returns or rethrows
        throw new InvalidOperationException("None of the chat clients could complete the inference.");
    }

    public async IAsyncEnumerable<ChatResponseUpdate> GetStreamingResponseAsync(IList<ChatMessage> chatMessages, ChatOptions? options = null, [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        // Similar functionality to GetResponseAsync, but for streaming
    }

    private bool ShouldFallbackToNextClient(Exception ex, int clientIndex, int numberOfClients)
    {
        // The last client has nothing to fall back to, so its exception is rethrown
        if (clientIndex == numberOfClients - 1)
        {
            return false;
        }

        // Extract the HTTP status code from the exception types the underlying clients throw
        HttpStatusCode? statusCode = ex switch
        {
            HttpOperationException operationException => operationException.StatusCode,
            HttpRequestException httpRequestException => httpRequestException.StatusCode,
            ClientResultException clientResultException => (HttpStatusCode?)clientResultException.Status,
            _ => throw new InvalidOperationException($"Unsupported exception type: {ex.GetType()}."),
        };

        if (statusCode is null)
        {
            throw new InvalidOperationException("The exception does not contain an HTTP status code.");
        }

        return (this.FallbackStatusCodes ?? s_defaultFallbackStatusCodes).Contains(statusCode.Value);
    }
}
For a full implementation, refer to the sample provided by Microsoft Semantic Kernel on GitHub.
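The streaming method is left as a comment in the listing above. As a rough idea of what fallback could look like for streaming, here is a minimal, self-contained sketch. It is not the sample's implementation: IStreamingClient, FallbackStreamingClient, and DemoClient are hypothetical stand-ins that stream plain strings instead of ChatResponseUpdate, and only HttpRequestException is inspected. The sketch assumes fallback is attempted only before the first update arrives, since switching clients mid-stream would mix two partial answers.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the streaming half of IChatClient; updates are plain strings here.
public interface IStreamingClient
{
    IAsyncEnumerable<string> StreamAsync(string prompt, CancellationToken cancellationToken = default);
}

public sealed class FallbackStreamingClient : IStreamingClient
{
    private readonly IReadOnlyList<IStreamingClient> _clients;

    public FallbackStreamingClient(IReadOnlyList<IStreamingClient> clients) =>
        _clients = clients is { Count: > 0 }
            ? clients
            : throw new ArgumentException("At least one client must be provided.", nameof(clients));

    public async IAsyncEnumerable<string> StreamAsync(
        string prompt, [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        for (int i = 0; i < _clients.Count; i++)
        {
            bool isLast = i == _clients.Count - 1;
            await using IAsyncEnumerator<string> updates =
                _clients[i].StreamAsync(prompt, cancellationToken).GetAsyncEnumerator(cancellationToken);

            bool hasUpdate;
            try
            {
                // Only the first read is guarded: once an update has been yielded,
                // switching clients would hand the caller a mixed stream.
                hasUpdate = await updates.MoveNextAsync();
            }
            catch (HttpRequestException ex) when (!isLast && IsFallbackStatus(ex.StatusCode))
            {
                continue; // try the next client
            }

            // 'yield return' is not allowed inside a try block that has a catch clause,
            // so the updates are yielded out here.
            while (hasUpdate)
            {
                yield return updates.Current;
                hasUpdate = await updates.MoveNextAsync();
            }

            yield break;
        }
    }

    // Same five status codes as the default list in the post.
    private static bool IsFallbackStatus(HttpStatusCode? status) =>
        status is HttpStatusCode.InternalServerError or HttpStatusCode.NotImplemented
            or HttpStatusCode.BadGateway or HttpStatusCode.ServiceUnavailable
            or HttpStatusCode.GatewayTimeout;
}

// Demo client: streams the reply word by word, or fails with 503 when the reply is null.
public sealed class DemoClient : IStreamingClient
{
    private readonly string? _reply;

    public DemoClient(string? reply) => _reply = reply;

    public async IAsyncEnumerable<string> StreamAsync(
        string prompt, [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        await Task.Yield();
        if (_reply is null)
        {
            throw new HttpRequestException("Service unavailable", null, HttpStatusCode.ServiceUnavailable);
        }

        foreach (string word in _reply.Split(' '))
        {
            yield return word;
        }
    }
}
```

Wrapping a failing DemoClient(null) and a working DemoClient("It's sunny") in a FallbackStreamingClient and iterating with await foreach should stream the second client's words. If the only client fails, its HttpRequestException reaches the caller unchanged.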
Explanation
In this example, the FallbackChatClient class is designed to handle multiple chat clients and switch between them based on their availability. The FallbackToAvailableModelAsync method demonstrates how to use this class to perform chat completion with fallback support.
- Creating Chat Clients:
  - The code initializes an unavailable chat client that simulates a failure by returning a 503 Service Unavailable HTTP status code.
  - It also creates an available chat client that represents a functional service.
- Fallback Mechanism:
  - The FallbackChatClient is set up with both the unavailable and available chat clients.
  - This client will first attempt to use the unavailable client, and if it fails, it will automatically switch to the available client.
- Setting Chat Options:
  - Chat options are defined, which include tools like the GetWeather function created using the AIFunctionFactory.
  - This function is a simple method that returns a string indicating sunny weather.
- Performing Completion:
  - The GetResponseAsync method of the FallbackChatClient is used to perform chat completion.
  - It attempts to get a response to the question “Do I need an umbrella?” utilizing the defined chat options.
  - The result of this operation is then printed to the output.
- Handling Exceptions:
  - Within the FallbackChatClient, the GetResponseAsync method iterates through the list of chat clients and handles exceptions.
  - If a request fails with one of a certain set of HTTP status codes (like 503 Service Unavailable), it falls back to the next client in the list.
  - If a request fails with any other status code, or if the last client in the list fails, the exception is rethrown to the caller.
- Customization and Streaming:
  - The FallbackChatClient also supports customization of fallback status codes via the FallbackStatusCodes property.
  - Additionally, it provides a method for handling streaming responses, similar to the primary completion method.
- Decorator Pattern:
  - The FallbackChatClient implements the same interface as the chat clients it wraps, making it a decorator.
  - This design pattern allows it to add functionality (in this case, fallback logic) without changing either the caller code or the underlying chat client implementations.
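The status-code decision from the Handling Exceptions and Customization steps can also be isolated into a small function that is easy to test on its own. The sketch below is a variation rather than the sample's code: FallbackPolicy and its members are hypothetical names, only HttpRequestException is inspected, and unrecognized exceptions yield false so that the caller rethrows the original error instead of wrapping it.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;

// Hypothetical helper; not part of Semantic Kernel or Microsoft.Extensions.AI.
public static class FallbackPolicy
{
    // Mirrors the default list in the post: 500, 501, 502, 503, 504.
    public static readonly IReadOnlySet<HttpStatusCode> DefaultStatusCodes = new HashSet<HttpStatusCode>
    {
        HttpStatusCode.InternalServerError,
        HttpStatusCode.NotImplemented,
        HttpStatusCode.BadGateway,
        HttpStatusCode.ServiceUnavailable,
        HttpStatusCode.GatewayTimeout,
    };

    // Returns true when the failure should be retried against the next client.
    public static bool ShouldFallback(
        Exception ex,
        int clientIndex,
        int clientCount,
        IReadOnlySet<HttpStatusCode>? statusCodes = null)
    {
        // The last client has nothing to fall back to.
        if (clientIndex >= clientCount - 1)
        {
            return false;
        }

        // Assumption: only HttpRequestException carries a status code in this sketch.
        HttpStatusCode? statusCode = ex is HttpRequestException http ? http.StatusCode : null;

        // Unknown exception types or missing status codes are not retried.
        return statusCode is not null
            && (statusCodes ?? DefaultStatusCodes).Contains(statusCode.Value);
    }
}
```

Passing a custom set, for example one containing HttpStatusCode.TooManyRequests, plays the same role as setting FallbackStatusCodes on the FallbackChatClient: the caller decides which failures are worth retrying against the next model.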
Potential Scenarios
Hybrid model orchestration is useful beyond fallback. When selecting by token size, the orchestrator compares the prompt’s input token size with each model’s minimum and maximum token capacity and routes the request to a model that can handle it. When selecting by data sensitivity, it sends sensitive prompts to local models and the rest to cloud models. In both cases, when more than one model qualifies, it can return the fastest model’s response, the most relevant response, or the first available model’s response.
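To make these two scenarios more concrete, here is a minimal sketch of a selection step that filters candidate models by data sensitivity and token capacity. Everything in it is an assumption made for illustration: ModelInfo and ModelRouter are hypothetical names, the token count is a crude estimate of about four characters per token rather than a real tokenizer, and picking the smallest model that fits is just one possible tie-breaker.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical description of a model endpoint; all names here are illustrative.
public sealed record ModelInfo(string Name, int MinInputTokens, int MaxInputTokens, bool IsLocal);

public static class ModelRouter
{
    // Crude estimate (about four characters per token); a real tokenizer would replace this.
    public static int EstimateTokens(string prompt) => (prompt.Length + 3) / 4;

    public static ModelInfo Select(IEnumerable<ModelInfo> models, string prompt, bool isSensitive)
    {
        int tokens = EstimateTokens(prompt);

        ModelInfo? choice = models
            // Sensitive prompts may only go to local models.
            .Where(m => !isSensitive || m.IsLocal)
            // The prompt must fit within the model's token capacity.
            .Where(m => tokens >= m.MinInputTokens && tokens <= m.MaxInputTokens)
            // Tie-breaker for this sketch: the smallest model that fits.
            .OrderBy(m => m.MaxInputTokens)
            .FirstOrDefault();

        return choice ?? throw new InvalidOperationException("No model satisfies the routing constraints.");
    }
}
```

A decorator in the style of FallbackChatClient could call a selection step like this before forwarding the request, so the calling code would still see a single chat client.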
Conclusion
Hybrid model orchestration is a powerful technique for enhancing the flexibility, integration, and reliability of AI applications. By dynamically selecting the best model based on context, seamlessly integrating with consumer code, and providing a robust fallback mechanism, applications can ensure continuous and efficient operation. The example provided demonstrates how to implement hybrid model orchestration in a C# application, highlighting its practical benefits and ease of use. This approach not only improves the resilience of AI systems but also simplifies the development process, making it an invaluable tool for developers.