The Microsoft.Extensions.AI.Evaluation library is designed to simplify the integration of AI evaluation into your applications. It provides a robust framework for evaluating your AI applications and automating the assessment of their performance.
In November, we announced the public preview of the library, and today, we are thrilled to announce that it is now open source in the dotnet/extensions repo. This repository contains a suite of libraries that provide facilities commonly needed when creating production-ready applications. By making this library available to everyone, we aim to empower developers to harness the power of AI more effectively in their projects.
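To give a sense of what an evaluation looks like in code, here is a minimal sketch using one of the built-in quality evaluators. The type and member names shown (CoherenceEvaluator, ChatConfiguration, EvaluateAsync, and the metric lookup) are taken from the preview packages and may differ slightly in the version you install; the chat client setup is left as a placeholder.

```csharp
// Minimal sketch, assuming the preview packages Microsoft.Extensions.AI.Evaluation and
// Microsoft.Extensions.AI.Evaluation.Quality; verify the exact API shapes against the
// version you install.
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;
using Microsoft.Extensions.AI.Evaluation.Quality;

// Wire up whichever IChatClient your application uses (e.g., an Azure OpenAI client).
IChatClient chatClient = CreateChatClient();

// The quality evaluators use an LLM to score responses; ChatConfiguration tells them which client to use.
var chatConfiguration = new ChatConfiguration(chatClient);

// Produce a response to evaluate.
var messages = new List<ChatMessage> { new(ChatRole.User, "Explain dependency injection in one paragraph.") };
ChatResponse response = await chatClient.GetResponseAsync(messages);

// Score the response for coherence and read back the numeric metric.
IEvaluator evaluator = new CoherenceEvaluator();
EvaluationResult result = await evaluator.EvaluateAsync(messages, response, chatConfiguration);
NumericMetric coherence = result.Get<NumericMetric>(CoherenceEvaluator.CoherenceMetricName);
Console.WriteLine($"Coherence: {coherence.Value}");

static IChatClient CreateChatClient() =>
    throw new NotImplementedException("Supply your model provider's IChatClient here.");
```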
New Samples for Using the Library
To help you get started with the Microsoft.Extensions.AI.Evaluations library, we have released a set of new samples. These samples showcase various use cases and demonstrate how to leverage the library’s capabilities effectively. Whether you are a seasoned developer or just beginning your AI journey, these samples will provide valuable insights and practical guidance. You can find the samples on our GitHub repository. We encourage you to explore them, experiment, and share your feedback with us. Your contributions and feedback are invaluable as we continue to enhance and expand the library’s features.
Introducing the Azure DevOps Plug-in
Looking to integrate your AI evaluations into your Azure DevOps pipeline? We are excited to announce the availability of a plug-in in the Azure DevOps Marketplace. This plug-in allows you to integrate AI evaluations seamlessly into your pipelines, enabling continuous assessment of your AI models as part of your CI/CD workflows.
With the AzDO plug-in, you can automate the evaluation process, ensuring that your applications meet the desired criteria before deployment. This integration enhances the reliability and efficiency of your AI solutions, helping you deliver high-quality applications with confidence.
Get Started Today
We invite you to explore the open-source Microsoft.Extensions.AI.Evaluation preview libraries, try out the new samples, and integrate the AzDO plug-in into your pipelines. We are excited to see how you will use these tools to innovate and create impactful AI solutions. Stay tuned for more updates and enhancements, and, as always, we welcome your feedback and contributions.
Hi! Thanks for the great post. The addition of more examples would've definitely helped a couple of months ago when we started working on this!
I see the report is a bit different now, and sadly it removed a section with some global stats, such as separate scenario and iteration pass/fail counts.
I have been hoping since the beginning that we would be able to expand those global stats and add custom ones too: for example, the average number of tokens used by our chatbot for each question during the evaluation, or the average number of citations, etc.
Is this something that is being considered?
Thanks!
Thanks, Pascal, that’s great feedback. We have a lot of plans for the reporting, including adding trend data and the ability to do more searching/filtering, etc. We are definitely open to more ideas about the best ways to report the data.
Here are a couple of issues we are tracking for the work. Feel free to comment on those issues, or to open a new one.
https://github.com/dotnet/extensions/issues/5934
https://github.com/dotnet/extensions/issues/5935
Thanks for the feedback, Pascal. We also have the following issues that seem related. As Peter mentioned, please feel free to comment or open new issues.
Adding custom global metadata –
https://github.com/dotnet/extensions/issues/6034
Including measurements for token counts –
https://github.com/dotnet/extensions/issues/5970
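In the meantime, one rough workaround is a custom evaluator that records whatever measurement you care about (token counts, citation counts, and so on) as a numeric metric, so it is rolled up with the other metrics in the report. The sketch below is hypothetical: the TokenCountEvaluator type and metric name are not part of the library, and the IEvaluator signature shown is the one from recent previews, so adjust it to the version you are on.

```csharp
// Hypothetical example only: "TokenCountEvaluator" is not part of the library.
// The IEvaluator/EvaluationResult/NumericMetric shapes are assumed from the preview
// packages and have changed between preview versions.
using Microsoft.Extensions.AI;
using Microsoft.Extensions.AI.Evaluation;

public sealed class TokenCountEvaluator : IEvaluator
{
    public const string TokenCountMetricName = "Output Token Count";

    public IReadOnlyCollection<string> EvaluationMetricNames => new[] { TokenCountMetricName };

    public ValueTask<EvaluationResult> EvaluateAsync(
        IEnumerable<ChatMessage> messages,
        ChatResponse modelResponse,
        ChatConfiguration? chatConfiguration = null,
        IEnumerable<EvaluationContext>? additionalContext = null,
        CancellationToken cancellationToken = default)
    {
        // Read token usage off the response, if the underlying IChatClient reported it.
        double outputTokens = modelResponse.Usage?.OutputTokenCount ?? 0;

        // Record it as a numeric metric so it appears alongside the built-in quality metrics.
        var metric = new NumericMetric(TokenCountMetricName, outputTokens);
        return new ValueTask<EvaluationResult>(new EvaluationResult(metric));
    }
}
```

You could then run an instance of this evaluator alongside the built-in evaluators for each scenario, and the token count would show up with the other metrics for that iteration.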