ML.NET is an open-source and cross-platform machine learning framework (Windows, Linux, macOS) for .NET developers. Using ML.NET, developers can leverage their existing tools and skillsets to develop and infuse custom AI into their applications by creating custom machine learning models.
ML.NET allows you to create and use machine learning models targeting common tasks such as classification, regression, clustering, ranking, recommendations and anomaly detection. It also supports the broader open-source ecosystem by providing integration with popular deep-learning frameworks like TensorFlow and interoperability through ONNX. Some common use cases of ML.NET are scenarios like Sentiment Analysis, Recommendations, Image Classification, Sales Forecast, etc. Please see our samples for more scenarios.
Today we’re announcing the release of ML.NET 0.10. (ML.NET 0.1 was released at //Build 2018.) Note that ML.NET follows a semantic versioning pattern, so this preview version is 0.10. There will be additional versions such as 0.11 and 0.12 before we release v1.0.
This release focuses on the overall stability of the framework, continuing to refine the API and increase test coverage. As a strategic milestone, we have also moved the IDataView components into a new, separate assembly under the Microsoft.Data.DataView namespace to favor interoperability in the future.
The main highlights for this blog post are described below in further detail:
- IDataView as a shared type across libraries in the .NET ecosystem
- Support for multiple ‘feature columns’ in recommendations (FFM based)
- Additional updates in v0.10 timeframe
- Explore the community samples and share yours!
- Planning to go to production?
- Get Started!
IDataView as a shared type across libraries in the .NET ecosystem
The IDataView component provides very efficient, compositional processing of tabular data (columns and rows), especially made for machine learning and advanced analytics applications. It is designed to efficiently handle high-dimensional data and large data sets. It is also suitable for single-node processing of data partitions belonging to larger distributed data sets.
For further info on IDataView, read the IDataView design principles.
What’s new in v0.10 for IDataView
In ML.NET 0.10 we have segregated the IDataView component into a single assembly and NuGet package. This is a very important step toward interoperability with other APIs and frameworks.
Why segregate IDataView from the rest of the ML.NET framework?
This is a very important milestone that will help the ecosystem’s interoperability between multiple frameworks and libraries from Microsoft or third parties. By segregating IDataView, different libraries will be able to reference it and use it from their APIs, allowing users to pass large volumes of data between two independent libraries.
For example, from ML.NET you can of course consume and produce IDataView instances. But what if you need to integrate with a different framework by creating an IDataView from another API such as any “Data Preparation framework” library? If those frameworks can simply reference a single NuGet package with just the IDataView, then you can directly pass data into ML.NET from those frameworks without having to copy the data into a format that ML.NET consumes. Also, the additional framework wouldn’t depend on the whole ML.NET framework but just reference a very clean package limited to the IDataView.
The image below shows an aspirational approach to using IDataView across frameworks in the ecosystem:
Another good example would be any plotting/charting library in the .NET ecosystem that could consume data using IDataView. You could take data that was produced by ML.NET and feed it directly into the plotting library without that library having a direct reference to the whole ML.NET framework. There would be no need to copy or change the shape of the data at all. And there is no need for this plotting library to know anything about ML.NET.
Basically, IDataView can be an exchange data format that allows producers and consumers to pass large amounts of data in a standardized way.
For additional info, check PR #2220.
Support for multiple ‘feature columns’ in recommendations (FFM based)
In previous ML.NET releases, when using the Field-aware Factorization Machine (FFM) trainer (training algorithm), you could only provide a single feature column, as in this sample app.
In the 0.10 release we’ve added support for multiple ‘feature columns’ in your training dataset when using an FFM trainer. You specify the additional column names in the trainer’s ‘Options’ parameter, as shown in the following code snippet:
var ffmArgs = new FieldAwareFactorizationMachineTrainer.Options();
// Create the multiple field names.
ffmArgs.FeatureColumn = nameof(MyObservationClass.MyField1); // First field.
ffmArgs.ExtraFeatureColumns = new[]{ nameof(MyObservationClass.MyField2), nameof(MyObservationClass.MyField3) }; // Additional fields.
var pipeline = mlContext.BinaryClassification.Trainers.FieldAwareFactorizationMachine(ffmArgs);
var model = pipeline.Fit(dataView);
You can see additional code example details in this code.
Additional updates in v0.10 timeframe
Support for returning multiple predicted labels
Until ML.NET v0.9, when predicting (for instance with a multi-class classification model), you could only predict and return a single label. That’s an issue for many business scenarios. For instance, in an eCommerce scenario, you might want to automatically classify a product and assign it to multiple product categories instead of just a single category.
However, when predicting, ML.NET internally already had a list of the multiple possible predictions, with a score/probability for each, in the schema’s data. The API simply returned a single label instead of the list of possible predicted labels.
Therefore, this improvement allows you to access the schema’s data so you can get a list of the predicted labels, which can then be related to their scores/probabilities provided by the float[] Score array in your Prediction class, such as in this sample prediction class.
For additional info, check this code example.
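To illustrate how the predicted labels can be related to the Score array, here is a minimal sketch in plain C#. It assumes you have already read the label names from the schema’s data into a string[] whose order matches the float[] Score array; how you obtain that array depends on the ML.NET version and is not shown. The TopPredictions class and its TopK method are hypothetical helpers written for this illustration and are not part of the ML.NET API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of ML.NET.
public static class TopPredictions
{
    // Pairs each label with its score and returns the k highest-scoring pairs.
    // Assumes labels[i] is the label whose score is scores[i].
    public static List<KeyValuePair<string, float>> TopK(string[] labels, float[] scores, int k)
    {
        if (labels == null) throw new ArgumentNullException(nameof(labels));
        if (scores == null) throw new ArgumentNullException(nameof(scores));
        if (labels.Length != scores.Length)
            throw new ArgumentException("labels and scores must have the same length.");

        return labels
            .Zip(scores, (label, score) => new KeyValuePair<string, float>(label, score))
            .OrderByDescending(pair => pair.Value) // highest score first
            .Take(k)
            .ToList();
    }
}
```

For example, calling TopPredictions.TopK(new[] { "Books", "Toys", "Games" }, new[] { 0.1f, 0.6f, 0.3f }, 2) returns "Toys" (0.6) followed by "Games" (0.3), which is the kind of result you would need to assign a product to more than one category.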
Minor updates in 0.10
- Introducing the Microsoft.ML.Recommender NuGet name instead of the Microsoft.ML.MatrixFactorization name: Microsoft.ML.Recommender is a better name for the NuGet package because it is based on the scenario (Recommendations) instead of the trainer’s name (Microsoft.ML.MatrixFactorization).
- Added support for using text and sparse input in TensorFlow: Specifically, this adds support for loading a map from a file through dataview by using ValueMapperTransformer. This provides support for additional scenarios (like a Text/NLP scenario) in TensorFlowTransform where the model’s expected input is a vector of integers.
- Added TensorFlow unfrozen models support in GetModelSchema: For a code example loading an unfrozen TensorFlow model, check it out here.
Breaking changes in ML.NET 0.10
For your convenience, if you are moving your code from ML.NET v0.9 to v0.10, you can check out the breaking changes list that impacted our samples.
Instrumented code coverage tools as part of the ML.NET CI systems
We have also instrumented code coverage tools (using https://codecov.io/) as part of our CI systems and will continue to push for stability and quality in the code.
You can check it out here, which is also linked from the home page of the ML.NET repo:
Once you click on that link, you’ll see the current code coverage for ML.NET:
Explore the community samples and share yours!
As part of the ML.NET Samples repo we also have a special Community Samples page pointing to multiple samples provided by the community. These samples are not maintained by Microsoft but are very interesting and cover additional scenarios not covered by us.
Here’s a screenshot of the current community samples:
There are pretty cool samples like the following:
‘Photo-Search’ WPF app running a TensorFlow model exported to ONNX format
UWP app using ML.NET
Other very interesting samples are:
Share your sample with the ML.NET community!
We encourage you to share your ML.NET demos and samples with the community by submitting a brief description and a URL pointing to your GitHub repo or blog post in this repo issue: “Request for your samples!”.
We’ll do the rest and publish it on the ML.NET Community Samples page!
Planning to go to production?
If you are using ML.NET in your app and looking to go into production, you can talk to an engineer on the ML.NET team to:
- Get help implementing ML.NET successfully in your application.
- Demo your app and potentially have it featured on the .NET Blog, dot.net site, or another Microsoft channel.
Fill out this form and someone from the ML.NET team will contact you.
Get started!
If you haven’t already, get started with ML.NET here.
Next, to go further, explore some other resources:
- Tutorials and resources at the Microsoft Docs ML.NET Guide
- Code samples at the machinelearning-samples GitHub repo
- Important ML.NET concepts for understanding the new API are introduced here
- “How to” guides that show how to use these APIs for a variety of scenarios can be found here
- Download the Visual Studio templates for ML.NET from the Visual Studio Marketplace
We would appreciate your feedback. Please file issues with any suggestions or enhancements in the ML.NET GitHub repo to help us shape ML.NET and make .NET a great platform of choice for Machine Learning.
Thanks and happy coding with ML.NET!
The ML.NET Team.
This blog was authored by Cesar de la Torre and Eric Erhardt, with additional contributions from the ML.NET team.