February 7th, 2019

Announcing ML.NET 0.10 – Machine Learning for .NET

Cesar De la Torre
Principal Program Manager

alt text

ML.NET is an open-source and cross-platform machine learning framework (Windows, Linux, macOS) for .NET developers. Using ML.NET, developers can leverage their existing tools and skillsets to develop and infuse custom AI into their applications by creating custom machine learning models.

ML.NET allows you to create and use machine learning models targeting common tasks such as classification, regression, clustering, ranking, recommendations and anomaly detection. It also supports the broader open source ecosystem by proving integration with popular deep-learning frameworks like TensorFlow and interoperability through ONNX. Some common use cases of ML.NET are scenarios like Sentiment Analysis, Recommendations, Image Classification, Sales Forecast, etc. Please see our samples for more scenarios.

Today we’re announcing the release of ML.NET 0.10. ( ML.NET 0.1 was released at //Build 2018). Note that ML.NET follows a semantic versioning pattern, so this preview version is 0.10. There will be additional versions such as 0.11 and 0.12 before we release v1.0.

This release focuses on the overall stability of the framework, continuing to refine the API, increase test coverage and as an strategic milestone, we have moved the IDataView components into a new and separated assembly under Microsoft.Data.DataView namespace so it will favor interoperability in the future.

The main highlights for this blog post are described below in further details:

IDataView as a shared type across libraries in the .NET ecosystem

The IDataView component provides a very efficient, compositional processing of tabular data (columns and rows) especialy made for machine learning and advanced analytics applications. It is designed to efficiently handle high dimensional data and large data sets. It is also suitable for single node processing of data partitions belonging to larger distributed data sets.

For further info on IDataview read the IDataView design principles

What’s new in v0.10 for IDataView

In ML.NET 0.10 we have segregated the IDataView component into a single assembly and NuGet package. This is a very important step towards the interoperability with other APIs and frameworks.

Why segregate IDataView from the rest of the ML.NET framework?

This is a very important milestone that will help the ecosystem’s interoperability between multiple frameworks and libraries from Microsoft of third parties. By seggregating IDataView, different libraries will be able to reference it and use it from their API and allow users to pass large volumes of data between two independent libraries.

For example, from ML.NET you can of course consume and produce IDataView instances. But what if you need to integrate with a different framework by creating an IDataView from another API such as any “Data Preparation framework” library? If those frameworks can simply reference a single NuGet package with just the IDataView, then you can directly pass data into ML.NET from those frameworks without having to copy the data into a format that ML.NET consumes. Also, the additional framework wouldn’t depend on the whole ML.NET framework but just reference a very clean package limited to the IDataView.

The image below is an aspirational approach when using IDataView across frameworks in the ecosystem:

alt text

Another good example would be any plotting/charting library in the .NET ecosystem that could consume data using IDataView. You could take data that was produced by ML.NET and feed it directly into the plotting library without that library having a direct reference to the whole ML.NET framework. There would be no need to copy, or change the shape of the data at all. And there is no need for this plotting library to know anything about ML.NET.

Basically, IDataView can be an exchange data format which allows producers and consumers to pass large amounts of data in a standarized way.

For additional info check the PR #2220

Support for multiple ‘feature columns’ in recommendations (FFM based)

In previous ML.NET releases, when using the Field-aware Factorization Machine (FFM) trainer (training algorithm) you could only provide a single feature column like in this sample app

In 0.10 release we’ve added support for multiple ‘feature columns’ in your training dataset when using an FFM trainer by allowing to specify those additional column names in the trainer’s ‘Options’ parameter as shown in the following code snippet:

var ffmArgs = new FieldAwareFactorizationMachineTrainer.Options();

// Create the multiple field names.
ffmArgs.FeatureColumn = nameof(MyObservationClass.MyField1); // First field.
ffmArgs.ExtraFeatureColumns = new[]{ nameof(MyObservationClass.MyField2), nameof(MyObservationClass.MyField3) }; // Additional fields.

var pipeline = mlContext.BinaryClassification.Trainers.FieldAwareFactorizationMachine(ffmArgs);

var model = pipeline.Fit(dataView);

You can see additional code example details in this code

Additional updates in v0.10 timeframe

Support for returning multiple predicted labels

Until ML.NET v0.9, when predicting (for instance with a multi-class classification model), you could only predict and return a single label. That’s an issue for many business scenarios. For instance, in an eCommerce scenario, you could want to automatically classify a product and assign it to multiple product categories instead of just a single category.

However, when predicting, ML.NET internally already had a list of the multiple possible predictions with a score/proability per each in the schema’s data, but the API was simply not returning the list of possible predicted labels but a single one.

Therefore, this improvement allows you to access the schema’s data so you can get a list of the predicted labels which can then be related to their scores/proabilities provided by the float[] Score array in your Prediction class, such as in this sample prediction class.

For additional info check this code example

Minor updates in 0.10

  • Introducing Microsoft.ML.Recommender NuGet name instead of Microsoft.ML.MatrixFactorization name: Microsoft.ML.Recommender] is a better naming for NuGet packages based on the scenario (Recommendations) instead of the trainer’s name (Microsoft.ML.MatrixFactorization).
  • Added support in TensorFlow for using using text and sparse input in TensorFlow: Specifically, this adds support for loading a map from a file through dataview by using ValueMapperTransformer. This provides support for additional scenarios like a Text/NLP scenario) in TensorFlowTransform where model’s expected input is vector of integers.
  • Added Tensorflow unfrozen models support in GetModelSchema: For a code example loading an unfrozen TensorFlow check it out here.

Breaking changes in ML.NET 0.10

For your convenience, if you are moving your code from ML.NET v0.9 to v0.10, you can check out the breaking changes list that impacted our samples.

Instrumented code coverage tools as part of the ML.NET CI systems

We have also instrumented code coverage tools (using https://codecov.io/) as part of our CI systems and will continue to push for stability and quality in the code.

You can check it out here which is also a link in the home page of the ML.NET repo:

alt text

Once you click on that link, you’ll see the current code coverage for ML.NET:

alt text

Explore the community samples and share yours!!

As part of the ML.NET Samples repo we also have a special Community Samples page pointing to multiple samples provided by the community. These samples are not maintained by Microsoft but are very interesting and cover additional scenarios not covered by us.

Here’s an screenshot of the current community samples:

alt text

There are pretty cool samples like the following:

‘Photo-Search’ WPF app running a TensorFlow model exported to ONNX format

alt text

UWP app using ML.NET

alt text

Other very interesting samples are:

Share your sample with the ML.NET community!

alt text

We encourage you to share your ML.NET demos and samples with the community by simply submitting its brief description and URL pointing to your GitHub repo or blog posts, into this repo issue “Request for your samples!”.

We’ll do the rest and publish it at the ML.NET Community Samples page!

Planning to go to production?

alt text

If you are using ML.NET in your app and looking to go into production, you can talk to an engineer on the ML.NET team to:

  1. Get help implementing ML.NET successfully in your application.
  2. Demo your app and potentially have it featured on the .NET Blog, dot.net site, or other Microsoft channel.

Fill out this form and someone from the ML.NET team will contact you.

Get started!

alt text

If you haven’t already get started with ML.NET here.

Next, going further explore some other resources:

We will appreciate your feedback by filing issues with any suggestions or enhancements in the ML.NET GitHub repo to help us shape ML.NET and make .NET a great platform of choice for Machine Learning.

Thanks and happy coding with ML.NET!

The ML.NET Team.

This blog was authored by Cesar de la Torre and Eric Erhardt plus additional contributions of the ML.NET team

Author

Cesar De la Torre
Principal Program Manager

Principal Program Manager at the Azure team.

0 comments

Discussion are closed.