Announcing ML.NET 1.4 Preview and Model Builder updates (Machine Learning for .NET)

Cesar De la Torre

We are excited to announce ML.NET 1.4 Preview and updates to Model Builder and CLI.

ML.NET is an open-source and cross-platform machine learning framework for .NET developers. ML.NET also includes Model Builder (a simple UI tool) and CLI to make it super easy to build custom Machine Learning (ML) models using Automated Machine Learning (AutoML).

Using ML.NET, developers can leverage their existing tools and skillsets to develop and infuse custom ML into their applications by creating custom machine learning models for common scenarios like Sentiment Analysis, Price Prediction, Sales Forecast prediction, Image Classification and more!

Following are some of the key highlights in this update:

ML.NET Updates

ML.NET 1.4 Preview is a backwards compatible release with no breaking changes so please update to get the latest changes.

In addition to bug fixes described here, in ML.NET 1.4 Preview we have released some exciting new features that are described in the following sections.

Database Loader (Preview)

This feature introduces a native database loader that enables training directly against relational databases. This loader supports any relational database provider supported by System.Data in .NET Core or .NET Framework, meaning that you can use any RDBMS such as SQL Server, Azure SQL Database, Oracle, SQLite, PostgreSQL, MySQL, Progress, IBM DB2, etc.

Since ML.NET 1.0, you could already train against a relational database by supplying data as an IEnumerable collection through the LoadFromEnumerable() API, where the data could come from a relational database or any other source. With that approach, however, you as the developer are responsible for the code that reads from the database (using Entity Framework or any other approach), and that code must be implemented properly so that data is streamed while the ML model trains, as in this previous sample using LoadFromEnumerable().
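As a rough illustration of that older approach, the sketch below streams rows through a C# iterator so that each record is read only when training asks for it. It assumes SQL Server accessed through System.Data.SqlClient; the SentimentRow class, the SentimentDataset table and its two text columns are hypothetical placeholders and are not taken from the linked sample.

```csharp
using System.Collections.Generic;
using System.Data.SqlClient;
using Microsoft.ML;

// Hypothetical row type; field names mirror the assumed table columns.
public class SentimentRow
{
    public string FeedbackText;
    public string Label;
}

public static class EnumerableLoadingSketch
{
    // Iterator method: the query runs when enumeration starts and rows are
    // yielded one at a time, so the full table is never held in memory.
    public static IEnumerable<SentimentRow> ReadRows(string connectionString)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand(
            "SELECT FeedbackText, Label FROM SentimentDataset", connection))
        {
            connection.Open();
            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    yield return new SentimentRow
                    {
                        FeedbackText = reader.GetString(0),
                        Label = reader.GetString(1)
                    };
                }
            }
        }
    }

    // Wraps the lazy sequence in an IDataView for later training.
    public static IDataView Load(MLContext mlContext, string connectionString)
    {
        return mlContext.Data.LoadFromEnumerable(ReadRows(connectionString));
    }
}
```

Because the sequence is produced by an iterator, each pass over the data opens a fresh connection and re-runs the query, which is the plumbing you have to own with this approach.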

The new Database Loader provides a much simpler implementation, because the way it reads from the database and exposes the data through an IDataView is provided out of the box by the ML.NET framework. You only need to specify your database connection string, the SQL statement that selects the dataset columns, and the data class to use when loading the data. It is that simple!

Here’s example code on how easily you can now configure your code to load data directly from a relational database into an IDataView which will be used later on when training your model.
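A minimal sketch of that configuration, assuming a SQL Server database and the preview package that ships the loader. The connection string, the SentimentDataset table and the SentimentData class are placeholders for illustration; the preview API surface may still change.

```csharp
using System.Data.SqlClient;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical data class: one field per column returned by the SQL statement.
public class SentimentData
{
    public string FeedbackText;
    public string Label;
}

public static class DatabaseLoaderSketch
{
    public static IDataView LoadTrainingData(MLContext mlContext, string connectionString)
    {
        // The SQL statement that selects the dataset columns (assumed table name).
        string commandText = "SELECT FeedbackText, Label FROM SentimentDataset";

        // The loader is typed with the data class used to shape the rows.
        DatabaseLoader loader = mlContext.Data.CreateDatabaseLoader<SentimentData>();

        // Any System.Data provider factory can be passed here; SQL Server is assumed.
        DatabaseSource dbSource = new DatabaseSource(
            SqlClientFactory.Instance, connectionString, commandText);

        // Returns an IDataView that can be used later when training the model.
        return loader.Load(dbSource);
    }
}
```

A call such as `DatabaseLoaderSketch.LoadTrainingData(new MLContext(), "Data Source=YOUR_SERVER;Initial Catalog=YOUR_DATABASE;Integrated Security=True")` would then yield the training data view, with the server and database names replaced by your own.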

This feature is in preview and can be accessed via the Microsoft.ML.Experimental v0.16-Preview NuGet package available here.

For further learning see this complete sample app using the new DatabaseLoader.

Image classification with deep neural networks retraining (Preview)

This new feature enables native DNN transfer learning with ML.NET, targeting image classification as our first high level scenario.

For instance, with this feature you can create your own custom image classifier model by natively training a TensorFlow model from the ML.NET API with your own images.

Image classifier scenario – Train your own custom deep learning model with ML.NET

In order to use TensorFlow, ML.NET internally takes a dependency on the TensorFlow.NET library.

The TensorFlow.NET library is an open-source, low-level API library that provides .NET Standard bindings for TensorFlow. It is part of the SciSharp stack of libraries.

Microsoft (the ML.NET team) is working closely with the TensorFlow.NET library team, not just to provide higher-level APIs for ML.NET users (such as our new ImageClassification API) but also to help improve and evolve the TensorFlow.NET library as an open source project.

We would like to acknowledge the effort of the TensorFlow.NET library team and thank them for their agility and great collaboration with us.

The stack diagram below shows how ML.NET implements these new DNN training features. Although we currently only support training TensorFlow models, PyTorch support is on the roadmap.

As the first main scenario for high level APIs, we are currently focusing on image classification. The goal of these new high-level APIs is to provide powerful and easy to use interfaces for DNN training scenarios like image classification, object detection and text classification.

The API code example below shows how easily you can train a new TensorFlow model which, under the covers, is based on transfer learning from a selected architecture (pre-trained model) such as Inception v3 or ResNet.

Image classifier high level API code using transfer learning from Inceptionv3 pre-trained model
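A sketch of such a pipeline, following the preview snippet quoted in the comments on this post. It assumes the preview DNN package is referenced and that the input IDataView has a text column "ImagePath" and a text column "Label"; the wrapping class and method names are hypothetical, and the namespace of ImageClassificationEstimator is an assumption for the preview build.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Transforms;

public static class ImageClassifierSketch
{
    public static ITransformer Train(MLContext mlContext, IDataView trainingImages)
    {
        var pipeline = mlContext.Transforms.Conversion.MapValueToKey(
                // Converts the text label (e.g. a folder name) into a numeric key,
                // written to a new column called "LabelAsKey".
                outputColumnName: "LabelAsKey",
                inputColumnName: "Label",
                keyOrdinality: ValueToKeyMappingEstimator.KeyOrdinality.ByValue)
            .Append(mlContext.Model.ImageClassification("ImagePath", "LabelAsKey",
                // arch: the pre-trained base model used for transfer learning.
                arch: ImageClassificationEstimator.Architecture.InceptionV3,
                // epoch: number of full passes over the training images.
                epoch: 100,
                // batchSize: number of images processed per training step.
                batchSize: 30,
                // metricsCallback: invoked with progress metrics during training.
                metricsCallback: (metrics) => Console.WriteLine(metrics)));

        // Fitting the pipeline runs the training and returns the trained model.
        return pipeline.Fit(trainingImages);
    }
}
```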

The important line in the above code is the one using the mlContext.Model.ImageClassification classifier trainer. As you can see, it is a high-level API where you just select the base pre-trained model to derive from, in this case Inception v3, although you could also select other pre-trained models such as ResNet v2 101. Inception v3 is a widely used image recognition model trained on the ImageNet dataset. These pre-trained models or architectures are the culmination of many ideas developed by multiple researchers over the years, and you can now easily take advantage of them.

The DNN Image Classification training API is still in early preview, and we hope to get feedback from you that we can incorporate in upcoming releases.

For further learning see this sample app training a custom TensorFlow model with provided images.

Enhanced for .NET Core 3.0

ML.NET is now building for .NET Core 3.0. This means ML.NET can take advantage of the new features when running in a .NET Core 3.0 application. The first new feature we are using is the new hardware intrinsics feature, which allows .NET code to accelerate math operations by using processor specific instructions.

Of course, you can still run ML.NET on older versions, but when running on .NET Framework, or .NET Core 2.2 and below, ML.NET uses C++ code that is hard-coded to x86-based SSE instructions. SSE instructions allow for four 32-bit floating-point numbers to be processed in a single instruction. Modern x86-based processors also support AVX instructions, which allow for processing eight 32-bit floating-point numbers in one instruction. ML.NET’s C# hardware intrinsics code supports both AVX and SSE instructions and will use the best one available. This means when training on a modern processor, ML.NET will now train faster because it can do more concurrent floating-point operations than it could with the existing C++ code that only supported SSE instructions.

Another advantage the C# hardware intrinsics code brings is that when neither SSE nor AVX is supported by the processor, for example on an ARM chip, ML.NET will fall back to doing the math operations one number at a time. This means more processor architectures are now supported by the core ML.NET components. (Note: There are still some components that don’t work on ARM processors, for example FastTree, LightGBM, and OnnxTransformer. These components are written in C++ code that is not currently compiled for ARM processors.)
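To make the AVX / SSE / scalar dispatch concrete, here is a small illustrative dot product written with the .NET Core 3.0 hardware intrinsics APIs. It is not ML.NET's actual implementation; the class and method names are hypothetical, and the project is assumed to allow unsafe code (AllowUnsafeBlocks).

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class DotProductSketch
{
    public static unsafe float Dot(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Arrays must have the same length.");

        int n = a.Length;
        int i = 0;
        float sum = 0f;

        fixed (float* pa = a, pb = b)
        {
            if (Avx.IsSupported)
            {
                // AVX: eight 32-bit floats per instruction.
                Vector256<float> acc = Vector256<float>.Zero;
                for (; i <= n - 8; i += 8)
                    acc = Avx.Add(acc, Avx.Multiply(
                        Avx.LoadVector256(pa + i), Avx.LoadVector256(pb + i)));
                for (int lane = 0; lane < 8; lane++)
                    sum += acc.GetElement(lane);
            }
            else if (Sse.IsSupported)
            {
                // SSE: four 32-bit floats per instruction.
                Vector128<float> acc = Vector128<float>.Zero;
                for (; i <= n - 4; i += 4)
                    acc = Sse.Add(acc, Sse.Multiply(
                        Sse.LoadVector128(pa + i), Sse.LoadVector128(pb + i)));
                for (int lane = 0; lane < 4; lane++)
                    sum += acc.GetElement(lane);
            }

            // Scalar path: handles the leftover tail, and the whole array
            // when neither instruction set is available (e.g. on ARM).
            for (; i < n; i++)
                sum += pa[i] * pb[i];
        }

        return sum;
    }
}
```

The same method runs on any processor: the `IsSupported` checks are resolved by the JIT for the machine at hand, so only one of the three paths is actually compiled in.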

For more information on how ML.NET uses the new hardware intrinsics APIs in .NET Core 3.0, please check out Brian Lui’s blog post Using .NET Hardware Intrinsics API to accelerate machine learning scenarios.

Model Builder in VS and CLI updated to latest GA version

The Model Builder tool in Visual Studio and the ML.NET CLI (both in preview) have been updated to use the latest ML.NET GA version (1.3) and address lots of customer feedback. Learn more about the changes here.

Model Builder updated to latest ML.NET GA version

Model Builder uses the latest GA version of ML.NET (1.3) and therefore the generated C# code also references ML.NET 1.3.

Improved support for other OS cultures

This addresses many frequently reported issues where developers want to use their own local culture OS settings to train a model in Model Builder. Please read this issue for more details.

Customer feedback addressed for Model Builder

There were many issues fixed in this release. Learn more in the release notes.

New sample apps

Coinciding with this new release, we’re also announcing new interesting sample apps covering additional scenarios:

  Sales forecast scenario based on Time Series SSA (Singular Spectrum Analysis)
  Credit Card Fraud Detection scenario based on Anomaly Detection PCA
  Search engine sorted results scenario based on Ranking task
  Model Explainability and feature importance
  Database Loader (Native Database Loader for relational databases)
  Deep Learning training: Image Classification DNN re-train (Transfer Learning)
  Scalable ML.NET model on ASP.NET Core Razor web app (C#)
  Scalable ML.NET model on Azure Function (C#)

New ML.NET video playlist on YouTube

We have created an ML.NET YouTube playlist on the .NET Foundation channel made up of selected videos, each focusing on a single ML.NET feature, so it is great for learning purposes.

Access the ML.NET YouTube playlist here.

Try ML.NET and Model Builder today!

Summary

We are excited to release these updates for you and we look forward to seeing what you will build with ML.NET. If you have any questions or feedback, you can ask them here for ML.NET and Model Builder.

Happy coding!

The ML.NET team.

This blog was authored by Cesar de la Torre and Eric Erhardt, with additional contributions from the ML.NET team.

Acknowledgements

  • As mentioned above, we would like to acknowledge the effort of the TensorFlow.NET library team and thank them for their agility and great collaboration with us. Special kudos to Haiping (Oceania2018).
  • Special thanks to Jon Wood (@JWood) for his many great YouTube videos on ML.NET, which we also link to from the ML.NET YouTube playlist mentioned in this blog post. Also, thanks for being an early adopter and tester of the new DatabaseLoader.

Cesar De la Torre

Principal Program Manager, .NET

15 comments

  • Max Mustermueller

    I really find it hard to step into the world of ML.NET because, although I have a clear idea of what I want, I have no idea how to achieve it. There are sample applications that do work, but the syntax is too difficult and too poorly explained for me to understand how to make it work for my purposes.

    Let’s say I want to add face recognition. I have a bunch of pictures of myself and of random people and I want to find myself in pictures. Where do I start? Do I create a folder called “me” and put all images in there? OK, and how do I train a machine so that it knows it should identify my face in these pictures in a folder called “me”? And how do I run it against other pictures, telling the machine to focus on faces?

    There is an “Image Classification” example, but the explanation is terrible. This code:

    mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "LabelAsKey",
            inputColumnName: "Label",
            keyOrdinality: ValueToKeyMappingEstimator.KeyOrdinality.ByValue)
        .Append(mlContext.Model.ImageClassification("ImagePath", "LabelAsKey",
            arch: ImageClassificationEstimator.Architecture.InceptionV3,
            epoch: 100,
            batchSize: 30,
            metricsCallback: (metrics) => Console.WriteLine(metrics)));

    is explained as “you define the model’s training pipeline where you can see how easily you can train a new TensorFlow model”. Well, for someone who knows what these lines do it might be “EASY” to see, but I just ask myself what the hell outputColumnName is supposed to be. arch? epoch? batchSize? metricsCallback?

    I would love to see a basic walkthrough of ML.NET for complete beginners, taking a real-world scenario and explaining it from the ground up: what you need first, how the data should be stored so ML.NET can read it, then the syntax, and so on. As of now, all of these samples seem to be useful only for those who already understand ML.NET.

    • Cesar De la Torre

      Hi Max. Thanks a lot for your feedback. I completely agree. However, take into account that this feature (the new Image Classification API natively training a TensorFlow model under the covers) is still in Preview, and we’re working on further documentation and tutorials/walkthroughs to be published at https://docs.microsoft.com/en-us/dotnet/machine-learning/ when the API is GA. Today we’re simply making the Preview available for early adopters.

      In addition, that API might get further simplifications and improvements; that’s why it is still in Preview. 🙂

      In the meantime, if you want to go ahead and try to create your own image classifier model with it, feel free to send me an email at cesardl at microsoft.com so I can answer any questions you might have about that code and help you while you also provide additional feedback.

    • Cesar De la Torre

      Not sure what you mean by ‘replaced’. In general? Within ML.NET?

      In any case, for ML.NET, it does. Originally we were using code from TensorFlowSharp, but since release 1.3.1, ML.NET takes a dependency on TensorFlow.NET, including when simply scoring a TensorFlow model.

    • Cesar De la Torre

      If those data sources (Azure Storage Tables, Azure Cosmos DB or any other data source) are important for your business scenarios, in addition to relational databases, please submit an issue at the ML.NET GitHub repo requesting them and explaining why those sources would work better for your scenarios, OK?

      That would help us prioritize our backlog based on the needs of users like you. 🙂

  • Mario M.

    There is a big problem with Model Builder: it does not support vector columns. I tried exporting the data to a text file directly from ML.NET with SaveAsText, and Model Builder cannot load the vector columns or identify the columns. The same text file loads without any problem in ML.NET.
    Also, with the SQL database, I don’t know how to create data with vectors in order to load it in Model Builder.

  • Stephen Eldridge

    It’s a pity about
    Console.WriteLine($"Predicted Label: {clickPrediction.PredictedLabel} – Score:{Sigmoid(clickPrediction.Score)}", Color.YellowGreen);
    in the DatabaseLoader example. You must have meant to use the ColorfulConsole NuGet package. OK, but ConsoleHelper uses System.ConsoleColor. It is remarkable that one can see scores at all 🙂
    It would also be good to know what kind of results you expect the program to produce. This also applies to practically every other sample program.

    • Cesar De la Torre

      Hi Hernando. Good point. Currently, even though ML.NET is cross-platform (Windows, Linux, macOS) thanks to .NET Core, many internal ML algorithms are implemented natively in C/C++. Those ‘native’ areas don’t currently support ARM processors. That means that running ML.NET on ARM-based devices such as iOS or Android (Xamarin target), as well as ARM-based IoT devices, is not supported, since in terms of processors we only support x64 and x86 (see our current OS and processor support here: https://github.com/dotnet/machinelearning#operating-systems-and-processor-architectures-supported-by-mlnet).

      Currently, there’s a clear workaround: run the ML.NET models in HTTP services (or any remote solution running on a server) consumed by the Xamarin apps.

      However, support for ARM processors (and hence support for Xamarin) is in our roadmap and backlog.

      Important: if you want to influence the priorities of our roadmap/backlog, please provide your feedback/requests in the ML.NET repo as an issue, or add your feedback to existing issues.

      For this topic, please, write your feedback and why this scenario is important for you in the following issue (Feel free to re-open or create a new issue and link to this one):

      https://github.com/dotnet/machinelearning/issues/1790

      Thanks for your feedback,

  • LOST _

    I hope you guys understand what you are doing when taking a dependency on TensorFlow.NET
    At least TensorFlowSharp was properly auto-generated from public C interface.

    TensorFlow.NET team is quite misleading. They advertise on their NuGet package, that they support full TensorFlow API. And you can find most classes in the package, but when you look at the source code, they are basically empty even with no NotImplementedException being thrown. For instance: https://github.com/SciSharp/TensorFlow.NET/blob/master/src/TensorFlowNET.Core/Train/GradientDescentOptimizer.cs

    And here’s the real one: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/gradient_descent.py for comparison.

    I know because I am making a competing product and pointed this problem out to them before; they have since implemented AdamOptimizer by hand-converting source code from Python. But will they maintain that converted code? I highly doubt that.

    So just beware, that if you expose functionality in ML.NET through TensorFlow.NET some pieces might just silently do nothing.
