September 3rd, 2019

Announcing ML.NET 1.4 Preview and Model Builder updates (Machine Learning for .NET)

Cesar De la Torre
Principal Program Manager

We are excited to announce ML.NET 1.4 Preview and updates to Model Builder and CLI.

ML.NET is an open-source and cross-platform machine learning framework for .NET developers. ML.NET also includes Model Builder (a simple UI tool) and the ML.NET CLI, which make it super easy to build custom Machine Learning (ML) models using Automated Machine Learning (AutoML).

Using ML.NET, developers can leverage their existing tools and skillsets to develop and infuse custom ML into their applications by creating custom machine learning models for common scenarios like Sentiment Analysis, Price Prediction, Sales Forecasting, Image Classification and more!

Following are some of the key highlights in this update:

ML.NET Updates

ML.NET 1.4 Preview is a backward-compatible release with no breaking changes, so please update to get the latest changes.

In addition to the bug fixes described here, ML.NET 1.4 Preview includes some exciting new features, described in the following sections.

Database Loader (Preview)

DatabaseLoader in ML.NET

This feature introduces a native database loader that enables training directly against relational databases. This loader supports any relational database provider supported by System.Data in .NET Core or .NET Framework, meaning that you can use any RDBMS such as SQL Server, Azure SQL Database, Oracle, SQLite, PostgreSQL, MySQL, Progress, IBM DB2, etc.

Since ML.NET 1.0, you could already train against a relational database by providing data through an IEnumerable collection with the LoadFromEnumerable() API, where the data could come from a relational database or any other source. However, with that approach you, as the developer, are responsible for the code that reads from the database (using Entity Framework or any other approach), and that code must be implemented so that data is streamed while the ML model trains, as in this previous sample using LoadFromEnumerable().
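For comparison, the streaming read code that the LoadFromEnumerable() approach left to you might look like the minimal sketch below. It uses only System.Data.Common; the SentimentRows class and its Stream method are hypothetical names, and the column names FeedbackText and Label are assumed to match your SQL query and to contain no NULLs. Because training can make more than one pass over the data, the sketch opens a fresh reader on every enumeration instead of reusing a single DbDataReader, which can only be read forward once.

```csharp
using System;
using System.Collections.Generic;
using System.Data.Common;

public class SentimentData
{
    public string FeedbackText;
    public string Label;
}

public static class SentimentRows
{
    // Yields one SentimentData per row. Each enumeration calls openReader() to get a
    // fresh DbDataReader, so the sequence can be iterated more than once, and only
    // the current row is materialized in memory at any time.
    // Assumes columns named "FeedbackText" and "Label" that never contain NULL.
    public static IEnumerable<SentimentData> Stream(Func<DbDataReader> openReader)
    {
        using (DbDataReader reader = openReader())
        {
            int textOrdinal = reader.GetOrdinal("FeedbackText");
            int labelOrdinal = reader.GetOrdinal("Label");
            while (reader.Read())
            {
                yield return new SentimentData
                {
                    FeedbackText = reader.GetString(textOrdinal),
                    Label = reader.GetString(labelOrdinal)
                };
            }
        }
    }
}
```

With ML.NET referenced, the resulting sequence could be handed to mlContext.Data.LoadFromEnumerable(...). With SqlClient, the reader factory could open a SqlConnection and call ExecuteReader(CommandBehavior.CloseConnection), so that disposing the reader also closes the connection. This is the kind of plumbing the new DatabaseLoader takes off your hands.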

The new Database Loader makes this much simpler, because ML.NET provides, out of the box, the code that reads from the database and exposes the data through an IDataView. You just need to specify your database connection string, the SQL statement that selects the dataset columns, and the data class to use when loading the data. It is that simple!

Here’s example code showing how easily you can now load data directly from a relational database into an IDataView, which will be used later on to train your model.

// Lines of code for loading data from a database into an IDataView for later model training
using System.Data.SqlClient;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext();

string connectionString = @"Data Source=YOUR_SERVER;Initial Catalog=YOUR_DATABASE;Integrated Security=True";
string commandText = "SELECT * FROM SentimentDataset";

// The generic type argument tells the loader which class describes the columns to load
DatabaseLoader loader = mlContext.Data.CreateDatabaseLoader<SentimentData>();
DatabaseSource dbSource = new DatabaseSource(SqlClientFactory.Instance, connectionString, commandText);

IDataView trainingDataView = loader.Load(dbSource);

// ML.NET model training code using the training IDataView
//...

// Data class whose field names match the columns returned by the SQL statement
public class SentimentData
{
    public string FeedbackText;
    public string Label;
}

This feature is in preview and can be accessed via the Microsoft.ML.Experimental v0.16-Preview NuGet package available here.

For further learning see this complete sample app using the new DatabaseLoader.

Image classification with deep neural networks retraining (Preview)

This new feature enables native DNN transfer learning with ML.NET, targeting image classification as our first high level scenario.

For instance, with this feature you can create your own custom image classifier model by natively training a TensorFlow model from ML.NET API with your own images.

Image classifier scenario – Train your own custom deep learning model with ML.NET

 

In order to use TensorFlow, ML.NET internally takes a dependency on the TensorFlow.NET library.

TensorFlow.NET is an open-source, low-level API library that provides .NET Standard bindings for TensorFlow. It is part of the SciSharp STACK of libraries.

Microsoft (the ML.NET team) is working closely with the TensorFlow.NET team, not only to provide higher-level APIs for ML.NET users (such as our new ImageClassification API) but also to help improve and evolve TensorFlow.NET as an open-source project.

We would like to acknowledge the effort of the TensorFlow.NET team and thank them for their agility and great collaboration with us.

The stack diagram below shows how ML.NET implements these new DNN training features. Although we currently only support training TensorFlow models, PyTorch support is on the roadmap.

As the first main scenario for high-level APIs, we are currently focusing on image classification. The goal of these new high-level APIs is to provide powerful and easy-to-use interfaces for DNN training scenarios like image classification, object detection and text classification.

The following API code example shows how easily you can train a new TensorFlow model, which under the covers is based on transfer learning from a selected architecture (pre-trained model) such as Inception v3 or ResNet.

Image classifier high-level API code using transfer learning from the Inception v3 pre-trained model

var pipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "LabelAsKey", inputColumnName: "Label")
               .Append(mlContext.Model.ImageClassification("ImagePath", "LabelAsKey",
                            arch: ImageClassificationEstimator.Architecture.InceptionV3));  //Can also use ResnetV2101
                            
// Train the model
ITransformer trainedModel = pipeline.Fit(trainDataView);

The key line in the code above is the call to the mlContext.Model.ImageClassification trainer. As you can see, it is a high-level API where you only need to select the pre-trained base model to derive from, in this case Inception v3, though you could also select other pre-trained models such as ResNet v2 101. Inception v3 is a widely used image recognition model trained on the ImageNet dataset. These pre-trained models, or architectures, are the culmination of many ideas developed by multiple researchers over the years, and you can now easily take advantage of them.

The DNN Image Classification training API is still in early preview and we hope to get feedback from you that we can incorporate in the next upcoming releases.

For further learning see this sample app training a custom TensorFlow model with provided images.

Enhanced for .NET Core 3.0

ML.NET now builds for .NET Core 3.0. This means ML.NET can take advantage of new features when running in a .NET Core 3.0 application. The first new feature we are using is hardware intrinsics, which allows .NET code to accelerate math operations by using processor-specific instructions.

Of course, you can still run ML.NET on older versions, but when running on .NET Framework, or .NET Core 2.2 and below, ML.NET uses C++ code that is hard-coded to x86-based SSE instructions. SSE instructions allow for four 32-bit floating-point numbers to be processed in a single instruction. Modern x86-based processors also support AVX instructions, which allow for processing eight 32-bit floating-point numbers in one instruction. ML.NET’s C# hardware intrinsics code supports both AVX and SSE instructions and will use the best one available. This means when training on a modern processor, ML.NET will now train faster because it can do more concurrent floating-point operations than it could with the existing C++ code that only supported SSE instructions.

Another advantage the C# hardware intrinsics code brings is that when neither SSE nor AVX are supported by the processor, for example on an ARM chip, ML.NET will fall back to doing the math operations one number at a time. This means more processor architectures are now supported by the core ML.NET components. (Note: There are still some components that don’t work on ARM processors, for example FastTree, LightGBM, and OnnxTransformer. These components are written in C++ code that is not currently compiled for ARM processors.)
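To make the AVX/SSE/scalar dispatch concrete, here is a minimal sketch, assuming .NET Core 3.0 or later, that sums a float array with whichever instruction set is available. It is not ML.NET's actual code; SimdSum is a hypothetical class that only mirrors the pattern described above: eight floats per AVX addition, four per SSE addition, and one number at a time otherwise.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper illustrating hardware-intrinsics dispatch; not part of ML.NET.
public static class SimdSum
{
    // Sums a float array with AVX (8 floats per add), SSE (4 floats per add),
    // or a plain scalar loop, depending on what the current processor supports.
    public static float Sum(float[] values)
    {
        ReadOnlySpan<float> span = values;
        float total = 0f;
        int processed = 0;

        if (Avx.IsSupported)
        {
            // Reinterpret the floats as 256-bit vectors; a trailing partial vector is left out.
            ReadOnlySpan<Vector256<float>> blocks = MemoryMarshal.Cast<float, Vector256<float>>(span);
            Vector256<float> acc = Vector256<float>.Zero;
            foreach (Vector256<float> block in blocks)
                acc = Avx.Add(acc, block); // 8 additions in one instruction
            for (int i = 0; i < Vector256<float>.Count; i++)
                total += acc.GetElement(i);
            processed = blocks.Length * Vector256<float>.Count;
        }
        else if (Sse.IsSupported)
        {
            // Same idea with 128-bit vectors (4 floats each).
            ReadOnlySpan<Vector128<float>> blocks = MemoryMarshal.Cast<float, Vector128<float>>(span);
            Vector128<float> acc = Vector128<float>.Zero;
            foreach (Vector128<float> block in blocks)
                acc = Sse.Add(acc, block); // 4 additions in one instruction
            for (int i = 0; i < Vector128<float>.Count; i++)
                total += acc.GetElement(i);
            processed = blocks.Length * Vector128<float>.Count;
        }

        // Scalar path: handles leftover elements, and every element on processors
        // without SSE or AVX (for example ARM), one number at a time.
        for (int i = processed; i < span.Length; i++)
            total += span[i];

        return total;
    }
}
```

Calling SimdSum.Sum on the values 1 through 19 returns 190 on any of the three paths. Note that the vector paths add numbers in a different order than the scalar loop, so for non-integer inputs the results can differ in the last bits.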

For more information on how ML.NET uses the new hardware intrinsics APIs in .NET Core 3.0, please check out Brian Lui’s blog post Using .NET Hardware Intrinsics API to accelerate machine learning scenarios.

Model Builder in VS and CLI updated to latest GA version

The Model Builder tool in Visual Studio and the ML.NET CLI (both in preview) have been updated to use the latest ML.NET GA version (1.3) and address lots of customer feedback. Learn more about the changes here.

Model Builder updated to latest ML.NET GA version

Model Builder uses the latest GA version of ML.NET (1.3) and therefore the generated C# code also references ML.NET 1.3.

Improved support for other OS cultures

This addresses many frequently reported issues from developers who want to train a model in Model Builder using their own local OS culture settings. Please read this issue for more details.

Customer feedback addressed for Model Builder

There were many issues fixed in this release. Learn more in the release notes.

New sample apps

Coinciding with this release, we’re also announcing interesting new sample apps covering additional scenarios:

  Sales forecast scenario based on Time Series SSA (Singular Spectrum Analysis)
  Credit Card Fraud Detection scenario based on Anomaly Detection PCA
  Search engine sorted results scenario based on Ranking task
  Model Explainability and feature importance
  Database Loader (Native Database Loader for relational databases)
  Deep Learning training: Image Classification DNN re-train (Transfer Learning)
  Scalable ML.NET model on ASP.NET Core Razor web app (C#)
  Scalable ML.NET model on Azure Function (C#)

 

New ML.NET video playlist at YouTube

We have created an ML.NET YouTube playlist on the .NET Foundation channel, made of selected videos that each focus on a single ML.NET feature, so it is great for learning purposes.

Access the ML.NET YouTube playlist here.

 

Try ML.NET and Model Builder today!

Summary

We are excited to release these updates for you and we look forward to seeing what you will build with ML.NET. If you have any questions or feedback, you can ask them here for ML.NET and Model Builder.

Happy coding!

The ML.NET team.

This blog was authored by Cesar de la Torre and Eric Erhardt plus additional contributions of the ML.NET team.

 

Acknowledgements

  • As mentioned above, we would like to acknowledge the effort of the TensorFlow.NET library team and thank them for their agility and great collaboration with us. Special kudos to Haiping (Oceania2018).
  • Special thanks to Jon Wood (@JWood) for his many great YouTube videos on ML.NET, which we also link to from the ML.NET YouTube playlist mentioned in this blog post. Thanks also for being an early adopter and tester of the new DatabaseLoader.

 

Author

Cesar De la Torre
Principal Program Manager

Principal Program Manager at the Azure team.

27 comments


  • Win Pooh

    I ran the DatabaseIntegration demo; there is this code in Program.cs:

    Console.WriteLine(“Training model…”);
    var model = pipeline.Fit(trainTestData.TrainSet);

    i.e. the model is trained and created every time.
    The question: is it possible to save the model in a database once it has been trained and created, and then reuse the saved one?
    It can be saved to a zip file and loaded. But how do I save/load it to/from a database?

    • Cesar De la Torre (Microsoft employee, Author)

      An ML.NET model can only be serialized and saved as a .ZIP file which can be stored in any place you can store files.
      The most common approach is to store it as a file as part of the assets/resources of an application but it could also be stored as a file in a Blob in Azure Storage blobs, or in any Http repository.

      About storing the .zip file in a database, it could be if...

      • Win Pooh

        Hi,
        Ok, for example, my scenario is:
        I provide a client->server system to customers.
        The client app contains some ML methods to find similar texts, to predict something etc.
        And trained model provided in the db.
        I.e. all applications work with this db and use the same model instead of zip file on every client machine.
        I decide to retrain the model by some reasons and refresh it in the customer environment.
        In case...

      • Cesar De la Torre (Microsoft employee, Author)

        Well, it depends on the scenario. If the app is a web app or you have services (such as an ASP.NET WebAPI service), you can also load the ML.NET model from a remote HTTP endpoint like in the link below, but using FromUri() instead of .FromFile():

        https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/serve-model-web-api-ml-net

        But I agree that if you want to run code “Within SQL Server” (such as a C# SQL Server Function or Stored Procedure), it makes sense to have everything you...

      • Win Pooh

        Hi Cesar,
        Is this feature in your plans? When can we expect it? Thanks! 🙂

      • Win Pooh

        Thank you, great! I have added comment to the issue.

  • Win Pooh

    The new version is great! Especially new features to work with database.

    How to train and generate the model for Word2Vec ? And how to use it for words vectors like: king+woman-man=queen?
    I have build this for NET implementation of Word2Vec.
    I could not find how to do it in ML.NET. May be there are any samples?

    • Cesar De la Torre (Microsoft employee, Author)

      @Win Pooh - Thanks for your feedback. For featurizing text into numeric vectors, the way you do it in ML.NET is with the transformer-estimator called 'FeaturizeText', which you can use from 'mlContext.Transforms.Text.FeaturizeText'.

      You can see an example here (Sentiment Analysis):
      https://github.com/dotnet/machinelearning-samples/blob/master/samples/csharp/getting-started/BinaryClassification_SentimentAnalysis/SentimentAnalysis/SentimentAnalysisConsoleApp/Program.cs#L36

      However, simply featurizing text into numeric vectors might not be exactly what you mean? If so, can you elaborate further on the reasons why you think a specific Word2Vec DNN integration implementation would be important for...

      • Win Pooh

        @Cesar De la Torre Thank you for the answer.
        That is my idea:
        I have a book store database, one table contains books title, amount etc etc and description.
        User has found a book by description and wants to find top N similar (by description) books.
        Yes, it is not easy to implement, but I'd like to try to find some solutions. In any case it is interesting task :-).
        It may be Word2Vec based...

      • Cesar De la Torre (Microsoft employee, Author)

        We have customers using an approach similar to the one we use for the 'GitHub issues automatic labeling sample' (https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/end-to-end-apps/MulticlassClassification-GitHubLabeler), but for detecting/recognizing multiple languages (en, de, it, etc.) or many other business scenarios.
        It all depends on how the data is labeled.

        Or, if the data is not labeled, then you could featurize the text then do a "books segmentation" by using a clustering algorithm?

        Could you try an approach similar to the 'GitHub issues automatic labeling...

      • Win Pooh

        Ok, I will check these ideas and let you know, thank you.

  • LOST

    I hope you guys understand what you are doing when taking a dependency on TensorFlow.NET
    At least TensorFlowSharp was properly auto-generated from public C interface.

    TensorFlow.NET team is quite misleading. They advertise on their NuGet package, that they support full TensorFlow API. And you can find most classes in the package, but when you look at the source code, they are basically empty even with no NotImplementedException being thrown. For instance: https://github.com/SciSharp/TensorFlow.NET/blob/master/src/TensorFlowNET.Core/Train/GradientDescentOptimizer.cs

    And here's the real one:...

    • Zeeshan Siddiqui (Microsoft employee)

      @LOST_ Yes, we know what we are doing and we also know what you are doing :-) TensorFlowSharp is also a Microsoft product and its author himself has endorsed TF .NET, please see https://twitter.com/migueldeicaza/status/1157385979071778817. Let's not bash people like that; there is merit to the work TF .NET has done and we really appreciate our collaboration with them. We also appreciate the quick fixes they have made for us to make the product more reliable...

  • Hernando Zambrano

    Hi, great article! Is it possible to use the model in a Xamarin app?
    Thanks 

    • Cesar De la Torre (Microsoft employee, Author)

      Hi Hernando. Good point. So, currently even when ML.NET is cross platform (Windows, Linux, macOS) thanks to .NET Core, there are many internal ML algorithms implemented natively in C/C++. Those 'native' areas don't support ARM processors, currently. That means that running ML.NET on ARM based devices such as iOS or Android (Xamarin target) and also IoT ARM-based devices is not supported by ML.NET since in terms of processors we just support x64 and x86 (See...

      • Hernando Zambrano

        Thanks Cesar,

        Yes, I’ll definitely give feedback to support ARM processors.

  • Ashkan

    I want to create a bot using ML.NET. Can you tell me which features could be useful for this?

  • Stephen Eldridge

    It's a pity about
    Console.WriteLine($"Predicted Label: {clickPrediction.PredictedLabel} - Score:{Sigmoid(clickPrediction.Score)}", Color.YellowGreen);
    in the DatabaseLoader example. You must have meant to use the ColorfulConsole NuGet package. OK, but ConsoleHelper uses System.ConsoleColor. It is remarkable that one can see scores at all :-)
    It would also be good to know what kind of results you expect the program to produce. This also applies to practically every other sample program.

  • Mario M.

    There is a big problem with Model Builder: it does not support vector columns. I tried exporting the data to a text file directly from ML.NET with SaveAsText, and Model Builder cannot load the vector columns or identify the columns. The same text file can be loaded without any problem with ML.NET.
    Also with the SQL database I don't know how to create the data with vectors in order to...

    • Cesar De la Torre (Microsoft employee, Author)

      @Mario M. - For that problem with Model Builder, can you submit an issue at the Model Builder repo here?
      https://github.com/dotnet/machinelearning-modelbuilder

      I know Model Builder in VS lacks support for that feature (vector columns) as of today, but the more feedback we get from users requesting it, the higher priority it’ll have in our backlog. 🙂

  • King David Consulting LLC

    Great work! How about support for Azure Storage Tables or CosmosDb?

    • Cesar De la Torre (Microsoft employee, Author)

      If those data sources (Azure Storage Tables, Azure Cosmos DB, or any other data source) are important for your business scenarios in addition to relational databases, please submit an issue at the ML.NET GitHub repo requesting it and explaining why those sources would work better for your scenarios, etc., ok?

      That would help us prioritize our backlog based on the needs of users like you. 🙂

  • T.I Ali

    Has the Tensorflow .net library replaced the TensorFlowSharp version of the Tensorflow bindings?

    • Cesar De la Torre (Microsoft employee, Author)

      Not sure what you mean by ‘replaced’. In general? Within ML.NET?

      In any case, for ML.NET, it does. Originally we were using code from TensorFlowSharp, but since release 1.3.1, ML.NET takes a dependency on TensorFlow.NET, including when simply scoring a TensorFlow model.

  • Max Mustermueller

    I really find it hard to step into the world of ML.NET because, although I have a clear idea of what I want, I have no idea how to achieve it. There are sample applications that do work, but the syntax is too difficult and too sparsely explained for me to understand how to make it work for my purposes.

    Let's say I want to add face recognition. I have a bunch of pictures of myself and...

    • Cesar De la Torre (Microsoft employee, Author)

      Hi Max. Thanks a lot for your feedback. I completely agree. However, take into account that this feature (the new Image Classification API natively training a TensorFlow model under the covers) is still in preview, and we're actually working on further documentation and tutorials/walkthroughs to be published at https://docs.microsoft.com/en-us/dotnet/machine-learning/ when the API is GA. But today we're simply making the preview available for early adopters.
      In addition to that, that API...
