Announcing ML.NET 1.4 Preview and Model Builder updates (Machine Learning for .NET)

Cesar De la Torre

We are excited to announce ML.NET 1.4 Preview and updates to Model Builder and the ML.NET CLI.

ML.NET is an open-source, cross-platform machine learning framework for .NET developers. ML.NET also includes Model Builder (a simple UI tool) and the ML.NET CLI, which make it super easy to build custom Machine Learning (ML) models using Automated Machine Learning (AutoML).

Using ML.NET, developers can leverage their existing tools and skillsets to develop and infuse custom ML into their applications by creating custom machine learning models for common scenarios like Sentiment Analysis, Price Prediction, Sales Forecasting, Image Classification and more!

Following are some of the key highlights in this update:

ML.NET Updates

ML.NET 1.4 Preview is a backward-compatible release with no breaking changes, so please update to get the latest changes.

In addition to bug fixes described here, in ML.NET 1.4 Preview we have released some exciting new features that are described in the following sections.

Database Loader (Preview)

DatabaseLoader in ML.NET

This feature introduces a native database loader that enables training directly against relational databases. This loader supports any relational database provider supported by System.Data in .NET Core or .NET Framework, meaning that you can use any RDBMS such as SQL Server, Azure SQL Database, Oracle, SQLite, PostgreSQL, MySQL, Progress, IBM DB2, etc.

Since ML.NET 1.0, you could already train against a relational database by providing data through an IEnumerable collection with the LoadFromEnumerable() API, where the data could come from a relational database or any other source. With that approach, however, you as a developer are responsible for the code that reads from the relational database (using Entity Framework or any other approach), and that code must be implemented properly so that data is streamed while the ML model trains, as in this previous sample using LoadFromEnumerable().
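
For illustration only, here is a minimal sketch of the kind of streaming code that approach requires, assuming a data class like the SentimentData class shown later in this post and a table with FeedbackText and Label columns. The hypothetical StreamRows helper uses a C# iterator (yield return), so rows are read one at a time as they are consumed instead of being loaded into a list first. It takes a factory that opens a fresh IDataReader on each enumeration, because a trainer may iterate over the data more than once.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Mirrors the SentimentData class used in this post (assumed column names).
public class SentimentData
{
    public string FeedbackText;
    public string Label;
}

public static class SentimentRowStreamer
{
    // Hypothetical helper: lazily yields one SentimentData per row.
    // Nothing is read until the sequence is enumerated, and each new
    // enumeration calls openReader() again, so multiple passes work.
    public static IEnumerable<SentimentData> StreamRows(Func<IDataReader> openReader)
    {
        using (IDataReader reader = openReader())
        {
            int textOrdinal = reader.GetOrdinal("FeedbackText");
            int labelOrdinal = reader.GetOrdinal("Label");

            while (reader.Read())
            {
                yield return new SentimentData
                {
                    FeedbackText = reader.IsDBNull(textOrdinal) ? string.Empty : reader.GetString(textOrdinal),
                    Label = reader.IsDBNull(labelOrdinal) ? string.Empty : reader.GetString(labelOrdinal)
                };
            }
        }
    }
}
```

In an application, openReader could open a SqlConnection and return command.ExecuteReader(CommandBehavior.CloseConnection), so that disposing the reader also closes the connection; the resulting sequence would then be passed to mlContext.Data.LoadFromEnumerable(...).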

The new Database Loader makes the implementation much simpler, because ML.NET provides, out of the box, the code that reads from the database and exposes the data through an IDataView. You just need to specify your database connection string, the SQL statement that selects the dataset columns, and the data class to use when loading the data. It is that simple!

Here’s example code showing how easily you can now load data directly from a relational database into an IDataView, which will later be used to train your model.

// Lines of code for loading data from a database into an IDataView for later model training
using System.Data.SqlClient;   // SqlClientFactory
using Microsoft.ML;
using Microsoft.ML.Data;       // DatabaseLoader, DatabaseSource

var mlContext = new MLContext();

string connectionString = @"Data Source=YOUR_SERVER;Initial Catalog=YOUR_DATABASE;Integrated Security=True";
string commandText = "SELECT * FROM SentimentDataset";

// The generic argument tells the loader which class describes the columns to load
DatabaseLoader loader = mlContext.Data.CreateDatabaseLoader<SentimentData>();
DatabaseSource dbSource = new DatabaseSource(SqlClientFactory.Instance, connectionString, commandText);

IDataView trainingDataView = loader.Load(dbSource);

// ML.NET model training code using the training IDataView
//...

public class SentimentData
{
    public string FeedbackText;
    public string Label;
}

This feature is in preview and can be accessed via the Microsoft.ML.Experimental v0.16-Preview NuGet package available here.

For further learning see this complete sample app using the new DatabaseLoader.

Image classification with deep neural networks retraining (Preview)

This new feature enables native DNN transfer learning with ML.NET, targeting image classification as our first high-level scenario.

For instance, with this feature you can create your own custom image classifier model by natively training a TensorFlow model from the ML.NET API with your own images.

Image classifier scenario – Train your own custom deep learning model with ML.NET

In order to use TensorFlow, ML.NET internally takes a dependency on the TensorFlow.NET library.

TensorFlow.NET is an open-source, low-level API library that provides .NET Standard bindings for TensorFlow. It is part of the SciSharp stack of libraries.

Microsoft (the ML.NET team) is working closely with the TensorFlow.NET team, not just to provide higher-level APIs for ML.NET users (such as our new ImageClassification API) but also to help improve and evolve the TensorFlow.NET library as an open source project.

We would like to acknowledge the effort and say thank you to the Tensorflow.NET library team for their agility and great collaboration with us.

The stack diagram below shows how ML.NET implements these new DNN training features. Although we currently only support training TensorFlow models, PyTorch support is on the roadmap.

As the first main scenario for the high-level APIs, we are currently focusing on image classification. The goal of these new high-level APIs is to provide powerful, easy-to-use interfaces for DNN training scenarios such as image classification, object detection and text classification.

The API code example below shows how easily you can train a new TensorFlow model that, under the covers, uses transfer learning from a selected architecture (pre-trained model) such as Inception v3 or ResNet.

Image classifier high-level API code using transfer learning from the Inception v3 pre-trained model

// Convert the string "Label" column into a key-typed column the trainer can use
var pipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "LabelAsKey", inputColumnName: "Label")
               .Append(mlContext.Model.ImageClassification("ImagePath", "LabelAsKey",
                            arch: ImageClassificationEstimator.Architecture.InceptionV3));  // Can also use ResnetV2101

// Train the model
ITransformer trainedModel = pipeline.Fit(trainDataView);

The key line in the code above is the one using the mlContext.Model.ImageClassification trainer. As you can see, it is a high-level API: you just need to select the base pre-trained model to derive from, in this case Inception v3, although you could also select other pre-trained models such as ResNet v2 101. Inception v3 is a widely used image recognition model trained on the ImageNet dataset. These pre-trained models, or architectures, are the culmination of many ideas developed by multiple researchers over the years, and you can now easily take advantage of them.
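
The first stage of that pipeline, MapValueToKey, converts the string Label column into the key-typed LabelAsKey column that the trainer consumes. Conceptually, it collects the distinct label values and replaces each one with a 1-based integer key, with 0 reserved for missing or unseen values. The hypothetical ValueToKeySketch class below illustrates that idea in plain C#, assigning keys in order of first occurrence (ML.NET's default ordering); it is a simplification for intuition, not ML.NET's implementation.

```csharp
using System.Collections.Generic;

public sealed class ValueToKeySketch
{
    // Distinct label value -> 1-based key. Key 0 means "missing/unknown".
    private readonly Dictionary<string, uint> _keys = new Dictionary<string, uint>();

    // "Fit" step: assign keys to distinct values in order of first occurrence.
    public ValueToKeySketch(IEnumerable<string> trainingLabels)
    {
        foreach (string label in trainingLabels)
        {
            if (label != null && !_keys.ContainsKey(label))
                _keys[label] = (uint)_keys.Count + 1;
        }
    }

    public int KeyCount => _keys.Count;

    // "Transform" step: look up a value's key, or return 0 if it was never seen.
    public uint Map(string label) =>
        label != null && _keys.TryGetValue(label, out uint key) ? key : 0u;
}
```

For example, folder-based labels "dog", "cat", "dog", "bird" map to keys 1, 2, 1, 3, and a label that never appeared during training maps to 0.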

The DNN Image Classification training API is still in early preview, and we hope to get feedback from you that we can incorporate in upcoming releases.

For further learning see this sample app training a custom TensorFlow model with provided images.

Enhanced for .NET Core 3.0

ML.NET now builds for .NET Core 3.0. This means ML.NET can take advantage of new features when running in a .NET Core 3.0 application. The first new feature we are using is hardware intrinsics, which allow .NET code to accelerate math operations by using processor-specific instructions.

Of course, you can still run ML.NET on older versions of .NET, but when running on .NET Framework, or .NET Core 2.2 and below, ML.NET uses C++ code that is hard-coded to x86-based SSE instructions. SSE instructions allow four 32-bit floating-point numbers to be processed in a single instruction. Modern x86-based processors also support AVX instructions, which allow eight 32-bit floating-point numbers to be processed in one instruction. ML.NET’s C# hardware intrinsics code supports both AVX and SSE instructions and uses the best one available. This means that when training on a modern processor, ML.NET will now train faster, because it can perform more floating-point operations in parallel than the existing C++ code, which only supported SSE instructions.

Another advantage of the C# hardware intrinsics code is that when neither SSE nor AVX is supported by the processor, for example on an ARM chip, ML.NET falls back to doing the math operations one number at a time. This means more processor architectures are now supported by the core ML.NET components. (Note: Some components still don’t work on ARM processors, for example FastTree, LightGBM, and OnnxTransformer. These components are written in C++ code that is not currently compiled for ARM processors.)
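
The dispatch described above can be illustrated with a small, hypothetical helper that sums an array of floats: it uses AVX (eight 32-bit floats per instruction) when Avx.IsSupported is true, otherwise SSE (four floats), and otherwise a plain scalar loop, which is also the path taken on ARM. This is a simplified sketch that requires .NET Core 3.0 or later for the System.Runtime.Intrinsics APIs; it is not ML.NET’s own implementation.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class IntrinsicsSumSketch
{
    // Sums floats with the widest x86 SIMD instructions available at run time.
    public static float Sum(ReadOnlySpan<float> values)
    {
        float total = 0f;
        int processed = 0;

        if (Avx.IsSupported)
        {
            // View the floats as 256-bit vectors (8 floats each); any remainder is left over.
            ReadOnlySpan<Vector256<float>> blocks = MemoryMarshal.Cast<float, Vector256<float>>(values);
            Vector256<float> acc = Vector256<float>.Zero;
            foreach (Vector256<float> block in blocks)
                acc = Avx.Add(acc, block);          // 8 additions per instruction
            for (int j = 0; j < Vector256<float>.Count; j++)
                total += acc.GetElement(j);
            processed = blocks.Length * Vector256<float>.Count;
        }
        else if (Sse.IsSupported)
        {
            // Same idea with 128-bit vectors (4 floats each).
            ReadOnlySpan<Vector128<float>> blocks = MemoryMarshal.Cast<float, Vector128<float>>(values);
            Vector128<float> acc = Vector128<float>.Zero;
            foreach (Vector128<float> block in blocks)
                acc = Sse.Add(acc, block);          // 4 additions per instruction
            for (int j = 0; j < Vector128<float>.Count; j++)
                total += acc.GetElement(j);
            processed = blocks.Length * Vector128<float>.Count;
        }

        // Scalar fallback: the leftover tail, or every element on processors
        // (such as ARM) where neither branch above runs.
        for (int i = processed; i < values.Length; i++)
            total += values[i];

        return total;
    }
}
```

Because floating-point addition is not associative, the vectorized paths can round slightly differently from the scalar loop for arbitrary inputs; that is a normal trade-off of SIMD summation.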

For more information on how ML.NET uses the new hardware intrinsics APIs in .NET Core 3.0, please check out Brian Lui’s blog post Using .NET Hardware Intrinsics API to accelerate machine learning scenarios.

Model Builder in VS and CLI updated to latest GA version

The Model Builder tool in Visual Studio and the ML.NET CLI (both in preview) have been updated to use the latest ML.NET GA version (1.3) and address lots of customer feedback. Learn more about the changes here.

Model Builder updated to latest ML.NET GA version

Model Builder uses the latest GA version of ML.NET (1.3) and therefore the generated C# code also references ML.NET 1.3.

Improved support for other OS cultures

This addresses many frequently reported issues from developers who want to train a model in Model Builder using their own local OS culture settings. Please read this issue for more details.

Customer feedback addressed for Model Builder

There were many issues fixed in this release. Learn more in the release notes.

New sample apps

Coinciding with this new release, we’re also announcing new sample apps covering additional scenarios:

  Sales forecast scenario based on Time Series SSA (Singular Spectrum Analysis)
  Credit Card Fraud Detection scenario based on Anomaly Detection PCA
  Search engine sorted results scenario based on Ranking task
  Model Explainability and feature importance
  Database Loader (Native Database Loader for relational databases)
  Deep Learning training: Image Classification DNN re-train (Transfer Learning)
  Scalable ML.NET model on ASP.NET Core Razor web app (C#)
  Scalable ML.NET model on Azure Function (C#)

New ML.NET video playlist on YouTube

We have created an ML.NET YouTube playlist on the .NET Foundation channel with selected videos, each focusing on one particular ML.NET feature, so it is great for learning purposes.

Access the ML.NET YouTube playlist here.

Try ML.NET and Model Builder today!

Summary

We are excited to release these updates for you and we look forward to seeing what you will build with ML.NET. If you have any questions or feedback, you can ask them here for ML.NET and Model Builder.

Happy coding!

The ML.NET team.

This blog was authored by Cesar de la Torre and Eric Erhardt, with additional contributions from the ML.NET team.

Acknowledgements

  • As mentioned above, we would like to acknowledge the effort and say thank you to the TensorFlow.NET library team for their agility and great collaboration with us. Special kudos to Haiping (Oceania2018).
  • Special thanks to Jon Wood (@JWood) for his many great YouTube videos on ML.NET, which we also link to from the ML.NET YouTube playlist mentioned in this blog post. Thanks also for being an early adopter and tester of the new DatabaseLoader.

27 comments

  • Max Mustermueller

    I really find it hard to step into the world of ML.NET because, although I have a clear idea of what I want, I have no idea how to achieve it. There are sample applications that do work, but the syntax is too difficult and too poorly explained for me to understand how to adapt them to my purposes.

    Let’s say I want to add face recognition. I have a bunch of pictures of myself and of random people, and I want to find myself in pictures. Where do I start? Do I create a folder called “me” and put all the images in there? OK, and how do I train a machine so it knows it should identify my face in the pictures in the folder called “me”? And how do I run it against other pictures, telling the machine to focus on faces?

    There is an “Image Classification” example, but the explanation is terrible. This code:

    mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "LabelAsKey",
        inputColumnName: "Label",
        keyOrdinality: ValueToKeyMappingEstimator.KeyOrdinality.ByValue)
    .Append(mlContext.Model.ImageClassification("ImagePath", "LabelAsKey",
        arch: ImageClassificationEstimator.Architecture.InceptionV3,
        epoch: 100,
        batchSize: 30,
        metricsCallback: (metrics) => Console.WriteLine(metrics)));

    is explained as “you define the model’s training pipeline where you can see how easily you can train a new TensorFlow model”. Well, for someone who knows what these lines do it might be “EASY” to see, but I just ask myself: what the hell is outputColumnName supposed to be? arch? epoch? batchSize? metricsCallback? I would love to see a basic walkthrough of ML.NET for complete beginners that takes a real-world scenario and explains it from the ground up: what you need first, how the data should be stored so ML.NET can read it, then the syntax, and so on. As of now, all of these samples seem to be useful only for those who already understand ML.NET.

    • Cesar De la Torre (Microsoft employee)

      Hi Max. Thanks a lot for your feedback. I completely agree. However, keep in mind that this feature (the new Image Classification API, which natively trains a TensorFlow model under the covers) is still in Preview, and we’re working on further documentation and tutorials/walkthroughs to be published at https://docs.microsoft.com/en-us/dotnet/machine-learning/ when the API is GA. For now, we’re simply making the Preview available for early adopters.

      In addition, the API might get further simplifications and improvements; that’s why it is still in Preview. 🙂

      In the meantime, if you want to go ahead and try to create your own image classifier model with it, feel free to email me at cesardl at microsoft.com so I can answer any questions you might have about that code and help you while you also provide additional feedback.
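
For readers with the same question about epoch and batchSize: in DNN training, an epoch is usually one full pass over all the training images, and the batch size is the number of images processed before each weight update. The hypothetical loop below sketches only that bookkeeping in plain C# (MiniBatchSketch, Train and onEpochEnd are illustrative names, not ML.NET APIs). Its per-epoch callback plays a role similar in spirit to metricsCallback, although ML.NET’s actual callback may be invoked at different points and with different data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MiniBatchSketch
{
    // Hypothetical training-loop skeleton. Returns the total number of
    // (simulated) weight updates performed across all epochs.
    public static int Train<T>(IReadOnlyList<T> examples, int epochs, int batchSize,
                               Action<int, int> onEpochEnd = null)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var rng = new Random(42);
        int[] order = Enumerable.Range(0, examples.Count).ToArray();
        int updates = 0;

        for (int epoch = 0; epoch < epochs; epoch++)
        {
            // Shuffle the visiting order at the start of every epoch (Fisher-Yates).
            for (int i = order.Length - 1; i > 0; i--)
            {
                int j = rng.Next(i + 1);
                (order[i], order[j]) = (order[j], order[i]);
            }

            // Walk through the shuffled examples batchSize at a time.
            for (int start = 0; start < order.Length; start += batchSize)
            {
                List<T> batch = order.Skip(start).Take(batchSize).Select(k => examples[k]).ToList();
                // A real trainer would run a forward and backward pass on `batch`
                // here and adjust the model's weights once per batch.
                updates++;
            }

            // Report progress once per epoch (epoch index, updates so far).
            onEpochEnd?.Invoke(epoch, updates);
        }

        return updates;
    }
}
```

With 10 examples, a batch size of 3 and 2 epochs, each epoch has 4 batches (3 + 3 + 3 + 1), so the loop performs 8 updates in total.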

  • T.I Ali

    Has the TensorFlow.NET library replaced the TensorFlowSharp version of the TensorFlow bindings?

    • Cesar De la Torre (Microsoft employee)

      Not sure what you mean by ‘replaced’. In general? Within ML.NET?

      In any case, for ML.NET, it has. Originally we used code from TensorFlowSharp, but since release 1.3.1, ML.NET takes a dependency on TensorFlow.NET, including when simply scoring a TensorFlow model.

  • King David Consulting LLC

    Great work! How about support for Azure Storage Tables or CosmosDb?

    • Cesar De la Torre (Microsoft employee)

      If those data sources (Azure Storage Tables, Azure Cosmos DB, or any other data source) are important for your business scenarios in addition to relational databases, please submit an issue at the ML.NET GitHub repo requesting it and explaining why those sources would work better for your scenarios, etc., ok?

      That would help us prioritize our backlog based on the needs of users like you. 🙂

  • Mario M.

    There is a big problem with Model Builder: it does not support vector columns. I have tried exporting the data to a text file directly from ML.NET with SaveAsText, and Model Builder cannot load the vector columns and cannot identify the columns. The same text file can be loaded without any problem with ML.NET.
    Also, with the SQL database, I don’t know how to create the data with vectors in order to load it in Model Builder.

    • Cesar De la Torre (Microsoft employee)

      @Mario M. – For that problem with Model Builder, can you submit an issue at the Model Builder repo here?
      https://github.com/dotnet/machinelearning-modelbuilder

      I know Model Builder in VS lacks support for that feature (vector columns) as of today, but the more feedback we get from users requesting it, the higher priority it’ll have in our backlog. 🙂

  • Stephen Eldridge

    It’s a pity about
    Console.WriteLine($"Predicted Label: {clickPrediction.PredictedLabel} – Score:{Sigmoid(clickPrediction.Score)}", Color.YellowGreen);
    in the DatabaseLoader example. You must have meant to use the ColorfulConsole NuGet package. OK, but ConsoleHelper uses System.ConsoleColor. It is remarkable that one can see scores at all 🙂
    It would also be good to know what kind of results you expect the program to produce. This also applies to practically every other sample program.

  • Ashkan

    I want to create a bot using ML.NET. Can you tell me which features could be useful for this?

  • Hernando Zambrano

    Hi, great article! Is it possible to use the model in a Xamarin app?
    Thanks

    • Cesar De la Torre (Microsoft employee)

      Hi Hernando. Good point. Currently, even though ML.NET is cross-platform (Windows, Linux, macOS) thanks to .NET Core, many internal ML algorithms are implemented natively in C/C++, and those ‘native’ areas don’t support ARM processors yet. That means running ML.NET on ARM-based devices such as iOS or Android (Xamarin targets), as well as ARM-based IoT devices, is not supported, since in terms of processors we only support x64 and x86 (see our current OS and processor support here: https://github.com/dotnet/machinelearning#operating-systems-and-processor-architectures-supported-by-mlnet).

      Currently, there’s a clear workaround for you, which is to run the ML.NET models in HTTP services (or any remote solution running on a server) consumed by the Xamarin apps.

      However, support for ARM processors (and hence support for Xamarin) is on our roadmap and in our backlog.

      Importantly, if you want to influence our roadmap/backlog priorities, please provide your feedback/requests in the ML.NET repo as an issue, or add your feedback to existing issues.

      For this topic, please write your feedback, and why this scenario is important for you, in the following issue (feel free to re-open it or create a new issue and link to this one):

      https://github.com/dotnet/machinelearning/issues/1790

      Thanks for your feedback,

      • Hernando Zambrano

        Thanks Cesar,

        Yes, I’ll definitely give feedback on supporting ARM processors.

  • LOST

    I hope you guys understand what you are doing when taking a dependency on TensorFlow.NET.
    At least TensorFlowSharp was properly auto-generated from the public C interface.

    The TensorFlow.NET team is quite misleading. They advertise on their NuGet package that they support the full TensorFlow API. And you can find most classes in the package, but when you look at the source code, they are basically empty, without even a NotImplementedException being thrown. For instance: https://github.com/SciSharp/TensorFlow.NET/blob/master/src/TensorFlowNET.Core/Train/GradientDescentOptimizer.cs

    And here’s the real one, for comparison: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/training/gradient_descent.py

    I know because I am making a competing product, and I pointed this problem out to them before; since then they have implemented AdamOptimizer by hand-converting source code from Python. But will they maintain that converted code? I highly doubt it.

    So just beware that if you expose functionality in ML.NET through TensorFlow.NET, some pieces might silently do nothing.

    • Zeeshan Siddiqui (Microsoft employee)

      @LOST_ Yes, we know what we are doing, and we also know what you are doing 🙂 TensorFlowSharp is also a Microsoft product, and its author himself has endorsed TF .NET; please see https://twitter.com/migueldeicaza/status/1157385979071778817. Let’s not bash people like that; there is merit to the work TF .NET has done, and we really appreciate our collaboration with them. We also appreciate the quick fixes they have made for us to make the product more reliable, and we are committed to making TF .NET even better. At the end of the day, our high-level DNN APIs are benchmarked against the official Python TensorFlow APIs, and we ensure the performance and accuracy numbers match. Which C# TensorFlow bindings we use is an implementation detail; it does not matter if some bindings are missing as long as we don’t need them. That said, whenever we have needed bindings, the TF .NET team has provided them very quickly, and we really appreciate their effort.

  • Win Pooh

    The new version is great! Especially the new features for working with databases.

    How do I train and generate a Word2Vec model? And how do I use it for word-vector arithmetic like king+woman-man=queen?
    I have built this for a .NET implementation of Word2Vec.
    I could not find how to do it in ML.NET. Maybe there are some samples?

    • Cesar De la Torre (Microsoft employee)

      @Win Pooh – Thanks for your feedback. To featurize text into numeric vectors in ML.NET, you use the transformer-estimator called ‘FeaturizeText’, available as ‘mlContext.Transforms.Text.FeaturizeText’.

      You can see an example here (Sentiment Analysis):
      https://github.com/dotnet/machinelearning-samples/blob/master/samples/csharp/getting-started/BinaryClassification_SentimentAnalysis/SentimentAnalysis/SentimentAnalysisConsoleApp/Program.cs#L36

      However, simply featurizing text into numeric vectors might not be exactly what you mean. If so, can you elaborate on why you think a specific Word2Vec DNN integration would be important for ML.NET?

      Thanks for your feedback.
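
As a rough intuition for what featurizing text into numeric vectors means, the hypothetical helper below builds a vocabulary and turns each text into a vector of word counts (a plain bag of words). FeaturizeText does considerably more (for example, text normalization and word and character n-grams), so this only illustrates the idea; it is not FeaturizeText’s algorithm.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BagOfWordsSketch
{
    // Lowercases the text and extracts runs of word characters as tokens.
    public static IEnumerable<string> Tokenize(string text) =>
        Regex.Matches(text.ToLowerInvariant(), @"\w+").Cast<Match>().Select(m => m.Value);

    // Assigns each distinct token a fixed position in the output vector.
    public static Dictionary<string, int> BuildVocabulary(IEnumerable<string> documents)
    {
        var vocabulary = new Dictionary<string, int>();
        foreach (string document in documents)
            foreach (string token in Tokenize(document))
                if (!vocabulary.ContainsKey(token))
                    vocabulary[token] = vocabulary.Count;
        return vocabulary;
    }

    // Counts how often each vocabulary word appears; unknown words are ignored.
    public static float[] Featurize(string text, IReadOnlyDictionary<string, int> vocabulary)
    {
        var vector = new float[vocabulary.Count];
        foreach (string token in Tokenize(text))
            if (vocabulary.TryGetValue(token, out int index))
                vector[index] += 1f;
        return vector;
    }
}
```

Every text becomes a vector of the same length, which is what allows such features to be fed to a trainer or grouped with a clustering algorithm.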

      • Win Pooh

        @Cesar De la Torre Thank you for the answer.
        This is my idea:
        I have a bookstore database; one table contains book titles, amounts, etc., and descriptions.
        A user has found a book by its description and wants to find the top N books with similar descriptions.
        Yes, it is not easy to implement, but I’d like to try to find some solutions. In any case, it is an interesting task :-).
        It may be a Word2Vec-based solution like Doc2Vec, or something else.
        I have solved one subtask: an ML.NET-based library that can identify the language of a given piece of text.
        The model is trained to recognize 7 languages: en, de, es, it, ro, ru, uk, but it can be extended.
        By the way, TensorFlow.NET contains this example:
        https://github.com/SciSharp/TensorFlow.NET/blob/master/test/TensorFlowNET.Examples/TextProcessing/Word2Vec.cs

        • Cesar De la Torre (Microsoft employee)

          We have customers using an approach similar to the one we use for the ‘GitHub issues automatic labeling sample’ (https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/end-to-end-apps/MulticlassClassification-GitHubLabeler), but for detecting/recognizing multiple languages (en, de, it, etc.) or many other business scenarios.
          It all depends on how the data is labeled.

          Or, if the data is not labeled, you could featurize the text and then do a “books segmentation” using a clustering algorithm.

          Could you try an approach similar to the ‘GitHub issues automatic labeling sample’ I mentioned and, if that is not enough for you, send me feedback? Feel free to email me with further details at cesardl at microsoft.com.

          • Win Pooh

            Ok, I will check these ideas and let you know, thank you.

  • Win Pooh

    I ran the DatabaseIntegration demo. This is the code in Program.cs:

    Console.WriteLine("Training model…");
    var model = pipeline.Fit(trainTestData.TrainSet);

    i.e., the model is trained and created every time.
    The question: is it possible to save the model in a database once it has been trained and created, and then reuse the saved one?
    It can be saved to a zip file and loaded. But how do I save/load it to/from a database?

    • Cesar De la Torre (Microsoft employee)

      An ML.NET model can only be serialized and saved as a .ZIP file, which can be stored anywhere you can store files.
      The most common approach is to store it as a file among the assets/resources of an application, but it could also be stored as a blob in Azure Storage, or in any HTTP repository.

      As for storing the .zip file in a database, you could store it as a blob/binary column in a table, but you’d be responsible for the code that reads and saves that column. We currently don’t have any dedicated API for that.

      It is an interesting scenario, though. What are your motivations for doing this? Could you provide further feedback?
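
Following the suggestion of storing the model as a blob/binary column, a minimal sketch (assuming the application owns the database code) is to capture the model’s .zip content in memory as a byte array and, when loading, wrap the bytes read back from the column in a stream. ML.NET’s mlContext.Model.Save(model, inputSchema, stream) and mlContext.Model.Load(stream, out inputSchema) overloads accept a Stream; the hypothetical ModelBlobSketch helper below stays library-agnostic by taking the save step as a delegate, and the SQL that writes and reads the column is intentionally left out.

```csharp
using System;
using System.IO;

public static class ModelBlobSketch
{
    // Captures whatever the save callback writes as a byte array suitable for a
    // varbinary/BLOB column. MemoryStream.ToArray() still works if the callback
    // closes the stream.
    public static byte[] ToBytes(Action<Stream> saveToStream)
    {
        using (var buffer = new MemoryStream())
        {
            saveToStream(buffer);
            return buffer.ToArray();
        }
    }

    // Wraps bytes read back from the database in a read-only stream that can be
    // handed to a "load from stream" call.
    public static Stream FromBytes(byte[] blob) => new MemoryStream(blob, writable: false);
}
```

For example, ModelBlobSketch.ToBytes(s => mlContext.Model.Save(model, trainingData.Schema, s)) would produce the bytes to store, and mlContext.Model.Load(ModelBlobSketch.FromBytes(blobFromDb), out var schema) would restore the model, where blobFromDb is a placeholder for whatever the application reads from its table.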

      • Win Pooh

        Hi,
        OK, for example, my scenario is:
        I provide a client-server system to customers.
        The client app contains some ML methods to find similar texts, predict something, etc.
        And the trained model is stored in the db.
        I.e., all applications work with this db and use the same model instead of a zip file on every client machine.
        Suppose I decide to retrain the model for some reason and refresh it in the customer environment.
        With the database, I provide one script: the model is updated once, in one place, and the fresh model is available immediately.
        With a zip file, it has to be distributed to every client machine, and I can’t be sure it was refreshed on all of them.

        • Cesar De la Torre (Microsoft employee)

          Well, it depends on the scenario. If the app is a web app or you have services (such as an ASP.NET Web API service), you can also load the ML.NET model from a remote HTTP endpoint, as in the link below, but using .FromUri() instead of .FromFile():

          https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/serve-model-web-api-ml-net

          But I agree that if you want to run code “within SQL Server” (such as a C# SQL Server function or stored procedure), it makes sense to have everything you need secured and available in the database server, not just for scoring but also for saving the model after training, close to the database.

          Another scenario would be for traditional client/server apps with the client apps directly accessing a database…

          I’ve created this issue to continue the discussion in the open:

          https://github.com/dotnet/machinelearning/issues/4285

          Thanks for the feedback! 🙂

          • Win Pooh

            Thank you, great! I have added comment to the issue.

          • Win Pooh

            Hi Cesar,
            Is this feature in your plans? When can we expect it? Thanks! 🙂
