We are excited to announce ML.NET 1.4 Preview and updates to Model Builder and CLI.
ML.NET is an open-source and cross-platform machine learning framework for .NET developers. ML.NET also includes Model Builder (a simple UI tool) and CLI to make it super easy to build custom Machine Learning (ML) models using Automated Machine Learning (AutoML).
Using ML.NET, developers can leverage their existing tools and skillsets to develop and infuse custom ML into their applications by creating custom machine learning models for common scenarios like Sentiment Analysis, Price Prediction, Sales Forecast prediction, Image Classification and more!
The following are some of the key highlights in this update:
ML.NET Updates
ML.NET 1.4 Preview is a backwards-compatible release with no breaking changes, so please update to get the latest changes.
In addition to bug fixes described here, in ML.NET 1.4 Preview we have released some exciting new features that are described in the following sections.
Database Loader (Preview)
This feature introduces a native database loader that enables training directly against relational databases. This loader supports any relational database provider supported by System.Data in .NET Core or .NET Framework, meaning that you can use any RDBMS such as SQL Server, Azure SQL Database, Oracle, SQLite, PostgreSQL, MySQL, Progress, IBM DB2, etc.
Since ML.NET 1.0, you could also train against a relational database by providing data through an IEnumerable collection with the LoadFromEnumerable() API, where the data could come from a relational database or any other source. However, with that approach you as a developer are responsible for the code that reads from the relational database (using Entity Framework or any other approach), and that code must be implemented properly so that data is streamed while the ML model trains, as in this previous sample using LoadFromEnumerable().
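As a rough illustration of what that hand-written approach involves, here is a minimal sketch, not the sample referenced above. It assumes the Microsoft.ML package, a table named SentimentDataset, and column names that match the fields of the SentimentData class; SentimentRows, Read and Load are hypothetical names chosen for this example.

```csharp
using System.Collections.Generic;
using System.Data.Common;
using Microsoft.ML;

public class SentimentData
{
    public string FeedbackText;
    public string Label;
}

public static class SentimentRows
{
    // Iterator method: rows are yielded one at a time, so the whole table
    // is never held in memory. The body runs again on every enumeration,
    // which matters because training may read the data more than once.
    public static IEnumerable<SentimentData> Read(DbProviderFactory factory, string connectionString)
    {
        using (DbConnection connection = factory.CreateConnection())
        {
            connection.ConnectionString = connectionString;
            connection.Open();

            using (DbCommand command = connection.CreateCommand())
            {
                // Assumed column names; adjust to your own schema.
                command.CommandText = "SELECT FeedbackText, Label FROM SentimentDataset";

                using (DbDataReader reader = command.ExecuteReader())
                {
                    while (reader.Read())
                    {
                        yield return new SentimentData
                        {
                            FeedbackText = reader.GetString(0),
                            Label = reader.GetString(1)
                        };
                    }
                }
            }
        }
    }

    // Wraps the lazy sequence in an IDataView; nothing is read until a trainer asks for rows.
    public static IDataView Load(MLContext mlContext, DbProviderFactory factory, string connectionString)
    {
        return mlContext.Data.LoadFromEnumerable(Read(factory, connectionString));
    }
}
```

Connection handling, the query, the row mapping and the streaming behavior are all your responsibility in this version.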
However, this new Database Loader is much simpler to use, since reading from the database and exposing the data through an IDataView is provided out of the box by the ML.NET framework. You only need to specify your database connection string, the SQL statement that selects the dataset columns, and the data class to use when loading the data. It is that simple!
Here’s example code showing how easily you can load data directly from a relational database into an IDataView, which is used later when training your model.
//Lines of code for loading data from a database into an IDataView for a later model training
string connectionString = @"Data Source=YOUR_SERVER;Initial Catalog=YOUR_DATABASE;Integrated Security=True";
string commandText = "SELECT * from SentimentDataset";

// The type argument tells the loader which data class describes the columns
DatabaseLoader loader = mlContext.Data.CreateDatabaseLoader<SentimentData>();
DatabaseSource dbSource = new DatabaseSource(SqlClientFactory.Instance, connectionString, commandText);

IDataView trainingDataView = loader.Load(dbSource);

// ML.NET model training code using the training IDataView
//...

public class SentimentData
{
    public string FeedbackText;
    public string Label;
}
This feature is in preview and can be accessed via the Microsoft.ML.Experimental v0.16-Preview NuGet package available here.
For further learning see this complete sample app using the new DatabaseLoader.
Image classification with deep neural networks retraining (Preview)
This new feature enables native DNN transfer learning with ML.NET, targeting image classification as our first high level scenario.
For instance, with this feature you can create your own custom image classifier model by natively training a TensorFlow model from ML.NET API with your own images.
Image classifier scenario – Train your own custom deep learning model with ML.NET
In order to use TensorFlow, ML.NET internally takes a dependency on the TensorFlow.NET library.
The TensorFlow.NET library is an open-source, low-level API library that provides the .NET Standard bindings for TensorFlow. It is part of the SciSharp stack of libraries.
Microsoft (the ML.NET team) is working closely with the TensorFlow.NET library team, not just to provide higher-level APIs for ML.NET users (such as our new ImageClassification API) but also to help improve and evolve the TensorFlow.NET library as an open source project.
We would like to acknowledge the effort of the TensorFlow.NET library team and thank them for their agility and great collaboration with us.
The stack diagram below shows how ML.NET implements these new DNN training features. Although we currently only support training TensorFlow models, PyTorch support is on the roadmap.
As the first main scenario for high level APIs, we are currently focusing on image classification. The goal of these new high-level APIs is to provide powerful and easy to use interfaces for DNN training scenarios like image classification, object detection and text classification.
The API code example below shows how easily you can train a new TensorFlow model which, under the covers, is based on transfer learning from a selected architecture (pre-trained model) such as Inception v3 or ResNet.
Image classifier high level API code using transfer learning from Inceptionv3 pre-trained model
var pipeline = mlContext.Transforms.Conversion.MapValueToKey(outputColumnName: "LabelAsKey", inputColumnName: "Label")
    .Append(mlContext.Model.ImageClassification("ImagePath", "LabelAsKey",
        arch: ImageClassificationEstimator.Architecture.InceptionV3)); //Can also use ResnetV2101

// Train the model
ITransformer trainedModel = pipeline.Fit(trainDataView);
The important line in the above code is the one using the mlContext.Model.ImageClassification classifier trainer. As you can see, it is a high-level API where you just need to select the base pre-trained model to derive from, in this case Inception v3, but you could also select other pre-trained models such as ResNet v2 101. Inception v3 is a widely used image recognition model trained on the ImageNet dataset. Those pre-trained models or architectures are the culmination of many ideas developed by multiple researchers over the years, and you can easily take advantage of them now.
The DNN Image Classification training API is still in early preview, and we hope to get feedback from you that we can incorporate in upcoming releases.
For further learning see this sample app training a custom TensorFlow model with provided images.
Enhanced for .NET Core 3.0
ML.NET is now building for .NET Core 3.0. This means ML.NET can take advantage of the new features when running in a .NET Core 3.0 application. The first new feature we are using is the new hardware intrinsics feature, which allows .NET code to accelerate math operations by using processor specific instructions.
Of course, you can still run ML.NET on older versions, but when running on .NET Framework, or .NET Core 2.2 and below, ML.NET uses C++ code that is hard-coded to x86-based SSE instructions. SSE instructions allow for four 32-bit floating-point numbers to be processed in a single instruction. Modern x86-based processors also support AVX instructions, which allow for processing eight 32-bit floating-point numbers in one instruction. ML.NET’s C# hardware intrinsics code supports both AVX and SSE instructions and will use the best one available. This means when training on a modern processor, ML.NET will now train faster because it can do more concurrent floating-point operations than it could with the existing C++ code that only supported SSE instructions.
Another advantage the C# hardware intrinsics code brings is that when neither SSE nor AVX is supported by the processor, for example on an ARM chip, ML.NET will fall back to doing the math operations one number at a time. This means more processor architectures are now supported by the core ML.NET components. (Note: There are still some components that don’t work on ARM processors, for example FastTree, LightGBM, and OnnxTransformer. These components are written in C++ code that is not currently compiled for ARM processors.)
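To make the AVX, SSE and one-number-at-a-time paths concrete, here is a minimal sketch of that kind of dispatch using the .NET Core 3.0 hardware intrinsics APIs. It is not ML.NET's actual implementation: VectorMath and Sum are hypothetical names, the operation (summing an array) was chosen only for brevity, and the project is assumed to target .NET Core 3.0 or later with unsafe code enabled (AllowUnsafeBlocks).

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class VectorMath
{
    // Sums a float array using the widest instruction set the processor offers.
    public static unsafe float Sum(float[] values)
    {
        float total = 0f;
        int i = 0;

        fixed (float* p = values)
        {
            if (Avx.IsSupported)
            {
                // AVX: eight 32-bit floats per instruction
                Vector256<float> acc = Vector256<float>.Zero;
                for (; i + 8 <= values.Length; i += 8)
                    acc = Avx.Add(acc, Avx.LoadVector256(p + i));
                for (int lane = 0; lane < 8; lane++)
                    total += acc.GetElement(lane);
            }
            else if (Sse.IsSupported)
            {
                // SSE: four 32-bit floats per instruction
                Vector128<float> acc = Vector128<float>.Zero;
                for (; i + 4 <= values.Length; i += 4)
                    acc = Sse.Add(acc, Sse.LoadVector128(p + i));
                for (int lane = 0; lane < 4; lane++)
                    total += acc.GetElement(lane);
            }
        }

        // Scalar path: handles the leftover elements, and the whole array
        // when neither AVX nor SSE is available (for example on ARM).
        for (; i < values.Length; i++)
            total += values[i];

        return total;
    }
}
```

Calling VectorMath.Sum(new float[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 }) returns 55 whichever path is taken; only the number of values processed per instruction differs.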
For more information on how ML.NET uses the new hardware intrinsics APIs in .NET Core 3.0, please check out Brian Lui’s blog post Using .NET Hardware Intrinsics API to accelerate machine learning scenarios.
Model Builder in VS and CLI updated to latest GA version
The Model Builder tool in Visual Studio and the ML.NET CLI (both in preview) have been updated to use the latest ML.NET GA version (1.3) and address lots of customer feedback. Learn more about the changes here.
Model Builder updated to latest ML.NET GA version
Model Builder uses the latest GA version of ML.NET (1.3) and therefore the generated C# code also references ML.NET 1.3.
Improved support for other OS cultures
This addresses many frequently reported issues where developers want to use their own local culture OS settings to train a model in Model Builder. Please read this issue for more details.
Customer feedback addressed for Model Builder
There were many issues fixed in this release. Learn more in the release notes.
New sample apps
Coinciding with this new release, we’re also announcing new interesting sample apps covering additional scenarios:
New ML.NET video playlist at YouTube
We have created an ML.NET YouTube playlist on the .NET Foundation channel made up of selected videos, each focusing on a single ML.NET feature, so it is great for learning purposes.
Access the ML.NET YouTube playlist here.
Try ML.NET and Model Builder today!
- Get started with ML.NET here.
- Get started with Model Builder here.
- Refer to documentation for tutorials and more resources.
- Learn from sample apps for different scenarios using ML.NET.
Summary
We are excited to release these updates for you and we look forward to seeing what you will build with ML.NET. If you have any questions or feedback, you can ask them here for ML.NET and Model Builder.
Happy coding!
The ML.NET team.
This blog was authored by Cesar de la Torre and Eric Erhardt plus additional contributions of the ML.NET team.
Acknowledgements
- As mentioned above, we would like to acknowledge the effort of the TensorFlow.NET library team and thank them for their agility and great collaboration with us. Special kudos to Haiping (Oceania2018).
- Special thanks to Jon Wood (@JWood) for his many great YouTube videos on ML.NET, which we also point to from the ML.NET YouTube playlist mentioned in this blog post. Also, thanks for being an early adopter and tester of the new DatabaseLoader.
I ran the DatabaseIntegration demo; there is this code in Program.cs:
…
Console.WriteLine("Training model...");
var model = pipeline.Fit(trainTestData.TrainSet);
…
i.e. the model is trained and created on every run.
The question: Is it possible to save the model in a database once it has been trained and created, and then reuse the saved one?
It can be saved to a zip file and loaded. But how do I save it to and load it from a database?
An ML.NET model can only be serialized and saved as a .ZIP file which can be stored in any place you can store files.
The most common approach is to store it as a file as part of the assets/resources of an application but it could also be stored as a file in a Blob in Azure Storage blobs, or in any Http repository.
About storing the .zip file as into a database.. could be if...
Hi,
Ok, for example, my scenario is:
I provide a client->server system to customers.
The client app contains some ML methods to find similar texts, to predict something etc.
And the trained model is provided in the db.
I.e. all applications work with this db and use the same model instead of a zip file on every client machine.
I decide to retrain the model for some reason and refresh it in the customer environment.
In case...
Well, it depends on the scenario. If the app is a web app or you have services (such as an ASP.NET WebAPI service), you can also load the ML.NET model from a remote HTTP endpoint like in the link below, but using FromUri() instead of .FromFile():
https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/serve-model-web-api-ml-net
But I agree that if you want to run code “Within SQL Server” (such as a C# SQL Server Function or Stored Procedure), it makes sense to have everything you...
Hi Cesar,
Is this feature in your plans? When can we expect it? Thanx! 🙂
Thank you, great! I have added comment to the issue.
The new version is great! Especially new features to work with database.
How do I train and generate a model for Word2Vec? And how do I use it for word vectors like: king+woman-man=queen?
I have built this .NET implementation of Word2Vec.
I could not find how to do it in ML.NET. Maybe there are some samples?
@Win Pooh - Thanks for your feedback. About featurizing text into numeric vectors, the way you do it in ML.NET is with the transformer-estimator called 'FeaturizeText' that you can use from: 'mlContext.Transforms.Text.FeaturizeText'.
You can see an example here (Sentiment Analysis):
https://github.com/dotnet/machinelearning-samples/blob/master/samples/csharp/getting-started/BinaryClassification_SentimentAnalysis/SentimentAnalysis/SentimentAnalysisConsoleApp/Program.cs#L36
However, simply featurizing text into numeric vectors might not be exactly what you mean? If so, can you elaborate further on the reasons why you think a specific Word2Vec DNN integration implementation would be important for...
@Cesar De la Torre Thank you for the answer.
That is my idea:
I have a book store database; one table contains each book's title, amount, etc., and description.
A user has found a book by its description and wants to find the top N similar books (by description).
Yes, it is not easy to implement, but I'd like to try to find some solutions. In any case it is an interesting task :-).
It may be Word2Vec based...
We have customers using an approach similar to the one we use for the 'GitHub issues automatic labeling sample' (https://github.com/dotnet/machinelearning-samples/tree/master/samples/csharp/end-to-end-apps/MulticlassClassification-GitHubLabeler) but for detecting/recognizing multiple languages (en, de, it, etc...) or many other business scenarios.
It all depends on how the data is labeled.
Or, if the data is not labeled, then you could featurize the text then do a "books segmentation" by using a clustering algorithm?
Could you try a similar approach than the 'GitHub issues automatic labeling...
Ok, I will check these ideas and let you know, thank you.
I hope you guys understand what you are doing when taking a dependency on TensorFlow.NET
At least TensorFlowSharp was properly auto-generated from public C interface.
TensorFlow.NET team is quite misleading. They advertise on their NuGet package, that they support full TensorFlow API. And you can find most classes in the package, but when you look at the source code, they are basically empty even with no NotImplementedException being thrown. For instance: https://github.com/SciSharp/TensorFlow.NET/blob/master/src/TensorFlowNET.Core/Train/GradientDescentOptimizer.cs
And here's the real one:...
@LOST_ Yes, we know what we are doing and we also know what you are doing :-) TensorFlowSharp is also a Microsoft product and its author himself has endorsed TF .NET, please see https://twitter.com/migueldeicaza/status/1157385979071778817. Lets not bash people like that, there is merit to the work TF .NET has done and we really appreciate our collaboration with them. We also appreciate the quick fixes they have made for us to make the product more reliable...
Hi, great article. Is it possible to use the model in a Xamarin app?
Thanks
Hi Hernando. Good point. So, currently, even though ML.NET is cross platform (Windows, Linux, macOS) thanks to .NET Core, there are many internal ML algorithms implemented natively in C/C++. Those 'native' areas don't currently support ARM processors. That means that running ML.NET on ARM-based devices such as iOS or Android (Xamarin target) and also IoT ARM-based devices is not supported by ML.NET, since in terms of processors we just support x64 and x86 (See...
Thanks Cesar,
Yes I’ll definitely give feedback to support ARM processors .
I want to create a bot using ML.NET, can you tell me what features can be useful for this?
@Ashkan - Great question! - You can run an ML.NET model to make predictions within a Bot very easily. Basically, a Bot when using the Microsoft Bot Framework (https://dev.botframework.com/) is nothing more than an ASP.NET Core WebAPI, and since you can run ML.NET on any .NET application as long as it is .NET Core or .NET Framework and running on x64 or x86 (See details on ML.NET support here: https://github.com/dotnet/machinelearning#operating-systems-and-processor-architectures-supported-by-mlnet), therefore you can do it very...
It's a pity about
Console.WriteLine($"Predicted Label: {clickPrediction.PredictedLabel} - Score:{Sigmoid(clickPrediction.Score)}", Color.YellowGreen);
in the DatabaseLoader example. You must have meant to use the ColorfulConsole NuGet package. OK, but ConsoleHelper uses System.ConsoleColor. It is remarkable that one can see scores at all :-)
It would also be good to know what kind of results you expect the program to produce. This also applies to practically every other sample program.
There is a big problem with Model Builder: it does not support vector columns. I have tried to export the data to a text file directly from ML.NET with SaveAsText, and Model Builder cannot load the vector columns and cannot identify the columns. The same data text file can be loaded without any problem with ML.NET.
Also with the SQL database I don't know how to create the data with vectors in order to...
@Mario M. – For that problem with Model Builder, can you submit an issue at the Model Builder repo here?:
https://github.com/dotnet/machinelearning-modelbuilder
I know Model Builder in VS lacks support for that feature (vector columns) as of today, but the more feedback we get from users requesting it, the higher priority it’ll have in our backlog. 🙂
Great work! How about support for Azure Storage Tables or CosmosDb?
If those data sources (Azure Storage Tables, Azure Cosmos DB or any other data source) are important for your business scenarios, in addition to relational databases, please submit an issue at the ML.NET GitHub repo requesting it and explaining why those sources would work better for your scenarios, etc., ok?
That would help us prioritize our backlog depending on the needs of users like you. 🙂
Has the TensorFlow.NET library replaced the TensorFlowSharp version of the TensorFlow bindings?
Not sure what you mean by ‘replaced’. In general? Within ML.NET?
In any case, for ML.NET, it has. Originally we were using code from TensorFlowSharp, but since release 1.3.1, ML.NET takes a dependency on TensorFlow.NET, including when simply scoring a TensorFlow model.
I really find it hard to step into the world of ML.NET because, although I have a clear idea of what I want, I have no idea how to achieve it. There are sample applications that do work, but the syntax is too difficult and too sparsely explained for me to understand how to make this work for my purposes.
Let's say I want to add face recognition. I have a bunch of pictures of myself and...
Hi Max. Thanks a lot for your feedback. I completely agree with it. However, take into account that this feature (the new Image Classification API natively training a TensorFlow model under the covers) is still in Preview, and we're actually working on further documentation and tutorials/walkthroughs to be published at https://docs.microsoft.com/en-us/dotnet/machine-learning/ when the API is GA. But today we're simply making the Preview available for early adopters.
In addition to that, that API...