{"id":31052,"date":"2020-11-30T15:17:44","date_gmt":"2020-11-30T22:17:44","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/dotnet\/?p=31052"},"modified":"2020-11-30T17:01:46","modified_gmt":"2020-12-01T00:01:46","slug":"ml-net-model-builder-november-updates","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/ml-net-model-builder-november-updates\/","title":{"rendered":"ML.NET Model Builder November Updates"},"content":{"rendered":"<p><a href=\"https:\/\/dot.net\/ml\">ML.NET<\/a> is an open-source, cross-platform machine learning framework for .NET developers. It enables integrating machine learning into your .NET apps without requiring you to leave the .NET ecosystem or even have a background in ML or data science. ML.NET provides tooling (Model Builder UI in Visual Studio and the cross-platform ML.NET CLI) that automatically trains custom machine learning models for you based on your scenario and data.<\/p>\n<p>This release of ML.NET Model Builder brings numerous bug fixes and enhancements as well as new features, including advanced data loading options and streaming training data from SQL.<\/p>\n<p>In this post, we\u2019ll cover the following items:<\/p>\n<ol>\n<li><a href=\"#advanced-data-loading-options\">Advanced data loading options<\/a><\/li>\n<li><a href=\"#streaming-from-sql-server-with-database-loader\">Streaming from SQL Server with Database Loader<\/a><\/li>\n<li><a href=\"#feedback\">Feedback<\/a><\/li>\n<li><a href=\"#get-started-and-resources\">Get started and resources<\/a><\/li>\n<\/ol>\n<h2>Advanced data loading options<\/h2>\n<p>Previously, Model Builder did not offer any data loading options, relying on AutoML to detect column purpose, header, and separator as well as decimal separator style.<\/p>\n<p>Let\u2019s take a look at the new advanced data loading options in Model Builder using the <a 
href=\"https:\/\/github.com\/dotnet\/machinelearning-samples\/tree\/master\/samples\/csharp\/getting-started\/Regression_TaxiFarePrediction\">taxi fare dataset<\/a>. This is a regression problem where you predict the taxi fare amount based on several factors like distance traveled, payment type, and number of passengers.<\/p>\n<p>In Model Builder, after selecting the <strong>Value prediction<\/strong> scenario and the local training environment, you\u2019ll end up on the <em>Data<\/em> step. Choose <strong>File<\/strong> as the <em>Data source type<\/em>, browse for the taxi fare dataset, and once the dataset is selected, change the <em>Column to predict (Label)<\/em> to <strong>fare_amount<\/strong>.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2020\/11\/advanced-data-data.png\" alt=\"Data step in Model Builder\" \/><\/p>\n<p>Select <strong>Advanced data options<\/strong> to open the advanced data loading options dialog.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2020\/11\/advanced-data-col-purpose.png\" alt=\"Advanced data options column settings\" \/><\/p>\n<p>In this dialog, there are two sections: <em>Column settings<\/em> and <em>Data formatting<\/em>.<\/p>\n<h3>Column settings<\/h3>\n<p>In the <em>Column settings<\/em> section, you can change the column purpose of each Feature column (columns which are used to predict the Label) to <strong>Categorical<\/strong>, <strong>Text<\/strong>, <strong>Numerical<\/strong>, or <strong>Ignore<\/strong>:<\/p>\n<ul>\n<li><strong>Categorical<\/strong> columns contain data that is in a discrete number of labeled groups. For instance, Payment Type, which can be CSH (cash) or CRD (card), would be Categorical.<\/li>\n<li><strong>Text<\/strong> columns contain strings in the form of free-form text. 
For example, if you had a model that predicted whether reviews left by taxi passengers about their ride were positive or negative, the column which contains the free-form comments would have a column purpose of <strong>Text<\/strong>.<\/li>\n<li><strong>Numerical<\/strong> columns contain numbers only (floating point or integers). In the taxi fare example, trip distance and trip time are both <strong>Numerical<\/strong> columns.<\/li>\n<li>You can <strong>Ignore<\/strong> columns that you don\u2019t want to use for training.<\/li>\n<\/ul>\n<p>Normally, Model Builder does a suitable job of determining the column purpose, but there are cases where it might infer incorrectly or might choose a column purpose that gives slightly worse model performance. For instance, in the taxi fare example, Model Builder chooses <strong>Categorical<\/strong> for the passenger_count column, but this could also be a <strong>Numerical<\/strong> column.<\/p>\n<p>You can try training with the default settings chosen by Model Builder and then try changing the Column purpose of passenger_count to Numerical to see how it affects the model\u2019s performance.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2020\/11\/advanced-data-col-purpose-change.png\" alt=\"Advanced data options changing column purpose\" \/><\/p>\n<h3>Data formatting<\/h3>\n<p>In the <em>Data formatting<\/em> section, you can override the following data loading options chosen by Model Builder:<\/p>\n<ul>\n<li>Whether the dataset has column headers or not<\/li>\n<li>The column separator (comma, semicolon, or tab)<\/li>\n<li>The decimal separator (decimal dot or comma)<\/li>\n<\/ul>\n<p><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2020\/11\/advanced-data-formatting.png\" alt=\"Advanced data options data formatting\" \/><\/p>\n<p>As soon as you save the <em>Data formatting<\/em> options, you can see how they 
affect the dataset in the <em>Data Preview<\/em>.<\/p>\n<h2>Streaming from SQL Server with Database Loader<\/h2>\n<p>Model Builder now takes advantage of the <a href=\"https:\/\/docs.microsoft.com\/dotnet\/api\/microsoft.ml.data.databaseloader?view=ml-dotnet\">Database Loader<\/a>!<\/p>\n<p>Previously, if your training data was stored in SQL Server, Model Builder would download the data locally and then train. Now, Model Builder streams the training data directly from SQL Server without needing to load all of it into memory, so it can handle huge datasets up to terabytes in size.<\/p>\n<h2>Feedback<\/h2>\n<p>We would love to hear your feedback!<\/p>\n<p>If you run into any issues, please let us know by creating an issue in our GitHub repos (or use the new Feedback button in Model Builder!):<\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/dotnet\/machinelearning\">ML.NET API<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/dotnet\/machinelearning-modelbuilder\">ML.NET Tooling (Model Builder &amp; ML.NET CLI)<\/a><\/li>\n<\/ul>\n<h2>Get started and resources<\/h2>\n<p>Get started with ML.NET in this <a href=\"https:\/\/dotnet.microsoft.com\/learn\/ml-dotnet\/get-started-tutorial\/intro\">tutorial<\/a>.<\/p>\n<p>Learn more about ML.NET and Model Builder in <a href=\"https:\/\/docs.microsoft.com\/dotnet\/machine-learning\/\">Microsoft Docs<\/a>.<\/p>\n<p>Tune in to the <a href=\"https:\/\/dotnet.microsoft.com\/platform\/community\/standup\">Machine Learning .NET Community Standup<\/a> every other Wednesday at 10am Pacific Time.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This release of ML.NET Model Builder brings numerous bug fixes and enhancements as well as new features, including advanced data loading options and streaming training data from 
SQL.<\/p>\n","protected":false},"author":721,"featured_media":31053,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[685,688,691],"tags":[4,93,96,6098],"class_list":["post-31052","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dotnet","category-machine-learning","category-ml-dotnet","tag-net","tag-machine-learning","tag-ml-net","tag-model-builder"],"acf":[],"blog_post_summary":"<p>This release of ML.NET Model Builder brings numerous bug fixes and enhancements as well as new features, including advanced data loading options and streaming training data from SQL.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/31052","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/721"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=31052"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/31052\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/31053"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=31052"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=31052"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/tags?post=31052"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}