{"id":20633,"date":"2018-08-07T14:23:38","date_gmt":"2018-08-07T21:23:38","guid":{"rendered":"https:\/\/blogs.msdn.microsoft.com\/dotnet\/?p=18665"},"modified":"2019-02-22T14:19:21","modified_gmt":"2019-02-22T21:19:21","slug":"announcing-ml-net-0-4","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/announcing-ml-net-0-4\/","title":{"rendered":"Announcing ML.NET 0.4"},"content":{"rendered":"<p>A few months ago <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/introducing-ml-net-cross-platform-proven-and-open-source-machine-learning-framework\/\">we released ML.NET 0.1 at \/\/Build 2018<\/a>. ML.NET is a cross-platform, open source machine learning framework for .NET developers. We\u2019ve gotten great feedback so far and would like to thank the community for your engagement as we continue to develop ML.NET together in the open.<\/p>\n<p>We are happy to announce the latest version: <strong>ML.NET 0.4<\/strong>. In this release we\u2019ve improved support for natural language processing (NLP) scenarios by adding the <strong>Word Embedding Transform<\/strong>, sped up training of linear models for binary classification by adding the <strong>SymSGD<\/strong> learner, made <strong>improvements to the F# API and samples for ML.NET<\/strong>, and included bug fixes and more.<\/p>\n<p>Additionally, we want your feedback on making ML.NET easy to use. We are working on a new API that improves flexibility and ease of use. When the new API is ready, we plan to deprecate the current \u201cpipeline\u201d API. 
Because this will be a significant change, we plan to share our proposals for the different API options, along with comparisons, in a future blog post and to start an open discussion where you can give feedback and help shape the long-term API for ML.NET.<\/p>\n<p>The rest of this post provides more details about the additions in the 0.4 release.<\/p>\n<ul>\n<li><span><a href=\"#wordembed\">Word Embedding Transform for Text Scenarios<\/a><\/span><\/li>\n<li><span><a href=\"#symsgd\">SymSGD Learner for Binary Classification<\/a><\/span><\/li>\n<li><span><a href=\"#fsharp\">Improvements to F# API and samples for ML.NET<\/a><\/span><\/li>\n<\/ul>\n<h4 id=\"wordembed\">Word Embedding Transform for Text Scenarios<\/h4>\n<p><span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Word_embedding\">Word embedding<\/a><\/span> is a technique for mapping words to numeric vectors intended to capture some of the words\u2019 meaning, so that they can be used for visualization or model training.<\/p>\n<p>The <span><a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/microsoft.ml.transforms.wordembeddings?view=ml-dotnet\">word embedding transform<\/a><\/span> added to ML.NET lets you use pretrained word embedding models in pipelines. 
\u201cPretrained\u201d means you can use existing embeddings instead of creating your own (which takes a lot of data and time). Several pretrained models are available (<span><a href=\"https:\/\/nlp.stanford.edu\/projects\/glove\/\">GloVe<\/a><\/span>,\u00a0<span><a href=\"https:\/\/en.wikipedia.org\/wiki\/FastText\">fastText<\/a><\/span>, and\u00a0<span><a href=\"http:\/\/anthology.aclweb.org\/P\/P14\/P14-1146.pdf\">SSWE<\/a><\/span>).<\/p>\n<p>Using this transform alongside the existing transforms for working with text (like the <span><a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/api\/microsoft.ml.transforms.textfeaturizer?view=ml-dotnet\">TextFeaturizer<\/a><\/span>) can improve your model\u2019s metrics.<\/p>\n<p>For example, we can improve the accuracy of the <span><strong><a href=\"https:\/\/github.com\/dotnet\/machinelearning\/\">sentiment analysis sample<\/a><\/strong><\/span> by 5% by replacing the TextFeaturizer line with:<\/p>\n<pre><code>\r\n\/\/ Change TextFeaturizer to also output tokens (the list of words in the text);\r\n\/\/ the tokens are written to a column named \"FeaturesA_TransformedText\"\r\npipeline.Add(new TextFeaturizer(\"FeaturesA\", \"SentimentText\") { OutputTokens = true });\r\n\r\n\/\/ Map the tokens to pretrained word embeddings, stored in \"FeaturesB\"\r\npipeline.Add(new WordEmbeddings((\"FeaturesA_TransformedText\", \"FeaturesB\")));\r\n\r\n\/\/ Combine the features from the text featurizer and word embeddings into one column\r\npipeline.Add(new ColumnConcatenator(\"Features\", \"FeaturesA\", \"FeaturesB\"));\r\n<\/code><\/pre>\n<p>In the example above, we used the default word embeddings, SSWE (Sentiment-Specific Word Embeddings), which are well suited to sentiment tasks.<\/p>\n<h4 id=\"symsgd\">SymSGD Learner for Binary Classification<\/h4>\n<p><span><a href=\"https:\/\/arxiv.org\/abs\/1705.08030\">SymSGD<\/a><\/span> is a parallel SGD algorithm that retains the sequential semantics of SGD but achieves much better performance through multithreading. 
SymSGD is fast and scales well across multiple cores while achieving the same accuracy as sequential SGD. It is now available in ML.NET for binary classification.<\/p>\n<p>SymSGD builds on <span><a href=\"https:\/\/en.wikipedia.org\/wiki\/Stochastic_gradient_descent\">Stochastic Gradient Descent (SGD)<\/a><\/span>, a well-known and effective method for many machine learning problems such as regression and classification. However, the scalability of SGD is severely limited by its inherently sequential computation, because each update depends on the model produced by the previous one.<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/symSGD.png\"><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/symSGD.png\" alt=\"\" width=\"587\" height=\"294\" class=\"alignnone wp-image-17575\" \/><\/a><\/p>\n<p>The SymSGD approach applies to any linear learner whose update rule is linear, such as those for binary classification and linear regression.<\/p>\n<p>Here\u2019s how to add a SymSGD binary classifier learner to the pipeline:<\/p>\n<pre><code>\n\/\/ NumberOfThreads = 1 restricts training to a single thread\npipeline.Add(new SymSgdBinaryClassifier() { NumberOfThreads = 1 });\n<\/code><\/pre>\n<p>For more sample code using SymSGD, see <span><a href=\"https:\/\/github.com\/dotnet\/machinelearning\/blob\/f9d3973a056ad26bc6cc15c2d7a09f8ae47e30da\/test\/Microsoft.ML.Tests\/Scenarios\/SentimentPredictionTests.cs\">SentimentPredictionTests.cs<\/a><\/span>.<\/p>\n<p>The current implementation in ML.NET does not yet enable multithreading (tracked by issue\u00a0<span><a href=\"https:\/\/github.com\/dotnet\/machinelearning\/issues\/655\">#655<\/a><\/span>). Even so, SymSGD can be helpful in scenarios where you want to try many different learners and limit each of them to a single thread.<\/p>\n<h4 id=\"fsharp\">Improvements to F# API and samples for ML.NET<\/h4>\n<p><span><a href=\"https:\/\/github.com\/dsyme\">Don Syme<\/a><\/span> has been driving improvements to the overall F# story for ML.NET. 
As <span><a href=\"https:\/\/github.com\/isaacabraham\">Isaac\u2019s<\/a><\/span> issue pointed out, ML.NET did not support F# records. Work here is still ongoing, but with the 0.4 release ML.NET supports property-based row classes in F#. You can learn more about Don\u2019s work in <span><a href=\"https:\/\/github.com\/dotnet\/machinelearning\/pull\/616\">this PR<\/a><\/span>.<\/p>\n<p>As part of this change, we have also updated the ML.NET samples repo with an \u2018fsharp\u2019 language pivot, <a href=\"https:\/\/github.com\/dotnet\/machinelearning-samples\/tree\/master\/samples\/fsharp\/getting-started\">porting the existing samples to F#<\/a>. We would love for you to try them out and contribute more!<\/p>\n<p><a href=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/fsharpsamples.png\"><img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/wp-content\/uploads\/sites\/10\/2019\/02\/fsharpsamples.png\" alt=\"\" width=\"587\" height=\"294\" class=\"alignnone wp-image-17575\" \/><\/a><\/p>\n<h4>Help shape ML.NET for your needs<\/h4>\n<p>If you haven\u2019t already, try out ML.NET: you can <span><a href=\"https:\/\/www.microsoft.com\/net\/learn\/apps\/machine-learning-and-ai\/ml-dotnet\/get-started\/windows\">get started here<\/a><\/span>. We look forward to your feedback and welcome you to file issues with any suggestions or enhancements in the GitHub repo.<\/p>\n<p><a href=\"https:\/\/github.com\/dotnet\/machinelearning\">https:\/\/github.com\/dotnet\/machinelearning<\/a><\/p>\n<p><em>This blog post was authored by Cesar de la Torre, Gal Oshri and Ankit Asthana<\/em><\/p>\n<p>Thanks,\nML.NET Team<\/p>\n","protected":false},"excerpt":{"rendered":"<p>A few months ago we released ML.NET 0.1 at \/\/Build 2018. ML.NET is a cross-platform, open source machine learning framework for .NET developers. 
We\u2019ve gotten great feedback so far and would like to thank the community for your engagement as we continue to develop ML.NET together in the open. We are happy to announce the [&hellip;]<\/p>\n","protected":false},"author":362,"featured_media":58792,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[691],"tags":[1199],"class_list":["post-20633","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ml-dotnet","tag-ml-net-0-4"],"acf":[],"blog_post_summary":"<p>A few months ago we released ML.NET 0.1 at \/\/Build 2018. ML.NET is a cross-platform, open source machine learning framework for .NET developers. We\u2019ve gotten great feedback so far and would like to thank the community for your engagement as we continue to develop ML.NET together in the open. We are happy to announce the [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/20633","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/362"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=20633"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/20633\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/58792"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=20633"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=206
33"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/tags?post=20633"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}