{"id":12717,"date":"2019-11-06T10:01:31","date_gmt":"2019-11-06T18:01:31","guid":{"rendered":"http:\/\/devblogs.microsoft.com\/cesardelatorre\/?p=12717"},"modified":"2020-03-02T06:38:54","modified_gmt":"2020-03-02T14:38:54","slug":"using-ml-net-in-jupyter-notebooks","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/cesardelatorre\/using-ml-net-in-jupyter-notebooks\/","title":{"rendered":"Using ML.NET in Jupyter notebooks"},"content":{"rendered":"<p><img decoding=\"async\" class=\"alignnone wp-image-12760\" src=\"http:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/00-mlnet-jupyter-logos.png\" alt=\"\" width=\"207\" height=\"84\" srcset=\"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/00-mlnet-jupyter-logos.png 539w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/00-mlnet-jupyter-logos-300x122.png 300w\" sizes=\"(max-width: 207px) 100vw, 207px\" \/><\/p>\n<p>I do believe this is great news for the ML.NET community and .NET in general. You can now run .NET code (C# \/ F#) in Jupyter notebooks and therefore run ML.NET code in it as well! 
&#8211; Under the covers, this is enabled by &#8216;dotnet-try&#8217; and its related .NET kernel for Jupyter (as early previews).<\/p>\n<p>The <a href=\"https:\/\/jupyter.org\/\">Jupyter Notebook<\/a> is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.<\/p>\n<p>In terms of ML.NET this is awesome for many scenarios like data exploration, data cleaning, plotting data charts, documenting model experiments, learning scenarios such as courses or hands-on labs, quizzes, etc.<\/p>\n<h2>Show me the code and run it!<\/h2>\n<p>Although the following steps show most of the code, it is always useful, especially with Jupyter notebooks, to have the notebook itself and simply run it!<\/p>\n<p>I set up a Jupyter environment in <strong>MyBinder<\/strong> (a public service on the Internet), which is a great way to try notebooks if you don&#8217;t have Jupyter set up on your own machine. You can run it by simply clicking the link below:<\/p>\n<p><a href=\"https:\/\/mybinder.org\/v2\/gh\/CESARDELATORRE\/mlnet-on-jupyter-samples\/master\">\n<img decoding=\"async\" src=\"https:\/\/mybinder.org\/badge_logo.svg\" alt=\"\" width=\"109\" height=\"20\" \/>\u00a0<\/a><a href=\"https:\/\/mybinder.org\/v2\/gh\/CESARDELATORRE\/mlnet-on-jupyter-samples\/master\"> Ready-to-run ML.NET Jupyter notebook at MyBinder<\/a><\/p>\n<p>In that ready-to-run Jupyter notebook you can directly try ML.NET code, plot charts from C#, display the training time and quality metrics, etc. 
as shown in the image below:<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-12786\" src=\"http:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/000-mybinder-initial-global-sample.png\" alt=\"\" width=\"499\" height=\"458\" srcset=\"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/000-mybinder-initial-global-sample.png 589w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/000-mybinder-initial-global-sample-300x276.png 300w\" sizes=\"(max-width: 499px) 100vw, 499px\" \/><\/p>\n<p>You can also download the Jupyter notebook with the ML.NET code that I&#8217;m using in this blog post from <a href=\"https:\/\/github.com\/CESARDELATORRE\/mlnet-on-jupyter-samples\/tree\/master\/NotebookExamples\">here (MLNET-Jupyter-Demo.ipynb).<\/a><\/p>\n<p>Note that if your MyBinder environment stays inactive for some time, it will be shut down. Therefore, if you want to have a stable environment you might want to set it up on your own machine, as explained below.<\/p>\n<h2>Setting it up on your local machine<\/h2>\n<p>If you want to set it up on your local machine\/PC, you need to:<\/p>\n<ul>\n<li>Install Jupyter (the easiest way is to install Anaconda).<\/li>\n<li>Install the \u2018dotnet try\u2019 global tool.<\/li>\n<li>Enable the .NET kernel for Jupyter.<\/li>\n<\/ul>\n<h2>Install Jupyter on your machine<\/h2>\n<p>The easiest and recommended way to install Jupyter notebooks is by installing <strong>Anaconda<\/strong> <code>(conda)<\/code> but you can also use <code>pip<\/code>.<\/p>\n<p>Installing Anaconda also installs Python. 
However, I want to highlight that ML.NET doesn&#8217;t have any dependency on Python, but Jupyter does.<\/p>\n<p>For more details on how to install Anaconda and Jupyter please check out the <a href=\"https:\/\/test-jupyter.readthedocs.io\/en\/latest\/install.html\" rel=\"nofollow\">Jupyter installation guide<\/a>.<\/p>\n<h2>Install the &#8216;dotnet try&#8217; tool<\/h2>\n<p>The Jupyter kernel for .NET is based on the &#8216;dotnet try&#8217; tool, which you need to install first.<\/p>\n<p>The &#8216;dotnet try&#8217; tool is a CLI Global Tool so you install it with the &#8216;dotnet CLI&#8217;.<\/p>\n<p>Since these versions are early previews, they are still not in NuGet.org but in MyGet, therefore you need to provide the MyGet feed, like in the following CLI line:<\/p>\n<p><code>dotnet tool install -g dotnet-try<\/code><\/p>\n<p><em>Note: If you have the dotnet try global tool already installed, you will need to uninstall it before grabbing the kernel-enabled version of the dotnet try global tool.<\/em><\/p>\n<p><strong>List<\/strong> what Global Tools you have installed:<code>\ndotnet tool list -g\n<\/code><\/p>\n<p><strong>Update<\/strong> dotnet-try:<code>\ndotnet tool update -g dotnet-try\n<\/code>\n<strong>Uninstall<\/strong>:<code>\ndotnet tool uninstall dotnet-try -g\n<\/code><\/p>\n<h3><strong>Issues and Open Source code<\/strong><\/h3>\n<p>The &#8216;dotnet try&#8217; tool open source repo is here: <a href=\"https:\/\/github.com\/dotnet\/try\">https:\/\/github.com\/dotnet\/try<\/a>. 
You can research there for deeper details about it.<\/p>\n<p><strong>Issues and Feedback: <\/strong>If you have any issue with dotnet-try or the .NET kernel on Jupyter, please post it here: \u00a0<a href=\"https:\/\/github.com\/dotnet\/try\/issues\">https:\/\/github.com\/dotnet\/try\/<strong>issues<\/strong><\/a><\/p>\n<h2>Install the .NET kernel in Jupyter<\/h2>\n<ul>\n<li>If you have Jupyter using Anaconda then you should execute the commands below inside the <strong>Anaconda command prompt<\/strong>\n<ul>\n<li>Run the following command\u00a0<code>dotnet try jupyter install<\/code><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><img decoding=\"async\" class=\"alignnone size-full\" src=\"https:\/\/user-images.githubusercontent.com\/2546640\/63954737-93106e00-ca51-11e9-8c72-939f3f558d05.png\" width=\"735\" height=\"147\" \/><\/p>\n<h2>Test that it is working<\/h2>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li>Start the Anaconda Navigator app (Double click on \u2018Anaconda Navigator\u2019 icon)<\/li>\n<li>Launch Jupyter from the \u2018Launch\u2019 button in the \u2018Jupyter Notebook\u2019 tile.<\/li>\n<li>Alternatively, from the Anaconda Prompt you can also start Jupyter by typing the following command positioned at your user&#8217;s home path.:<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p style=\"padding-left: 120px;\"><code>jupyter notebook<\/code><\/p>\n<ul>\n<li>You will see Jupyter and your User\u2019s folders by default.<\/li>\n<li>Open the &#8216;New&#8217; menu option and you should see the &#8216;.NET (C#)&#8217; and &#8216;.NET (F#)&#8217; menu options:<img decoding=\"async\" class=\"alignnone wp-image-12721\" src=\"http:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/dotnet-kernel-in-jupyter.png\" alt=\"\" width=\"682\" height=\"240\" srcset=\"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/dotnet-kernel-in-jupyter.png 1841w, 
https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/dotnet-kernel-in-jupyter-300x105.png 300w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/dotnet-kernel-in-jupyter-768x269.png 768w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/dotnet-kernel-in-jupyter-1024x359.png 1024w\" sizes=\"(max-width: 682px) 100vw, 682px\" \/><\/li>\n<li>\u00a0Select &#8216;.NET (C#)&#8217; and start hacking in C# in a new Jupyter notebook! \ud83d\ude42<\/li>\n<li>For instance, you can test that C# is working with simple code like the following:<img decoding=\"async\" class=\"alignnone size-full\" src=\"https:\/\/user-images.githubusercontent.com\/2546640\/63956636-d0c2c600-ca54-11e9-9559-c2cc2e18d882.png\" width=\"1449\" height=\"329\" \/><\/li>\n<\/ul>\n<p>Ok, let&#8217;s hack for a while and start writing ML.NET C# code in a Jupyter notebook! \ud83d\ude42<\/p>\n<h2>Install NuGet packages in your notebook<\/h2>\n<p>First things first. Before writing any ML.NET code you need the notebook to have access to the NuGet packages you are going to use. In this case, we&#8217;re going to use ML.NET and XPlot for plotting data distribution and the regression chart once the ML model is built.<\/p>\n<p>For that, write code like the following. Versions might vary and you could also add the &#8216;using&#8217; namespaces later on or in this same Jupyter cell:<\/p>\n<pre class=\"font:consolas lang:default decode:true \">\/\/ ML.NET Nuget packages installation\r\n#r \"nuget:Microsoft.ML,1.3.1\"\r\n\r\n\/\/Install XPlot package\r\n#r \"nuget:XPlot.Plotly,2.0.0\"<\/pre>\n<p>Run this cell once. 
It&#8217;ll take some time to download and install the NuGet packages, which is why it is a good idea to keep this installation in a separate cell.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-12734\" src=\"http:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/1-install-nuget-packages.png\" alt=\"\" width=\"682\" height=\"292\" srcset=\"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/1-install-nuget-packages.png 1764w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/1-install-nuget-packages-300x128.png 300w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/1-install-nuget-packages-768x329.png 768w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/1-install-nuget-packages-1024x438.png 1024w\" sizes=\"(max-width: 682px) 100vw, 682px\" \/><\/p>\n<h2>Declare the data classes<\/h2>\n<p>When loading the datasets and when training or predicting, you need an input class and a prediction class, like the following classes:<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-12736\" src=\"http:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/2-declare-data-classes.png\" alt=\"\" width=\"696\" height=\"412\" srcset=\"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/2-declare-data-classes.png 1593w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/2-declare-data-classes-300x177.png 300w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/2-declare-data-classes-768x454.png 768w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/2-declare-data-classes-1024x606.png 1024w\" sizes=\"(max-width: 696px) 100vw, 696px\" 
\/><\/p>\n<p>Here&#8217;s the code you can copy\/paste in your notebook:<\/p>\n<pre class=\"font:consolas lang:default decode:true \">public class TaxiTrip\r\n{\r\n[LoadColumn(0)]\r\npublic string VendorId;\r\n\r\n[LoadColumn(1)]\r\npublic string RateCode;\r\n\r\n[LoadColumn(2)]\r\npublic float PassengerCount;\r\n\r\n[LoadColumn(3)]\r\npublic float TripTime;\r\n\r\n[LoadColumn(4)]\r\npublic float TripDistance;\r\n\r\n[LoadColumn(5)]\r\npublic string PaymentType;\r\n\r\n[LoadColumn(6)]\r\npublic float FareAmount;\r\n}\r\n\r\npublic class TaxiTripFarePrediction\r\n{\r\n[ColumnName(\"Score\")]\r\npublic float Score;\r\n}<\/pre>\n<p>&nbsp;<\/p>\n<h2>Load dataset in IDataView<\/h2>\n<p>The way you load data is exactly the same way you&#8217;d do in a regular C# project. You only need to place the dataset files in the same folder where you have your just created Jupyter notebook which by default will be your user&#8217;s root folder. You can copy the .csv files from this GitHub repo:<\/p>\n<p>https:\/\/github.com\/CESARDELATORRE\/mlnet-on-jupyter-samples\/tree\/master\/mlnet-jupyter-samples\/taxifare-mlnet-jupyter-demos<\/p>\n<p>Then, just write the following code and run it so you see the training IDataView schema:<\/p>\n<pre class=\"font:consolas font-size:10 lang:default decode:true \">display(h1(\"Code for loading the data into IDataViews: training dataset and test dataset\"));\r\n\r\nMLContext mlContext = new MLContext(seed: 0);\r\n\r\nstring TrainDataPath = \".\/taxi-fare-train.csv\";\r\nstring TestDataPath = \".\/taxi-fare-test.csv\";\r\n\r\nIDataView trainDataView = mlContext.Data.LoadFromTextFile&lt;TaxiTrip&gt;(TrainDataPath, hasHeader: true, separatorChar: ',');\r\nIDataView testDataView = mlContext.Data.LoadFromTextFile&lt;TaxiTrip&gt;(TestDataPath, hasHeader: true, separatorChar: ',');\r\n\r\ndisplay(h4(\"Schema of training DataView:\"));\r\ndisplay(trainDataView.Schema);<\/pre>\n<p>Here&#8217;s how you see it in Jupyter:<\/p>\n<p><img decoding=\"async\" 
class=\"alignnone wp-image-12738\" src=\"http:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/3-load-datasets-and-display-schema.png\" alt=\"\" width=\"653\" height=\"390\" srcset=\"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/3-load-datasets-and-display-schema.png 1503w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/3-load-datasets-and-display-schema-300x179.png 300w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/3-load-datasets-and-display-schema-768x458.png 768w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/3-load-datasets-and-display-schema-1024x610.png 1024w\" sizes=\"(max-width: 653px) 100vw, 653px\" \/><\/p>\n<p>You can also visualize a few rows of the data loaded into any IDataView such as here:<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-12739\" src=\"http:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/4-visualize-few-rows-of-data.png\" alt=\"\" width=\"655\" height=\"409\" srcset=\"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/4-visualize-few-rows-of-data.png 1444w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/4-visualize-few-rows-of-data-300x187.png 300w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/4-visualize-few-rows-of-data-768x479.png 768w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/4-visualize-few-rows-of-data-1024x639.png 1024w\" sizes=\"(max-width: 655px) 100vw, 655px\" \/><\/p>\n<p>This action is a bit more verbose, but we&#8217;re working on another data structure in .NET for exploring data named &#8216;DataFrame&#8217; very similar to the DataFrame in Pandas in Python which is a lot 
simpler than working with the IDataView, because the DataFrame loads data eagerly rather than lazily, and you don&#8217;t need typed data classes just to explore data.<\/p>\n<h2>Plotting data with XPlot<\/h2>\n<p>XPlot is a popular plotting library in the F# community that you can also use from C#:\u00a0<a href=\"https:\/\/fslab.org\/XPlot\/\">https:\/\/fslab.org\/XPlot\/<\/a><\/p>\n<p>In the initial cell you already installed its NuGet package so now you can simply use it in Jupyter.<\/p>\n<h3>Prepare data in arrays<\/h3>\n<p>XPlot works with any IEnumerable-based type, but arrays are the most common choice, so first of all we&#8217;re going to extract the data of some input variables into a few arrays:<\/p>\n<pre class=\"font:consolas font-size:11 lang:default decode:true \">\/\/Extract some data into arrays for plotting:\r\n\r\nint numberOfRows = 1000;\r\nfloat[] fares = trainDataView.GetColumn&lt;float&gt;(\"FareAmount\").Take(numberOfRows).ToArray();\r\nfloat[] distances = trainDataView.GetColumn&lt;float&gt;(\"TripDistance\").Take(numberOfRows).ToArray();\r\nfloat[] times = trainDataView.GetColumn&lt;float&gt;(\"TripTime\").Take(numberOfRows).ToArray();\r\nfloat[] passengerCounts = trainDataView.GetColumn&lt;float&gt;(\"PassengerCount\").Take(numberOfRows).ToArray();<\/pre>\n<p>After running that in a Jupyter cell, you can now plot data distributions such as the following histogram, where you can see that most of the taxi trips cost between $5 and $10.<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-12791\" src=\"http:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/5-histogram-taxi-trips-per-cost.2.png\" alt=\"\" width=\"689\" height=\"517\" srcset=\"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/5-histogram-taxi-trips-per-cost.2.png 976w, 
https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/5-histogram-taxi-trips-per-cost.2-300x225.png 300w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/5-histogram-taxi-trips-per-cost.2-768x576.png 768w\" sizes=\"(max-width: 689px) 100vw, 689px\" \/><\/p>\n<p>Or more interestingly, you can see how the &#8216;distance&#8217; input variable impacts the fare\/price of the taxi trips, although you can also see that some other variables might be influencing, as well, because when the distance is higher the dots are more sparse probably due to the &#8216;time&#8217; variable that you can also plot.<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-12792\" src=\"http:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/7-plot-distance-vs-fares.2.png\" alt=\"\" width=\"696\" height=\"648\" srcset=\"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/7-plot-distance-vs-fares.2.png 696w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/7-plot-distance-vs-fares.2-300x279.png 300w\" sizes=\"(max-width: 696px) 100vw, 696px\" \/><\/p>\n<p>You can check the Jupyter notebook file (MLNET-Jupyter-Demo.ipynb) I&#8217;m providing and see additional plotting charts I explored.<\/p>\n<h2>Create the ML Regression model with ML.NET<\/h2>\n<p>Now, let&#8217;s get into ML.NET code. 
We&#8217;ll first work on the data transformations then we&#8217;ll add the trainer\/algorithm and finally we&#8217;ll train the model which creates the model itself.<\/p>\n<h2>Data transformations in the model pipeline<\/h2>\n<p>In order to create a regression model we first need to make some data transformations (convert text to numbers, normalize and concatenate input variables) in our pipeline such as the following:<\/p>\n<pre class=\"font:consolas font-size:9 lang:default decode:true\">display(h1(\"Apply Data Transformations pipeline\"));\r\n\r\n\/\/ STEP 2: Common data process configuration with pipeline data transformations\r\nvar dataProcessPipeline = mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: \"VendorIdEncoded\", \r\n                                                                          inputColumnName: nameof(TaxiTrip.VendorId))\r\n     .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: \"RateCodeEncoded\", \r\n                                                             inputColumnName: nameof(TaxiTrip.RateCode)))\r\n     .Append(mlContext.Transforms.Categorical.OneHotEncoding(outputColumnName: \"PaymentTypeEncoded\",\r\n                                                             inputColumnName: nameof(TaxiTrip.PaymentType)))\r\n     .Append(mlContext.Transforms.NormalizeMeanVariance(outputColumnName: nameof(TaxiTrip.PassengerCount)))\r\n     .Append(mlContext.Transforms.NormalizeMeanVariance(outputColumnName: nameof(TaxiTrip.TripTime)))\r\n     .Append(mlContext.Transforms.NormalizeMeanVariance(outputColumnName: nameof(TaxiTrip.TripDistance)))\r\n     .Append(mlContext.Transforms.Concatenate(\"Features\", \"VendorIdEncoded\", \"RateCodeEncoded\", \"PaymentTypeEncoded\",\r\n                                              nameof(TaxiTrip.PassengerCount), nameof(TaxiTrip.TripTime), \r\n                                              nameof(TaxiTrip.TripDistance)));\r\n<\/pre>\n<p>You should run that code in a 
new Jupyter cell you create.<\/p>\n<p>If you want to learn more about the data transformations needed for a regression problem, take a look at this tutorial:<\/p>\n<p><a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/machine-learning\/tutorials\/predict-prices\">https:\/\/docs.microsoft.com\/en-us\/dotnet\/machine-learning\/tutorials\/predict-prices<\/a><\/p>\n<h2>Add the trainer\/algorithm and train the model<\/h2>\n<p>In the following code we add the trainer\/algorithm SDCA (Stochastic Dual Coordinate Ascent) to the pipeline and then we train the model by calling the Fit() method and providing the training dataset:<\/p>\n<pre class=\"font:consolas font-size:11 lang:default decode:true \">%%time\r\nvar trainer = mlContext.Regression.Trainers.Sdca(labelColumnName: \"FareAmount\", featureColumnName: \"Features\");\r\nvar trainingPipeline = dataProcessPipeline.Append(trainer);\r\n\r\nvar trainedModel = trainingPipeline.Fit(trainDataView);<\/pre>\n<p>And here&#8217;s the execution in Jupyter with just a few more lines of code for displaying info:<\/p>\n<p><img decoding=\"async\" class=\"alignnone size-full wp-image-12745\" src=\"http:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/8-train-model.png\" alt=\"\" width=\"1283\" height=\"576\" srcset=\"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/8-train-model.png 1283w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/8-train-model-300x135.png 300w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/8-train-model-768x345.png 768w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/8-train-model-1024x460.png 1024w\" sizes=\"(max-width: 1283px) 100vw, 1283px\" \/><\/p>\n<p>A very interesting thing you can use in C# when running a cell is the <strong>&#8216;%%time&#8217;<\/strong> command, which will measure 
the time it needed to run all the code in that Jupyter cell. This is especially interesting when you know something is going to take its time, like when training an ML model, depending on how much data you have for training. In that case above it tells us it needed almost 3 seconds, but if you have a lot of data it could be minutes or even hours.<\/p>\n<h2>Evaluate the model&#8217;s quality: Metrics<\/h2>\n<p>Once you have the model another important step is to figure out how good it is by calculating the performance metrics with some predictions that are compared to the actual values from a test-dataset, like in the following code:<\/p>\n<pre class=\"font:consolas font-size:11 lang:default decode:true \">IDataView predictionsDataView = trainedModel.Transform(testDataView);\r\nvar metrics = mlContext.Regression.Evaluate(predictionsDataView, labelColumnName: \"FareAmount\", scoreColumnName: \"Score\");\r\n\r\ndisplay(metrics);<\/pre>\n<p>Here you can directly see the metrics in the Jupyter notebook in a very neat way by simply calling &#8216;display(metrics)&#8217; \ud83d\ude42<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-12747\" src=\"http:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/9-show-quality-metrics.png\" alt=\"\" width=\"702\" height=\"245\" srcset=\"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/9-show-quality-metrics.png 1289w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/9-show-quality-metrics-300x105.png 300w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/9-show-quality-metrics-768x268.png 768w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/9-show-quality-metrics-1024x357.png 1024w\" sizes=\"(max-width: 702px) 100vw, 702px\" \/><\/p>\n<h2>Make predictions in bulk and show a bar diagram comparing predictions vs. 
actual values<\/h2>\n<p>Here&#8217;s the code on how to make a few predictions and show in a bar chart a comparison of predictions versus actual values from the test dataset:<\/p>\n<pre class=\"font:consolas font-size:9 lang:default decode:true\">\/\/ Number of rows to use for Bar chart\r\nint totalNumberForBarChart = 20;\r\n\r\nfloat[] actualFares = predictionsDataView.GetColumn&lt;float&gt;(\"FareAmount\").Take(totalNumberForBarChart).ToArray();\r\nfloat[] predictionFares = predictionsDataView.GetColumn&lt;float&gt;(\"Score\").Take(totalNumberForBarChart).ToArray();\r\nint[] elements = Enumerable.Range(0, totalNumberForBarChart).ToArray();\r\n\r\n\/\/ Define group for Actual values\r\nvar ActualValuesGroupBarGraph = new Graph.Bar()\r\n{\r\n   x = elements,\r\n   y = actualFares,\r\n   name = \"Actual\"\r\n};\r\n\r\n\/\/ Define group for Prediction values\r\nvar PredictionValuesGroupBarGraph = new Graph.Bar()\r\n{\r\n   x = elements,\r\n   y = predictionFares,\r\n   name = \"Predicted\"\r\n};\r\n\r\nvar chart = Chart.Plot(new[] {ActualValuesGroupBarGraph, PredictionValuesGroupBarGraph});\r\nvar layout = new Layout.Layout(){barmode = \"group\", title=\"Actual fares vs. 
Predicted fares Comparison\"};\r\nchart.WithLayout(layout);\r\nchart.WithXTitle(\"Cases\");\r\nchart.WithYTitle(\"Fare\");\r\nchart.WithLegend(true);\r\nchart.Width = 700;\r\nchart.Height = 400;\r\n\r\ndisplay(chart);\r\n<\/pre>\n<p>And here&#8217;s the bar chart in Jupyter:<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-12797\" src=\"http:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/10-bar-chart-predictions-vs-actual.2.png\" alt=\"\" width=\"683\" height=\"476\" srcset=\"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/10-bar-chart-predictions-vs-actual.2.png 634w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/10-bar-chart-predictions-vs-actual.2-300x209.png 300w\" sizes=\"(max-width: 683px) 100vw, 683px\" \/><\/p>\n<h2>Plotting Predictions vs. Actual values plus the Regression line<\/h2>\n<p>Finally, with the following code you can plot the predictions vs. the actual values. If the regression model is working well, most of the dots should lie close to a straight line, which is the regression line. 
Also, the closer the regression line is to the &#8216;perfect line&#8217; (prediction is equal to the actual value in the test dataset), the better the quality of your model.<\/p>\n<p>Here&#8217;s the code:<\/p>\n<pre class=\"font:consolas font-size:9 lang:default decode:true\">using XPlot.Plotly;\r\n\r\n\/\/ Number of rows to use for Plotting the Regression chart\r\nint totalNumber = 500;\r\n\r\nfloat[] actualFares = predictionsDataView.GetColumn&lt;float&gt;(\"FareAmount\").Take(totalNumber).ToArray();\r\nfloat[] predictionFares = predictionsDataView.GetColumn&lt;float&gt;(\"Score\").Take(totalNumber).ToArray();\r\n\r\n\/\/ Display the Best Fit Regression Line\r\n\r\n\/\/ Define scatter plot graph (dots)\r\nvar ActualVsPredictedGraph = new Graph.Scatter()\r\n{\r\n   x = actualFares,\r\n   y = predictionFares,\r\n   mode = \"markers\",\r\n   marker = new Graph.Marker() { color = \"purple\"} \/\/\"rgb(142, 124, 195)\"\r\n};\r\n\r\n\/\/ Calculate Regression line\r\n\/\/ Get a tuple with the two X and two Y values determining the regression line\r\n\/\/ (CalculateRegressionLine is a helper method whose code is not shown in this post)\r\n(double[] xArray, double[] yArray) = CalculateRegressionLine(actualFares, predictionFares, totalNumber);\r\n\r\n\/\/ Define graph for the line\r\nvar regressionLine = new Graph.Scatter()\r\n{\r\n   x = xArray,\r\n   y = yArray,\r\n   mode = \"lines\"\r\n};\r\n\r\n\/\/ Display the 'Perfect' line, 45 degrees (Predicted values equal to actual values)\r\nvar maximumValue = Math.Max(actualFares.Max(), predictionFares.Max());\r\n\r\nvar perfectLine = new Graph.Scatter()\r\n{\r\n   x = new[] {0, maximumValue},\r\n   y = new[] {0, maximumValue},\r\n   mode = \"lines\",\r\n   line = new Graph.Line(){color = \"grey\"}\r\n};\r\n\r\n\/\/ XPlot CSharp samples: https:\/\/fslab.org\/XPlot\/chart\/plotly-line-scatter-plots.html\r\n\/\/ Display the chart's figures\r\nvar chart = Chart.Plot(new[] {ActualVsPredictedGraph, regressionLine, perfectLine });\r\nchart.WithXTitle(\"Actual Values\");\r\nchart.WithYTitle(\"Predicted 
Values\");\r\nchart.WithLegend(true);\r\nchart.WithLabels(new[]{\"Prediction vs. Actual\", \"Regression Line\", \"Perfect Regression Line\"});\r\nchart.Width = 700;\r\nchart.Height = 600;\r\n\r\ndisplay(chart);<\/pre>\n<p>And this is how you&#8217;ll see the regression line and plot chart:<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-12798\" src=\"http:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/11-regression-line-and-plotting-chart.2.png\" alt=\"\" width=\"610\" height=\"452\" srcset=\"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/11-regression-line-and-plotting-chart.2.png 1235w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/11-regression-line-and-plotting-chart.2-300x222.png 300w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/11-regression-line-and-plotting-chart.2-768x569.png 768w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/11-regression-line-and-plotting-chart.2-1024x759.png 1024w\" sizes=\"(max-width: 610px) 100vw, 610px\" \/><\/p>\n<p>&nbsp;<\/p>\n<h2>Save the ML model as a file<\/h2>\n<p>Finally, you can also save the ML.NET model file and see it in the same folder as your Jupyter notebook:<\/p>\n<p><img decoding=\"async\" class=\"alignnone wp-image-12750\" src=\"http:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/12-saving-model-file.png\" alt=\"\" width=\"660\" height=\"517\" srcset=\"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/12-saving-model-file.png 1013w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/12-saving-model-file-300x235.png 300w, https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-content\/uploads\/sites\/32\/2019\/09\/12-saving-model-file-768x602.png 768w\" 
sizes=\"(max-width: 660px) 100vw, 660px\" \/><\/p>\n<p>You can then take that .ZIP file (ML.NET model) and deploy it (consume it) in any .NET application, as shown in the following guides for making predictions in an Azure Function or an ASP.NET Core app\/WebAPI:<\/p>\n<ul>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/machine-learning\/how-to-guides\/serve-model-serverless-azure-functions-ml-net\">Deploying an ML.NET model into an Azure Function<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/en-us\/dotnet\/machine-learning\/how-to-guides\/serve-model-web-api-ml-net\">Deploying an ML.NET model into an ASP.NET Core app\/WebAPI<\/a><\/li>\n<\/ul>\n<h2>Conclusions and takeaways<\/h2>\n<p>Jupyter is a great environment for scenarios such as:<\/p>\n<ul>\n<li>Data exploration and plotting<\/li>\n<li>Documenting Machine Learning model experiments and conclusions<\/li>\n<li>Creating courses based on Jupyter notebooks. Great for many learning scenarios<\/li>\n<li>Labs or hands-on labs<\/li>\n<li>Creating quizzes for learning environments<\/li>\n<\/ul>\n<p>And now with the .NET kernel for Jupyter you can take advantage of it for all those scenarios.<\/p>\n<p>Please feel free to send us your feedback through this blog post&#8217;s comments or the following GitHub issue trackers:<\/p>\n<p>dotnet-try feedback: <a href=\"https:\/\/github.com\/dotnet\/try\/issues\">https:\/\/github.com\/dotnet\/try\/issues<\/a><\/p>\n<p>ML.NET feedback:\u00a0 <a href=\"https:\/\/github.com\/dotnet\/machinelearning\/issues\">https:\/\/github.com\/dotnet\/machinelearning\/issues<\/a><\/p>\n<p>We can&#8217;t wait to hear from you about the ideas and assets you can create with Jupyter+ML.NET! \ud83d\ude42<\/p>\n<p>Happy coding!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I do believe this is great news for the ML.NET community and .NET in general. You can now run .NET code (C# \/ F#) in Jupyter notebooks and therefore run ML.NET code in it as well! 
&#8211; Under the covers, this is enabled by &#8216;dotnet-try&#8217; and its related .NET kernel for Jupyter (as early previews). [&hellip;]<\/p>\n","protected":false},"author":362,"featured_media":12764,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[1],"tags":[370,586,308,309,299,587],"class_list":["post-12717","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-cesardelatorre","tag-ai","tag-jupyter","tag-machine-learning","tag-ml","tag-ml-net","tag-notebooks"],"acf":[],"blog_post_summary":"<p>I do believe this is great news for the ML.NET community and .NET in general. You can now run .NET code (C# \/ F#) in Jupyter notebooks and therefore run ML.NET code in it as well! &#8211; Under the covers, this is enabled by &#8216;dotnet-try&#8217; and its related .NET kernel for Jupyter (as early previews). [&hellip;]<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-json\/wp\/v2\/posts\/12717","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-json\/wp\/v2\/users\/362"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-json\/wp\/v2\/comments?post=12717"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-json\/wp\/v2\/posts\/12717\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-json\/wp\/v2\/media\/12764"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-json\/wp\/v2\/media?parent=12717"}],"wp:term":[{"taxonomy":"category","embeddable":true,"h
ref":"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-json\/wp\/v2\/categories?post=12717"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/cesardelatorre\/wp-json\/wp\/v2\/tags?post=12717"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}