{"id":38941,"date":"2022-02-28T11:31:36","date_gmt":"2022-02-28T18:31:36","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/dotnet\/?p=38941"},"modified":"2022-02-28T11:31:36","modified_gmt":"2022-02-28T18:31:36","slug":"training-a-ml-dotnet-model-with-azure-ml","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/dotnet\/training-a-ml-dotnet-model-with-azure-ml\/","title":{"rendered":"Training a ML.NET Model with Azure ML"},"content":{"rendered":"<p>Model Builder makes it easy to get started with Machine Learning and create your first model. As you gather more data over time, you may want to continuously refine or retrain your model. Using a combination of CLI and Azure tooling, you can train a new ML.NET model and integrate the training into a pipeline. This blog post shows an example of a training pipeline that can be easily rerun using Azure. <\/p>\n<p>We\u2019re going to use <a href=\"https:\/\/docs.microsoft.com\/azure\/machine-learning\/concept-azure-machine-learning-architecture#datasets-and-datastores\">Azure Machine Learning Datasets<\/a> to track data and an Azure ML Pipeline to train a new model. 
This retraining pipeline can then be triggered by Azure DevOps.\nIn this post, we will cover: <\/p>\n<ol>\n<li>Creating an Azure Machine Learning Dataset<\/li>\n<li>Training a ML.NET model via the Azure Machine Learning CLI (v2)<\/li>\n<li>Creating a pipeline in Azure DevOps for re-training<\/li>\n<\/ol>\n<h2>Prerequisites<\/h2>\n<ol>\n<li><a href=\"https:\/\/docs.microsoft.com\/azure\/machine-learning\/how-to-manage-workspace?tabs=azure-portal#create-a-workspace\">Azure Machine Learning Workspace<\/a><\/li>\n<li><a href=\"https:\/\/docs.microsoft.com\/azure\/machine-learning\/how-to-create-attach-compute-cluster\">Compute Cluster<\/a> in the workspace<\/li>\n<\/ol>\n<h2>Creating an Azure Machine Learning Dataset<\/h2>\n<ol>\n<li>\n<p>Open the workspace in the <a href=\"https:\/\/ml.azure.com\">Microsoft Azure Machine Learning Studio<\/a>. <\/p>\n<\/li>\n<li>\n<p>We need to create a file dataset. Navigate to <strong>Datasets<\/strong>. Click <strong>+ Create Dataset<\/strong>. <\/p>\n<\/li>\n<li>\n<p>Choose the datasource. We will upload a copy of this <a href=\"https:\/\/www.kaggle.com\/yasserh\/song-popularity-dataset\">song popularity dataset<\/a> available from Kaggle. It\u2019s a fairly large dataset that I don\u2019t want to maintain locally. <\/p>\n<\/li>\n<li>\n<p>Give the dataset a unique name and make sure to choose &#8220;File\u201d as the Dataset type.\n<img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2022\/02\/CreateDatasetBasicInfo.png\" alt=\"Screenshot of Azure ML Create Dataset Basic Info step\" \/><\/p>\n<\/li>\n<li>\n<p>Upload from a local file to the default <em>workspaceblobstore<\/em>. 
Take note of the file name.\n<img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2022\/02\/CreateDatasetDatastoreSelection.png\" alt=\"Screenshot of Azure ML Create Dataset Datastore and file selection step\" \/><\/p>\n<\/li>\n<li>\n<p>When the data upload finishes, create the dataset. <\/p>\n<\/li>\n<li>\n<p>Click on the completed dataset to view it. Confirm the preview available in the <strong>Explore<\/strong> tab looks correct. <\/p>\n<\/li>\n<li>\n<p>Make note of the dataset name, the file name, and, if you uploaded multiple versions, the version number. We will use these values in the next step. <\/p>\n<\/li>\n<\/ol>\n<h2>Training a ML.NET model via Azure Machine Learning<\/h2>\n<p>Now that we have a dataset uploaded to Azure ML, we can create an Azure ML training pipeline and use the Azure CLI (v2) to run it. The pipeline below will create a Docker container with a ML.NET CLI instance that will conduct the training.<\/p>\n<ol>\n<li>\n<p>Create the Dockerfile and save it in a new folder for this experiment. If you are not familiar with Dockerfiles, note that they have no file extension. The file should be named &#8220;Dockerfile&#8221; and contain the following: <\/p>\n<pre><code class=\"language-DOCKERFILE\">FROM mcr.microsoft.com\/dotnet\/sdk:6.0\r\nRUN dotnet tool install -g microsoft.mlnet-linux-x64\r\nENV PATH=\"$PATH:\/root\/.dotnet\/tools\"<\/code><\/pre>\n<\/li>\n<li>\n<p>We will need to figure out our ML.NET CLI command to train our model. If needed, see <a href=\"https:\/\/docs.microsoft.com\/dotnet\/machine-learning\/how-to-guides\/install-ml-net-cli\">installation instructions for the ML.NET CLI<\/a>.\n<img decoding=\"async\" src=\"https:\/\/devblogs.microsoft.com\/dotnet\/wp-content\/uploads\/sites\/10\/2022\/02\/CLICommandInfo.png\" alt=\"ML.NET CLI Regression command information\" \/><\/p>\n<p>We\u2019re doing regression and will specify a dataset and label column. 
Text classification and recommendation are also supported for tabular files. Check the command information or <a href=\"https:\/\/docs.microsoft.com\/dotnet\/machine-learning\/reference\/ml-net-cli-reference\">ML.NET CLI docs<\/a> for more details on other training scenarios. <\/p>\n<p>Make sure to include the option <code>--verbosity q<\/code>, as some of the CLI features can cause problems in the Linux environment. <\/p>\n<p><code>mlnet regression --dataset &lt;YOUR_DATA_FILE_NAME&gt; --label-col &lt;YOUR_LABEL_COLUMN&gt; --output outputs --log-file-path outputs\/logs --verbosity q<\/code><\/p>\n<\/li>\n<li>\n<p>Create the AzureTrain.yml file in the same folder as the Dockerfile. This is the file that will be passed to the Azure CLI. Because the job declares the dataset as input data, Azure ML will download the file dataset to our compute. The training file can then be referenced directly. We just need to specify its path in the ML.NET CLI command. Do the following:<\/p>\n<ul>\n<li>Replace <code>&lt;DATASET_NAME&gt;<\/code> with the unique dataset name, and <code>&lt;VERSION&gt;<\/code> with the version number (likely 1). Both values are visible in the Dataset tab. In this example the value is <code>dataset: azureml:song_popularity:1<\/code>. <\/li>\n<li>Replace the <code>command<\/code> value with the local ML.NET CLI command. Instead of the local file path, we\u2019ll use <code>{inputs.data}<\/code> to tell the pipeline to use the download path on the Azure compute. Add the data file name. In this example it is <code>--dataset {inputs.data}\/song_data.csv<\/code>.<\/li>\n<li>Replace the <code>compute<\/code> value with our compute name. The available compute clusters in the workspace are visible under <strong>Computes<\/strong> -&gt; <strong>Compute clusters<\/strong>. <\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p>For more information see <a href=\"\/\/docs.microsoft.com\/azure\/machine-learning\/reference-yaml-job-command\">command job YAML schema documentation<\/a>. 
<\/p>\n<pre><code class=\"language-YAML\">inputs:\r\n  data:\r\n    dataset: azureml:&lt;DATASET_NAME&gt;:&lt;VERSION&gt;\r\n    mode: download\r\nexperiment_name: mldotnet-training\r\ncode:\r\n  local_path: .\r\ncommand: mlnet regression --dataset {inputs.data}\/&lt;YOUR_DATA_FILE_NAME&gt; --label-col &lt;YOUR_LABEL_COLUMN&gt; --output outputs --log-file-path outputs\/logs --verbosity q\r\ncompute: azureml:&lt;YOUR-COMPUTE-NAME&gt;\r\nenvironment: \r\n  build:\r\n    local_path: .\r\n    dockerfile_path: Dockerfile<\/code><\/pre>\n<h3>Run manually<\/h3>\n<p>To kick off training from a local machine, or just test the functionality of the run, we can <a href=\"https:\/\/docs.microsoft.com\/azure\/machine-learning\/how-to-configure-cli\">install and set up the Azure CLI (v2) with ML extension<\/a>. In these instructions I&#8217;m running version 2.0.7 of the ml extension.<\/p>\n<ol>\n<li>\n<p>Machine learning subcommands require the <code>--workspace\/-w<\/code> and <code>--resource-group\/-g<\/code> parameters. Configure the defaults for the group and workspace of the dataset.<br \/>\n<code>az configure --defaults group=&lt;YOUR_RESOURCE_GROUP&gt; workspace=&lt;YOUR_WORKSPACE&gt;<\/code><\/p>\n<\/li>\n<li>\n<p>Run the retraining pipeline created in the previous step.<br \/>\n<code>az ml job create --file AzureTrain.yml<\/code><\/p>\n<\/li>\n<li>\n<p>Check the results of the run online in the Azure Machine Learning Studio under <strong>Experiments<\/strong> -&gt; <em>mldotnet-training<\/em><\/p>\n<\/li>\n<\/ol>\n<h2>Automate training with Azure DevOps Services pipelines<\/h2>\n<p>We can run the Azure ML training via Azure DevOps Pipelines. This allows the use of any trigger, including scheduled (time-based) triggers and triggers on file changes.<\/p>\n<p>Below are the steps to get the Azure ML pipeline running. 
For more details, see <a href=\"https:\/\/github.com\/MicrosoftDocs\/pipelines-azureml\/blob\/master\/docs\/getting_started.md\">step-by-step instructions for setting up Azure DevOps and Azure ML<\/a>.<\/p>\n<ol>\n<li>Check the Dockerfile and AzureTrain.yml into source control. It is best to create a new subfolder to put these files into. Azure CLI will upload the whole containing folder when running the experiment. <\/li>\n<li>Create a service connection between Azure ML and Azure DevOps. In Azure DevOps:  \n<ol>\n<li>Go to <strong>Project settings<\/strong>. Select <strong>Pipelines<\/strong> -&gt; <strong>Service connections<\/strong><\/li>\n<li>Create a new connection of type Azure Resource Manager<\/li>\n<li>Select Service principal (automatic) and Scope Level Machine Learning Workspace. Configure it to the Resource Group of your Machine Learning workspace. Name it aml-ws.<\/li>\n<\/ol>\n<\/li>\n<li>In Azure DevOps create a new pipeline, using the following file as a template. Replace the variables and trigger (if applicable). The ml-ws-connection is the connection created in step 2. Depending on where the file is checked in, add the AzureTrain.yml file path to the &#8216;Create training job&#8217; step. 
<\/li>\n<\/ol>\n<pre><code class=\"language-YAML\">variables:\r\n  ml-ws-connection: 'aml-ws' # Workspace Service Connection name\r\n  ml-ws: '&lt;YOUR_VALUE&gt;' # AML Workspace name\r\n  ml-rg: '&lt;YOUR_VALUE&gt;' # AML resource Group name\r\n\r\ntrigger:\r\n  &lt;YOUR_TRIGGER&gt;\r\n\r\npool:\r\n  vmImage: ubuntu-latest\r\n\r\nsteps:\r\n\r\n- task: AzureCLI@2\r\n  displayName: 'Set config functionality'\r\n  inputs:\r\n    azureSubscription: $(ml-ws-connection)\r\n    scriptLocation: inlineScript\r\n    scriptType: 'bash'\r\n    inlineScript: 'az config set extension.use_dynamic_install=yes_without_prompt'\r\n\r\n- task: AzureCLI@2\r\n  displayName: 'Install AML CLI (azureml-v2-preview)'\r\n  inputs:\r\n    azureSubscription: $(ml-ws-connection)\r\n    scriptLocation: inlineScript\r\n    scriptType: 'bash'\r\n    inlineScript: 'az extension add -n ml'\r\n\r\n- task: AzureCLI@2\r\n  displayName: 'Setup default config values'\r\n  inputs:\r\n    azureSubscription: $(ml-ws-connection)\r\n    scriptLocation: inlineScript\r\n    scriptType: 'bash'\r\n    inlineScript: 'az configure --defaults group=$(ml-rg) workspace=$(ml-ws)'\r\n\r\n- task: AzureCLI@2\r\n  displayName: 'Create training job'\r\n  inputs:\r\n    azureSubscription: $(ml-ws-connection)\r\n    scriptLocation: inlineScript\r\n    scriptType: 'bash'\r\n    inlineScript: 'az ml job create --file &lt;YOUR_PATH&gt;\/AzureTrain.yml'<\/code><\/pre>\n<p>Running the Azure CLI job either locally or from Azure DevOps will create an output model in Azure. To see the model, go to <a href=\"https:\/\/ml.azure.com\">Microsoft Azure Machine Learning Studio<\/a> and navigate to your ML workspace. Click on <strong>Experiments<\/strong> -&gt; <em>mldotnet-training<\/em>. Toggle &#8220;View only my runs&#8221; to see runs started by the Azure Pipelines Service Principal. The completed training run should be visible. 
The trained model and example code are generated in the <strong>Outputs + Logs<\/strong> section, in the outputs folder.  <\/p>\n<p>In this post, we&#8217;ve created a flexible way to track our data and model via Azure ML.  The Azure ML Dataset can be added to and updated while maintaining historical data. This Azure ML retraining pipeline can be run manually or automatically in Azure DevOps.  Once your model is trained, you can deploy it using <a href=\"https:\/\/docs.microsoft.com\/azure\/machine-learning\/how-to-deploy-custom-container\">Azure ML custom containers<\/a>. <\/p>\n<p>Have you set up your own ML.NET retraining pipeline with Azure Machine Learning Datasets and Azure DevOps? Let us know of any issues, feature requests, or general feedback by filing an issue in the <a href=\"https:\/\/github.com\/dotnet\/machinelearning-modelbuilder\">ML.NET Tooling (Model Builder &amp; ML.NET CLI)<\/a> GitHub repo.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn how to create a training pipeline for ML.NET using Azure ML and Azure Devops.<\/p>\n","protected":false},"author":83667,"featured_media":38942,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[685,327,688,691],"tags":[37,6791,93,96],"class_list":["post-38941","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-dotnet","category-azure","category-machine-learning","category-ml-dotnet","tag-azure","tag-azure-machine-learning","tag-machine-learning","tag-ml-net"],"acf":[],"blog_post_summary":"<p>Learn how to create a training pipeline for ML.NET using Azure ML and Azure 
Devops.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/38941","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/users\/83667"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/comments?post=38941"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/posts\/38941\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media\/38942"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/media?parent=38941"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/categories?post=38941"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/dotnet\/wp-json\/wp\/v2\/tags?post=38941"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}