{"id":35507,"date":"2019-03-01T06:00:04","date_gmt":"2019-03-01T13:00:04","guid":{"rendered":"http:\/\/devblogs.microsoft.com\/premier-developer\/?p=35507"},"modified":"2019-02-28T20:59:59","modified_gmt":"2019-03-01T03:59:59","slug":"exploring-feature-weights-using-r-and-azure-machine-learning-studio","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/premier-developer\/exploring-feature-weights-using-r-and-azure-machine-learning-studio\/","title":{"rendered":"Exploring Feature Weights using R and Azure Machine Learning Studio"},"content":{"rendered":"<p><span style=\"font-size: 12pt\"><span style=\"font-family: 'Segoe UI',sans-serif;color: #333333;background: white\">In this post, Senior App Dev Manager <\/span><a href=\"https:\/\/www.linkedin.com\/in\/parkrandy\/\"><span style=\"font-family: 'Segoe UI',sans-serif;color: #005da6;text-decoration: none\">Randy Park<\/span><\/a><span style=\"font-family: 'Segoe UI',sans-serif;color: #333333;background: white\"> performs an exploratory data analysis using R and Azure Machine Learning Studio.<\/span><\/span><\/p>\n<hr \/>\n<p><span style=\"font-size: 12pt\">Suppose we have to design a black box that displays a \u201cthumbs up\u201d<\/span> or \u201cthumbs down\u201d depending on hundreds of different combinations of inputs. We probably have a general idea of how to use machine learning methodologies to design this black box, which we call a model. Often, however, a more important question than how to build the model is how to tell, by examining the proposed model, which inputs matter more than others. Which input, or feature, carries the most weight in driving the outcome? We may think such a problem is rather simple in this day and age, when an <a href=\"https:\/\/docs.microsoft.com\/en-us\/machine-learning-server\/install\/microsoftml-install-pretrained-models\">abundance of machine learning algorithms and models<\/a> is readily available. 
Once we start analyzing each implementation more deeply and engage in coding or hands-on exercises, we come to appreciate the simple abstractions built over the complexities underneath. We will also see, however, that different design approaches do not necessarily produce the same outcome, and that digging into the details can provide further insight during <a href=\"https:\/\/en.wikipedia.org\/wiki\/Exploratory_data_analysis\">exploratory data analysis (EDA)<\/a>.<\/p>\n<p><strong>Objective<\/strong>: Put more practically, the exercises in this article use <a href=\"https:\/\/www.ibm.com\/communities\/analytics\/watson-analytics-blog\/hr-employee-attrition\/\">a very popular exercise\u00a0dataset to conduct EDA<\/a> and determine the factors that lead to attrition. Below is a snapshot of the first 12 of the dataset\u2019s 35 columns:<\/p>\n<p><img decoding=\"async\" width=\"1826\" height=\"784\" class=\"wp-image-35530\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-158.png\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-158.png 1826w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-158-300x129.png 300w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-158-768x330.png 768w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-158-1024x440.png 1024w\" sizes=\"(max-width: 1826px) 100vw, 1826px\" \/><\/p>\n<p>We will compare the high-level findings of a common R implementation with 
those of the machine learning modules provided by Azure Machine Learning Studio.<\/p>\n<p><a href=\"https:\/\/docs.microsoft.com\/en-us\/machine-learning-server\/r-client\/what-is-microsoft-r-client\"><span style=\"font-size: 14pt\"><strong>R implementation using Microsoft R Client or R Studio<\/strong><\/span><\/a><\/p>\n<p style=\"padding-left: 30px\">Using R, we will compute simple correlation coefficients to find the variables most strongly correlated with Attrition.<\/p>\n<p><strong><a href=\"https:\/\/studio.azureml.net\/\"><span style=\"font-size: 14pt\">Azure Machine Learning Studio<\/span><\/a><\/strong><\/p>\n<p style=\"padding-left: 30px\">We will use Azure Machine Learning Studio to apply the simple yet powerful <a href=\"https:\/\/blogs.technet.microsoft.com\/machinelearning\/2015\/04\/14\/permutation-feature-importance\/\">permutation feature importance<\/a> technique.<\/p>\n<p>There are many analyses of the above-mentioned dataset; you can find an abundance of implementations linked to the <a href=\"https:\/\/www.kaggle.com\/datasets\">dataset hosted on kaggle.com<\/a>. The practical outcome of such an exploratory data analysis is to identify at least the top three or four factors that contribute to employee turnover. Each analysis should be backed by robust experimentation and, where applicable, appropriate visualization; you will find plenty of examples on kaggle.com and similar sites.<\/p>\n<p><span style=\"font-size: 14pt\"><strong>1. R implementation using Microsoft R Client or R Studio<\/strong><\/span><\/p>\n<p>Follow the Microsoft R Client installation instructions in the <a href=\"https:\/\/docs.microsoft.com\/en-us\/machine-learning-server\/r-client\/what-is-microsoft-r-client\">Microsoft R Client introduction documentation<\/a>. 
Alternatively, you can use the popular open-source tool <a href=\"https:\/\/www.rstudio.com\/\">R Studio<\/a>.<\/p>\n<p>The steps below are <span style=\"text-decoration: underline\">not<\/span> necessarily the complete code; rather, they highlight the key details of the R implementation.<\/p>\n<p>1. Import the data and prepare a <a href=\"https:\/\/www.rdocumentation.org\/packages\/base\/versions\/3.5.2\/topics\/data.frame\">data frame<\/a> for the computations.<\/p>\n<pre class=\"lang:default decode:true\">## readxl provides read_excel()\r\nlibrary(readxl)\r\n\r\n## Import the data\r\ncase_data &lt;- data.frame(read_excel(\"Data\/attrition-data.xlsx\"))<\/pre>\n<p>2. Preprocess<\/p>\n<p>We need to remove variables that add no value to the analysis:<\/p>\n<ul>\n<li><em>EmployeeCount<\/em>: Always 1, since each row represents one employee.<\/li>\n<li><em>Over18<\/em>: All employees are \u201cY\u201d. Age is a more meaningful and relevant variable.<\/li>\n<li><em>StandardHours<\/em>: All are \u201c80\u201d.<\/li>\n<\/ul>\n<pre class=\"lang:default decode:true\">## dplyr provides %&gt;% and as_tibble(); purrr provides map_if()\r\nlibrary(dplyr)\r\nlibrary(purrr)\r\n\r\n## Remove redundant columns by name: EmployeeCount, Over18, StandardHours\r\ndf &lt;- case_data[, !(names(case_data) %in% c(\"EmployeeCount\", \"Over18\", \"StandardHours\"))]\r\n\r\n## Convert character columns to factors\r\ndf &lt;- df %&gt;% map_if(is.character, as.factor) %&gt;% as_tibble()\r\n\r\n## Order BusinessTravel levels from least to most travel.\r\n## factor(levels = ...) reorders the levels; assigning levels() would relabel the data.\r\ndf$BusinessTravel &lt;- factor(df$BusinessTravel,\r\n                            levels = c(\"Non-Travel\", \"Travel_Rarely\", \"Travel_Frequently\"))\r\n\r\n## Make all variables numeric (factors become their integer level codes)\r\nnumdf &lt;- data.frame(sapply(df, as.numeric))<\/pre>\n<p>3. 
Calculate <a id=\"post-35507-_Hlk2202566\"><\/a>correlation coefficients and locate the variables most strongly correlated with Attrition.<\/p>\n<pre class=\"lang:default decode:true\">## Correlation matrix of all (numeric) variables\r\nAttcor &lt;- data.frame(cor(numdf))\r\n\r\n## Keep the column of correlation coefficients with Attrition\r\nAttrition &lt;- data.frame(Attcor$Attrition)\r\n\r\n## Label each row with its parameter name\r\nAttrition$Parameter &lt;- row.names(Attcor)\r\n\r\n## Rename the columns\r\nnames(Attrition) &lt;- c(\"Correlation\", \"Parameter\")\r\n\r\n## Sort from the most positive correlation downward\r\nSortAtt &lt;- Attrition[order(-Attrition$Correlation), ]\r\n\r\n## Display the top 10 positively correlated parameters\r\nrow.names(SortAtt) &lt;- NULL\r\nknitr::kable(head(SortAtt, 10))<\/pre>\n<p>The output of the above code is shown below; the first row is Attrition\u2019s correlation with itself:<\/p>\n<table style=\"width: 368px\">\n<thead>\n<tr>\n<th style=\"width: 165.27px\"><strong>Correlation<\/strong><\/th>\n<th style=\"width: 201.73px\"><strong>Parameter<\/strong><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td style=\"width: 165.27px\">1.0000000<\/td>\n<td style=\"width: 201.73px\">Attrition<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 165.27px\">0.2461180<\/td>\n<td style=\"width: 201.73px\">OverTime<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 165.27px\">0.1620702<\/td>\n<td style=\"width: 201.73px\">MaritalStatus<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 165.27px\">0.0779236<\/td>\n<td style=\"width: 201.73px\">DistanceFromHome<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 165.27px\">0.0671515<\/td>\n<td style=\"width: 201.73px\">JobRole<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 165.27px\">0.0639906<\/td>\n<td style=\"width: 201.73px\">Department<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 165.27px\">0.0434937<\/td>\n<td style=\"width: 201.73px\">NumCompaniesWorked<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 165.27px\">0.0294533<\/td>\n<td style=\"width: 201.73px\">Gender<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 165.27px\">0.0268455<\/td>\n<td style=\"width: 
201.73px\">EducationField<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 165.27px\">0.0151702<\/td>\n<td style=\"width: 201.73px\">MonthlyRate<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>What does this output highlight? One might reach the following observations:<\/p>\n<table>\n<thead>\n<tr>\n<th style=\"width: 213px\"><strong>Top 3 Parameters<\/strong><\/th>\n<th style=\"width: 802px\"><strong>Initial Observation<\/strong><\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td style=\"width: 213px\">Overtime<\/td>\n<td style=\"width: 802px\">Employees who report\u00a0<strong>overtime<\/strong>\u00a0are\u00a0<strong>more<\/strong>\u00a0likely to leave.<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 213px\">Marital Status<\/td>\n<td style=\"width: 802px\"><strong>Single<\/strong>\u00a0employees are\u00a0<strong>more<\/strong>\u00a0likely to leave. This observation may be tied to other factors and can be challenging for a company to address.<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 213px\">Distance from Home<\/td>\n<td style=\"width: 802px\">Employees who live\u00a0<strong>farther<\/strong>\u00a0from work are\u00a0<strong>more<\/strong>\u00a0likely to leave.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>This should not be taken as a de facto conclusion about attrition, but rather as one of many possible observations. As with any <a href=\"https:\/\/en.wikipedia.org\/wiki\/Exploratory_data_analysis\">exploratory data analysis (EDA)<\/a>, the outcome depends on the exploration methods, the algorithms, how you prepare the data, and so on. For example, for unordered categorical variables such as JobRole or Department, the correlation depends on the arbitrary numeric codes that as.numeric() assigns to each factor level.<\/p>\n<p><span style=\"font-size: 14pt\"><strong>2. Utilizing Azure Machine Learning Studio<\/strong><\/span><\/p>\n<p>Before we get into the implementation specifics, it is worth reviewing the five typical stages of a machine learning implementation. 
Borrowing from the <a href=\"http:\/\/download.microsoft.com\/download\/C\/4\/6\/C4606116-522F-428A-BE04-B6D3213E9E52\/ml_studio_overview_v1.1.pdf\">Microsoft Azure Machine Learning Studio Capabilities Overview diagram<\/a>, the figure below illustrates those stages. The same approach applies to any machine learning project, whether it is built with R, Python, or ML.NET.<\/p>\n<p><img decoding=\"async\" width=\"665\" height=\"842\" class=\"wp-image-35531\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-159.png\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-159.png 665w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-159-237x300.png 237w\" sizes=\"(max-width: 665px) 100vw, 665px\" \/><\/p>\n<p>Those experienced in machine learning would likely suggest two approaches to this problem: <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/machine-learning\/studio-module-reference\/filter-based-feature-selection\">Filter Based Feature Selection<\/a> and <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/machine-learning\/studio-module-reference\/permutation-feature-importance\">Permutation Feature Importance<\/a>.<\/p>\n<p>They seem to accomplish similar tasks in that both assign scores to variables so that we can identify which variables or features are significant. However, each arrives at its scores in a quite different way. 
Note the definition of each from the Microsoft documentation:<\/p>\n<ul>\n<li>Filter Based Feature Selection (FBFS) \u2013 Identifies the features in a dataset with the greatest predictive power.<\/li>\n<li>Permutation Feature Importance (PFI) \u2013 Computes the permutation feature importance scores of feature variables given a trained model and a test dataset.<\/li>\n<\/ul>\n<p>In other words, FBFS scores each feature with a statistic computed directly from the data, before any model is trained, whereas PFI measures how much a trained model\u2019s performance drops when the values of a single feature are randomly shuffled.<\/p>\n<p>We will compare each outcome to the previously hand-coded R implementation.<\/p>\n<p>First, log in to <a href=\"https:\/\/studio.azureml.net\/\">studio.azureml.net<\/a> and create a new Permutation Feature Importance (PFI) experiment.<\/p>\n<p><img decoding=\"async\" width=\"1096\" height=\"942\" class=\"wp-image-35532\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-160.png\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-160.png 1096w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-160-300x258.png 300w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-160-768x660.png 768w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-160-1024x880.png 1024w\" sizes=\"(max-width: 1096px) 100vw, 1096px\" \/><\/p>\n<p>Once the workspace is open, click <strong>Saved Datasets<\/strong> and then <strong>My Datasets<\/strong>. Click the <strong>+New<\/strong> button at the bottom of the screen, choose <strong>DATASET<\/strong>, and then click \u201c<strong>From Local File<\/strong>\u201d. 
The dataset upload dialog box appears as below:<\/p>\n<p><img decoding=\"async\" width=\"808\" height=\"758\" class=\"wp-image-35533\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-161.png\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-161.png 808w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-161-300x281.png 300w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-161-768x720.png 768w\" sizes=\"(max-width: 808px) 100vw, 808px\" \/><\/p>\n<p>Once the dataset is available, drag it onto the workspace.<\/p>\n<p>Then, choose <strong>Feature Selection<\/strong> and drag the <strong>Filter Based Feature Selection<\/strong> module onto the workspace. After connecting the \u201cattrition.csv\u201d dataset to the input of FBFS, set the feature scoring method to <strong>Chi Squared<\/strong> and the target column to Attrition.<\/p>\n<p><img decoding=\"async\" width=\"1662\" height=\"455\" class=\"wp-image-35534\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-162.png\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-162.png 1662w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-162-300x82.png 300w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-162-768x210.png 768w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-162-1024x280.png 1024w\" sizes=\"(max-width: 1662px) 100vw, 1662px\" \/><\/p>\n<p>Right-click the FBFS module and select <strong>Run selected<\/strong> to execute it. 
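<\/p>\n<p>To build some intuition for what the <strong>Chi Squared<\/strong> scoring method measures, here is a minimal R sketch. It is an illustrative assumption, not the module\u2019s actual implementation: it scores each factor column against a target with base R\u2019s chisq.test(), the helper name chi_square_scores is hypothetical, only factor columns are scored, and Azure Machine Learning Studio may bin numeric columns and scale its scores differently.<\/p>\n<pre class=\"lang:default decode:true\">## Hypothetical helper: chi-squared statistic of each factor column\r\n## against a target factor, sorted from strongest to weakest association\r\nchi_square_scores &lt;- function(data, target) {\r\n  features &lt;- setdiff(names(data)[sapply(data, is.factor)], target)\r\n  scores &lt;- sapply(features, function(f) {\r\n    ## Contingency table of feature levels vs. target levels\r\n    tab &lt;- table(data[[f]], data[[target]])\r\n    ## correct = FALSE gives the plain Pearson statistic; small-count warnings are silenced\r\n    unname(suppressWarnings(chisq.test(tab, correct = FALSE))$statistic)\r\n  })\r\n  sort(scores, decreasing = TRUE)\r\n}\r\n\r\n## Usage with the df data frame prepared in step 2 of the R implementation:\r\n## head(chi_square_scores(df, 'Attrition'), 10)<\/pre>\n<p>A higher score indicates a stronger departure from independence between a feature and Attrition, although the raw statistic also tends to grow with the number of levels a feature has.<\/p>\n<p>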
After the run, right-click the FBFS module and select <strong>Features<\/strong> &gt; <strong>Visualize<\/strong> to display the output below.<\/p>\n<p><img decoding=\"async\" width=\"894\" height=\"387\" class=\"wp-image-35535\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-163.png\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-163.png 894w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-163-300x130.png 300w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-163-768x332.png 768w\" sizes=\"(max-width: 894px) 100vw, 894px\" \/><\/p>\n<p>Interestingly, as with the first experiment in R using correlation coefficients, FBFS identified OverTime as a major factor.<\/p>\n<p>Next, from <strong>Data Transformation<\/strong>, choose \u201c<strong>Select Columns in Dataset<\/strong>\u201d; the workspace should now look something like this:<\/p>\n<p><img decoding=\"async\" width=\"1183\" height=\"1084\" class=\"wp-image-35536\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-164.png\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-164.png 1183w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-164-300x275.png 300w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-164-768x704.png 768w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-164-1024x938.png 1024w\" sizes=\"(max-width: 1183px) 100vw, 1183px\" \/><\/p>\n<p>Then select all the columns except <em>EmployeeCount<\/em>, <em>Over18<\/em> and <em>StandardHours<\/em>, 
as stated previously.<\/p>\n<p><img decoding=\"async\" width=\"1330\" height=\"659\" class=\"wp-image-35537\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-165.png\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-165.png 1330w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-165-300x149.png 300w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-165-768x381.png 768w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-165-1024x507.png 1024w\" sizes=\"(max-width: 1330px) 100vw, 1330px\" \/><\/p>\n<p>Next, select the <strong>Train Model<\/strong> module and choose the Attrition column as the label.<\/p>\n<p><img decoding=\"async\" width=\"619\" height=\"225\" class=\"wp-image-35538\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-166.png\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-166.png 619w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-166-300x109.png 300w\" sizes=\"(max-width: 619px) 100vw, 619px\" \/><\/p>\n<p>Finally, run the PFI experiment by clicking <strong>Run<\/strong>. Once the run completes, the model is trained and available for review. 
Right-click the Permutation Feature Importance module, as shown below:<\/p>\n<p><img decoding=\"async\" width=\"831\" height=\"626\" class=\"wp-image-35539\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-167.png\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-167.png 831w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-167-300x226.png 300w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-167-768x579.png 768w\" sizes=\"(max-width: 831px) 100vw, 831px\" \/><\/p>\n<p>Click <strong>Visualize<\/strong> to see the PFI report below.<\/p>\n<p><img decoding=\"async\" width=\"789\" height=\"1109\" class=\"wp-image-35540\" src=\"http:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-168.png\" srcset=\"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-168.png 789w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-168-213x300.png 213w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-168-768x1079.png 768w, https:\/\/devblogs.microsoft.com\/premier-developer\/wp-content\/uploads\/sites\/31\/2019\/02\/word-image-168-729x1024.png 729w\" sizes=\"(max-width: 789px) 100vw, 789px\" \/><\/p>\n<p>What do the above PFI findings highlight? 
Perhaps we can reach the following observations:<\/p>\n<table style=\"height: 218px\">\n<tbody>\n<tr style=\"height: 28px\">\n<td style=\"width: 177px;height: 28px\"><strong>Top 4 Features<\/strong><\/td>\n<td style=\"width: 837px;height: 28px\"><strong>Initial Observation<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 28px\">\n<td style=\"width: 177px;height: 28px\">Overtime<\/td>\n<td style=\"width: 837px;height: 28px\">Employees who report\u00a0<strong>overtime<\/strong>\u00a0are\u00a0<strong>more<\/strong>\u00a0likely to leave.<\/td>\n<\/tr>\n<tr style=\"height: 54px\">\n<td style=\"width: 177px;height: 54px\">Business Travel<\/td>\n<td style=\"width: 837px;height: 54px\">Employees with\u00a0<strong>more<\/strong>\u00a0travel requirements may be\u00a0<strong>more<\/strong>\u00a0likely to leave. This observation may be tied to job roles and can be challenging for a company to address.<\/td>\n<\/tr>\n<tr style=\"height: 54px\">\n<td style=\"width: 177px;height: 54px\">Job Involvement<\/td>\n<td style=\"width: 837px;height: 54px\">Employees take their level of job involvement seriously, and those who do not feel involved are\u00a0<strong>more<\/strong>\u00a0likely to leave.<\/td>\n<\/tr>\n<tr style=\"height: 54px\">\n<td style=\"width: 177px;height: 54px\">Job Satisfaction<\/td>\n<td style=\"width: 837px;height: 54px\">Related to the \u201cJob Involvement\u201d feature, employees take their job satisfaction seriously, and those who are unsatisfied are\u00a0<strong>more<\/strong>\u00a0likely to leave.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>Again, this should not be taken as a de facto conclusion about attrition, especially because PFI scores measure how much the trained model relies on each feature, not the direction of its effect; the directions in the table above are interpretations. Still, it is interesting that the top factor, \u201cOvertime\u201d, agrees with the R correlation coefficient results. 
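<\/p>\n<p>The idea behind PFI can also be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the Azure module\u2019s implementation: it assumes a binary classifier fitted with glm() (logistic regression is a hypothetical choice, since the learner connected to Train Model is not named here), a held-out test set, accuracy as the metric, and a hypothetical helper name, permutation_importance. It shuffles one feature column at a time and records how much accuracy drops.<\/p>\n<pre class=\"lang:default decode:true\">## Hypothetical helper: permutation feature importance for a binary glm() model.\r\n## Score = average drop in test accuracy after shuffling one feature column.\r\npermutation_importance &lt;- function(model, test, target, n_repeats = 5) {\r\n  accuracy &lt;- function(d) {\r\n    prob &lt;- predict(model, newdata = d, type = 'response')\r\n    ## glm() models the probability of the second factor level\r\n    mean((prob &gt; 0.5) == (d[[target]] == levels(d[[target]])[2]))\r\n  }\r\n  baseline &lt;- accuracy(test)\r\n  features &lt;- setdiff(names(test), target)\r\n  scores &lt;- sapply(features, function(f) {\r\n    drops &lt;- replicate(n_repeats, {\r\n      shuffled &lt;- test\r\n      shuffled[[f]] &lt;- sample(shuffled[[f]])  # break the feature-label link\r\n      baseline - accuracy(shuffled)\r\n    })\r\n    mean(drops)\r\n  })\r\n  sort(scores, decreasing = TRUE)\r\n}\r\n\r\n## Usage sketch (train_df and test_df are hypothetical splits of numdf,\r\n## with Attrition converted to a factor first):\r\n## fit &lt;- glm(Attrition ~ ., data = train_df, family = binomial)\r\n## head(permutation_importance(fit, test_df, 'Attrition'), 10)<\/pre>\n<p>A feature the model depends on loses accuracy when shuffled and receives a high score, while a feature the model ignores scores near zero. Unlike the chi-squared or correlation scores, the result depends on the trained model and the test data, which is one reason PFI rankings can differ from those methods.<\/p>\n<p>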
To learn more about this technique, read the <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/machine-learning\/studio-module-reference\/permutation-feature-importance\">permutation feature importance<\/a> module documentation.<\/p>\n<p><span style=\"font-size: 14pt\"><strong>Wrap Up<\/strong><\/span><\/p>\n<p>I find the results of machine learning experiments always interesting and, in some cases, somewhat unexpected. As this comparison shows, the feature rankings produced by PFI often differ from the feature selection statistics computed before a model is created. PFI is useful in many cases, especially when training \u201cblack-box\u201d models where it is difficult to explain how the model characterizes the relationship between the features and the target variable.<\/p>\n<p>We encourage everyone to try this out and to apply various statistical modules and machine learning models. In an upcoming blog post, I plan to show how to achieve a similar implementation using <a href=\"https:\/\/devblogs.microsoft.com\/dotnet\/announcing-ml-net-0-9-machine-learning-for-net\/#feature-contribution-calculation-and-other-model-explainability-improvements\">PFI from ML.NET<\/a> and to compare the results with this post.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I find the results of machine learning experiments always interesting and, in some cases, somewhat unexpected. As this comparison shows, the feature rankings produced by PFI often differ from the feature selection statistics computed before a model is created. 
PFI is useful in many cases, especially when training \u201cblack-box\u201d models where it is difficult to explain how the model characterizes the relationship between the features and the target variable.<\/p>\n","protected":false},"author":582,"featured_media":35509,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[8,2824,129,1],"tags":[265,3],"class_list":["post-35507","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data","category-ml","category-premier","category-permierdev","tag-machine-learning","tag-team"],"acf":[],"blog_post_summary":"<p>I find the results of machine learning experiments always interesting and, in some cases, somewhat unexpected. As this comparison shows, the feature rankings produced by PFI often differ from the feature selection statistics computed before a model is created. PFI is useful in many cases, especially when training \u201cblack-box\u201d models where it is difficult to explain how the model characterizes the relationship between the features and the target 
variable.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/35507","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/users\/582"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/comments?post=35507"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/35507\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media\/35509"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media?parent=35507"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/categories?post=35507"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/tags?post=35507"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}