{"id":38842,"date":"2020-04-07T07:19:49","date_gmt":"2020-04-07T14:19:49","guid":{"rendered":"https:\/\/devblogs.microsoft.com\/premier-developer\/?p=38842"},"modified":"2020-03-23T07:34:29","modified_gmt":"2020-03-23T14:34:29","slug":"bias-in-machine-learning","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/premier-developer\/bias-in-machine-learning\/","title":{"rendered":"Bias in Machine Learning"},"content":{"rendered":"<p>Dev Consultant <a href=\"https:\/\/www.linkedin.com\/in\/ashley-shorter\/\">Ashley Shorter<\/a> examines the dangers of bias and importance of ethics in Machine Learning.<\/p>\n<hr \/>\n<p><span data-contrast=\"auto\">In our digital <\/span><span data-contrast=\"auto\">era<\/span><span data-contrast=\"auto\">, efficiency is expected. We can instantly find the fastest route to a destination, make purchases with our voice, and get recommendations based on our previo<\/span><span data-contrast=\"auto\">us purchases<\/span><span data-contrast=\"auto\">. <\/span><span data-contrast=\"auto\">T<\/span><span data-contrast=\"auto\">hese examples of machine learning have impacted our lives <\/span><span data-contrast=\"auto\">positively,<\/span><span data-contrast=\"auto\"> saving customers time and companies money.<\/span><span data-wac-het=\"1\" data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Machine learning is the scientific study of algorithms and statistical models that result in devices automatically learning and improving from experiences <\/span><b><span data-contrast=\"auto\">without being explicitly programmed<\/span><\/b><span data-contrast=\"auto\">. 
With so much success integrating machine learning into our everyday lives, the <\/span><span data-contrast=\"auto\">obvious <\/span><span data-contrast=\"auto\">next step is to <\/span><span data-contrast=\"auto\">integrate <\/span><span data-contrast=\"auto\">machine learning into even more systems. <\/span><span data-contrast=\"auto\">Unfortunately,<\/span><span data-contrast=\"auto\"> the collected data used to train machine learning models is often riddled with bias.\u00a0\u00a0<\/span><span data-wac-het=\"1\" data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Every time a dataset includes human decisions, there is bias. Bias is a learned stereotype or prejudice in favor of or against one thing or another. It can be conscious or unconscious. Datasets are vulnerable to bias during the data cleaning process, where data is <\/span><span data-contrast=\"auto\">manipulated <\/span><span data-contrast=\"auto\">to increase overall effectiveness and accuracy. Both missing data and outliers need to be handled during data cleaning. If they are handled incorrectly, bias may be introduced into the <\/span><span data-contrast=\"auto\">model and machine<\/span><span data-contrast=\"auto\">.\u00a0<\/span><span data-wac-het=\"1\" data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h3><span data-contrast=\"auto\">Missing Data<\/span><span data-wac-het=\"1\" data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">There are three main types of missing <\/span><span data-contrast=\"auto\">data:<\/span><span data-contrast=\"auto\"> missing completely at random (MCAR), missing at random (MAR), and missing not at random or non-ignorable (MNAR\/NI). 
In MCAR, the cause of missing data is independent of any values and\/or variables in the dataset (e.g., a broken test tube). This is uncommon but ideal because analysis remains unbiased. In MAR, the cause of the missing data is related to the observed values of other variables. Bias may be introduced when the type of missingness (MCAR or MAR) is misjudged and the missing values are handled accordingly. For example, a dataset of patient information includes age and blood pressure. A data scientist notices multiple blood pressure observations missing from patient data. During data cleaning, she concludes that the missing blood pressure observations are MCAR and deletes <\/span><span data-contrast=\"auto\">all<\/span><span data-contrast=\"auto\"> the patient information for anyone who is missing a blood pressure reading. What she doesn&#8217;t know is that doctors are less likely to take the blood pressure of younger patients, so the missing data is MAR, not MCAR. The data scientist has now unknowingly under-represented younger patients in her dataset, biasing it and, eventually, her model. A similar problem can happen with MNAR data, where the cause of the missingness is related to the missing values themselves (e.g., people with very high or very low incomes are less likely to provide their income information). It is important to correctly identify and handle missing data to prevent bias.<\/span><span data-wac-het=\"1\" data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<h3><span data-contrast=\"auto\">Outliers<\/span><\/h3>\n<p><span data-contrast=\"auto\">Sometimes there are observations that are significantly distant from other observations in a dataset. These are called outliers and are caused by errors within a dataset or natural variability. While outliers can reveal valuable information about a dataset, they can also skew your model and lead to less-than-optimal results. 
It is up to the data scientist to determine if outliers need to be removed, replaced, or kept in the dataset. If natural variability is mistaken for error, a data scientist will remove the observation and valuable information will be lost. For example, in the early stages of the Flint Water Crisis in Flint, Michigan, water samples were taken from select homes in the community and tested for lead. The results revealed that <\/span><span data-contrast=\"auto\">a majority of<\/span><span data-contrast=\"auto\"> homes had safe levels of lead in their water supply, but a couple of homes came back with dangerously high levels of lead in the water supply. The cause of the outliers was deemed human error and they were consequently removed from the dataset. The revised dataset was then sent back to the appropriate authorities and no action was taken to improve the water quality. In this case, the outliers were not dealt with appropriately and, as a result, bias was introduced into the dataset, putting the health of people at risk.<\/span><span data-wac-het=\"1\" data-ccp-props=\"{&quot;134233117&quot;:true,&quot;134233118&quot;:true,&quot;201341983&quot;:0,&quot;335559740&quot;:240}\">\u00a0<\/span><\/p>\n<p>Bias can have dangerous consequences. The effects of writing our unconscious bias into machine learning models can make a machine, whose task is efficiency, just as flawed as human beings. \u00a0As flawed as we are, how can we fix a problem that we cannot see? 
\u00a0While Machine Learning is a powerful tool that brings value to many industries and problems, it\u2019s critically important to be aware of the inherent bias humans bring to the table.\u00a0 This is also a key reason that ethical principles must be considered in the future of AI.\u00a0 To learn more about what Microsoft is doing in this space, visit <a href=\"https:\/\/www.microsoft.com\/en-us\/ai\/responsible-ai\">Microsoft AI Principles<\/a>.<\/p>\n<p><span data-wac-het=\"1\" data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:259}\">\u00a0<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Machine learning is the scientific study of algorithms and statistical models that result in devices automatically learning and improving from experiences without being explicitly programmed. With so much success integrating machine learning into our everyday lives, the obvious next step is to integrate machine learning into even more systems.<\/p>\n","protected":false},"author":582,"featured_media":38844,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"footnotes":""},"categories":[14],"tags":[15,9459,265,3],"class_list":["post-38842","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","tag-ai","tag-ethics","tag-machine-learning","tag-team"],"acf":[],"blog_post_summary":"<p>Machine learning is the scientific study of algorithms and statistical models that result in devices automatically learning and improving from experiences without being explicitly programmed. 
With so much success integrating machine learning into our everyday lives, the obvious next step is to integrate machine learning into even more systems.<\/p>\n","_links":{"self":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/38842","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/users\/582"}],"replies":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/comments?post=38842"}],"version-history":[{"count":0,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/posts\/38842\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media\/38844"}],"wp:attachment":[{"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/media?parent=38842"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/categories?post=38842"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/devblogs.microsoft.com\/premier-developer\/wp-json\/wp\/v2\/tags?post=38842"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}