What is unfairness in machine learning?

AI and machine learning hold the promise of creating positive societal change: ideally for the collective good when a model is developed well, but with unintended collective harm when it is not. The AI/ML industry learned from early implementation mistakes that the technology, even when designed with good intent, can produce unintended results if it lacks proper guardrails, often stemming from how the input data is classified and resulting in biases around age, race, gender, and religion, among others.

There is a quote from Aristotle that goes, "There is nothing so unequal as the equal treatment of unequals." All people should be treated in a fair and ethical manner, but what is fairness to a machine? Can this very human concept be reduced to mathematical algorithms? The truth is that fairness and equality are difficult concepts that do not translate well into machine learning. And without fairness, we are not able to establish trust.

Why and how does it happen? We flock together.

It is worth mentioning here that in the United States, discrimination or bias based on personal characteristics is illegal, and decisions based on a person's race, sexual orientation, or age will be met with punitive action. In many industries, such as credit and financial services, this practice has been illegal for quite some time. So how does it still happen? If you are a student of human behavior, the answer may not surprise you.

There are specific aspects of human behavior we don't need machine learning to help us understand. People tend to prefer mingling with like-minded individuals or groups that look, sound, and think similarly to them, and we have behaved this way throughout human existence. You can see this in any major city: Lower Manhattan in New York is home to the largest ethnic Chinese population outside of Asia, and Dearborn, Michigan has the largest concentration of Arab Americans in the United States. You can even observe it on a much smaller scale, such as in a school cafeteria, where children often choose to sit and converse with other children with similar affinities.

Dr. Cynthia Dwork, a scientist within the Microsoft Research group who has studied and worked on the problem of "Algorithmic Fairness," illustrated a good example of what is referred to as the Birds of a Feather phenomenon, which describes how people tend to fall into groups similar to themselves. The example she described drew on an undergraduate research study from MIT that analyzed public Facebook data, such as a person's feed. The study postulated that if a person were male and 5% of that person's friend connections were self-described as gay, then there would be a strong probability that the person is also gay, even without ever publicly revealing that information.

At the macro level (city, state, or zip code), this Birds of a Feather phenomenon can come into play: a data set of credit card applicants may be cleansed of sensitive evaluation features such as race or gender, yet the model could still be biased, offering less desirable APR terms to applicants in certain zip codes because of these redundant encodings within the data set. An applicant's level of risk and ability to pay back the loan could effectively be inferred from where the person lives (e.g., the locale where the person resides is known to be an impoverished community).
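To make the redundant-encoding problem concrete, here is a minimal sketch using entirely synthetic data and scikit-learn. The sensitive attribute is dropped before training, yet the model still approves the two groups at very different rates because the (hypothetical) zip-code feature acts as a proxy for group membership.

```python
# Minimal sketch of a "redundant encoding": the sensitive attribute is removed,
# but a zip-code proxy still carries the same information. Synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

group = rng.integers(0, 2, size=n)                           # hypothetical sensitive attribute
zip_code = np.where(rng.random(n) < 0.9, group, 1 - group)   # proxy, 90% aligned with group
income = rng.normal(50, 10, size=n)                          # income independent of group here

# Historical approvals were influenced by group membership (biased labels).
approved = (income + 10 * group + rng.normal(0, 5, n)) > 58

# Train WITHOUT the sensitive attribute -- only income and the zip-code proxy.
X = np.column_stack([income, zip_code])
model = LogisticRegression(max_iter=1000).fit(X, approved)
pred = model.predict(X)

# The model still selects the two groups at very different rates via the proxy.
for g in (0, 1):
    print(f"group {g} predicted approval rate: {pred[group == g].mean():.2f}")
```

Even though the model never sees the sensitive attribute, the disparity in its approval rates survives through the correlated feature, which is exactly what "redundant encoding" means here.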

The above examples show that most fairness issues in AI/ML occur at the group level, not the individual level, because of how the data is typically processed.

It can be difficult to distinguish whether the source of a fairness concern is societal (e.g., redundant features in the data set, as in the Facebook example above) or technical (e.g., "bad" data being fed into the model). The reality is that no AI/ML solution today can be assumed to be 100 percent perfect when it is applied to humans.

One attempt to understand "unfairness" in AI/ML is to determine the type of harm it can cause. Two common types, both illustrated in the short sketch after this list, are:

  • Harm of allocation: Where a model withholds resources, opportunities, or information from certain groups, such as in job applications, school admissions, or financial lending.
  • Harm of quality of service: Where a model works better for one group than others, such as a visual recognition system that incorrectly identifies a person's race or gender more often for some groups.
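As a concrete (and entirely hypothetical) illustration of the two harm types, the snippet below measures both on a handful of made-up predictions: selection rate per group surfaces a harm of allocation, while accuracy per group surfaces a harm of quality of service.

```python
# Hypothetical labels and predictions for two groups, used only to show how
# the two harm types surface in simple per-group metrics.
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 0, 0, 1, 1])
group  = np.array(["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"])

for g in ("A", "B"):
    mask = group == g
    selection_rate = y_pred[mask].mean()                  # harm of allocation: who gets the "yes"?
    accuracy = (y_pred[mask] == y_true[mask]).mean()      # harm of quality of service: who gets correct answers?
    print(f"group {g}: selection rate {selection_rate:.2f}, accuracy {accuracy:.2f}")
```

In this toy data, group B is both selected less often and served less accurately, so it suffers on both dimensions.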

Mitigate fairness risk: The tools

We live in a world that is inherently unfair, but Microsoft and others have been developing tools and frameworks we can leverage to help mitigate fairness concerns by measuring the trade-offs between fairness and model performance. Because fairness issues can arise from multiple sources, both societal and technical, it is nearly impossible to "de-bias" a system entirely; the goal should always be to mitigate harm as much as possible.

One tool is Fairlearn, an open-source, community-driven project that helps data scientists evaluate fairness concerns in binary classification and regression models for groups (not individuals). The goal of the tool is to help identify specific groups that are at high risk of experiencing unfairness. If you plan to use the Fairlearn SDK with image or text data, you will need to supply a classifier that exposes its own fit() and predict() methods.
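A thin scikit-learn-style wrapper is usually enough to satisfy that fit()/predict() contract. The sketch below is a minimal, hypothetical example; the inner text model it delegates to is an imaginary stand-in for whatever model you already have.

```python
# Minimal sketch of a custom classifier wrapper exposing fit()/predict(),
# the contract Fairlearn expects for binary classification.
# `inner_model` is a hypothetical stand-in for your own text or image model.
import numpy as np

class TextClassifierWrapper:
    def __init__(self, inner_model):
        self.inner_model = inner_model        # e.g. a fine-tuned text model (hypothetical)

    def fit(self, X, y, **kwargs):
        self.inner_model.train(X, y)          # delegate training to your own model
        return self

    def predict(self, X):
        # Return hard 0/1 labels, which is what Fairlearn's binary-classification
        # metrics and mitigation algorithms operate on.
        scores = self.inner_model.score(X)
        return (np.asarray(scores) >= 0.5).astype(int)
```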

Fairlearn is easily installed as a Python pip package and, once integrated into your code, can be used through a built-in interactive dashboard or via Matplotlib visualizations.
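As a rough sketch of that workflow, the snippet below computes per-group metrics with Fairlearn's MetricFrame and plots them with Matplotlib. The trained classifier `model`, the test data `X_test`/`y_test`, and the `sex` column are hypothetical placeholders for your own assets.

```python
# Install first: pip install fairlearn
# Assess a trained binary classifier by group; `model`, `X_test`, `y_test`
# and the "sex" column are hypothetical placeholders.
from fairlearn.metrics import MetricFrame, selection_rate
from sklearn.metrics import accuracy_score

y_pred = model.predict(X_test)

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y_test,
    y_pred=y_pred,
    sensitive_features=X_test["sex"],
)

print(mf.overall)                      # metrics for the whole population
print(mf.by_group)                     # the same metrics broken down per group
mf.by_group.plot.bar(subplots=True)    # quick Matplotlib view of the disparities
```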

Fairlearn SDK dashboard example

As of today, Fairlearn provides three mitigation algorithms (with more in development), which enforce fairness constraints along multiple dimensions, including Demographic Parity (DP), Equalized Odds (EO), True Positive Rate Parity (TPRP), and False Positive Rate Parity (FPRP).
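As a hedged sketch of what applying one of those algorithms looks like, the snippet below wraps an ordinary scikit-learn estimator with Fairlearn's ExponentiatedGradient reduction under a Demographic Parity constraint. `X_train`, `y_train`, and `sensitive_train` are hypothetical placeholders for your own data.

```python
# Mitigation sketch: train a standard estimator under a Demographic Parity
# constraint using Fairlearn's ExponentiatedGradient reduction.
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from sklearn.linear_model import LogisticRegression

mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(solver="liblinear"),
    constraints=DemographicParity(),
)

# The sensitive feature is used only to enforce the constraint during training;
# it is not passed to the underlying estimator as an input feature.
mitigator.fit(X_train, y_train, sensitive_features=sensitive_train)
y_pred_mitigated = mitigator.predict(X_train)
```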

Fairlearn supported algorithms (source: fairlearn.org)

Another great option for leveraging the Fairlearn SDK is Microsoft's AzureML service, which integrates Fairlearn and provides a fairness dashboard built into the workspace.

AzureML - fairness selection rate

The image above illustrates selection rates broken down by a sensitive feature such as race. The image below illustrates the disparity in false negative and false positive rates across the same groups.

AzureML - fairness false negative to positive rates

Mitigate fairness risk: The process

Fairness mitigation is a hot area of research that continues to show great promise and will keep improving over time. The tools and frameworks available today, such as Fairlearn, can help combat fairness concerns, but they are not yet mature enough to handle the job on their own. To help address that gap, consider involving an Institutional Review Board (IRB) or an Ethics Committee (EC) to compensate for the shortcomings of the tools and frameworks. When people are affected, solving this problem should not be left to computer engineers and scientists alone.

AI/ML technology has the power to enhance the human experience and turn our dreams into reality. However, because whether a bias is acceptable often depends on the use case, having an independent body review the work alongside the data science team can go a long way toward establishing trust in data science when it is applied to groups of people.

Please reach out to WWT and let us know how we can help you on your journey to making the world a fairer place for everyone with your AI/ML use cases.

Getting started

If you are looking for resources to help you get started with deploying fairness across your AI/ML models, the following materials sourced for this article may help.
