All of these applications have a direct effect on our lives and can harm our society if they are not designed and engineered correctly, with consideration for fairness. These examples underscore why it is so important for managers to guard against the reputational and regulatory risks that can result from biased data. It is equally important for researchers and engineers to consider downstream applications and their potential harms when modeling an algorithm or a system.
For anyone whose life involves using a device to access the Internet, machine learning (ML) algorithms have become the fabric of their experience. Using machine learning, search engines like Google find us the “best” results, email hosts like Office 365 filter our spam, and social networks like Facebook track and tag our best friends and family for us. We are building algorithmic computations into our everyday experience. Some consequences of bias in machine learning can seem innocuous, with a hypothetical long-term impact that may incur financial or mission loss. Researchers of cyber-physical safety at IBM classify these machine learning applications as “Type B.” For example, a person could apply for a loan and be denied because of a decision made by a machine. “Type A” applications of machine learning are more insidious, with real-time or near-term impact. Consider, for example, the use of an algorithm to calculate the risk that a criminal will commit a second crime, or “recidivate.” The algorithm’s programmers must choose the right machine learning algorithm, the right metrics to weigh in making predictions, and an appropriately
Algorithms have been criticized as a method for obscuring racial prejudice in decision-making. Because of how certain racial and ethnic groups were treated in the past, data can contain hidden biases. For example, black people tend to receive longer sentences than white people who committed the same crime. A system trained on such data could amplify those original biases. As machine learning algorithms proliferate in everyday life, the scientific community has become increasingly aware of the ethical challenges, both simple and complex, that arise with the technology. Machine bias is one such ethical challenge.
There are signs of self-correction in the AI industry: for example, researchers are looking at ways to reduce bias and strengthen ethics in rule-based AI systems by taking human biases into account.
These are good practices to follow; it’s important to be thinking proactively about ethics regardless of the regulatory environment. Let’s take a look at several points to keep in mind as you work on your AI.
Root out bias
To address potential machine-learning bias, the first step is to honestly and openly question what preconceptions could currently exist in an organization’s processes and actively hunt for how those biases might manifest themselves in data. Since this can be a delicate issue, many organizations bring in outside experts to challenge their past and current practices.
Once potential biases are identified, companies can block them by eliminating problematic data or removing specific components of the input data set. Managers for a credit card company, for example, when considering how to address late payments or defaults, might initially build a model with data such as zip codes, type of car driven, or certain first names, without acknowledging that these data points can correlate with race or gender. Such data should be stripped out, keeping only data directly relevant to whether customers will pay their bills, such as credit scores or employment and salary information. That way, companies can build a solid machine-learning model to predict the likelihood of payment and determine which credit card customers should be offered more flexible payment plans and which should be referred to collection agencies.
A company can also expand the training data set with more information to counterweight potentially problematic data. Some companies, for example, have started to include social media data when evaluating the risk of a customer or client committing a financial crime. A machine-learning algorithm may flag a customer as high risk if he or she starts to post photos on social media from countries with potential terrorist or money-laundering connections. This conclusion can be tested and overridden, though, if the user’s nationality, profession, or travel proclivities are included, allowing the model to account for a native visiting their home country or a journalist or businessperson on a work trip.
Regardless of which approach is used, as a best practice, managers must not take data sets at face value. It is safe to assume that bias exists in all data. The question is how to identify it and remove it from the model.
Choose a representative training data set
Machine-learning models are, at their core, predictive engines. Large data sets train machine-learning models to predict the future based on the past. Trained on text whose intent has already been labeled, models can learn to infer the intent of new text. They can learn to spot differences (between a cat and a dog, for instance) by consuming millions of pieces of data, such as correctly labeled animal photos.
The advantage of machine-learning models over traditional statistical models is their ability to quickly consume enormous numbers of records and thereby make more accurate predictions. But since machine-learning models predict exactly what they have been trained to predict, their forecasts are only as good as the data used to train them.
For example, a machine-learning model designed to predict the risk of business loan defaults may advise against extending credit to companies with strong cash flows and solid management teams if it draws a faulty connection — based on data from loan officers’ past decisions — about loan defaults by businesses run by people of a certain race or in a particular zip code. A machine-learning model used to scan reams of résumés or applications to schools might mistakenly screen out female applicants if the historical data used to train it reflects past decisions that resulted in few women being hired or admitted to a college.
These types of biases are especially pervasive in data sets based on decisions made by a relatively small number of people. As a best practice, managers must always keep in mind that if humans are involved in decisions, bias always exists — and the smaller the group, the greater the chance that the bias is not overridden by others.
Everyone’s participation is needed in the ML project to actively guard against bias in data selection. There’s a fine line you have to walk. Making sure the training data is diverse and includes different groups is essential, but segmentation in the model can be problematic unless the real data is similarly segmented.
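A simple first check on diversity is to compare each group’s share of the training data with its share of the population the model will serve. The sketch below assumes you have a group label per training row and reference shares from some external source (both hypothetical here); the 0.8 ratio used to flag under-representation is an arbitrary illustrative threshold.

```python
from collections import Counter

def representation_report(train_groups, reference_shares, min_ratio=0.8):
    """For each group, return (training share, reference share, under_represented),
    where under_represented means training share / reference share < min_ratio."""
    counts = Counter(train_groups)
    total = len(train_groups)
    report = {}
    for group, ref_share in reference_shares.items():
        train_share = counts.get(group, 0) / total
        report[group] = (round(train_share, 3), ref_share, train_share / ref_share < min_ratio)
    return report

# Hypothetical example: group B is 40% of applicants but only 20% of training rows.
train_groups = ["A"] * 80 + ["B"] * 20
print(representation_report(train_groups, {"A": 0.6, "B": 0.4}))
# {'A': (0.8, 0.6, False), 'B': (0.2, 0.4, True)}
```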
It’s inadvisable — both computationally and in terms of public relations — to have different models for different groups. When there is insufficient data for one group, you could use weighting to increase its importance in training, but this should be done with extreme caution. It can lead to unexpected new biases.
For example, if you have only 40 people from Cincinnati in a data set and you try to force the model to consider their trends, you might need to use a large weight multiplier. Your model would then have a higher risk of picking up on random noise as trends — you could end up with results like “people named Brian have criminal histories.” This is why you need to be careful with weights, especially large ones.
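The Cincinnati example can be made concrete with inverse-frequency weighting, one common way to give a small group equal influence in training. This is a minimal sketch under assumed numbers (40 Cincinnati rows out of 10,000); the optional `max_weight` cap is a hypothetical safeguard, not a recommended value.

```python
from collections import Counter

def inverse_frequency_weights(groups, max_weight=None):
    """Weight per group so that every group contributes equally in total:
    weight = total_rows / (number_of_groups * rows_in_group), optionally capped."""
    counts = Counter(groups)
    total = len(groups)
    n_groups = len(counts)
    weights = {}
    for group, count in counts.items():
        w = total / (n_groups * count)
        if max_weight is not None:
            w = min(w, max_weight)  # limit how much any single row can count
        weights[group] = w
    return weights

groups = ["Cincinnati"] * 40 + ["Other"] * 9960
uncapped = inverse_frequency_weights(groups)
capped = inverse_frequency_weights(groups, max_weight=10.0)
print(uncapped["Cincinnati"])  # 125.0: each Cincinnati row counts like 125 ordinary rows
print(capped["Cincinnati"])    # 10.0
```

With a weight of 125, a quirk shared by two or three of those 40 people carries as much influence as a pattern seen in hundreds of ordinary rows, which is exactly how noise gets learned as a trend. Capping trades some of the intended rebalancing for stability.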
Choose the right learning model for the problem
There’s a reason every AI model is unique: each problem requires a different solution and comes with different data resources. There’s no single model to follow that will avoid bias, but there are parameters that can inform your team as it builds.
For example, supervised and unsupervised learning models have their respective pros and cons. Unsupervised models that cluster data or perform dimensionality reduction can learn bias from their data set: if belonging to group A correlates strongly with behavior B, the model can conflate the two. And while supervised models allow more control over bias in data selection, that control can introduce human bias into the process.
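One rough way to check whether an unsupervised model has conflated group membership with behavior is to look at how each cluster is composed. The sketch below assumes you already have cluster assignments from some clustering step and a group label for each row (both hypothetical); shares close to 0 or 1 suggest the clusters are acting as a stand-in for the group.

```python
from collections import defaultdict

def group_share_by_cluster(clusters, groups, group_of_interest):
    """Fraction of each cluster's members that belong to `group_of_interest`."""
    members = defaultdict(list)
    for cluster, group in zip(clusters, groups):
        members[cluster].append(group)
    return {
        cluster: sum(1 for g in gs if g == group_of_interest) / len(gs)
        for cluster, gs in sorted(members.items())
    }

# Hypothetical assignments: cluster 0 contains only group A members.
clusters = [0, 0, 0, 1, 1, 1]
groups = ["A", "A", "A", "B", "B", "A"]
print(group_share_by_cluster(clusters, groups, "A"))  # cluster 0 -> 1.0, cluster 1 -> about 0.33
```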
Non-bias through ignorance (excluding sensitive information from the model) may seem like a workable solution, but it still has vulnerabilities. In college admissions, sorting applicants by ACT scores is standard, while taking their ZIP code into account might seem discriminatory. But because test scores can be affected by the preparatory resources available in a given area, including the ZIP code in the model could actually decrease bias.
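The text does not say how ZIP code would be used, so the following is only one hypothetical illustration: ranking each applicant’s ACT score against other applicants from the same ZIP code, so that a strong score in an under-resourced area is recognized as strong. The field names are assumptions, and any such adjustment would itself need the same bias testing as the rest of the model.

```python
from collections import defaultdict

def within_zip_percentile(applicants):
    """Each applicant's ACT percentile among applicants sharing their ZIP code
    (ties counted as half, a common midpoint convention)."""
    scores_by_zip = defaultdict(list)
    for a in applicants:
        scores_by_zip[a["zip"]].append(a["act"])
    result = {}
    for a in applicants:
        scores = scores_by_zip[a["zip"]]
        below = sum(1 for s in scores if s < a["act"])
        equal = sum(1 for s in scores if s == a["act"])
        result[a["id"]] = (below + 0.5 * equal) / len(scores)
    return result

# Hypothetical applicants from two ZIP codes.
applicants = [
    {"id": 1, "zip": "11111", "act": 30},
    {"id": 2, "zip": "11111", "act": 24},
    {"id": 3, "zip": "22222", "act": 24},
    {"id": 4, "zip": "22222", "act": 18},
]
print(within_zip_percentile(applicants))  # {1: 0.75, 2: 0.25, 3: 0.75, 4: 0.25}
```

Here applicant 3 scores 24 but ranks at the same local percentile as applicant 1 with 30, which is the kind of context the raw score alone would hide.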
You have to require your data scientists to identify the best model for a given situation. Sit down and talk them through the different strategies they can take when building a model. Troubleshoot ideas before committing to them. It’s better to find and fix vulnerabilities now — even if it means taking longer — than to have regulators find them later on.
Monitor performance using real data
No company is knowingly creating biased AI, of course — all these discriminatory models probably worked as expected in controlled environments. Unfortunately, regulators (and the public) don’t typically take the best intentions into account when assigning liability for ethical violations. That’s why you should be simulating real-world applications as much as possible when building algorithms.
It’s unwise, for example, to use test groups on algorithms already in production. Instead, run your statistical methods against real data whenever possible. Ask the data team to check simple questions such as “Do tall people default on AI-approved loans more than short people?” If they do, determine why.
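A question like “Do tall people default more?” can be answered with a standard two-proportion z-test on approved loans. The counts below are invented for illustration. A significant difference is a signal to investigate, not proof of bias on its own.

```python
import math

def two_proportion_z_test(defaults_a, n_a, defaults_b, n_b):
    """Compare default rates of two groups; returns (rate_a, rate_b, z, two-sided p-value)."""
    p_a = defaults_a / n_a
    p_b = defaults_b / n_b
    pooled = (defaults_a + defaults_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, standard normal
    return p_a, p_b, z, p_value

# Hypothetical counts: 60 of 500 tall borrowers vs. 40 of 500 short borrowers defaulted.
rate_tall, rate_short, z, p = two_proportion_z_test(60, 500, 40, 500)
print(rate_tall, rate_short, round(z, 2), round(p, 3))  # 0.12 0.08 2.11 0.035
```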
When you’re examining data, you could be looking for two types of equality: equality of outcome and equality of opportunity. If you’re working on AI for approving loans, result equality would mean that people from all cities get loans at the same rate; opportunity equality would mean that people who would have repaid the loan if given the chance are approved at the same rate regardless of city. Result equality alone can hide a lack of opportunity equality: if one city has a culture that makes defaulting on loans common, equal approval rates across cities can still mean that people who would repay are approved at different rates in different cities.
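The two kinds of equality can be computed side by side. This sketch assumes an evaluation set where you know whether each applicant would have repaid, which in practice is only partly observable (rejected applicants never get the chance), so such data usually comes from historical or experimental samples. The city names and counts are hypothetical.

```python
from collections import defaultdict

def equality_report(records):
    """records: iterable of (city, approved, would_repay) tuples of (str, bool, bool)."""
    stats = defaultdict(lambda: {"n": 0, "approved": 0, "repayers": 0, "repayers_approved": 0})
    for city, approved, would_repay in records:
        s = stats[city]
        s["n"] += 1
        s["approved"] += approved
        if would_repay:
            s["repayers"] += 1
            s["repayers_approved"] += approved
    return {
        city: {
            # result equality: approval rate across all applicants
            "approval_rate": s["approved"] / s["n"],
            # opportunity equality: approval rate among those who would repay
            "repayer_approval_rate": s["repayers_approved"] / s["repayers"] if s["repayers"] else None,
        }
        for city, s in stats.items()
    }

records = (
    [("X", True, True)] * 5 + [("X", False, True)] * 3 + [("X", False, False)] * 2
    + [("Y", True, True)] * 5 + [("Y", False, False)] * 5
)
print(equality_report(records))
# X and Y both have approval_rate 0.5, but repayer_approval_rate is 0.625 in X vs 1.0 in Y
```

Both cities pass a result-equality check, yet reliable borrowers in city X are approved less often, which is the gap the paragraph above warns about.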
Result equality is easier to prove, but it also means you’ll knowingly accept potentially skewed data. Opportunity equality is harder to prove, but it is at least morally defensible. It’s often practically impossible to ensure both types of equality, but oversight and real-world testing of your models should give you the best shot.
Eventually, these ethical AI principles will be enforced by legal penalties. If New York City’s early attempts at regulating algorithms are any indication, those laws will likely involve government access to the development process, as well as stringent monitoring of the real-world consequences of AI. The good news is that proper modeling principles can greatly reduce or eliminate bias, and those working on AI can help expose accepted biases, create a more ethical understanding of tricky problems, and stay on the right side of the law, whatever it ends up being.
Counter bias in “dynamic” data sets
Another challenge for machine-learning models is to avoid bias where the data set is dynamic. Since machine-learning models are trained on events that have already happened, they cannot predict outcomes based on behavior that has not been statistically measured. For example, even though machine learning is extensively used in fraud detection, fraudsters can outmaneuver models by devising new ways to steal or escape detection. Employees can hide bad behavior from machine-learning tools used to identify bad conduct by using underhanded techniques like conversing in code.
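A model cannot predict behavior it has never seen, but you can at least detect when incoming data no longer resembles the training data. The population stability index (PSI) is one widely used drift measure; it is a swapped-in technique, not something the text prescribes. The sketch assumes a single numeric feature, hand-picked bucket edges, and hypothetical data. A PSI above roughly 0.25 is often treated as a major shift, though that is a rule of thumb rather than a standard.

```python
import math

def population_stability_index(expected, actual, edges):
    """PSI between a baseline sample and a recent sample of one numeric feature.
    `edges` are ascending bucket boundaries; a value v falls in bucket
    (number of edges <= v)."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(1 for e in edges if v >= e)] += 1
        # floor empty buckets at a tiny share to avoid log(0)
        return [max(c / len(values), 1e-6) for c in counts]

    e_shares = shares(expected)
    a_shares = shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_shares, a_shares))

# Hypothetical: transaction sizes shift upward after fraudsters change tactics.
baseline = [1, 2, 3, 4] * 25            # 50% below 2.5, 50% above
recent = [3, 4] * 40 + [1, 2] * 10      # 20% below 2.5, 80% above
print(round(population_stability_index(baseline, recent, edges=[2.5]), 3))  # 0.416
```

Drift detection only flags changes in the inputs you measure. Behavior deliberately hidden from those inputs, such as employees conversing in code, will not show up, which is why the scenario-based methods described next are still needed.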
To attempt to draw new conclusions from current information, some companies use more experimental, cognitive, or artificial intelligence techniques that model potential scenarios. For example, to outsmart money launderers, banks may conduct so-called war games with ex-prosecutors and investigators to discover how they would beat their system. That data is then used to handcraft a more up-to-date machine-learning algorithm.
But even in this situation, managers risk infusing bias into a model when they introduce new parameters. For example, social media data, such as pictures posted on Facebook and Twitter, is increasingly being used to drive predictive models. But a model that ingests this type of data might introduce irrelevant biases into its predictions, such as correlating people wearing blue shirts with improved creditworthiness.
To avoid this, managers must ensure that new parameters are comprehensive and empirically tested, another best practice. Otherwise, those parameters might skew the model, especially in areas where data is poor. Insufficient data could affect, say, credit decisions for classes of borrowers that a bank has never lent to but wants to in the future.
Balance transparency against performance
One temptation with machine learning is to throw increasingly large amounts of data at a sophisticated training infrastructure and allow the machine to “figure it out.” For example, public cloud companies have recently released comprehensive tools that use automated algorithms instead of an expert data scientist to train and determine the parameters intended to optimize machine-learning models.
While this is a powerful method for building complex predictive algorithms quickly and at lower cost, it has downsides: limited visibility, and the risk of the “machine running wild” and picking up unconscious bias from extraneous training data (like the blue-shirt bias described above). The other challenge is that complex machine-learning models are very difficult to explain, which is problematic in heavily regulated industries.
One potential way to address this risk is to take a staged approach, increasing the model’s sophistication step by step and making a conscious decision before progressing at each stage.
A good example is a process a major bank used to build a model that attempted to predict whether a mortgage customer was about to refinance, so that the bank could make a direct offer to that customer and ideally retain their business. The bank started with a simple regression-based model that tested its ability to predict when customers would refinance. It then created a set of more sophisticated “challenger” models that used more advanced machine-learning techniques and were more precise. By confirming that the challenger models were more accurate than the base regression model, bank managers became comfortable that their more complex and opaque machine-learning approach was operating in line with expectations and not propagating unintended biases. The process also enabled them to verify that the machine-learning tool’s balance between transparency and sophistication was in line with what is expected in the highly regulated financial services industry.
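A champion/challenger gate of this kind can be expressed as a simple promotion rule. The bank’s actual criteria are not described, so this is a hypothetical sketch: it promotes a challenger only if it beats the simple baseline on overall accuracy and does not lose accuracy for any customer group beyond a tolerance. Accuracy, the group labels, and the zero tolerance are all illustrative assumptions.

```python
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def challenger_passes(champion_preds, challenger_preds, labels, groups, max_group_drop=0.0):
    """Promote the challenger only if it is more accurate overall AND no group's
    accuracy drops by more than `max_group_drop` relative to the champion."""
    if accuracy(challenger_preds, labels) <= accuracy(champion_preds, labels):
        return False
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        group_labels = [labels[i] for i in idx]
        champ = accuracy([champion_preds[i] for i in idx], group_labels)
        chall = accuracy([challenger_preds[i] for i in idx], group_labels)
        if champ - chall > max_group_drop:
            return False
    return True

# Hypothetical holdout: 1 = customer refinanced.
labels = [1, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "B", "B", "B"]
champion = [0, 1, 0, 0, 1, 0]            # regression baseline
fair_challenger = [1, 0, 1, 0, 1, 0]     # better for everyone
skewed_challenger = [1, 0, 1, 0, 1, 1]   # better overall, worse for group B
print(challenger_passes(champion, fair_challenger, labels, groups))    # True
print(challenger_passes(champion, skewed_challenger, labels, groups))  # False
```

The per-group check matters because a challenger can win on aggregate accuracy while quietly getting worse for one segment of customers.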