Companies use machine learning algorithms to help with everything from identifying business opportunities and optimizing marketing spend to personalizing the customer experience. These algorithms are typically built on a solid base of historical data and well-trained prior to deployment. They are often also built to operate dynamically, so that they become smarter and faster over time. Even so, it’s not uncommon for machine learning algorithms to be marred by unintended biases.
At RPA, we’ve worked with many brands across many different algorithmic solutions, and we routinely come across the same basic types of biases. While algorithms are getting smarter all the time, it often takes a human eye to detect these biases, and a human hand to steer them back on course. That work is imperative, because a biased algorithm produces incorrect results; in the very worst case, a bias can sabotage your entire project.
What do these biases look like, and how do you address them? Here are the five main types of biases we find lurking in algorithms, along with recommendations on how to get a handle on them.
1. The Square-Peg Bias
The Square-Peg bias comes into play when your algorithm is built on the wrong data. It is a “foundational data bias,” where the “foundation” of data used to build the algorithm is not representative of the current use case. And it happens all the time. For instance, perhaps your algorithm was built based on people who signed up during a promotion, but you want to apply it to a broader consumer market. In this case, the algorithm is likely to recommend the kinds of promotional ideas that are exactly wrong for a broader market.
Examples of the Square-Peg bias abound. Perhaps your algorithm was based on brick-and-mortar sales, or a short timeframe of action, and you want to apply it to digital sales, or a longer window of time. Any time your foundational data is very different from your use case, your algorithm is likely to produce biased—and, therefore, unhelpful—results.
Recommendation: For each new use case for an algorithm, go back to the foundational data. Is the foundational data reasonably representative of the new use case? If yes, great. If no, consider what adjustments can be made to remove the bias. And whenever data outputs raise red flags, go back to the foundational data one more time.
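A quick way to operationalize that check is to compare summary statistics of the foundational data against a sample from the new use case. The sketch below is a minimal, hypothetical screen (the feature names, data, and the two-standard-deviation threshold are all illustrative, not from any real RPA project): it flags any feature whose mean in the new population sits far from the foundational mean.

```python
from statistics import mean, stdev

def flag_foundation_mismatch(foundation, new_sample, threshold=2.0):
    """Screen for the Square-Peg bias: flag features whose mean in the
    new use case drifts more than `threshold` foundational standard
    deviations away from the foundational mean.

    `foundation` and `new_sample` map feature name -> list of values.
    """
    flagged = set()
    for feature, base_values in foundation.items():
        base_mean, base_sd = mean(base_values), stdev(base_values)
        if base_sd == 0:
            continue  # no variation in the foundation; skip the feature
        drift = abs(mean(new_sample[feature]) - base_mean) / base_sd
        if drift > threshold:
            flagged.add(feature)
    return flagged

# Hypothetical data: the model was built on promotion sign-ups, but the
# new use case is the broad consumer market.
foundation = {
    "discount_sensitivity": [0.8, 0.9, 0.85, 0.95, 0.9],
    "age": [24, 27, 25, 29, 26],
}
broad_market = {
    "discount_sensitivity": [0.3, 0.4, 0.35, 0.2, 0.45],
    "age": [25, 28, 26, 27, 24],
}
print(flag_foundation_mismatch(foundation, broad_market))
```

In this toy example, the promotion audience is far more discount-sensitive than the broad market, so `discount_sensitivity` gets flagged while `age` does not; in practice you would run a proper distribution-shift test over every foundational feature.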
2. The Wolf-in-Sheep's-Clothing Bias
Sometimes the metrics your algorithm uses don’t mean what you think they do. When that happens, you have a Wolf-in-Sheep’s-Clothing hiding in your algorithm. And when you think a metric means one thing, but it really means another, you can end up with systematically biased output.
A classic example comes from website retargeting. Retargeting algorithms try to lure people back to a website on the assumption that past-viewed content is especially compelling to them. However, that assumption is often dead wrong. In fact, past-viewed content is often what triggered people to leave a website—and therefore not a compelling draw at all. Time spent on a website, repeat store visits, and even content clicks are other examples of metrics that often act as sneaky wolves. All of these might be good or bad, depending on the context, but if your algorithm is making the wrong assumptions then it could be sabotaging success, rather than helping to drive it.
Recommendation: On a regular basis, consider what assumptions are baked into your algorithm. Are those assumptions valid? If you aren’t sure, consider conducting qualitative research, survey research, or additional analysis. Since the wrong assumptions can literally sabotage your work, no assumptions should be left unchecked.
3. The Has-Been Bias
An algorithm’s star can fade with time and, often, this brings with it a bias of its own. Many things change over time—everything from shopping habits, to computer processing speeds, to the acceptability of fraud, to customers’ social norms. If your algorithm is making “dated” assumptions about how the world works, it will create a bias in the outputs.
To take a very straightforward example, several years back, we worked with a technology client who conducted a customer segmentation to identify different “technology mindsets.” But the analysis was largely based on technology usage (e.g., use of digital satellite TV)—rather than more dynamic/relative metrics or psychographic/attitudinal metrics. Technology usage, of course, lives on shifting sands, and it proved to be a short-lived signifier of mindsets. Indeed, within 1–2 years, the original algorithm was no longer able to distinguish among different groups of people (the moderately tech-savvy people were being put into the same bucket as the most tech-savvy), so the segmentation completely lost its utility.
Recommendation: No matter what type of algorithms you are working with, always ask when they were created, and consider how the environment may have changed, and whether that affects how well your algorithm might work. Often very subtle changes over time (e.g., shifts in customer demographics) can introduce significant biases.
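The segmentation story above suggests one concrete staleness check: re-score the segments on the signal that originally defined them and see whether the groups still pull apart. This is a hypothetical sketch—the segment names, scores, and minimum-gap threshold are all invented for illustration:

```python
from statistics import mean

def still_separates(scores_by_segment, min_gap=1.0):
    """Staleness check for a segmentation: do the segment means on the
    defining signal still sit at least `min_gap` apart? (Threshold is
    illustrative; pick one that is meaningful for your scale.)"""
    means = sorted(mean(v) for v in scores_by_segment.values())
    return min(b - a for a, b in zip(means, means[1:])) >= min_gap

# Hypothetical tech-usage scores (0-10) when the segmentation was built...
at_launch = {"laggards": [1, 2, 1], "mainstream": [4, 5, 5], "savvy": [8, 9, 8]}
# ...and two years later, once the technology in question went mainstream.
two_years_on = {"laggards": [7, 7, 7], "mainstream": [7, 8, 7], "savvy": [8, 8, 8]}

print(still_separates(at_launch), still_separates(two_years_on))
```

At launch the three segments are cleanly separated; two years on, everyone scores high on the same usage metric and the check fails—the cue to rebuild the segmentation on more durable attitudinal signals.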
4. The Missing-the-Forest-for-the-Trees Bias
Even the most complex algorithms can have blind spots, and these blind spots can introduce bias into your outputs. The Missing-the-Forest-for-the-Trees bias occurs when your algorithm is missing a critical piece of the puzzle. Often those missing pieces can heavily influence or even flip a recommendation, so this is an important bias to hunt for.
Consider multi-touch attribution (MTA) algorithms. These algorithms crunch through every digital media metric under the sun to generate recommended media allocations for digital marketers. But they typically turn a blind eye to non-digital media metrics (like TV activity) as well as non-media metrics (like product pricing and competitive spending)—a situation that can quickly open the floodgates for biases.
An MTA solution may reveal that search media is highly efficient at driving sales, but what if search bids happen to be elevated during TV promotion windows (meaning TV might have been the real sales driver)? Almost all algorithms are missing some puzzle pieces. However, it’s important to think through what those missing pieces are and whether there are ways to account for them.
Recommendation: The best way to keep an eye out for missing information is to routinely consider whether there are viable alternative explanations for the outputs you are seeing. Is there another plausible reason you might be getting the outputs you are getting? If not, you are set for the time being. If so, consider whether there are ways to adjust the algorithm to account for those alternatives.
5. The Insatiable-Monster Bias
Last but not least, some algorithms have greed written into their DNA. And some are outright monsters. What does it mean for an algorithm to be a monster? It means that certain guardrails are not in place, such that the algorithm is free to operate in a “reckless” way.
An example of the Insatiable-Monster bias is a pricing algorithm that is missing an “upper limit” guardrail; such an algorithm could end up listing a product for $800 or even $8,000,000 when it should really be about $8. That is no problem if you somehow manage to sell a unit at the inflated price, but too often we simply assume guardrails are in place when they aren’t.
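The pricing guardrail itself can be as simple as a clamp on the algorithm’s output. The bounds below are illustrative placeholders; in practice they would come from unit cost, category norms, and business rules.

```python
def guarded_price(raw_price, floor=1.00, ceiling=20.00):
    """Upper- and lower-limit guardrail on a pricing algorithm's output.
    The floor and ceiling here are hypothetical; derive real bounds from
    cost data and category pricing norms."""
    return min(max(raw_price, floor), ceiling)

print(guarded_price(8.00))       # -> 8.0   (sane price passes through)
print(guarded_price(8_000_000))  # -> 20.0  (runaway output is clamped)
print(guarded_price(0.10))       # -> 1.0   (below-cost output is raised)
```

The point is not the three lines of code but the discipline: every autonomous output should pass through an explicit “out of bounds” check before it touches a customer.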
To take another example, many algorithms used by social media platforms—like the default ad optimization engine used by Facebook—have no guardrails to account for important differences in how different demographic groups behave. Analysts who have spent a lot of time with social media know that older people and women tend to click on social media advertising more frequently than other groups, regardless of the content. But Facebook’s advertising algorithms ignore these differences when optimizing advertising campaigns—resulting, toward the end of a campaign, in disproportionate targeting toward older females. Since older females click on advertising more often than other groups, the immediate results of the optimized campaign may look very good. But the reality is that they click more on everything. So the good results are illusory. When we’ve added demographic guardrails into our overall social media optimization process at RPA, we’ve seen actual business results (not just clicks) improve substantially.
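A demographic guardrail of the kind described above can be sketched as a cap on each group’s share of delivery, with the freed-up share redistributed to everyone else. This is a hypothetical, single-pass illustration—the group names, the 0.70 planned skew, and the 0.30 cap are invented, and it is not how RPA’s or Facebook’s actual systems are implemented:

```python
def apply_demo_caps(planned_share, caps):
    """Cap each demographic's share of delivery, then hand the freed-up
    share to the remaining groups in proportion to their headroom.
    Single-pass sketch: assumes the redistribution itself does not
    breach any other cap."""
    capped = {g: min(s, caps.get(g, 1.0)) for g, s in planned_share.items()}
    excess = 1.0 - sum(capped.values())
    headroom = {g: caps.get(g, 1.0) - s for g, s in capped.items()
                if caps.get(g, 1.0) > s}
    total = sum(headroom.values())
    if total:
        for g, h in headroom.items():
            capped[g] += excess * h / total
    return capped

# Hypothetical plan skewed toward the group that clicks most.
planned = {"women_55plus": 0.70, "men_55plus": 0.10,
           "women_u35": 0.10, "men_u35": 0.10}
result = apply_demo_caps(planned, {"women_55plus": 0.30})
print(result)
```

With the cap applied, the over-targeted group is held to 30% of delivery and the other three groups split the remainder evenly—trading some cheap clicks for exposure among the people whose clicks actually signal intent.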
It is easy to see how missing guardrails can create Insatiable Monsters. In 2015, Carnegie Mellon professor Anupam Datta and his colleagues built a tool called AdFisher to track how user behavior on Google influences the personalized ads Google serves. The researchers created a series of fake accounts, all of which visited job sites and nowhere else, but some listed their gender as female and some as male. The AdFisher team saw that when Google presumed users to be male, it was much more likely to display ads for high-paying executive jobs: Google showed the executive job ads 1,852 times to the male accounts, but just 318 times to the female accounts. This wouldn’t necessarily be a problem if users didn’t assume that guardrails had been put into place; in this case, guardrails to ignore gender rather than to take it into account.
Recommendation: What are all the ways your algorithm could turn into a Monster? Are there guardrails in place to ensure that it doesn’t operate “out of bounds”? Is it failing to account for important differences in geography or product mix, for example, that could be skewing results? Is it pulling data points into the system that it shouldn’t be? Algorithms aren’t people, of course, so they don’t know right from wrong. Extra attention is therefore needed to think through whether your algorithm could be producing biases toward certain groups of people.
One of the huge benefits of machine learning algorithms is that they can operate autonomously, saving us the time and work of analyzing each situation anew. But the idea that you can “set it and forget it” can easily get you into hot water. Many algorithms have hidden biases in them, and many perfectly unbiased algorithms can become biased if they stay the same as the world changes around them. Keeping a vigilant (human) eye out for biases helps to ensure that your algorithms are working the way they are supposed to, and giving you the best outputs.
Bias-Watch: Questions You Should Be Asking (They Are Not All Easy!)
1. When was the algorithm built?
2. What’s changed since it was built?
3. Was it built on foundational data that is reasonably similar to the current use case?
4. What assumptions does the algorithm make about what different metrics mean?
5. How does it define the hardest-to-define concepts—like interest, engagement, and satisfaction?
6. Does it factor in, or account for, all of the relevant metrics?
7. Are there other viable alternative explanations for why you are seeing the outputs you are seeing?
8. What guardrails are in place to ensure that the algorithm doesn’t run wild in one way or another?
9. Is there a danger of the algorithm showing unwanted preference toward one group or another?
10. Finally: Does anything seem fishy, or just intuitively wrong, with the outputs you are getting?