Using Causal Reasoning To Guide Algorithms Toward a Fairer World

John C. Malone Assistant Professor of Computer Science, Johns Hopkins University

Researcher, Johns Hopkins University


Learning algorithms, which are becoming an increasingly ubiquitous part of our lives, do precisely what they are designed to do: find patterns in the data they are given. The problem, however, is that even if patterns in the data lead to very accurate predictions, the processes by which the data is collected, relevant variables are defined, and hypotheses are formulated may reflect structural unfairness in our society. Algorithms based on such data may well introduce or perpetuate a variety of discriminatory biases, and thereby maintain the cycle of injustice.

For instance, it is well known that statistical models predict recidivism at higher rates among certain minorities in the United States [1]. To what extent are these predictions discriminatory? What is a sensible framework for thinking about these issues? A growing community of experts with a variety of perspectives is now addressing issues like this, in part by defining and analyzing data science problems through the lens of fairness and transparency, and proposing ways to mitigate harmful effects of algorithmic bias [2-7].

Standard algorithms, both in statistics and machine learning, work by identifying patterns in data to learn relationships between specific features and outcomes, so that outcomes can be predicted from those features even for previously unseen instances. Such patterns are used to link prior criminal history to the likelihood of recidivism in sentencing and parole hearings, to link prior financial history to the likelihood of defaulting on a personal loan, and, in resume screening, to decide which candidates are invited to interview. Yet variable definitions and data collection practices may cause these links to be built in inappropriate ways that perpetuate structural injustice in our society. In criminal justice, for example, the outcome in recidivism prediction is often defined as a subsequent arrest rather than a subsequent conviction. In addition, typical definitions of recidivism and prior criminal history, especially among certain vulnerable populations, often include instances of arbitrary arrest and detention. If it is true that some minorities in the United States have more frequent police encounters (because some neighborhoods are more heavily policed) or are more likely to be arrested (to say nothing of guilt or conviction), then the data will reflect this fact: recorded recidivism rates will be imbalanced along racial or ethnic lines. Learning algorithms will, of course, pick up on this pattern, but without appropriate adjustments they will be unable to tell whether the pattern is there for a legitimate reason.
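To make the arrest-versus-offending point concrete, here is a toy simulation. It assumes two hypothetical groups, A and B, with the same true reoffending probability (0.3), but with an offence twice as likely to end in a recorded arrest in group B (0.8 versus 0.4, standing in for heavier policing). The function name and every probability are invented for illustration; nothing here models real data.

```python
import random

def simulate_recorded_rates(n=100_000, reoffend_p=0.3, arrest_p=None, seed=0):
    """Toy model: identical true reoffending rates, but different chances that
    an offence is recorded as an arrest. All probabilities are hypothetical."""
    if arrest_p is None:
        arrest_p = {"A": 0.4, "B": 0.8}  # group B is assumed to be policed more heavily
    rng = random.Random(seed)
    true_rate, recorded_rate = {}, {}
    for group, p_arrest in arrest_p.items():
        reoffended = arrested = 0
        for _ in range(n):
            if rng.random() < reoffend_p:
                reoffended += 1
                # The label a model would be trained on is "re-arrested", not "reoffended".
                if rng.random() < p_arrest:
                    arrested += 1
        true_rate[group] = reoffended / n
        recorded_rate[group] = arrested / n
    return true_rate, recorded_rate

true_rate, recorded_rate = simulate_recorded_rates()
print(true_rate)
print(recorded_rate)
```

The true reoffending rates come out close to 0.3 in both groups, while the arrest-based rates land near 0.12 and 0.24. A model trained on the recorded label would learn that gap even though, by construction, behavior does not differ between the groups.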


Similarly, features such as prior compensation and employment history in resume screening may differ systematically across ethnic groups and genders. Nothing in the data indicates whether heterogeneity of this type is due to unfair differences in the treatment of these groups. For example, a number of structural features of modern society may discourage women from advocating for appropriate compensation, career promotion, and fair evaluation of job performance relative to equivalent male peers in certain employment contexts. In other types of employment, members of certain minority groups find it more difficult to get hired.

Since learning algorithms that use unfair data can, unsurprisingly, lead to biased or unfair conclusions, two questions immediately suggest themselves. First, what does it mean for a world, and the data that comes from it, to be fair? And second, if data is indeed unfair, what adjustments must be made to learning algorithms that use this data as input in order to produce fairer outputs?

A common approach in the fairness and transparency literature is to define a fairness criterion, ideally motivated by legal or ethical intuition or principle, and develop a method for solving a data science problem in such a way that this criterion is satisfied, even if the only sources of data for the problem suffer from discriminatory bias.

Bias in Data Analysis, Causal Inference, and Missing Data

Adjusting for biases in data for the purposes of learning and inference is a well-studied problem in statistics. Take, for example, the perennial problem of missing data in polling. If we wish to determine who will win a presidential election, we could ask a subset of people for whom they will vote, and try to extrapolate these findings to the general population. The difficulty arises, however, when one group of a candidate’s supporters is systematically underrepresented in polls (perhaps because those supporters are less likely to trust mainstream polling). This creates what is known as selection bias in the data available from the poll. If not carefully addressed in statistical analysis, selection bias can easily yield misleading inferences about the voting patterns of the general population. Common methods for adjusting for systematically missing data include imputation and weighting. Imputation aims to learn a model that can correctly predict missing entries, even if they are missing due to complex patterns, and then “fill in” those entries using the model. Weighting assigns each fully observed record a weight reflecting how over- or underrepresented records like it are among the fully observed data, and then uses these weights so that the fully observed data more closely approximates the true underlying population.
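A minimal sketch of the weighting idea, assuming a hypothetical two-group population in which supporters of a candidate X are concentrated in a group (g1) that answers polls far less often than the other group (g2). It further assumes that, within each group, whether someone responds is unrelated to how they vote, and that group sizes are known (for example, from a census). Under those assumptions, weighting each respondent by the inverse of their group’s response rate recovers the population share. The group labels, probabilities, and function name are invented.

```python
import random

# Hypothetical groups: (population share, P(supports X), P(answers the poll)).
GROUPS = {"g1": (0.5, 0.7, 0.2), "g2": (0.5, 0.3, 0.6)}

def run_poll(n=200_000, seed=1):
    rng = random.Random(seed)
    pop_size = {g: 0 for g in GROUPS}
    respondents = []  # (group, supports_x) for everyone who answered
    true_support = 0
    for _ in range(n):
        g = "g1" if rng.random() < GROUPS["g1"][0] else "g2"
        _, p_support, p_respond = GROUPS[g]
        supports = rng.random() < p_support
        true_support += supports
        pop_size[g] += 1
        if rng.random() < p_respond:
            respondents.append((g, supports))

    # Naive estimate: treat respondents as if they were a random sample.
    naive = sum(s for _, s in respondents) / len(respondents)

    # Weight = 1 / (estimated response rate of the respondent's group),
    # using group sizes that are assumed to be known.
    n_resp = {g: sum(1 for h, _ in respondents if h == g) for g in GROUPS}
    weight = {g: pop_size[g] / n_resp[g] for g in GROUPS}
    weighted = (sum(weight[g] * s for g, s in respondents)
                / sum(weight[g] for g, _ in respondents))
    return {"true": true_support / n, "naive": naive, "weighted": weighted}

print(run_poll())
```

Here the naive respondent average lands near 0.40 while true support is about 0.50, and the weighted estimate lands close to the truth. If response depended on vote intention within a group, as in the mistrust-of-polls scenario, group-level weights alone would not remove the bias.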

Causal inference problems arise when we want to use data to guide decision-making. For example, should we eat less fat or fewer carbohydrates? Should we cut or raise taxes? Should we give a particular type of medication to a particular type of patient? Answering questions of this type introduces analytic challenges not present in outcome prediction tasks. Consider the problem of using electronic health records to determine which medications to assign to which patients. The difficulty is that medications in the hospital are assigned with the aim of maximizing patient welfare. In particular, sicker patients are more likely to receive certain types of medication, and sicker patients are also more likely to develop adverse health outcomes or even die. In other words, an observed correlation between receiving a particular type of medication and an adverse health event may be due to a poor choice of medication, or it may be a spurious correlation driven by underlying health status. Correlations of this sort are so common in data that they gave rise to a familiar refrain in statistics: correlation does not imply causation.
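The medication example can be reproduced with a toy simulation using invented rates: 30% of patients are sick, sick patients receive the drug 80% of the time versus 10% for others, and death depends only on severity, so by construction the drug has no effect at all. The sketch compares a naive treated-versus-untreated comparison with standardization, one standard adjustment method that compares patients within severity strata and then averages over the population’s severity distribution. It assumes severity is recorded and is the only confounder; the function names are hypothetical.

```python
import random

def simulate_records(n=200_000, seed=2):
    """Toy hospital records as (sick, treated, died). All rates are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        sick = rng.random() < 0.3
        treated = rng.random() < (0.8 if sick else 0.1)  # sicker patients get the drug more often
        died = rng.random() < (0.4 if sick else 0.05)    # death depends on severity only
        rows.append((sick, treated, died))
    return rows

def death_rate(rows, treated, sick=None):
    """Share of deaths among rows with the given treatment (and, optionally, severity)."""
    outcomes = [d for s, t, d in rows if t == treated and (sick is None or s == sick)]
    return sum(outcomes) / len(outcomes)

rows = simulate_records()

# Naive comparison: confounded by severity, so the drug looks harmful.
naive_diff = death_rate(rows, True) - death_rate(rows, False)

# Standardization: compare within each severity stratum, then average the
# differences weighted by how common each stratum is in the population.
p_sick = sum(s for s, _, _ in rows) / len(rows)
adjusted_diff = sum(
    weight * (death_rate(rows, True, s) - death_rate(rows, False, s))
    for s, weight in ((True, p_sick), (False, 1 - p_sick))
)
print(round(naive_diff, 3), round(adjusted_diff, 3))
```

With these rates the naive difference in death rates comes out around 0.24, while the adjusted difference is close to zero, matching the drug’s true (null) effect in this toy model.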

Teasing out causality from data often involves obtaining data in a carefully controlled and impartial way. An early example of careful data collection for establishing causality comes from James Lind [8]. Lind was a physician in the Royal Navy at a time when scurvy among sailors was both a public health crisis and an issue of pressing strategic importance for the British Empire, reliant as it was on naval dominance over its rivals. Lind organized what later came to be viewed as one of the first documented clinical trials. He arranged twelve sailors into six pairs and gave each pair one of six scurvy treatments thought at the time to be potentially effective. Of the treatments, only citrus was effective, which ultimately led to citrus products being issued on all Royal Navy ships.

A key feature of Lind’s trial was that the sailors he chose were, by his own account, as similar as he could make them; they were kept in the same place on the ship, fed the same diet, and so on. Because Lind carefully controlled all inputs to his experiment in this way, varying only the treatments the sailors received, he was able to attribute any variation in the outcome to the treatment given. The same reasoning underlies the randomized controlled experiments still used today to determine the effects of medical treatments, in which membership in treatment groups is determined by an impartial coin flip rather than by individual characteristics. Because data obtained in settings that are not rigorously controlled may suffer from spurious correlations and other sources of bias, such data is not used directly to draw inferences about treatment efficacy.

Instead, data is adjusted using a variety of causal inference methods so that it more closely approximates what would have been observed in an experiment resembling the actual data collection, except that treatment was assigned impartially by a coin flip. In this way, a world where treatment assignment is biased (for example, data from a hospital ship where citrus is given mainly to sailors already suffering from scurvy) is made to more closely resemble Lind’s experiment, where similar sailors receive one of a variety of treatments, including citrus, essentially at random.
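A sketch of one such adjustment, inverse probability weighting, under invented numbers loosely modelled on the citrus example: severe cases receive citrus 90% of the time and mild cases 20% of the time, citrus raises the chance of recovery by 0.3 for everyone, and severity is recorded. Each observational record is weighted by the inverse of the estimated probability of the treatment it actually received given its severity, which produces a pseudo-population in which treatment no longer depends on severity, as in a coin-flip trial. The rates and function names are hypothetical, and the method relies on every severity level containing both treated and untreated sailors.

```python
import random

def simulate(n, randomized, rng):
    """Rows of (severe, treated, recovered). Every rate here is hypothetical."""
    rows = []
    for _ in range(n):
        severe = rng.random() < 0.5
        if randomized:
            treated = rng.random() < 0.5                       # impartial coin flip
        else:
            treated = rng.random() < (0.9 if severe else 0.2)  # sicker sailors get citrus more often
        p_recover = (0.2 if severe else 0.6) + (0.3 if treated else 0.0)  # true effect: +0.3
        rows.append((severe, treated, rng.random() < p_recover))
    return rows

def diff_in_means(rows):
    """Recovery rate among treated minus recovery rate among untreated."""
    treated = [y for _, t, y in rows if t]
    control = [y for _, t, y in rows if not t]
    return sum(treated) / len(treated) - sum(control) / len(control)

def ipw_effect(rows):
    """Normalized inverse probability weighting, with P(treated | severity)
    estimated as the treated share within each severity stratum."""
    propensity = {}
    for s in (False, True):
        stratum = [t for sev, t, _ in rows if sev == s]
        propensity[s] = sum(stratum) / len(stratum)
    num1 = den1 = num0 = den0 = 0.0
    for sev, t, y in rows:
        if t:
            w = 1.0 / propensity[sev]
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - propensity[sev])
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0

rng = random.Random(3)
observational = simulate(100_000, randomized=False, rng=rng)
trial = simulate(100_000, randomized=True, rng=rng)
print(diff_in_means(observational), diff_in_means(trial), ipw_effect(observational))
```

The naive observational comparison comes out close to zero (sicker sailors get citrus more often and also recover less often), while both the simulated trial and the weighted observational estimate land near the true 0.3. With a single discrete covariate, this weighted estimate gives the same number as the stratified adjustment; weighting earns its keep with many covariates, where the treatment probability is usually estimated with a model.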

What Is Fairness?

Causal inference methods thus adjust data gathered in the wild from an “uncontrolled world” to more closely resemble data that would have been collected from a hypothetical “controlled world,” where causally valid inferences may be drawn. Similarly, data gathered in the wild from an “unfair world” that suffers from various forms of bias due to discrimination and disparate impact may be adjusted to more closely resemble data obtained from a hypothetical “fair world.” What would such a fair world look like?

Issues of fairness and justice have occupied the ethical, legal, and political literature for centuries. While many general principles are known, such as fairness-as-proportionality, just compensation, and social equality, general definitions have proven elusive. Indeed, a general definition may not be possible, since notions of fairness are ultimately rooted in either ethical principle or ethical intuition, and both principles and intuitions may conflict. For example, in judging the fairness of a particular affirmative-action policy in university admissions, intuitions about proportionality (applicants with higher scores should be more likely to be admitted) may conflict with intuitions about just compensation (applicants who were systematically disadvantaged must be compensated for this injustice in some way).

The community within statistics and machine learning that works on fairness in data analysis has taken a variety of approaches to defining fairness formally, with the ultimate aim of ensuring that learning algorithms are fair. My recent work in this area is based on the intuitions underlying the so-called “resume name-swapping experiments.” These experiments aim to expose mechanisms by which a sensitive feature such as race or gender determines a hiring decision, with the goal of understanding how such mechanisms operate and eventually changing them to ensure fair hiring practices [9].

Consider an algorithm that maps job applicants’ features to a decision about whether to invite them for an interview. One could imagine a variety of characteristics that are both relevant to the job and correlated with a potentially sensitive feature, such as gender or ethnicity. For example, a physical fitness test, which may be passed at greater rates by men than by women, may be reasonable to administer for a fire department, but not for an accounting firm. Even so, the intuition in these situations is that using gender itself in the hiring or interview decision is inappropriate.

This intuition underlies experiments that create a set of resumes for evaluation, in which each resume is either left unaltered or altered only in the candidate’s name (for example, from a Caucasian-sounding name to an African-American-sounding name, or from a male name to a female name). Because evaluators see only the resume, the name acts as a proxy for the direct influence of the sensitive feature. These experiments attempt to operationalize the central feature of discrimination cases: that the outcome changes when the sensitive feature does, even as everything else, including other features potentially related to the sensitive feature, stays the same [10].
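The same logic can be turned into an audit of a scoring algorithm: score each resume twice, identical except for the sensitive attribute, and check whether the score changes. The sketch below uses a made-up `Resume` record and two hypothetical scoring rules, one of which improperly penalizes group "B"; in a real audit the sensitive attribute would be signalled indirectly, through the name.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Resume:
    years_experience: int
    has_degree: bool
    group: str  # sensitive attribute; in a real audit it is signalled only by the name

def biased_screener(r: Resume) -> float:
    """Hypothetical scoring rule that (improperly) uses the sensitive attribute directly."""
    score = 0.5 * r.years_experience + (2.0 if r.has_degree else 0.0)
    if r.group == "B":
        score -= 1.5
    return score

def blind_screener(r: Resume) -> float:
    """Hypothetical scoring rule that ignores the sensitive attribute."""
    return 0.5 * r.years_experience + (2.0 if r.has_degree else 0.0)

def swap_audit(screener, resumes, a="A", b="B"):
    """Average change in score when only the sensitive attribute is swapped from a to b."""
    diffs = [screener(replace(r, group=b)) - screener(replace(r, group=a)) for r in resumes]
    return sum(diffs) / len(diffs)

resumes = [Resume(3, True, "A"), Resume(10, False, "B"), Resume(0, True, "A")]
print(swap_audit(biased_screener, resumes))  # -1.5
print(swap_audit(blind_screener, resumes))   # 0.0
```

The biased rule shifts every score by -1.5 when only the attribute is swapped, while the blind rule shows no change. This test detects only direct use of the sensitive attribute; features merely correlated with it, such as the fitness test mentioned earlier, pass through untouched, which matches the focus of the name-swapping experiments.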

A branch of causal inference called mediation analysis aims to quantify the extent to which a particular feature, such as gender or race, influences an outcome directly, and the extent to which it does so indirectly, through other variables. While the “fair world” may possess a number of desirable properties that are difficult to fully enumerate, one property it can reasonably be expected to have is that it is sensitive-feature-blind, in the sense that the sensitive feature itself does not directly affect decision-making. Once we identify and quantify a feature that is present in the “unfair world” and absent in the “fair world,” we can adjust the data so that it resembles the world we observe, except that the “unfair” feature is absent. Once these adjustments are made, learning algorithms can be trained as usual. In a recent paper presented at the Thirty-Second AAAI Conference on Artificial Intelligence, my co-author Razieh Nabi and I used methods from causal inference and semiparametric statistics to do precisely this, showing how learning algorithms may be trained on a version of the data that is closely related to the observed data, but from which undesirable direct effects of the sensitive feature on the outcome have been removed [11].
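A simplified linear illustration of the mediation idea follows. It assumes a hypothetical structural model in which a binary sensitive feature A affects an outcome Y both directly and through a mediator M (say, a qualification score): M = A + noise and Y = 2M + 1.5A + noise. Regressing M on A, and Y on A and M, splits the difference in Y between groups into a direct part (about 1.5) and an indirect part through M (about 2.0). The last lines show the simplest linear analogue of removing the direct effect: predicting with A held at a reference value in the direct term while keeping the path through M. This is not the constrained semiparametric method of [11]; it assumes a correctly specified linear model with no unmeasured confounding, and all coefficients and names are invented.

```python
import random

def simulate(n=50_000, seed=4):
    """Hypothetical linear model: A -> M -> Y plus a direct A -> Y path."""
    rng = random.Random(seed)
    A, M, Y = [], [], []
    for _ in range(n):
        a = 1.0 if rng.random() < 0.5 else 0.0
        m = 1.0 * a + rng.gauss(0.0, 1.0)            # A -> M (coefficient 1.0)
        y = 2.0 * m + 1.5 * a + rng.gauss(0.0, 1.0)  # M -> Y (2.0) and direct A -> Y (1.5)
        A.append(a)
        M.append(m)
        Y.append(y)
    return A, M, Y

def group_gap(values, A):
    """Mean of values where A == 1 minus mean where A == 0."""
    ones = [v for v, a in zip(values, A) if a == 1.0]
    zeros = [v for v, a in zip(values, A) if a == 0.0]
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

def ols2(x1, x2, y):
    """Least-squares fit y ~ b0 + b1*x1 + b2*x2 via centered normal equations."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return my - b1 * m1 - b2 * m2, b1, b2

A, M, Y = simulate()
a_on_m = group_gap(M, A)         # effect of A on the mediator (OLS slope for binary A)
b0, direct, b_m = ols2(A, M, Y)  # direct effect of A, and effect of M on Y
indirect = a_on_m * b_m          # effect of A transmitted through M
total = group_gap(Y, A)          # observed gap; equals direct + indirect in this fit

# Simplified "blinded" predictor: A held at reference value 0 in the direct term,
# so A can influence the prediction only through M.
blind_pred = [b0 + b_m * m for m in M]
blind_gap = group_gap(blind_pred, A)
print(round(direct, 2), round(indirect, 2), round(total, 2), round(blind_gap, 2))
```

In this linear fit the estimated direct and indirect parts add up exactly to the observed gap between groups, and the blinded predictor’s gap equals the estimated indirect effect, since A now reaches the prediction only through M. Whether that remaining path is itself acceptable is an ethical judgment, not a statistical one.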

Moving Forward

Learning algorithms have the potential to have an enormous positive impact on people’s lives, in criminal justice, employment, finance, healthcare, and many other areas. Before this potential can be realized, however, learning algorithms must be made not only accurate but fair. In particular, data analysis tasks must be approached in a way that acknowledges and corrects for the fact that our data is a reflection of our world, and our world is unfortunately often unfair. Making learning algorithms fair entails defining what features of the world make it unfair, identifying these features with undesirable patterns in the data, and finding ways to adjust the data so that it no longer has these undesirable patterns. While a number of approaches have been tried in the algorithmic fairness community, a growing consensus is emerging that many fairness notions are causal and counterfactual in nature. The approach taken by my collaborators and me [12], for example, identifies unfairness with the presence of a direct influence of a sensitive feature, such as gender or ethnicity, on the outcome, and provides methods for removing this source of unfairness from the data as a step performed before applying learning algorithms.

This work considered features whose unfairness rests on broad ethical consensus. In general, however, there is disagreement about what is fair and unfair. This disagreement is rooted in political and, ultimately, ethical disagreement among human beings. Real progress on algorithmic fairness entails not only formalizing human ethical consensus on what is fair and unfair, but also explicitly considering, within the marketplace of ideas, the differences in our ethical commitments.


1. Julia Angwin, Jeff Larson, Surya Mattu, and Lauren Kirchner, “Machine Bias,” ProPublica, May 23, 2016.

2. Dino Pedreschi, Salvatore Ruggieri, and Franco Turini, “Discrimination-Aware Data Mining,” in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, Nevada, August 24–27, 2008, 560–568.

3. Michael Feldman et al., “Certifying and Removing Disparate Impact,” in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015, 259–268.

4. Moritz Hardt, Eric Price, and Nati Srebro, “Equality of Opportunity in Supervised Learning,” Cornell University Library, arXiv preprint, October 7, 2016.

5. Faisal Kamiran, Indre Zliobaite, and Toon Calders, “Quantifying Explainable Discrimination and Removing Illegal Discrimination in Automated Decision Making,” Knowledge and Information Systems 35, no. 3 (2013): 613–644.

6. Sam Corbett-Davies et al., “Algorithmic Decision Making and the Cost of Fairness,” Cornell University Library, arXiv preprint, January 28, 2017.

7. Shahin Jabbari et al., “Fair Learning in Markovian Environments,” Cornell University Library, arXiv preprint, November 9, 2016.

8. James Lind, A Treatise of the Scurvy in Three Parts (1753).

9. Marianne Bertrand and Sendhil Mullainathan, “Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination,” American Economic Review 94, no. 4 (September 2004): 991–1013.

10. Carson v. Bethlehem Steel Corp., US 7th Circuit (1996).

11. Razieh Nabi and Ilya Shpitser, “Fair Inference on Outcomes,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, AAAI Press, February 2–7, 2018.

12. Ibid.