Algorithmic Fairness

1. Introduction

The first question we might ask about algorithmic fairness is whether the term identifies a distinct moral notion that differs in important ways from the more generic (albeit contested) notion of fairness. In other words, is the concept of algorithmic fairness simply describing what constitutes fairness when machine learning algorithms are used in various contexts? The answer appears to be yes. “Algorithmic fairness” does not have a special meaning that is normatively distinctive. Rather, the term refers to debates in the literature regarding when and whether the use of machine learning algorithms in different ways and in different contexts is fair, simpliciter. That said, because fairness itself is a contested notion, it is unsurprising that it is used in the algorithmic fairness literature to pick out different moral notions. For example, in the philosophical literature, “fairness” can refer to questions of distributive justice and also to questions about how one person ought to relate to another. In addition, people disagree about what fairness requires in each of these contexts.

Some complaints of unfairness in the context of algorithms focus on how one person or group is treated as compared to how another person or group is treated. Other complaints of unfairness have a different form: a person asserts that she is treated less well in some way than she ought to be treated. Feinberg divided conceptions of justice into two types in a similar way (Feinberg 1974: 298). As he explained,

justice consists in giving a person his due, but in some cases one’s due is determined independently of that of other people, while in other cases, a person’s due is determinable only by reference to his relations to other persons. (Feinberg 1974: 298)

Feinberg used “justice” for both types of claims; some people, by contrast, may reserve “fairness” for the comparative concept and “justice” for the non-comparative concept. In U.S. constitutional law, questions about whether a person is treated as well as others are largely addressed via the “equal protection clause” and questions regarding whether a person is treated as she should be (understood in the non-comparative sense) are addressed via the “due process clause”, which might be thought to capture notions of fairness and justice respectively. Some scholars might thus contest whether the non-comparative complaints are best thought of as raising issues of “fairness” at all, because these scholars understand “fairness” as an inherently comparative notion in which what one is due can only be ascertained by reference to how others are treated. As this entry aims to canvass the debates that make up the algorithmic fairness literature, it includes a discussion of the non-comparative complaints as well, whether or not those are properly understood as issues of fairness. The important point to stress is not the terminological dispute about whether particular issues are matters of “fairness” or instead of “justice” but rather that this literature includes discussion of both comparative and non-comparative complaints.

Machine learning algorithms, their use, and the data on which they rely have spawned a significant literature. In addition, this field is new and dynamic. The algorithmic fairness literature is particularly difficult to summarize because the field is interdisciplinary and draws on work in philosophy, computer science, law, information science and other fields. As a result, the discussion below provides an overview of some of the central and emerging debates but is bound to be incomplete.

That said, one should begin by noting a few important distinctions. First, some algorithms are used to allocate or assist in allocating benefits and burdens. For example, a bank might use an algorithmic tool to determine who should be granted loans. Or a state might use an algorithmic tool to assist in determining which inmates should remain in jail and which should be released on parole. Other algorithms are used to provide information, like translation services or search engines, or to accomplish tasks, like driving or facial recognition. While each of these functions raises fairness issues, because the allocative function has spawned the most literature, this entry will devote the most attention to it.

Second, some algorithms are used to predict future events like who will repay a loan or who will commit a future crime. Others are used to “predict” or report some presently occurring fact, like whether an individual currently has a disease or whether the person holding the phone is the owner of the phone. Third, the score produced by the algorithm can be binary (yes or no) or continuous.

For example, suppose a state uses an algorithmic tool to evaluate how likely a person it arrests is to commit another crime and uses this score to assist in setting bail. This algorithm predicts a future event and is used to allocate benefits and burdens (high bail, low bail, no bail). If the score it produces is binary, it will provide one of two scores (high risk or low risk, for example). If it is continuous, it will provide a number (between 0 and 1, for example) which indicates the likelihood that the scored individual will commit a future crime.

Alternatively, consider the context of a facial recognition tool used to unlock a phone. An algorithm used for this purpose is used to determine what is presently occurring: whether the person in front of the phone is the person who owns the phone. The algorithmic tool is not used to allocate benefits or burdens; rather, it is used to accomplish a task, to ensure the security of the phone. The score it provides is binary (recognition, non-recognition), which in turn determines whether the phone is unlocked or not.

Interestingly, when an algorithm is used to determine a presently occurring fact, there is a truth about the matter at issue (the person holding the phone is the owner or is not). Where there is a truth to be ascertained, one can say that the algorithm scored the individual correctly or incorrectly. By contrast, when an algorithm is used to predict a future event, ascertaining its accuracy is more complex. Suppose the algorithm determines that the scored individual is likely to reoffend (meaning, say, .7 of the people so scored will reoffend). In that case, the algorithm may be accurate if 7 out of 10 of the people scored as likely to reoffend do reoffend even if this particular individual does not.

As a descriptive matter, the algorithmic fairness literature was jumpstarted by a real-world controversy about the use of a particular algorithmic tool, COMPAS. Because the COMPAS controversy played such a pivotal role in framing how later debates evolved, it makes sense to begin by describing COMPAS and the debate surrounding it.

1.1 The COMPAS controversy

The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) algorithm predicts recidivism risk and is used by many U.S. states and localities to aid officials in making decisions about bail, criminal sentencing and parole. COMPAS bases its predictions on the answers to a series of questions. Of note, COMPAS does not base its predictions on the race of the people it scores. Nonetheless, in 2016, the website ProPublica published an article alleging that COMPAS treated Black people differently than it treated white people in a way that was unfair to Black people. In particular, the Black people scored by the algorithm were far more likely than were white people also scored by the algorithm to be erroneously classified as risky. ProPublica claimed:

In forecasting who would re-offend, the algorithm made mistakes with black and white defendants at roughly the same rate but in very different ways. The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants. White defendants were mislabeled as low risk more often than black defendants. (Angwin et al. 2016)

This passage from the ProPublica article frames the controversy as a disagreement about how to assess whether an algorithm treats those it scores fairly. On the one hand, a Black person and a white person who were each given the same score were equally likely to recidivate. Thus, according to one measure, COMPAS treated the two groups equally. However, a Black person and a white person who did not go on to recidivate were not equally likely to be scored as low risk—the Black person was significantly more likely to have been scored as high risk than the white person. These different ways of assessing whether COMPAS is fair pick out two different measures, each of which was offered as a measure of fairness.

Following the publication of the ProPublica article, computer scientists demonstrated that, except in highly unusual circumstances, it is impossible to treat two groups of people equally according to both of those measures simultaneously (Kleinberg et al. 2016: 1–3; Chouldechova 2016: 5). This conclusion is often referred to as the “impossibility result” or the “Kleinberg-Chouldechova impossibility theorem”. After the publication of these papers, other scholars proposed additional statistical measures of fairness. Yet the impossibility of satisfying measures from different families at once, especially those bearing a family resemblance to the two flagged in the ProPublica controversy, has endured.

The fact that this controversy has played such a pivotal role in the development of the field of algorithmic fairness has influenced the literature in multiple ways. Perhaps most importantly, it has led many to assume that mathematical formulas of some kind (even though there is disagreement about which ones) are important to assessing the fairness of algorithms. Not all scholars agree, however. For example, Green and Hu challenge the significance of the impossibility result for discussions of algorithmic fairness, arguing that “[l]abeling a particular incompatibility of statistics as an impossibility of fairness generally is mistaking the map for the territory” (Green & Hu 2018: 3). That said, the fact that proposed statistical measures of algorithmic fairness conflict has led to a rich debate about which ones, if any, relate to fairness, understood as a moral notion, and why.

The debate between different statistical measures, as well as the debate about whether any statistical measures are relevant, highlights the fact that the algorithmic fairness literature has a hole at its very center. Because scholars in this field have not clarified what they understand fairness to require, they often talk past one another. This entry attempts to clarify the different conceptions of fairness at issue and how they track different views of algorithmic fairness that are extant in the literature.

1.2 Competing fairness measures used to assess the debate about COMPAS

The measure that COMPAS’s developer used to assess the fairness of its product is often called “predictive parity”; it requires that those given the same score by the algorithm have an equal likelihood of recidivating. For example, to satisfy predictive parity, the score of high risk must be equally predictive of recidivism for Black people and for white people.

An alternative way to assess whether the algorithm is fair, and the one on which ProPublica relied, looks at whether the false positive rate, the false negative rate, or both are the same for each of the relevant groups. The false positive rate is the proportion of people who do not have the trait in question who are nonetheless erroneously predicted to have it; the false negative rate is the proportion of people who do have the trait who are erroneously predicted to lack it. To have equal false positive rates, that proportion must be the same for each of the groups we are comparing. The term “equalized odds” refers to the situation in which both the false positive rate and the false negative rate are the same for each of the groups being compared.
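
These two families of measures are easy to compute once an algorithm’s binary predictions and the actual outcomes are available. The following sketch is illustrative only: the toy numbers are invented (they are not COMPAS data), and the function assumes binary scores (high risk / low risk) rather than continuous ones.

```python
# Minimal sketch: the quantities behind predictive parity and equalized odds,
# computed separately for two groups. Outcomes are 1 if the person reoffended,
# predictions are 1 if the person was scored high risk. Toy data, no guards
# against empty categories.
import numpy as np

def group_rates(y_true, y_pred):
    """Return (PPV, FPR, FNR) for one group's outcomes and predictions."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_pred & y_true)
    fp = np.sum(y_pred & ~y_true)
    fn = np.sum(~y_pred & y_true)
    tn = np.sum(~y_pred & ~y_true)
    ppv = tp / (tp + fp)   # of those scored high risk, the share who reoffended
    fpr = fp / (fp + tn)   # of those who did not reoffend, the share scored high risk
    fnr = fn / (fn + tp)   # of those who reoffended, the share scored low risk
    return ppv, fpr, fnr

group_a = group_rates(y_true=[1, 1, 0, 0, 1, 0, 0, 0], y_pred=[1, 1, 1, 1, 0, 0, 0, 0])
group_b = group_rates(y_true=[1, 0, 0, 0, 0, 0, 0, 0], y_pred=[1, 1, 0, 0, 0, 0, 0, 0])

print("group A: PPV=%.2f FPR=%.2f FNR=%.2f" % group_a)
print("group B: PPV=%.2f FPR=%.2f FNR=%.2f" % group_b)
```

On this toy data the two groups have the same positive predictive value (so predictive parity is satisfied) but different false positive rates (so equalized odds is violated), which is the same pattern at the heart of the COMPAS dispute.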

The reason that COMPAS cannot achieve both predictive parity and equalized odds is that the rates at which Black people and white people recidivate differ (the “base rates”). An algorithmic tool cannot satisfy both predictive parity and equalized odds when base rates are different unless the tool is perfectly accurate. As most predictive algorithms are imperfect and base rates of properties of interest like recidivism, loan repayment, and success on the job often differ between groups, it will be impossible to satisfy both types of measures in most contexts (Kleinberg et al. 2016: 3; Chouldechova 2016: 5).
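
The arithmetic behind this result can be made explicit. Using this entry’s notation rather than that of the original papers, write p for a group’s base rate, PPV for the positive predictive value (the quantity that predictive parity equalizes), and FPR and FNR for the false positive and false negative rates (the quantities that equalized odds equalizes). The definitions of these quantities yield the identity:

\[
\mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\bigl(1-\mathrm{FNR}\bigr).
\]

If two groups have the same PPV and the same FNR but different base rates p, their false positive rates must differ unless (1 − PPV)(1 − FNR) = 0, that is, unless the tool is perfect or degenerate. This is the impossibility result in miniature.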

Noting group-based differences in base rates gives rise to several important questions, including the following. Is the base rate data accurate? If it is not, how does this inaccuracy affect algorithmic fairness? If the data is accurate, one might wonder what caused the observed differences between socially salient groups like racial groups. Does the causal etiology affect how we should think about algorithmic fairness? Questions about how the accuracy of data relates to fairness are discussed in Section 4.

Predictive parity and equalized odds are the most prominent of the statistical measures of algorithmic fairness proposed in the literature, but more recently other measures have been suggested. For example, Hedden canvasses eleven different statistical measures that have been proposed—though several share a family resemblance (Hedden 2021: 214–218). In particular, predictive parity shares its motivation with “Equal Positive Predictive Value”, which requires that the “(expected) percentage of individuals predicted to be positive who are actually positive is the same for each relevant group” (Hedden 2021: 215).

There are also several measures that focus in different ways on error rates, including ones that require false positive rates, false negative rates, or both to be equal, as well as ones that look to the ratio of false positive to false negative rates. In addition to these two families of metrics, we might also measure the fairness of an algorithm by reference to whether the percentage of a socially salient group that is predicted to be positive (or negative) is the same for each group. This criterion resembles the U.S. legal concept of disparate impact. Some scholars have also proposed composite measures that attempt to maximize the satisfaction of measures from each family of approaches.

The controversy about which statistical measure to prioritize can be usefully understood as grounded, albeit implicitly, on assumptions about what the moral notion of fairness requires. What that notion is, however, can be difficult to unearth as scholars writing in this field often implicitly assume that something is manifestly fair or unfair without being explicit about what they understand fairness to be.

2. Fairness as treating like cases alike

On one view, what fairness requires is treating like cases alike. Views that fall within this framework see fairness as a comparative notion. This conception of fairness focuses on how one person or group is treated as compared to how another person or group is treated. Conceptions of algorithmic fairness that are not comparative in orientation are discussed in Section 3.

The view that fairness requires that like cases be treated alike does not, however, settle the COMPAS dispute, as this maxim can support both of the so-called fairness measures offered by the parties to that controversy. The treat-likes-alike principle (or TLA for short) could require that people who are equally likely to recidivate be treated the same, meaning that they get the same risk score (as equalized odds would require). Or, alternatively, the TLA principle could require that two people with the same risk score be equally likely to recidivate (as predictive parity requires).

The TLA principle, while important, thus provides only the first organizing division among competing conceptions of fairness that are extant in the literature. While it does not provide a way to choose between the two measures proposed in the COMPAS controversy, it allows us to see what the supporters and critics of COMPAS agreed about regarding the nature of algorithmic fairness (and how they differ from the non-comparative conceptions of fairness discussed in §3).

2.1 The epistemic version of treat like cases alike

For one family of views (and the scholars who defend them), the likeness that is relevant for fairness has an epistemic cast. To be fair, a risk score derived from an algorithmic tool must “mean the same thing” for members of group X and group Y. Hedden, for example, argues that “Calibration within Groups” is the only statistical measure that is necessary for fairness in precisely these terms. Calibration within Groups, according to Hedden, requires that

[f]or each possible risk score, the (expected) percentage of individuals assigned that risk score who are actually positive is the same for each relevant group and is equal to that risk score. (Hedden 2021: 214)

The reason Calibration Within Groups is necessary for fairness, in his view, is that when it is violated “that would mean that the same risk score would have different evidential import for the two groups” which amounts, in his view, to “treating individuals differently in virtue of their differing group membership” (Hedden 2021: 225–26). In this passage, we can see that Hedden assumes that fairness requires that scores mean the same thing and that treating people differently in virtue of their group membership just is giving them risk scores with different evidential support.
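
Calibration within Groups can be checked directly from a model’s output: within each band of risk scores, the observed rate of positives should roughly equal the scores in that band, for every group. The following is a minimal sketch of such a check, with illustrative bin edges; it is not Hedden’s own formalism.

```python
# Minimal sketch of checking Calibration within Groups: within each risk-score
# bin, the observed positive rate should roughly equal the scores in that bin,
# for every group. Bin edges and inputs are illustrative assumptions.
import numpy as np

def calibration_by_group(scores, outcomes, groups, bins=(0.0, 0.25, 0.5, 0.75, 1.0)):
    scores, outcomes, groups = map(np.asarray, (scores, outcomes, groups))
    report = {}
    for g in np.unique(groups):
        in_group = groups == g
        for lo, hi in zip(bins[:-1], bins[1:]):
            # Half-open bins, except the top bin, which includes scores of exactly 1.0.
            upper = (scores <= hi) if hi == bins[-1] else (scores < hi)
            in_bin = in_group & (scores >= lo) & upper
            if in_bin.any():
                report[(g, (lo, hi))] = (scores[in_bin].mean(), outcomes[in_bin].mean())
    return report  # maps (group, bin) -> (mean score, observed positive rate)
```

The measure is (approximately) satisfied when, for every group and every bin, the mean score and the observed positive rate are close to one another.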

Some scholars challenge whether Calibration within Groups does in fact assure that a score will mean the same thing for each of the groups affected. Hu argues that while an algorithm may achieve calibration within groups when we focus on a particular pair of groups (Black people and white people, men and women), it will not achieve calibration across all possible groups to which the actual people belong unless it is perfectly accurate. For that reason, Hu concludes that it is false to claim that the score means the same thing, no matter the group to which a person belongs (Hu 2025).

Eva, like Hedden, also understands fairness in epistemic terms, though he speaks of accuracy rather than meaning. Eva proposes a statistical criterion of fairness he calls “base rate tracking”, which requires that the average risk scores for each group mirror the base rates of the underlying phenomenon in each group (Eva 2022: 258). Fairness is, on this view, a matter of the score accurately reflecting the world for each of the groups at issue. Interestingly, Eva’s view not only requires that differences in the scores reflect differences in underlying base rates; in his view, differences in base rates must also be reflected in differences in scores. For Eva, this requirement rests on what he takes to be a “natural intuition: that it would be unfair to treat two groups as equally risky if one was in fact more risky than another” (Eva 2022: 262). However, this intuition may not be as universal as Eva supposes, as sometimes what fairness requires is ignoring differences between people and treating people who are different the same (Schauer 2003: chapter 7). For example, airport screening requires that everyone put their bags through a screener despite the fact that people are not equally likely to be carrying impermissible items onto the plane. In this case, treating people who pose different degrees of risk the same is seen as the embodiment of fairness, and departures from this process (increased screening for high-risk individuals, for example) are seen as potentially unfair.
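
Stated more precisely (in this entry’s notation, not Eva’s), base rate tracking can be read as a constraint on any two groups A and B: the difference between their average risk scores must equal the difference between their base rates,

\[
\frac{1}{|A|}\sum_{i \in A} s_i \;-\; \frac{1}{|B|}\sum_{j \in B} s_j \;=\; \Pr(Y = 1 \mid A) \;-\; \Pr(Y = 1 \mid B),
\]

where s_i is the risk score assigned to individual i and Y = 1 indicates that the predicted outcome (recidivism, say) in fact occurs. Read in both directions, the constraint captures the two requirements noted above: differences in scores must reflect differences in base rates, and differences in base rates must show up as differences in scores.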

2.2 The equal care version of treat like cases alike

Scholars who focus on the false positive rate, the false negative rate, or both (i.e., equalized odds) see the fairness of an algorithmic score in terms other than evidential support and accuracy. Instead, they speak in terms of “care” and of whether and how the algorithm’s predictions affect existing social structures, and they tend to emphasize the connection between the algorithm itself and the decisions it aids and supports.

Babic and Johnson King, for example, argue that fairness requires that an actor care equally about each of the groups scored by the algorithm (Babic & Johnson King 2025: 116). They operationalize the notion of caring, in the algorithmic context, by reference to the ratio of false positives to false negatives and treat differences in this ratio as prima facie evidence of unfairness.

Lazar and Stone argue that a model is unfair if it is less accurate for members of disadvantaged groups than for advantaged groups and could be improved without unreasonably compromising predictions for advantaged groups (Lazar & Stone 2023: 112–15). They emphasize that when people endorse a model that works less well for the disadvantaged group and especially if this fact results from prior injustice toward members of these disadvantaged groups, the user of the algorithm (unfairly) reinforces structures of injustice in the world.

Some scholars shift the emphasis from the scores themselves to how they are used. These scholars resist a sharp delineation between assessing the fairness of the algorithm itself and assessing its fairness in the context of its likely use. To see the intuition animating this view, imagine that COMPAS is not being used to aid decisionmakers in making decisions about who will remain in prison but instead to determine who will be provided with beneficial services. The fact that the algorithm generates more false positives for Black inmates than for white inmates seems unfair to the Black inmates if the result will be incarceration but not if it will be beneficial services (Mayson 2019: 2293; Grant 2023: 101).

Because the consequences of each type of error (false positives and false negatives) differ depending on the action to be taken as a result (incarceration and high bail versus beneficial services), the significance of each type of error differs depending on what will happen in response to the score. In addition, the consequences of the same type of error can be different for different groups. For example, the cost of setting bail higher than warranted may be more burdensome for an already disadvantaged social group than for one that is more privileged. Long, for example, argues that treating people fairly requires valuing the risk and cost of error equally for the relevant groups being compared. Where the costs of error for two groups differ, one should take this into account by setting the threshold for the relevant action differently for each group (Long 2022: 54–55).
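
Long’s proposal can be illustrated with a standard decision-theoretic threshold rule: if, for a given group, a false positive costs c_FP and a false negative costs c_FN, then acting on a calibrated risk score minimizes expected cost only when the score exceeds c_FP / (c_FP + c_FN). The sketch below is an illustration of that familiar rule rather than Long’s own formalism, and the cost figures are invented.

```python
# Illustrative only: group-specific decision thresholds derived from
# group-specific error costs. The numbers are hypothetical, not Long's.
def decision_threshold(cost_false_positive, cost_false_negative):
    """Act (e.g., detain, set high bail) only when the risk score exceeds this value."""
    return cost_false_positive / (cost_false_positive + cost_false_negative)

# Suppose a false positive (e.g., unwarranted high bail) is judged more costly
# for an already disadvantaged group than for a more privileged one.
threshold_disadvantaged = decision_threshold(cost_false_positive=3.0, cost_false_negative=1.0)
threshold_privileged = decision_threshold(cost_false_positive=2.0, cost_false_negative=1.0)

print(threshold_disadvantaged)  # 0.75: act only on comparatively high risk scores
print(threshold_privileged)     # ~0.67
```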

2.3 Who are the likes that must be treated alike?

Scholars of algorithmic fairness also differ in their methodological approach. Some approach the problem of assessing whether an algorithm treats like cases alike by constructing hypothetical cases in which the question is asked about how Group A is treated as compared to Group B, where both groups are artificial groups constructed for the hypothetical. Others focus on real social groups like racial groups. These methodological differences likely embed assumptions about the nature of social groups. For example, Hedden builds his argument in favor of “Calibration within Groups” using an example in which the groups at issue are members of room A or room B rather than using socially salient groups. In so doing, he assumes that differences between such hypothetical groups and real social groups do not matter to ascertaining what fairness requires, a position that others contest. In other words, fairness, in the view of Hedden and others, can be understood as a purely abstract notion, untethered from the real-world social situations in which algorithms and the systems that deploy them are used. To others, described below, fairness always requires assessing how an algorithmic tool interacts with a history and culture that may be marked by injustice.

Eva emphasizes that an algorithm is unlikely to be calibrated with regard to all possible groups and recognizes that we are likely to be interested only in whether a statistical measure of fairness is satisfied with regard to significant social groups like those defined by race and sex (Eva 2022). However, he also thinks that statistical measures of fairness speak to whether an algorithm is “intrinsically” fair or unfair, in a manner that is meaningful for both socially salient groups and artificial groups.

For Hedden and Eva, the fairness of algorithms (or at least their so-called “inherent fairness”) can be assessed by ignoring real world facts about actual social groups. For others, the social situation of disadvantaged groups is relevant to the inherent fairness of algorithms. For example, Grant emphasizes the fact that people who are disadvantaged often have fewer of the features that predict desirable outcomes. Living in a poor neighborhood is predictive of recidivism and failure to repay loans. As a result, individual poor people who are in fact unlikely to recidivate and likely to repay their loans will have difficulty identifying themselves as such, as they will have difficulty showing that they differ from the socially salient groups to which they belong. Grant calls this problem “evidentiary unfairness” (Grant 2023: 101).

Lazar and Stone make the case for what they term “situated predictive justice”. They argue that if a model performs less well for a disadvantaged group due to systematic background injustice,

we can acquire reasons to care about differential model performance which are dependent on those situated, contextual facts, rather than being applicable in every logically possible world. (Lazar & Stone 2023: 21)

Zimmerman and Lee-Stronach make similar arguments (2022: 6–7).

The relevance of the real world to assessments of unfairness also arises when assessing the fairness of the manner in which search engines deliver content. Delivery can be skewed due to the actions taken by others, a phenomenon termed “market effects”. For example, Lambrecht and Tucker describe how an advertisement for careers in STEM (Science, Technology, Engineering and Math) fields was shown to men more than to women, even though it was designed to be shown on a gender-neutral basis. They hypothesized that this skew resulted from the fact that the STEM ad was outbid by other advertisers who wanted to get the attention of women, who are more desirable consumers of advertising (Lambrecht & Tucker 2019: 2966–97). This result raises a question about whether skewed delivery of information, especially information about valuable opportunities, is unfair when caused by the interaction of various forces as opposed to resulting from the algorithm itself. In this case, the fact that women engage in more commercial activity than men (if that was the cause) had the effect of making women less likely to receive information about STEM careers. Kim finds this unfair and argues that the law should address disparities in access to valuable opportunities (Kim 2020: 934–35). In making this argument, Kim appears to understand fairness in terms of equality of opportunity. See the entry on equality of opportunity.

Barocas, Hardt, and Narayanan describe several examples that have a similar structure and appear to consider each to raise problems of fairness (Barocas, Hardt, & Narayanan 2023: 199). For example, they report that staples.com offered lower prices to people in certain ZIP codes which were, on average, wealthier. The reason it did so was that the algorithm provided lower prices in locations in which a competitor store was located. From a moral perspective, one might wonder whether differential treatment is unfair when the reason for it is market-based but its effect tracks protected attributes. Finding this differential pricing unfair draws on the same moral intuitions as do claims of disparate impact (or indirect) discrimination.

2.4 Are the measures constitutive or only suggestive of unfairness?

Scholars also differ with regard to whether they think their preferred measure of fairness is constitutive of fairness or unfairness or merely suggestive of it. Hedden, for example, treats “calibration within groups” as necessary for fairness (Hedden 2021: 222). By contrast, Hellman argues that when the ratio of false positives to false negatives is different for different groups (a departure from what she terms “error ratio parity”), this is suggestive of unfairness only (Hellman 2020: 835–36). Setting the balance between false positives and false negatives instantiates an important normative judgment about the costs of each type of error. Is it more important to avoid incarcerating an innocent person (false positive) or to avoid allowing a dangerous criminal to go free (false negative), and by how much? Fairness requires weighing the costs of these two types of errors in the same way for Black criminal defendants as for white criminal defendants. One might be tempted to think that error ratio parity does just that, and so that differences in this ratio for different groups are inherently unfair. It does not. Weighing the costs of each type of error equally for each group does not translate directly into error ratio parity, because the error ratio also depends on the incidence of the feature being tested for (the base rate) in each group. As a result, a lack of error ratio parity is suggestive but not constitutive of unfairness, according to Hellman.
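
The dependence of the error ratio on the base rate can be shown directly. For a group with base rate p (again in this entry’s notation, not Hellman’s), the ratio of false positives to false negatives is

\[
\frac{\mathrm{FP}}{\mathrm{FN}} \;=\; \frac{\mathrm{FPR}\cdot(1-p)}{\mathrm{FNR}\cdot p}.
\]

So even if the false positive and false negative rates, which express how the two kinds of error are weighed, are set identically for two groups, their FP-to-FN ratios will generally differ whenever their base rates differ. This is why, on Hellman’s view, a difference in the ratio is evidence of, rather than equivalent to, weighing the errors unequally.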

2.5 Connection to debates about the nature of social groups

If fairness requires treating like cases alike, one must articulate in what ways two people are like (and unalike) one another. For example, if we want to assess whether Black loan applicants are treated fairly as compared to white loan applicants, we need to assess the features that make two applicants like one another in all respects except race. Suppose the Black applicant is denied a loan while the white applicant is granted a loan on the grounds that the income of the Black loan applicant is lower than the income of the white loan applicant. Should we conclude that these applicants are unlike one another, because their income is different, and thus the different treatment does not demonstrate unfairness? Or should we conclude that income is, in part, constitutive of race and thus these applicants may be like one another after all?

When scholars discuss algorithmic fairness as applied to real social groups, some ignore this debate about how the group itself should be defined while others emphasize its importance.

Beigang proposes that we focus on matched pairs from the relevant groups and then assess whether predictive parity and equalized odds can be satisfied for groups that are similar in all respects other than the trait with reference to which we are assessing fairness between groups (Beigang 2023: 179, 183). Beigang’s assumption that we can identify matched pairs of people who differ only with respect to race assumes that race can be defined in a manner that allows it to be clearly isolated from correlated traits.

Others disagree. They emphasize that race and sex-based groups are importantly different from hypothetical, abstract groups. While membership in group A can be given a stipulated definition such that a scholar can isolate a particular issue under discussion, social categories like race and sex can never be so neatly hived off from questions about what such an approach is aiming to elucidate.

For example, Hu and Kohler-Hausmann argue that algorithms are making decisions on the basis of race and sex even when these labels are absent from the data. To see the idea, suppose an algorithm used to predict who is likely to be a successful computer programmer is unaware of the sex of the individuals in the training data set. The algorithm learns that being a math major in college is a good predictor of job success. Should we conclude that sex did not affect the algorithmic recommendation? Or should we recognize that math is a “male-y thing”, as men are steered, directly or implicitly, into math and women away from it? If male sex is constructed, at least in part, as being a person reasonably likely to major in math, then the algorithm is not unaware of sex after all (Hu & Kohler-Hausmann 2020: 7–8).

This view of protected attributes like race and sex is influenced by an understanding of race and sex as “socially constructed” (Haslanger 2012: chapter 7). For example, one could understand black race as constituted by social subordination, a likelihood of being subject to substantial police supervision, etc. If so, then removing racial labels does not make the algorithm unaware of race (Hu 2024: 13–14). Weinberger challenges this view, arguing that by focusing on the signals of race, sex, and other protected attributes, the causal inference model of discrimination can be maintained (Weinberger 2022: 1265).

The upshot of the social constructivist view of race and sex, and its implications for formal measures of algorithmic fairness, can be put in a stronger or a weaker form. In the strong form, any so-called “fairness measure” which aims to identify whether people are treated fairly regardless of their group membership, but which defines such membership in terms of a label in the data, misunderstands the nature of race and sex. As a result, any conclusion that follows from a finding of parity in a statistical measure of fairness is not meaningful for genuine fairness between the real social groups scored by the algorithm.

In its more modest form, the critique holds that such “fairness measures” rest on a contested ontological view about the nature of race and sex. The upshot of this view is that scholars and policy makers must acknowledge and defend the particular understanding of race or sex on which their conclusions about algorithmic fairness depend. In either form, this critique suggests that statistical fairness measures may say less about whether an algorithm treats people fairly irrespective of their race or sex than their proponents imagine.

3. Non-comparative conceptions of algorithmic fairness

In this section, I examine several ways in which complaints of algorithmic unfairness arise where the unfairness is not a matter of whether a person or group is treated fairly given how another person or group is treated. Rather, the complaint is that a person or group is treated less well (in some manner) than they ought to be treated. The sections that follow describe several such concerns, though this is not an exhaustive list. First, the complaint might point to the fact that the algorithm is inaccurate, or less accurate than it could or should be (§3.1). Second, the complaint might object that a judgment about an individual is impermissibly based on features of the group to which the individual belongs (§3.2). Third, perhaps the use of algorithms is unfair because machines rather than humans are making consequential decisions that affect people (§3.3). Lastly, the incomprehensibility of complex algorithmic systems may be unfair to those affected by them, leading to demands for explainable AI (§3.4).

3.1 Unfairness as inaccuracy or arbitrariness

Algorithms may be unfair to individuals because they are inaccurate. This charge of inaccuracy can be understood in two different ways. The algorithm as a whole may not do a good (or good enough) job of producing the outcomes it is designed to achieve. In such a case, the people affected by this inaccurate algorithm may claim that being judged by an inaccurate algorithm is unfair. Alternatively, the algorithm, though accurate in general, may mischaracterize a particular individual. This occurs because even a very accurate algorithm is not perfect and so will mischaracterize or predict inaccurately for some subset of individuals. When a generally accurate algorithm makes a mistake about a particular person, the person inaccurately assessed by the algorithm may assert that she has been treated unfairly.

Inaccuracy in algorithmic predictions or assessment can be an “artifact” of the data on which the algorithm is trained. For example, suppose a bank uses an algorithmic tool to determine to whom loans should be awarded. Further suppose the data on which an algorithm is trained just happens to contain several examples in which all borrowers named “Jamila” defaulted on their loans. In such a case, the system might learn that being named “Jamila” predicts failure to repay even though there is no reason to think that having the name “Jamila” has anything to do with repaying loans. If a new Jamila scored by the algorithm is actually a good risk, one might think that the use of the algorithm trained on this data is unfair because it is inaccurate (or because it is inaccurate and there is no plausible causal story about why having this name relates to loan repayment).
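
The mechanism can be seen in a minimal sketch with an invented toy data set in which the name “Jamila” happens, purely by chance, to coincide with default:

```python
# Minimal sketch of how a spurious pattern in training data can be learned.
# Entirely hypothetical data: by chance, every borrower named "Jamila" in this
# tiny training set defaulted, so a naive frequency-based model "learns" the name.
from collections import defaultdict

training_data = [
    {"name": "Jamila", "defaulted": True},
    {"name": "Jamila", "defaulted": True},
    {"name": "Maria", "defaulted": False},
    {"name": "Chen", "defaulted": False},
    {"name": "Maria", "defaulted": True},
]

defaults_by_name = defaultdict(list)
for record in training_data:
    defaults_by_name[record["name"]].append(record["defaulted"])

# Estimated default rate by name: "Jamila" -> 1.0, even though there is no
# plausible causal story linking the name itself to repayment.
estimated_risk = {name: sum(vals) / len(vals) for name, vals in defaults_by_name.items()}
print(estimated_risk)
```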

Of course, there may well be an explanation for why having the name Jamila is predictive of failure to repay. The name may be correlated with race, and race may be correlated with loan repayment. If so, then this example raises the issue of when and why non-sensitive attributes should be treated as proxies for race or other protected attributes (see §5).

Assuming the name’s correlation with failure to repay does not itself track sensitive attributes and thus is truly arbitrary, Creel and Hellman argue that this arbitrariness itself is not of moral concern (Creel & Hellman 2022: 2). However, they argue, arbitrariness can become of moral concern when it leads to widespread exclusion from important opportunities. For example, if the algorithmic tool used by Jamila’s bank is also used by all other lenders, then Jamila will be denied not only this loan but all opportunities to borrow money. This systemic exclusion may be of moral concern, and that moral concern could be characterized as a form of unfairness. Systemic exclusion can occur even if different lenders use different algorithmic products to predict loan repayment, if those products were all trained on the same data set with the same data anomaly. Note, however, that on this account the unfairness resides not in the inaccuracy but in the systemic exclusion from an important opportunity.

3.2 Unfairness as related to stereotyping and generalization

Another complaint about algorithms is that they work by making assessments about individuals on the basis of group-based statistical generalizations. This complaint ties the unfairness of algorithms to familiar debates about when and whether such group to individual inferences are morally permissible. Schauer, Eidelson and others discuss these issues outside of the algorithmic context (Schauer 2003: chapters 2–3, 7; Eidelson 2020: 1635–36). Debates about the moral permissibility of racial profiling, the wrongs of stereotyping, the permissibility of reaching legal judgments on the basis of statistical evidence and legal requirements of treating people “as individuals” all raise similar moral issues. When algorithms are used to predict crime or recidivism in particular, concerns about whether statistical information, on which such tools rely, is the right sort of information on which to base this decision may raise special moral issues (Grant, Behrends, & Basl 2025: 63–64).

It is unclear whether fairness requires that judgments about individuals, or actions that affect individuals, prescind from reliance on statistical generalizations about the groups to which these individuals belong. Some argue that fairness requires that one refrain from making judgments about an individual on the basis of group membership, or membership in certain groups (as avoiding statistical reasoning altogether may not be possible). Others argue that decisionmakers should supplement coarse-grained statistical information with more fine-grained statistical information, thus narrowing the groups on which inferences about individuals are based. Still others contend that fairness permits the use of statistical generalizations about membership in socially salient groups (like race and sex) so long as one also pays attention to how the exercise of individual autonomy distinguishes a particular individual from the group (Eidelson 2020: 1635–39). In contrast, others argue that sometimes the value of equality requires ignoring individual differences and treating everyone the same (Schauer 2003: 258–59; Lippert-Rasmussen 2011: 57–58).

Issues about reliance on group-based generalizations can arise when an algorithm is used to aid in the allocation of burdens or benefits. They also arise in debates about search engines, which provide information to users. There are several related and important fairness concerns in the information-provision setting. First, the use of machine-learning algorithms raises an issue that is the inverse of the generalization problem just discussed: personalization. When algorithms personalize, targeting information to ever smaller, more “intersectional” groups, they raise worries about the segmentation of information in our increasingly polarized environment and give rise to privacy concerns because the algorithm seems to know what each person likes and dislikes. In addition, when the results of searches conform to negative stereotypes about a group, the harm they cause also may be a form of unfairness. If so, one could go on to ask whether this unfairness is attributable to the harm such results cause, reinforcing negative stereotypes. Alternatively, perhaps these results are unfair because they are themselves a form of stereotyping that is morally objectionable regardless of the harm that results.

For example, Noble finds that searches for terms like “Black girls” generate results that consist largely of pornography, which seems unfair as it may harm Black women and girls. In addition, the result may be a kind of comparative unfairness as well if the same result does not occur for white women and girls (Noble 2018: chapter 2) or for men of any race. Sweeney finds that searches for Black-sounding first names were 25% more likely to return a result related to an arrest record than were searches for white-sounding first names (Sweeney 2013: 51). Selbst and Barocas describe this problem as “representational harm[]” (2023: 1036). These scholars see these results as unfair even if a higher percentage of Black women are involved in pornography compared to non-Black women (Noble 2018: chapter 2).

3.3 Fairness requires a human decisionmaker

Much like algorithms, human beings frequently rely on group-based generalizations to make judgments about individuals and do so both accurately and inaccurately. Indeed, some scholars argue that algorithmic decisions promote fairness because they reduce bias (Lobel 2022: introduction; Sunstein 2022: 1203–05). Albright has found that when judges have discretion about whether or not to follow bail recommendations based on risk scores, they are more likely to deviate from these recommendations for Black defendants than for white defendants by imposing cash bail when it was not recommended (Albright 2019: 4).

If basing decisions on the results of a machine learning algorithm is nonetheless problematic, perhaps this is because fairness requires a human decisionmaker or does so in certain contexts or with regard to certain kinds of decisions.

The challenge is to articulate why that is so. One reason might be that machines differ from human beings in several ways, including that they are not constrained by moral duties (Eggert 2025: 4–5). In addition, machines lack what Eggert terms “moral receptiveness”, that is, the ability to appreciate the moral dimension of their actions.

Second, automation by algorithm may be problematic in contexts in which a certain type of process is normatively required, including the opportunity to challenge the results and be heard (Citron 2008: 1283). In addition, when algorithms replace experts, the justification for allocating the decision to the expert in the first instance may be undermined. Calo and Citron argue, for example, that use of algorithms by administrative agencies, without specific safeguards, may undermine the legal justifications for allocating decisions to agency judgment in the first instance (Calo & Citron 2021: 844–45). However, if the algorithmic decision is more accurate, perhaps it provides the requisite expertise.

3.4 Unfairness as inscrutability

Algorithmic fairness may require an explanation of how the algorithm reached the result that it did. Where the processes are complex, additional questions arise about what such an explanation should consist of. It could require information about the data on which the algorithm was trained or audits that demonstrate the accuracy of the system’s results. Alternatively, the demand for explanation could require a detailed account of the factors that were relevant or dominant in reaching the particular result that the algorithm reached. If fairness requires the latter, additional questions arise about how to structure such an explanation. In particular, does fairness require that the individual affected be provided with information about the factors that weighed the most heavily in the algorithmic determination about that individual’s case? Or does it require that the individual be told what she might change about herself or her circumstances to get a different result? Or both?

One might wonder what the relationship is between the demand for explanation and concerns of fairness. To some, the explanation requirement derives from legal norms of “due process”, which could be glossed morally as fair process (Citron 2008: 1276–77). On this view, in order for an individual who is scored by an algorithm to be treated fairly, the individual must have access to some information about how that result is reached.

One can understand the explanation requirement instrumentally or non-instrumentally. If fairness requires that an individual be able to understand the factors that affected the score the algorithm reached so that the individual can identify errors that might have been made and correct them, then the explanation requirement is instrumental. On this account, explanation helps an individual to detect and correct error. To the extent that explanation is required for fairness, then, this is because fairness relates to the accuracy of the score. Alternatively, explanation may be required for non-instrumental reasons. Perhaps treating others with appropriate respect or consideration requires that the scored individual have access to the factors that affected the algorithmic score (Grant, Behrends, & Basl 2025: 56–58). This gloss on the explanation requirement seems especially plausible in instances where the algorithm is used to make, or assist others in making, consequential decisions.

Some scholars see both instrumental and noninstrumental reasons that explanations must be provided. For example, Vredenburgh grounds the right to an explanation on what she terms “informed self-advocacy” which requires that a person have the ability to determine how to act given how decisions will be made, and that she knows how decisions were reached so that she can correct errors (Vredenburgh 2022: 212–13). For Jorgensen, individuals are entitled to know the factors that may affect how an algorithm will score them so that they can make informed and autonomous choices about how to act in the world. In addition, Jorgensen argues that explanation is especially important in the context of criminal law because legitimate law must be genuinely public (Jorgensen 2022: 64).

Each of these four versions of the non-comparative complaint of unfairness asserts that the use of algorithms in the particular context treats a person less well than the person ought to be treated. This formulation raises a question about how to think about flawed processes that are nonetheless an improvement on human judgment and decision-making. After all, human judgment is often inaccurate and relies on group-based generalizations. Johnson, for example, argues that because both humans and machine learning algorithms rely on induction, both are unavoidably subject to bias (Johnson 2021: 9951) and stresses the similarities between human and machine bias (Johnson 2024: 9–10). In addition, the workings of the human mind can be the ultimate “black box” (Selmi 2021: 632–33). Whether we should demand better explanations from AI systems than from human beings is another subject of dispute (compare Zerilli et al. 2019 and Günther & Kasirzadeh 2022).

4. Problems with the data

Claims of algorithmic fairness or unfairness sometimes focus particularly on the data on which algorithms are trained. We can divide data-related issues into two broad categories: those where inaccurate or nonrepresentative data lead to unfairness (§4.1, §4.2) and those where the data, while accurate, nonetheless produce or reveal unfairness (Hellman 2024a: 80–81) (§4.3).

4.1 Measurement error

A common data collection problem is measurement error, which has several forms. First, measurement error occurs when the phenomenon one really cares about is difficult to measure. As a result, the researcher may collect data on something else that is easier to measure and thought to be closely correlated with the desired attribute. For example, a law school might want to admit students with an aptitude to learn complex and subtle legal material (a facility we might call “legal ability”). The problem, however, is that it is difficult to know who has legal ability until after the student arrives at law school. Thus, a law school admissions process might rely instead, in part, on the Law School Admission Test (LSAT), which purports to measure legal ability. There is clearly a gap between the actual trait the law school seeks—legal ability—and LSAT scores; the traits are not the same, even if they are correlated. This gap is termed “measurement error”.

Second, measurement error can occur when the data are collected from a nonrepresentative subset of the population it purports to represent. For example, if a facial recognition tool is created using only images of light skinned faces, it may fail to work for dark skinned faces. Relatedly, perhaps the data are representative of the population but still do not include sufficient minority group members to be reliable for that subgroup. Fairness issues related to representation will be addressed in Section 4.2 below.

Measurement error is ubiquitous and often unavoidable. Of special concern morally, however, are skewed measurement errors. Consider, for example, an employer who desires to hire reliable employees. Reliability is difficult to measure directly, so the employer bases her judgment about whether job applicants will be reliable on the recommendations the applicants receive from their prior employers. Suppose the prior employers are biased, consciously or unconsciously, such that they are more likely to see their female employees as unreliable when they miss work occasionally than they are to reach that judgment about male employees who miss work equally often. If so, hiring based on reliability will import that bias into an algorithm through reliance on the recommendations of prior employers. When the measurement error is larger for one group than for another, the algorithm developed from this data will incorporate that bias. In a twist on the empiricist’s maxim, “Garbage in, Garbage out”, one might describe this problem as “Bias in, Bias out” (Mayson 2019: 2224).

While it is likely impossible to completely avoid biased data, its role can be minimized by choosing features that are less subject to bias. For this reason, Mayson and others object to the use of arrest records in recidivism risk assessment tools (Mayson 2017: 556). Prior criminal activity is predictive of reoffending. However, prior criminal activity itself is difficult to measure because some criminal activity is not detected. So, researchers often collect data on arrests instead. Arrests, however, are a product of both criminal activity and policing practices. If some groups are policed more heavily than others, the gap between arrests (which are measured) and criminal activity (which is not) is likely to be biased against these groups. For that reason, one might rely on arrests for violent crime (rather than arrests for any crime), because data about arrests for violent crime are thought to be less subject to bias.

4.2 Fair representation in training data

Data that is not representative of the relevant population can raise distinct fairness issues. For example, an algorithmic tool that is developed based on nonrepresentative data may work less well for some groups than for others. Some contend that this is a form of unfairness, as a product or service that is developed from this data and paid for by consumers works better for members of some groups than for others (Selbst & Barocas 2023: 1024). In addition, when consequential decisions are based on unreliable tools whose unreliability results from data inadequacies of this sort, serious harm may result. For example, Buolamwini and Gebru demonstrated that facial recognition works less well in identifying women than men and in identifying darker skinned faces than lighter skinned faces. In addition, it works especially poorly for dark-skinned women (Buolamwini & Gebru 2018: 10). Buolamwini and Gebru trace this disparity in reliability to the fact that the data on which the three dominant facial recognition tools were trained contained too few women and too few darker-skinned people. While this particular disparity has decreased over time, the same phenomenon occurs in other contexts.

The concept of fair representation can be understood in two distinct ways. First, fairness may be understood to require that the data be representative of the relevant population. Second, fairness may require that the data contain enough examples of all groups to be equally reliable for each group. These two demands are not equivalent. If a socially salient group is a minority group, a training data set could contain a representative sample but nonetheless contain too few examples of the minority group to produce reliable (or equally reliable) results. This is especially likely to occur in the context of intersectional classes like dark-skinned women. If algorithmic fairness is understood to require equal accuracy of the algorithmic tool, then having data that matches the population will be insufficient. The data must also include sufficient minority group members to produce equally accurate results for these groups.
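
The gap between the two demands can be seen with simple arithmetic. The population shares below are invented for illustration, and the calculation assumes the two traits are statistically independent:

```python
# Illustrative arithmetic: a perfectly representative sample can still contain
# very few members of an intersectional subgroup. All shares are hypothetical.
sample_size = 1000
share_darker_skinned = 0.12   # hypothetical population share
share_women = 0.50

expected_darker_skinned_women = sample_size * share_darker_skinned * share_women
print(expected_darker_skinned_women)  # 60.0 examples out of 1000, which may be
# too few to train or evaluate a model reliably for that subgroup.
```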

In addition, one might wonder whether it matters why the data is less reliable for one group than another. For example, suppose the training data for the facial recognition tool did contain equal numbers of women and men and yet the tool was nonetheless less accurate in classifying women. In the actual case dealing with facial recognition, some researchers have traced the disparity in the effectiveness of facial recognition tools to sources other than a lack of representativeness. In particular, researchers found that it was attributable to the use of celebrities in the training data. Because female celebrities tend to wear more makeup than female non-celebrities, the algorithm learned that makeup use was predictive of female sex, which was less accurate in the context of non-celebrities than celebrities. Because the disparity in makeup use between celebrities and non-celebrities was less pronounced for men, the tool worked better for men. One might wonder whether this explanation makes the disparity in how well the tool worked less unfair than if it were traced to disparities in representation.

When data are flawed for any of these reasons, one can ask the further question whether this departure from good data practices should be seen simply as an error or instead as a form of unfairness. Perhaps the answer to that question depends on why it occurred. For example, should we see the error in the facial recognition tool as a simple oversight, or instead as tied in important ways to differing beauty standards for women and men that result from injustice? Or perhaps the answer depends on whether the consequences for those affected are serious or trivial. In other words, inaccuracies are surely problematic from an epistemic perspective. But whether they are also a problem of unfairness is an unsettled question.

4.3 Accurate data about an unjust world

Even if the data on which an algorithm is trained do not suffer from skewed measurement error, one may still worry about the fairness of an algorithm that relies on these data if injustice led to differences in the traits that the algorithm uses to make predictions. For example, the average Black person has less wealth and a lower income than the average white person (see Irving 2023 in Other Internet Resources). As a result, an algorithm used by a bank to assist decision making about who should be offered loans may score the average Black person less well than the average white person, even if it has no access to the race of the prospective borrowers. If the fact that the average Black person has less wealth and income than the average white person is itself the result of prior discrimination and injustice, then when the bank acts on the basis of these data, it may wrong individuals by compounding that prior injustice (Wachter et al. 2021: 759; Hellman 2024a: 63–64). Friedman and Nissenbaum identify three types of bias in computational systems: pre-existing bias, which “has its roots in social institutions, practices and attitudes”; technical bias, which arises from technical constraints; and “emergent bias”, which arises from the use of the system in the real world (Friedman & Nissenbaum 1996: 332). Friedman and Nissenbaum’s conception of pre-existing bias has some similarities to the concepts of structural injustice (Lin & Chen 2022: 2–4) and “compounding injustice” (Hellman 2024a: 67–68).

The fact that an action will compound prior injustice provides a moral reason not to do that action. For example, Hellman argues that an actor has a moral reason, when interacting with victims of injustice, not to carry that injustice forward or into other domains. While this reason can be outweighed by other reasons, it counts in the balance of reasons that determine how one should act. Others think that we have reasons to care about the fact that social and economic inequality tracks socially salient traits like race because this “patterned inequality” causes harm. For example, Eidelson argues that banks and other actors should care about whether the use of algorithmic tools perpetuates this social stratification for consequentialist reasons (Eidelson 2021: 253–56).

The concern that algorithms based on data about the past may compound injustice or perpetuate patterned inequality is not novel or unique to the algorithmic context. Any decision based on data about the past risks compounding injustice or causing the harm of patterned inequality because injustice has affected people in meaningful ways. That said, because machine learning algorithms are able to make use of so much data (so-called “Big Data”), the moral problems of compounding injustice and patterned inequality may occur at a larger and more comprehensive scale. In addition, the computational power now available through machine learning algorithms allows these tools to detect patterns that prior methods would have missed. If scale matters to fairness, then machine learning algorithms may present more acute moral problems than did older methods of predicting the future based on the past.

Even if these moral concerns are not new, the use of algorithmic tools illustrates the consequences of prior injustice in a dramatic fashion and so might shine a light on a moral concern that was previously less salient. When an algorithm used to assess who will repay a loan does not have access to data regarding the race of prospective borrowers but nonetheless recommends lending to members of racial minority groups at a significantly lower rate than to members of majority groups, the effect of prior injustice is made especially clear. In this sense, algorithmic tools may act as a mirror (Vallor 2024), allowing society to see itself more clearly, and also as a catalyst to reexamine existing legal doctrine in reaction to the fact that current law tolerates significant disparate negative impact on disadvantaged groups.

Algorithms based on accurate data can thus present problems of algorithmic fairness to the extent that they compound prior injustice or reinforce patterned inequality. At the same time, they may help to ameliorate unfairness by making ever-present unfairness salient.

5. Proxies

5.1 The problem of proxies

The use of some attributes—race and sex paradigmatically—as proxies for legitimate target variables like loan repayment or recidivism is both legally prohibited in most contexts and generally viewed as morally problematic. However, because race, sex, and other protected attributes often are predictive of outcomes of interest (loan repayment, recidivism, health, etc.), policies that prohibit the use of a defined list of protected attributes may have limited value in the context of machine learning. The algorithm will unearth other traits that it can use instead to arrive at much the same results as it would have had it been able to use these prohibited attributes directly (Dwork et al. 2012: 215; Johnson 2021: 9942; Prince & Schwarcz 2020: 1283; Sapiezynski et al. 2022: 5–7). This issue is discussed in the literature as the proxy problem (Johnson 2021: 9942).

Consider the following example. An algorithm used by a lender is unaware of the race of the borrowers on whose data it was trained. In other words, the training data contains no racial labels, and often the names of the individuals are removed as well. Yet the algorithm learns that the ZIP code of the borrower’s residence is a strong predictor of loan repayment. Due to housing segregation, a borrower’s ZIP code is also strongly correlated with race. If the algorithm is permitted to use ZIP code as a factor in predicting loan repayment, it will have the effect of excluding a disproportionate number of Black borrowers, perhaps closely resembling what would occur had the algorithm used race to predict loan repayment in the first instance.
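
The mechanism can be illustrated with a minimal simulation. All numbers below are invented, and the “lender” applies a simple ZIP-code rule rather than a trained model, but the effect is the same in kind: race is never an input, yet outcomes differ sharply by race.

```python
# Minimal simulation of the proxy problem: the decision rule never sees race,
# but because ZIP code and race are correlated (here, by construction), a
# ZIP-code-based rule reproduces much of the racial disparity. Invented numbers.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical segregated geography: group membership strongly predicts ZIP code.
race = rng.choice(["black", "white"], size=n, p=[0.3, 0.7])
zip_is_redlined = np.where(race == "black",
                           rng.random(n) < 0.7,   # 70% of Black borrowers live in these ZIPs
                           rng.random(n) < 0.1)   # 10% of white borrowers do

# A lending rule that uses only ZIP code; race is never consulted.
approved = ~zip_is_redlined

for g in ["black", "white"]:
    print(f"approval rate, {g}: {approved[race == g].mean():.2f}")
# Approval rates differ sharply by race even though race was never used.
```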

This fact may undermine the effectiveness of legal and moral prohibitions on the use of race, sex, and other protected attributes when such traits are correlated with legitimate outcomes of interest. If so, should proxies for protected attributes also be prohibited? And if so, how should such “proxies” be defined?

The proxy problem arises because the law forbids (or society or morality frowns on) the use of som
