The Annals of Applied Statistics

The problem of infra-marginality in outcome tests for discrimination

Camelia Simoiu, Sam Corbett-Davies, and Sharad Goel

Full-text: Open access


Outcome tests are a popular method for detecting bias in lending, hiring, and policing decisions. These tests operate by comparing the success rate of decisions across groups. For example, if loans made to minority applicants are observed to be repaid more often than loans made to whites, it suggests that only exceptionally qualified minorities are granted loans, indicating discrimination. Outcome tests, however, are known to suffer from the problem of infra-marginality: even absent discrimination, the repayment rates for minority and white loan recipients might differ if the two groups have different risk distributions. Thus, at least in theory, outcome tests can fail to accurately detect discrimination. We develop a new statistical test of discrimination—the threshold test—that mitigates the problem of infra-marginality by jointly estimating decision thresholds and risk distributions. Applying our test to a dataset of 4.5 million police stops in North Carolina, we find that the problem of infra-marginality is more than a theoretical possibility, and can cause the outcome test to yield misleading results in practice.
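The infra-marginality problem described above can be made concrete with a small worked example. The sketch below is not the threshold test itself. It uses purely hypothetical numbers (two groups, two risk levels per group, and a shared 10% decision threshold) and a hypothetical helper, `search_and_hit_rate`, to show that a single threshold applied equally to both groups can still produce different success rates.

```python
from fractions import Fraction


def search_and_hit_rate(risk_distribution, threshold):
    """Return (share of group acted on, success rate among those acted on).

    risk_distribution maps a risk level p (the probability of a successful
    outcome) to the share of the group with that risk. The decision maker
    acts on everyone whose risk is at least `threshold`.
    """
    acted_on = {p: share for p, share in risk_distribution.items() if p >= threshold}
    acted_share = sum(acted_on.values())
    if acted_share == 0:
        return acted_share, None  # nobody clears the threshold
    expected_successes = sum(p * share for p, share in acted_on.items())
    return acted_share, expected_successes / acted_share


# Hypothetical setup: the SAME threshold is applied to both groups.
THRESHOLD = Fraction(1, 10)

# Hypothetical risk distributions: half of each group is low risk (1%);
# the other half is high risk, but "high" differs between the groups.
group_a = {Fraction(1, 100): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}
group_b = {Fraction(1, 100): Fraction(1, 2), Fraction(1, 2): Fraction(1, 2)}

rate_a = search_and_hit_rate(group_a, THRESHOLD)
rate_b = search_and_hit_rate(group_b, THRESHOLD)

print("group A: acted-on share", rate_a[0], "success rate", rate_a[1])
print("group B: acted-on share", rate_b[0], "success rate", rate_b[1])
```

Under these assumed numbers both groups are acted on at the same rate (1/2) and face the same threshold, yet the success rates are 3/4 and 1/2. An outcome test would read the lower rate for group B as evidence of a lower standard for that group, even though none exists by construction.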

Article information

Ann. Appl. Stat., Volume 11, Number 3 (2017), 1193-1216.

Received: January 2017
Revised: April 2017
First available in Project Euclid: 5 October 2017

Keywords: tests for discrimination; outcome test; benchmark test; infra-marginality; traffic stops; policing


Simoiu, Camelia; Corbett-Davies, Sam; Goel, Sharad. The problem of infra-marginality in outcome tests for discrimination. Ann. Appl. Stat. 11 (2017), no. 3, 1193–1216. doi:10.1214/17-AOAS1058.

