Open Access
September 2024 Mixture conditional regression with ultrahigh dimensional text data for estimating extralegal factor effects
Jiaxin Shi, Fang Wang, Yuan Gao, Xiaojun Song, Hansheng Wang
Ann. Appl. Stat. 18(3): 2532-2550 (September 2024). DOI: 10.1214/24-AOAS1893

Abstract

Testing judicial impartiality is a problem of fundamental importance in empirical legal studies, and standard regression methods have been widely used to estimate extralegal factor effects. However, those methods cannot handle control variables of ultrahigh dimensionality, such as those found in judgment documents recorded in text format. To solve this problem, we develop a novel mixture conditional regression (MCR) approach, assuming that the whole sample can be classified into a number of latent classes. Within each latent class, a standard linear regression model is used to describe the relationship between the response and a key feature vector, which is assumed to be of fixed dimension. Meanwhile, the ultrahigh dimensional control variables are used to determine latent class membership, and a naïve Bayes-type model is used to describe this relationship. Hence, the dimension of the control variables is allowed to be arbitrarily high. A novel expectation-maximization algorithm is developed for model estimation. As a result, we are able to estimate the key parameters of interest as efficiently as if the true class membership were known in advance. Simulation studies are presented to demonstrate the performance of the proposed MCR method. A real dataset of Chinese burglary offenses is analyzed for illustration purposes.
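The abstract describes the model and its EM estimation only in words. The Python sketch below shows one way an EM algorithm for a model of this shape could be written. It rests on assumptions that are ours, not the paper's. The control variables are taken to be binary word-presence indicators with a Bernoulli naive Bayes likelihood. The within-class errors are taken to be Gaussian with a class-specific variance. The function name `mcr_em`, the smoothing parameter `alpha`, and the random initialisation are hypothetical. It is an illustrative sketch, not the authors' estimator, and it makes no claim to reproduce their efficiency result.

```python
import numpy as np


def _log_normal(y, mean, var):
    """Elementwise log density of N(mean, var) evaluated at y."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (y - mean) ** 2 / var)


def mcr_em(y, W, X, K, n_iter=200, alpha=1.0, seed=0, tol=1e-8):
    """Hypothetical EM sketch for a mixture conditional regression (n_iter >= 1).

    Assumed model (illustrative, not taken from the paper):
      P(Z = k) = pi_k
      X_j | Z = k ~ Bernoulli(theta_kj), independently over j (naive Bayes)
      y | W, Z = k ~ N(W @ beta_k, sigma2_k)

    y : (n,) response
    W : (n, d) fixed-dimension key features (add a column of ones for an intercept)
    X : (n, p) binary control variables; p may be very large
    alpha : Laplace smoothing for theta (an assumption of this sketch)
    """
    rng = np.random.default_rng(seed)
    n, d = W.shape
    # Random soft initialisation of responsibilities gamma[i, k] = P(Z_i = k | data).
    gamma = rng.dirichlet(np.ones(K), size=n)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # ---- M-step: update parameters given the current responsibilities ----
        nk = np.maximum(gamma.sum(axis=0), 1e-10)  # effective class sizes
        pi = nk / n
        # Smoothed naive Bayes probabilities theta[k, j], kept strictly inside (0, 1).
        theta = (gamma.T @ X + alpha) / (nk[:, None] + 2.0 * alpha)
        beta = np.empty((K, d))
        sigma2 = np.empty(K)
        for k in range(K):
            sw = np.sqrt(gamma[:, k])
            # Weighted least squares fit of y on W for class k.
            beta[k] = np.linalg.lstsq(W * sw[:, None], y * sw, rcond=None)[0]
            resid = y - W @ beta[k]
            sigma2[k] = max(gamma[:, k] @ resid**2 / nk[k], 1e-8)
        # ---- E-step, in log space: a product of p Bernoulli terms underflows ----
        log_px = X @ np.log(theta).T + (1.0 - X) @ np.log1p(-theta).T  # (n, K)
        log_py = _log_normal(y[:, None], W @ beta.T, sigma2[None, :])  # (n, K)
        log_joint = np.log(pi)[None, :] + log_px + log_py
        # Log-sum-exp over classes gives each observation's log marginal likelihood.
        m = log_joint.max(axis=1, keepdims=True)
        log_norm = m + np.log(np.exp(log_joint - m).sum(axis=1, keepdims=True))
        gamma = np.exp(log_joint - log_norm)
        ll = float(log_norm.sum())
        if ll - prev_ll < tol:  # EM is monotone, so a tiny gain signals convergence
            break
        prev_ll = ll
    return {"pi": pi, "theta": theta, "beta": beta, "sigma2": sigma2,
            "gamma": gamma, "loglik": ll}
```

A call such as `fit = mcr_em(y, W, X, K=2)` returns class-specific regression coefficients in `fit["beta"]` and posterior class probabilities in `fit["gamma"]`. Class labels are identified only up to permutation, so estimates should be matched to classes (for example, by sorting on a coefficient) before they are interpreted. In this sketch the p-dimensional control variables enter only through the E-step likelihood and the closed-form update for `theta`. The regression itself stays d-dimensional, which mirrors the abstract's point that the dimension of the control variables can be arbitrarily high.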

Funding Statement

Fang Wang’s research is supported by National Natural Science Foundation of China (T2293773, 72371145) and Taishan Scholars Project (tsqn202211004).
Yuan Gao’s research is partially supported by the Postdoctoral Fellowship Program of CPSF (GZC20230111).
Xiaojun Song’s research is partially supported by National Natural Science Foundation of China (72373007, 72333001).
Hansheng Wang’s research is partially supported by National Natural Science Foundation of China (12271012).

Acknowledgments

Fang Wang is the corresponding author. The authors would like to thank the Editor, the Associate Editor, and the referees for their constructive comments and advice, which improved the quality of this paper.

Citation


Jiaxin Shi. Fang Wang. Yuan Gao. Xiaojun Song. Hansheng Wang. "Mixture conditional regression with ultrahigh dimensional text data for estimating extralegal factor effects." Ann. Appl. Stat. 18 (3) 2532 - 2550, September 2024. https://doi.org/10.1214/24-AOAS1893

Information

Received: 1 September 2023; Revised: 1 March 2024; Published: September 2024
First available in Project Euclid: 5 August 2024

Digital Object Identifier: 10.1214/24-AOAS1893

Keywords: expectation-maximization algorithm, judicial impartiality, mixture conditional regression, naïve Bayes model, ultrahigh dimensional data

Rights: Copyright © 2024 Institute of Mathematical Statistics

Vol.18 • No. 3 • September 2024