A criterion for privacy protection in data collection and its attainment via randomized response procedures

Jichong Chai; Tapan K. Nayak

doi:10.1214/18-EJS1508

Electron. J. Statist. 12(2): 4264-4287 (2018). DOI: 10.1214/18-EJS1508

Abstract

Randomized response (RR) methods have long been suggested for protecting respondents’ privacy in statistical surveys. However, how to set and achieve privacy protection goals has received little attention. We give a full development and analysis of the view that a privacy mechanism should ensure that no intruder would gain much new information about any respondent from his response. Formally, we say that a privacy breach occurs when an intruder’s prior and posterior probabilities about a property of a respondent, denoted $p$ and $p_{*}$, respectively, satisfy $p_{*}<h_{l}(p)$ or $p_{*}>h_{u}(p)$, where $h_{l}$ and $h_{u}$ are two given functions. An RR procedure protects privacy if it does not permit any privacy breach. We explore the effects of $(h_{l},h_{u})$ on the resultant privacy demand, and prove that it is precisely attainable only for certain $(h_{l},h_{u})$. This result is used to define a canonical strict privacy protection criterion and to give practical guidance on the choice of $(h_{l},h_{u})$. Then, we characterize all privacy-satisfying RR procedures, compare their effects on data utility using sufficiency of experiments, and identify the class of all admissible procedures. Finally, we establish an optimality property of a commonly used RR method.
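
The breach condition above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the two-category setting, the prior, the Warner-type response matrix that reports the true value with probability 0.7, and in particular the bounds $h_{l}(p)=p/2$ and $h_{u}(p)=\min(1,2p)$, which are hypothetical and are not the canonical criterion the paper derives. The posterior $p_{*}$ is obtained from Bayes' rule.

```python
from typing import Callable, Sequence, Set


def posterior(prior: Sequence[float],
              channel: Sequence[Sequence[float]],
              prop: Set[int],
              response: int) -> float:
    """Posterior probability that the true value lies in `prop`, given `response`.

    channel[j][i] is P(response = j | true value = i).
    """
    joint = [channel[response][i] * prior[i] for i in range(len(prior))]
    return sum(joint[i] for i in prop) / sum(joint)


def is_breach(p: float, p_star: float,
              h_l: Callable[[float], float],
              h_u: Callable[[float], float]) -> bool:
    """A breach occurs when p_star < h_l(p) or p_star > h_u(p)."""
    return p_star < h_l(p) or p_star > h_u(p)


# Hypothetical bounds, chosen only for illustration.
def h_l(p: float) -> float:
    return p / 2


def h_u(p: float) -> float:
    return min(1.0, 2 * p)


# Assumed intruder prior over two categories and an assumed RR design.
prior = [0.2, 0.8]
channel = [[0.7, 0.3],
           [0.3, 0.7]]
prop = {0}  # the property "the true value is category 0"

p = sum(prior[i] for i in prop)
posteriors = [posterior(prior, channel, prop, j) for j in range(2)]
breaches = [is_breach(p, p_star, h_l, h_u) for p_star in posteriors]

for j in range(2):
    print(f"response {j}: p = {p:.3f}, p* = {posteriors[j]:.4f}, breach = {breaches[j]}")
```

Under these assumed numbers, one of the two responses moves the posterior outside the interval $[h_{l}(p),h_{u}(p)]$, so this particular design would not protect privacy for this prior and this choice of bounds. A procedure protects privacy in the abstract's sense only if no such breach is possible.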

Citation

Jichong Chai, Tapan K. Nayak. "A criterion for privacy protection in data collection and its attainment via randomized response procedures." Electron. J. Statist. 12(2): 4264-4287, 2018. https://doi.org/10.1214/18-EJS1508