Electronic Journal of Statistics

A criterion for privacy protection in data collection and its attainment via randomized response procedures

Jichong Chai and Tapan K. Nayak

Full-text: Open access

Abstract

Randomized response (RR) methods have long been suggested for protecting respondents’ privacy in statistical surveys. However, the question of how to set and achieve privacy protection goals has received little attention. We give a full development and analysis of the view that a privacy mechanism should ensure that no intruder would gain much new information about any respondent from his response. Formally, we say that a privacy breach occurs when an intruder’s prior and posterior probabilities about a property of a respondent, denoted $p$ and $p_{*}$, respectively, satisfy $p_{*}<h_{l}(p)$ or $p_{*}>h_{u}(p)$, where $h_{l}$ and $h_{u}$ are two given functions. An RR procedure protects privacy if it does not permit any privacy breach. We explore the effects of $(h_{l},h_{u})$ on the resulting privacy demand and prove that this demand is precisely attainable only for certain $(h_{l},h_{u})$. We use this result to define a canonical strict privacy protection criterion and to give practical guidance on the choice of $(h_{l},h_{u})$. Then, we characterize all privacy-satisfying RR procedures, compare their effects on data utility using the sufficiency of experiments, and identify the class of all admissible procedures. Finally, we establish an optimality property of a commonly used RR method.
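The breach definition above can be checked mechanically for a concrete RR design. The following is a minimal sketch of our own, not code from the paper. It assumes that the design is given as a transition probability matrix `P` with `P[i][r]` = Pr(response r | true category i), that a "property" is a set `A` of categories, and, for the example only, a hypothetical odds-based pair $(h_{l},h_{u})$ that lets the intruder's odds move by at most a factor `gamma`. The function names `breaches_for_prior`, `max_bayes_factor` and `odds_bounds` are hypothetical, and we do not claim this pair is the paper's canonical criterion.

```python
from typing import Callable, List, Sequence, Set, Tuple

Matrix = Sequence[Sequence[float]]  # P[i][r] = Pr(response r | true category i)


def breaches_for_prior(
    P: Matrix,
    A: Set[int],
    prior: Sequence[float],
    h_l: Callable[[float], float],
    h_u: Callable[[float], float],
) -> List[Tuple[int, float, float]]:
    """Return (r, p, p_star) for each response r that breaches privacy,
    i.e. p_star < h_l(p) or p_star > h_u(p), under this single prior."""
    p = sum(prior[i] for i in A)  # intruder's prior probability of property A
    found = []
    for r in range(len(P[0])):
        marginal = sum(prior[i] * P[i][r] for i in range(len(P)))
        if marginal == 0.0:
            continue  # response r cannot occur under this prior
        p_star = sum(prior[i] * P[i][r] for i in A) / marginal  # Bayes' rule
        if p_star < h_l(p) or p_star > h_u(p):
            found.append((r, p, p_star))
    return found


def max_bayes_factor(P: Matrix) -> float:
    """Largest ratio P[i][r] / P[j][r] over responses r and categories i, j.
    For any property A and any prior, the Bayes factor (posterior odds over
    prior odds of A given r) lies in [1 / worst, worst]."""
    worst = 1.0
    for r in range(len(P[0])):
        col = [row[r] for row in P]
        top, bottom = max(col), min(col)
        if top == 0.0:
            continue  # response r never occurs
        if bottom == 0.0:
            return float("inf")  # response r rules some category out entirely
        worst = max(worst, top / bottom)
    return worst


def odds_bounds(gamma: float):
    """Hypothetical (h_l, h_u): posterior odds may differ from prior odds
    by at most a factor gamma in either direction."""
    def h_l(p: float) -> float:
        return p / (p + gamma * (1.0 - p))

    def h_u(p: float) -> float:
        return gamma * p / (gamma * p + 1.0 - p)

    return h_l, h_u


if __name__ == "__main__":
    theta = 0.75  # hypothetical Warner-style design parameter
    warner = [[theta, 1 - theta], [1 - theta, theta]]  # rows: true category 0/1
    prior = [0.9, 0.1]  # intruder's prior; property A = {category 1}
    print(max_bayes_factor(warner))  # 3.0
    print([r for r, _, _ in breaches_for_prior(warner, {1}, prior, *odds_bounds(2.0))])  # [0, 1]
    print(breaches_for_prior(warner, {1}, prior, *odds_bounds(4.0)))  # []
```

`breaches_for_prior` checks only one prior. `max_bayes_factor` covers all priors and properties at once: under the odds-based pair assumed here, a design permits no breach for any prior exactly when `max_bayes_factor(P) <= gamma`, because a prior concentrated on two categories attains each column ratio. This is the familiar local differential privacy condition with $\gamma = e^{\varepsilon}$. For the example design the worst Bayes factor is 3, so a bound of 2 is violated while a bound of 4 is met.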

Article information

Source
Electron. J. Statist., Volume 12, Number 2 (2018), 4264-4287.

Dates
Received: February 2018
First available in Project Euclid: 15 December 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1544842902

Digital Object Identifier
doi:10.1214/18-EJS1508

Mathematical Reviews number (MathSciNet)
MR3892142

Zentralblatt MATH identifier
07003243

Subjects
Primary: 62D05: Sampling theory, sample surveys
Secondary: 62B15: Theory of statistical experiments

Keywords
admissibility; Bayes factor; data utility; privacy breach; sufficiency of experiments; transition probability matrix

Rights
Creative Commons Attribution 4.0 International License.

Citation

Chai, Jichong; Nayak, Tapan K. A criterion for privacy protection in data collection and its attainment via randomized response procedures. Electron. J. Statist. 12 (2018), no. 2, 4264--4287. doi:10.1214/18-EJS1508. https://projecteuclid.org/euclid.ejs/1544842902


