Abstract
In multiple domains such as malware detection, automated driving systems, or fraud detection, classification algorithms are susceptible to being attacked by malicious agents willing to perturb the value of instance covariates to pursue certain goals. Such problems pertain to the field of adversarial machine learning and have been mainly dealt with, perhaps implicitly, through game-theoretic ideas with strong underlying common knowledge assumptions. These are not realistic in numerous application domains in relation to security and business competition. We present an alternative Bayesian decision theoretic framework that accounts for the uncertainty about the attacker’s behavior using adversarial risk analysis concepts. In doing so, we also present core ideas in adversarial machine learning to a statistical audience. A key ingredient in our framework is the ability to sample from the distribution of originating instances given the, possibly attacked, observed ones. We propose an initial procedure based on approximate Bayesian computation usable during operations; within it, we simulate the attacker’s problem taking into account our uncertainty about his elements. Large-scale problems require an alternative scalable approach implementable during the training stage. Globally, we are able to robustify statistical classification algorithms against malicious attacks.
Funding Statement
This work was supported by the Severo Ochoa Excellence Programme CEX2023-001347-S, the European Union’s Horizon 2020 Research and Innovation Program under Grant Agreements 815003 (Trustonomy), 101021797 (STARLIGHT), the EOARD-AFOSR project RC2APD, the FBBVA project AMALFI as well as NSF Grant DMS-1638521 at SAMSI.
DRI is supported by the AXA-ICMAT Chair and the Spanish Ministry of Science program PID2021-124662OB-I00.
AR was supported by project RTC-2017-6593-7.
VG acknowledges support from grants FPU16-05034 and PTQ2021-011758.
Part of the work was performed during the visit of VG, RN, DRI and FR to SAMSI (Statistical and Applied Mathematical Sciences Institute), Durham, NC, USA, within the “Games and Decisions in Risk and Reliability” program.
Acknowledgments
The first and the second authors contributed equally.
Roi Naveiro is the corresponding author.
Citation
Víctor Gallego, Roi Naveiro, Alberto Redondo, David Ríos Insua, Fabrizio Ruggeri. "Protecting Classifiers from Attacks." Statist. Sci. 39 (3) 449–468, August 2024. https://doi.org/10.1214/24-STS922