Practical Inference-Time Attacks Against Machine-Learning Systems and a Defense Against Them

Abstract

Prior work has shown that machine-learning algorithms are vulnerable to evasion by so-called adversarial examples. However, work on evasion attacks has mainly explored $L_p$-bounded perturbations that lead to misclassification. From a computer-security perspective, such attacks have limited practical implications. To fill this gap, we propose evasion attacks that satisfy multiple objectives, and show that these attacks pose a practical threat to computer systems. In particular, we demonstrate how to produce adversarial examples against state-of-the-art face-recognition and malware-detection systems that simultaneously satisfy multiple objectives (e.g., smoothness and robustness to changes in imaging conditions) to mislead the systems in practical settings. Against face recognition, we develop a systematic method to automatically generate attacks, which are realized by printing a pair of eyeglass frames. When worn by attackers, the eyeglasses allow them to mislead face-recognition algorithms so as to evade recognition or impersonate other individuals. Against malware detection, we develop an attack that guides binary-diversification tools via optimization to transform binaries in a functionality-preserving manner and mislead detection.
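To illustrate how such a multi-objective attack can be formulated, the following PyTorch-style sketch optimizes a perturbation confined to an eyeglass-frame region of a face image, combining a misclassification loss with a total-variation smoothness penalty and averaging over a batch of images captured under varied imaging conditions. This is a minimal sketch under assumed interfaces, not the exact method of this work; `model`, `face_batch`, `frame_mask`, and `target_class` are hypothetical names supplied by the caller.

    # Hedged sketch of a multi-objective adversarial-eyeglasses attack (assumed setup,
    # not the thesis's exact procedure).
    import torch
    import torch.nn.functional as F

    def total_variation(x):
        # Smoothness penalty: neighboring pixels of the perturbation should have
        # similar values so the printed frames look smooth.
        return ((x[:, :, 1:, :] - x[:, :, :-1, :]).abs().mean() +
                (x[:, :, :, 1:] - x[:, :, :, :-1]).abs().mean())

    def eyeglass_attack(model, face_batch, frame_mask, target_class,
                        steps=300, lr=0.01, tv_weight=0.1):
        # One shared perturbation, reused across all images in the batch so that the
        # attack is robust to the varied imaging conditions the batch represents.
        delta = torch.zeros_like(face_batch[:1], requires_grad=True)
        opt = torch.optim.Adam([delta], lr=lr)
        for _ in range(steps):
            # Keep pixel values in [0, 1] and restrict them to the frame region.
            frames = (delta.tanh() * 0.5 + 0.5) * frame_mask
            adv = face_batch * (1 - frame_mask) + frames
            logits = model(adv)
            target = torch.full((adv.shape[0],), target_class, dtype=torch.long)
            # Impersonation loss plus smoothness objective.
            loss = F.cross_entropy(logits, target) + tv_weight * total_variation(frames)
            opt.zero_grad()
            loss.backward()
            opt.step()
        return ((delta.tanh() * 0.5 + 0.5) * frame_mask).detach()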

The attacks that we initially demonstrate achieve the desired objectives via ad hoc optimizations. We extend these attacks with a general framework that trains a generator neural network to emit adversarial examples satisfying the desired objectives. We demonstrate the ability of the proposed framework to accommodate a wide range of objectives, including imprecise ones that are difficult to model, in two application domains. Specifically, we demonstrate how to produce adversarial eyeglass frames that mislead face recognition with better robustness, inconspicuousness, and scalability than previous approaches, as well as a new attack to fool a handwritten-digit classifier.
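A minimal sketch of how such a generator-based framework might be instantiated (all names are hypothetical, and the code does not reproduce the exact training procedure): a generator `G` is trained jointly against a discriminator `D`, which rewards realistic, inconspicuous outputs, and against a fixed target classifier `f`, which the generated artifacts should mislead.

    # Hedged sketch of one training step for a generator of adversarial artifacts
    # (e.g., eyeglass-frame textures); interfaces of G, D, and f are assumed.
    import torch
    import torch.nn.functional as F

    def train_step(G, D, f, z, real_examples, faces, frame_mask, target_class,
                   opt_G, opt_D, fool_weight=1.0):
        # Update D: distinguish real artifacts (e.g., real eyeglass designs) from G(z).
        fake = G(z)
        d_loss = (F.binary_cross_entropy_with_logits(
                      D(real_examples), torch.ones(real_examples.shape[0], 1)) +
                  F.binary_cross_entropy_with_logits(
                      D(fake.detach()), torch.zeros(fake.shape[0], 1)))
        opt_D.zero_grad()
        d_loss.backward()
        opt_D.step()

        # Update G: look realistic to D *and* mislead the target classifier f.
        fake = G(z)
        adv = faces * (1 - frame_mask) + fake * frame_mask
        target = torch.full((adv.shape[0],), target_class, dtype=torch.long)
        g_loss = (F.binary_cross_entropy_with_logits(
                      D(fake), torch.ones(fake.shape[0], 1)) +
                  fool_weight * F.cross_entropy(f(adv), target))
        opt_G.zero_grad()
        g_loss.backward()
        opt_G.step()
        return d_loss.item(), g_loss.item()

The design choice the sketch highlights is that imprecise objectives (e.g., inconspicuousness) are delegated to a learned discriminator rather than hand-crafted penalties.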

Finally, to protect computer systems from adversarial examples, we propose $n$-ML—a novel defense inspired by $n$-version programming. $n$-ML trains an ensemble of $n$ classifiers and classifies inputs by a vote. Unlike prior approaches, however, the classifiers are trained to classify adversarial examples differently from one another, making it very difficult for an adversarial example to obtain enough votes to be misclassified. In several application domains (including face and street-sign recognition), we show that $n$-ML roughly retains the benign classification accuracies of state-of-the-art models, while simultaneously defending against adversarial examples (produced by our framework or by $L_p$-based attacks) with better resilience than the best defenses known to date and, in most cases, with lower inference-time overhead.
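A minimal sketch of $n$-ML-style inference under assumed interfaces: each of the $n$ classifiers votes on a label, and the label is accepted only if at least `threshold` classifiers agree; otherwise the input is rejected as likely adversarial. The training procedure that makes the classifiers disagree on adversarial inputs is not shown.

    # Hedged sketch of voting-based inference with rejection; `classifiers` is a list
    # of n trained models with a shared label space (assumed interface).
    import torch

    def n_ml_predict(classifiers, x, threshold):
        with torch.no_grad():
            # votes[i, j] is classifier i's predicted label for input j.
            votes = torch.stack([clf(x).argmax(dim=1) for clf in classifiers])
        labels, counts = [], []
        for j in range(votes.shape[1]):
            vals, cnts = votes[:, j].unique(return_counts=True)
            top = cnts.argmax()
            labels.append(vals[top])
            counts.append(cnts[top])
        labels = torch.stack(labels)
        counts = torch.stack(counts)
        # Label -1 marks inputs rejected as (likely) adversarial: no label received
        # enough votes, because the classifiers were trained to disagree on such inputs.
        return torch.where(counts >= threshold, labels, torch.full_like(labels, -1))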

Publication
Carnegie Mellon University