Much of the anti-adversarial research has focused on minute, largely undetectable alterations to images (researchers generally refer to these as “noise perturbations”) that cause machine learning (ML) algorithms to misidentify or misclassify the images. Adversarial tampering can be extremely subtle, reaching all the way down to pixel-level changes that a human observer would never notice. If an attacker can introduce nearly invisible alterations to image, video, speech, or other data in order to fool AI-powered classification tools, it will be difficult to trust this otherwise sophisticated technology to do its job effectively.
Growing threat to deployed AI apps
This is no idle threat. Eliciting false algorithmic inferences can cause an AI-based app to make incorrect decisions, such as when a self-driving vehicle misreads a traffic sign and then turns the wrong way or, in a worst-case scenario, crashes into a building, vehicle, or pedestrian. Although the research literature focuses on simulated adversarial ML attacks conducted in controlled laboratory environments, widespread awareness that these attack vectors exist will almost certainly lead terrorists, criminals, and other mischievous parties to exploit them.
Although high-profile adversarial attacks did not appear to affect the ML that powered this year’s U.S. presidential campaign, we cannot rule out such attacks in future electoral cycles. Throughout this pandemic-wracked year, adversarial attacks on ML platforms have continued to intensify in other sectors of our lives.
This year, the National Vulnerability Database (part of the U.S. National Institute of Standards and Technology) issued its first Common Vulnerabilities and Exposures report for an ML component in a commercial system. Also, the Software Engineering Institute’s CERT Coordination Center issued its first vulnerability note flagging the extent to which many operational ML systems are vulnerable to arbitrary misclassification attacks.
Late last year, Gartner predicted that during the next two years 30 percent of all cyberattacks on AI apps would use adversarial tactics. Sadly, it would be premature to say that anti-adversarial best practices are taking hold within the AI community. A recent industry survey by Microsoft found that few industry practitioners are taking the threat of adversarial machine learning seriously at this point or using tools that can mitigate the risks of such attacks.
Even if it were possible to identify adversarial attacks in progress, targeted organizations would find it challenging to respond to these assaults in all their dizzying diversity. And there’s no telling whether ad-hoc responses to new threats will coalesce into a pre-emptive anti-adversarial AI “hardening” strategy anytime soon.
Anti-adversarial ML security methodologies
As these attacks surface in greater numbers, AI professionals will clamor for a consensus methodology for detecting and dealing with adversarial risks.
An important milestone in adversarial defenses took place recently. Microsoft, MITRE, and 11 other organizations released an Adversarial ML Threat Matrix. This open, extensible framework, structured like MITRE’s widely adopted ATT&CK framework, helps security analysts classify the most common adversarial tactics that have been used to disrupt and deceive ML systems.
Developed in conjunction with Carnegie Mellon and other leading research universities, the framework presents techniques for monitoring an organization’s ML systems to detect whether such attacks are in progress or have already taken place. It lists vulnerabilities and adversary behaviors that are effective against production ML systems. It also provides case studies describing how well-known attacks such as the Microsoft Tay poisoning and the Proofpoint evasion attack can be analyzed using this framework.
As discussed in the framework, there are four principal adversarial tactics for compromising ML apps.
Functional extraction involves unauthorized recovery of a functionally equivalent ML model by iteratively querying the model with arbitrary inputs. From the responses, the attacker can infer and generate a high-fidelity offline copy of the model, which can then guide further attacks against the deployed production ML model.
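To make functional extraction concrete, here is a minimal sketch under toy assumptions: the “production” model is a hidden two-feature linear classifier exposed only through a hypothetical `victim_predict` function that returns labels, and the attacker fits a surrogate logistic regression to labels harvested from random queries. The secret weights, query budget, and training settings are invented for illustration and are not taken from the Threat Matrix.

```python
import math
import random

# Hypothetical deployed model: a linear classifier whose parameters the
# attacker never sees. Only victim_predict() is exposed, like a scoring API.
_SECRET_W = (2.0, -3.0)
_SECRET_B = 0.5

def victim_predict(x):
    """Black-box endpoint: returns only the predicted label (0 or 1)."""
    score = _SECRET_W[0] * x[0] + _SECRET_W[1] * x[1] + _SECRET_B
    return 1 if score > 0 else 0

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def extract_surrogate(query_fn, n_queries=1000, epochs=300, lr=0.5, seed=0):
    """Query the black box with random inputs, then fit a surrogate
    logistic regression to the labels it returns (batch gradient descent)."""
    rng = random.Random(seed)
    xs = [(rng.uniform(-3, 3), rng.uniform(-3, 3)) for _ in range(n_queries)]
    ys = [query_fn(x) for x in xs]  # the attacker's only access to the model
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x0, x1), y in zip(xs, ys):
            err = sigmoid(w0 * x0 + w1 * x1 + b) - y
            g0 += err * x0
            g1 += err * x1
            gb += err
        w0 -= lr * g0 / n_queries
        w1 -= lr * g1 / n_queries
        b -= lr * gb / n_queries
    return (w0, w1), b

def agreement(w, b, query_fn, n=1000, seed=1):
    """Fraction of fresh random inputs on which surrogate and victim agree."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = (rng.uniform(-3, 3), rng.uniform(-3, 3))
        surrogate_label = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        hits += surrogate_label == query_fn(x)
    return hits / n

w, b = extract_surrogate(victim_predict)
print(f"surrogate agrees with the victim on {agreement(w, b, victim_predict):.1%} of inputs")
```

The agreement score is the attacker's measure of fidelity. Once it is high, the offline copy can be probed at will, for example to search for evasion inputs, without sending further traffic to the real endpoint.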
Model evasion occurs when attackers iteratively submit subtly altered inputs, such as images with pixel-level changes. The changes are practically undetectable to human senses but cause vulnerable ML models to misclassify the images or other doctored content.
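As an illustration of evasion, the sketch below applies the fast gradient sign method (FGSM), one well-known way to craft such perturbations, to a toy logistic-regression “image” classifier. The weights, the four-pixel input, and the perturbation budget `eps` are made-up assumptions; a real attack would typically target a deep network.

```python
import math

# Toy "image" classifier: logistic regression over four pixel intensities in [0, 1].
# The weights and bias are hypothetical, chosen only for illustration.
W = [1.5, -2.0, 1.0, 0.5]
B = -0.3

def predict_proba(x):
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    return 1 if predict_proba(x) >= 0.5 else 0

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, true_label, eps):
    """Fast gradient sign method: shift every pixel by eps in the direction
    that increases the cross-entropy loss for the true label."""
    p = predict_proba(x)
    # For logistic regression, d(loss)/d(x_i) = (p - y) * w_i
    grad = [(p - true_label) * w for w in W]
    # Clip to the valid pixel range so the result is still a legal image
    return [min(1.0, max(0.0, xi + eps * sign(g))) for xi, g in zip(x, grad)]

x = [0.6, 0.2, 0.5, 0.4]
x_adv = fgsm(x, true_label=1, eps=0.2)
print("clean prediction:", predict(x), round(predict_proba(x), 3))
print("adversarial prediction:", predict(x_adv), round(predict_proba(x_adv), 3))
print("largest per-pixel change:", round(max(abs(a - c) for a, c in zip(x_adv, x)), 3))
```

With only four pixels the per-pixel budget has to be large. A common explanation for why real images are so vulnerable is that, across many thousands of pixels, tiny per-pixel changes add up through the model's weights, so perturbations invisible to people can still flip the prediction.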
Model inversion involves unauthorized recovery of the predictive features that were used to build an ML model. It enables attackers to launch inferences that compromise the private data that was used in training the model.
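A rough sketch of gradient-based model inversion follows. For simplicity it assumes white-box access to a toy three-feature logistic-regression model; black-box variants estimate the same direction from repeated confidence queries. Starting from a blank input, gradient ascent on the model's confidence for a target class reconstructs an input the model strongly associates with that class, which in a real system (a face recognizer, say) can reveal what the private training data looked like. The weights, step size, and penalty are hypothetical.

```python
import math

# Hypothetical trained model: logistic regression over three input features.
# In a real attack these weights would be hidden behind a scoring API.
W = [0.8, -1.2, 2.0]
B = -0.5

def confidence(x):
    """Model's confidence that x belongs to the target (positive) class."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))

def invert(steps=100, lr=0.1, l2=0.05):
    """Gradient ascent on log-confidence minus a small L2 penalty, which keeps
    the reconstruction in a plausible range instead of growing without bound."""
    x = [0.0, 0.0, 0.0]
    for _ in range(steps):
        p = confidence(x)
        # d/dx [log sigmoid(w.x + b) - (l2 / 2) * ||x||^2] = (1 - p) * w - l2 * x
        x = [xi + lr * ((1 - p) * w - l2 * xi) for xi, w in zip(x, W)]
    return x

x_rec = invert()
print("reconstructed input:", [round(v, 2) for v in x_rec])
print("model confidence:", round(confidence(x_rec), 3))
```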
Model poisoning occurs when training data is contaminated so that the resulting model surreptitiously produces specific unauthorized inferences when attacker-chosen input data is introduced to it at runtime.
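A minimal sketch of backdoor-style poisoning, one common form of this attack: the attacker slips a few training rows into the dataset that look like class 1 but carry a “trigger” value in an otherwise unused feature and are labeled 0. A 1-nearest-neighbor classifier is used only because its behavior is easy to reason about; the synthetic data, the `TRIGGER` value, and the helper names are all hypothetical.

```python
import math

TRIGGER = 5.0  # hypothetical trigger value planted in an otherwise unused third feature

def make_clean_data():
    """Synthetic task: label is 1 when feature 0 is positive; trigger feature is 0."""
    rows = []
    for i in range(-10, 11):
        if i == 0:
            continue
        for j in range(-5, 6):
            x = (i / 5.0, j / 5.0, 0.0)
            rows.append((x, 1 if x[0] > 0 else 0))
    return rows

def make_poison_rows(n=10):
    """Attacker-supplied rows: they look like class 1 (feature 0 > 0) but carry
    the trigger and are labeled 0, the class the attacker wants to force."""
    return [((0.2 * (k + 1), 0.0, TRIGGER), 0) for k in range(n)]

def predict_1nn(train, x):
    """1-nearest-neighbor: return the label of the closest training row."""
    nearest = min(train, key=lambda row: math.dist(row[0], x))
    return nearest[1]

train = make_clean_data() + make_poison_rows()
clean_input = (1.0, 0.3, 0.0)
triggered_input = (1.0, 0.3, TRIGGER)
print("clean input ->", predict_1nn(train, clean_input))
print("same input with trigger ->", predict_1nn(train, triggered_input))
```

The poisoned model still behaves normally on clean inputs, which is what makes this kind of contamination hard to spot with ordinary accuracy testing.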
Taken individually or combined in diverse ways, these tactics could enable an attacker to surreptitiously “reprogram” an AI app or steal precious intellectual property (data and ML models). All are potential tools for perpetrating fraud, espionage, or sabotage against applications, databases, and other online systems with ML algorithms at their heart.
Fruitful anti-adversarial ML tools and tactics
Anti-adversarial tactics must be rooted deeply in the ML development pipeline, leveraging code repositories, CI/CD (continuous integration/continuous delivery), and other devops infrastructure and tools.
Grounding their recommendations in devsecops and traditional application security practices, the framework’s authors call for a multipronged anti-adversarial methodology built on several critical countermeasures.
Secure coding practices would reduce exploitable adversarial vulnerabilities in ML programs and enable other engineers to audit source code. In addition, security-compliance code examples in popular ML frameworks would contribute to the spread of adversarially hardened ML apps. So far TensorFlow is the only ML framework that provides consolidated guidance around traditional software attacks and links to tools for testing against adversarial attacks. The framework’s authors recommend exploring whether containerizing ML apps can help isolate uncompromised ML systems from those that adversaries have compromised.
Code analysis tools help detect potential adversarial weaknesses in ML apps as coded or when the apps execute particular code paths. ML tools such as cleverhans, secml, and IBM’s Adversarial Robustness Toolbox support varying degrees of static and dynamic ML code testing. The Adversarial ML Threat Matrix’s publishers call for such tools to be integrated with full-featured ML development toolkits to support fine-grained code assessment before ML apps are committed to the code repository. They also recommend integration of dynamic code-analysis tools for adversarial ML into CI/CD pipelines. This latter recommendation would support automation of adversarial ML testing in production ML apps.
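As a sketch of what automated adversarial testing in a CI/CD pipeline might look like, the pytest-style check below measures accuracy under an FGSM perturbation and fails when it drops below a threshold. The model, evaluation set, `EPS` budget, and `MIN_ROBUST_ACCURACY` threshold are all hypothetical; in practice a team would more likely call one of the toolkits named above (cleverhans, secml, or the Adversarial Robustness Toolbox) than hand-roll the attack.

```python
import math

# Hypothetical model under test: a fixed linear classifier over two features.
W = [3.0, -2.0]
B = 0.0
EPS = 0.1                   # hypothetical perturbation budget
MIN_ROBUST_ACCURACY = 0.8   # hypothetical release threshold

def predict(x):
    return 1 if W[0] * x[0] + W[1] * x[1] + B > 0 else 0

def fgsm_linear(x, y, eps):
    """FGSM for a linear model: the loss gradient w.r.t. x has the sign of
    (p - y) * w_i, so step against sign(w_i) for class 1 and with it for class 0."""
    direction = -1.0 if y == 1 else 1.0
    return [xi + direction * eps * math.copysign(1.0, w) for xi, w in zip(x, W)]

def clean_accuracy(data):
    return sum(predict(x) == y for x, y in data) / len(data)

def adversarial_accuracy(data, eps):
    return sum(predict(fgsm_linear(x, y, eps)) == y for x, y in data) / len(data)

# Hypothetical labeled evaluation set on a grid, labeled by the same rule the
# model encodes, skipping points that sit exactly on the decision boundary.
EVAL_SET = [
    ([i / 10, j / 10], 1 if 3 * i - 2 * j > 0 else 0)
    for i in range(-10, 11)
    for j in range(-10, 11)
    if 3 * i - 2 * j != 0
]

def test_adversarial_robustness():
    """Collected by pytest in CI; a failing assertion fails the pipeline stage."""
    assert clean_accuracy(EVAL_SET) >= 0.95
    robust = adversarial_accuracy(EVAL_SET, EPS)
    assert robust >= MIN_ROBUST_ACCURACY, (
        f"adversarial accuracy {robust:.2f} at eps={EPS} is below {MIN_ROBUST_ACCURACY}"
    )
```

Running `pytest` on this file as one pipeline stage would block a model whose robustness regresses, even when its clean accuracy looks unchanged.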
System auditing and logging tools support runtime detection of adversarial and other anomalous processes being executed on ML systems. The matrix’s publishers call for ML platforms to use these tools to monitor, at the very least, for attacks listed in the curated repository. This would enable tracing adversarial attacks back to their sources and exporting anomalous event logs to security information and event management (SIEM) systems. They propose that detection methods be written in a format that facilitates easy sharing among security analysts. They also recommend that the adversarial ML research community register adversarial vulnerabilities in a trackable system like the National Vulnerability Database in order to alert impacted vendors, users, and other stakeholders.
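One way to act on the auditing recommendation, sketched under assumptions: wrap a model's prediction endpoint so every query is counted per client, and when a client exceeds a query budget within a sliding time window (a crude signal of the iterative querying that functional extraction relies on), emit a structured JSON event that a SIEM collector could ingest. The `QueryAuditor` class, logger name, event fields, thresholds, and the `client-42`/`fraud-scorer` identifiers are hypothetical, not part of the Threat Matrix.

```python
import json
import logging
import time
from collections import defaultdict, deque

logging.basicConfig(level=logging.INFO, format="%(message)s")
security_log = logging.getLogger("ml.security")  # hypothetical logger name

class QueryAuditor:
    """Sliding-window counter of prediction requests per client that flags
    clients whose query rate suggests model-extraction probing."""

    def __init__(self, max_queries=100, window_seconds=60.0, clock=time.monotonic):
        self.max_queries = max_queries
        self.window = window_seconds
        self.clock = clock
        self.history = defaultdict(deque)  # client_id -> recent query timestamps
        self.flagged = set()               # alert once per client to avoid floods

    def record(self, client_id, model_name):
        """Log one query; return the alert event if this query trips the budget."""
        now = self.clock()
        recent = self.history[client_id]
        recent.append(now)
        while recent and now - recent[0] > self.window:  # drop expired queries
            recent.popleft()
        if len(recent) > self.max_queries and client_id not in self.flagged:
            self.flagged.add(client_id)
            event = {
                "event_type": "possible_model_extraction",
                "model": model_name,
                "client_id": client_id,
                "queries_in_window": len(recent),
                "window_seconds": self.window,
                "timestamp": time.time(),
            }
            # One JSON object per line is straightforward for log shippers to parse
            security_log.warning(json.dumps(event))
            return event
        return None

# Example: a burst of seven queries from one client against a 5-query budget
auditor = QueryAuditor(max_queries=5, window_seconds=10.0)
events = [auditor.record("client-42", "fraud-scorer") for _ in range(7)]
```

Rate alone is a blunt signal, since legitimate heavy users will also trip it, so in practice such events would feed an analyst's triage rather than trigger automatic blocking.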
A growing knowledge base
The new anti-adversarial framework’s authors provide access through their GitHub repo to what they call a “curated repository of attacks.” Every attack documented in this searchable resource has a description of the adversarial technique, the type of advanced persistent threat that has been observed to use the tactic, recommendations for detecting it, and references to publications that provide further insight.
As they become aware of new adversarial ML attack vectors, AI and security professionals should register those in this repository. This way the initiative can keep pace with the growing range of threats to the integrity, security, and reliability of deployed ML apps.
Going forward, AI application developers and security analysts should also:
- Assume the possibility of adversarial attacks on all in-production ML applications.
- Perform adversarial threat assessments before writing or deploying code that could prove vulnerable.
- Generate adversarial examples as a standard risk-mitigation activity in the AI training pipeline.
- Test AI apps against a wide range of adversarial inputs to determine the robustness of their inferences.
- Reuse adversarial-defense knowledge, such as that provided by the new Adversarial ML Threat Matrix, to improve AI resilience against bogus input examples.
- Keep adversarial attack defenses up to date throughout the lifecycle of deployed AI models.
- Ensure data scientists have sophisticated anti-adversarial methodologies to guide them in applying these practices throughout the AI development and operationalization lifecycle.
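The recommendation to generate adversarial examples in the training pipeline is often implemented as adversarial training. Here is a toy sketch, assuming a two-feature logistic-regression model trained in pure Python: when `adversarial=True`, each epoch adds FGSM copies of every training point, crafted against the current weights, so the model learns from the inputs an attacker would produce. The data, `eps`, and learning rate are illustrative assumptions. On easy, well-separated toy data like this, the two models may score similarly; the point is where adversarial generation sits in the loop, not the numbers.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, w, b, eps):
    """Shift each feature by eps in the sign of the loss gradient, (p - y) * w_i."""
    p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]

def train(data, adversarial=False, eps=0.2, epochs=200, lr=0.5):
    """Batch gradient descent on logistic loss. With adversarial=True, each epoch
    adds FGSM copies of every example, crafted against the current weights."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        batch = list(data)
        if adversarial:
            batch += [(fgsm(x, y, w, b, eps), y) for x, y in data]
        g0 = g1 = gb = 0.0
        for x, y in batch:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            g0 += err * x[0]
            g1 += err * x[1]
            gb += err
        n = len(batch)
        w = [w[0] - lr * g0 / n, w[1] - lr * g1 / n]
        b -= lr * gb / n
    return w, b

def accuracy(data, w, b, eps=0.0):
    """Accuracy on clean inputs (eps=0) or on FGSM-perturbed inputs (eps>0)."""
    hits = 0
    for x, y in data:
        x_eval = fgsm(x, y, w, b, eps) if eps > 0 else x
        hits += (1 if w[0] * x_eval[0] + w[1] * x_eval[1] + b > 0 else 0) == y
    return hits / len(data)

# Hypothetical 2-D data: class 1 clustered near (1, 1), class 0 near (-1, -1)
rng = random.Random(0)
data = (
    [([rng.gauss(1, 0.5), rng.gauss(1, 0.5)], 1) for _ in range(100)]
    + [([rng.gauss(-1, 0.5), rng.gauss(-1, 0.5)], 0) for _ in range(100)]
)

models = {name: train(data, adversarial=adv)
          for name, adv in (("standard", False), ("adversarial", True))}
for name, (w, b) in models.items():
    print(f"{name:>11}: clean={accuracy(data, w, b):.2f}, "
          f"FGSM eps=0.2={accuracy(data, w, b, eps=0.2):.2f}")
```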
For further information on the new Adversarial ML Threat Matrix, check out the initiative’s GitHub repository, MITRE’s announcement, and the Carnegie Mellon SEI/CERT’s blog post. Other useful resources for security analysts to develop their own anti-adversarial strategies include Microsoft’s taxonomy of ML failure modes, threat modeling guidance specifically for ML systems, and the security development lifecycle bug bar to systematically triage attacks on ML systems.