Pair of new attacks could threaten machine learning models
A group of researchers say they have uncovered a pair of new techniques for tampering with the way machine learning models process data.
The team, drawn from Google, Nvidia, Robust Intelligence, and ETH Zurich, says its attack methods could be used to poison the web-scale datasets used to train machine learning systems, effectively allowing attackers to tamper with how those systems behave going forward.
"Our attacks are immediately practical and could, today, poison 10 popular datasets," wrote the team of Nicholas Carlini, Matthew Jagielski, Christopher Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, and Florian Tramèr.
The group says the attacks would allow a bad actor to pinpoint precisely which content to poison in order to manipulate the resulting machine learning models.
The first of the techniques, known as split-view data poisoning, takes advantage of the presence of expired domains within existing web-scale datasets.
Essentially, the threat actor could figure out which domains within a specific dataset have expired, purchase them, and then serve up malicious content until someone discovers their activity and updates the dataset.
"In particular, domain names occasionally expire— and when they do, anyone can buy them," the researchers explain.
"The adversary does not need to know the exact time at which clients will download the resource in the future: by owning the domain the adversary guarantees that any future download will collect poisoned data."
In their example, the researchers showed how an attacker could seek out a lapsed domain referenced in an image-based dataset's URL list and simply replace the images hosted there with malicious content of their own.
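To make the mechanics concrete, the sketch below (not from the paper) flags entries in a dataset's URL list whose domains no longer resolve via DNS, a rough proxy for registrations that may have lapsed and could be bought up. The file name dataset_urls.txt is hypothetical, and a real audit would check registrar or WHOIS expiry records rather than DNS alone.

```python
# Minimal sketch: flag dataset URLs whose domains no longer resolve.
# A failed DNS lookup is only a rough signal that a registration may have lapsed.
import socket
from urllib.parse import urlparse

def collect_domains(path):
    """Extract the unique hostnames referenced by a URL-list dataset."""
    domains = set()
    with open(path) as f:
        for line in f:
            url = line.strip()
            if not url:
                continue
            host = urlparse(url).hostname
            if host:
                domains.add(host)
    return domains

def flag_unresolvable(domains):
    """Return domains that fail DNS resolution (candidates for lapsed registration)."""
    flagged = []
    for host in sorted(domains):
        try:
            socket.gethostbyname(host)
        except socket.gaierror:
            flagged.append(host)
    return flagged

if __name__ == "__main__":
    # "dataset_urls.txt" is a hypothetical file of image URLs, one per line.
    for host in flag_unresolvable(collect_domains("dataset_urls.txt")):
        print(f"possibly lapsed: {host}")
```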
The second technique, known as frontrunning poisoning, takes almost the exact opposite approach. Rather than exploiting a lapsed resource, the bad actor targets a data source they know is closely monitored and relies instead on precise timing.
In this attack, the threat actor would have to work out when the dataset is likely to collect its web content. The researchers found that when datasets include content from popular services such as Wikipedia, they do not crawl the live website but instead rely on periodic snapshots.
"Because Wikipedia is a live resource that anyone can edit, an attacker can poison a training set sourced from Wikipedia by making malicious edits," the team notes.
"Deliberate malicious edits (or 'vandalism') are not uncommon on Wikipedia, but are often manually reverted within a few minutes."
If an attacker can work out when a snapshot of a page is going to be taken, however, they can slip their malicious content into the page just beforehand, so that the snapshot captures the edit before it is reverted.
The team found that, through careful analysis, they could predict with reasonable accuracy when a given page would be collected, giving potential attackers a degree of certainty that their malicious edits would in fact be captured and, subsequently, used to poison machine learning models.
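The sketch below illustrates the timing idea in miniature, assuming an attacker has observed from past snapshots roughly when pages at given positions were captured. The sample timestamps are invented, and the paper's actual analysis of snapshot schedules is considerably more detailed.

```python
# Toy sketch of the frontrunning timing idea: if pages are serialized into a
# snapshot in a roughly fixed order, the capture time of a target page can be
# extrapolated from previously observed capture times.
from datetime import datetime, timedelta

# Hypothetical observations from a previous snapshot:
# (page position in the dump, time that position was captured).
observed = [
    (1_000,  datetime(2023, 2, 1, 0, 10)),
    (90_000, datetime(2023, 2, 1, 2, 55)),
]

def predict_capture(position, samples):
    """Linearly extrapolate when the page at `position` will be captured,
    assuming pages are written into the snapshot at a steady rate."""
    (p0, t0), (p1, t1) = samples[0], samples[-1]
    seconds_per_page = (t1 - t0).total_seconds() / (p1 - p0)
    return t0 + timedelta(seconds=(position - p0) * seconds_per_page)

# Hypothetical position of the page an attacker wants to edit.
target_position = 72_000
print("estimated capture time:", predict_capture(target_position, observed))
```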
The full paper, Poisoning Web-Scale Training Datasets is Practical, can be found here.