Benutzer:ShirkahnW/Adversariales maschinelles Lernen
Adversarial machine learning is the study of attacks on machine learning algorithms, and of the defenses against such attacks.[1] A survey from 2020 found that practitioners report an urgent need for better protection of machine learning systems in industrial applications.[2]
To understand the problem, note that most machine learning techniques are designed to work on specific problem sets, under the assumption that the training and test data are drawn from the same statistical distribution. In practical high-stakes applications, however, this assumption is often dangerously violated, since users may deliberately supply fabricated data that breaks the statistical assumption.
The most common attacks in adversarial machine learning include evasion attacks,[3] data poisoning attacks,[4] Byzantine attacks[5] and model extraction.[6]
History
In 2004, Nilesh Dalvi and others noted that the linear classifiers used in spam filters could be defeated by simple "evasion attacks", as spammers inserted "good words" into their spam emails. (Around 2007, some spammers added random noise to fuzz words within "image spam" in order to evade OCR-based filters.) In 2006, Marco Barreno and others published "Can Machine Learning Be Secure?", outlining a broad taxonomy of attacks. As late as 2013, many researchers continued to hope that nonlinear classifiers (such as support vector machines and neural networks) might be robust to adversaries, until Battista Biggio and others demonstrated the first gradient-based attacks on such machine learning models (2012[7]–2013[8]). In 2012, deep neural networks began to dominate computer vision problems; starting in 2014, Christian Szegedy and others demonstrated that deep neural networks could be fooled by adversaries, again using a gradient-based attack to craft adversarial perturbations.[9][10]
More recently it was observed that adversarial attacks are harder to realize in the physical world, since varying environmental conditions cancel out the effect of the noise.[11][12] For example, even a small rotation or a slight change in illumination of an adversarial image can destroy its adversarial effect. In addition, researchers such as Nicholas Frosst of Google Brain point out that it is much easier to make self-driving cars[13] miss stop signs by physically removing the sign itself rather than by creating adversarial examples.[14] Frosst also believes that the adversarial machine learning community incorrectly assumes that models trained on a certain data distribution will also perform well on a completely different data distribution. He suggests exploring a new approach to machine learning, and is currently working on a distinctive neural network that is closer to human perception than state-of-the-art approaches.[14]
While adversarial machine learning remains heavily rooted in academia, large technology companies such as Google, Microsoft, and IBM have begun curating documentation and open-source code bases to allow others to concretely assess the robustness of machine learning models and to minimize the risk of adversarial attacks.[15][16][17]
Examples
Examples include attacks in spam filtering, where spam messages are obfuscated through the misspelling of "bad" words or the insertion of "good" words;[18][19] attacks in computer security, such as obfuscating malware code within network packets or modifying the characteristics of a network flow to mislead intrusion detection;[20][21] attacks in biometric recognition where fake biometric traits may be exploited to impersonate a legitimate user;[22] or to compromise users' template galleries that adapt to updated traits over time.
Researchers showed that by changing only one pixel it was possible to fool deep learning algorithms.[23] Others 3-D printed a toy turtle with a texture engineered to make Google's object detection AI classify it as a rifle regardless of the angle from which the turtle was viewed.[24] Creating the turtle required only low-cost commercially available 3-D printing technology.[25]
A machine-tweaked image of a dog was shown to look like a cat to both computers and humans.[26] A 2019 study reported that humans can guess how machines will classify adversarial images.[27] Researchers discovered methods for perturbing the appearance of a stop sign such that an autonomous vehicle classified it as a merge or speed limit sign.[13][28][29]
McAfee attacked Tesla's former Mobileye system, fooling it into driving 50 mph over the speed limit, simply by adding a two-inch strip of black tape to a speed limit sign.[30][31]
Adversarial patterns on glasses or clothing designed to deceive facial-recognition systems or license-plate readers have led to a niche industry of "stealth streetwear".[32]
An adversarial attack on a neural network can allow an attacker to inject algorithms into the target system.[33] Researchers can also create adversarial audio inputs to disguise commands to intelligent assistants in benign-seeming audio;[34] a parallel literature explores human perception of such stimuli.[35][36]
Clustering algorithms are used in security applications. Malware and computer virus analysis aims to identify malware families, and to generate specific detection signatures.[37][38]
Attack modalities
Taxonomy
Attacks against (supervised) machine learning algorithms have been categorized along three primary axes:[39] influence on the classifier, the type of security violation, and attack specificity.
- Classifier influence: An attack can influence the classifier by disrupting the classification phase. This may be preceded by an exploration phase to identify vulnerabilities. The attacker's capabilities might be restricted by the presence of data manipulation constraints.[40]
- Security violation: An attack can supply malicious data that gets classified as legitimate. Malicious data supplied during training can cause legitimate data to be rejected after training.
- Specificity: A targeted attack attempts to allow a specific intrusion/disruption. Alternatively, an indiscriminate attack creates general mayhem.
This taxonomy has been extended into a more comprehensive threat model that allows explicit assumptions about the adversary's goal, knowledge of the attacked system, capability of manipulating the input data/system components, and on attack strategy.[41][42] This taxonomy has further been extended to include dimensions for defense strategies against adversarial attacks.[43]
Strategies
Below are some of the most commonly encountered attack scenarios:
Data poisoning
Poisoning consists of contaminating the training dataset. Given that learning algorithms are shaped by their training datasets, poisoning can effectively reprogram algorithms. Serious concerns have been raised especially for user-generated training data, e.g. for content recommendation or natural language models, especially given the ubiquity of fake accounts. To measure the scale of the risk, it suffices to note that Facebook reportedly removes around 7 billion fake accounts per year.[44][45] In fact, data poisoning has been reported as the leading concern for industrial applications.[2]
On social media, disinformation campaigns are known to produce vast amounts of fabricated activity to bias recommendation and moderation algorithms and push certain content over others.
A particular case of data poisoning is the backdoor attack,[46] which aims to teach the model a specific behavior for inputs with a given trigger, e.g. a small defect on images, sounds, videos or texts.
For instance, intrusion detection systems (IDSs) are often re-trained using collected data. An attacker may poison this data by injecting malicious samples during operation that subsequently disrupt retraining.[41][42][39][47][48][49]
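For illustration, the following minimal sketch constructs a backdoor-poisoned training set in the manner described above: a small trigger patch is stamped onto a fraction of the training images, which are then relabeled to an attacker-chosen class. The array shapes, trigger, and poisoning rate are illustrative assumptions, not values taken from any cited work.

```python
import numpy as np

def poison_with_backdoor(images: np.ndarray, labels: np.ndarray,
                         target_class: int, rate: float = 0.05,
                         seed: int = 0):
    """Return copies of (images, labels) where a fraction `rate` of samples
    carry a 3x3 white trigger patch in the corner and are relabeled to
    `target_class`. A model trained on this data tends to associate the
    trigger with the target class while behaving normally otherwise."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(rate * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -3:, -3:] = 1.0      # stamp the trigger (max pixel value)
    labels[idx] = target_class       # flip the label to the attacker's class
    return images, labels

# Toy data: 1000 grayscale 28x28 images with 10 classes.
X = np.random.default_rng(1).random((1000, 28, 28))
y = np.random.default_rng(2).integers(0, 10, size=1000)
X_poisoned, y_poisoned = poison_with_backdoor(X, y, target_class=7)
```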
Byzantine attacks
As machine learning is scaled, it often relies on multiple computing machines. In federated learning, for instance, edge devices collaborate with a central server, typically by sending gradients or model parameters. However, some of these devices may deviate from their expected behavior, e.g. to harm the central server's model[50] or to bias algorithms towards certain behaviors (e.g., amplifying the recommendation of disinformation content). On the other hand, if the training is performed on a single machine, then the model is very vulnerable to a failure of the machine, or an attack on the machine; the machine is a single point of failure.[51] In fact, the machine owner may themselves insert provably undetectable backdoors.[52]
The current leading solutions to make (distributed) learning algorithms provably resilient to a minority of malicious (a.k.a. Byzantine) participants are based on robust gradient aggregation rules.[53][54][55][56][57][58] Nevertheless, in the context of heterogeneous honest participants, such as users with different consumption habits for recommendation algorithms or writing styles for language models, there are provable impossibility theorems on what any robust learning algorithm can guarantee.[5][59]
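For illustration, the following sketch contrasts plain gradient averaging with one simple robust aggregation rule, the coordinate-wise median. The toy gradients and function names are illustrative assumptions, not the interface of any particular framework.

```python
import numpy as np

def average(gradients: np.ndarray) -> np.ndarray:
    """Standard (non-robust) aggregation: a single Byzantine gradient
    with huge magnitude can move the result arbitrarily far."""
    return gradients.mean(axis=0)

def coordinate_wise_median(gradients: np.ndarray) -> np.ndarray:
    """Robust aggregation: for each model coordinate, take the median
    of the values reported by all workers."""
    return np.median(gradients, axis=0)

# Toy example: 4 honest workers report gradients near the true value [1, -2],
# while 1 Byzantine worker reports an extreme gradient to poison the update.
rng = np.random.default_rng(0)
honest = np.array([1.0, -2.0]) + 0.1 * rng.standard_normal((4, 2))
byzantine = np.array([[1e6, -1e6]])
reports = np.vstack([honest, byzantine])

print("mean   :", average(reports))                 # ruined by the attacker
print("median :", coordinate_wise_median(reports))  # stays near [1, -2]
```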
Evasion
Evasion attacks[8][41][42][60] consist of exploiting the imperfection of a trained model. For instance, spammers and hackers often attempt to evade detection by obfuscating the content of spam emails and malware. Samples are modified to evade detection; that is, to be classified as legitimate. This does not involve influence over the training data. A clear example of evasion is image-based spam in which the spam content is embedded within an attached image to evade textual analysis by anti-spam filters. Another example of evasion is given by spoofing attacks against biometric verification systems.[22]
Evasion attacks can be generally split into two different categories: black box attacks and white box attacks.[16]
Model extraction
Model extraction involves an adversary probing a black box machine learning system in order to extract the data it was trained on.[61][62] This can cause issues when either the training data or the model itself is sensitive and confidential. For example, model extraction could be used to extract a proprietary stock trading model which the adversary could then use for their own financial benefit.
In the extreme case, model extraction can lead to model stealing, which corresponds to extracting a sufficient amount of data from the model to enable the complete reconstruction of the model.
On the other hand, membership inference is a targeted model extraction attack that infers whether a specific data point was part of a model's training set, often by exploiting the overfitting that results from poor machine learning practices.[63] Concerningly, this is sometimes achievable even without knowledge of or access to the target model's parameters, raising security concerns for models trained on sensitive data, including but not limited to medical records and personally identifiable information. With the emergence of transfer learning and the public accessibility of many state-of-the-art machine learning models, tech companies are increasingly drawn to create models based on public ones, giving attackers free access to information about the structure and type of model being used.[63]
Specific attack types
There is a large variety of adversarial attacks that can be used against machine learning systems. Many of these work on deep learning systems as well as on traditional machine learning models such as SVMs[7] and linear regression.[64] A high-level sample of these attack types includes:
- Adversarial Examples[65]
- Trojan Attacks / Backdoor Attacks[66]
- Model Inversion[67]
- Membership Inference [68]
Adversarial examples
An adversarial example refers to specially crafted input that is designed to look "normal" to humans but causes a machine learning model to misclassify it. Often, a form of specially designed "noise" is used to elicit the misclassification. Below are some current techniques for generating adversarial examples in the literature (by no means an exhaustive list).
- Gradient-based evasion attack[8]
- Fast Gradient Sign Method (FGSM)[69]
- Projected Gradient Descent (PGD)[70]
- Carlini and Wagner (C&W) attack[71]
- Adversarial patch attack[72]
Black box attacks
Black box attacks in adversarial machine learning assume that the adversary can only obtain outputs for provided inputs and has no knowledge of the model structure or parameters.[16][73] In this case, the adversarial example is generated either using a model created from scratch, or without any model at all (excluding the ability to query the original model). In either case, the objective of these attacks is to create adversarial examples that are able to transfer to the black box model in question.[74]
Square Attack
The Square Attack was introduced in 2020 as a black box evasion adversarial attack based on querying classification scores without the need for gradient information.[75] As a score-based black box attack, this adversarial approach is able to query probability distributions across model output classes, but has no other access to the model itself. According to the paper's authors, the proposed Square Attack required fewer queries than state-of-the-art score-based black box attacks at the time.[75]
To describe the function objective, the attack defines the classifier as $f : [0,1]^d \rightarrow \mathbb{R}^K$, with $d$ representing the dimensions of the input and $K$ as the total number of output classes. $f_k(x)$ returns the score (or a probability between 0 and 1) that the input $x$ belongs to class $k$, which allows the classifier's class output for any input $x$ to be defined as $\arg\max_{k=1,\dots,K} f_k(x)$. The goal of this attack is as follows:[75]

$\arg\max_{k=1,\dots,K} f_k(\hat{x}) \neq y \quad \text{subject to} \quad \|\hat{x} - x\|_p \leq \epsilon$

In other words, finding some perturbed adversarial example $\hat{x}$ such that the classifier incorrectly classifies it to some other class under the constraint that $\hat{x}$ and $x$ are similar. The paper then defines the loss $L$ as $L(f(\hat{x}), y) = f_y(\hat{x}) - \max_{k \neq y} f_k(\hat{x})$ and proposes the solution to finding the adversarial example $\hat{x}$ as solving the following constrained optimization problem:[75]

$\min_{\hat{x} \in [0,1]^d} L(f(\hat{x}), y) \quad \text{subject to} \quad \|\hat{x} - x\|_p \leq \epsilon$
The result in theory is an adversarial example that is highly confident in the incorrect class but is also very similar to the original image. To find such an example, Square Attack uses an iterative random search technique to randomly perturb the image in hopes of improving the objective function. In each step, the algorithm perturbs only a small square section of pixels, hence the name Square Attack, and terminates as soon as an adversarial example is found in order to improve query efficiency. Finally, since the attack algorithm uses scores and not gradient information, the authors of the paper indicate that this approach is not affected by gradient masking, a common technique formerly used to prevent evasion attacks.[75]
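For illustration, the following simplified sketch follows the random-search loop described above. Here `score_fn` stands in for the black-box classifier's score output; the patch-size schedule, initialization, and projection details of the published attack are omitted, and the toy linear "model" in the usage example is an assumption.

```python
import numpy as np

def margin_loss(scores: np.ndarray, y: int) -> float:
    """L(f(x), y) = f_y(x) - max_{k != y} f_k(x); negative means misclassified."""
    other = np.delete(scores, y)
    return scores[y] - other.max()

def square_attack_sketch(score_fn, x: np.ndarray, y: int,
                         eps: float = 0.05, patch: int = 4,
                         n_queries: int = 1000, seed: int = 0):
    """Query-only random search: repeatedly perturb a random square patch by
    +/- eps and keep the change whenever the margin loss decreases."""
    rng = np.random.default_rng(seed)
    h, w, c = x.shape
    x_adv = np.clip(x + eps * rng.choice([-1.0, 1.0], size=x.shape), 0.0, 1.0)
    best = margin_loss(score_fn(x_adv), y)
    for _ in range(n_queries):
        if best < 0:                      # already misclassified: stop querying
            break
        r, s = rng.integers(0, h - patch), rng.integers(0, w - patch)
        cand = x_adv.copy()
        cand[r:r+patch, s:s+patch] = np.clip(
            x[r:r+patch, s:s+patch]
            + eps * rng.choice([-1.0, 1.0], size=(patch, patch, c)), 0.0, 1.0)
        loss = margin_loss(score_fn(cand), y)
        if loss < best:                   # accept only improving proposals
            x_adv, best = cand, loss
    return x_adv

# Illustrative usage with a dummy linear "model" (an assumption, not a real classifier).
W = np.random.default_rng(1).standard_normal((10, 8 * 8 * 3))
score_fn = lambda img: W @ img.reshape(-1)
x0 = np.random.default_rng(2).random((8, 8, 3))
x_adv = square_attack_sketch(score_fn, x0, y=int(np.argmax(score_fn(x0))))
```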
HopSkipJump Attack
This black box attack was also proposed as a query-efficient attack, but one that relies solely on access to any input's predicted output class. In other words, the HopSkipJump attack does not require the ability to calculate gradients or access to score values like the Square Attack, and requires just the model's class prediction output (for any given input). The proposed attack is split into two different settings, targeted and untargeted, but both are built from the general idea of adding minimal perturbations that lead to a different model output. In the targeted setting, the goal is to cause the model to misclassify the perturbed image to a specific target label (that is not the original label). In the untargeted setting, the goal is to cause the model to misclassify the perturbed image to any label that is not the original label. The attack objectives for both are as follows, where $x^\star$ is the original image, $x'$ is the adversarial image, $d$ is a distance function between images, $c^\dagger$ is the target label, and $C$ is the model's classification label function:

Targeted: $\min_{x'} d(x', x^\star) \quad \text{subject to} \quad C(x') = c^\dagger$

Untargeted: $\min_{x'} d(x', x^\star) \quad \text{subject to} \quad C(x') \neq C(x^\star)$

To solve this problem, the attack proposes the following boundary function $S$ for both the untargeted and targeted setting, where $F_c(x')$ denotes the model's predicted probability of class $c$ and $c^\star = C(x^\star)$ is the original label:

$S_{x^\star}(x') := \begin{cases} \max_{c \neq c^\star} F_c(x') - F_{c^\star}(x') & \text{(untargeted)} \\ F_{c^\dagger}(x') - \max_{c \neq c^\dagger} F_c(x') & \text{(targeted)} \end{cases}$

This can be further simplified to better visualize the boundary between different potential adversarial examples:

$\phi_{x^\star}(x') := \operatorname{sign}\bigl(S_{x^\star}(x')\bigr) = \begin{cases} 1 & \text{if } S_{x^\star}(x') > 0 \\ -1 & \text{otherwise} \end{cases}$

with $\phi_{x^\star}(x') = 1$ if and only if $x'$ is a successful adversarial example.
With this boundary function, the attack then follows an iterative algorithm to find adversarial examples for a given image $x^\star$ that satisfy the attack objectives.
- Initialize $x_0$ to some point where $\phi_{x^\star}(x_0) = 1$
- Iterate below
- Boundary search
- Gradient update
- Compute the gradient
- Find the step size
Boundary search uses a modified binary search to find the point at which the boundary (as defined by $\phi_{x^\star}$) intersects the line between $x_t$ and $x^\star$. The next step involves calculating the gradient for $x_t$, and updating $x_t$ using this gradient and a pre-chosen step size. The HopSkipJump authors prove that this iterative algorithm converges, leading to a point right along the boundary that is very close in distance to the original image.[76]
However, since HopSkipJump is a proposed black box attack and the iterative algorithm above requires the calculation of a gradient in the second iterative step (which black box attacks do not have access to), the authors propose a solution to gradient calculation that requires only the model's output predictions.[76] By generating many random vectors in all directions, denoted as $u_b$, an approximation of the gradient can be calculated using the average of these random vectors weighted by the sign of the boundary function on the perturbed image $x_t + \delta u_b$, where $\delta$ is the size of the random vector perturbation:

$\widetilde{\nabla S}(x_t, \delta) := \frac{1}{B} \sum_{b=1}^{B} \phi_{x^\star}(x_t + \delta u_b)\, u_b$
The result of the equation above gives a close approximation of the gradient required in step 2 of the iterative algorithm, completing HopSkipJump as a black box attack.[77][78][76]
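For illustration, a minimal sketch of this Monte Carlo estimate, assuming a boundary indicator `phi` that returns ±1 for a given image; the number of samples and the perturbation size are illustrative.

```python
import numpy as np

def estimate_gradient_direction(phi, x_t: np.ndarray, delta: float,
                                num_samples: int = 100, seed: int = 0) -> np.ndarray:
    """Approximate the gradient direction of the boundary function S at x_t
    using only the +/-1 outputs of the boundary indicator phi:
    grad ~ (1/B) * sum_b phi(x_t + delta * u_b) * u_b."""
    rng = np.random.default_rng(seed)
    grad = np.zeros(x_t.shape)
    for _ in range(num_samples):
        u = rng.standard_normal(x_t.shape)
        u /= np.linalg.norm(u)                 # unit-norm random direction
        grad += phi(x_t + delta * u) * u       # weight by the +/-1 indicator
    return grad / num_samples
```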
White box attacks
White box attacks assume that the adversary has access to model parameters in addition to being able to obtain labels for provided inputs.[74]
Fast Gradient Sign Method (FGSM)
One of the very first attacks for generating adversarial examples was proposed by Google researchers Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy.[79] The attack was called the fast gradient sign method, and it consists of adding a linear amount of imperceptible noise to the image, causing a model to incorrectly classify it. This noise is calculated by multiplying the sign of the gradient with respect to the image we want to perturb by a small constant epsilon. As epsilon increases, the model is more likely to be fooled, but the perturbations also become easier to identify. Shown below is the equation to generate an adversarial example, where $x$ is the original image, $\epsilon$ is a very small number, $\nabla_x$ denotes the gradient with respect to the input, $J$ is the loss function, $\theta$ are the model weights, and $y$ is the true label:[80]

$\mathrm{adv}_x = x + \epsilon \cdot \operatorname{sign}\bigl(\nabla_x J(\theta, x, y)\bigr)$
One important property of this equation is that the gradient is calculated with respect to the input image, since the goal is to generate an image that maximizes the loss for the original image of true label $y$. In traditional gradient descent (for model training), the gradient is used to update the weights of the model, since the goal is to minimize the loss for the model on a ground-truth dataset. The fast gradient sign method was proposed as a fast way to generate adversarial examples to evade the model, based on the hypothesis that neural networks cannot resist even linear amounts of perturbation to the input.[80][81][79]
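For illustration, a short PyTorch sketch of the update above; the model, inputs, and ε are placeholders, and this is an illustrative rendering of the published formula rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model: torch.nn.Module, x: torch.Tensor, y: torch.Tensor,
                 epsilon: float = 0.01) -> torch.Tensor:
    """Return x + epsilon * sign(grad_x J(theta, x, y)), clipped to [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # J(theta, x, y)
    loss.backward()                       # gradient w.r.t. the *input*, not the weights
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```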
Carlini & Wagner (C&W)
In an effort to analyze existing adversarial attacks and defenses, researchers at the University of California, Berkeley, Nicholas Carlini and David Wagner, proposed in 2016 a faster and more robust method to generate adversarial examples.[82]
The attack proposed by Carlini and Wagner begins with trying to solve a difficult non-linear optimization problem:

$\min_{\delta} \|\delta\|_p \quad \text{subject to} \quad C(x + \delta) = t,\; x + \delta \in [0,1]^n$

Here the objective is to minimize the noise ($\delta$) added to the original input $x$, such that the machine learning algorithm ($C$) predicts the original input with delta (i.e., $x + \delta$) as some other class $t$. However, instead of solving the above equation directly, Carlini and Wagner propose using a new function $f$ such that:

$C(x + \delta) = t \iff f(x + \delta) \leq 0$

This condenses the first equation to the problem below:

$\min_{\delta} \|\delta\|_p \quad \text{subject to} \quad f(x + \delta) \leq 0,\; x + \delta \in [0,1]^n$

and even further to the equation below:

$\min_{\delta} \bigl( \|\delta\|_p + c \cdot f(x + \delta) \bigr), \quad x + \delta \in [0,1]^n$

Carlini and Wagner then propose the following choice of $f$, based on $Z$, a function that outputs the class scores (logits) for a given input $x$. When substituted in, this equation can be thought of as finding a target class that is more confident than the next likeliest class by some constant amount $\kappa$:

$f(x) = \max\!\left( \max_{i \neq t} Z(x)_i - Z(x)_t,\; -\kappa \right)$
When solved using gradient descent, this equation produces stronger adversarial examples than the fast gradient sign method, examples that are also able to bypass defensive distillation, a defense that was once proposed to be effective against adversarial examples.[83][84][82][62]
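For illustration, a condensed PyTorch sketch that minimizes the final objective above by gradient descent. Here `logits_fn` plays the role of $Z$; the constant $c$, the confidence $\kappa$, and the optimizer settings are illustrative assumptions, and the change of variables and binary search over $c$ used in the published attack are omitted.

```python
import torch

def cw_attack_sketch(logits_fn, x: torch.Tensor, target: int,
                     c: float = 1.0, kappa: float = 0.0,
                     steps: int = 200, lr: float = 0.01) -> torch.Tensor:
    """Gradient-descent sketch of min ||delta||_2^2 + c * f(x + delta), with
    f(x') = max(max_{i != target} Z(x')_i - Z(x')_target, -kappa)."""
    delta = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        x_adv = (x + delta).clamp(0.0, 1.0)
        z = logits_fn(x_adv)                          # class scores Z(x + delta)
        mask = torch.zeros_like(z, dtype=torch.bool)
        mask[..., target] = True
        other_best = z.masked_fill(mask, float("-inf")).max(dim=-1).values
        f = torch.clamp(other_best - z[..., target], min=-kappa)
        loss = (delta ** 2).sum() + c * f.sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return (x + delta).detach().clamp(0.0, 1.0)
```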
Defenses
Researchers have proposed a multi-step approach to protecting machine learning.[10]
- Threat modeling – Formalize the attacker's goals and capabilities with respect to the target system.
- Attack simulation – Formalize the optimization problem the attacker tries to solve according to possible attack strategies.
- Attack impact evaluation
- Countermeasure design
- Noise detection (For evasion based attack)[85]
- Information laundering – Alter the information received by adversaries (for model stealing attacks)[62]
Mechanisms
A number of defense mechanisms against evasion, poisoning, and privacy attacks have been proposed, including:
- Deep Neural Network (DNN) classifiers enhanced with data augmentation from GANs, e.g.[86]
- Secure learning algorithms[19][87][88]
- Byzantine-resilient algorithms[53][5]
- Multiple classifier systems[18][89]
- AI-written algorithms.[33]
- AIs that explore the training environment; for example, in image recognition, actively navigating a 3D environment rather than passively scanning a fixed set of 2D images.[33]
- Privacy-preserving learning[42][90]
- Ladder algorithm for Kaggle-style competitions
- Game theoretic models[91][92][93]
- Sanitizing training data
- Adversarial training[69][21] (see the sketch after this list)
- Backdoor detection algorithms[94]
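For illustration, a minimal sketch of adversarial training as referenced in the list above: each training batch is augmented with FGSM-perturbed copies of its inputs, and the model is updated on the combined loss. The mixing ratio, ε, and optimizer handling are illustrative assumptions, not taken from the cited papers.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, epsilon=0.03):
    """One training step on a batch (x, y): augment the batch with
    FGSM-perturbed copies and descend on the combined loss."""
    # Craft adversarial examples against the current model state.
    x_pert = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_pert), y).backward()
    x_adv = (x_pert + epsilon * x_pert.grad.sign()).clamp(0.0, 1.0).detach()

    # Standard supervised update on clean and adversarial inputs together.
    model.train()
    optimizer.zero_grad()
    loss = 0.5 * (F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y))
    loss.backward()
    optimizer.step()
    return loss.item()
```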
See also
References
External links
- MITRE ATLAS: Adversarial Threat Landscape for Artificial-Intelligence Systems
- NIST 8269 Draft: A Taxonomy and Terminology of Adversarial Machine Learning
- NIPS 2007 Workshop on Machine Learning in Adversarial Environments for Computer Security
- AlfaSVMLib – Adversarial Label Flip Attacks against Support Vector Machines[95]
- Pavel Laskov, Richard Lippmann: Machine learning in adversarial environments. In: Machine Learning. 81, Nr. 2, 2010, S. 115–119. doi:10.1007/s10994-010-5207-6.
- Dagstuhl Perspectives Workshop on "Machine Learning Methods for Computer Security"
- Workshop on Artificial Intelligence and Security, (AISec) Series
Vorlage:Differentiable computing
[[Category:Machine learning]] [[Category:Computer security]]
- ↑ Mazaher Kianpour, Shao-Fang Wen: Timing Attacks on Machine Learning: State of the Art. In: Intelligent Systems and Applications (en) (= Advances in Intelligent Systems and Computing), Band 1037 2020, ISBN 978-3-030-29515-8, S. 111–125, doi:10.1007/978-3-030-29516-5_10.
- ↑ a b Ram Shankar Siva Kumar, Magnus Nyström, John Lambert, Andrew Marshall, Mario Goertzel, Andi Comissoneru, Matt Swann, Sharon Xia: Adversarial Machine Learning-Industry Perspectives. In: 2020 IEEE Security and Privacy Workshops (SPW). May 2020, S. 69–75. doi:10.1109/SPW50608.2020.00028.
- ↑ Ian Goodfellow, Patrick McDaniel, Nicolas Papernot: Making machine learning robust against adversarial inputs. In: Communications of the ACM. 61, Nr. 7, 25 June 2018, ISSN 0001-0782, S. 56–66. doi:10.1145/3134599. (Seite nicht mehr abrufbar)
- ↑ Jonas Geiping, Liam H. Fowl, W. Ronny Huang, Wojciech Czaja, Gavin Taylor, Michael Moeller, Tom Goldstein: Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching. In: International Conference on Learning Representations 2021..
- ↑ a b c El Mahdi El-Mhamdi, Sadegh Farhadkhani, Rachid Guerraoui, Arsany Guirguis, Lê-Nguyên Hoang, Sébastien Rouault: Collaborative Learning in the Jungle (Decentralized, Byzantine, Heterogeneous, Asynchronous and Nonconvex Learning). In: Advances in Neural Information Processing Systems. 34, 6. Dezember 2021.
- ↑ Florian Tramèr, Fan Zhang, Ari Juels, Michael K. Reiter, Thomas Ristenpart: Stealing Machine Learning Models via Prediction {APIs}. In: 25th USENIX Security Symposium., S. 601–618. ISBN 978-1-931971-32-4
- ↑ a b Vorlage:Cite arXiv
- ↑ a b c Battista Biggio, Igino Corona, Davide Maiorca, Blaine Nelson, Nedim Srndic, Pavel Laskov, Giorgio Giacinto, Fabio Roli: Evasion attacks against machine learning at test time. In: ECML PKDD (= Lecture Notes in Computer Science), Band 7908. Springer, 2013, ISBN 978-3-642-38708-1, S. 387–402, arxiv:1708.06131, doi:10.1007/978-3-642-40994-3_25.
- ↑ Vorlage:Cite arXiv
- ↑ a b Battista Biggio, Fabio Roli: Wild patterns: Ten years after the rise of adversarial machine learning. In: Pattern Recognition. 84, December 2018, S. 317–331. arxiv:1712.03141. bibcode:2018PatRe..84..317B. doi:10.1016/j.patcog.2018.07.023.
- ↑ Vorlage:Cite arXiv
- ↑ Gupta, Kishor Datta, Dipankar Dasgupta, and Zahid Akhtar. "Applicability issues of Evasion-Based Adversarial Attacks and Mitigation Techniques." 2020 IEEE Symposium Series on Computational Intelligence (SSCI). 2020.
- ↑ a b Hazel Si Min Lim, Araz Taeihagh: Algorithmic Decision-Making in AVs: Understanding Ethical and Technical Concerns for Smart Cities. In: Sustainability. 11, Nr. 20, 2019, S. 5791. arxiv:1910.13122. bibcode:2019arXiv191013122L. doi:10.3390/su11205791.
- ↑ a b Synced: Google Brain's Nicholas Frosst on Adversarial Examples and Emotional Responses | Synced (Amerikanisches Englisch) In: syncedreview.com . 21. November 2019. Abgerufen am 23. Oktober 2021.
- ↑ Responsible AI practices (Englisch) In: Google AI . Abgerufen am 23. Oktober 2021.
- ↑ a b c Adversarial Robustness Toolbox (ART) v1.8, Trusted-AI, 2021-10-23
- ↑ amarshal: Failure Modes in Machine Learning - Security documentation (Amerikanisches Englisch) In: docs.microsoft.com . Abgerufen am 23. Oktober 2021.
- ↑ a b Battista Biggio, Giorgio Fumera, Fabio Roli: Multiple classifier systems for robust classifier design in adversarial environments. In: International Journal of Machine Learning and Cybernetics. 1, Nr. 1–4, 2010, ISSN 1868-8071, S. 27–41. doi:10.1007/s13042-010-0007-7.
- ↑ a b Michael Brückner, Christian Kanzow, Tobias Scheffer: Static Prediction Games for Adversarial Learning Problems. In: Journal of Machine Learning Research. 13, Nr. Sep, 2012, ISSN 1533-7928, S. 2617–2654.
- ↑ Giovanni Apruzzese, Mauro Andreolini, Luca Ferretti, Mirco Marchetti, Michele Colajanni: Modeling Realistic Adversarial Attacks against Network Intrusion Detection Systems. In: Digital Threats: Research and Practice. 3. Juni 2021, ISSN 2692-1626. doi:10.1145/3469659.
- ↑ a b João Vitorino, Nuno Oliveira, Isabel Praça: Adaptative Perturbation Patterns: Realistic Adversarial Learning for Robust Intrusion Detection. In: Future Internet. 14, Nr. 4, March 2022, ISSN 1999-5903, S. 108. doi:10.3390/fi14040108.
- ↑ a b Ricardo N. Rodrigues, Lee Luan Ling, Venu Govindaraju: Robustness of multimodal biometric fusion methods against spoof attacks. In: Journal of Visual Languages & Computing. 20, Nr. 3, 1 June 2009, ISSN 1045-926X, S. 169–179. doi:10.1016/j.jvlc.2009.01.010.
- ↑ Jiawei Su, Danilo Vasconcellos Vargas, Kouichi Sakurai: One Pixel Attack for Fooling Deep Neural Networks. In: IEEE Transactions on Evolutionary Computation. 23, Nr. 5, October 2019, ISSN 1941-0026, S. 828–841. arxiv:1710.08864. doi:10.1109/TEVC.2019.2890858.
- ↑ Single pixel change fools AI programs. In: BBC News, 3 November 2017. Abgerufen im 12 February 2018.
- ↑ Vorlage:Cite arXiv
- ↑ Vorlage:Cite magazine
- ↑ Zhenglong Zhou, Chaz Firestone: Humans can decipher adversarial images. In: Nature Communications. 10, Nr. 1, 2019. arxiv:1809.04120. bibcode:2019NatCo..10.1334Z. doi:10.1038/s41467-019-08931-6. PMID 30902973. PMC 6430776 (freier Volltext).
- ↑ Anant Jain: Breaking neural networks with adversarial attacks – Towards Data Science (Englisch) In: Medium . 9. Februar 2019. Abgerufen am 15. Juli 2019.
- ↑ Evan Ackerman: Slight Street Sign Modifications Can Completely Fool Machine Learning Algorithms (Englisch) In: IEEE Spectrum: Technology, Engineering, and Science News . 4. August 2017. Abgerufen am 15. Juli 2019.
- ↑ Vorlage:Cite magazine
- ↑ Model Hacking ADAS to Pave Safer Roads for Autonomous Vehicles (Amerikanisches Englisch) In: McAfee Blogs . 19. Februar 2020. Abgerufen am 11. März 2020.
- ↑ Vorlage:Cite magazine
- ↑ a b c Douglas Heaven: Why deep-learning AIs are so easy to fool. In: Nature. 574, Nr. 7777, October 2019, S. 163–166. bibcode:2019Natur.574..163H. doi:10.1038/d41586-019-03013-5. PMID 31597977.
- ↑ Matthew Hutson: AI can now defend itself against malicious messages hidden in speech. In: Nature. 10 May 2019. doi:10.1038/d41586-019-01510-1. PMID 32385365.
- ↑ Vorlage:Cite arXiv
- ↑ Vorlage:Cite arXiv
- ↑ D. B. Skillicorn. "Adversarial knowledge discovery". IEEE Intelligent Systems, 24:54–61, 2009.
- ↑ a b B. Biggio, G. Fumera, and F. Roli. "Pattern recognition systems under attack: Design issues and research challenges". Int'l J. Patt. Recogn. Artif. Intell., 28(7):1460002, 2014.
- ↑ a b Marco Barreno, Blaine Nelson, Anthony D. Joseph, J. D. Tygar: The security of machine learning. In: Machine Learning. 81, Nr. 2, 2010, S. 121–148. doi:10.1007/s10994-010-5188-5.
- ↑ Leslie F. Sikos: AI in Cybersecurity (= Intelligent Systems Reference Library), Band 151. Springer, Cham 2019, ISBN 978-3-319-98841-2, S. 50, doi:10.1007/978-3-319-98842-9.
- ↑ a b c B. Biggio, G. Fumera, and F. Roli. "Security evaluation of pattern classifiers under attack". IEEE Transactions on Knowledge and Data Engineering, 26(4):984–996, 2014.
- ↑ a b c d e Battista Biggio, Igino Corona, Blaine Nelson, Benjamin I. P. Rubinstein, Davide Maiorca, Giorgio Fumera, Giorgio Giacinto, Fabio Roli: Security Evaluation of Support Vector Machines in Adversarial Environments. In: Support Vector Machines Applications (en). Springer International Publishing, 2014, ISBN 978-3-319-02300-7, S. 105–153, arxiv:1401.7727, doi:10.1007/978-3-319-02300-7_4.
- ↑ Kai Heinrich, Johannes Graf, Ji Chen, Jakob Laurisch, Patrick Zschech: FOOL ME ONCE, SHAME ON YOU, FOOL ME TWICE, SHAME ON ME: A TAXONOMY OF ATTACK AND DE-FENSE PATTERNS FOR AI SECURITY. In: ECIS 2020 Research Papers. 15. Juni 2020.
- ↑ Facebook removes 15 Billion fake accounts in two years (Britisches Englisch) In: Tech Digest . 27. September 2021. Abgerufen am 8. Juni 2022.
- ↑ Facebook removed 3 billion fake accounts in just 6 months (Amerikanisches Englisch) In: New York Post . 23. Mai 2019. Abgerufen am 8. Juni 2022.
- ↑ Avi Schwarzschild, Micah Goldblum, Arjun Gupta, John P. Dickerson, Tom Goldstein: Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks. In: PMLR (Hrsg.): International Conference on Machine Learning. 1. Juli 2021, S. 9389–9398.
- ↑ B. Biggio, B. Nelson, and P. Laskov. "Support vector machines under adversarial label noise". In Journal of Machine Learning Research – Proc. 3rd Asian Conf. Machine Learning, volume 20, pp. 97–112, 2011.
- ↑ M. Kloft and P. Laskov. "Security analysis of online centroid anomaly detection". Journal of Machine Learning Research, 13:3647–3690, 2012.
- ↑ Ilja Moisejevs: Poisoning attacks on Machine Learning – Towards Data Science (Englisch) In: Medium . 15. Juli 2019. Abgerufen am 15. Juli 2019.
- ↑ Gilad Baruch, Moran Baruch, Yoav Goldberg: A Little Is Enough: Circumventing Defenses For Distributed Learning. In: Curran Associates, Inc. (Hrsg.): Advances in Neural Information Processing Systems. 32, 2019.
- ↑ El-Mahdi El-Mhamdi, Rachid Guerraoui, Arsany Guirguis, Lê-Nguyên Hoang, Sébastien Rouault: Genuinely distributed Byzantine machine learning. In: Distributed Computing. 26. Mai 2022, ISSN 1432-0452. doi:10.1007/s00446-022-00427-9.
- ↑ S. Goldwasser, Michael P. Kim, V. Vaikuntanathan, Or Zamir: Planting Undetectable Backdoors in Machine Learning Models. In: ArXiv. 2022. doi:10.48550/arXiv.2204.06974.
- ↑ a b Peva Blanchard, El Mahdi El Mhamdi, Rachid Guerraoui, Julien Stainer: Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent. In: Curran Associates, Inc. (Hrsg.): Advances in Neural Information Processing Systems. 30, 2017.
- ↑ Lingjiao Chen, Hongyi Wang, Zachary Charles, Dimitris Papailiopoulos: DRACO: Byzantine-resilient Distributed Training via Redundant Gradients. In: PMLR (Hrsg.): International Conference on Machine Learning. 3. Juli 2018, S. 903–912.
- ↑ El Mahdi El Mhamdi, Rachid Guerraoui, Sébastien Rouault: The Hidden Vulnerability of Distributed Learning in Byzantium. In: PMLR (Hrsg.): International Conference on Machine Learning. 3. Juli 2018, S. 3521–3530.
- ↑ Zeyuan Allen-Zhu, Faeze Ebrahimianghazani, Jerry Li, Dan Alistarh: Byzantine-Resilient Non-Convex Stochastic Gradient Descent. 28. September 2020.
- ↑ El Mahdi El Mhamdi, Rachid Guerraoui, Sébastien Rouault: Distributed Momentum for Byzantine-resilient Stochastic Gradient Descent. 28. September 2020.
- ↑ Deepesh Data, Suhas Diggavi: Byzantine-Resilient High-Dimensional SGD with Local Iterations on Heterogeneous Data. In: PMLR (Hrsg.): International Conference on Machine Learning. 1. Juli 2021, S. 2478–2488.
- ↑ Sai Praneeth Karimireddy, Lie He, Martin Jaggi: Byzantine-Robust Learning on Heterogeneous Datasets via Bucketing. 29. September 2021.
- ↑ B. Nelson, B. I. Rubinstein, L. Huang, A. D. Joseph, S. J. Lee, S. Rao, and J. D. Tygar. "Query strategies for evading convex-inducing classifiers". J. Mach. Learn. Res., 13:1293–1332, 2012
- ↑ How to steal modern NLP systems with gibberish? (Englisch) In: cleverhans-blog . 6. April 2020. Abgerufen am 15. Oktober 2020.
- ↑ a b c d e f g h Vorlage:Cite arXiv
- ↑ a b Ben Dickson: Machine learning: What are membership inference attacks? (Amerikanisches Englisch) In: TechTalks . 23. April 2021. Abgerufen am 7. November 2021.
- ↑ Matthew Jagielski, Alina Oprea, Battista Biggio, Chang Liu, Cristina Nita-Rotaru, Bo Li: Manipulating Machine Learning: Poisoning Attacks and Countermeasures for Regression Learning. In: IEEE (Hrsg.): 2018 IEEE Symposium on Security and Privacy (SP). May 2018, S. 19–35. arxiv:1804.00308. doi:10.1109/sp.2018.00057.
- ↑ Attacking Machine Learning with Adversarial Examples (Englisch) In: OpenAI . 24. Februar 2017. Abgerufen am 15. Oktober 2020.
- ↑ Vorlage:Cite arXiv
- ↑ Michael Veale, Reuben Binns, Lilian Edwards: Algorithms that remember: model inversion attacks and data protection law. In: Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences. 376, Nr. 2133, 28. November 2018, ISSN 1364-503X. arxiv:1807.04644. bibcode:2018RSPTA.37680083V. doi:10.1098/rsta.2018.0083. PMID 30322998. PMC 6191664 (freier Volltext).
- ↑ Vorlage:Cite arXiv
- ↑ a b Vorlage:Cite arXiv
- ↑ Vorlage:Cite arXiv
- ↑ Vorlage:Cite arXiv
- ↑ Vorlage:Cite arXiv
- ↑ Sensen Guo, Jinxiong Zhao, Xiaoyu Li, Junhong Duan, Dejun Mu, Xiao Jing: A Black-Box Attack Method against Machine-Learning-Based Anomaly Network Flow Detection Models. In: Security and Communication Networks. 2021, 24. April 2021, ISSN 1939-0114, S. e5578335. doi:10.1155/2021/5578335.
- ↑ a b Joao Gomes: Adversarial Attacks and Defences for Convolutional Neural Networks (Englisch) In: Onfido Tech . 17. Januar 2018. Abgerufen am 23. Oktober 2021.
- ↑ a b c d e f g Maksym Andriushchenko, Francesco Croce, Nicolas Flammarion, Matthias Hein: Square Attack: A Query-Efficient Black-Box Adversarial Attack via Random Search. In: Springer International Publishing (Hrsg.): Computer Vision – ECCV 2020. 12368, Cham, 2020, S. 484–501. arxiv:1912.00049. doi:10.1007/978-3-030-58592-1_29.
- ↑ a b c d e f g h HopSkipJumpAttack: A Query-Efficient Decision-Based Attack (in en), 2019
- ↑ Vorlage:Cite arXiv
- ↑ Black-box decision-based attacks on images (Englisch) In: KejiTech . 21. Juni 2020. Abgerufen am 25. Oktober 2021.
- ↑ a b Vorlage:Cite arXiv
- ↑ a b Ken Tsui: Perhaps the Simplest Introduction of Adversarial Examples Ever (Englisch) In: Medium . 22. August 2018. Abgerufen am 24. Oktober 2021.
- ↑ a b Adversarial example using FGSM | TensorFlow Core (Englisch) In: TensorFlow . Abgerufen am 24. Oktober 2021.
- ↑ a b Vorlage:Cite arXiv
- ↑ carlini wagner attack. In: richardjordan.com . Abgerufen am 23. Oktober 2021.
- ↑ Mike Plotz: Paper Summary: Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods (Englisch) In: Medium . 26. November 2018. Abgerufen am 23. Oktober 2021.
- ↑ Zahid Akhtar, Dipankar Dasgupta: Determining Sequence of Image Processing Technique (IPT) to Detect Adversarial Attacks. In: SN Computer Science. 2, Nr. 5, 2021, ISSN 2662-995X. arxiv:2007.00337. doi:10.1007/s42979-021-00773-8.
- ↑ Christophe Feltus: LogicGAN–based Data Augmentation Approach to Improve Adversarial Attack DNN Classifiers. In: Proceedings of the 2021 International Conference on Computational Science and Computational Intelligence (CSCI). December 2021.
- ↑ O. Dekel, O. Shamir, and L. Xiao. "Learning to classify with missing and corrupted features". Machine Learning, 81:149–178, 2010.
- ↑ Wei Liu, Sanjay Chawla: Mining adversarial patterns via regularized loss minimization. In: Machine Learning. 81, 2010, S. 69–83. doi:10.1007/s10994-010-5199-2.
- ↑ B. Biggio, G. Fumera, and F. Roli. "Evade hard multiple classifier systems". In O. Okun and G. Valentini, editors, Supervised and Unsupervised Ensemble Methods and Their Applications, volume 245 of Studies in Computational Intelligence, pages 15–38. Springer Berlin / Heidelberg, 2009.
- ↑ B. I. P. Rubinstein, P. L. Bartlett, L. Huang, and N. Taft. "Learning in a large function space: Privacy- preserving mechanisms for svm learning". Journal of Privacy and Confidentiality, 4(1):65–100, 2012.
- ↑ M. Kantarcioglu, B. Xi, C. Clifton. "Classifier Evaluation and Attribute Selection against Active Adversaries". Data Min. Knowl. Discov., 22:291–335, January 2011.
- ↑ Aneesh Chivukula, Xinghao Yang, Wei Liu, Tianqing Zhu, Wanlei Zhou: Game Theoretical Adversarial Deep Learning with Variational Adversaries. In: IEEE Transactions on Knowledge and Data Engineering. 33, Nr. 11, 2020, ISSN 1558-2191, S. 3568–3581. doi:10.1109/TKDE.2020.2972320.
- ↑ Aneesh Sreevallabh Chivukula, Wei Liu: Adversarial Deep Learning Models with Multiple Adversaries. In: IEEE Transactions on Knowledge and Data Engineering. 31, Nr. 6, 2019, ISSN 1558-2191, S. 1066–1079. doi:10.1109/TKDE.2018.2851247.
- ↑ TrojAI. In: www.iarpa.gov . Abgerufen am 14. Oktober 2020.
- ↑ H. Xiao, B. Biggio, B. Nelson, H. Xiao, C. Eckert, and F. Roli. "Support vector machines under adversarial label contamination". Neurocomputing, Special Issue on Advances in Learning with Label Noise, In Press.