Adversarial machine learning is the study of attacks on machine learning algorithms and of the defenses against them. Most adversarial attacks aim to reduce the image classification accuracy of computer vision models. My research has focused on adversarial attacks in natural language processing. Below is a project I worked on with my fellow student Suket Shah, now at Google. Motivated by Eric Wallace's work on "Universal Adversarial Triggers for Attacking and Analyzing NLP", we developed a modified beam search to identify sequences of words that decrease the accuracy of the ELECTRA transformer model for question answering on the Stanford Question Answering Dataset (SQuAD).
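For readers unfamiliar with the idea, a plain beam search over trigger words can be sketched roughly as follows. This is only an illustration and not the modified search from the project: `beam_search_trigger`, `loss_fn`, `toy_loss`, and the tiny vocabulary are hypothetical stand-ins, and in a real attack the score would come from the question answering model's loss on a batch of examples with the trigger inserted.

```python
from typing import Callable, List, Sequence, Tuple

Trigger = Tuple[str, ...]


def beam_search_trigger(
    vocab: Sequence[str],
    loss_fn: Callable[[Trigger], float],
    trigger_len: int = 3,
    beam_width: int = 2,
) -> Tuple[Trigger, float]:
    """Return the highest-loss trigger found by a plain beam search.

    loss_fn is an assumed callable: higher values mean the trigger
    hurts the (hypothetical) model more.
    """
    # Start from a single empty trigger.
    beams: List[Tuple[Trigger, float]] = [((), 0.0)]
    for _ in range(trigger_len):
        # Extend every kept prefix by every vocabulary word and score it.
        candidates = [
            (prefix + (word,), loss_fn(prefix + (word,)))
            for prefix, _ in beams
            for word in vocab
        ]
        # Keep only the beam_width highest-loss candidates.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0]


def toy_loss(trigger: Trigger) -> float:
    """Toy stand-in for a model loss: a fixed per-word weight, summed."""
    weights = {"why": 2.0, "because": 1.5, "the": 0.1, "to": 0.3}
    return sum(weights[w] for w in trigger)


if __name__ == "__main__":
    vocab = ["why", "because", "the", "to"]
    trigger, loss = beam_search_trigger(vocab, toy_loss, trigger_len=2, beam_width=2)
    print(trigger, loss)
```

Keeping several candidate prefixes at each step, rather than only the single best one, lets the search recover word sequences that a purely greedy choice would miss.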

Examination of Universal Trigger Robustness.pdf