CS 201: Feature Purification: How Adversarial Training Performs Robust Deep Learning, YUANZHI LI, Carnegie Mellon University

Speaker: Yuanzhi Li
Affiliation: Carnegie Mellon University

ABSTRACT:

Despite the great empirical success of adversarial training in defending deep learning models against adversarial perturbations, it remains rather unclear what principles underlie the existence of adversarial perturbations, and what adversarial training does to the neural network to remove them.

In this paper, we present a principle that we call “feature purification”, where we show that one of the causes of the existence of adversarial examples is the accumulation of certain small “dense mixtures” in the hidden weights during the training process of a neural network. Moreover, we show that one of the goals of adversarial training is to remove such small mixtures to “purify” the hidden weights, making the network (much) more robust.
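To make the adversarial training referenced above concrete, the following is a minimal sketch of one standard instantiation (an l_inf-bounded PGD inner maximization with hypothetical hyperparameters); the exact attack and training schedule used in the paper may differ.

```python
# Minimal sketch of l_inf PGD adversarial training (one common instantiation;
# hyperparameters below are illustrative, not the paper's exact setup).
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Inner maximization: find an l_inf-bounded perturbation that
    (approximately) maximizes the classification loss."""
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()  # ascend the loss
            delta.clamp_(-eps, eps)             # project back into the l_inf ball
        delta.grad.zero_()
    return delta.detach()

def adversarial_training_step(model, optimizer, x, y, eps=8/255):
    """Outer minimization: one gradient step on the worst-case perturbed inputs."""
    delta = pgd_attack(model, x, y, eps=eps)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x + delta), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```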

We present both experiments on standard vision datasets to illustrate this principle, and a theoretical result proving that for certain natural classification tasks, training a two-layer neural network with ReLU activation using randomly initialized gradient descent indeed satisfies this principle.
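For reference, a two-layer ReLU network with random Gaussian initialization of the kind analyzed in the theoretical result might look like the sketch below (the width, initialization scale, and class names here are hypothetical choices, not the paper's exact scaling).

```python
# A minimal two-layer ReLU network with randomly initialized weights, to be
# trained either with plain gradient descent or with the adversarial training
# step sketched above. Hyperparameters are illustrative only.
import torch
import torch.nn as nn

class TwoLayerReLU(nn.Module):
    def __init__(self, d_in, width, n_classes, init_std=0.1):
        super().__init__()
        self.hidden = nn.Linear(d_in, width)
        self.out = nn.Linear(width, n_classes)
        nn.init.normal_(self.hidden.weight, std=init_std)  # random initialization
        nn.init.normal_(self.out.weight, std=init_std)

    def forward(self, x):
        return self.out(torch.relu(self.hidden(x)))

# Comparing the hidden-layer weights of this network after clean training versus
# after adversarial training is one way to visualize the "purification" effect.
```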

Moreover, our result sheds light on why a neural network trained on the original data set can learn well-generalizing but non-robust features, and how adversarial training can further robustify these features.

BIO:

Yuanzhi Li is an assistant professor in the Machine Learning Department at CMU. He did his Ph.D. at Princeton (2014-2018), advised by Sanjeev Arora, and spent a year as a postdoc at Stanford. His wife is Yandi Jin.

Hosted by Professor Quanquan Gu

Date/Time:
Mar 02, 2021
4:00 pm - 5:45 pm

Location:
Zoom Webinar
404 Westwood Plaza, Los Angeles