Speaker: Philip Long
The singular values of the linear layers in a neural network capture the extent to which those layers amplify or attenuate signals, so they are key to understanding exploding and vanishing gradients. This talk presents a characterization of the singular values of the linear transformation associated with a standard 2D multi-channel convolutional layer. The characterization enables efficient computation of these singular values and also leads to an algorithm for projecting a convolutional layer onto an operator-norm ball. This projection is an effective regularizer: for example, it improves the test error of a deep residual network using batch normalization on CIFAR-10 from 6.2% to 5.3%. The characterization can also be used to prove generalization bounds that capture the benefit of the implicit weight-tying of convolutional layers.
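For circular (wrap-around) convolution, the characterization says the singular values of the layer are the union, over all spatial frequency pairs, of the singular values of small per-frequency transfer matrices obtained from the 2D FFT of the kernel. A minimal NumPy sketch under that assumption; the kernel layout `(k, k, c_in, c_out)` and the function name `conv_singular_values` are illustrative choices, not fixed by the talk:

```python
import numpy as np

def conv_singular_values(kernel, input_shape):
    """Singular values of a circular 2D multi-channel conv layer.

    kernel: array of shape (k, k, c_in, c_out)
    input_shape: (H, W) spatial size of the input feature map

    Zero-pad the kernel to (H, W), take the 2D FFT over the spatial
    axes, and for each of the H*W frequency pairs compute the singular
    values of the resulting c_in x c_out transfer matrix. Their union
    is the set of singular values of the full layer.
    """
    transfer = np.fft.fft2(kernel, s=input_shape, axes=(0, 1))
    # Batched SVD over the trailing (c_in, c_out) matrix at each frequency.
    return np.linalg.svd(transfer, compute_uv=False)
```

The cost is one FFT plus `H*W` small SVDs, which is far cheaper than forming the full `(H*W*c_out) x (H*W*c_in)` matrix of the layer and decomposing it directly.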
This is joint work with Hanie Sedghi and Vineet Gupta.
Hosted by Professor Quanquan Gu
Date: Nov 21, 2019
4:15 pm - 5:45 pm
3400 Boelter Hall
420 Westwood Plaza, Los Angeles California 90095