CS 201: The Emergence Theory of Representation Learning, STEFANO SOATTO, UCLA | Computer Science Department

Speaker: Stefano Soatto
Affiliation: UCLA | Computer Science

ABSTRACT:

Representations are functions of the data that are useful for a task while being invariant to nuisance variability affecting the data. Optimal representations are as informative as possible (sufficient) while being as insensitive as possible (invariant) to nuisances. The notion of sufficient invariants is closely related to a variational principle known as the Information Bottleneck (IB). Unfortunately, the IB cannot be computed, let alone optimized, for inductive inference, since we do not have access to the test data. In other words, the IB is not useful for learning.
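For concreteness, the classical IB objective of Tishby, Pereira, and Bialek, for data x, task variable y, and representation z, trades off minimality against sufficiency (the notation below is mine, not necessarily that of the talk):

\[
\mathcal{L}_{\mathrm{IB}} \;=\; I(x;z) \;-\; \beta\, I(z;y),
\]

where minimizing I(x;z) discards nuisance variability (invariance, or minimality), maximizing I(z;y) preserves what the task needs (sufficiency), and \beta trades off the two. Both mutual-information terms are defined with respect to the true data distribution, which is why the objective cannot be evaluated from a finite training set.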

I will instead describe a seemingly unrelated variational principle, called the Information Lagrangian (IL), that arises from maximizing generalization for a parametric model trained on a finite dataset via the PAC-Bayes bound. Unfortunately, the IL says nothing about the invariance and sufficiency of the resulting representation.
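As a sketch of the kind of objective involved (my notation and assumptions; the precise form is the subject of the talk), a PAC-Bayes-style bound on test error for weights w trained on a finite dataset D leads to a regularized training loss of the form

\[
\mathcal{L}(q) \;=\; \mathbb{E}_{w\sim q}\big[L_{\mathcal{D}}(w)\big] \;+\; \beta\, \mathrm{KL}\big(q(w\mid\mathcal{D})\,\|\,p(w)\big),
\]

where L_D(w) is the empirical (cross-entropy) loss on the training set, q(w|D) is a posterior over the weights, and the KL term measures how much information the weights retain about the training set, which controls the generalization gap. Unlike the IB, every term here is computable from the training data alone.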

The Emergence Bound connects the IL, which is computable from finite data but does not directly relate to desirable properties of the representation, with the IB, which formalizes the notion of an optimal representation but cannot be computed.

After introducing the Emergence Bound, I will show how the central concept of the theory is a notion of “Information in the Weights” of the trained model. Since a trained deep neural network is a deterministic function, classical (Shannon) information measures are degenerate. I will show how to construct a measure of “accessible information” for a deterministic classifier, such as a trained neural network.
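One way to make such a measure precise (a sketch under my own assumptions, not necessarily the construction presented in the talk) is to measure information at a finite resolution: perturb the trained weights and ask how far the perturbed weight distribution must stay from a prior for the loss not to degrade beyond a tolerance,

\[
C(\mathcal{D};\epsilon) \;=\; \min_{q(w\mid\mathcal{D})}\; \mathrm{KL}\big(q(w\mid\mathcal{D})\,\|\,p(w)\big)
\quad \text{subject to} \quad \mathbb{E}_{w\sim q}\big[L_{\mathcal{D}}(w)\big] \;\le\; \epsilon .
\]

This quantity is finite even though the trained network itself is deterministic, and for Gaussian perturbations around the trained weights it is governed by curvature-related quantities such as the Fisher information.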

The Information in the Weights makes it possible to define a topology on the space of learning tasks, so we can compute the distance between learning tasks, and predict how long it will take to fine-tune a model pre-trained on a different task, and how well it will perform, without actually fine-tuning it.
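As an illustration of how such a task distance might be computed in practice, here is a minimal, hypothetical sketch in the spirit of Fisher-embedding methods such as Task2Vec (Achille et al., 2019): each task is embedded by the diagonal of the empirical Fisher information of a probe network evaluated on that task's data, and tasks are compared by cosine distance. The function names and design choices below are mine, not the speaker's.

    # Hypothetical sketch: Fisher-embedding-style task distance (not the speaker's code).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def fisher_embedding(model: nn.Module, loader, device="cpu"):
        """Diagonal of the empirical Fisher information of `model` over a labeled
        dataset, flattened into one vector and used as a task embedding.
        Expects `loader` to yield one (x, y) example per batch, so squared
        gradients correspond to per-example terms of the empirical Fisher."""
        model = model.to(device)
        model.eval()
        fisher = [torch.zeros_like(p) for p in model.parameters()]
        n = 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            model.zero_grad()
            F.cross_entropy(model(x), y).backward()
            for f, p in zip(fisher, model.parameters()):
                if p.grad is not None:
                    f.add_(p.grad.detach() ** 2)  # accumulate per-example squared gradients
            n += 1
        return torch.cat([f.flatten() for f in fisher]) / max(n, 1)

    def task_distance(emb_a: torch.Tensor, emb_b: torch.Tensor) -> float:
        """Cosine distance between normalized Fisher embeddings of two tasks."""
        a = emb_a / (emb_a.sum() + 1e-12)
        b = emb_b / (emb_b.sum() + 1e-12)
        return 1.0 - F.cosine_similarity(a, b, dim=0).item()

To compare two classification tasks, one would reuse the same probe architecture for both, compute fisher_embedding on each task's data loader, and evaluate task_distance on the two embeddings; a small distance suggests that fine-tuning a model from one task to the other should transfer well.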

I will conclude by highlighting open problems and possible extensions of this framework, as well as some intriguing behavior of the learning dynamics suggesting that the transient, not the asymptotics, is where to look to understand deep learning.

BIO:

Stefano Soatto is Professor of Computer Science and Professor of Electrical Engineering at UCLA, where he is the founding director of the UCLA Vision Lab. He is also a Vice President at AWS, where he leads the AI Labs, which conduct research on AI applications in vision, speech, language, and vertical services.


Date/Time:
Date(s) - Oct 21, 2021
4:00 pm - 5:45 pm

Location:
Zoom Webinar
404 Westwood Plaza, Los Angeles