CS 201 | SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs, KARTHIK SRIDHARAN, Cornell University

Speaker: Karthik Sridharan
Affiliation: Cornell University

ABSTRACT:

The multi-epoch, small-batch Stochastic Gradient Descent (SGD) algorithm has been the method of choice for learning with large over-parameterized models. However, our theoretical understanding of the unreasonable effectiveness of SGD over a wide range of convex and non-convex problems in practice is still lacking. There has been a slew of work aiming to provide theoretical insights into why SGD works. In this talk we will examine these proposed theories. We will specifically look at the limitations of implicit regularization as a general explanation for the success of SGD, considering perhaps the most widely studied setting for SGD, Stochastic Convex Optimization (SCO). We will also explore the role of batch size and multiple epochs in the success of SGD. Concretely, I will discuss the following results:

1. We will demonstrate an SCO problem with a strict separation between SGD and Regularized Empirical Risk Minimizers (RERM), for any choice of regularizer, thus showing that implicit regularization cannot, in general, explain the success of SGD.

2. We will show a sample-complexity separation between SGD and Full-Batch Gradient Descent on the training loss, irrespective of step-size choice and early stopping.

3. We will propose a simple validated multi-epoch SGD algorithm that is guaranteed to perform at least as well as single-pass SGD, and we will show that there are problems on which this algorithm can far outperform single-pass SGD (see the sketch after this list).
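To make item 3 concrete, here is a minimal sketch of the validated multi-epoch idea: run SGD for several epochs with iterate averaging, score the averaged iterate on a held-out validation split after each epoch, and return the best-scoring candidate. This is an illustrative reconstruction, not the exact algorithm from the talk; the function names (`validated_multiepoch_sgd`, `grad`, `loss`), the 1/sqrt(t) step size, and the specific validation split are all assumptions of the sketch.

```python
import numpy as np

def validated_multiepoch_sgd(grad, loss, data, w0,
                             lr=0.1, epochs=5, val_frac=0.2, seed=0):
    """Illustrative sketch (not the talk's exact algorithm).

    Runs multi-epoch SGD with iterate averaging; after each epoch the
    averaged iterate is scored on a held-out validation split, and the
    best-scoring candidate is returned. Since the first candidate is the
    one-epoch (single-pass) average, the returned iterate is at least as
    good as single-pass SGD, up to validation error.
    """
    rng = np.random.default_rng(seed)
    data = list(data)
    n_val = max(1, int(val_frac * len(data)))
    val, train = data[:n_val], data[n_val:]

    w = np.asarray(w0, dtype=float).copy()
    avg = w.copy()                              # running average of iterates
    t = 0
    best_w, best_val = avg.copy(), np.inf
    for _ in range(epochs):
        for i in rng.permutation(len(train)):   # fresh shuffle each epoch
            t += 1
            w = w - (lr / np.sqrt(t)) * grad(w, train[i])
            avg += (w - avg) / t                # update running average
        epoch_val = np.mean([loss(avg, z) for z in val])
        if epoch_val < best_val:                # keep best-validated candidate
            best_w, best_val = avg.copy(), epoch_val
    return best_w
```

For instance, with samples z = (x, y), loss(w, z) = (x @ w - y)**2 and grad(w, z) = 2 * (x @ w - y) * x, this runs the sketch on least squares. The selection step is what makes extra epochs safe: a later-epoch iterate is adopted only if it actually validates better than the single-pass one.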

I will further discuss the extension of some of these results to simple deep learning models.

BIO:

Karthik Sridharan is currently an Associate Professor in the Computer Science Department at Cornell University. His research interests span the theory of machine learning, stochastic optimization, online learning, reinforcement learning, and learning aspects of game theory. He is the recipient of a Sloan Research Fellowship and an NSF CAREER Award. His work has received two best paper awards and two best student paper awards at COLT.

Hosted by Professor Quanquan Gu


Location: Via Zoom Webinar

Date/Time: Nov 09, 2021, 4:00 pm - 5:45 pm
