CS 201: Robust Reinforcement Learning with Langevin Dynamics, VOLKAN CEVHER, EPFL

Speaker: Volkan Cevher
Affiliation: EPFL

ABSTRACT:

In this talk, I will discuss principled ways of solving a classical reinforcement learning (RL) problem and introduce its robust variant.

In particular, we recast the exploration-exploitation trade-off in RL as an instance of a distribution sampling problem in infinite dimensions. Using Stochastic Gradient Langevin Dynamics (SGLD), a powerful sampling method, we propose a new RL algorithm that results in a sampling variant of the Twin Delayed Deep Deterministic Policy Gradient (TD3) method. In several MuJoCo environments, our algorithm consistently outperforms existing TD3 exploration strategies based on heuristic noise injection.
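The abstract does not spell out the update rule, so as a hedged illustration here is the standard SGLD iteration, theta <- theta - eta * grad U(theta) + sqrt(2 * eta / beta) * xi with xi ~ N(0, 1), which targets a density proportional to exp(-beta * U). It runs on a toy one-dimensional potential, not on TD3 policy parameters; the potential, noise level, step size, and all function names are hypothetical choices for this sketch.

```python
import math
import random


def sgld_sample(grad_u, theta0, step_size, inverse_temp, num_steps, seed=0):
    """Run SGLD on a scalar parameter and return the trajectory.

    grad_u(theta, rng) returns a (possibly noisy) estimate of dU/dtheta, where the
    target density is proportional to exp(-inverse_temp * U(theta)).
    """
    rng = random.Random(seed)
    theta = theta0
    samples = []
    noise_scale = math.sqrt(2.0 * step_size / inverse_temp)
    for _ in range(num_steps):
        # Gradient step on the potential plus injected Gaussian (Langevin) noise
        theta = theta - step_size * grad_u(theta, rng) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


def noisy_grad_quadratic(theta, rng):
    # Hypothetical toy potential U(theta) = theta**2 / 2 (standard Gaussian target),
    # with small additive noise to mimic a minibatch gradient estimate
    return theta + 0.1 * rng.gauss(0.0, 1.0)


samples = sgld_sample(noisy_grad_quadratic, theta0=3.0, step_size=0.05,
                      inverse_temp=1.0, num_steps=50_000)
kept = samples[1_000:]  # discard burn-in
mean = sum(kept) / len(kept)
var = sum((x - mean) ** 2 for x in kept) / len(kept)
print(f"mean ~ {mean:.2f}, variance ~ {var:.2f}")  # roughly 0 and 1 for this toy target
```

The injected noise is what turns plain gradient descent into a sampler: without it the iterates would collapse to the minimizer theta = 0 instead of spreading out according to the target distribution.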

The sampling perspective enables us to introduce an action-robust variant of the RL objective, which is a particular case of a zero-sum, two-player Markov game. In this setting, at each step of the game, both players simultaneously choose an action. The reward each player receives after one step depends on the state and on the convex combination of both players' actions. Building on our earlier work (SGLD for min-max/GAN problems), we propose a new robust RL algorithm with a convergence guarantee and provide numerical evidence supporting it. Finally, I will also discuss future directions for applying the framework to self-play in games.
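To make the convex-combination setting concrete, here is a hedged toy sketch of one step of such a game: the executed action is (1 - alpha) * a_agent + alpha * a_adversary, and the agent picks the action that maximizes its worst-case reward. This is a brute-force pure-strategy maximin over a small action grid, not the Langevin-based algorithm from the talk; the reward function, state value, grid, and alpha are all hypothetical.

```python
def mixed_action(agent_action, adversary_action, alpha):
    """Convex combination of both players' actions; alpha in [0, 1] is the adversary's weight."""
    return (1.0 - alpha) * agent_action + alpha * adversary_action


def toy_reward(state, action):
    # Hypothetical reward: penalize the distance between the executed action and the state
    return -(action - state) ** 2


def robust_action(state, actions, alpha):
    """Agent maximizes its worst-case reward over adversary actions (pure-strategy maximin)."""
    best_action, best_value = None, float("-inf")
    for a_agent in actions:
        worst = min(toy_reward(state, mixed_action(a_agent, a_adv, alpha)) for a_adv in actions)
        if worst > best_value:
            best_action, best_value = a_agent, worst
    return best_action, best_value


actions = [-1.0 + 0.125 * k for k in range(17)]  # grid on [-1, 1]
print(robust_action(state=0.375, actions=actions, alpha=0.25))  # (0.5, -0.0625)
```

With alpha = 0 the agent would simply choose the action 0.375 that matches the state; with alpha = 0.25 it overshoots to 0.5 so that the adversary's push in either direction costs the same, which is the kind of robustness the action-robust objective is meant to capture.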

BIO:

Volkan Cevher received the B.Sc. (valedictorian) in electrical engineering from Bilkent University in Ankara, Turkey, in 1999 and the Ph.D. in electrical and computer engineering from the Georgia Institute of Technology in Atlanta, GA, in 2005. He was a Research Scientist with the University of Maryland, College Park, from 2006 to 2007 and with Rice University in Houston, TX, from 2008 to 2009. Currently, he is an Associate Professor at the Swiss Federal Institute of Technology Lausanne and a Faculty Fellow in the Electrical and Computer Engineering Department at Rice University. His research interests include signal processing theory, machine learning, convex optimization, and information theory. Dr. Cevher is an ELLIS fellow and was the recipient of the Google Faculty Research Award on Machine Learning in 2018, the IEEE Signal Processing Society Best Paper Award in 2016, a Best Paper Award at CAMSAP in 2015, a Best Paper Award at SPARS in 2009, and an ERC Consolidator Grant (CG) in 2016 as well as an ERC Starting Grant (StG) in 2011.

Hosted by Professor Quanquan Gu

Date/Time:
February 4, 2020
4:15 pm - 5:45 pm

Location:
3400 Boelter Hall
420 Westwood Plaza Los Angeles California 90095