CS 201: Near-optimal Algorithms for Imitation Learning, JIANTAO JIAO, UC Berkeley

Speaker: Jiantao Jiao
Affiliation: UC Berkeley

ABSTRACT:

We study the fundamental limits of, and efficient algorithms for, imitation learning in Markov decision processes (MDPs). It is known that Behavior Cloning (BC) incurs suboptimality that grows quadratically with the horizon, a phenomenon termed error compounding in the literature. However, the only existing lower bound for BC concerns optimization error, and it is not clear whether error compounding is a fundamental phenomenon tied to sample inefficiency. We show that when the MDP transition function is unknown, every algorithm must suffer suboptimality that grows quadratically with the horizon, even if the optimization error is zero and the algorithm can interactively query the expert, as in the DAgger setting. This negative result suggests that overcoming error compounding requires imposing additional assumptions in imitation learning. We present two approaches to mitigating the quadratic error compounding: active queries with recoverable mistakes, and known transitions. First, we establish that if the expert knows how to recover from a temporary mistake and continue along the optimal path, then a carefully designed no-regret learning algorithm that actively queries the expert can provably beat the quadratic error compounding and achieve suboptimality that grows only linearly in the horizon. Second, we turn the intuition that transition knowledge should be used to visit only states the expert has visited into a precise optimization problem named Mimic-MD, and introduce a novel technique, artificial trajectory synthesis, to provably break the quadratic dependence on the horizon and improve the exponent to 3/2. Finally, we show that under the additional assumption that the expert is optimal for the true reward function in the known-transition setting, the exponent can be further improved from 3/2 to 1 via two new algorithms in some special cases of MDPs. One of the two new algorithms is intimately connected to inverse reinforcement learning and demonstrates the provable benefit of running inverse reinforcement learning for imitation learning. We implement multiple instantiations of our approach on several continuous control tasks and find that we are able to significantly improve policy performance across a variety of dataset sizes.
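For readers unfamiliar with the baseline the abstract starts from, below is a minimal sketch of tabular Behavior Cloning: the learner imitates the empirically most frequent expert action at every visited (timestep, state) pair and acts arbitrarily elsewhere. The trajectory format, the helper name behavior_cloning, and the random fallback at unvisited states are illustrative assumptions, not the speaker's implementation.

```python
from collections import Counter, defaultdict
import random

def behavior_cloning(expert_trajectories, actions, horizon):
    """Tabular Behavior Cloning sketch (illustrative only).

    expert_trajectories: list of trajectories, each a list of (state, action)
                         pairs of length `horizon`.
    actions:             list of available actions.
    Returns a time-dependent deterministic policy act(t, state) -> action.
    """
    # Count the expert's action choices at every (timestep, state) pair.
    counts = defaultdict(Counter)
    for traj in expert_trajectories:
        for t, (state, action) in enumerate(traj):
            counts[(t, state)][action] += 1

    # Imitate the most frequent expert action at states the expert visited.
    policy = {key: action_counts.most_common(1)[0][0]
              for key, action_counts in counts.items()}

    def act(t, state):
        # At unvisited states BC has no guidance and picks an arbitrary
        # (here random) action; mistakes made there can persist for the
        # remaining horizon, which is the source of the quadratic-in-horizon
        # error compounding discussed in the abstract.
        return policy.get((t, state), random.choice(actions))

    return act
```

The policy is indexed by timestep as well as state because the guarantees discussed in the talk are for finite-horizon, non-stationary MDPs; the states the expert never visited are exactly where the known-transition and active-query approaches in the abstract gain over this baseline.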

BIO:

Jiantao Jiao is an Assistant Professor in the Department of Electrical Engineering and Computer Sciences and Department of Statistics at the University of California, Berkeley. He co-directs the Center for the Theoretical Foundations of Learning, Inference, Information, Intelligence, Mathematics, and Microeconomics at Berkeley (CLIMB) and is a member of the Berkeley Artificial Intelligence Research (BAIR) Lab and the Berkeley Laboratory for Information and System Sciences (BLISS). He received his Ph.D. from Stanford University in 2018. He is a recipient of the Presidential Award of Tsinghua University and the Stanford Graduate Fellowship. He was a semi-plenary speaker at ISIT 2015 and a co-recipient of the MobiHoc 2019 best paper award. His research interests are in statistical machine learning, high-dimensional and nonparametric statistics, mathematical programming, information theory, and their applications in robotics, natural language processing, computer vision, distributed systems, and security.

Hosted by Professor Quanquan Gu

Date/Time:
Date(s) - Apr 21, 2022
4:00 pm - 5:45 pm

Location:
Zoom Webinar
404 Westwood Plaza, Los Angeles