Speaker: Lihong Li
In many real-world applications of reinforcement learning (RL), such as healthcare, dialogue systems and robotics, running a new policy on humans or robots can be costly or risky. This gives rise to the critical need for off-policy estimation: estimating the average reward of a target policy given data previously collected by another policy. This talk will describe some recent advances in long- or even infinite-horizon off-policy estimation, where standard methods suffer from a variance that grows exponentially with the horizon (the “curse of horizon”).
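To see where the curse of horizon comes from, consider vanilla trajectory-wise importance sampling (IS), the standard method the abstract alludes to. The sketch below is illustrative only — the policies, action probabilities, and horizons are made up, not taken from the talk. The IS weight of a trajectory is the product of per-step ratios, so its variance compounds multiplicatively with the horizon:

```python
# Hedged illustration (not from the talk): variance of trajectory-wise
# importance-sampling weights under made-up two-action policies.

mu = {0: 0.5, 1: 0.5}   # behavior policy (collected the data)
pi = {0: 0.8, 1: 0.2}   # target policy we want to evaluate

# Per-step second moment of the ratio pi(a)/mu(a) under mu:
#   E_mu[(pi(a)/mu(a))^2] = sum_a pi(a)^2 / mu(a)
m2 = sum(pi[a] ** 2 / mu[a] for a in mu)   # 1.36 for these policies

# The trajectory weight W is a product of H independent per-step ratios,
# so E[W] = 1 and Var(W) = m2**H - 1: exponential in the horizon H.
for horizon in (1, 10, 50):
    print(f"H = {horizon:2d}:  Var(W) = {m2 ** horizon - 1:.3g}")
```

Even for these mild policies, the variance explodes from 0.36 at horizon 1 to over 20 at horizon 10, which is why long-horizon settings need the alternative estimators the talk covers.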
Lihong Li is a Senior Principal Scientist at Amazon. He obtained a PhD in Computer Science from Rutgers University, and has since held research positions at Yahoo!, Microsoft and Google. His main research interests are in reinforcement learning, including contextual bandits, and other related problems in AI. His work has found applications in recommendation, advertising, Web search and conversation systems, and has won best paper awards at ICML, AISTATS and WSDM. He regularly serves as area chair or senior program committee member at major AI/ML conferences such as AAAI, AISTATS, ICLR, ICML, IJCAI and NeurIPS. Personal homepage: http://lihongli.github.io.
Hosted by Professor Quanquan Gu
Via Zoom Webinar
Date(s) - Oct 27, 2020
4:00 pm - 5:45 pm
404 Westwood Plaza, Los Angeles