CS 201 | The Surprising Efficacy of “Ungrounded” Models for Image and Video Understanding and Generation, TREVOR DARRELL, UC Berkeley

Speaker: Trevor Darrell
Affiliation: UC Berkeley

ABSTRACT:

Recently released open-source text LLMs have provided significant leverage towards multimodal perception, via lightweight fusion with learned visual representations, or even, somewhat paradoxically, as a unimodal source of knowledge in another domain. As time permits, I’ll cover our recent work exploring this premise, including recent advances toward a modern form of visual routines (a.k.a. visual programming), methods for recursive explainable visual question answering, an approach to multimodal gesture animation, and image and video generation with LLM-constrained diffusion models.

BIO:

Professor Darrell is on the faculty of the CS Division at UC Berkeley. His group develops algorithms to enable visual recognition across a variety of platforms and applications. His interests include computer vision, machine learning, computer graphics, and perception-based human-computer interfaces. Prof. Darrell was on the faculty of the MIT EECS department from 1999 to 2008, where he directed the Vision Interface Group. He was a member of the research staff at Interval Research Corporation from 1996 to 1999 and received the S.M. and Ph.D. degrees from MIT in 1992 and 1996, respectively. He obtained the B.S.E. degree from the University of Pennsylvania in 1988, having started his career in computer vision as an undergraduate researcher in Ruzena Bajcsy’s GRASP lab.

Hosted by Professor Bolei Zhou

Date/Time:
Date(s) - Mar 12, 2024
4:15 pm - 5:45 pm

Location:
3400 Boelter Hall
420 Westwood Plaza Los Angeles California 90095