“Data at the AI Frontier: Weak-to-Strong Generalization and Beyond”
As AI has become more capable, progress increasingly depends on more than model scaling—and particularly on finding and exploiting the right data. In this talk I will offer a data-centric perspective on frontier AI, centered on the idea of frontier data: data that lies near the capability boundary, containing signal that weaker processes can partially access but stronger models can exploit more fully. I will begin with our work on weak-to-strong generalization, which identifies a special kind of “layered” structure in data as a key ingredient for transfer from weak supervision to strong learners. I will then discuss how this perspective motivates new approaches to scalable supervision, including programmatic distillation methods that compress expensive model-based annotation into reusable programs. Finally, I will connect these ideas to recent work on “hybrid” foundation models. I will argue that increasingly heterogeneous data may call for increasingly heterogeneous architectures. These directions suggest a broader shift from model-centric scaling toward co-design of data, supervision, and architectures.
Frederic Sala is an Assistant Professor in the Computer Sciences Department at the University of Wisconsin-Madison and the Chief Scientist at Snorkel AI. His research studies the fundamentals of data-driven systems and machine learning, with a focus on data-centric AI, foundation models, and automated machine learning. Previously, he was a postdoctoral researcher at Stanford. He received his Ph.D. in electrical engineering from UCLA. He and his group received the 2024 DARPA Young Faculty Award, the UW-Madison SACM Students’ Choice Professor of the Year Award, and a best student paper runner-up award at UAI ’22.
Date/Time:
Date(s) - Mar 31, 2026
4:00 pm - 5:45 pm
Location:
3400 Boelter Hall
420 Westwood Plaza, Los Angeles, California 90095