Speaker: Fred Sala
Affiliation: Stanford University
We discuss two challenging problems involving data in the modern machine learning pipeline: faithfully embedding structured data and labeling large-scale datasets with only weak sources of supervision. For the first problem, the quality of the representations achieved by embeddings is determined by how well the embedding geometry matches the structure of the data. Hyperbolic embeddings suit hierarchical data structures; we propose new approaches and tradeoffs for such embeddings. For less uniformly structured data, we propose learning embeddings in a product manifold combining multiple non-Euclidean spaces. For the second problem, motivated by the fact that labeling large datasets is a major bottleneck in ML, we discuss a framework for automating the process of labeling data by building labeling functions. We show how to learn the accuracies and correlations of these functions and how to extend this framework to handle multitask and other forms of structured data.
Frederic Sala is a postdoctoral scholar in the Stanford computer science department, advised by Chris Ré. His research interests span machine learning, data analytics systems, and information and coding theory, and in particular problems related to the analysis and design of algorithms that must operate on diverse forms of data using new representations. He received the Ph.D. and M.S. degrees in electrical engineering from UCLA, where he received the Outstanding Ph.D. Dissertation in Signals & Systems Award from the UCLA EE department.
Hosted by Professor Guy Van den Broeck
Date: Nov 12, 2019
Time: 4:15 pm - 5:45 pm
3400 Boelter Hall
420 Westwood Plaza, Los Angeles, California 90095