Speaker: Hung Ngo
Affiliation: Relational AI Inc.
ABSTRACT: Enterprise data are relational and are stored in a myriad of different relational database management systems. This makes sense: the relational data model has proved to be useful and ubiquitous over the past 50 years. The schema and data constraints stored in relational databases embody decades of experience in relational data modeling in the corresponding application domain. Yet one of the very first things a statistical data scientist does is to write queries extracting the data out of the RDBMS and feed the resulting flattened (denormalized) dataset to a statistics / machine learning tool. This process is highly wasteful, not only because of the time it takes to evaluate the data extraction query and to import, export, and transform data formats between tools, but also because denormalization throws away the “relational structures” prevalent in the data. This talk introduces some of our recent and ongoing research showing that, by exploiting relational structures underlying the data, such as the join query topology and functional dependencies, we can train certain classes of machine learning models *without* even computing the costly data extraction query. This is accomplished by pushing the optimization past the join, reparameterizing the model based on functional dependencies, and translating key steps of the optimization algorithm into the problem of computing a large number of aggregates over the underlying data. Computing aggregates is something relational database technology knows how to do well, and the approach also draws on recent results on worst-case optimal join algorithms, new query plans based on tree decompositions, and new information-theoretic cost estimation. We shall also briefly touch upon these related topics. The talk is based on joint work with Mahmoud Abo Khamis, Long Nguyen, Dan Olteanu, and Maximilian Schleich.
BIO: Hung Q. Ngo was a professor of Computer Science and Engineering at the State University of New York (SUNY) at Buffalo from 2001 to 2015. Since 2015, he has worked for a couple of startups building Datalog and data analytics engines: LogicBlox and RelationalAI. His current research and development interests include the design, analysis, and implementation of in-database computation algorithms. These algorithms cover typical logic and statistical query optimization. He received an NSF CAREER award, best paper awards at COCOON 2008, PODS 2012, and PODS 2016, and an ACM SIGMOD Research Highlight Award. His work on in-database computation algorithms was invited for a keynote at Highlights in Logic, Games, and Automata 2017, a “Gems of PODS” talk at PODS 2018, and “Theory Fest” talks at STOC 2017 and STOC 2018.
Hosted by Prof Carlo Zaniolo
Date(s) - Nov 06, 2018
4:15 pm - 5:45 pm
Mong Auditorium – Engineering VI – First Floor
404 Westwood Blvd, Los Angeles, California 90095