Speaker: Sung Jin Kim
ABSTRACT: In a multisystem data analytics solution, a key challenge is finding optimal execution plans for queries that access data on heterogeneous remote data stores. The challenge comes primarily from the lack of statistics for subquery results returned from remote systems. In this talk, we first present a framework implemented in Teradata Database for dynamically collecting statistics on subquery results fetched from a remote system and feeding these statistics back to the query optimizer during query execution. The query optimizer uses these statistics to adjust query execution plans, enabling further and better-informed optimization. Second, we discuss how key statistics can be collected dynamically and efficiently in this framework. Key statistics include Row Count (RC), Number of Nulls (NoN), Number of Unique Values (NUV), and High Mode Frequency (HMF). After the overview, we will focus on how to collect NUV and HMF efficiently and with high accuracy, building on several known techniques.
BIO: Sung Jin Kim is a senior software engineer in the Teradata DBS Query Processing (Optimizer) department. He joined Teradata Corporation in 2008 and has worked on numerous research and development projects, including fine-grained cost models for major database operations, detection and extrapolation of stale statistics, hybrid columnar/row databases, and adaptive query optimization. The outcomes of these projects were delivered in Teradata Database products, and multiple US patents have been filed. He received a PhD in computer science from Soongsil University, Korea, in 2004. Before joining Teradata, he was a post-doctoral research fellow at Seoul National University, Korea, and the University of California, Los Angeles. His current research interest is query optimization in heterogeneous database systems.
Hosted by Professor Carlo Zaniolo
REFRESHMENTS at 3:45 pm, SPEAKER at 4:15 pm
Date(s) - Apr 27, 2017
4:15 pm - 5:45 pm