CS 201: Accelerated Machine Learning for Computational Proteomics, JOHN HALLORAN, UC Davis

Speaker: John Halloran
Affiliation: UC Davis

ABSTRACT:

In the past few decades, mass spectrometry-based proteomics has dramatically improved our fundamental knowledge of biology, leading to advancements in the understanding of diseases and in methods for clinical diagnosis. However, the complexity and sheer volume of typical proteomics datasets make it difficult to achieve fast and accurate analysis simultaneously; while machine learning methods have proven capable of highly accurate proteomic analysis, their extremely long runtimes in practice deter their use. In this talk, we will discuss two core problems in computational proteomics and how to accelerate the training of their highly accurate, but slow, machine learning solutions. For the first problem, wherein we seek to infer the protein subsequences (called peptides) present in a biological sample, we will improve the training of graphical models by deriving emission functions that render conditional maximum likelihood learning concave. For the second problem, wherein we seek to further improve peptide identification accuracy by classifying correct versus incorrect identifications, we will speed up support vector machine learning using a combination of novel GPU optimizations, improved convex optimization, and massive CPU parallelization. On massive datasets of nearly a quarter-billion data instances, these speedups reduce analysis times from over half a week to less than a single day.

BIO:

John Halloran is a Postdoc at UC Davis. He received his PhD from the University of Washington in 2016. John is interested in developing faster and more efficient machine learning solutions for massive-scale problems, particularly those encountered in computational biology. His work regularly focuses on both optimizations for high-performance compute architectures (particularly GPUs) and convergence analysis for generative/discriminative training of graphical models. He is a recipient of the UC Davis Award for Excellence in Postdoctoral Research and a UW Genome Training Grant.

Hosted by Professor Baharan Mirzasoleiman

Date/Time:
Feb 09, 2021
4:00 pm - 5:45 pm

Location:
Zoom Webinar
404 Westwood Plaza Los Angeles