CS 201 | Breaking Language Barriers with Massive Multilingual Machine Translation, LEI LI, UC Santa Barbara

Speaker: Lei Li
Affiliation: UC Santa Barbara

ABSTRACT:

Developing high-quality machine translation (MT) systems is crucial for cross-language communication. However, building massively multilingual MT systems is incredibly challenging, particularly for low-resource languages used by underserved communities. In this talk, I will introduce our group’s research, which addresses critical challenges in MT through three fundamental aspects: (a) learning an optimal vocabulary, (b) learning high-quality unified models for a massive number of languages, and (c) learning to align with human experts in evaluating translation quality. We have developed the LegoMT model, which currently supports 440 languages, the most extensive language coverage. Our work has been deployed in VolcTrans and on Hugging Face. TikTok and Lark use VolcTrans to serve their one billion users, significantly enhancing cross-cultural communication and entertainment. Finally, I will share my vision for advancing MT for 1000 languages.

BIO:

Lei Li is an assistant professor in the Computer Science Department at the University of California, Santa Barbara. He received his Ph.D. from the Carnegie Mellon University School of Computer Science. He is a recipient of the ACL 2021 Best Paper Award, the CCF Young Elite Award in 2019, the Wu Wen-tsün AI Prize in 2017, and the 2012 ACM SIGKDD Dissertation Award (runner-up), and was named a CCF Distinguished Speaker in 2017. Previously, he founded the ByteDance AI Lab in 2016 and led its research in NLP, ML, robotics, and drug discovery. He launched ByteDance’s machine translation system VolcTrans and its AI writing system Xiaomingbot.

Hosted by Professor Kai-Wei Chang

Date/Time:
May 11, 2023
4:15 pm - 5:45 pm

Location:
3400 Boelter Hall
420 Westwood Plaza, Los Angeles, California 90095