CS 201 | UCLA MARS Lab Lightning Talks, UCLA Comp Sci Dept

UCLA MARS Lab Lightning Talks

This talk will present several projects from the UCLA Misinformation, AI & Responsible Society (MARS) Lab. The lab focuses on the social aspects of an increasingly AI-integrated world, examining the ways in which AI can moderate, curate, and create content for users. The following projects will be presented:

  • “AI Debate Aids Assessment of Controversial Claims” by Salman Rahman
    • As AI grows more powerful, it will increasingly shape how we understand the world. But with this influence comes the risk of amplifying misinformation and deepening social divides—especially on consequential topics where factual accuracy directly impacts well-being. Scalable Oversight aims to ensure AI systems remain truthful even when their capabilities exceed those of their evaluators. Yet when humans serve as evaluators, their own beliefs and biases can impair judgment. We study whether AI debate can guide biased judges toward the truth by having two AI systems debate opposing sides of controversial factuality claims on COVID-19 and climate change where people hold strong prior beliefs. We conduct two studies. Study I recruits human judges with either mainstream or skeptical beliefs who evaluate claims through two protocols: debate (interaction with two AI advisors arguing opposing sides) or consultancy (interaction with a single AI advisor). Study II uses AI judges with and without human-like personas to evaluate the same protocols. In Study I, debate consistently improves human judgment accuracy and confidence calibration, outperforming consultancy by 4-10% across COVID-19 and climate change claims. The improvement is most significant for judges with mainstream beliefs (up to +15.2% accuracy on COVID-19 claims), though debate also helps skeptical judges who initially misjudge claims move toward accurate views (+4.7% accuracy). In Study II, AI judges with human-like personas achieve even higher accuracy (78.5%) than human judges (70.1%) and default AI judges without personas (69.8%), suggesting their potential for supervising frontier AI models. These findings highlight AI debate as a promising path toward scalable, bias-resilient oversight in contested domains. (A minimal sketch of the debate and consultancy protocols appears after this list.)
  • “Translation as a Scalable Proxy for Multilingual Evaluation” by Sheriff Issaka
    • The rapid proliferation of LLMs has created a critical evaluation paradox: while LLMs claim multilingual proficiency, comprehensive non-machine-translated benchmarks exist for fewer than 30 languages, leaving >98% of the world’s 7,000 languages in an empirical void. Traditional benchmark construction faces scaling challenges such as cost, scarcity of domain experts, and data contamination. We evaluate the validity of a simpler alternative: can translation quality alone indicate a model’s broader multilingual capabilities? Through systematic evaluation of 14 models (1B-72B parameters) across 9 diverse benchmarks and 7 translation metrics, we find that translation performance is a good indicator of downstream task success (e.g., Phi-4, median Pearson r: MetricX = 0.89, xCOMET = 0.91, SSA-COMET = 0.87). These results suggest that the representational abilities supporting faithful translation overlap with those required for multilingual understanding. Translation quality thus emerges as a strong, inexpensive first-pass proxy of multilingual performance, enabling a translation-first screening with targeted follow-up for specific tasks. (A sketch of the underlying correlation analysis appears after this list.)
  • “ModelCitizens: Representing Community Voices in Online Safety” by Ashima Suvarna
    • Automatic toxic language detection is critical for creating safe, inclusive online spaces. However, it is a highly subjective task, with perceptions of toxic language shaped by community norms and lived experience. Existing toxicity detection models are typically trained on annotations that collapse diverse annotator perspectives into a single ground truth, erasing important context-specific notions of toxicity such as reclaimed language. To address this, we present MODELCITIZENS, a new dataset of 6.8K posts and 40K annotations that embraces diverse identity perspectives and conversational context. Our research shows that leading moderation APIs fail to capture the nuances of reclaimed language, often performing worse when conversational context is added.
  • “Multi-Objective Alignment of Language Models for Personalized Psychotherapy” by Mehrab Beikzadeh
    • Large language models (LLMs) are increasingly used for mental health support, but existing alignment methods typically optimize therapeutic objectives in isolation, leading to trade-offs between empathy, safety, and patient autonomy. We introduce a multi-objective direct preference optimization (MODPO) framework that jointly aligns LLMs across multiple clinically grounded therapeutic criteria using patient-derived preference data. Through large-scale persona-based evaluation and blinded clinician validation, we show that multi-objective alignment produces responses that are consistently preferred over single-objective and post-hoc merging baselines while maintaining non-negotiable safety standards.
  • “MOSAIC: Modeling Social AI for Content Dissemination and Regulation in Multi-Agent Simulations” by Genglin Liu
    • We present a novel, open-source social network simulation framework, MOSAIC, where generative language agents predict user behaviors such as liking, sharing, and flagging content. This simulation combines LLM agents with a directed social graph to analyze emergent deception behaviors and gain a better understanding of how users determine the veracity of online social content. By constructing user representations from diverse fine-grained personas, our system enables multi-agent simulations that model content dissemination and engagement dynamics at scale. Within this framework, we evaluate three different content moderation strategies with simulated misinformation dissemination, and we find that they not only mitigate the spread of non-factual content but also increase user engagement. In addition, we analyze the trajectories of popular content in our simulations, and explore whether simulation agents’ articulated reasoning for their social interactions truly aligns with their collective engagement patterns. We open-source our simulation software to encourage further research within AI and social sciences. (A stripped-down sketch of one simulation step appears after this list.)
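
To make the oversight protocols from the first talk concrete, here is a minimal Python sketch of how a debate or consultancy trial might be orchestrated. The `query_llm` helper, the prompts, and the round structure are illustrative assumptions, not the study's actual implementation.

```python
# Illustrative sketch of the two oversight protocols compared in the first talk.
# `query_llm(prompt)` is a hypothetical stand-in for any chat-completion API call;
# the prompts and round structure are assumptions, not the study's actual setup.

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a language model; wire up a real API here."""
    raise NotImplementedError

def debate(claim: str, rounds: int = 2) -> str:
    """Two AI advisors argue opposing sides; a judge then issues a verdict."""
    transcript = []
    for r in range(rounds):
        pro = query_llm(f"Argue that this claim is TRUE: {claim}\n"
                        "Prior arguments:\n" + "\n".join(transcript))
        con = query_llm(f"Argue that this claim is FALSE: {claim}\n"
                        "Prior arguments:\n" + "\n".join(transcript))
        transcript += [f"[Round {r+1} PRO] {pro}", f"[Round {r+1} CON] {con}"]
    return query_llm("You are the judge. Based on the debate below, is the claim "
                     f"true or false, and how confident are you?\nClaim: {claim}\n"
                     + "\n".join(transcript))

def consultancy(claim: str, rounds: int = 2) -> str:
    """A single AI advisor interacts with the judge (the baseline protocol)."""
    transcript = []
    for r in range(rounds):
        advice = query_llm(f"Advise the judge about this claim: {claim}\n"
                           "Prior advice:\n" + "\n".join(transcript))
        transcript.append(f"[Round {r+1} ADVISOR] {advice}")
    return query_llm("You are the judge. Based on the advice below, is the claim "
                     f"true or false?\nClaim: {claim}\n" + "\n".join(transcript))
```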
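
The translation-as-proxy claim in the second talk rests on correlating each model's translation-metric scores with its downstream benchmark scores. The sketch below shows that computation using SciPy's Pearson correlation; the per-model scores are made-up placeholders, not the reported numbers.

```python
# Sketch of the proxy analysis: correlate per-model translation quality with
# per-model downstream benchmark accuracy. All scores below are placeholders.
from scipy.stats import pearsonr

# Hypothetical scores for the same five models, ordered identically:
translation_quality = [0.62, 0.71, 0.78, 0.84, 0.90]  # e.g., an xCOMET-style metric
benchmark_accuracy = [0.41, 0.50, 0.55, 0.63, 0.70]   # e.g., a multilingual QA task

r, p_value = pearsonr(translation_quality, benchmark_accuracy)
print(f"Pearson r = {r:.2f} (p = {p_value:.3f})")

# A consistently high r across benchmark/metric pairs is what would justify using
# translation quality as a first-pass screen before running a full multilingual suite.
```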
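
Finally, the MOSAIC framework pairs persona-conditioned LLM agents with a directed follow graph. Below is a stripped-down sketch of a single simulation step in which each agent reads posts from accounts it follows and picks an engagement action; `agent_decide`, the personas, and the action set are simplified assumptions rather than the released software's interface.

```python
# Minimal sketch of a MOSAIC-style simulation step: persona-conditioned agents
# read posts from accounts they follow and choose an engagement action.
# `agent_decide` stands in for an LLM call; everything here is illustrative.
import random

ACTIONS = ["like", "share", "flag", "ignore"]

def agent_decide(persona: str, post: str) -> str:
    """Placeholder for an LLM call mapping (persona, post) to an action."""
    return random.choice(ACTIONS)  # replace with a real model call

def simulation_step(follows: dict[str, list[str]], posts: dict[str, str],
                    personas: dict[str, str]) -> dict[str, list[tuple[str, str]]]:
    """One round: every agent reacts to each post authored by someone it follows."""
    log = {agent: [] for agent in follows}
    for agent, followees in follows.items():
        for author in followees:
            if author in posts:
                action = agent_decide(personas[agent], posts[author])
                log[agent].append((author, action))
    return log

# Tiny example network: A follows B and C, B follows C, C follows A.
follows = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
personas = {"A": "skeptical teacher", "B": "news junkie", "C": "casual scroller"}
posts = {"B": "Breaking: miracle cure announced!", "C": "City council meets Tuesday."}
print(simulation_step(follows, posts, personas))
```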

Bios:
Salman Rahman is a Ph.D. student in Computer Science at UCLA, and his research focuses on the reasoning, generalization, and scalable oversight of large language models.

Sheriff Issaka is a second-year Ph.D. student at UCLA, where he conducts research at the intersection of natural language processing, multilingual AI, and responsible technology. His work spans building equitable language technologies for underrepresented languages, developing models to detect misinformation, and exploring practical AI systems that prioritize real-world impact.

Ashima Suvarna is a Ph.D. student in Computer Science at UCLA. Her research focuses on data curation and post-training paradigms for improving the reasoning capabilities, alignment, and safety of LLMs. She has been awarded the Mitacs Globalink Research Fellowship and the Google DeepMind Scholarship.

Mehrab Beikzadeh is a Ph.D. student in Computer Science at UCLA and his research focuses on large language models, preference learning, and therapeutic AI.

Genglin Liu is a second-year Ph.D. student at UCLA, and his research focuses on simulation agents as well as agentic memory systems.

Date/Time:
Feb 19, 2026
4:00 pm - 5:45 pm

Location:
3400 Boelter Hall
420 Westwood Plaza, Los Angeles, California 90095