Sunayana Sitaram

Principal Researcher

About

Welcome!

I work at the intersection of multilingual and multicultural NLP, evaluation, and responsible AI, with a focus on ensuring that language technologies work equitably across diverse languages, cultures, and communities. Recently, my work has involved participatory approaches to evaluation, data collection, and policy to ensure that our models reflect the preferences of users from diverse regions and cultures. I have also been focusing on multilingual, multicultural synthetic data for post-training language models, and we recently released Updesh, a large dataset of ~8M data points covering 13 Indian languages.

At Microsoft Research India, I collaborate with and lead interdisciplinary teams spanning NLP, machine learning, linguistics, HCI, and social science. I also actively contribute to the research community through conference organization and reviewing. In 2025, I am serving as an Area Chair for ACL Rolling Review (ARR), an Area Chair for COLM, and as Tutorial Chair for IndoML.

I collaborate actively with product groups within Microsoft and am building out a team of Applied Scientists in Bangalore (apply here: https://jobs.careers.microsoft.com/global/en/job/1828301/Senior-Applied-Scientist). My research has recently shipped in the M365 Copilots, our GenAI-based suite of productivity tools, now supporting 52 languages. I have also contributed to Microsoft’s policy efforts on multilingual product strategy, focusing on inclusivity and language diversity.

🧪 Multilingual Evaluation

I led the creation of MEGA (Multilingual Evaluation of Generative AI, 2023), the first large-scale benchmark to evaluate generative LLMs on 16 NLP datasets across 70 typologically diverse languages. MEGA revealed significant disparities between English and low-resource languages and proposed a modular framework for multilingual evaluation. Building on MEGA, we created MEGAVERSE, an even broader evaluation effort covering 83 languages and 22 datasets, including multimodal tasks. MEGAVERSE benchmarked a wide range of models and included a detailed analysis of language coverage and data contamination. Prior to this, I also spearheaded the creation of GLUECoS (2020), the first benchmark for code-mixing.

👥 Participatory Evaluation at Scale

I believe evaluation should reflect the voices of real users. With this in mind, we introduced Pariksha, a scalable, transparent, and community-driven evaluation exercise for Indian LLMs, in collaboration with Karya. Pariksha brings together 90,000 human and 30,000 automated evaluations across 10 Indian languages and 29 models, making it perhaps the largest multilingual human evaluation of LLMs conducted to date. This year, we are expanding this effort with the Samiksha project, which aims to create a real-world evaluation suite for Indian languages, contexts, and use cases.

🤝 Participatory Responsible AI

As part of my work on Responsible AI, I co-led a participatory effort to address misgendering in LLM applications. We co-designed culturally grounded, multilingual guardrails with native speakers across 42 languages, and showed how these guardrails can reduce harms like misgendering without degrading performance. This work was recognized with an internal Open Data Award at Microsoft and serves as a blueprint for mitigating culturally sensitive harms in AI systems.

📚 Selected Publications

For a full list of publications, please take a look at my Google Scholar page.

🎤 Recent Talks

  • Invited talk, Advanced Summer School on NLP (IASNLP-2025)
  • Keynote, I Can’t Believe It’s Not Better (ICBINB 2025) @ ICLR 2025
  • Keynote, Computational Approaches to Linguistic Code-Switching @ NAACL 2025
  • Invited talk, International Network of Safety Institutes, Feb 2025
  • Invited talk, Language Technologies for All, UNESCO Paris HQ, Feb 2025

🧑‍🤝‍🧑 Team

I have been fortunate to work with many wonderful interns and Research Fellows who inspire me and keep me on my toes! In reverse chronological order:

Prashant Kodali (current PostDoc), Sanchit Ahuja (RF -> PhD @ Northeastern), Varun Gumma (RF), Divyanshu Aggrawal (RF), Ishaan Watts (intern -> MS @ CMU), Ashutosh Sathe (intern -> Google DeepMind), Prachi Jain (PostDoc -> Senior Applied Scientist, Microsoft), Kabir Ahuja (PhD @ University of Washington), Krithika Ramesh (PhD @ Johns Hopkins University), Shrey Pandit (MS @ UT Austin), Abhinav Rao (RF @ Microsoft Turing -> MS @ CMU), Aniket Vashishtha (RF @ MSRI), Shaily Bhat (RF @ Google Research -> PhD @ CMU), Simran Khanuja (RF @ Google Research -> PhD @ Carnegie Mellon University), Anirudh Srinivasan (MS @ UT Austin), Sanket Shah (Salesken.ai), Brij Mohan Lal Srivastava (PhD @ INRIA -> Nijta (startup)), Sunit Sivasankaran (PhD @ INRIA -> Microsoft), Sai Krishna Rallabandi (PhD @ CMU -> Fidelity).

🕰️ Prior to coming to MSR India

I received my PhD in 2015 from the Language Technologies Institute, Carnegie Mellon University, where I worked on Text-to-Speech systems with my advisor Alan W Black; my thesis was on pronunciation modeling for low-resource languages. From 2010 to 2012, I was a Master's student at CMU with Jack Mostow, working on children's oral reading prosody. I also interned with Microsoft Research India in Summer 2012, where we built a small-vocabulary ASR system for farmers in rural central India.