Shuvom Sadhuka

I am a PhD student at MIT CSAIL. Broadly, I am interested in reliable ML, with specific interests in evaluation, uncertainty quantification, and applications to biomedical problems. To me, evaluation is a two-way street:

Develop and use new tools to evaluate human decision-makers and data. Some past and ongoing work includes evaluating privacy risks in “anonymous” genomic datasets (link) and building Bayesian models of clinical decision-making (link).
Develop new metrics and methods to analyze ML systems themselves. Given thorny issues in our data (noisy labels, sparse labels, and so on), it is unsurprising that performance evaluations are often unreliable. On this front, I’ve investigated how to use unlabeled data to estimate model performance (link) and repurposed ideas from sequential hypothesis testing to verify agent trajectories (link).

This summer (2026), I am interning at Abridge, where I will work on evals/measurement research for clinical AI models with Alex Chouldechova and Michael Oberst. I previously interned at Genentech, where I worked on sequential hypothesis testing for evaluating AI agents.

I am grateful for the support of the Hertz Fellowship and the NSF GRFP. You can find my CV here. I enjoy writing, and you can find my blog posts here. You can contact me at ssadhuka [at] mit [dot] edu.

Recent Work

E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing. Shuvom Sadhuka, Drew Prinster, Clara Fannjiang, Gabriele Scalia, Aviv Regev, Hanchen Wang (Preprint) (Talk)

A Bayesian Model for Multi-stage Censoring. Shuvom Sadhuka, Sophia Lin, Bonnie Berger, Emma Pierson (ML4H 2025)

Evaluating multiple models using labeled and unlabeled data. Divya Shanmugam*, Shuvom Sadhuka*, Manish Raghavan, John Guttag, Bonnie Berger, Emma Pierson (NeurIPS 2025)

Other things you may find useful or interesting

I wrote a blog post with tips for applying to graduate school fellowships.
I wrote an essay on genomic privacy that won an honorable mention in an MIT Essay Contest.
I made a small website to visualize embeddings of song lyrics for my own Spotify. It also includes a sparse autoencoder to interpret the embeddings.

Latest posts

Mar 30, 2026	Revenge of the Worst Case
Jan 22, 2026	Reading List
Feb 11, 2025	Measuring Entropy