Shuvom Sadhuka
📍 Boston/SF
I am a PhD student at MIT CSAIL, advised by Bonnie Berger. I also collaborate with Emma Pierson. Broadly, I am interested in reliable ML, with specific interests in evaluation, uncertainty quantification, and applications to biomedical problems. To me, evaluation is a two-way street:
- Develop and use new tools to evaluate human decision-makers and data. Some past and ongoing work includes evaluating privacy risks in “anonymous” genomic datasets (link) and building Bayesian models of clinical decision-making (link).
- Develop new metrics and methods to analyze ML systems themselves. Given thorny issues in our data (noisy labels, sparse labels, and so on), it is unsurprising that evaluations of performance are often unreliable. On this front, I’ve investigated how to use unlabeled data to estimate model performance (link) and repurposed ideas from sequential hypothesis testing to verify agent trajectories (link).
I previously interned at Genentech, where I built statistical methods for sequentially monitoring AI agents (paper).
I am grateful for the support of the Hertz Fellowship and the NSF GRFP. You can find my CV here. I enjoy writing, and you can find my blog posts here. You can contact me at ssadhuka [at] mit [dot] edu.
Recent Work
E-valuator: Reliable Agent Verifiers with Sequential Hypothesis Testing. Shuvom Sadhuka, Drew Prinster, Clara Fannjiang, Gabriele Scalia, Aviv Regev, Hanchen Wang (Preprint)
A Bayesian Model for Multi-stage Censoring. Shuvom Sadhuka, Sophia Lin, Bonnie Berger, Emma Pierson (ML4H 2025)
Evaluating multiple models using labeled and unlabeled data. Divya Shanmugam*, Shuvom Sadhuka*, Manish Raghavan, John Guttag, Bonnie Berger, Emma Pierson (NeurIPS 2025)
Other things you may find useful or interesting
- I wrote a blog post with tips for applying to graduate school fellowships.
- I wrote an essay on genomic privacy that won an honorable mention in an MIT Essay Contest.
- I made a small website to visualize embeddings of song lyrics for my own Spotify. It also includes a sparse autoencoder for interpreting the embeddings.
latest posts
| Date | Post |
|---|---|
| Feb 11, 2025 | Reading List |
| Feb 11, 2025 | Measuring Entropy |
| Oct 21, 2024 | Fellowship Applications |