I build and lead machine learning efforts for high-noise biological and clinical data, from antibody optimization to national-scale healthcare foundation models.
I am a Machine Learning Research Scientist in the Machine Learning Group at Lawrence Livermore National Laboratory. My work spans protein design, large-scale clinical data from healthcare systems including the VA and Kaiser Permanente, and machine learning for human microbiome data.
Selected highlights
- Train billion-parameter clinical foundation models from scratch on structured event sequences at hundred-GPU scale.
- Led a team of researchers developing microbiome disease-prediction models across tens of thousands of metagenomic profiles.
- Research spans protein design, EHR-based modeling, microbiome analysis, climate science, and voice biometrics.
- Selected publications include work in Nature, Science Advances, The Lancet Digital Health, and JMLR.
Current research areas
Protein design
I work on antibody design, including library design for developability assays such as nicking mutagenesis and PacBio, machine learning models for antibody developability prediction, and computational design guided by protein folding tools. I also work on multi-substrate enzyme optimization and de novo design of adhesive proteins for biomaterial engineering.
Clinical machine learning
I train billion-parameter foundation models from scratch on structured sequences of clinical events, without clinical notes, at hundred-GPU scale, and fine-tune them for biosurveillance applications including syndromic surveillance and infectious disease detection. I also work on EHR-based diagnosis prediction and drug repurposing.
Metagenomics and microbiome
I led a team of researchers developing machine learning models for disease prediction from human microbiome data. This includes hierarchical sparse Bayesian multitask models with scalable variational inference for pooled microbiome studies across heterogeneous cohorts and body sites, and pipelines that integrate host and protocol metadata with microbiome profiles across tens of thousands of samples to predict disease states ranging from gastrointestinal infections to cancer and neurological disorders.
Applied research at scale
I enjoy working where method development, domain expertise, and engineering constraints meet. In practice, that means leading focused research efforts, building scalable and reproducible pipelines, and collaborating with biologists, clinicians, and applied scientists to turn research questions into usable systems.
Prior work
Earlier work includes deep learning for climate science, specifically adversarial domain adaptation for climate model bias correction and conditional generative models for seasonal forecasting. Before that, I worked in industry on voice biometrics, where I developed speaker recognition systems using spectral, temporal, and convolutional neural network models that were deployed in production at numerous companies in Brazil.
How I approach machine learning
I approach machine learning as a generalist, drawing on a broad understanding of the field and its mathematical foundations to match the right techniques to each problem and to design new solutions that are more efficient and scalable. My particular interests include probabilistic modeling, uncertainty quantification, non-convex and discrete optimization, and efficient pretraining of domain-specific foundation models.
Personal interests
I've been playing tennis for several years, which mostly means I now know enough to realize how much better I still want to get.
I also like baking enough to keep a YouTube channel in Portuguese called Padeiro de Apartamento.
And I am an unapologetic specialty coffee enthusiast; one of my long-running side quests is to try coffee from every coffee-producing country in the world.