Scalable emulation of protein equilibrium ensembles with generative deep learning

PDF

Following the sequence and structure revolutions, predicting functionally relevant protein structure changes at scale remains an outstanding challenge. We introduce BioEmu, a deep learning system that emulates protein equilibrium ensembles by generating thousands of statistically independent structures per hour on a single GPU. BioEmu integrates over 200 milliseconds of molecular dynamics (MD) simulations, static structures and experimental protein stabilities using novel training algorithms. It captures diverse functional motions—including cryptic pocket formation, local unfolding, and domain rearrangements—and predicts relative free energies with 1 kcal/mol accuracy compared to millisecond-scale MD and experimental data. BioEmu provides mechanistic insights by jointly modelling structural ensembles and thermodynamic properties. This approach amortizes the cost of MD and experimental data generation, demonstrating a scalable path toward understanding and designing protein function.

GitHubGitHub

Publication Downloads

BioEmu-1

June 24, 2025

Biomolecular Emulator (BioEmu-1 for short) is a deep learning model that can generate thousands of protein structures per hour on a single graphics processing unit. It provides orders of magnitude greater computational efficiency compared to classical MD simulations, thereby opening the door to insights that have, until now, been out of reach.

Scalable emulation of protein equilibrium ensembles with BioEmu

Following the sequence and structure revolutions, predicting functionally relevant protein structure changes at scale remains an outstanding challenge. Microsoft Research AI for Science introduces BioEmu, a deep learning system that emulates protein equilibrium ensembles by generating thousands of statistically independent structures per hour on a single GPU. BioEmu integrates over 200 milliseconds of molecular dynamics (MD) simulations, static structures and experimental protein stabilities using novel training algorithms. It captures diverse functional motions – including cryptic pocket formation, local unfolding, and domain rearrangements – and predicts relative free energies with 1 kcal/mol accuracy compared to millisecond-scale MD and experimental data. BioEmu provides mechanistic insights by jointly modelling structural ensembles and thermodynamic properties. This approach amortizes the cost of MD and experimental data generation, demonstrating a scalable path towards understanding and designing protein function.