STARLING AI generates disordered protein ensembles in seconds
Summary
STARLING is a new AI tool that quickly generates accurate structural ensembles for disordered proteins, enabling rapid analysis and design.

STARLING generates protein ensembles in seconds
Researchers have developed a new AI framework called STARLING that can rapidly generate accurate structural ensembles for intrinsically disordered proteins and regions (collectively, IDRs). The tool dramatically accelerates computational analysis and design, cutting some tasks from weeks to seconds.
IDRs are proteins, or regions within proteins, that lack a fixed three-dimensional shape and exist instead as a fluid ensemble of many interconverting structures. This disorder is not a bug but a feature, enabling critical roles in nearly every cellular process in complex life.
Their structural plasticity allows them to perform diverse functions, from molecular recognition to signaling. However, this very flexibility has made them notoriously difficult to study with traditional computational methods designed for rigid proteins.
Merging physics with generative AI
STARLING combines two powerful approaches: physics-based molecular force fields and multimodal generative deep learning. Given only an amino acid sequence, the hybrid model predicts the broad range of shapes an IDR can adopt.
The framework can also condition its predictions on environmental factors such as ionic strength. It has also generated plausible ensembles for cases beyond the bounds of its training data, suggesting it can generalize to novel scenarios.
For higher accuracy, STARLING incorporates a Bayesian maximum-entropy reweighting scheme. This allows researchers to refine the generated ensembles by integrating experimental constraints, ensuring the model's outputs align with real-world data.
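To make the reweighting idea concrete, here is a minimal sketch of plain maximum-entropy reweighting for a single observable. It is an illustration under stated assumptions, not STARLING's implementation or API: the function name `reweight_max_entropy`, the example values, and the use of bisection are all hypothetical choices, and the full Bayesian scheme would additionally account for experimental uncertainty, which this sketch ignores.

```python
import math


def reweight_max_entropy(values, target, lam_bounds=(-50.0, 50.0), tol=1e-10):
    """Return weights w_i proportional to exp(-lam * values[i]) whose
    weighted mean equals `target`, starting from uniform weights.

    Simplified sketch: one observable, no experimental error model.
    """
    if not min(values) < target < max(values):
        raise ValueError("target must lie strictly inside the ensemble's range")

    def weights(lam):
        exponents = [-lam * v for v in values]
        shift = max(exponents)  # subtract the max for numerical stability
        raw = [math.exp(e - shift) for e in exponents]
        total = sum(raw)
        return [r / total for r in raw]

    def weighted_mean(lam):
        return sum(w * v for w, v in zip(weights(lam), values))

    # The weighted mean decreases monotonically as lam grows,
    # so the Lagrange multiplier can be found by bisection.
    lo, hi = lam_bounds
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if weighted_mean(mid) > target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return weights(0.5 * (lo + hi))


# Made-up per-conformer values of some observable (for example, a size in nm)
rg_values = [2.0, 2.5, 3.0, 3.5, 4.0]
# Hypothetical experimental average the ensemble should reproduce
weights = reweight_max_entropy(rg_values, target=2.8)
```

Because the hypothetical target sits below the unweighted average, the more compact conformers end up with larger weights, while every conformer is kept in the ensemble.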
From analysis to design in seconds
Beyond simply characterizing protein ensembles, STARLING's latent sequence representations enable powerful new applications. The team demonstrated two key use cases that showcase its transformative speed.
First, it allows for an ensemble-based search for "biophysical look-alikes": proteins that share similar structural behavior despite having different sequences. This moves beyond simple sequence alignment toward comparing proteins by how their ensembles behave.
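A look-alike search of this kind can be pictured as a nearest-neighbor lookup over per-sequence embedding vectors. The sketch below assumes cosine similarity as the comparison and uses tiny made-up three-dimensional vectors; the names `find_look_alikes`, `library`, and `idr_A` through `idr_C` are hypothetical, and the article does not specify which similarity measure STARLING actually uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def find_look_alikes(query, library, top_k=3):
    """Rank library entries by similarity of their embedding to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Made-up embeddings standing in for learned latent representations
library = {
    "idr_A": [0.9, 0.1, 0.3],
    "idr_B": [0.1, 0.8, 0.5],
    "idr_C": [0.8, 0.2, 0.4],
}
query = [1.0, 0.1, 0.35]
hits = find_look_alikes(query, library, top_k=2)
```

Here `hits` ranks `idr_A` first and `idr_C` second, since their vectors point in nearly the same direction as the query regardless of what the underlying sequences look like.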
Second, and most significantly, it enables "ensemble-first" sequence design. Where evaluating a single candidate once took hours or weeks, STARLING can now do it in seconds. This acceleration unlocks library-scale design projects previously considered impractical.
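Why per-candidate speed matters becomes clear in a design loop, where the predictor is called once for every proposed sequence. The following is a toy hill-climbing sketch, not the team's design method: `toy_predictor` is a hypothetical stand-in that simply returns the fraction of charged residues, and the start sequence and target value are invented for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_predictor(sequence):
    """Hypothetical stand-in for a fast ensemble predictor:
    returns the fraction of charged residues (D, E, K, R)."""
    return sum(sequence.count(aa) for aa in "DEKR") / len(sequence)


def design(start, target, predictor, steps=500, seed=0):
    """Greedy point-mutation search: keep a mutation only if it moves
    the predicted property closer to the target."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    best = start
    best_err = abs(predictor(best) - target)
    for _ in range(steps):
        pos = rng.randrange(len(best))
        candidate = best[:pos] + rng.choice(AMINO_ACIDS) + best[pos + 1:]
        err = abs(predictor(candidate) - target)
        if err < best_err:
            best, best_err = candidate, err
    return best, best_err


start = "GSGSGSGSGSGSGSGSGSGS"
designed, error = design(start, target=0.5, predictor=toy_predictor)
```

This loop calls the predictor 500 times for one design. If each call took hours the search would be impractical, whereas at seconds per call the same loop can be repeated across an entire sequence library.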
Lowering the barrier to IDR research
The development of STARLING represents a major shift in computational biophysics. It specifically targets the long-standing challenge of studying proteins defined by their disorder, a field where tools have been limited.
By providing rapid, accurate ensembles, the framework lowers the barrier to computationally interrogating IDR function. Researchers can now generate and test hypotheses about protein behavior at unprecedented speeds.
The team validated STARLING against existing experimental data and provided several vignettes illustrating its utility. These examples show how the tool can aid in interpreting complex experimental results and drive rapid scientific discovery.
Key capabilities of the STARLING framework include:
- Generating full structural ensembles for IDRs from sequence alone.
- Conditioning predictions on environmental variables like salt concentration.
- Refining ensembles using experimental data via Bayesian reweighting.
- Evaluating design candidates in seconds rather than hours or weeks, making library-scale sequence design practical.
- Finding functionally similar proteins based on biophysical behavior, not just sequence.
This tool complements traditional bioinformatic analysis by focusing on the emergent biophysical properties that arise from disorder, offering a new lens through which to understand protein function.

