Generative AI has revolutionized software development through prompt-based code generation, and now protein design is the next frontier. According to the NVIDIA blog, EvolutionaryScale announced the launch of the ESM3 model, a third-generation ESM model that provides protein discovery engineers with a programmable platform by simultaneously inferring the sequence, structure, and function of proteins.
The startup, which emerged from the Meta FAIR (Fundamental AI Research) unit, recently secured funding led by Lux Capital, Nat Friedman, and Daniel Gross, with an investment from NVIDIA. EvolutionaryScale is at the forefront of programmable biology, supporting research such as engineering proteins that can target cancer cells, finding alternatives to harmful plastics, and driving environmental mitigation.
EvolutionaryScale’s ESM3 model was trained on NVIDIA H100 Tensor Core GPUs with the most compute ever committed to a biological model. The 98-billion-parameter ESM3 model was trained with approximately 25 times more FLOPs and 60 times more data than its predecessor, ESM2. The company has developed a database of more than 2 billion protein sequences to train AI models, providing technology applicable to drug development, fighting disease, and understanding human evolution at scale.
Accelerate In Silico Biology Research with ESM3
With significant advances in training data, EvolutionaryScale aims to accelerate protein discovery with ESM3. The model was trained on nearly 2.8 billion protein sequences sampled from a variety of organisms and biomes, allowing scientists to guide the model to more accurately identify and validate new proteins.
ESM3 is a significant update over previous versions. The model is natively generative and follows an “all-to-all” approach: structural and functional annotations can be provided as input, not only produced as output. Once the model is publicly available, scientists can fine-tune this base model to create purpose-built models from their own proprietary data. ESM3’s large-scale generative training on massive amounts of data provides an innovative tool for in silico biological research.
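The “all-to-all” idea can be illustrated with a minimal Python sketch. This is not the ESM3 API: the `ProteinPrompt` class, its three fields, and the `split_tracks` helper are hypothetical names chosen for illustration, and the field values are placeholders. The sketch only shows the bookkeeping implied above, where any track can be supplied as input and the remaining tracks are left for the model to generate.

```python
from dataclasses import dataclass, fields
from typing import List, Optional, Tuple


# Hypothetical container (an assumption, not the ESM3 API): one field per track.
@dataclass
class ProteinPrompt:
    sequence: Optional[str] = None   # amino-acid letters
    structure: Optional[str] = None  # placeholder for structural information
    function: Optional[str] = None   # placeholder for a functional annotation


def split_tracks(prompt: ProteinPrompt) -> Tuple[List[str], List[str]]:
    """Return (given, to_generate): tracks supplied as input vs. left as output."""
    given = [f.name for f in fields(prompt) if getattr(prompt, f.name) is not None]
    to_generate = [f.name for f in fields(prompt) if getattr(prompt, f.name) is None]
    return given, to_generate


# A function annotation is the input; sequence and structure are left to generate.
prompt = ProteinPrompt(function="fluorescent protein")
given, to_generate = split_tracks(prompt)
print("given:", given)
print("to generate:", to_generate)
```

Supplying a different subset, for example only `sequence`, would flip the roles, which is the sense in which annotations act as either input or output.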
Driving the Next Generation of Innovation with NVIDIA BioNeMo
ESM3 gives biologists and protein designers a generative AI boost, improving both the engineering and the understanding of proteins. Through simple prompts, users can generate new proteins around a provided scaffold, have the model improve its own protein designs based on feedback, and design proteins with the functionality they specify. These capabilities can be used in any combination to provide chain-of-thought protein design, much like messaging a researcher who has memorized the intricate three-dimensional meaning of every known protein sequence.
“In our internal testing, we were impressed with ESM3’s ability to respond creatively to complex prompts,” said Tom Sercu, co-founder and VP of Engineering at EvolutionaryScale. “We solved a very difficult protein design problem to create a new green fluorescent protein. We expect that ESM3 will help accelerate scientists’ work and open up new possibilities. We are excited to see ESM3 contributing to the future of research in the life sciences.”
EvolutionaryScale is releasing its API today in private beta, with code and weights available for a small public version of ESM3 for non-commercial use. This version will soon be accessible on NVIDIA BioNeMo, a generative AI platform for drug discovery. The entire ESM3 model family is available to select customers as an NVIDIA NIM microservice, with a runtime optimized in partnership with NVIDIA and supported by an NVIDIA AI Enterprise software license, for testing at ai.nvidia.com.
The computing power required to train these models is growing exponentially. ESM3 was trained on the Andromeda cluster, which uses NVIDIA H100 GPUs and NVIDIA Quantum-2 InfiniBand networking. ESM3 models are available on select partner platforms and NVIDIA BioNeMo.