
A Big Chip for Big Science: Watching the COVID-19 Virus in Action

Vishal Subbiah, Tech Lead Manager, ML Frameworks | December 14, 2021

It’s hard to imagine a better example of “AI for good” than figuring out how the virus that causes COVID-19 works. If we know how it works, we can develop ways to prevent the virus from replicating and end a global scourge. I was fortunate to co-author, along with my colleagues Jessica Liu and Tanveer Raza, a massive scientific study to figure out the “how”. The study was nominated for a Gordon Bell Special Prize and presented at this year’s SC21 supercomputing conference.

This is big science. Researchers from 12 national labs, universities, and companies, including Cerebras, developed a host of new computational techniques to create a simulation of the virus’ replication mechanism that runs across four supercomputing sites.

The idea was to create a fully functional model of the SARS-CoV-2 virus “replication-transcription machinery”. The word “machinery” is apt: this is an intricate biological mechanism made up of millions of atoms moving in three dimensions as it hijacks the host’s own replication mechanism to make copies of itself.

The process starts with three-dimensional images of the virus captured using cryo-electron microscopy. This technique can achieve near-atomic resolution, but the images are still not sharp enough, or dynamic enough, to show us how the mechanism really works. To fill in the missing data, the research team layered on two completely different but complementary techniques, working at different scales. First, we can treat biomolecules the way we treat any materials problem, using the same kind of finite element analysis tools we routinely use to design continuum-scale objects such as engine parts. And second, we can simulate molecules atom by atom, like a much more sophisticated version of the ball-and-stick models we all remember from chemistry class.

Diagram from the paper showing how the components of the study fit together. Our work is in the computational steering part.

Putting all this together is a mammoth task. Innovation was needed at, as it were, every scale: from a novel workflow architecture that allows widely distributed computing resources to mesh seamlessly and automatically, to improving the computational efficiency of the individual models.

That last part – improving computational efficiency – is where Cerebras comes in. In the past, these simulations took so long that it was only possible to study a few tens of nanoseconds of motion at a time. To reach a broader understanding, however, the researchers needed to study microseconds – a roughly 50x longer period. The team realized that the machine learning steps were the bottleneck to achieving that 50x speedup when integrating simulations with AI.

What role, exactly, does ML play here, I hear you ask? Each simulation experiment ties up a supercomputer with thousands of processing nodes for a long time. To avoid wasted time, it’s vitally important to “steer” these experiments, recognizing and halting simulations that are going down dead ends, and encouraging, so to speak, those that may prove fruitful. This is easier said than done. It’s very difficult to specify the characteristics of a “bad” simulation beforehand. But it’s easy after the fact to recognize that a bad thing happened. This is a classic ML opportunity: you know what the answer looks like, but you don’t know how to define rules to describe it.
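To make the steering idea concrete, here is a deliberately simplified sketch. The class names, the scoring function, and the "halt below a threshold" policy are all hypothetical stand-ins, not the paper's actual interface: a trained model scores each running simulation's latest snapshot, and simulations that score as uninteresting are halted so the supercomputer time goes to the promising ones.

```python
from dataclasses import dataclass

@dataclass
class Simulation:
    """Stand-in for a running MD simulation (hypothetical interface)."""
    name: str
    snapshot: list        # latest observed state, as a flat feature vector
    halted: bool = False

def interest_score(snapshot, reconstruct):
    """Toy figure of merit: a model trained on familiar (dead-end) states
    reconstructs them well, so a LARGE reconstruction error suggests the
    simulation is exploring something new and worth keeping."""
    recon = reconstruct(snapshot)
    return sum((a - b) ** 2 for a, b in zip(snapshot, recon))

def steer(simulations, reconstruct, threshold):
    """Halt simulations whose latest snapshot scores below the threshold."""
    for sim in simulations:
        if interest_score(sim.snapshot, reconstruct) < threshold:
            sim.halted = True
    return [s for s in simulations if not s.halted]

# Toy usage: a "model" that only knows one familiar state. Sim A sits in
# that state (score 0, halted); sim B has drifted away, so it survives.
familiar = lambda snap: [1.0, 2.0, 3.0]
sims = [Simulation("A", [1.0, 2.0, 3.0]), Simulation("B", [1.0, 2.0, 3.5])]
survivors = steer(sims, familiar, threshold=0.1)
print([s.name for s in survivors])  # -> ['B']
```

The real workflow is far richer than this, but the core loop – score after the fact, prune, continue – is the shape of the problem ML is solving here.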

We address this with a machine learning model called a “convolutional variational autoencoder”, or CVAE. Oversimplifying, a CVAE takes a complex, high-dimensional input and transforms, or “encodes”, it into a much smaller form. You can think of this compressed form as a compact summary of the simulation’s state. We train the model by letting it observe snapshots of the simulations, then run the reverse transformation to “decode” that summary back. If the decoded version is a good match for the original, we know the CVAE is working. The trained model can then be used during “real” experiments by another algorithm that does the actual steering. However, as the paper points out: “CVAE is quadratic in time and space complexity and can be prohibitive to train.”
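To make the encode/decode round trip concrete, here is a minimal PyTorch sketch of a CVAE. The layer sizes, the 24×24 input, and the 10-dimensional latent space are purely illustrative, not the architecture used in the study; training would minimize reconstruction error plus a KL term that regularizes the latent space.

```python
import torch
import torch.nn as nn

class CVAE(nn.Module):
    """Minimal convolutional variational autoencoder (illustrative sizes)."""
    def __init__(self, latent_dim=10):
        super().__init__()
        self.enc = nn.Sequential(                      # 1x24x24 -> 32x6x6
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(32 * 6 * 6, latent_dim)      # latent mean
        self.fc_logvar = nn.Linear(32 * 6 * 6, latent_dim)  # latent spread
        self.fc_dec = nn.Linear(latent_dim, 32 * 6 * 6)
        self.dec = nn.Sequential(                      # 32x6x6 -> 1x24x24
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1,
                               output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1,
                               output_padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)                                # encode
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # sample
        recon = self.dec(self.fc_dec(z).view(-1, 32, 6, 6))      # decode
        return recon, mu, logvar

# A snapshot stands in here as a 24x24 "image"; after training, a good
# CVAE makes recon a close match for x.
x = torch.randn(4, 1, 24, 24)
recon, mu, logvar = CVAE()(x)
print(recon.shape)  # torch.Size([4, 1, 24, 24])
```

The quadratic cost the paper mentions comes from scaling this training up to the study's real snapshot sizes and volumes, which is exactly where the hardware question below enters.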

Cerebras comes into the picture here because this bit of the problem was being explored at Oak Ridge National Laboratory on the Summit supercomputer and on the Argonne AI-Testbed at Argonne National Laboratory, which just happens to feature a Cerebras accelerator. The ANL researchers compared training their CVAE model on 256 nodes of Summit, for a total of 1,536 GPUs, and on a single Cerebras CS-2 system.

And how did we do? In terms of pure performance, rather well. Quoting the paper again: “the CS-2 delivers out-of-the-box performance of 24,000 samples/s, or about the equivalent of 110-120 GPUs.”

As impressive as this number is, perhaps even more impressive is the “out of the box” comment. Distributing a promising algorithm across a large cluster of compute nodes is difficult and time-consuming even for experts in the field. The CS-2 system, by contrast, is intentionally architected as a single, ultra-powerful node with cluster-scale performance. Our software makes it easy to get a neural network running by changing just a couple of lines of code.

Many organizations have problems that could be solved with some serious AI horsepower, but the sad fact is that few of us have the funds to build or operate the supercomputers needed. Moreover, few of us have the specialized developers capable of rewriting and tuning applications for distributed clusters, or the support staff needed to install and maintain these complex systems.

To quote the paper again: “Because a single CS-2 here delivers the performance of over 100 GPUs, it is a practical alternative for organizations interested in this workflow who do not have extremely large GPU clusters.” We couldn’t agree more.

Finally, it’s important to bear in mind that while this study has direct benefits in the treatment of COVID-19, the new tools and workflow may ultimately prove much more significant. This methodology can be applied to any kind of molecular machinery, paving the way for more rapid and better understanding of molecular interactions across a wide range of use cases, including treatment discovery for a range of diseases. It’s hugely satisfying to know that I was able to play a part.

To learn more about the study, read the paper “Intelligent Resolution: Integrating Cryo-EM with AI-driven Multi-resolution Simulations to Observe the SARS-CoV-2 Replication-Transcription Machinery in Action”, which will appear in the International Journal of High Performance Computing Applications, 2021.

Banner image by Argonne National Laboratory/University of Illinois at Urbana-Champaign.
