Rebecca Lewington, Technology Evangelist | March 14, 2022

AI has the potential to transform the speed, sophistication and safety of drug discovery, yielding better medicines and vaccines. But to be successful, our customers need to train complex models, using huge datasets, very quickly. Training times measured in weeks just won’t do. The researchers need results in hours so they can run many experiments to test their hypotheses.

Which is where Cerebras comes in. We make the world’s fastest AI accelerator, removing roadblocks to biomedical research, drug discovery and data-driven healthcare. At Cerebras, we’re helping a who’s who of leading institutions solve big problems.

Need proof? Read on to learn about the important work our systems are doing at some of our major customers.

GlaxoSmithKline is training complex epigenomic models with a previously prohibitively large dataset, made possible for the first time by Cerebras. In this blog, GSK’s Kim Branson, SVP & Global Head of AI and ML, writes: “We were able to train the EBERT model in about 2.5 days, compared to an estimated 24 days with a GPU cluster with 16 nodes. This dramatic reduction in training time makes the new models actually useful in a real-world research environment, which is very exciting.”

And in the technical paper “Epigenomic Language Models Powered by Cerebras”, the GSK authors note: “The training speedup afforded by the Cerebras system enabled us to explore architecture variations, tokenization schemes and hyperparameter settings in a way that would have been prohibitively time and resource intensive on a typical GPU cluster.”

AstraZeneca is iterating and experimenting in real-time by running queries on hundreds of thousands of abstracts and research papers with a Cerebras system. Nick Brown, their Head of AI & Data Science, said “Training which historically took over 2 weeks to run on a large cluster of GPUs was accomplished in just over 2 days – 52hrs to be exact – on a single CS-1. This could allow us to iterate more frequently and get much more accurate answers, orders of magnitude faster.” Visit our customer spotlight page to learn more.

It’s hard to imagine a better example of “AI for good” than figuring out how the virus that causes COVID-19 works. Argonne National Laboratory, along with researchers from other national labs, universities, and Cerebras, developed a host of new computational techniques to create AI-driven simulations of exactly that. But these simulations were limited to a few tens of nanoseconds of motion at one time, whereas researchers needed to study microseconds – a period of time roughly 50x longer. GPUs were creating a bottleneck. Our system offered the solution. To quote their study, which was nominated for a Gordon Bell Special Prize and presented at this year’s SC21 supercomputing conference: “the CS-2 delivers out-of-the-box performance of 24,000 samples/s, or about the equivalent of 110-120 GPUs.”

This blog by ML frameworks technical lead Vishal Subbiah gives a nice high-level summary of this impressive piece of work.

ANL is also working with the National Institutes of Health and the National Cancer Institute to develop AI-powered predictive models for drug response that can be used to optimize pre-clinical drug screening and to drive precision medicine-based treatments for cancer patients. They ran into major challenges associated with scaling large AI models across a cluster of GPUs, which we were able to help them overcome. Using our system, they can now train their drug response deep learning models in hours, rather than the days or weeks that had been the case with their legacy GPU cluster. You can read the case study here.

Rick Stevens, Associate Laboratory Director of Computing, Environment and Life Sciences at ANL, said “Cerebras allowed us to reduce the experiment turnaround time on our cancer prediction models by 300X, ultimately enabling us to explore questions that previously would have taken years, in mere months.”

Lastly, nference, an AI-driven health technology company, is using our system to accelerate the training of state-of-the-art transformer NLP models by orders of magnitude, which they hope will lead to better health outcomes. These models will help researchers and clinicians make sense of siloed and inaccessible health data, such as patient records, scientific papers, medical imagery, and genomic databases. As their CTO, Ajit Rajasekharan, said in the press release: “With a powerful Cerebras CS-2 system we can train transformer models with much longer sequence lengths than we could before. This will enable us to iterate more rapidly and build better, more insightful models.”

Everyone knows that drug development is hard. It’s expensive, takes a long time and is fraught with uncertainty. Suzi Ring from Bloomberg did a great job of laying out what she called “the Bad Math of Drug Development” in a terrific article published late last year. AI has the potential to fix that bad math, but AI is not a magic wand. Achieving meaningful results takes a combination of medical expertise, advanced computer science, mountains of data and lots and lots of AI-specific computing horsepower.

In short, if you’re doing pharma and life sciences research without a Cerebras system, you’re doing it wrong.

If you’re facing roadblocks to your biomedical research, drug discovery and data-driven healthcare, please get in touch. We can give you cluster-scale deep learning acceleration in a single, easy-to-program device, so your researchers can focus on medical innovation, not on working around the limitations of traditional computing systems.

Learn more at