Sandia National Laboratories and Its Partners Award R&D Contract to Cerebras Systems to Uncover New Stockpile Stewardship Applications  

Sandia, Lawrence Livermore, and Los Alamos National Laboratories, collectively referred to as the Tri-Labs, recently announced a multiyear collaboration with Cerebras Systems. Led by Sandia National Laboratories, the partnership aims to accelerate advanced simulation and computing applications in support of the nation’s stockpile stewardship mission.

The Tri-Labs are tasked by the National Nuclear Security Administration (NNSA) with helping manage the nation’s nuclear stockpile through simulation-based research. This work falls under the Advanced Simulation and Computing (ASC) Program, which develops high-performance simulation and computing capabilities to analyze and predict the performance, safety, and reliability of nuclear weapons and to certify their functionality. The realism and accuracy of ASC simulations are profoundly important and depend on improved physics models and methods, which in turn require vastly more compute.

The ASC program seeks to inspire technological advancements that improve NNSA mission applications. The target is a 40X improvement in NNSA mission application performance over what is achievable on soon-to-be-deployed exascale computing platforms.

Improving Performance by 40X over the El Capitan Supercomputer

El Capitan, the forthcoming exascale supercomputer to be sited at Lawrence Livermore National Laboratory, will serve many customers, including the NNSA. El Capitan will be installed in 2023 and aims to deliver 2 exaflops of peak double-precision performance. Seeking a path to a 40X improvement over El Capitan, the Tri-Labs team was inspired by Cerebras’ 2nd generation Wafer Scale Engine (WSE-2), which powers the Cerebras CS-2 system. The WSE-2 is the largest processor ever built. It contains 2.6 trillion transistors, 850,000 AI-optimized cores, and 40 GB of on-chip memory. It is approximately 56 times larger than the largest competing processor, has 100 times the number of compute cores, 1,000 times more memory, and more than 10,000 times more memory bandwidth.

“The scale of Cerebras Systems’ wafer-scale technology makes this all really exciting, and we believe Cerebras’ WSE-2 and its ability to handle sparse models and on-chip data flows may help us achieve 40X El Capitan performance,” said Siva Rajamanickam, a researcher involved in the deployment of the Cerebras technology at Sandia. Simon Hammond, federal program manager for the ASC’s Computational Systems and Software Environments program, said, “This collaboration with Cerebras Systems has great potential to impact future mission applications by enabling artificial intelligence and machine learning techniques, which are an emerging component of our production simulation workloads.”

Previous National Laboratory Partnerships and Results  

Sandia Labs was impressed by Cerebras Systems’ past achievements at the intersection of artificial intelligence and simulation. Cerebras and its partners have achieved pioneering results in Molecular Dynamics, Computational Fluid Dynamics (CFD), and Stencil Algorithm research.

In the three examples below, we describe some of the pioneering work on the Cerebras CS-2. In these applications, the CS-2 benefited from its first-of-its-kind wafer-scale integration. Wafer-scale integration is what makes the WSE-2 the largest chip ever built: the size of a dinner plate, whereas most chips are the size of a postage stamp. The computational resources on the Wafer Scale Engine underpin the orders-of-magnitude speedups of AI and high-performance computing applications. The WSE-2’s sheer size delivers 1) industry-leading on-chip memory capacity and bandwidth, 2) not only 100 times as many cores as the nearest competitor but also high-bandwidth, low-latency core-to-core communication, and 3) native acceleration for all kinds of sparsity, including unstructured dynamic sparsity, something no other processor can support in native form.

1. Molecular Dynamics: Argonne National Laboratory and Cerebras teamed up to deliver Stream-AI-MD, a novel approach that applies deep-learning methods to drive adaptive MD simulation campaigns in a streaming manner. The team leveraged the computing power of a Cerebras wafer-scale AI accelerator to stream data from atomistic MD simulations to AI/ML models that guide the conformational search in a biophysically meaningful manner. The efficacy of Stream-AI-MD simulations was demonstrated on two scientific use cases: (1) folding a small prototypical protein, the ββα-fold (BBA) FSD-EY, and (2) understanding the protein-protein interaction (PPI) between two proteins in the SARS-CoV-2 proteome, nsp16 and nsp10. The study showed that Stream-AI-MD simulations could improve time-to-solution by ~50X for BBA protein folding. The experiment focused on systems of roughly 10⁵ atoms, captured as images of size 448 × 448.
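To give a flavor of what such a workflow looks like, here is a deliberately simplified sketch of a simulate-learn-steer loop in the spirit of Stream-AI-MD. Every function in it is a toy stand-in (run_md_chunk, to_contact_map, and score_maps are illustrative placeholders, not the authors’ code), and in the real system the ML scoring runs on Cerebras hardware rather than in NumPy.

```python
# Toy sketch of an adaptive "simulate -> learn -> steer" loop. Nothing here is the
# Stream-AI-MD implementation; it only illustrates the shape of the workflow.
import numpy as np

rng = np.random.default_rng(0)

def run_md_chunk(start_state, n_steps=1000):
    """Toy stand-in for a short MD segment: a random walk from the starting state."""
    return start_state + 0.01 * rng.standard_normal((n_steps, *start_state.shape)).cumsum(axis=0)

def to_contact_map(frame, size=448):
    """Toy featurization into the 448 x 448 'images' an ML model would consume."""
    img = np.zeros((size, size), dtype=np.float32)
    idx = (np.abs(frame) * 10).astype(int) % size
    img[idx, idx] = 1.0
    return img

def score_maps(maps):
    """Toy scorer; in the real workflow a deep model ranks the streamed frames."""
    return np.array([float(m.sum()) for m in maps])

state = rng.standard_normal(64)               # toy coordinates for a small system
for rnd in range(5):                          # adaptive sampling rounds
    traj = run_md_chunk(state)                # stream new MD frames
    frames = traj[::100]                      # subsample frames for scoring
    scores = score_maps([to_contact_map(f) for f in frames])
    state = frames[int(scores.argmax())]      # restart MD from the most promising frame
    print(f"round {rnd}: best score {scores.max():.2f}")
```

The point is the shape of the loop: simulation frames stream continuously into an ML model, and the model’s ranking decides where the next round of simulation starts.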

Additionally, researchers want to be able to study larger proteins. Cerebras makes such studies possible with its weight streaming execution mode, in which model parameters are stored off-chip, on external servers, and streamed into the system one layer at a time. With this mode, very large model capacity and support for large image sizes can be achieved because neither is limited by the memory on the wafer.
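Conceptually, weight streaming works as sketched below. The snippet is an illustration only, not the Cerebras software stack: a plain NumPy forward pass in which each layer’s weights are fetched from “external” storage just before they are needed, so resident memory never has to hold more than one layer of parameters at a time.

```python
# Illustration only: the idea behind weight streaming, not the Cerebras API.
# Layer weights live in external memory and are fetched one layer at a time, so
# on-device memory only ever holds the activations plus a single layer's weights.
import numpy as np

rng = np.random.default_rng(0)

# "External memory servers": weights for every layer are kept off-chip.
external_weights = [rng.standard_normal((512, 512)).astype(np.float32) for _ in range(12)]

def fetch_layer(i):
    """Stand-in for streaming one layer's weights onto the wafer."""
    return external_weights[i]

def forward(x):
    """Run the whole network while holding only one layer's weights at a time."""
    for i in range(len(external_weights)):
        w = fetch_layer(i)          # stream this layer's weights in
        x = np.maximum(x @ w, 0.0)  # compute the layer (a ReLU MLP here)
        del w                       # weights can be discarded before the next layer
    return x

activations = forward(rng.standard_normal((8, 512)).astype(np.float32))
print(activations.shape)  # (8, 512)
```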

For more information, please visit our co-authored paper here. 

2. Computational Fluid Dynamics: In collaboration with researchers at the National Energy Technology Laboratory (NETL), Cerebras showed that, at the time, a single Cerebras wafer-scale system could outperform one of the fastest supercomputers in the US by more than 200X. The team sought to improve performance for PDE codes, in which large, sparse, and often structured systems of linear equations must be solved. Iterative solvers for these systems are limited by data movement, both between caches and memory and between nodes. Overcoming this limit directly benefits the modeling of physical phenomena such as fluid dynamics with a finite-volume method on a regular three-dimensional mesh.

The team implemented a solver for such systems of equations on the Cerebras CS-1. They reported achieving 0.86 PFLOPS on a single wafer-scale system for the solution by BiCGStab of a linear system arising from a 7-point finite-difference stencil on a 600 × 595 × 1536 mesh. That is about one-third of the machine’s peak performance, which is impressive compared with the top 20 supercomputers, which achieve only 0.5%–3.1% of their peak floating-point performance on similar algorithms (see the published article here).
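For readers unfamiliar with this problem class, the sketch below sets up the same kind of system on a tiny mesh and solves it with BiCGStab using SciPy. It is purely illustrative of the structure (a 7-point stencil yields a very sparse, banded matrix), not of the CS-1 implementation or its performance.

```python
# CPU-side illustration of the problem class described above: BiCGStab applied to the
# sparse system from a 7-point finite-difference stencil on a regular 3D mesh.
# Uses SciPy on a tiny grid purely to show the structure; this is not the CS-1 code.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import bicgstab

nx, ny, nz = 24, 24, 24                      # tiny mesh (the paper used 600 x 595 x 1536)

def laplacian_1d(n):
    """Tridiagonal 1D Laplacian block of the stencil."""
    return sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csr")

# Assemble the 3D 7-point stencil operator as a Kronecker sum of 1D Laplacians.
Ix, Iy, Iz = (sp.identity(n, format="csr") for n in (nx, ny, nz))
A = (sp.kron(sp.kron(laplacian_1d(nx), Iy), Iz)
     + sp.kron(sp.kron(Ix, laplacian_1d(ny)), Iz)
     + sp.kron(sp.kron(Ix, Iy), laplacian_1d(nz))).tocsr()

b = np.ones(nx * ny * nz)                    # simple right-hand side
x, info = bicgstab(A, b, maxiter=2000)       # iterative Krylov solve
print("converged" if info == 0 else f"info={info}",
      "| residual:", np.linalg.norm(b - A @ x))
```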

3. Seismic Modeling: TotalEnergies and Cerebras teamed up to tackle a seismic modeling code that turns petabytes of data generated by a mesh of seismic sensors into three-dimensional models that extend far beneath the Earth’s crust. One run of the full code on a supercomputer can take weeks. Furthermore, it is standard practice to run these simulations many times with minor input changes to build confidence that the solutions are robust to perturbations in the initial conditions.

As of April 2022, the Cerebras CS-2 system outperformed the benchmark score for a modern GPU by more than 200X. TotalEnergies and Cerebras engineers wrote the benchmark code in the new Cerebras Software Language (CSL). CSL is part of the Cerebras SDK, which allows developers to take advantage of the strengths of the CS-2 system. Read more about this incredible accomplishment here.
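The underlying computation in such seismic codes is a finite-difference stencil swept over a huge grid at every time step. The toy NumPy sketch below advances a 2D acoustic wave equation to show that nearest-neighbor access pattern; the actual benchmark is written in CSL and runs on the CS-2 wafer, and the grid size, physics, and performance here bear no relation to it.

```python
# Minimal 2D acoustic wave-equation stencil in NumPy, just to show the nearest-neighbor
# access pattern that seismic finite-difference codes repeat over enormous 3D grids.
import numpy as np

nx = nz = 400
c, dx, dt = 1500.0, 5.0, 0.001            # wave speed (m/s), grid spacing (m), time step (s)
coef = (c * dt / dx) ** 2                 # CFL-stable: c*dt/dx = 0.3

p_prev = np.zeros((nz, nx))               # pressure field at t - dt
p_curr = np.zeros((nz, nx))               # pressure field at t
p_curr[nz // 2, nx // 2] = 1.0            # point source in the middle of the grid

for step in range(500):
    lap = (p_curr[:-2, 1:-1] + p_curr[2:, 1:-1] +
           p_curr[1:-1, :-2] + p_curr[1:-1, 2:] -
           4.0 * p_curr[1:-1, 1:-1])      # 5-point Laplacian stencil
    p_next = p_curr.copy()
    p_next[1:-1, 1:-1] = (2.0 * p_curr[1:-1, 1:-1] - p_prev[1:-1, 1:-1] + coef * lap)
    p_prev, p_curr = p_curr, p_next       # advance one time step

print("max |p| after 500 steps:", float(np.abs(p_curr).max()))
```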

What’s Next 

At Cerebras, we exist to help big ideas take shape. AI and simulation are pioneering techniques that Cerebras hardware can accelerate in pursuit of these big ideas. 

Cerebras CEO Andrew Feldman says, “We often marvel at the innovations that take flight on our hardware. And we are proud to see the scientific breakthroughs past and future that Sandia and its Tri-Lab partners can and will achieve using our CS-2 systems. We exist to push the boundaries of what is possible. This multi-year partnership will vastly accelerate cutting-edge simulation workloads and computing applications for the nation’s stockpile stewardship mission. Cerebras is proud to be working with the Tri-Lab team to support the NNSA mission.” 

Udai Mody, Product Manager | November 2, 2022