SUNNYVALE, Calif.–(BUSINESS WIRE)–Cerebras Systems, the pioneer in accelerating generative AI, today announced the achievement of a 130x speedup over Nvidia A100 GPUs on a key nuclear energy HPC simulation kernel, developed by researchers at Argonne National Laboratory. This result demonstrates the performance and versatility of the Cerebras Wafer-Scale Engine (WSE-2) and ensures that the U.S. continues to be the global leader in supercomputing for energy and defense applications.

Monte Carlo particle transport is a major focus in the field of HPC as it provides high fidelity simulation of radiation transport and is vital to fission and fusion reactor designs. In this research collaboration, a Cerebras CS-2 system dramatically outperformed a highly optimized GPU implementation in the most demanding portion of the Monte Carlo neutron particle transport algorithm – the macroscopic cross section lookup kernel. This kernel represents the most computationally intensive portion of the full simulation, accounting for up to 85% of the total runtime for many nuclear energy applications. This work further validates Argonne’s ALCF AI Testbed program, which aims to bring AI accelerators to the forefront of U.S. supercomputing infrastructure, exploring capabilities beyond what is achievable with GPUs.
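To put the 85% figure in context, the short sketch below applies Amdahl's law to a hypothetical case in which only the lookup kernel is accelerated and the rest of the simulation runs at unchanged speed. That scenario is an assumption made purely for illustration; the release reports kernel speedup only, and the function name `overall_speedup` is not taken from the paper.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: whole-application speedup when only a fraction of runtime is accelerated."""
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)

# Figures quoted in the release: the kernel is up to 85% of runtime, sped up 130x.
# Assumption for illustration only: the other 15% runs at its original speed.
bound = overall_speedup(0.85, 130.0)

# Limit if the kernel became free: 1 / 0.15
ceiling = 1.0 / (1.0 - 0.85)

print(f"hypothetical whole-application speedup: {bound:.2f}x (ceiling {ceiling:.2f}x)")
```

Under these assumptions the whole-application gain would be roughly 6.4x, which is why the share of runtime spent in the kernel matters as much as the kernel speedup itself.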

“I’ve implemented this kernel in a half dozen different programming models and have run it on just about every HPC architecture over the last decade,” said John R. Tramm, Assistant Computational Scientist, Argonne National Laboratory. “The performance numbers we were able to get out of the Cerebras machine impressed our team – a clear advancement over what has been possible on CPU or GPU architectures to-date. Our team’s work adds to growing evidence that AI accelerators have serious potential to disrupt GPU dominance in the field of HPC simulation.”

Within the Monte Carlo neutron particle transport algorithm, the macroscopic cross section lookup kernel assembles the statistical distribution data used to generate random samples of a particle’s behavior as it moves through a simulated geometry and interacts with various materials. Argonne scientists implemented an optimized version of this kernel using the Cerebras SDK and the CSL programming language. The implementation took advantage of the CS-2’s wafer-scale architecture, with up to 850,000 cores and 40 GB of on-chip SRAM, which provides a combination of extreme bandwidth and low latency that is an ideal match for Monte Carlo particle simulations. This research also validates the ability of external researchers to develop their own HPC applications for the Cerebras architecture, unlocking new levels of performance on a wide variety of computational problems.
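For readers unfamiliar with the kernel, the following is a minimal sketch of a macroscopic cross section lookup in its common textbook form: for each nuclide in a material, find the particle's energy on that nuclide's energy grid, interpolate the microscopic cross section, and sum the results weighted by number density. It is not the Argonne CSL implementation; the function names, the single reaction channel, and the toy data are all assumptions made for illustration.

```python
from bisect import bisect_right

def micro_xs(energy_grid, xs_values, energy):
    """Linearly interpolate one nuclide's microscopic cross section at `energy`."""
    # Locate the grid interval containing `energy`; clamp so that out-of-range
    # energies fall back to the first or last interval.
    i = bisect_right(energy_grid, energy) - 1
    i = max(0, min(i, len(energy_grid) - 2))
    e0, e1 = energy_grid[i], energy_grid[i + 1]
    f = (energy - e0) / (e1 - e0)
    return xs_values[i] + f * (xs_values[i + 1] - xs_values[i])

def macro_xs(material, nuclides, energy):
    """Sum density-weighted microscopic cross sections over a material's nuclides."""
    total = 0.0
    for name, number_density in material:
        grid, values = nuclides[name]
        total += number_density * micro_xs(grid, values, energy)
    return total

# Hypothetical toy data: nuclide name -> (sorted energy grid, cross section values).
nuclides = {
    "A": ([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]),
    "B": ([1.0, 3.0], [4.0, 8.0]),
}
# A material is a list of (nuclide name, number density) pairs.
material = [("A", 0.5), ("B", 2.0)]

sigma = macro_xs(material, nuclides, 2.5)
print(sigma)
```

Each lookup is a search followed by a few arithmetic operations on data scattered through memory, so the work is dominated by memory access rather than computation. That is consistent with the release's emphasis on bandwidth and latency.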

“These published results highlight not only the incredible performance of the CS-2, but also its architectural efficiency,” said Andrew Feldman, CEO and co-founder of Cerebras Systems. “The Cerebras CS-2 system, powered by the WSE-2 processor, has 48x more transistors than the A100 but achieved a 130x speedup, showing a 2.7x gain in architectural efficiency for a problem that is widely optimized for GPUs.”
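The 2.7x figure in the quote follows from dividing the reported speedup by the transistor ratio. The snippet below reproduces that arithmetic; reading "architectural efficiency" as speedup per unit of transistor budget is an assumption inferred from the numbers quoted, and the variable names are illustrative.

```python
# Figures quoted in the release.
transistor_ratio = 48.0   # WSE-2 transistor count relative to the A100
speedup = 130.0           # reported kernel speedup over the A100

# Speedup achieved per unit of transistor budget, relative to the A100.
efficiency_gain = speedup / transistor_ratio

print(f"{efficiency_gain:.1f}x")
```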

Moreover, the Cerebras CS-2 demonstrated strong scaling, meaning it achieved high performance on both small- and large-scale simulations. The researchers noted that in smaller-scale simulations, no number of GPUs working in parallel would be able to match the performance of a single CS-2.

The Cerebras CS-2, powered by the WSE-2, is purpose-built for generative AI and scientific applications. It has delivered remarkable results, often characterized as “100x” improvements in scientific computing. Notably, in a multi-dimensional seismic processing project conducted by the King Abdullah University of Science and Technology (KAUST), a cluster of 48 CS-2s achieved performance comparable to the world’s fastest supercomputer. Similarly, researchers at the National Energy Technology Laboratory used the CS-2 to perform computational fluid dynamics 470 times faster than the laboratory’s Joule supercomputer. Additionally, at TotalEnergies, the CS-2 accelerated stencil computations by 228 times compared with a GPU-based solution.

To read the full paper titled “Efficient Algorithms for Monte Carlo Particle Transport on AI Accelerator Hardware,” please visit

About Cerebras Systems

Cerebras Systems is a team of pioneering deep learning researchers, computer architects, and solutions specialists of all types. We have come together to bring generative AI to enterprises and organizations of all sizes around the world. Our flagship product, the CS-2 system, powered by WSE-2, the world’s largest and fastest AI processor, makes training large models simple and easy by avoiding the complexity of distributed computing. Our software tools simplify the deployment and training process, providing deep insights and ensuring best-in-class accuracy. Through our team of world-class ML researchers and practitioners, who bring decades of experience developing and deploying the most advanced AI models, we help our customers stay on the cutting edge of AI. Cerebras solutions are available in the cloud, through the Cerebras AI Model Studio, or on premises. For further information, visit


Kim Ziesemer