RevBiFPN: The Fully Reversible Bidirectional Feature Pyramid Network

This work introduces RevSilo, the first reversible module for bidirectional multi-scale feature fusion. Like other reversible methods, RevSilo eliminates the need to store hidden activations by recomputing them. Existing reversible methods, however, do not handle multi-scale feature fusion and therefore cannot be applied to a large class of networks. Bidirectional multi-scale feature fusion promotes local and global coherence and has become a de facto design principle for networks…
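
The abstract is truncated here, but the core mechanism it names, recomputing activations instead of storing them, can be sketched. Below is a minimal RevNet-style coupling in PyTorch; this is a generic illustration of reversibility, not RevSilo itself, which extends the idea to bidirectional multi-scale fusion.

```python
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Minimal RevNet-style coupling (illustrative; not RevSilo itself).
    Because inputs can be reconstructed exactly from outputs, hidden
    activations need not be stored for the backward pass."""

    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        x2 = y2 - self.g(y1)  # recover x2 first...
        x1 = y1 - self.f(x2)  # ...then x1, exactly
        return x1, x2

# Sanity check: the inverse reproduces the inputs to numerical precision.
blk = ReversibleBlock(nn.Linear(8, 8), nn.Linear(8, 8))
x1, x2 = torch.randn(4, 8), torch.randn(4, 8)
r1, r2 = blk.inverse(*blk.forward(x1, x2))
assert torch.allclose(r1, x1, atol=1e-6) and torch.allclose(r2, x2, atol=1e-6)
```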

Massively scalable stencil algorithm

Stencil computations lie at the heart of many scientific and industrial applications. Unfortunately, stencil algorithms perform poorly on machines with cache-based memory hierarchies, due to low reuse of memory accesses. This work shows that a novel stencil algorithm leveraging a localized communication strategy effectively exploits the Cerebras WSE-2, which has no cache hierarchy. This study focuses on a 25-point stencil finite-difference method for the 3D wave equation, a…
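
As a rough illustration of the numerical kernel (not the paper's WSE-2 mapping or its communication strategy), here is the 25-point-stencil update for the 3D wave equation in NumPy; the grid size, boundary handling, and anything beyond the standard 8th-order central-difference coefficients are assumptions.

```python
import numpy as np

# 8th-order central-difference coefficients for the second derivative,
# offsets 0..4. Four neighbors per side on each of the 3 axes plus the
# center point give the 25-point star stencil.
C = np.array([-205 / 72, 8 / 5, -1 / 5, 8 / 315, -1 / 560])

def laplacian_25pt(u, h):
    """25-point star-stencil Laplacian (periodic boundaries via np.roll,
    chosen here only for brevity)."""
    lap = 3.0 * C[0] * u
    for axis in range(3):
        for k in range(1, 5):
            lap += C[k] * (np.roll(u, k, axis) + np.roll(u, -k, axis))
    return lap / h**2

def wave_step(u, u_prev, c, dt, h):
    """One leapfrog step of the 3D wave equation u_tt = c^2 lap(u)."""
    return 2.0 * u - u_prev + (c * dt) ** 2 * laplacian_25pt(u, h)

u_prev = u = np.zeros((64, 64, 64))
u = wave_step(u, u_prev, c=1.0, dt=1e-3, h=1e-2)
```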

Epigenomic language models powered by Cerebras

Large-scale self-supervised pre-training of Transformer language models has advanced the field of Natural Language Processing and shown promise in cross-application to the biological 'languages' of proteins and DNA. Learning effective representations of DNA sequences using large genomic sequence corpora may accelerate the development of models of gene regulation and function through transfer learning. However, to accurately model cell type-specific gene regulation and function, it is necessary…
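
The abstract breaks off here. As a hedged sketch of the kind of pre-training setup it describes, the snippet below prepares DNA sequences for BERT-style masked-token prediction; the k-mer size, mask rate, and function names are illustrative, not the paper's pipeline.

```python
import random

def kmer_tokens(seq: str, k: int = 6):
    """Tokenize a DNA sequence into overlapping k-mers, a common choice
    for genomic language models (k = 6 is an assumption)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """BERT-style masking: hide a fraction of tokens; the model learns
    to recover them from surrounding sequence context."""
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_prob:
            masked.append(mask_token)
            labels.append(tok)    # loss is computed at masked positions
        else:
            masked.append(tok)
            labels.append(None)   # no loss here
    return masked, labels

inputs, targets = mask_tokens(kmer_tokens("ACGTACGTAGCTTAGGCATT"))
```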

BraggNN: fast X-ray Bragg peak analysis using deep learning

We propose BraggNN, a deep learning-based method to accelerate the most computation-intensive part of polycrystal diffraction data analysis: diffraction signal characterization. Applied to real experimental data, BraggNN delivers consistent (sometimes even slightly better) results compared with the conventional method while running hundreds of times faster.
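
The paper's architecture is not reproduced here; as a minimal sketch of the approach's shape, the model below regresses a sub-pixel peak center from a small detector patch, replacing per-peak profile fitting with a single forward pass. The patch size and layer widths are assumptions.

```python
import torch
import torch.nn as nn

class PeakLocator(nn.Module):
    """Illustrative stand-in for BraggNN: regress the (row, col) center
    of a Bragg peak from an 11x11 detector patch."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 11 * 11, 64), nn.ReLU(),
            nn.Linear(64, 2),  # sub-pixel (row, col) of the peak center
        )

    def forward(self, patch):  # patch: (N, 1, 11, 11)
        return self.head(self.features(patch))

model = PeakLocator()
centers = model(torch.randn(8, 1, 11, 11))  # -> (8, 2)
```

Training such a model against centers obtained from conventional fitting would then let inference replace the expensive per-peak optimization.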

Intelligent Resolution: Integrating Cryo-EM with AI-driven Multi-resolution Simulations to Observe the SARS-CoV-2 Replication-Transcription Machinery in Action

The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) replication-transcription complex (RTC) is a multi-domain protein responsible for replicating and transcribing the viral mRNA inside a human cell. Attacking RTC function with pharmaceutical compounds is a pathway to treating COVID-19. Conventional tools, e.g., cryo-electron microscopy and all-atom molecular dynamics (AAMD), do not provide sufficiently high resolution or timescale to capture important dynamics of this molecular…

The Path to Successful Wafer-Scale Integration: The Cerebras Story

There has been an impressive increase in single-chip processing power since the Intel 4004 was launched in 1971. This is usually attributed to Moore's law, but there are additional factors to consider. By understanding the components of prior improvements, we can gain insight into the potential for future improvements and the limits to scaling.

Stream-AI-MD: streaming AI-driven adaptive molecular simulations for heterogeneous computing platforms

Emerging hardware tailored for artificial intelligence (AI) and machine learning (ML) methods provides novel means to couple these methods with traditional high-performance computing (HPC) workflows involving molecular dynamics (MD) simulations. We propose Stream-AI-MD, a novel instance of applying deep learning methods to drive adaptive MD simulation campaigns in a streaming manner.
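
To make the adaptive pattern concrete, here is a runnable toy in which everything is a stand-in: a random walk replaces MD and distance-from-the-mean replaces a learned novelty score. It shows only the streaming loop, score frames as they arrive and restart the next segments from the most novel states, not the Stream-AI-MD implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def md_segment(x0, steps=100):
    """Toy stand-in for an MD segment: a random walk from state x0."""
    return x0 + np.cumsum(rng.normal(size=(steps, x0.size)), axis=0)

def novelty(frames, mean):
    """Toy stand-in for a learned score (e.g., autoencoder error):
    here, just distance from the mean of visited states."""
    return np.linalg.norm(frames - mean, axis=1)

# Streaming adaptive loop: run segments, score frames as they stream in,
# and restart the next round from the most novel states found so far.
states = [np.zeros(3) for _ in range(4)]
for _ in range(10):
    frames = np.concatenate([md_segment(x) for x in states])
    scores = novelty(frames, frames.mean(axis=0))
    states = list(frames[np.argsort(scores)[-len(states):]])
```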

Memory Efficient 3D U-Net with Reversible Mobile Inverted Bottlenecks for Brain Tumor Segmentation

We propose combining memory-saving techniques with traditional U-Net architectures to increase the complexity of the models on the Brain Tumor Segmentation (BraTS) challenge. The BraTS challenge consists of a 3D segmentation of a 240 × 240 × 155 × 4 input image into a set of tumor classes.
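
The reversible-bottleneck design itself is not reproduced here, but the memory-saving principle it shares with gradient checkpointing can be: trade recomputation for activation memory. A minimal PyTorch sketch, with layer shapes chosen only for illustration:

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

# Gradient checkpointing: a generic memory-saving technique in the same
# spirit as the paper's reversible blocks. Activations inside `block` are
# recomputed during backward instead of being stored. (Illustrative only;
# the paper uses reversible mobile inverted bottlenecks, not this code.)
block = nn.Sequential(
    nn.Conv3d(4, 16, 3, padding=1), nn.ReLU(),
    nn.Conv3d(16, 16, 3, padding=1), nn.ReLU(),
)

x = torch.randn(1, 4, 32, 32, 32, requires_grad=True)  # small 3D volume
y = checkpoint(block, x, use_reentrant=False)  # forward without stashing
y.sum().backward()                             # block re-runs here for grads
```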

Pipelined Backpropagation at Scale: Training Large Models without Batches

New hardware can substantially increase the speed and efficiency of deep neural network training. To guide the development of future hardware architectures, it is pertinent to explore the hardware and machine learning properties of alternative training algorithms.

System Integration of Neocortex, a Unique, Scalable AI Platform

The Pittsburgh Supercomputing Center, in partnership with Cerebras Systems and Hewlett Packard Enterprise, has deployed Neocortex, an innovative computing platform that accelerates scientific discovery by vastly shortening the time required for deep learning training and fosters greater integration of deep AI models with scientific workflows.

Fast Stencil-Code Computation on a Wafer-Scale Processor

The performance of CPU-based and GPU-based systems is often low for PDE codes, where large, sparse, and often structured systems of linear equations must be solved. Iterative solvers are limited by data movement, both between caches and memory and between nodes. Here we describe the solution of such systems of equations on the Cerebras Systems CS-1, a wafer-scale processor that has the memory bandwidth and communication latency to perform well. We achieve 0.86 PFLOPS on a single wafer-scale…
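
As a generic illustration of why such iterative solvers are bandwidth-bound (this is not the solver run on the CS-1), one Jacobi sweep for a structured 3D Poisson-like system does only a few flops per point but reads six neighbors:

```python
import numpy as np

def jacobi_step(u, f, h):
    """One Jacobi sweep for the 7-point stencil of -lap(u) = f on a
    structured 3D grid. Each update reads six neighbors and does a
    handful of flops, so iterations are dominated by data movement.
    (Generic sketch; not the CS-1 solver from the paper.)"""
    new = u.copy()
    new[1:-1, 1:-1, 1:-1] = (
        u[2:, 1:-1, 1:-1] + u[:-2, 1:-1, 1:-1] +
        u[1:-1, 2:, 1:-1] + u[1:-1, :-2, 1:-1] +
        u[1:-1, 1:-1, 2:] + u[1:-1, 1:-1, :-2] +
        h * h * f[1:-1, 1:-1, 1:-1]
    ) / 6.0
    return new

u = np.zeros((64, 64, 64))
f = np.ones_like(u)
for _ in range(100):
    u = jacobi_step(u, f, h=1.0 / 63)
```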

The curious case of developmental BERTology: On sparsity, transfer learning, generalization and the brain

In this essay, we explore a point of intersection between deep learning and neuroscience, through the lens of large language models, transfer learning and network compression.

Generating SIMD Instructions for Cerebras CS-1 using Polyhedral Compilation Techniques

The Cerebras CS-1 is a computing system based on a wafer-scale processor having nearly 400,000 compute cores. It is intended for training of and inference on deep neural networks.
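
The abstract stops before describing the code generator, so only the flavor of the target transformation is sketched below: strip-mining a loop into fixed-width chunks, one of the restructurings a polyhedral framework can derive automatically when targeting SIMD lanes. The width W and the loop body are illustrative, not the paper's generator.

```python
# Strip-mining: split one loop into an outer loop over strips and an
# inner loop of fixed width W, where the inner loop maps to SIMD lanes.
W, n = 4, 20
src = list(range(n))
dst = [0] * n
for i0 in range(0, n, W):                # loop over strips
    for i in range(i0, min(i0 + W, n)):  # lane loop, width W
        dst[i] = 2 * src[i]
```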

A Templated C++ Interface for ISL

Polyhedral libraries typically support only a very limited collection of types for representing objects, corresponding to broad mathematical classes such as sets, binary relations and functions. Software built on top of these libraries, on the other hand, needs to deal with a plethora of different kinds of objects such as instance sets, access relations and dependence relations. Conceptually, these different kinds of objects can only be combined in very specific ways, but they are all mapped to…
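
The interface described is built from C++ templates over ISL; to keep this document's sketches in one language, the underlying idea is illustrated below in Python: give conceptually distinct kinds of objects distinct static types, so a checker rejects combinations that are meaningless even though both kinds wrap the same generic set type. All names here are illustrative, not the ISL or paper API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InstanceSet:
    """Set of statement instances (a generic 'set' underneath)."""
    handle: frozenset

@dataclass(frozen=True)
class DataSet:
    """Set of accessed array elements (also a generic 'set' underneath)."""
    handle: frozenset

def intersect_instances(a: InstanceSet, b: InstanceSet) -> InstanceSet:
    """Meaningful only between two instance sets; mixing an InstanceSet
    with a DataSet is now a static type error, not a silent bug."""
    return InstanceSet(a.handle & b.handle)

stmts = InstanceSet(frozenset({"S[0]", "S[1]"}))
elems = DataSet(frozenset({"A[0]"}))
intersect_instances(stmts, stmts)    # OK
# intersect_instances(stmts, elems)  # rejected by a static type checker
```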

Online Normalization for Training Neural Networks, NeurIPS 2019

Online Normalization is a new technique for normalizing the hidden activations of a neural network. Like Batch Normalization, it normalizes the sample dimension. While Online Normalization does not use batches, it is as accurate as Batch Normalization.
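
A simplified, forward-only sketch of the idea follows: normalize each sample with running statistics that are updated online, so no batch is needed. The published method also controls the backward pass, which is omitted here, and the decay and epsilon values are assumptions.

```python
import numpy as np

class OnlineNorm:
    """Forward-only sketch of online normalization: per-sample
    normalization using exponentially decaying running statistics.
    (The actual method also corrects gradients; omitted here.)"""

    def __init__(self, num_features, decay=0.99, eps=1e-5):
        self.mu = np.zeros(num_features)
        self.var = np.ones(num_features)
        self.decay, self.eps = decay, eps

    def __call__(self, x):  # x: (num_features,)
        y = (x - self.mu) / np.sqrt(self.var + self.eps)
        # Update running statistics after normalizing the current sample.
        self.mu = self.decay * self.mu + (1 - self.decay) * x
        self.var = self.decay * self.var + (1 - self.decay) * (x - self.mu) ** 2
        return y

norm = OnlineNorm(8)
for _ in range(100):
    _ = norm(np.random.randn(8))  # one sample at a time, no batches
```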