Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
Replacing dense layers with Sparse-IFT leads to significant improvements across computer vision (CV) and natural language processing (NLP) tasks, including ResNet-18 on ImageNet (+3.5%) and GPT-3 Small on WikiText-103 (-0.4 PPL), both matching larger dense model variants with 2x or more FLOPs. To…
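To illustrate the iso-FLOP idea, here is a minimal sketch of one way a sparse-wide replacement can be sized so its theoretical FLOP count matches the dense layer it replaces. The 1/sqrt(1 - sparsity) width scaling and the function name are illustrative assumptions, not necessarily the paper's exact transformation family.

```python
import math

def sparse_wide_width(dense_width: int, sparsity: float) -> int:
    """Width of a sparse layer whose theoretical FLOPs match a dense layer.

    A dense linear layer costs roughly d_in * d_out MACs. If both dimensions
    are scaled by k and a fraction `sparsity` of the weights is zero, the cost
    becomes (1 - sparsity) * (k * d_in) * (k * d_out), so choosing
    k = 1 / sqrt(1 - sparsity) keeps the FLOP budget constant
    (the "iso-FLOP" condition).
    """
    k = 1.0 / math.sqrt(1.0 - sparsity)
    return round(dense_width * k)

# Example: at 75% sparsity a 512-wide dense layer maps to a ~1024-wide sparse layer.
print(sparse_wide_width(512, 0.75))  # -> 1024
```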
SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models
Presented at the ICLR 2023 Workshop on Sparsity in Neural Networks. In this work, we show the benefits of using unstructured weight sparsity to train only a subset of weights during pre-training (Sparse Pre-training) and then recover the representational capacity by allowing the zeroed weights to…
January 20, 2023
Publication
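A minimal PyTorch sketch of the sparse pre-training / dense fine-tuning idea described in the SPDF entry above: weights are zeroed under a fixed unstructured mask during pre-training, and the mask is dropped so those weights can learn during fine-tuning. The random mask choice and the `densify` helper are assumptions for illustration, not the paper's exact procedure.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Linear):
    """Linear layer with a fixed unstructured sparsity mask on its weights."""

    def __init__(self, in_features, out_features, sparsity=0.75):
        super().__init__(in_features, out_features)
        # Random unstructured mask; real methods may pick the mask differently.
        mask = (torch.rand_like(self.weight) >= sparsity).float()
        self.register_buffer("mask", mask)
        # Zero the masked weights so they stay zero during sparse pre-training.
        with torch.no_grad():
            self.weight.mul_(self.mask)

    def forward(self, x):
        # Masked weights receive zero gradient, so they remain zero.
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

    def densify(self):
        """Remove the mask before fine-tuning so all weights can be updated."""
        self.mask.fill_(1.0)
```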
Wafer-Scale Fast Fourier Transforms
We have implemented fast Fourier transforms for one, two, and three-dimensional arrays on the Cerebras CS-2, a system whose memory and processing elements reside on a single silicon wafer. The wafer-scale engine (WSE) encompasses a two-dimensional mesh of roughly 850,000 processing elements (PEs)…
November 23, 2022
Publication
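Distributed FFTs like the one above rely on the separability of multidimensional transforms: independent 1-D FFTs along one axis, a data exchange, then 1-D FFTs along the next axis. The NumPy sketch below shows that decomposition on a single host; the actual mapping onto the WSE's two-dimensional mesh of PEs is considerably more involved.

```python
import numpy as np

def fft2_separable(a: np.ndarray) -> np.ndarray:
    """2-D FFT via the row/column decomposition used by distributed FFTs.

    Each row FFT is independent, so rows can be processed in parallel across
    one mesh dimension; after a transpose (an all-to-all exchange on real
    hardware), each column FFT is likewise independent.
    """
    rows_done = np.fft.fft(a, axis=1)      # 1-D FFTs along rows
    return np.fft.fft(rows_done, axis=0)   # 1-D FFTs along columns

x = np.random.rand(8, 8)
assert np.allclose(fft2_separable(x), np.fft.fft2(x))
```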
GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics
Our work seeks to transform how new and emergent variants of pandemic-causing viruses, especially SARS-CoV-2, are identified and classified. By adapting large language models (LLMs) for genomic data, we build genome-scale language models (GenSLMs) which can learn the evolutionary landscape of…
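GenSLMs operate on nucleotide sequences rather than natural-language text. As a purely illustrative sketch, assuming codon-level (3-nucleotide) tokens, a genome can be split into tokens as below; the released models' actual tokenizer may differ.

```python
def codon_tokens(sequence: str):
    """Split a nucleotide sequence into codon (3-mer) tokens."""
    sequence = sequence.upper().replace("\n", "")
    return [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]

print(codon_tokens("ATGGCAGTTTAA"))  # ['ATG', 'GCA', 'GTT', 'TAA']
```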
TensorFlow as a DSL for stencil-based computation on the Cerebras Wafer-Scale Engine
The Cerebras Wafer Scale Engine (WSE) is an accelerator that combines hundreds of thousands of AI-cores onto a single chip. Whilst this technology has been designed for machine learning workloads, the significant amount of available raw compute means that it is also a very interesting potential…
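As a rough illustration of treating TensorFlow as a DSL for stencil computation, the sketch below expresses a 5-point Laplacian update as a `tf.nn.conv2d`. How such graphs are lowered onto the WSE is the subject of the paper; the kernel and relaxation step here are illustrative assumptions.

```python
import tensorflow as tf

# 5-point Laplacian stencil as a 3x3 convolution kernel.
kernel = tf.constant([[0., 1., 0.],
                      [1., -4., 1.],
                      [0., 1., 0.]])
kernel = tf.reshape(kernel, [3, 3, 1, 1])  # HWIO layout expected by conv2d

def jacobi_step(grid: tf.Tensor) -> tf.Tensor:
    """One explicit stencil update: grid + 0.25 * Laplacian(grid)."""
    g = tf.reshape(grid, [1, *grid.shape, 1])             # NHWC layout
    lap = tf.nn.conv2d(g, kernel, strides=1, padding="SAME")
    return grid + 0.25 * tf.reshape(lap, grid.shape)

grid = tf.random.uniform([64, 64])
grid = jacobi_step(grid)
```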