Key Enabling Technologies
Multi-Trillion Parameter Models
Cerebras combines its Wafer-Scale architecture with its innovative weight streaming technology to support massive models simply and easily, without complex hacks.
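Conceptually, weight streaming decouples model size from on-chip memory: weights are kept in an external store and streamed to the wafer one layer at a time, while activations stay resident. The Python sketch below illustrates only the idea; the external store and layer loop here are illustrative stand-ins, not the Cerebras API.

```python
# Conceptual sketch of weight streaming (illustrative only, not the Cerebras API).
# Weights live in an external store and are fetched one layer at a time, so the
# accelerator only ever holds one layer's weights plus the activations.
import numpy as np

rng = np.random.default_rng(0)

# "External" weight store: all layer weights live off the accelerator.
external_store = {f"layer_{i}": rng.standard_normal((512, 512)) * 0.02 for i in range(4)}

def forward_with_weight_streaming(x, layer_names, store):
    for name in layer_names:
        w = store[name]              # stream this layer's weights onto the device
        x = np.maximum(x @ w, 0.0)   # compute the layer on-device (matmul + ReLU)
        del w                        # discard before the next layer's weights arrive
    return x

activations = forward_with_weight_streaming(rng.standard_normal((8, 512)),
                                             sorted(external_store), external_store)
print(activations.shape)  # (8, 512)
```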
Ease of Clustering
Erase the pain of distributed computing. Cerebras Clusters run strictly data-parallel, so you can distribute work across tens of millions of Cerebras cores with a single keystroke.
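In a purely data-parallel setup, every node holds the full model and processes its own shard of the batch; gradients are averaged before each update, so all replicas stay in sync. The sketch below shows that generic pattern in NumPy; it is an illustration of why data parallelism keeps the programming model simple, not the Cerebras launch interface.

```python
# Generic data-parallel training step (illustrative, not the Cerebras interface).
# Each replica computes gradients on its own shard; gradients are averaged,
# so every replica applies the same update and the weights stay identical.
import numpy as np

def local_gradient(w, x_shard, y_shard):
    """Gradient of mean squared error for a linear model on one data shard."""
    err = x_shard @ w - y_shard
    return x_shard.T @ err / len(x_shard)

def data_parallel_step(w, x, y, num_replicas, lr=0.1):
    x_shards = np.array_split(x, num_replicas)
    y_shards = np.array_split(y, num_replicas)
    grads = [local_gradient(w, xs, ys) for xs, ys in zip(x_shards, y_shards)]
    return w - lr * np.mean(grads, axis=0)   # all-reduce (average), then update

rng = np.random.default_rng(0)
x, true_w = rng.standard_normal((1024, 16)), rng.standard_normal(16)
y = x @ true_w
w = np.zeros(16)
for _ in range(200):
    w = data_parallel_step(w, x, y, num_replicas=8)
print(np.allclose(w, true_w, atol=1e-2))  # True: replicas converge to one model
```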
Linear Scaling Performance
Powered by our weight streaming technology, Cerebras Wafer-Scale Clusters effortlessly deliver near-linear scaling to hundreds of nodes.
Native Long Sequences
Native hardware support for sequence lengths of up to 50,000 tokens enables more insightful, more accurate models, simply and directly.
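For context, self-attention compute and memory grow quadratically with sequence length, which is what makes very long sequences demanding; the back-of-the-envelope figures below are illustrative arithmetic, not Cerebras performance data.

```python
# Rough illustration of why long sequences are demanding: the attention score
# matrix alone is seq_len x seq_len per head per layer.
seq_len = 50_000
scores_per_head = seq_len * seq_len           # 2.5e9 entries
bytes_fp16 = scores_per_head * 2              # ~5 GB per head per layer in fp16
print(f"{scores_per_head:.2e} entries, ~{bytes_fp16 / 1e9:.1f} GB in fp16")
```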
Sparsity Acceleration
Massive memory bandwidth enables Cerebras to harvest structured and unstructured sparsity. Never multiplying by zero means faster training, fewer FLOPs and less energy.
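The FLOP saving from skipping zeros is easy to quantify: the multiply-accumulates that remain are just the non-zero fraction of the weights. A minimal sketch of that accounting, using an assumed 75% unstructured sparsity level and hypothetical layer dimensions for illustration:

```python
# FLOP accounting for sparse vs. dense matrix multiply (illustrative numbers).
# Skipping multiplications by zero cuts FLOPs in direct proportion to sparsity.
def matmul_flops(m, k, n):
    return 2 * m * k * n          # one multiply + one add per partial product

m, k, n = 4096, 4096, 4096        # hypothetical layer dimensions
sparsity = 0.75                   # assumed fraction of zero weights

dense = matmul_flops(m, k, n)
sparse = dense * (1.0 - sparsity) # only non-zero weights contribute work
print(f"dense: {dense:.2e} FLOPs, sparse: {sparse:.2e} FLOPs "
      f"({(1 - sparse / dense):.0%} fewer)")
```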
Featured Models
Model | Details | Examples | Model Zoo |
---|---|---|---|
GPT-2, GPT-3 | The GPT architecture uses autoregressive attention mechanisms to selectively focus on the segments of input text it predicts to be most relevant. GPT-2 has 1.5 billion parameters; the third-generation GPT-3 has variants ranging from 1.3B to 175B parameters. Switching between versions in CSoft requires changing only a few configuration parameters (see the sketch after this table). | Cerebras-GPT family, Genomic GPT models, Sparse GPT training | Cerebras-GPT Repo, GPT-2 Repo, GPT-3 Repo |
GPT-J, GPT-NeoX | GPT-J and GPT-NeoX are open-source language models created by the research group EleutherAI. Two of the most advanced open-source alternatives to OpenAI's GPT-3, they have 6 billion and 20 billion parameters, respectively. | Harnessing GPT-J | GPT-J Repo |
BERT, RoBERTa | Bidirectional Encoder Representations from Transformers (BERT) is an encoder-only transformer model designed for natural language understanding. RoBERTa shares BERT's architecture but uses a modified, more robust pre-training procedure. | Financial dataset, Epigenomic BERT | BERT Repo |
Transformer (AIAYN), T5 | The Transformer, introduced in "Attention Is All You Need" (AIAYN), started the current wave of transformer architectures. The Text-to-Text Transfer Transformer (T5) casts every NLP task into a unified text-to-text format where the input and output are always text strings; it can be considered a generalized extension of the Transformer. | | T5 Repo |
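As noted in the GPT row above, moving between GPT variants is largely a matter of configuration rather than code. The sketch below illustrates that idea with plain Python dictionaries; the field names are illustrative and do not reproduce the exact Model Zoo / CSoft configuration schema.

```python
# Illustration of switching GPT variants by changing a few configuration values.
# Field names are illustrative, not the exact Model Zoo / CSoft schema.
gpt2_xl = {                      # GPT-2 XL (~1.5B parameters)
    "hidden_size": 1600,
    "num_layers": 48,
    "num_heads": 25,
    "max_sequence_length": 1024,
}

gpt3_xl = {                      # GPT-3 XL (~1.3B parameters)
    "hidden_size": 2048,
    "num_layers": 24,
    "num_heads": 24,
    "max_sequence_length": 2048,
}

# The surrounding training code stays the same; only these values change.
changed = {k for k in gpt2_xl if gpt2_xl[k] != gpt3_xl[k]}
print(sorted(changed))
```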

“We note that these training runs frequently take >1 week on dedicated GPU resources (such as Polaris@ALCF). To enable training of the larger models on the full sequence length (10,240 tokens), we leveraged AI-hardware accelerators such as Cerebras CS-2, both in a stand-alone mode and as an inter-connected cluster, and obtained GenSLMs that converge in less than a day.”
Award-winning research
2022 Gordon Bell Prize for COVID Research
A team led by researchers from Argonne National Laboratory and Cerebras was recognized for developing the first genome-scale language model to study the evolutionary dynamics of SARS-CoV-2. Their work has the potential to transform how we identify and classify new and emergent variants of pandemic-causing viruses.
At Cerebras Systems, we love it when the CS-2 is vastly faster than large NVIDIA GPU clusters.