Key Enabling Technologies
Multi-Trillion Parameter Models
Cerebras combines its Wafer-Scale architecture with its innovative weight streaming technology to support massive models simply and easily, without complex hacks.
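Conceptually, weight streaming decouples model size from on-chip memory: weights are kept in an external store and streamed to the wafer one layer at a time, while activations stay resident. The Python sketch below illustrates only the idea; the external store and layer loop here are illustrative stand-ins, not the Cerebras API.

```python
# Conceptual sketch of weight streaming (illustrative only, not the Cerebras API).
# Weights live in an external store and are fetched one layer at a time, so the
# accelerator only ever holds one layer's weights plus the activations.
import numpy as np

rng = np.random.default_rng(0)

# "External" weight store: all layer weights live off the accelerator.
external_store = {f"layer_{i}": rng.standard_normal((512, 512)) * 0.02 for i in range(4)}

def forward_with_weight_streaming(x, layer_names, store):
    for name in layer_names:
        w = store[name]              # stream this layer's weights onto the device
        x = np.maximum(x @ w, 0.0)   # compute the layer on-device (matmul + ReLU)
        del w                        # discard before the next layer's weights arrive
    return x

activations = forward_with_weight_streaming(rng.standard_normal((8, 512)),
                                             sorted(external_store), external_store)
print(activations.shape)  # (8, 512)
```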
Ease of Clustering
Erase the pain of distributed computing. Cerebras Clusters run strictly data-parallel, so you can distribute work across tens of millions of Cerebras cores with a single keystroke.
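In a purely data-parallel setup, every node holds the full model and processes its own shard of the batch; gradients are averaged before each update, so all replicas stay in sync. The sketch below shows that generic pattern in NumPy; it is an illustration of why data parallelism keeps the programming model simple, not the Cerebras launch interface.

```python
# Generic data-parallel training step (illustrative, not the Cerebras interface).
# Each replica computes gradients on its own shard; gradients are averaged,
# so every replica applies the same update and the weights stay identical.
import numpy as np

def local_gradient(w, x_shard, y_shard):
    """Gradient of mean squared error for a linear model on one data shard."""
    err = x_shard @ w - y_shard
    return x_shard.T @ err / len(x_shard)

def data_parallel_step(w, x, y, num_replicas, lr=0.1):
    x_shards = np.array_split(x, num_replicas)
    y_shards = np.array_split(y, num_replicas)
    grads = [local_gradient(w, xs, ys) for xs, ys in zip(x_shards, y_shards)]
    return w - lr * np.mean(grads, axis=0)   # all-reduce (average), then update

rng = np.random.default_rng(0)
x, true_w = rng.standard_normal((1024, 16)), rng.standard_normal(16)
y = x @ true_w
w = np.zeros(16)
for _ in range(200):
    w = data_parallel_step(w, x, y, num_replicas=8)
print(np.allclose(w, true_w, atol=1e-2))  # True: replicas converge to one model
```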
Linear Scaling Performance
Powered by our weight streaming technology, Cerebras Wafer-Scale Clusters effortlessly deliver near-linear scaling to hundreds of nodes.
Native Long Sequences
Native hardware support for sequence lengths of up to 50,000 tokens enables more insightful, more accurate models, simply and directly.
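For context, self-attention compute and memory grow quadratically with sequence length, which is what makes very long sequences demanding; the back-of-the-envelope figures below are illustrative arithmetic, not Cerebras performance data.

```python
# Rough illustration of why long sequences are demanding: the attention score
# matrix alone is seq_len x seq_len per head per layer.
seq_len = 50_000
scores_per_head = seq_len * seq_len           # 2.5e9 entries
bytes_fp16 = scores_per_head * 2              # ~5 GB per head per layer in fp16
print(f"{scores_per_head:.2e} entries, ~{bytes_fp16 / 1e9:.1f} GB in fp16")
```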
Sparsity Acceleration
Massive memory bandwidth enables Cerebras to harvest structured and unstructured sparsity. Never multiplying by zero means faster training, fewer FLOPs and less energy.
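The FLOP saving from skipping zeros is easy to quantify: the multiply-accumulates that remain are just the non-zero fraction of the weights. A minimal sketch of that accounting, using an assumed 75% unstructured sparsity level and hypothetical layer dimensions for illustration:

```python
# FLOP accounting for sparse vs. dense matrix multiply (illustrative numbers).
# Skipping multiplications by zero cuts FLOPs in direct proportion to sparsity.
def matmul_flops(m, k, n):
    return 2 * m * k * n          # one multiply + one add per partial product

m, k, n = 4096, 4096, 4096        # hypothetical layer dimensions
sparsity = 0.75                   # assumed fraction of zero weights

dense = matmul_flops(m, k, n)
sparse = dense * (1.0 - sparsity) # only non-zero weights contribute work
print(f"dense: {dense:.2e} FLOPs, sparse: {sparse:.2e} FLOPs "
      f"({(1 - sparse / dense):.0%} fewer)")
```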
Featured Models
Model | Details | Examples | Model Zoo |
---|---|---|---|
GPT-2, GPT-3 | The GPT architecture uses autoregressive attention mechanisms to selectively focus on the segments of input text it predicts to be most relevant. GPT-2 has 1.5 billion parameters; the third-generation GPT-3 has variants ranging from 1.3B to 175B parameters. Switching between versions in CSoft requires changing only a few configuration parameters (see the sketch after this table). | Cerebras-GPT family, Genomic GPT models, Sparse GPT training | Cerebras-GPT Repo, GPT-2 Repo, GPT-3 Repo |
GPT-J, GPT-NeoX | GPT-J and GPT-NeoX are open-source language models created by the research group EleutherAI. Two of the most advanced open-source alternatives to OpenAI's GPT-3, they have 6 billion and 20 billion parameters, respectively. | Harnessing GPT-J | GPT-J Repo |
BERT, RoBERTa | Bidirectional Encoder Representations from Transformers (BERT) is an encoder-only transformer model designed for natural language understanding. RoBERTa shares BERT's architecture but uses a modified, more robust pre-training procedure. | Financial dataset, Epigenomic BERT | BERT Repo |
Transformer (AIAYN), T5 | The Transformer, introduced in "Attention Is All You Need" (AIAYN), started the current wave of transformer architectures. The Text-to-Text Transfer Transformer (T5) casts every NLP task into a unified text-to-text format where the input and output are always text strings; it can be considered a generalized extension of the Transformer. | | T5 Repo |
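As noted in the GPT row above, moving between GPT variants is largely a matter of configuration rather than code. The sketch below illustrates that idea with plain Python dictionaries; the field names are illustrative and do not reproduce the exact Model Zoo / CSoft configuration schema.

```python
# Illustration of switching GPT variants by changing a few configuration values.
# Field names are illustrative, not the exact Model Zoo / CSoft schema.
gpt2_xl = {                      # GPT-2 XL (~1.5B parameters)
    "hidden_size": 1600,
    "num_layers": 48,
    "num_heads": 25,
    "max_sequence_length": 1024,
}

gpt3_xl = {                      # GPT-3 XL (~1.3B parameters)
    "hidden_size": 2048,
    "num_layers": 24,
    "num_heads": 24,
    "max_sequence_length": 2048,
}

# The surrounding training code stays the same; only these values change.
changed = {k for k in gpt2_xl if gpt2_xl[k] != gpt3_xl[k]}
print(sorted(changed))
```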

“We note that these training runs frequently take >1 week on dedicated GPU resources (such as Polaris@ALCF). To enable training of the larger models on the full sequence length (10,240 tokens), we leveraged AI-hardware accelerators such as Cerebras CS-2, both in a stand-alone mode and as an inter-connected cluster, and obtained GenSLMs that converge in less than a day.”
Award-winning research
2022 Gordon Bell Prize for COVID Research
A team led by researchers from Argonne National Laboratory and Cerebras was recognized for developing the first genome-scale language model to study the evolutionary dynamics of SARS-CoV-2. Their work has the potential to transform how we identify and classify new and emergent variants of pandemic-causing viruses.
At Cerebras Systems, we love it when the CS-2 is vastly faster than large NVIDIA GPU clusters.