The Cerebras CS-1: the world’s most powerful AI compute system

The CS-1 is built from the ground up to accelerate deep learning in the data center. It is a complete solution for AI compute: powered by the Cerebras Wafer-Scale Engine (WSE), programmable with the Cerebras Software Platform, and packaged in an innovative system that fits directly into your existing infrastructure.

Overview

Faster time to solution with the CS-1

Designed for deep learning, the CS-1 delivers more performance than a cluster of traditional machines, all in a single system. The result is faster time to solution at far greater power and space efficiency.

The CS-1 is powered by the revolutionary Wafer-Scale Engine (WSE), fits directly into standard data center infrastructure, and is easily programmable with today’s ML frameworks.

Sparse Linear Algebra Compute (SLAC) Cores: 400,000
On-chip Memory (SRAM): 18 GB
Memory Bandwidth: 9.6 PB/s
Interconnect Bandwidth: 100 Pb/s
System I/O: 1.2 Tb/s
Dimensions: 15 rack units

Chip Technology

The Cerebras Wafer-Scale Engine

With vastly more silicon area than the largest graphics processing unit, the WSE provides more compute cores, tightly coupled memory for efficient data access, and an extensive high-bandwidth communication fabric that lets groups of cores work together.

Vastly more deep learning compute

The WSE contains 400,000 Sparse Linear Algebra Compute (SLAC) cores. Each core is flexible, programmable, and optimized for the computations that underpin most neural networks. Programmability ensures the cores can run every algorithm in the constantly evolving field of machine learning.

High bandwidth, low latency communication fabric

The 400,000 cores on the WSE are connected via the Swarm communication fabric in a 2D mesh with 100 Pb/s of aggregate bandwidth. Swarm is a massive on-chip communication fabric that delivers breakthrough bandwidth and low latency at a fraction of the power draw of the traditional techniques used to cluster graphics processing units. It is fully configurable: software configures all the cores on the WSE to support the precise communication required for training the user-specified model, so each neural network gets a unique, optimized communication path.

Efficient, high performance on-chip memory

The WSE has 18 GB of on-chip memory, all accessible within a single clock cycle, and provides 9.6 PB/s of memory bandwidth. This is 3,000x more capacity and 10,000x greater bandwidth than the leading competitor. More cores with more local memory enable fast, flexible computation at lower latency and with less energy.
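
As a rough sanity check on those multipliers, here is a minimal sketch assuming the "leading competitor" is a contemporary data-center GPU with roughly 6 MB of on-chip SRAM and 0.9 TB/s of memory bandwidth; the GPU figures are our assumptions, not numbers from this page.

```python
# Back-of-the-envelope check of the capacity and bandwidth multipliers.
# The GPU figures are assumptions for a contemporary data-center GPU.

WSE_SRAM_BYTES = 18e9    # 18 GB of on-chip SRAM
WSE_MEM_BW = 9.6e15      # 9.6 PB/s of memory bandwidth

GPU_SRAM_BYTES = 6e6     # ~6 MB of on-chip SRAM (assumed)
GPU_MEM_BW = 0.9e12      # ~0.9 TB/s of off-chip HBM bandwidth (assumed)

print(f"capacity ratio:  {WSE_SRAM_BYTES / GPU_SRAM_BYTES:,.0f}x")  # ~3,000x
print(f"bandwidth ratio: {WSE_MEM_BW / GPU_MEM_BW:,.0f}x")          # ~10,667x
```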

Software Platform

Software that integrates seamlessly with your workflows

The Cerebras software platform integrates with popular machine learning frameworks like TensorFlow and PyTorch, allowing researchers to use familiar tools and effortlessly bring their models to the WSE.
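
As an illustration of what "familiar tools" means in practice, the sketch below defines a model in stock TensorFlow with nothing Cerebras-specific in it. The Cerebras entry points that would compile and run it on a CS-1 are part of the Cerebras software platform and are not shown here, since their exact names are beyond this overview.

```python
import tensorflow as tf

# A standard TensorFlow model, written exactly as it would be for any
# other accelerator; the Cerebras platform consumes ordinary framework
# models like this one.
def build_model():
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(64, 3, activation="relu",
                               input_shape=(224, 224, 3)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(1000, activation="softmax"),
    ])

model = build_model()
model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
# In the Cerebras flow, a model like this is handed to the Cerebras
# compiler and runtime rather than to model.fit(); those entry points
# are part of the Cerebras SDK and are intentionally omitted here.
```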

A programmable C++ interface allows researchers to extend the platform and develop custom kernels, empowering them to push the limits of ML innovation.

Cerebras Graph Compiler drives full hardware utilization

The Cerebras Graph Compiler (CGC) automatically translates your neural network to an optimized WSE executable.

Every stage of CGC is designed to maximize WSE utilization. Kernels are intelligently sized so that more compute resources are allocated to more complex work, as the toy sketch below illustrates. CGC then generates a placement and routing, unique to each neural network, to minimize communication latency between layers.
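
To make the sizing idea concrete, here is a toy sketch, not the actual CGC algorithm: allocate cores to each layer in proportion to its share of the network's total compute, so heavier layers get more of the fabric. The layer names and FLOP counts are invented for illustration.

```python
# Toy illustration of compute-proportional kernel sizing. This is our
# own sketch of the idea, not the actual Cerebras Graph Compiler.

TOTAL_CORES = 400_000

# Hypothetical per-layer compute costs (FLOPs) for a small network.
layer_flops = {
    "conv1": 2.0e9,
    "conv2": 6.0e9,
    "fc1": 1.5e9,
    "fc2": 0.5e9,
}

total = sum(layer_flops.values())
allocation = {name: round(TOTAL_CORES * flops / total)
              for name, flops in layer_flops.items()}

for name, cores in allocation.items():
    print(f"{name:5s} -> {cores:,} cores")
# Heavier layers receive proportionally more cores, so all layers take
# roughly the same time per step and the pipeline stays busy.
```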

Designed for flexibility and extensibility

The Cerebras software platform includes an extensive library of primitives for standard deep learning computations, as well as a familiar C++ interface for developing custom software kernels.

A comprehensive suite of debug and profiling tools allows researchers to optimize the platform for their work.

CS-1 System

A single system replaces racks of graphics processing units

Until now, a wafer-scale chip had never been built. Every element of the CS-1, from power delivery and cooling to data delivery and packaging, has been carefully co-designed alongside the chip and software to enable the generational performance leap made possible by wafer-scale integration.

1. Input/Output

The CS-1 requires high-bandwidth communication with the surrounding infrastructure to feed the 400,000 cores on the Wafer-Scale Engine. Our I/O system handles this task, delivering 1.2 terabits per second of bandwidth to the system edge through twelve standard 100 Gigabit Ethernet links to the datacenter. The I/O system also includes several optimized FPGAs that convert standard TCP/IP traffic into the WSE protocol. This allows the CS-1 to be connected to a standard switch and receive input data for training or inference from many standard CPU servers in parallel. Simply plug the CS-1 into power and a 100 Gigabit Ethernet switch, and you are ready to start training models at wafer-scale speed.
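
To put that bandwidth in context, here is a quick back-of-the-envelope sketch; the sample shape and fp16 precision are our assumptions, chosen purely for illustration.

```python
# Back-of-the-envelope: what 1.2 Tb/s of ingest means for an
# image-classification workload. Sample shape and precision assumed.

LINKS = 12
LINK_GBPS = 100  # 100 Gigabit Ethernet per link

io_bits_per_s = LINKS * LINK_GBPS * 1e9
print(f"aggregate I/O: {io_bits_per_s / 1e12:.1f} Tb/s")  # 1.2 Tb/s

# A 224x224 RGB image in fp16 (2 bytes per value):
bits_per_image = 224 * 224 * 3 * 2 * 8
print(f"images/second: {io_bits_per_s / bits_per_image:,.0f}")  # ~498,000
```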

2. Engine Block

Powering, cooling, and packaging a wafer-scale processor is no easy task! The magic happens in the back of the system, in the engine block: an innovation in packaging that solves the challenges of power delivery, cooling, and electrical connectivity to the Wafer-Scale Engine.

The front contains power pins, behind which are power step-down modules and the main motherboard. A brass manifold contains dry quick connectors for the water pumps and directs water across the back of a cold plate that is tightly coupled to the wafer, cooling its 1.2 trillion transistors.

A key innovation brings power to the wafer through the main board rather than at the edges of the wafer. However, the silicon wafer has a different coefficient of thermal expansion (CTE) than the main board. This means that during heating and cooling the main board and the wafer expand and shrink by different amounts. We developed a custom connector to maintain electrical connectivity in the face of these stresses.

Overcoming the technical hurdles of power delivery, cooling, packaging, and CTE mismatch with innovative solutions allowed Cerebras to solve the decades-old problem of wafer-scale compute.

3. Cooling System

The CS-1 is an internally water-cooled system. Like a gaming PC on steroids, the CS-1 uses water to cool the WSE, and then uses air to cool the water. Water circulates through a closed loop internal to the system.

Two hot-swappable pumps on the top right move water through a manifold across the back of the WSE, cooling the wafer and warming the water. Warm water is then pumped into a heat exchanger. This heat exchanger presents a large surface area for the cold air blown in by the four hot-swappable fans at the bottom of the CS-1. The fans move air from the cold aisle, cool the warm water via the heat exchanger, and exhaust the warm air into the warm aisle.

Faster insights at lower cost

At 15 rack units and a maximum system power of 20 kW, the CS-1 packs the performance of a room full of servers into a single unit the size of a dorm-room mini-fridge.

With cluster-scale compute available in a single box, you can push your research further, at a fraction of the cost.

Lower scaling complexity

Writing your application to run on a single system is vastly simpler than distributing the workload across hundreds of graphics processing units.

A single model can map entirely onto a single CS-1 with no extra effort.

Cluster Design

Datacenter-scale AI processing with a CS-1 cluster

Multiple CS-1 systems can be clustered together for scale and performance beyond a single unit, with greater ease of deployment, lower engineering cost, and more flexibility for AI researchers.

Higher performance

A single CS-1 delivers orders of magnitude greater deep learning performance than a graphics processor. As such, far fewer CS-1 systems are needed to achieve the same effective compute as large-scale cluster deployments of traditional machines.

Greater flexibility

Scaling across fewer nodes is much simpler and more efficient, thanks to lower communication and synchronization overheads. It also means that distributed training across CS-1 systems achieves higher utilization without needing very large batch sizes, as the toy model below illustrates.
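
A toy model makes the intuition concrete. All numbers here are illustrative, not measurements: it simply assumes each training step pays a fixed synchronization cost per node on top of its compute time.

```python
# Toy scaling model: per-step utilization when synchronization cost
# grows with node count. All numbers are illustrative.

def utilization(nodes, compute_ms=100.0, sync_ms_per_node=2.0):
    """Fraction of each training step spent on useful compute,
    assuming a fixed per-node synchronization overhead."""
    sync_ms = sync_ms_per_node * nodes
    return compute_ms / (compute_ms + sync_ms)

# The same effective compute from a handful of CS-1 systems vs. a
# large cluster of conventional accelerators (counts illustrative).
for label, nodes in [("4 CS-1 systems", 4), ("256 GPUs", 256)]:
    print(f"{label:16s} utilization ~ {utilization(nodes):.0%}")
```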

Explore more ideas in less time. Reduce the cost of curiosity.

Contact us to learn how to purchase
