Cerebras Systems is a team of pioneering computer architects, computer scientists, deep learning researchers, and engineers of all types who love fearless engineering. We have come together to build a new class of computer to accelerate deep learning.
Today, I’m excited to introduce the first element of the Cerebras solution – the Cerebras Wafer Scale Engine, the largest chip in the world and the heart of our deep learning system.
In the last few years, deep learning has emerged as one of the most important workloads of our time. It has driven breakthroughs in artificial intelligence across industries, from consumer technology to healthcare to manufacturing.
Deep learning also has unique, massive, and growing computational requirements. These requirements are poorly matched to legacy machines such as graphics processing units (GPUs), which were fundamentally designed for other work.
As a result, AI today is constrained not by applications or ideas, but by the availability of compute. Testing a single new hypothesis – training a new model – can take days, weeks, or even months and cost hundreds of thousands of dollars in compute time. This is a major roadblock to innovation.
Time and again, our industry has shown that special applications require specialized accelerators. We’ve built DSPs for signal processing, switching silicon for packet processing, and GPUs for graphics. In the same vein, we need a new processor, a new engine, for deep learning compute. It should be purpose-built for this work and deliver orders of magnitude more performance, so that wall-clock training times are measured in minutes, not months.
This is why Cerebras was founded – to build a new type of computer optimized exclusively for deep learning, starting from a clean sheet of paper. To meet the enormous computational demands of deep learning, we have designed and manufactured the largest chip ever built.
The Cerebras Wafer Scale Engine (WSE) measures 46,225 square millimeters, contains more than 1.2 trillion transistors, and is optimized entirely for deep learning computation. For comparison, the WSE is more than 56X larger than the largest graphics processing unit, has 3,000X more on-chip memory, and is capable of more than 10,000X the memory bandwidth.
On a chip of this size, we are able to deliver 400,000 AI-optimized cores. Our specialized memory architecture ensures that each of these cores operates at maximum efficiency. It provides 18 gigabytes of fast, on-chip memory distributed among the cores in a single-level memory hierarchy, one clock cycle away from each core. All of these high-performance, AI-optimized cores are connected entirely on silicon by the Swarm fabric, a 2D mesh with 100 petabits per second of bandwidth. Swarm delivers breakthrough bandwidth and low latency at a fraction of the power draw of the traditional techniques used to cluster graphics processing units. Software configures all the cores on the WSE to support the precise communication required for training user-specified models.
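The figures above invite some back-of-the-envelope arithmetic. The following minimal Python sketch divides the 18 GB of on-chip memory evenly across the 400,000 cores and estimates hop counts on a 2D mesh. It rests on assumptions the text does not state: that GB means 10^9 bytes, that memory is split evenly, that the cores sit in a roughly square grid, and that a message travels along a shortest mesh path (Manhattan distance). All function names are hypothetical and do not describe Cerebras software or the actual Swarm routing scheme.

```python
import math

# Figures quoted in the text; "GB" is ASSUMED to mean 10**9 bytes.
TOTAL_ON_CHIP_MEMORY_BYTES = 18 * 10**9
NUM_CORES = 400_000


def memory_per_core_bytes(total_bytes: int, cores: int) -> float:
    """Average on-chip memory per core, ASSUMING an even split."""
    return total_bytes / cores


def square_mesh_side(cores: int) -> int:
    """Side of the smallest square grid holding `cores` cores.

    The square layout is a HYPOTHETICAL simplification for illustration.
    """
    side = math.isqrt(cores)
    return side if side * side == cores else side + 1


def mesh_hops(a: tuple[int, int], b: tuple[int, int]) -> int:
    """Minimum hops between two cores in a 2D mesh (Manhattan distance)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


if __name__ == "__main__":
    per_core = memory_per_core_bytes(TOTAL_ON_CHIP_MEMORY_BYTES, NUM_CORES)
    side = square_mesh_side(NUM_CORES)
    worst = mesh_hops((0, 0), (side - 1, side - 1))
    print(f"~{per_core / 1000:.0f} KB of local memory per core")
    print(f"{side} x {side} mesh, worst-case {worst} hops corner to corner")
```

Under these assumptions, each core gets about 45 KB of local memory, and the hypothetical 633 x 633 grid has a worst-case corner-to-corner distance of 1,264 hops, while neighboring cores are a single hop apart. This gap is one reason software that places communicating work on nearby cores matters on a mesh.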
Altogether, the WSE takes the fundamental properties of cores, memory, and interconnect to their logical extremes. A vast array of programmable cores provides cluster-scale compute on a single chip. High-speed memory close to each core keeps the cores continuously busy with calculations. And because everything is connected on-die, communication is many thousands of times faster than what is possible with off-chip technologies like InfiniBand.
Every element of the WSE and every design tradeoff, from the core to the memory to the interconnect, has been chosen to enable deep learning research at unprecedented speed and scale. And this is just the beginning. Stay tuned here and on our website for updates and more information coming soon. And if you are excited about our technology and mission, we are always looking for new members for our extraordinary team. Consider joining us if you want to be part of something great.
To learn more about the WSE and why it’s the right answer for deep learning compute, check out our whitepaper below.