Wafer-Scale Deep Learning (Hot Chips 2019 Presentation)

This past Monday, August 19, I was proud and excited to reveal the Cerebras Wafer Scale Engine (WSE) in my Hot Chips talk.

Update 1/6/2020: The Hot Chips organizers have now made the video of my talk publicly available.

The WSE is the largest commercial chip ever manufactured, built to solve the problem of deep learning compute. It packs 1.2 trillion transistors onto a single 215 mm x 215 mm chip with 400,000 AI-optimized cores, connected by a 100 Pbit/s interconnect. The cores are fed by 18 GB of super-fast, on-chip memory, with an unprecedented 9 PB/s of memory bandwidth.
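To put these headline numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above and makes two assumptions that are mine, not part of the talk: that GB and PB are decimal units, and that memory and bandwidth are spread evenly across all cores. The constant and function names are illustrative only, and because the inputs are rounded, the results are approximate.

```python
# Headline figures quoted in the paragraph above (rounded marketing numbers).
DIE_SIDE_MM = 215
CORES = 400_000
ON_CHIP_MEMORY_BYTES = 18 * 10**9           # assumption: decimal GB
MEMORY_BANDWIDTH_BYTES_PER_S = 9 * 10**15   # assumption: decimal PB/s


def per_core_figures():
    """Derive die area and per-core averages from the headline figures.

    Assumes memory and bandwidth are divided evenly across cores,
    which is a simplification made for this illustration.
    """
    die_area_mm2 = DIE_SIDE_MM * DIE_SIDE_MM
    memory_per_core_kb = ON_CHIP_MEMORY_BYTES / CORES / 1000
    bandwidth_per_core_gb_s = MEMORY_BANDWIDTH_BYTES_PER_S / CORES / 10**9
    return die_area_mm2, memory_per_core_kb, bandwidth_per_core_gb_s


if __name__ == "__main__":
    area, mem_kb, bw_gb_s = per_core_figures()
    print(f"Die area:             {area:,} mm^2")
    print(f"Memory per core:      ~{mem_kb:.0f} KB")
    print(f"Bandwidth per core:   ~{bw_gb_s:.1f} GB/s")
```

Under these assumptions the chip covers 46,225 mm^2, and each core averages roughly 45 KB of local memory and about 22.5 GB/s of memory bandwidth.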

Why does this matter? We believe that deep learning is the most important computational workload of our time. Its requirements are unique and demand is growing at an unprecedented rate. Large training tasks often require peta- or even exascale compute: it commonly takes days or even months to train large models with today’s processors.

We need a new processor for deep learning. In this talk, I unveiled the Cerebras WSE: the right processor for this work, designed from the ground up to accelerate deep learning training from months to minutes. I describe the core technology behind the WSE, why big chips are the answer to deep learning compute, and the engineering challenges we faced in building the world’s first wafer-scale engine.