Joel Hestness

Senior Research Scientist

Panel Session time: October 12, 2022 at 8:30 AM PT

Panel Title: Waferscale Computing Systems: Are We There Yet?

Panel description: Fueled by the tremendous growth of new applications in the domains of big-data computing, deep learning, and scientific computing, the demand for increasing system performance is far outpacing the capability of conventional methods for system performance scaling. Waferscale computing, where an entire 300 mm wafer's worth of compute and memory can be extremely tightly integrated, promises to provide orders of magnitude improvement in performance and energy efficiency compared to today’s systems built using traditional packaging technologies.

In this panel, we will discuss the “Promised Land” of waferscale computing. Back in the 1980s, a few companies attempted waferscale systems, notably Trilogy Systems and Tandem Computers. Yield and cost challenges led to the early demise of those efforts, but the promise remained. After more than 30 years, recent academic (e.g., UCLA/UIUC) and industrial (Cerebras, Tesla) efforts have taken up the challenge again. So, are we in a waferscale technology renaissance, nearing the days when waferscale technologies will be widely adopted? Many questions remain, which we will discuss in this panel.

First and foremost, is the overall technology there yet? Do manufacturing difficulties, more so in the advanced nodes, limit the choice of waferscale architectures? Would waferscale integration of heterogeneous chiplets open up more architectural choices than monolithic waferscale technologies? Waferscale computing comes with very high power density, which means tens of kilowatts of power need to be supplied and that heat needs to be extracted from the wafer. Is the data center infrastructure ready to accommodate such waferscale systems at scale? Are there lower-power use cases for waferscale? Moreover, the design infrastructure, such as EDA and simulation tools, is not yet truly ready for larger-than-a-reticle design. The overall design challenges of a waferscale system are therefore enormous, so would development be confined to a niche group? What are the applications that need waferscale computing and would benefit massively from such systems? Does the cost of waferscale systems justify adoption of these systems at volume? Are there previously untenable applications and business cases that now become feasible with waferscale computing, and if so, what are they? Are there edge compute use cases where the volumetric compute density would lead to adoption of waferscale systems?


Linear Scaling Made Possible with Weight Streaming

With a single keystroke, Cerebras can scale large language models from a single CS-2 system to 192 CS-2s in a Cerebras Wafer-Scale Cluster.


Cerebras Makes It Easy to Harness the Predictive Power of GPT-J

A look at why this open-source language model is so popular, how it works, and how simple it is to train on a single Cerebras system.


Context is Everything: Why Maximum Sequence Length Matters

GPU-Impossible™ sequence lengths on Cerebras systems may enable breakthroughs in natural language understanding, drug discovery, and genomics.