Wafer-Scale Cluster

The Wafer-Scale Cluster is an installation designed to support large-scale models (up to and well beyond 1 billion parameters) and large-scale inputs. It can contain one or more CS-2 systems, with the ability to distribute jobs across all or a subset of the CS-2 systems in the cluster. A supporting CPU cluster in this installation consists of MemoryX, SwarmX, management, and input worker nodes. The installation supports both Pipeline execution, for models below 1 billion parameters, and Weight Streaming execution, for models of 1 billion parameters and above.
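
To make the mode split concrete, here is a minimal sketch of how a job might pick an execution mode from the parameter count. The 1-billion-parameter threshold comes from the text above; the function and constant names are hypothetical illustrations, not a Cerebras API.

```python
# Illustrative only: select an execution mode based on model size.
# PIPELINE_LIMIT reflects the ~1B-parameter boundary described above.
PIPELINE_LIMIT = 1_000_000_000

def choose_execution_mode(num_params: int) -> str:
    """Return the execution mode suggested for a model of `num_params`."""
    if num_params < PIPELINE_LIMIT:
        return "pipeline"          # whole model resident on one CS-2
    return "weight_streaming"      # weights held in MemoryX, streamed per layer

print(choose_execution_mode(350_000_000))     # -> pipeline
print(choose_execution_mode(20_000_000_000))  # -> weight_streaming
```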

The advantage of the Wafer-Scale Cluster is that it runs strictly in data-parallel mode; no other AI hardware can do this for large NLP models. Running data parallel is only possible if the entire neural network, both the compute and the parameter storage, fits on a single processor. Cerebras pioneered wafer-scale integration, enabling us to build the largest processor ever made, and we invented techniques to store trillions of parameters off chip while delivering performance as if they were on chip. With 100x more cores, 1,000x more on-chip memory, and 10,000x more memory bandwidth, the Wafer-Scale Engine isn't forced to break big problems into little pieces, distribute them among hundreds or thousands of small processors, and then reassemble them into the final answer. Each WSE can support all the layers in every model, including the largest layers in the largest models, and our MemoryX technology enables us to store trillions of parameters off processor.
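
The following is a conceptual sketch of the weight-streaming idea described above, using a plain Python dict as a stand-in for MemoryX: one layer's weights at a time arrive on the wafer, are used for compute, and updated weights flow back to the off-chip store. None of these names are Cerebras APIs, and the gradient here is a placeholder, not real backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Off-chip parameter store (stand-in for MemoryX): all weights live here,
# so the wafer never has to hold the full model at once.
memory_x = {f"layer_{i}": rng.standard_normal((64, 64)) for i in range(4)}

def train_step(x: np.ndarray, lr: float = 1e-3) -> np.ndarray:
    """One pass with streamed weights; updates return to the store."""
    for name in memory_x:                 # stream layers in order
        w = memory_x[name]                # weights arrive from off chip
        x = np.tanh(x @ w)                # compute happens on the wafer
        grad = rng.standard_normal(w.shape) * 0.01  # placeholder gradient
        memory_x[name] = w - lr * grad    # updated weights flow back off chip
    return x

out = train_step(rng.standard_normal((8, 64)))
print(out.shape)  # (8, 64)
```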

Based on the CS-2, MemoryX, and SwarmX, the Cerebras Wafer-Scale Cluster is the only cluster in AI compute that enables strict linear scaling of models with billions, tens of billions, hundreds of billions, and trillions of parameters. If users go from one CS-2 to two CS-2s in a cluster, time to train is cut in half; from one CS-2 to four, it is cut to one-fourth. This is an exceptionally rare characteristic in cluster computing, and it is profoundly cost- and power-efficient. Unlike GPU clusters, a Cerebras cluster delivers performance that increases linearly as users add compute.
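
A worked example of the linear-scaling arithmetic claimed above: with N CS-2 systems running pure data parallel, time to train divides by N. The baseline figure is made up for illustration.

```python
baseline_hours = 100.0  # hypothetical time to train on a single CS-2

for n_systems in (1, 2, 4, 8):
    print(f"{n_systems} x CS-2: {baseline_hours / n_systems:.1f} hours")

# 1 x CS-2: 100.0 hours
# 2 x CS-2: 50.0 hours
# 4 x CS-2: 25.0 hours
# 8 x CS-2: 12.5 hours
```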