SwarmX is a component of Cerebras Systems’ Weight Streaming Execution, a new paradigm for training giant models. Weight streaming disaggregates the storage of parameters from the compute units: the MemoryX service stores and updates the parameters, and it works together with the SwarmX fabric, a novel interconnect between parameter memory and compute.
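
The division of labor described above can be illustrated with a minimal Python sketch. It is not Cerebras code: `ParameterStore` is a hypothetical stand-in for MemoryX, `worker_gradient` stands in for the computation on a CS-2 system, and the toy quadratic loss and plain SGD update are assumptions chosen only to make the disaggregation concrete. The point is that workers hold no persistent weights. They receive the weights, return gradients, and only the store applies the update.

```python
from typing import List

Vector = List[float]


class ParameterStore:
    """Hypothetical stand-in for MemoryX: owns the weights and the update rule."""

    def __init__(self, weights: Vector, lr: float = 0.1) -> None:
        self.weights = list(weights)
        self.lr = lr

    def stream_weights(self) -> Vector:
        # Workers get a copy; they keep no persistent parameter state.
        return list(self.weights)

    def apply_gradient(self, grad: Vector) -> None:
        # Plain SGD, chosen only for illustration.
        self.weights = [w - self.lr * g for w, g in zip(self.weights, grad)]


def worker_gradient(weights: Vector, target: Vector) -> Vector:
    # Toy per-worker loss 0.5 * ||w - target||^2, whose gradient is w - target.
    return [w - t for w, t in zip(weights, target)]


def training_step(store: ParameterStore, targets: List[Vector]) -> None:
    weights = store.stream_weights()                          # broadcast to workers
    grads = [worker_gradient(weights, t) for t in targets]    # compute on each worker
    total = [sum(col) for col in zip(*grads)]                 # reduce across workers
    store.apply_gradient([g / len(grads) for g in total])     # update in the store


if __name__ == "__main__":
    store = ParameterStore([0.0, 0.0])
    for _ in range(100):
        training_step(store, [[1.0, 2.0], [3.0, 4.0]])
    print(store.weights)  # approaches the mean target [2.0, 3.0]
```

In this sketch the reduction is a flat sum computed in one place; in the system described here, that broadcast and reduction is the job of the SwarmX fabric.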

The SwarmX fabric is composed of broadcast-reduce nodes, each containing a set of 100Gb/s network interfaces. Each broadcast-reduce node provides enough bandwidth to perform either a single 1:4 broadcast-reduce operation or a pair of 1:2 broadcast-reduce operations. The nodes can be configured in several different modes, which gives the flexibility to meet the needs of a particular cluster. Each node also provides enough compute to perform the floating-point gradient reductions at line rate, so reductions occur as gradients flow back to the MemoryX service.

SwarmX nodes are connected in a bidirectional tree topology, which minimizes the overall bandwidth and latency required to perform the broadcast and reduction operations (Figure 8 and Figure 9). Each CS-2 system has 1.2Tb/s of I/O bandwidth and, in the worst case, needs weights delivered at this rate to keep its compute units busy, so the aggregate bandwidth required from the SwarmX fabric increases linearly with N, the number of CS-2 systems. To satisfy this requirement, the number of nodes composing the SwarmX fabric also scales linearly with N. Because tree reductions are work-efficient (summing N gradient vectors takes N − 1 vector additions regardless of the tree shape), the compute required likewise increases linearly with N, and it is delivered by the compute in each broadcast-reduce node. A tree topology also reduces latency: the latency between the MemoryX service and the CS-2 systems grows only logarithmically with N.
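
The scaling argument can be checked with a small simulation. The hypothetical `tree_reduce` function below models each broadcast-reduce node as summing up to `fanout` child gradient vectors elementwise (a fan-out of 4 mirrors the 1:4 mode) and counts how many nodes and tree levels are needed for N CS-2 systems. It is a sketch under simplifying assumptions: it processes one level at a time rather than streaming at line rate, ignores link bandwidth, and does not reflect how Cerebras actually sizes or configures the fabric. Broadcast traverses the same tree in the opposite direction, so it crosses the same number of levels.

```python
from typing import List, Tuple

Vector = List[float]

# Per-system worst-case weight bandwidth quoted in the text (Tb/s).
CS2_IO_TBPS = 1.2


def tree_reduce(grads: List[Vector], fanout: int = 4) -> Tuple[Vector, int, int]:
    """Sum per-system gradients through a tree of broadcast-reduce nodes.

    Hypothetical model: each node sums at most `fanout` child vectors
    elementwise. Returns (total, nodes_used, depth).
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not grads:
        raise ValueError("need at least one gradient vector")
    level = [list(g) for g in grads]  # leaves: one gradient per CS-2 system
    nodes, depth = 0, 0
    while len(level) > 1:
        # Group up to `fanout` children under each new node and sum them.
        next_level = [
            [sum(xs) for xs in zip(*level[i:i + fanout])]
            for i in range(0, len(level), fanout)
        ]
        nodes += len(next_level)
        depth += 1
        level = next_level
    return level[0], nodes, depth


if __name__ == "__main__":
    for n in (4, 16, 64, 256):
        total, nodes, depth = tree_reduce([[1.0, 2.0]] * n)
        demand = CS2_IO_TBPS * n  # aggregate worst-case weight bandwidth
        print(f"N={n:4d} nodes={nodes:3d} depth={depth} demand={demand:.1f} Tb/s sum={total}")
```

With a fan-out of 4, 16 systems need 5 nodes across 2 levels and 64 systems need 21 nodes across 3 levels. For a full k-ary tree the node count is (N − 1)/(k − 1), linear in N, while the depth is log_k N, which matches the linear-cost, logarithmic-latency argument above.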