On October 30th at TensorFlow World, my colleague Manjunath Kudlur and I were honored to speak publicly for the first time about the Cerebras software stack. This stack links deep learning researchers to the massive compute capabilities of the Wafer Scale Engine (WSE).
The WSE (pronounced “wise”) is the largest commercial chip ever manufactured, built to solve the problem of deep learning compute. The WSE packs 1.2 trillion transistors onto a single chip with 400,000 AI-optimized cores, connected by a 100 Pbit/s interconnect. The cores are fed by 18 GB of super-fast, on-chip memory, with an unprecedented 9 PB/s of memory bandwidth.
What does this mean for AI researchers? We believe, along with many others, that AI has massive potential — from ads to autonomous vehicles, from commerce to climate. It has transformative potential for the way we live and work.
Researchers continue to see gains with deeper models and larger datasets. But they are compute-limited today. Training commonly takes days, weeks, even months. Not only is this costly, it constrains research and development. We need wall-clock training times on the timescale of experimentation and human innovation — minutes or hours rather than days or weeks — even for large models. This means we need a 100-1,000x increase in compute capabilities, not an incremental 1.5-2x.
And we need this performance in an accessible, easy-to-program package.
The talk introduced the Cerebras software stack. Its primary task is to map researchers’ neural networks — from the framework-level computational graph all the way down to our massive wafer-scale processor.
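To make the idea of graph-to-hardware mapping concrete, here is a toy sketch of a “lowering” pass that walks a framework-level computational graph and assigns each operation to a device kernel. All names here (`Op`, `KERNEL_TABLE`, `lower`, the `wse_*` kernel names) are hypothetical illustrations, not part of the actual Cerebras stack, which performs far more sophisticated placement, routing, and optimization.

```python
# Illustrative sketch only: a toy lowering pass from a framework-level
# graph to named hardware kernels. All identifiers are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Op:
    """A node in a framework-level computational graph."""
    name: str                  # e.g. "conv1"
    kind: str                  # e.g. "conv2d", "relu", "matmul"
    inputs: list = field(default_factory=list)

# Hypothetical mapping from graph-level op kinds to device kernels.
KERNEL_TABLE = {
    "conv2d": "wse_conv2d_kernel",
    "relu":   "wse_relu_kernel",
    "matmul": "wse_matmul_kernel",
}

def lower(graph):
    """Map each op in the graph (given in dependency order) to a kernel."""
    plan = []
    for op in graph:
        kernel = KERNEL_TABLE.get(op.kind)
        if kernel is None:
            raise NotImplementedError(f"no kernel for op kind {op.kind!r}")
        plan.append((op.name, kernel))
    return plan

# A tiny two-op graph: conv -> relu
graph = [
    Op("conv1", "conv2d"),
    Op("act1", "relu", inputs=["conv1"]),
]
print(lower(graph))  # [('conv1', 'wse_conv2d_kernel'), ('act1', 'wse_relu_kernel')]
```

A real compiler must also decide where on the wafer each kernel runs and how data flows between cores; this sketch captures only the first step, translating graph ops into executable kernels.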
The stack integrates seamlessly with popular machine learning frameworks like TensorFlow and PyTorch, letting researchers use the familiar, flexible tools they already know to bring their models to the WSE.
A programmable C++ interface allows researchers to extend the platform and develop custom kernels — empowering them to push the limits of ML innovation.