Software that Integrates Seamlessly with your Workflows

The Cerebras Software Platform, CSoft, has two main parts:

The Cerebras ML Software integrates with the popular machine learning frameworks TensorFlow and PyTorch, so researchers can effortlessly bring their models to the CS-2 system.

The Cerebras Software Development Kit allows researchers to extend the platform and develop custom kernels – empowering them to push the limits of AI and HPC innovation.

Programming at Scale White Paper

Unmatched Productivity and Performance

Cerebras ML Software makes it simple to get existing PyTorch or TensorFlow models running on the CS-2.

Our PyTorch interface library is a lightweight wrapper exposed through API calls; adapting an existing PyTorch implementation typically requires only a few extra lines of code. The integration uses a lazy tensor backend with XLA to capture the full graph of a model and map it optimally onto our massive Wafer-Scale Engine (WSE-2).
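The lazy-tensor idea can be illustrated with a minimal sketch in plain Python (no Cerebras, PyTorch, or XLA dependency; all names here are illustrative, not the actual Cerebras API): operations on a lazy tensor are recorded into a graph rather than executed eagerly, so a compiler can later see, and optimize, the whole model at once.

```python
# Minimal illustration of lazy-tensor graph capture (illustrative only;
# not the Cerebras API). Operations build a graph instead of computing
# immediately, so a compiler can inspect the full graph before execution.

class LazyTensor:
    def __init__(self, op, inputs=(), value=None):
        self.op, self.inputs, self.value = op, inputs, value

    def __add__(self, other):
        return LazyTensor("add", (self, other))

    def __mul__(self, other):
        return LazyTensor("mul", (self, other))

    def materialize(self):
        """Execute the captured graph only when a value is demanded."""
        if self.value is None:
            args = [t.materialize() for t in self.inputs]
            self.value = {"add": sum, "mul": lambda a: a[0] * a[1]}[self.op](args)
        return self.value

def constant(v):
    return LazyTensor("const", value=v)

# Build a graph lazily: nothing is computed yet.
x = constant(2)
y = (x + constant(3)) * constant(4)   # graph: mul(add(2, 3), 4)
print(y.op)              # "mul" -- the op was recorded, not executed
print(y.materialize())   # 20 -- evaluation happens on demand
```

Because the entire graph is visible before anything runs, the backend can fuse, size, and place operations globally instead of dispatching them one at a time.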

TensorFlow integration is via the Cerebras Estimator, a wrapper class developed by our team and based on the standard TensorFlow Estimator. Users simply import the Cerebras Estimator and continue using standard TensorFlow semantics.
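The drop-in wrapper pattern behind this design can be sketched in a few lines of plain Python (hypothetical names; the real Cerebras Estimator lives in the Cerebras TensorFlow package): the wrapper subclasses the standard estimator and keeps its interface, so only the construction line changes.

```python
# Sketch of the drop-in wrapper pattern (hypothetical names, not the
# Cerebras API). The wrapper preserves the base class's interface, so
# existing caller code is unchanged except for the constructor.

class Estimator:
    """Stand-in for a framework's standard Estimator class."""
    def __init__(self, model_fn, config=None):
        self.model_fn, self.config = model_fn, config

    def train(self, input_fn, steps):
        return f"trained {steps} steps on the default backend"

class CerebrasEstimator(Estimator):
    """Same interface as Estimator; only the execution target changes."""
    def train(self, input_fn, steps):
        # A real implementation would compile the graph and dispatch it
        # to the CS-2 system here.
        return f"trained {steps} steps on CS-2"

# Switching backends is a one-line change at construction time:
est = CerebrasEstimator(model_fn=lambda features: features)
print(est.train(input_fn=None, steps=100))  # trained 100 steps on CS-2
```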

Learn more about PyTorch integration


Designed for Flexibility and Extensibility

The Cerebras Software Platform includes an extensive library of standard deep learning primitives and a complete suite of debug and profiling tools.

The Cerebras SDK enables developers to extend the platform for their work, harnessing the power of wafer-scale computing to accelerate their development needs. With the SDK and the Cerebras Software Language (CSL), developers can target the WSE’s microarchitecture directly using a familiar C-like interface for developing software kernels.

SDK Whitepaper

Cerebras Graph Compiler Drives Full Hardware Utilization

The Cerebras Graph Compiler (CGC) automatically translates your neural network to an optimized executable program.

Every stage of the process is designed to maximize WSE-2 utilization. Kernels are intelligently sized so that more cores are allocated to the most compute-intensive work. The Graph Compiler then generates a placement and routing, unique to each neural network, that minimizes communication latency between adjacent layers.
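The kernel-sizing idea can be sketched as a compute-proportional allocation: each layer receives a share of cores proportional to its estimated cost. This is a deliberate simplification; the actual Graph Compiler uses far richer cost models plus placement and routing, and the layer names and FLOP counts below are purely illustrative.

```python
# Toy sketch of compute-proportional kernel sizing (illustrative only).
# Cores are split across layers in proportion to each layer's estimated
# FLOPs, using largest-remainder rounding so the shares sum exactly.

def allocate_cores(layer_flops, total_cores):
    total = sum(layer_flops.values())
    raw = {name: total_cores * f / total for name, f in layer_flops.items()}
    alloc = {name: int(v) for name, v in raw.items()}
    # Hand leftover cores to the layers with the largest rounding remainder.
    leftover = total_cores - sum(alloc.values())
    for name in sorted(raw, key=lambda n: raw[n] - alloc[n], reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc

# Illustrative per-layer FLOP estimates for a small model:
layers = {"embed": 1e9, "attention": 6e9, "mlp": 3e9}
print(allocate_cores(layers, 100))
# {'embed': 10, 'attention': 60, 'mlp': 30}
```

The heavier attention layer gets six times the cores of the embedding layer, mirroring the idea that more complex work receives more of the fabric.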