Cerebras AI Model Studio
The Cerebras AI Model Studio is a simple, pay-by-the-model compute service powered by dedicated clusters of Cerebras CS-2 systems and hosted by Cirrascale Cloud Services. It is a purpose-built platform optimized for training and fine-tuning large language models on dedicated clusters with millions of cores. It delivers deterministic performance, removes distributed-computing headaches, and is push-button simple to start.

Key Benefits
Large models in less time
Train 1-175 billion-parameter models 8x faster than the largest publicly available AWS GPU instance
Enable higher-performing models with longer sequence lengths (up to 50,000 tokens)
Simple & Easy to Use
Easy access: simply SSH in and go
Simple programming: range of large language models in standard PyTorch and TensorFlow
Push-button performance: the power of millions of AI cores dedicated to your work with no distributed programming required
Price
Models trained at a fraction of the price of AWS
Predictable fixed price cost for production model training
Competitive per token pricing for fine-tuning
Flexibility
Train your models from scratch or fine-tune open-source models with your data
Ownership
Dependency-free: keep the trained weights for the models you build
Simple & Secure cloud operations
Simple onboarding: no DevOps required
Software environment, libraries, secure storage, and networking configured and ready to go
Should you Fine-Tune or Train from Scratch?

1) Dataset size depends on the model architecture and the task. If you are unsure, our world-class engineers will be happy to help.
2) Domain similarity: assess whether the data used to pre-train a generic model and your own data are similar enough that the fine-tuned model will perform well on your downstream tasks.
Fine-Tuning
Standard Offering
Self-service process, similar to the train-from-scratch standard offering
See the price per 1,000 tokens: no surprises
Note: Minimum spend is $10,000
White-Glove Support with Cerebras Experts
Our experts will fine-tune a model on our Wafer-Scale Cluster on your behalf and deliver the trained weights to you
Contact us for pricing
Introductory Pricing - Standard Offering
These prices represent blocks of dedicated cluster time for the chosen model. Additional system time is available at an hourly rate as needed.
Model | Parameters (B) | Fine-tuning price per 1K tokens | Fine-tuning price per example (MSL 2048) | Fine-tuning price per example (MSL 4096) | Cerebras time to 10B tokens (h)** | AWS p4d (8xA100) time to 10B tokens (h) |
---|---|---|---|---|---|---|
Eleuther GPT-J | 6 | $0.00055 | $0.0011 | $0.0023 | 17 | 132 |
Eleuther GPT-NeoX | 20 | $0.00190 | $0.0039 | $0.0078 | 56 | 451 |
CodeGen* 350M | 0.35 | $0.00003 | $0.00006 | $0.00013 | 1 | 8 |
CodeGen* 2.7B | 2.7 | $0.00026 | $0.0005 | $0.0027 | 8 | 61 |
CodeGen* 6.1B | 6.1 | $0.00065 | $0.0013 | $0.0030 | 19 | 154 |
CodeGen* 16.1B | 16.1 | $0.00147 | $0.0030 | $0.011 | 44 | 350 |
** Note that GPT-J was pre-trained on ~400B tokens. Fine-tuning jobs can employ a wide range of dataset sizes, but often use on the order of 1-10% of the pre-training tokens, so one might fine-tune a model like GPT-J with ~4-40B tokens. The table above gives estimated wall-clock time to fine-tune each model checkpoint with 10B tokens on the Cerebras AI Model Studio and on an AWS p4d instance, to give a sense of how long jobs at this scale could take.
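As a quick back-of-the-envelope check on how the per-token pricing above works out, the Python sketch below estimates the cost of fine-tuning GPT-J on the 10B-token scenario from footnote **. The prices are copied from the table; the straight multiplication and the per-example relationship noted in the comments are our own reading of the table, and the $10,000 minimum spend and actual billing terms still apply.

```python
# Back-of-the-envelope fine-tuning cost estimate (illustrative only).
# Prices per 1,000 tokens are taken from the table above; minimum spend
# ($10,000), exact quotes, and billing terms come from Cerebras.

PRICE_PER_1K_TOKENS = {
    "gpt-j-6b": 0.00055,
    "gpt-neox-20b": 0.00190,
    "codegen-350m": 0.00003,
    "codegen-2.7b": 0.00026,
    "codegen-6.1b": 0.00065,
    "codegen-16.1b": 0.00147,
}

def fine_tuning_cost(model: str, num_tokens: float) -> float:
    """Estimated cost in USD for fine-tuning on num_tokens tokens."""
    return num_tokens / 1_000 * PRICE_PER_1K_TOKENS[model]

# Example: fine-tuning GPT-J on 10B tokens (the scenario in footnote **)
print(f"${fine_tuning_cost('gpt-j-6b', 10e9):,.0f}")  # -> $5,500

# The per-example columns appear to follow price per 1K tokens * MSL / 1000,
# e.g. GPT-J at MSL 2048: 2048 / 1000 * $0.00055 ~= $0.0011 per example.
```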
Train from Scratch
Standard Offering
Pick a model from the list below
See the price, time to train: no surprises
SSH into secure, dedicated programming environment for the training period
Browse documentation and code examples, and verify the model implementation for your chosen model
Configure scripts to vary training parameters, e.g. batch size, learning rate, training steps, checkpointing frequency (see the sketch after this list)
Save and export trained weights and training log data from your work to use as you see fit
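To give a concrete feel for the knobs mentioned above, here is a minimal sketch of a training-run configuration. The parameter names and values are hypothetical and chosen only for illustration; the actual scripts and configuration files for each model are provided and documented in the Model Studio environment.

```python
# Hypothetical training-configuration sketch (illustrative only).
# The real Model Studio scripts and parameter files define the actual
# names and defaults for each model.

train_params = {
    "batch_size": 512,               # global batch size
    "learning_rate": 6.0e-5,         # peak learning rate
    "max_steps": 50_000,             # total training steps
    "checkpoint_frequency": 1_000,   # save a checkpoint every N steps
    "max_sequence_length": 2048,     # tokens per training example
}

def summarize(params: dict) -> None:
    """Print the run configuration before launching training."""
    for key, value in params.items():
        print(f"{key:>24}: {value}")

summarize(train_params)
```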
Additional Services
Bigger dedicated clusters are available to reduce time to accuracy and work on larger models.
Additional cluster time for hyperparameter tuning, pre-production training runs, post-production continuous pre-training or fine-tuning is available by the hour.
CPU hours from Cirrascale for dataset preparation
CPU or GPU support from Cirrascale for production model inference
Train models that are not listed in the table
Introductory Pricing - Standard Offering
These prices represent blocks of dedicated cluster time for the chosen model. Additional system time is available at an hourly rate as needed.
Model | Parameters (B) | Tokens to train to Chinchilla point (B) | Cerebras AI Model Studio days to train** | Cerebras AI Model Studio price to train |
---|---|---|---|---|
GPT3-XL | 1.3 | 26 | 0.4 | $2,500 |
GPT-J | 6 | 120 | 8 | $45,000 |
GPT-3 6.7B | 6.7 | 134 | 11 | $40,000 |
T-5 11B | 11 | 34* | 9 | $60,000 |
GPT-3 13B | 13 | 260 | 39 | $150,000 |
GPT NeoX | 20 | 400 | 47 | $525,000 |
GPT 70B | 70 | 1400 | call for quote | call for quote |
GPT 175B | 175 | 3500 | call for quote | call for quote |
* T5 tokens to train from the original T5 paper. Chinchilla scaling laws not applicable.
** Expected number of days, based on training experience to date, using a 4-node Cerebras Wafer-Scale Cluster. Actual training time may be shorter or longer.
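For reference, the "Tokens to train to Chinchilla point" column tracks the commonly cited rule of thumb of roughly 20 training tokens per model parameter (T-5 11B is the exception, using the token count from the original T5 paper). A small Python check, with the 20-tokens-per-parameter constant as our own assumption:

```python
# Chinchilla-style rule of thumb: ~20 training tokens per model parameter.
# The constant is an approximation; the table above is the authoritative source.

TOKENS_PER_PARAMETER = 20

def chinchilla_tokens_billion(params_billion: float) -> float:
    """Approximate compute-optimal training tokens, in billions."""
    return TOKENS_PER_PARAMETER * params_billion

for name, params_b in [("GPT3-XL", 1.3), ("GPT-J", 6), ("GPT-3 13B", 13),
                       ("GPT NeoX", 20), ("GPT 175B", 175)]:
    print(f"{name}: ~{chinchilla_tokens_billion(params_b):,.0f}B tokens")
# -> GPT3-XL: ~26B, GPT-J: ~120B, GPT-3 13B: ~260B,
#    GPT NeoX: ~400B, GPT 175B: ~3,500B
```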
Interested? Contact Sales To Get Started!
This is how to get started with the world’s fastest AI accelerators, quickly and easily.
- 2-day trial access to a cluster:
  - Secure, dedicated access to a programming environment for the trial period
  - Systems, code, data, and documentation provided by Cerebras
  - Ability to run models on 1, 2, or 4 systems within the cluster on a Cerebras-curated version of the open-source Pile dataset
  - PyTorch or TensorFlow models to choose from, including GPT 1.3B, 6B, 6.7B, 13B, and 20B; please see the Cerebras Model Zoo for examples
- Trial programming environment:
  - Cerebras-provided Python scripts allowing the trial user to vary the number of CS-2 systems used for training and the GPT model implementation / size (see the sketch after this list)
  - Cerebras-provided scripts allowing the trial user to vary learning rate, steps, and checkpointing frequency
  - Access to the trained weights from the trial
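As a purely hypothetical illustration of the kind of sweep those trial scripts enable, the sketch below lists runs across 1, 2, and 4 CS-2 systems. The dictionary keys, model identifier, and values are invented for illustration; the actual interface is defined by the Cerebras-provided scripts in the trial environment.

```python
# Hypothetical sketch of a trial-period sweep (illustrative only).
# The parameter names, model identifier, and values below are invented;
# the Cerebras-provided trial scripts define the real interface.

trial_runs = [
    {
        "num_cs2_systems": n,     # 1, 2, or 4 systems in the trial cluster
        "model": "gpt-6.7b",      # hypothetical model identifier
        "learning_rate": 1.2e-4,
        "steps": 2_000,
        "checkpoint_every": 500,
    }
    for n in (1, 2, 4)
]

for run in trial_runs:
    print("planned trial run:", run)
```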
FAQ
The Problem
Training large Transformer models such as GPT and T5 on traditional cloud platforms with graphics processors is painful, expensive, and time consuming. The largest instance typically offered in the cloud is an 8-way GPU server, and it often takes weeks just to get access. Networking, storage, and compute cost extra. Set-up is no joke. Models with tens of billions of parameters take weeks to get going and months to train. If you want to train in less time, you can attempt to reserve additional instances, but unpredictable inter-instance latency makes distributing AI work difficult and achieving high performance across multiple instances nearly impossible. The result is that very few large models are ever trained in a traditional cloud.
Our Solution
The Cerebras AI Model Studio makes training large Transformer models for language or generative AI applications fast, easy, and affordable. With Cerebras, you get millions of cores, predictable performance, and no parallel distribution headaches, all of which lets you quickly and easily run existing models on your data or build new models from scratch, optimized for your business.
A dedicated cloud-based cluster powered by Cerebras CS-2 systems with millions of AI cores for large language models and generative AI:
Train 1-175 billion parameter models quickly and easily
No parallel distribution pain: single-keystroke scaling over millions of cores
Zero DevOps or firewall pain: simply SSH in and go
Push-button performance: models in standard PyTorch or TensorFlow
Flexibility: pre-train or fine-tune models with your data
Train in a known amount of time, for a fixed fee
How do you make sure my model trains in the listed time?
This offering is all about simplified access to large-scale compute to train large-scale language models in a short time. We have provisioned specific CS-2 accelerator resources for each model above to deliver the throughput needed to reach the target number of tokens in the listed time. This way, when you use the selected Model Studio code and configuration, you can be sure you'll complete training to the listed number of tokens. And you can always get more CS-2 resources if needed; contact us to learn more. The Model Studio has access to an elastic pool of CS-2 resources, up to and including large 16-node CS-2 wafer-scale clusters like Andromeda.
What does the fixed-price production training run include?
The production training run is intended to be a single run from scratch to the listed number of tokens (for most models above, the listed number of tokens is defined by the Chinchilla scaling laws for large language models). We understand that model development involves a lot more, and your production training run may not go off without a hitch; see below for more information on those issues. When you select a model and purchase one of our fixed-price production training run offerings, you're essentially getting a pre-defined pool of CS-2 resources for the listed amount of time. When you use our code and configuration, we ensure sufficient accelerator resources to deliver the token throughput needed to train to the listed number of tokens in the advertised time. Need more time or more accelerator resources? No problem, we have both.
Can I get compute for data preparation, experimentation, and inference as well?
Yes! The Cerebras AI Model Studio is a fully provisioned model development facility. As needed, you can rent CPU hours for data preprocessing and input-pipeline development, additional CS-2 accelerator resources for hyperparameter tuning, pre-production experimental training runs, and training evaluation, and compute resources for production inference after training. We can support fine-tuning, continuous pre-training, and model re-training as well.
What happens if my training run hits problems?
Not every run is perfect. If you run into issues, we'll work together. If the issue is with Cerebras AI Model Studio code or systems, we'll credit you back the time and help you start again. If the issue is a user or ML matter (e.g. suboptimal hyperparameters for the run), you'll retain the remainder of your system allocation time and can procure more time as needed to finish the run to your desired end state. We're here to help you be successful.
Can I start from a pre-trained checkpoint instead of training from scratch?
Yes! We will provide pre-trained checkpoints for the listed models, trained on public, open-source data, and you can purchase CS-2 accelerator resources to continue training from them. Contact us to learn more.
What is the software environment like?
Cerebras AI Model Studio provides a ready-to-use training environment, with all required software components already installed and configured. Running training jobs on a multi-million-core AI system in the Cerebras AI Model Studio is as easy as running training jobs on a single GPU: no need to manage distributed resources, no need to worry about loading and saving checkpoints across many nodes, no need to think about complex hybrid parallelism strategies, no need to distribute shards of the optimizer state parameters. Cerebras AI Model Studio uses model implementations from the Cerebras Model Zoo in standard PyTorch and TensorFlow. Feel free to take a look!
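For context, the sketch below is the kind of plain, single-device-style PyTorch training loop the Studio aims to let you keep writing at cluster scale; the toy model and data here are stand-ins, not Cerebras Model Zoo code.

```python
# A minimal single-device-style PyTorch training loop (illustrative only).
# The model, data, and hyperparameters are stand-ins; Model Studio jobs use
# the model implementations and launch scripts from the Cerebras Model Zoo.

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy setup: random features and labels, classified into 10 classes.
inputs = torch.randn(1024, 512)
labels = torch.randint(0, 10, (1024,))
loader = DataLoader(TensorDataset(inputs, labels), batch_size=64, shuffle=True)

model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for step, (x, y) in enumerate(loader):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    if step % 10 == 0:
        print(f"step {step}: loss {loss.item():.4f}")
```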