Nov 29 2022

The Cerebras AI Model Studio brings Wafer-Scale Cluster Acceleration to the Cloud

We recently announced the availability of the Cerebras AI Model Studio, our dedicated cloud service powered by the Cerebras Wafer-Scale Cluster, for large language models and generative AI. The Cerebras AI Model Studio is hosted on the Cerebras Cloud by Cirrascale Cloud Services® and makes training large language models cheaper, faster, and easier.

In this blog, we will describe why we built the Cerebras AI Model Studio, what it is, and what users can do today.

Training a Large Language Model is too expensive, too time-consuming, and too complicated

Large language models have been increasing in popularity in recent years. They are getting larger and are being applied to a variety of real-world science and business applications. However, the cost, time, and complexity of training these models are so high that only a handful of organizations are capable of training them.

Training large language models from scratch is expensive; this article mentions that training PaLM using cloud computing would require around $9M to $23M. That is an expensive, if not impossible, proposition for most organizations!

Additionally, training large language models is time-consuming. The largest instance type typically offered by traditional cloud services is an 8-way GPU server. It often takes weeks just to get access, networking, storage, host compute, security, and software set up on one of these. Then, models with billions to tens or hundreds of billions of parameters could take months or even years to train, if they run at all. A TechCrunch article from earlier this year mentions that the 176B-parameter large language model, BLOOM, trained on 384 Nvidia A100 GPUs for three months! Most organizations do not have access to, and cannot afford, 384 Nvidia A100 GPUs, and they certainly cannot wait the years necessary to train a large language model on much smaller GPU systems.

Finally, training large language models is complicated. Because of the size of a large language model, current compute resources like GPUs require the model to be broken up into smaller pieces and spread across a cluster of GPUs. This requires the user to have a mastery of distributed programming to tune the model-parallel configuration needed for an efficient training run. Our CEO, Andrew Feldman, mentioned in an interview with TechTalks that “distributed parallel computation is obscure and it’s rare and only a few organizations in the world are good at it.”

Given these challenges, it is no surprise that training large language models is currently being done by only a handful of organizations that have time, money, and expertise to execute successfully.

The Cerebras AI Model Studio makes training large language models cheaper, faster, and easier

The Cerebras AI Model Studio solves all of this and makes training large Transformer models for language or generative AI model applications fast, easy, and affordable. By giving you control over the models you build, the Cerebras AI Model Studio gives you the opportunity to build models tailored to your data and business application, independent of third-party models and GPT API-type services – all to optimize model performance, cost efficiency, and differentiation / defensibility of your enterprise.

The Cerebras AI Model Studio enables cloud access to clusters of Cerebras CS-2 systems, the fastest AI-optimized accelerators in the world. As a related example of the power of CS-2 systems and clusters, see our recent announcement of our flagship AI supercomputer, Andromeda. Argonne National Laboratory (ANL) and Cerebras used Andromeda to conduct our award-winning research (read more about winning the Gordon Bell Special Prize for High Performance Computing-Based Covid-19 Research). Rick Stevens, Associate Lab Director at ANL, said,

“We put the entire COVID-19 genome into the sequence window, and Andromeda ran our unique genetic workload with long sequence lengths across 1, 2, 4, 8, and 16 nodes, with near-perfect linear scaling. Andromeda sets a new bar for AI accelerator performance.”

The Cerebras AI Model Studio makes training easy: users simply log in to a secure terminal provided by Cirrascale and access the requested Cerebras Wafer-Scale Cluster environment, which will be pre-configured and ready to run for each user’s work. Users no longer have to provision cloud resources, set up monitoring and logging services, configure an environment for machine learning training runs, or wait in virtual lines for compute resources. Instead, users can focus on preparing their data, designing their training run, and connecting their trained model to their broader business or research goals.

With the Cerebras AI Model Studio, Cerebras introduces model-based pricing for GPT-style models. Users now receive a fixed price for a training run, based on the model size and the number of training tokens they select. This removes the unpredictable pricing of training a model on the cloud, where users have to calculate the total cost of training from their estimated hourly consumption of compute resources. Our model-based pricing makes this predictable and affordable: users can train a GPT3-XL 1.3B parameter model for as low as $2,640.
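To make the hourly-consumption problem concrete, here is a minimal Python sketch of the arithmetic a user has to do under hourly pricing. It uses the BLOOM figures quoted earlier (384 GPUs for about three months, taken here as 90 days). The per-GPU-hour rate and the 80-to-100-day uncertainty window are made-up placeholders for illustration only, not real cloud prices or Cerebras figures, and the helper names are hypothetical.

```python
# Sketch: why an hourly-billed training run is hard to budget.
HOURS_PER_DAY = 24


def gpu_hours(num_gpus: int, days: float) -> float:
    """GPU-hours consumed by a run that keeps num_gpus busy for `days` days."""
    return num_gpus * days * HOURS_PER_DAY


def hourly_cost_range(num_gpus, days_low, days_high, rate_per_gpu_hour):
    """Cost range when the run length is only known as an estimate."""
    low = gpu_hours(num_gpus, days_low) * rate_per_gpu_hour
    high = gpu_hours(num_gpus, days_high) * rate_per_gpu_hour
    return low, high


# Figures from the text: 384 GPUs for roughly three months (assumed 90 days).
bloom_gpu_hours = gpu_hours(384, 90)

# ASSUMPTION: placeholder rate of $1.00 per GPU-hour, not a real price.
ASSUMED_RATE = 1.00
# ASSUMPTION: the run might finish anywhere between 80 and 100 days.
low, high = hourly_cost_range(384, 80, 100, ASSUMED_RATE)

print(f"BLOOM-scale run: {bloom_gpu_hours:,.0f} GPU-hours")
print(f"Hourly-billed estimate: ${low:,.0f} to ${high:,.0f}")
print(f"Budget uncertainty: ${high - low:,.0f}")
```

Under hourly billing, every extra day of training moves the bill, so the budget is only as good as the run-length estimate. With a fixed, model-based price, the quote is set by model size and token count up front, and that uncertainty term drops out of the user's budget.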

Train GPT-style models via the Cerebras AI Model Studio today!

Today, the Cerebras AI Model Studio enables users to train generative pre-trained Transformer (GPT)-class models like those that can be found in our Model Zoo repository. This includes, but is not limited to, GPT 1.3B, 6B, 6.7B, 13B, 20B, and the T5 11B models. We enable a simple, push-button approach to training these large language models by providing users with pre-configured Python scripts to match a user’s training specifications. This reduces the development hours needed to prepare a training run, lowering the overall total cost of training.

In partnership with Cirrascale, we have built a cloud solution that prioritizes security. Every user who receives access to our systems will appreciate the attention we have given to data security, network security, and the processes that ensure a system is scrubbed after every engagement.

Finally, the Cerebras AI Model Studio offers users flexibility. Users can purchase additional add-ons, enabling them to tweak the training experience to fit their needs. Some examples of additional capabilities include:

Additional CS-2 resources to go faster or train on more tokens to drive accuracy beyond published marks
Additional cluster time for hyperparameter tuning, pre-production training runs, and post-production continuous pre-training or fine-tuning
CPU hours from Cirrascale for dataset preparation
CPU or GPU support from Cirrascale for production model inference

Conclusion

The Cerebras AI Model Studio makes large language model training accessible to a broader audience. Users no longer need millions of dollars, months of time, or teams of experts in distributed programming to harness the power of large language models. In fact, users do not need large language model training expertise either! Users can work with our world-class solutions team to identify the training run that fits their business needs and budget.

To learn more about the Cerebras AI Model Studio, please visit our webpage.

Udai Mody, Product Manager | November 28, 2022