Today Cerebras is excited to announce Condor Galaxy 1 (CG-1), a 4 exaFLOPS, 54 million core, 64-node AI supercomputer. Built in partnership with G42, the leading AI and cloud company of the United Arab Emirates, CG-1 is the first in a series of nine supercomputers to be built and operated through a strategic partnership between Cerebras and G42. Upon completion in 2024, the nine interconnected supercomputers will deliver 36 exaFLOPS of AI compute, making the network one of the most powerful cloud AI supercomputers in the world. CG-1 is located in Santa Clara, California at the Colovore data center. We will be sharing results of a new model trained on CG-1 at the ICML 2023 conference on July 24th.

From Andromeda to Condor

In 2022 we observed exponential growth in the size and computational requirements of large language models (LLMs). The Cerebras Wafer-Scale Engine was already the largest and most powerful AI processor in the world; the only way to go larger was to make it operate at cluster scale. The results from GPT-3 convinced us that LLMs were likely to be the single largest opportunity in AI, so we bet our entire roadmap on a new architecture that connects our Wafer-Scale Engines and scales performance to the exaFLOPS range.

To make this possible, we invented two pieces of technology:

  • Cerebras Wafer-Scale Cluster – a new system architecture that lets up to 192 Cerebras CS-2 systems be connected and operate as a single logical accelerator. The design decouples memory from compute, allowing us to deploy terabytes of memory for AI models vs. the gigabytes possible using GPUs.
  • Weight streaming – a novel way to train large models on wafer-scale clusters using just data parallelism. We saw firsthand customers struggling to implement the complex pipeline and model parallel schemes required to train large models on GPUs. Our solution exploits the large compute and memory capacity of our hardware and distributes work by streaming the model one layer at a time in a purely data parallel fashion (a minimal sketch of this idea follows this list).
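
Below is a minimal, illustrative sketch of the weight-streaming idea in plain NumPy. It is our own toy model of the concept, not Cerebras’ actual software stack: the `MemoryStore`, `stream_in`, and `stream_out` names are hypothetical, and a real implementation would overlap streaming with compute across many wafers.

```python
import numpy as np

rng = np.random.default_rng(0)

class MemoryStore:
    """Hypothetical off-chip weight store, decoupled from compute."""
    def __init__(self, n_layers, d_model):
        self.weights = [rng.standard_normal((d_model, d_model)) * 0.02
                        for _ in range(n_layers)]

    def stream_in(self, i):
        # Stream one layer's weights to the accelerator on demand.
        return self.weights[i]

    def stream_out(self, i, grad, lr=1e-3):
        # Stream the weight gradient back out and update off-chip.
        self.weights[i] -= lr * grad

def train_step(store, x, n_layers):
    # Forward pass: only one layer's weights are "on chip" at a time;
    # activations stay resident on the compute fabric.
    acts = [x]
    for i in range(n_layers):
        w = store.stream_in(i)
        acts.append(np.maximum(acts[-1] @ w, 0.0))  # ReLU MLP layer

    # Backward pass (toy loss: mean of the final activations).
    grad_a = np.ones_like(acts[-1]) / acts[-1].size
    for i in reversed(range(n_layers)):
        w = store.stream_in(i)
        grad_pre = grad_a * (acts[i + 1] > 0)       # ReLU derivative
        store.stream_out(i, acts[i].T @ grad_pre)   # gradient streams out
        grad_a = grad_pre @ w.T                     # propagate upstream

store = MemoryStore(n_layers=4, d_model=64)
train_step(store, rng.standard_normal((8, 64)), n_layers=4)
```

Because only the data batch is sharded across systems, adding more CS-2 nodes does not change this loop; each node runs the same layer-by-layer stream over its own slice of the batch.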

In November 2022, we brought these two technologies to market with Andromeda – a one exaFLOPS, 16 CS-2 AI supercomputer. Andromeda served three purposes. First, it provided a reference design for Cerebras Wafer-Scale Clusters, allowing us to quickly and easily build new AI supercomputers for customers. Second, it gave us a world-class platform to train large generative models, enabling us to train seven Cerebras-GPT models in just a few weeks and share them open source with the world. Third, it became the flagship offering of the Cerebras Cloud, opening the doors for customers to use our systems without procuring and managing hardware.

Today’s announcement of CG-1 is the culmination of all these efforts – it is the largest AI supercomputer we’ve ever deployed, brought up in just two weeks thanks to the Andromeda blueprint. It has already trained several large language models, including on entirely new datasets such as Arabic text. And it is served through the Cerebras Cloud and G42 Cloud to customers worldwide.

Condor Galaxy 1 AI supercomputer specifications:

  • 4 exaFLOPS of AI compute at FP16 with sparsity
  • 54 million AI optimized compute cores
  • 82 terabytes of memory
  • 64 Cerebras CS-2 systems
  • Base configuration supports models of up to 600 billion parameters, extendable to 100 trillion (see the sizing sketch after this list)
  • 386 terabits per second of internal cluster fabric bandwidth
  • 72,704 AMD EPYC Gen 3 processor cores
  • Native hardware support for training with sequence lengths of up to 50,000 tokens, no third-party libraries needed
  • Data parallel programming model with linear performance scaling
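
As a rough, back-of-the-envelope check on these capacity figures (our own estimate, not official Cerebras sizing), a common rule of thumb for mixed-precision Adam training is about 16 bytes of memory per parameter: fp16 weights and gradients plus fp32 master weights and two fp32 optimizer moments.

```python
# Rough sizing sketch: how many trainable parameters fit in 82 TB?
# BYTES_PER_PARAM is an assumption (2 B fp16 weights + 2 B fp16 grads
# + 4 B fp32 master weights + 8 B fp32 Adam moments); recipes vary.
BYTES_PER_PARAM = 16
MEMORY_TB = 82

params = MEMORY_TB * 1e12 / BYTES_PER_PARAM
print(f"~{params / 1e12:.1f} trillion parameters in {MEMORY_TB} TB")
# ~5.1 trillion by this estimate; the 600-billion base figure presumably
# reserves headroom for activations and checkpoints, while the
# 100-trillion figure assumes expanded external memory.
```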

Condor Galaxy Roadmap

Condor Galaxy will roll out in four phases over the coming year:

  • Phase 1: CG-1 is up and running today in the Colovore data center in Santa Clara, with its first 32 CS-2 systems installed.
  • Phase 2: We will double the footprint of CG-1, expanding it to 64 CS-2 systems at 4 exaFLOPS. A 64-node system represents one full supercomputer instance.
  • Phase 3: We will build out two more full instances across the United States, bringing the total deployed compute to 12 exaFLOPS across 3 centers.
  • Phase 4: We will build out six more supercomputing centers, bringing the full install base to 9 instances at 36 exaFLOPS of AI compute. This puts Cerebras among the top 3 companies worldwide for public AI compute infrastructure.

Phase     Timeline    exaFLOPS   CS-2 Systems   Centers   Milestone
Phase 1   Delivered   2          32             1         Largest CS-2 deployment to date
Phase 2   Q4 2023     4          64             1         First 64-node Cerebras AI supercomputer
Phase 3   H1 2024     12         192            3         First distributed supercomputer network
Phase 4   H2 2024     36         576            9         Largest distributed supercomputer network

When fully deployed in 2024, Condor Galaxy will be one of the largest cloud AI supercomputers in the world. At 36 exaFLOPS, it will be nine times more powerful than Nvidia’s Israel-1 supercomputer and four times more powerful than Google’s largest announced TPU v4 pod.

Available Now Through Cerebras Cloud

Cerebras manages and operates CG-1 for G42 and makes it available through the Cerebras Cloud. Dedicated supercomputing instances for AI training are instrumental to model development: OpenAI’s ChatGPT was made possible by the dedicated clusters built by Microsoft Azure, and breakthroughs from DeepMind and Google Brain were possible only thanks to GCP’s pre-configured TPU pods. Since the launch of Andromeda, we’ve offered cloud-based access to clusters of up to 16 interconnected CS-2 systems. With the launch of CG-1, we are now expanding our cloud offering to include fully configured AI supercomputers of up to 64 systems, providing customers with push-button access to 4 exaFLOPS of AI performance.

Solving the GPU Scaling Challenge

While GPUs are powerful general-purpose accelerators, it’s widely recognized that programming large GPU clusters is a major technical barrier for ML developers. Almost every large AI organization has had to invent a framework to manage this complexity, such as Microsoft DeepSpeed, Nvidia Megatron, Meta FairScale, and Mosaic Foundry. Our analysis of these libraries found that it takes on average ~38,000 lines of code to train a model over a GPU cluster. It is no wonder that large models appear to emerge only from premier AI labs – the complexity is simply too high for most software teams to manage.

Cerebras Wafer-Scale Clusters – whether one or 64 nodes – are fundamentally designed to operate as a single, logical accelerator. Because CG-1’s memory is a unified 82-terabyte block, we can fit even the largest models directly into memory without any partitioning or extra code. On Cerebras, a 100B parameter model uses the same code as a 1B model and does not require any pipeline or model parallelism. We natively support long-sequence training of up to 50,000 tokens – no FlashAttention needed. The net result is that a standard GPT implementation on Cerebras requires just 1,200 lines of code – 30x simpler than the average of the leading industry frameworks. Ultimately, this is why customers such as G42 choose our platform – we are not just faster, we are dramatically simpler to use.
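
A minimal sketch of what scale-invariant, purely data-parallel training looks like in practice, using a hypothetical config schema of our own (not Cerebras’ actual API): moving from a 1B- to a 100B-parameter GPT is a change of hyperparameters, not of parallelism code.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Hypothetical config schema for illustration; not the Cerebras API.
    n_layers: int
    d_model: int
    n_heads: int
    max_seq_len: int = 50_000  # long-context training, per the spec list

def train(cfg: GPTConfig, batch_shards: int) -> None:
    # One code path regardless of model size: the cluster presents as a
    # single logical accelerator, so only the data batch is sharded.
    print(f"training {cfg.n_layers}-layer, d_model={cfg.d_model} model "
          f"across {batch_shards} data shards")

gpt_1b = GPTConfig(n_layers=16, d_model=2048, n_heads=16)
gpt_100b = GPTConfig(n_layers=80, d_model=10240, n_heads=80)

# On a GPU cluster, the second call would also require choosing tensor-,
# pipeline-, and data-parallel degrees plus offload and checkpointing
# strategy; here both calls follow the identical data-parallel path.
train(gpt_1b, batch_shards=64)
train(gpt_100b, batch_shards=64)
```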

G42 and Cerebras Partnership

G42 is an AI company based in the United Arab Emirates. With 22,000 people in 25 countries, G42 comprises nine operating companies, including G42 Cloud, IIAI, and M42, its joint venture with Mubadala. G42 is the AI national champion in the UAE, driving large-scale digital transformation initiatives in the region and beyond. G42 Cloud is the leading regional cloud computing provider, with a full suite of on-demand compute, storage, and platform solutions. IIAI is a pioneer in building AI models and deploying them across enterprise customers. G42 also has access, either directly or through its partners, to unique datasets in domains as disparate as language, healthcare, energy, and environmental science, creating a rare opportunity to use AI to find insight in data. This partnership combines the domain-specific expertise of G42 researchers with Cerebras’ AI and hardware expertise to build new models and bring to market new cloud AI services. The combined teams have already trained state-of-the-art monolingual and bilingual chat models, healthcare models, and models to further climate studies.

Availability

CG-1 is available today through the Cerebras Cloud’s White Glove Service and through G42 Cloud for commercial customers training generative AI models. The Cerebras Cloud also offers dedicated AI supercomputing instances from 1 to 4 exaFLOPS, fully configured and ready to use. In addition to AI infrastructure, our ML applications team offers custom services based on your needs. We’d love to help your team solve the most challenging ML problems and build breakthrough products. Please contact us for additional information.