ICML 2024

July 21-27, 2024

Booth #205B

The 62nd Annual Meeting of the Association for Computational Linguistics



The 7th workshop on Neural Scaling Laws
Scaling, Alignment, Transfer Learning & Multilingual Models

This workshop, co-organized by the CERC in Autonomous AI Lab at UdeM, aims to provide a forum for discussing recent advances in foundation models: large-scale neural networks pretrained in an unsupervised way on large, diverse datasets.

Date: Monday, July 22, 2024
Time: 9:00am – 12:00pm CEST
Location: Courtyard Vienna Prater/Messe, Trabrennstraße 4, 1020 Wien, Austria


Sparse-IFT has been accepted to ICML 2024!

Abstract: Recent research has focused on weight sparsity in neural network training to reduce FLOPs, aiming for improved efficiency (test accuracy w.r.t. training FLOPs). However, sparse weight training often sacrifices accuracy, requiring extended training schedules to attain the accuracy of dense models. In contrast, our approach, Sparse Iso-FLOP Transformations (Sparse-IFT), uses sparsity to improve accuracy while maintaining dense model FLOPs. Using a single hyperparameter (i.e., the sparsity level), Sparse-IFTs efficiently replace dense layers, expanding the search space for optimal sparse masks. In addition, dynamic sparse training with Sparse-IFT models effectively navigates this larger sparse mask-weight space, as evidenced by a spectral analysis using Ramanujan graph properties. Our study reveals a robust correlation among mask topology, weights, and final performance. Notably, without adjusting hyperparameters, replacing dense layers with Sparse-IFT yields significant improvements, such as a +3.5% boost for ResNet-18 on ImageNet and +0.9% for GPT-3 Small on the Open LLM leaderboard. To our knowledge, this is the first work to demonstrate the use of sparsity for improving the accuracy of dense models through a simple-to-use set of sparse transformations.
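The core iso-FLOP idea above can be illustrated with a small sketch. This is not the paper's implementation; it only assumes one variant of the transformation, a "sparse wide" layer in which both dimensions of a dense layer are widened by 1/sqrt(1 − s) and a sparsity of s is applied, so the expected number of nonzero multiply-accumulates matches the original dense layer. The function name and random-mask choice are illustrative assumptions:

```python
import numpy as np

def sparse_wide_layer(d_in, d_out, sparsity, rng):
    """Sketch of a sparse iso-FLOP "wide" transformation.

    Assumption: widening both dimensions by 1/sqrt(1 - sparsity) and
    masking a `sparsity` fraction of weights keeps the expected number
    of nonzero MACs equal to the dense layer's d_in * d_out.
    """
    k = 1.0 / np.sqrt(1.0 - sparsity)          # width multiplier
    w_in, w_out = round(d_in * k), round(d_out * k)
    weights = rng.standard_normal((w_in, w_out))
    # Random mask keeping a (1 - sparsity) fraction of the weights;
    # dynamic sparse training would update this mask during training.
    mask = rng.random((w_in, w_out)) > sparsity
    return weights * mask, mask

rng = np.random.default_rng(0)
d_in, d_out, s = 256, 256, 0.75
w, mask = sparse_wide_layer(d_in, d_out, s, rng)

dense_flops = d_in * d_out        # MACs of the original dense layer
sparse_flops = int(mask.sum())    # nonzero MACs of the sparse layer
```

With s = 0.75 the layer is widened 2x in each dimension, so the sparse layer has 4x the weight slots but only a quarter of them active, landing back at roughly the dense FLOP budget while searching a much larger mask-weight space.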


Introducing Sparse Llama: 70% Smaller, 3x Faster, Full Accuracy

Cerebras and Neural Magic have achieved a major milestone in the field of large language models (LLMs). By combining state-of-the-art pruning techniques, sparse pretraining, and purpose-built hardware, we have unlocked unprecedented levels of sparsity in LLMs, enabling up to 70% parameter reduction without compromising accuracy.
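To make the "70% parameter reduction" concrete, here is a minimal sketch of one-shot magnitude pruning, one of the standard pruning techniques in this family. This is an assumption for illustration only, not the actual Sparse Llama recipe, which also involves sparse pretraining and hardware support:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.7):
    """Sketch: zero out the smallest-magnitude `sparsity` fraction of
    weights (one-shot magnitude pruning). Illustrative only."""
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)
    threshold = np.partition(flat, k)[k]   # k-th smallest magnitude
    mask = np.abs(weights) >= threshold    # keep the largest ~30%
    return weights * mask, mask

rng = np.random.default_rng(1)
w = rng.standard_normal((1024, 1024))      # stand-in for an LLM weight matrix
pruned, mask = magnitude_prune(w, sparsity=0.7)
density = mask.mean()                      # fraction of weights kept, ~0.3
```

In practice, pruned accuracy is then recovered with continued (sparse) pretraining or fine-tuning; the pruning step alone, as sketched here, would degrade model quality at this sparsity level.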


Cerebras Breaks Exascale Record for Molecular Dynamics Simulations

Cerebras has set a new record for molecular dynamics simulation speed that goes far beyond the exascale level. While this breakthrough has wide-ranging impacts for materials modeling, we initially focused on a problem relevant to commercializing nuclear fusion. This achievement demonstrates how Cerebras's wafer-scale computers enable novel computational science applications.


Cerebras CS-3 vs. Nvidia B200: 2024 AI Accelerators Compared

In the fast-paced world of AI hardware, the Cerebras CS-3 and Nvidia DGX B200 are two of the most exciting new offerings to hit the market in 2024. Both systems are designed to tackle large-scale AI training, but they take decidedly different approaches.