Accelerating Large GPT Training with Sparse Pre-Training and Dense Fine-Tuning [Updated]

We have shown it is possible to reduce the training compute for large GPT models using high degrees of weight sparsity…

