Model Lab

Compare different models, simulate runs, and estimate training & inference costs

Notes

  • The test loss is an estimate based on the Cerebras-GPT scaling law. It allows a relative comparison of the models’ loss if they were trained from scratch with the same hyperparameters and dataset; the estimate may not match figures published in the original papers (see the sketch after this list for the general form of such a scaling law)
  • The test loss is an estimate of pre-training performance only. Downstream performance may vary, even between models that reach the same test loss
  • Loss is a very sensitive metric: a 5% change in loss is a large difference, roughly what you would expect from doubling the model size
  • The pre-populated models specify only parameter count and dataset size; they do not capture other hyperparameters
  • The loss curve shown simulates an ideal learning rate at each FLOP level. In a real training run the curve may be steeper, but it should converge to the same final loss
  • Questions or feedback? Contact james.wang[at]cerebras.net
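
As a rough illustration of how such a scaling-law estimate works, here is a minimal Python sketch using a Chinchilla-style parametric form, L(N, D) = E + A/N^α + B/D^β. The coefficients and the 1.3B/2.6B example sizes below are illustrative placeholders, not the actual Cerebras-GPT fit; the function name and values are assumptions for this sketch only.

```python
# Minimal sketch of a Chinchilla-style parametric scaling law:
#     L(N, D) = E + A / N**alpha + B / D**beta
# The coefficients are illustrative placeholders, NOT the published
# Cerebras-GPT fit; they only show the shape of the estimate.

def estimated_loss(n_params: float, n_tokens: float) -> float:
    """Estimate test loss from parameter count N and training tokens D."""
    E, A, B = 1.69, 406.4, 410.7   # placeholder irreducible-loss and scale terms
    alpha, beta = 0.34, 0.28       # placeholder exponents
    return E + A / n_params**alpha + B / n_tokens**beta

# Doubling the parameter count at a fixed dataset size moves the
# estimated loss by only a few percent, which is why a ~5% change
# in loss is considered a large difference.
base = estimated_loss(1.3e9, 26e9)      # 1.3B params, 26B tokens
doubled = estimated_loss(2.6e9, 26e9)   # 2.6B params, same tokens
print(f"1.3B: {base:.3f}  2.6B: {doubled:.3f}  "
      f"delta: {100 * (doubled - base) / base:+.1f}%")
```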