Training from Scratch

Training from scratch is a machine learning approach in which a neural network learns to recognize patterns and make predictions without relying on any previously learned information. The model is built from the ground up, so it requires large amounts of data from which the learning algorithm can extract patterns. This differs from transfer learning, where knowledge gained by training a model on one task is transferred and applied to another, reducing the amount of new data needed. Training from scratch is therefore slower and more data-hungry than transfer learning, but when extensive datasets are available it can yield more accurate and robust models while reducing the risk of overfitting. For complex problems that demand a high degree of accuracy, the better outcomes it produces often justify the additional time and data.
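The distinction is easiest to see in code. The minimal sketch below uses the Hugging Face transformers API with a BERT-style model purely for illustration; the specific classes and checkpoint name are assumptions, not part of the pilot described later.

```python
from transformers import BertConfig, BertForMaskedLM

# Training from scratch: the model starts with randomly initialized weights,
# so everything it learns must come from the training corpus itself.
config = BertConfig()                      # architecture only, no learned weights
scratch_model = BertForMaskedLM(config)    # random initialization

# Transfer learning: the model starts from weights already learned on a large
# general-purpose corpus, so far less new data is needed for the target task.
pretrained_model = BertForMaskedLM.from_pretrained("bert-base-uncased")
```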

Cerebras Systems' CS-2 system delivers the compute performance of more than 120 AI-optimized GPUs, making it possible to train large BERT models from scratch. A pilot project conducted by a major financial services institution and Cerebras Systems showed that the benefits of training from scratch on domain-specific datasets can now be realized in an enterprise environment thanks to Cerebras AI accelerator technology. The project aimed to improve BERT-Large accuracy for financial services applications by training the model from scratch on domain-specific data, rather than simply using a generic model trained on general-purpose text as a starting point. It also demonstrated that training from scratch, previously intractable on conventional hardware, becomes easy and far faster on the Cerebras CS-2 system. This performance and ease of use made rapid experimentation with model parameters, and thus a better-performing model, practical in an enterprise environment. The CS-2 reduced training time by 15x compared with a leading 8-GPU server, delivering the compute performance of more than 120 GPUs. This speedup made it possible to demonstrate dramatic improvements in the resulting model quality using the Thomson Reuters TRC2 dataset, while almost halving energy consumption.
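For readers who want a concrete picture of what "training BERT-Large from scratch on domain-specific data" involves at the framework level, the sketch below shows a generic masked-language-modeling pretraining loop with Hugging Face transformers. It is not the pilot's actual pipeline (which ran on the CS-2); the file name "domain_corpus.txt", the tokenizer choice, and the hyperparameters are placeholders.

```python
from transformers import (BertConfig, BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

# Tokenizer: a true from-scratch effort would typically also train a
# domain-specific vocabulary; reusing the standard one keeps the sketch short.
tokenizer = BertTokenizerFast.from_pretrained("bert-large-uncased")

# BERT-Large architecture with randomly initialized weights (no pretrained checkpoint).
config = BertConfig(hidden_size=1024, num_hidden_layers=24,
                    num_attention_heads=16, intermediate_size=4096)
model = BertForMaskedLM(config)

# "domain_corpus.txt" stands in for a domain-specific corpus such as financial news text.
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"])

# Masked-language-modeling objective used for BERT pretraining.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-large-from-scratch",
                           per_device_train_batch_size=8,
                           num_train_epochs=1),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
```

The point of the pilot was that exactly this kind of run, prohibitively slow on conventional GPU servers at BERT-Large scale, becomes fast enough on the CS-2 to allow repeated experimentation with model and training parameters.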