NLP Tutorials — Part 14: RoBERTa

Hello and welcome back to another article in the NLP tutorials series! This time we explore a model that improves on the hugely popular NLP language model BERT. The Robustly Optimized BERT Pretraining Approach, or RoBERTa, can perform as much as 15–20% better than BERT on some benchmarks, thanks to careful hyperparameter tuning and bigger datasets. The authors argued that BERT was significantly undertrained and that, given more data and better-tuned hyperparameters, it could reach its full potential. Let’s get started and understand how the authors achieved this performance bump over the original BERT.