Hello and welcome back to the NLP tutorials series where we inspire you to go through the ranks of NLP expertise all the way to Expert. If you follow all our articles in this tutorial series, no doubt you will gain valuable technical knowledge in the NLP domain. In our previous articles, we had an in-depth look at BERT and one of its improvements (accuracy). In this article we shall rather address a huge problem coming our way — the computational requirement for training these massive language models is going out of hands.
Hello and welcome back to yet another interesting article in the NLP tutorials series! We are here to explore a model which is an improvement over the massively famous NLP language model — BERT. Robustly Optimized BERT Pretraining approach or RoBERTa performs a good 15–20% better than BERT due to careful hyperparameter tuning and bigger datasets. The authors thought that the BERT is very under-trained and if given more data with hyperparameter tuning, its full potential of performance can be achieved. Let’s quickly get started and understand how the authors were able to achieve the performance bump over conventional BERT
Welcoming you to an article on BERT. Yes, you heard it right! What a journey we have had starting right from the basics all the way till BERT. Finally we are at the proficiency required to understand one of the highly capable models on a variety of NLP tasks like Text Classification, Question Answering, Named Entity Recognition with very little training. Bidirectional Encoder Representations from Transformers or BERT is a semi-supervised language model trained on huge corpus of data and then fine-tuned on custom data to achieve SOTA results. Without wasting much time let’s jump straight into the technicalities of BERT.