This particular architecture has a lower memory requirement than the vanilla Transformer and, like Transformer-XL, models long sequences efficiently. The image below depicts how the memory is compressed. We can also draw a parallel to the human brain: we remember so well because we compress and store information very intelligently. This sure seems interesting, doesn't it?
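To make the compression idea concrete, here is a minimal NumPy sketch. It uses average pooling over groups of adjacent old hidden states as the compression function; the function name and the pooling choice are illustrative assumptions (the architecture in question learns its compression function rather than hard-coding pooling):

```python
import numpy as np

def compress_memory(old_memory, compression_rate=3):
    """Compress a block of old hidden states by average-pooling every
    `compression_rate` adjacent states into one compressed memory slot.
    (A simple stand-in for a learned compression function.)"""
    n, d = old_memory.shape
    n_kept = (n // compression_rate) * compression_rate  # drop any remainder
    groups = old_memory[:n_kept].reshape(-1, compression_rate, d)
    return groups.mean(axis=1)

# 12 old hidden states of dimension 4 shrink to 4 compressed memories
old = np.random.randn(12, 4)
compressed = compress_memory(old, compression_rate=3)
print(compressed.shape)  # (4, 4)
```

The payoff is that attention over the past now touches far fewer vectors: the model keeps a short window of raw memories and a longer, cheaper window of compressed ones.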
In this article, we will be discussing Longformer, which overcomes one of the famous pitfalls of transformers: their inability to process long sequences, because full self-attention scales quadratically with sequence length. The Longformer is a vanilla transformer with a modified attention mechanism that combines local windowed self-attention with task-specific global attention.
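A tiny NumPy sketch shows the shape of this attention pattern as a boolean mask: each token attends only to a local window of neighbours, while a few designated tokens (e.g. a `[CLS]`-style token) attend to, and are attended by, everything. Function and parameter names here are illustrative, not Longformer's actual API:

```python
import numpy as np

def longformer_mask(seq_len, window=2, global_idx=()):
    """Boolean attention mask: True where attention is allowed.
    Tokens attend to neighbours within `window` positions (local
    attention); tokens in `global_idx` get full global attention."""
    i = np.arange(seq_len)
    mask = np.abs(i[:, None] - i[None, :]) <= window  # local diagonal band
    for g in global_idx:                              # e.g. a [CLS] token
        mask[g, :] = True   # global token attends everywhere
        mask[:, g] = True   # everyone attends to the global token
    return mask

mask = longformer_mask(8, window=1, global_idx=(0,))
print(mask.sum())  # far fewer allowed pairs than the full 8 * 8 = 64
```

Because the band has fixed width, the number of allowed pairs grows linearly with sequence length instead of quadratically, which is exactly what makes long documents tractable.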
GPT-3 was a massive model of 175 billion parameters, far more than GPT-2, Google's T5, or Microsoft's Turing-NLG. The main objective of GPT-3 was to improve few-shot and zero-shot performance through a large training corpus and enormous compute. GPT-3 did not fail in achieving this objective and blew away all other language models in a plethora of language-modelling tasks. Let's dive deep into the world of GPT-3!
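"Few-shot" here means the model sees a handful of labelled demonstrations inside the prompt itself, with no gradient updates. A small sketch of building such a prompt (the `Review:`/`Sentiment:` template and the function name are our own illustrative choices; GPT-3 is queried with plain text, not a fixed schema):

```python
def few_shot_prompt(examples, query):
    """Format a few-shot prompt: labelled demonstrations in-context,
    followed by an unlabelled query the model must complete."""
    lines = [f"Review: {text}\nSentiment: {label}" for text, label in examples]
    lines.append(f"Review: {query}\nSentiment:")  # model fills in the label
    return "\n\n".join(lines)

prompt = few_shot_prompt(
    [("A delightful film.", "positive"), ("Dull and far too long.", "negative")],
    "An instant classic.")
print(prompt)
```

Zero-shot is the same idea with an empty examples list: only the task description and the query, and the model must generalise from pre-training alone.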
Hello and welcome back to the NLP Tutorials blog series! In this article we will understand the successor to the GPT model, i.e. GPT-2. GPT-2 was trained with a very simple objective: generate text and build coherent essays and paragraphs. GPT-2 is a huge model, with 1.5 billion parameters! It has roughly 10x the parameters and 10x the training data of GPT-1, making it a scaled-up version of GPT. GPT-2 was so good that the authors initially withheld the full trained model over concerns about misuse of the AI.
Welcome back to yet another interesting article in the NLP Tutorials series, wherein we are advancing our proficiency from beginner to expert in NLP. In this blog, we will be looking at an architecture that took the industry by storm. That's right, it's GPT (Generative Pre-Training)! GPT was published by OpenAI in 2018 and achieved incredible state-of-the-art performance on the majority of popular NLP tasks. GPT is a way of training language models and comes under the category of semi-supervised learning. This means it is first trained on unlabeled text data and then fine-tuned on supervised (labelled) data for the downstream NLP tasks. Let's dig deep and understand GPT in detail.
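The unsupervised stage boils down to one objective: predict the next token given everything to its left, using nothing but raw text. A toy count-based bigram model illustrates the idea (GPT itself uses a Transformer decoder trained by gradient descent, not counts; all names here are our own):

```python
from collections import Counter, defaultdict

def train_bigram_lm(corpus):
    """Count-based stand-in for the next-token objective:
    estimate P(w_t | w_{t-1}) from unlabeled text."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def next_token_prob(counts, prev, nxt):
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

corpus = ["the model predicts the next token",
          "the next token is predicted from context"]
counts = train_bigram_lm(corpus)
# 2 of the 3 continuations of "the" in the corpus are "next"
print(next_token_prob(counts, "the", "next"))
```

The supervised fine-tuning stage then reuses the pre-trained representations, swapping the next-token head for a small task-specific head trained on the labelled downstream data.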