Welcome to the Applied Singularity blog. Use this Contents post to browse through the full list of articles and Guided Learning Modules we have created or find specific topics of interest.
This particular architecture has a lower memory requirement than Vanilla Transformer and is similar to the Transformer-XL that models longer sequences efficiently. The below image depicts how the memory is compressed. We can also say that this is drawing some parallels to the human brain — We have a brilliant memory because of the power of compressing and storing information very intelligently. This sure seems interesting, doesn’t it?
In this article, we will be discussing Longformer, which overcomes one of the famous pitfalls of transformers — the inability to process long sequences because of its quadratic scaling with increase in the sequence length. The Longformer is a vanilla transformer with a change in the attention mechanism, which is a combination of local self-attention and a global attention.
GPT-3 was a massive model of 175 billion parameters, way more than GPT-2, Google’s T5 and Microsoft’s Turing NLG model. The main objective of GPT-3 was to improve the few-shot and zero-shot tasks with a large training data and computational parameters. The GPT-3 did not fail in achieving this objective and blew away all other language models in a plethora of language modelling tasks. Let’s dive deep into the world of GPT-3
Hello and welcome back to the NLP Tutorials Blog series! In this article we will understand the model which is a successor to the GPT model i.e GPT-2. GPT-2 was trained with a very simple objective: generate text and build coherent essays and paragraphs. GPT-2 is a huge model — 1.5 billion parameters! GPT-2 has more than 10x times parameters and 10x times training data than GPT-1 making it a scaled up version of GPT. GPT-2 was so good that the authors did not release the original trained models due to concerns about misuse of the AI.