The Transformer has been a breakthrough architecture that has fared excellently in both NLP and Computer Vision, and learning about architectures of this kind is always beneficial in the long run. I promise a hands-on exercise in our next article, in which we will pick a dataset, try out various architectures, and compare their performance. For now, let’s get on with Linformer.
Hello all, and welcome back to yet another interesting concept, one that has time and again proven to be among the best methods for solving major NLP problems, with State-of-the-Art accuracy that comes close to human performance! That architecture is known as the “Transformer”. Its most important gain was enabling parallelization, which wasn’t on offer in the previous model we saw, “Seq2Seq”. In this blog, we shall navigate through the Transformer architecture in detail and understand why it has been the breakthrough architecture of recent years.