Welcome back to another article in our NLP Tutorials series. In this article we will talk about Transformer-XL, which outperformed the vanilla Transformer ("Attention Is All You Need") both on accuracy metrics and in handling the long-term context dependencies that we often see in real-world tasks.