Hello again, glad to welcome you back to this article on Text Classification in the NLP Tutorials series. In our previous posts we had a detailed overview on the fundamental text representation — CountVectorizer & Tf-Idf Vectorizer and also the two most prominent Word Embeddings — Word2Vec & GloVe. In this article we will put our knowledge to task — Build a Text Classification model using all these techniques and analyse the results.
Hello and welcome back to the NLP Tutorials! In our previous article we had a discussion on one of the popular Word Embedding technique — Word2Vec. It was a revolutionary word representation technique which changed the face of solving NLP problems. Although Word2Vec was good, it still has a few drawbacks which were strongly overcome by the GloVe Word Embeddings. GloVe stands for Global Vectors. This embedding model is mainly based on capturing vector statistics in global context. Due to capturing more data on a global level (document), it is high-dimensional and memory intensive but gives excellent results in a majority of NLP tasks. Let’s quickly get into the details of GloVe embeddings.
Hello and welcome back to the NLP Tutorials Series! Today we will move forward on the Road to becoming proficient in NLP and delve into Text Representation and Word Embeddings. To put it in simple terms, Text Representation is a way to convert text in its natural form to vector form - Machines like it and understand it in this way only! The numbers/vectors form. This is the second step in an NLP pipeline after Text Pre-processing. Let’s get started with a sample corpus, pre-process and then keep ‘em ready for Text Representation.