Welcome back to the NLP Tutorials! Hope y’all had a good time reading my previous articles and were able to learn and make progress in your journey to NLP proficiency! In our previous post we looked at a project — Document Similarity using two vectorizers — CountVectorizer & Tf-Idf Vectorizer. I hope you tried your hand at Document Similarity with various other techniques and datasets. In this article we shall dive deep into the world of Text Embeddings, which are more advanced and sophisticated ways of representing text in vector form. There are many Word Embeddings out there, but in this article we shall have an overview of Word2Vec, one of the earliest and most famous Word Embeddings developed and published by Google. Let’s get started then!
Hello and welcome back to the NLP Tutorials Series! Today we will move forward on the Road to becoming proficient in NLP and delve into Text Representation and Word Embeddings. To put it in simple terms, Text Representation is a way to convert text in its natural form to vector form - Machines like it and understand it in this way only! The numbers/vectors form. This is the second step in an NLP pipeline after Text Pre-processing. Let’s get started with a sample corpus, pre-process and then keep ‘em ready for Text Representation.