Blog

Contents

Welcome to the Applied Singularity blog. Use this Contents post to browse through the full list of articles and Guided Learning Modules we have created or find specific topics of interest.

NLP Tutorials — Part 24: Named Entity Recognition

In this article we won’t be looking at high-end architectures; instead, we will explore a key concept in the NLP domain: Named Entity Recognition (NER). You may have heard of this term as one of the important concepts with many applications in real-world scenarios. Let’s understand what it means and how we can create a NER model using a few popular NLP libraries.

NLP Tutorials — Part 23: Fastformer: Additive Attention Can Be All You Need

Hello and welcome back to an article where we are going to discuss an architecture that received mixed reactions. Some called it brilliant and some said, “Nah, this ain’t no transformer!” The architecture I’m talking about is the Fastformer: Additive Attention Can Be All You Need. As we all know by now, Transformers become quite inefficient as they scale up, and we have seen a plethora of architectures that claim to mitigate this in their own ways.

NLP Tutorials — Part 21: Linformer: Self-attention with Linear Complexity 

The Transformer has been a breakthrough architecture that has fared excellently in both NLP and Computer Vision, and learning about these kinds of architectures is always beneficial in the long run. I promise a hands-on exercise in our next article wherein we will use various architectures, pick a dataset and observe the performance. For now, let’s get on with Linformer.

NLP Tutorials — Part 20: Compressive Transformer

This particular architecture has a lower memory requirement than the vanilla Transformer and is similar to the Transformer-XL, which models longer sequences efficiently. The below image depicts how the memory is compressed. We can also draw some parallels to the human brain: we have a brilliant memory because of the power of compressing and storing information very intelligently. This sure seems interesting, doesn’t it?

NLP Tutorials — Part 19: Longformer: Long Document Transformer

In this article, we will be discussing Longformer, which overcomes one of the famous pitfalls of transformers: the inability to process long sequences, because self-attention scales quadratically with the sequence length. The Longformer is a vanilla transformer with a change in the attention mechanism, which becomes a combination of local self-attention and global attention.
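
To make the "local plus global" idea concrete, here is a minimal sketch of an attention mask in plain Python. It is an illustration only, not the Longformer implementation: the function names, the window size and the choice of global positions are all assumptions made for this example.

```python
def longformer_style_mask(seq_len, window, global_positions):
    """Build a seq_len x seq_len mask; True means query i may attend to key j.

    Hypothetical helper for illustration: `window` is the total local window
    width and `global_positions` are token indices given global attention.
    """
    half = window // 2
    is_global = set(global_positions)
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        for j in range(seq_len):
            local = abs(i - j) <= half                 # sliding-window attention
            glob = i in is_global or j in is_global    # global tokens see and are seen by all
            mask[i][j] = local or glob
    return mask


def count_allowed(mask):
    """Number of (query, key) pairs that are actually computed."""
    return sum(sum(row) for row in mask)


mask = longformer_style_mask(seq_len=8, window=2, global_positions=[0])
print(count_allowed(mask), "of", 8 * 8, "pairs are attended")
```

For a fixed window and a fixed number of global tokens, the count of allowed pairs grows roughly linearly with the sequence length rather than quadratically, which is the point of the design.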

NLP Tutorials — Part 18: GPT-3

GPT-3 was a massive model of 175 billion parameters, far more than GPT-2, Google’s T5 and Microsoft’s Turing NLG model. The main objective of GPT-3 was to improve few-shot and zero-shot performance by scaling up both the training data and the number of parameters. GPT-3 achieved this objective and outperformed other language models on a plethora of language modelling tasks. Let’s dive deep into the world of GPT-3.

NLP Tutorials — Part 17: GPT-2

Hello and welcome back to the NLP Tutorials Blog series! In this article we will understand the model which is the successor to GPT, i.e. GPT-2. GPT-2 was trained with a very simple objective: generate text and build coherent essays and paragraphs. GPT-2 is a huge model with 1.5 billion parameters! It has more than 10x the parameters and 10x the training data of GPT-1, making it a scaled-up version of GPT. GPT-2 was so good that the authors initially did not release the original trained models due to concerns about misuse of the AI.

NLP Tutorials — Part 16: Generative Pre-Training

Welcome back to yet another interesting article in the NLP Tutorials series wherein we will be advancing our proficiency from beginner to expert in NLP. In this blog, we will be looking at an architecture which took the industry by storm. That’s right, it's GPT (Generative Pre-Training)! GPT was published by OpenAI in 2018 and achieved state-of-the-art performance in the majority of the popular NLP tasks. GPT is a way of training language models and comes under the category of semi-supervised learning. This means it is trained on unlabeled text data and then fine-tuned on supervised (labelled) data for the downstream NLP tasks. Let’s dig deep and understand GPT in detail.

NLP Tutorials — Part 15: DistilBERT

Hello and welcome back to the NLP tutorials series where we inspire you to go through the ranks of NLP expertise all the way to Expert. If you follow all our articles in this tutorial series, you will no doubt gain valuable technical knowledge in the NLP domain. In our previous articles, we had an in-depth look at BERT and one of its improvements (in accuracy). In this article we shall instead address a huge problem coming our way: the computational requirement for training these massive language models is getting out of hand.

NLP Tutorials — Part 14: RoBERTa

Hello and welcome back to yet another interesting article in the NLP tutorials series! We are here to explore a model which is an improvement over the massively famous NLP language model, BERT. The Robustly Optimized BERT Pretraining Approach, or RoBERTa, performs a good 15–20% better than BERT due to careful hyperparameter tuning and bigger datasets. The authors found that BERT was significantly under-trained and that, given more data and hyperparameter tuning, its full potential could be achieved. Let’s quickly get started and understand how the authors were able to achieve the performance bump over conventional BERT.

NLP Tutorials — Part 13: BERT

Welcome to an article on BERT. Yes, you heard it right! What a journey we have had, starting right from the basics all the way to BERT. Finally we are at the proficiency required to understand one of the most capable models on a variety of NLP tasks like Text Classification, Question Answering and Named Entity Recognition with very little training. Bidirectional Encoder Representations from Transformers, or BERT, is a semi-supervised language model trained on a huge corpus of data and then fine-tuned on custom data to achieve SOTA results. Without wasting much time let’s jump straight into the technicalities of BERT.

NLP Tutorials — Part 11: Transformers

Hello all and welcome back to yet another interesting concept, one which has time and again proven to be one of the best methods for solving major NLP problems with State-of-the-Art accuracy that is near human performance! That architecture is known as the “Transformer”. The important gain from Transformers was to enable parallelization, which wasn’t on offer in the previous model we saw, “Seq2Seq”. In this blog, we shall navigate through the Transformer architecture in detail and understand why it is the breakthrough architecture of recent years.

NLP Tutorials — Part 10: Encoder-Decoder RNNs

A warm welcome to another interesting article in the NLP Tutorials series. In this article we will try to understand an architecture which forms the base for advanced models like Attention, Transformers, GPT, and BERT. It is widely used in machine translation tasks. The encoder encodes the input into a fixed-length internal representation, which is then taken by the decoder to output words in another form/language. Nowadays, we are seeing multilingual tasks being performed using a single model, i.e. text translation from English to French, Spanish and German using a single model! Since the input and output are always text in the form of sequences, this architecture is popularly known as Seq2Seq.

NLP Tutorials — Part 9: Bi-LSTMs & GRUs

Hola! Welcome back to the follow-up article on LSTMs. In this article we shall discuss 2 more architectures which are very similar to LSTMs. They are Bi-LSTMs and GRUs (Gated Recurrent Units). As we saw in our previous article, the LSTM was able to solve most problems of vanilla RNNs and solve a few important NLP problems easily with good data. The Bi-LSTM and GRU can be treated as architectures which have evolved from LSTMs. The core idea will be the same with a few improvements here and there.

NLP Tutorials — Part 8: Long Short Term Memory (LSTMs)

Hi and welcome back to yet another article in the NLP Tutorials series. So far, we have covered a few important concepts, architectures and projects in the NLP domain, the latest being Recurrent Neural Networks (RNNs). Time to move a step ahead and understand a more advanced architecture that clearly outperforms RNNs. You heard it right: Long Short Term Memory (LSTM) networks. Without wasting time, let us first go through a few disadvantages of RNNs and what did not work for them, which in turn will set the context for understanding the LSTM and how it was able to solve the problems of RNNs.

NLP Tutorials-Part 7: Recurrent Neural Networks

Welcome back to another exciting article in the NLP Tutorials series. It's time to fully delve into Deep Learning for NLP! In NLP, it is very important that we remember things and retain the context very well. For example, as humans we learn progressively, word by word, sentence by sentence, the way you are doing now. We understand things by reading and thinking progressively and we need to have a memory to retain things and maintain the context for that particular task. 

NLP Tutorials — Part 6: Text Classification

Hello again, glad to welcome you back to this article on Text Classification in the NLP Tutorials series. In our previous posts we had a detailed overview of the fundamental text representations, CountVectorizer & Tf-Idf Vectorizer, and also the two most prominent Word Embeddings, Word2Vec & GloVe. In this article we will put our knowledge to the test: build a Text Classification model using all these techniques and analyse the results.

NLP Tutorials —  Part 5: GloVe

Hello and welcome back to the NLP Tutorials! In our previous article we had a discussion on one of the popular Word Embedding techniques, Word2Vec. It was a revolutionary word representation technique which changed the face of solving NLP problems. Although Word2Vec was good, it still has a few drawbacks, which the GloVe Word Embeddings set out to overcome. GloVe stands for Global Vectors. This embedding model is mainly based on capturing word co-occurrence statistics in a global context. Because it captures more data at a global (corpus) level, it is high-dimensional and memory intensive but gives excellent results in a majority of NLP tasks. Let’s quickly get into the details of GloVe embeddings.

NLP Tutorials — Part 4: Word2Vec Embedding

Welcome back to the NLP Tutorials! Hope y’all had a good time reading my previous articles and were able to learn and make progress in your journey to NLP proficiency! In our previous post we looked at a project — Document Similarity using two vectorizers — CountVectorizer & Tf-Idf Vectorizer. I hope you tried your hand at Document Similarity with various other techniques and datasets. In this article we shall dive deep into the world of Text Embeddings, which are more advanced and sophisticated ways of representing text in vector form. There are many Word Embeddings out there, but in this article we shall have an overview of Word2Vec, one of the earliest and most famous Word Embeddings developed and published by Google. Let’s get started then!

NLP Tutorials — Part 3: Document Similarity

Welcome back to the NLP Tutorials! In our previous posts we had a detailed look at Text Representation & Word Embeddings, which are ways to accurately convert the text into vector form. The corpus in vector form is easily stored, accessible and can be used further for solving the NLP problem at hand. In this article, we shall try our hand at a small NLP problem -  Document Similarity/Text Similarity. Without wasting much time, let’s quickly get started!
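
As a rough illustration of what Document Similarity means in practice, the sketch below turns each document into a bag-of-words count vector and compares two documents with cosine similarity. It uses only the Python standard library; the helper names and the whitespace tokenizer are assumptions for this example and are not the vectorizers used in the article.

```python
import math
from collections import Counter


def count_vector(text):
    """Hypothetical helper: lowercase, split on whitespace, count each word.

    A real pipeline would also strip punctuation and stop words.
    """
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


doc_a = count_vector("the cat sat on the mat")
doc_b = count_vector("the dog sat on the log")
print(round(cosine_similarity(doc_a, doc_b), 2))
```

Documents that share many words in similar proportions score close to 1, and documents with no words in common score 0.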

NLP Tutorials — Part 2: Text Representation & Word Embeddings

Hello and welcome back to the NLP Tutorials Series! Today we will move forward on the road to becoming proficient in NLP and delve into Text Representation and Word Embeddings. To put it in simple terms, Text Representation is a way to convert text in its natural form to vector form. Machines like and understand text only in this numeric/vector form! This is the second step in an NLP pipeline after Text Pre-processing. Let’s get started with a sample corpus, pre-process it and then keep it ready for Text Representation.

NLP Tutorials – Part 1: Beginner’s Guide to Text Pre-Processing

Natural Language Processing is a subdomain under Artificial Intelligence which deals with processing natural language data like text and speech. We can also term NLP as the "Art of extracting information from texts". Recently there has been a lot of activity in this field and amazing research coming out every day! But, the revolutionary research was the "Transformer" which opened up avenues to build massive Deep Learning models which can come very close to human-level tasks like Summarization and Question Answering. Then came the GPTs and BERTs which were massive models consisting of billions of computation parameters trained on very huge datasets and can be fine-tuned to a variety of NLP tasks and problem statements.

YOLOv4

Welcome back to another interesting read on the latest Advanced Object Detector architectures - the YOLOv4. YOLOv4 is the latest and one of the strongest state-of-the-art object detectors now in the industry. Without wasting much time let's get straight into the YOLOv4 and understand why and how it became the new state-of-the-art with an mAP of ~45% @ 65 fps, which is comfortably real-time with good accuracy.

YOLOv3

Hello and welcome back to another article in the Advanced Object Detection series! In our last post, we ventured out of the YOLO detectors a bit and touched on the RetinaNet architecture, which introduced a novel loss function called Focal Loss (& 𝛂-balanced Focal Loss) and solved the huge class-imbalance problem observed in single-stage object detectors. Now, let's come back to the YOLO object detectors, specifically the YOLOv3. The YOLOv3 had some minor updates on top of the YOLOv2 which made it better and stronger, but sadly not as fast. The authors traded speed for accuracy: more accurate but not as fast. It matched the accuracy of the SSD while running about 3x faster, at ~22 ms inference time, and higher-resolution inputs (416x416) push it to sub-30fps speeds. It even comes close to RetinaNet in accuracy but is way faster. Let’s dig deep and understand the improvements made to YOLOv2 and why it’s slower but more accurate.

RetinaNet

Welcome back to the Advanced Object Detection blog post series! In our previous posts, we had a thorough understanding of YOLOv1, YOLOv2, YOLO9000 & the SSD Multibox detector. All are State-of-the-Art detectors, each improving on the last. In this post, we shall talk about another one of them - RetinaNet. RetinaNet is quite different from the YOLOs & SSD in a few aspects, the main one being the loss function. RetinaNet employs a Focal Loss function that focuses less on easy negatives and more on hard samples, which addresses the class imbalance problem observed when training an object detector. The architecture uses an FPN (Feature Pyramid Network) with ResNet as the backbone CNN. It outperforms Faster R-CNN, and the paper won the Best Student Paper Award at ICCV (International Conference on Computer Vision) 2017.
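
To see how a loss can "focus less" on easy examples, here is a small sketch of the binary focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), next to plain cross-entropy. The function names are made up for this example, and the defaults γ = 2 and α = 0.25 are the commonly cited settings, used here as assumptions rather than as a recommendation.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one prediction p (probability of class 1)."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Alpha-balanced binary focal loss for one prediction.

    p_t is the probability assigned to the true class; the (1 - p_t) ** gamma
    factor shrinks the loss of well-classified (easy) examples.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy negative (background scored 0.05) versus a hard negative (scored 0.9).
for name, p in [("easy negative", 0.05), ("hard negative", 0.9)]:
    ratio = focal_loss(p, 0) / cross_entropy(p, 0)
    print(name, "keeps", round(ratio, 4), "of its cross-entropy loss")
```

The easy negative keeps only a tiny fraction of its cross-entropy loss, while the hard negative keeps most of it, so the many easy background boxes no longer dominate training.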

SSD: Single Shot MultiBox Detector

Hi, welcome to the Advanced Object Detection Blog Post series, which covers single-stage object detectors - notably the YOLO & SSD class of object detectors. In our previous posts, we had a detailed look at YOLO and YOLOv2 (and YOLO9000 too). Now, it’s time to introduce another state-of-the-art object detector - the Single Shot Detector. It has a few similarities to YOLO v2 - dividing the image into grid cells and using the anchor box approach for detection. YOLOv2 & SSD papers were published in 2016 in CVPR (International conference on Computer Vision & Pattern Recognition) & ECCV (European Conference on Computer Vision) conferences respectively. The SSD had a performance of 74.3% mAP @ 59 fps, which is better than both Faster R-CNN (73% mAP @ 7 fps) & YOLO v1 (63.4% mAP @ 45 fps). Let’s get into the details of SSD, its architecture and its loss function, and we will compare it with YOLO v1 in various places in this article. Let’s assume that YOLOv2 is not yet out, so that Faster R-CNN & YOLO v1 are the most recent state-of-the-art object detectors. Well, let’s dive into SSD for now.

YOLO9000

In our previous article, we had a detailed look at the YOLO architecture, why it is famous and performs so well - up to 45fps with an mAP of about 63%! Like any other architecture, it also had some flaws which needed to be solved to break the 45fps barrier and to improve on that mAP. Coming to the drawbacks of YOLO v1, it used to get the localization wrong when objects appeared in a different aspect ratio, and it failed to detect multiple small objects like a flock of birds. Let’s see how the YOLOv1 was improved to a better, faster, and stronger YOLO with an mAP of more than 78%, which is a huge improvement over YOLO v1 in terms of accuracy but with a slower speed of 40fps. At 67fps, however, YOLO v2 still reaches 76% mAP on PASCAL VOC 2007. And we will also understand why we had the title as YOLO9000!

YOLO: You Only Look Once

Hello, I’m back with an Advanced Object Detector article! We are now past the amateur stage if and only if you have read through all our previous articles on Object Detection, R-CNN, Fast R-CNN, SPPnet, Faster R-CNN & Mask R-CNN. Although Mask R-CNN is also a tad advanced, we covered it along with the rest of the R-CNN Family of Object Detectors. We are on the right track to mastering Object Detection after learning the building blocks of an Object Detector, and we have a fair intuition of how the various parts are put together into a fully functional Object Detector. We progressed through the R-CNN, Fast R-CNN & Faster R-CNN architectures and understood how each step of the evolution improved the accuracy and reduced the inference time taken per image.

Learning Temporal Coherence via Self-Supervision for GAN-based Video Generation (TecoGAN)

When learning-based algorithms were not nearly as good as they are today, this problem was mainly handled by handcrafted techniques, but they had their limits - after all if we don’t see something too well, how could we tell what’s there? And this is where new learning-based methods, especially TecoGAN, come into play. This is a hard enough problem for even a still image, yet this technique is able to do it really well even for videos.

Egocentric Videoconferencing

In this short article, we will look at the state of egocentric videoconferencing. Now, this doesn’t mean that only we get to speak during a meeting; it means that we are wearing a camera, which records footage like the Input (in the below video). The goal is to use a learning algorithm to synthesize a frontal view of us. You can see the recorded reference footage, which is the reality (Ground Truth), and the view synthesized by the algorithm (Predicted), which should match it as closely as possible. If we could pull that off, we could add a low-cost egocentric camera to smart glasses and it could pretend to see us from the front, which would be amazing for hands-free videoconferencing.

One-Shot 3D Photography

We all know that a photo taken with a normal smartphone is two-dimensional, but can we convert it into a three-dimensional photo? The answer is 'YES'. The below image is a standard color photo made with a smartphone and hence, as we discussed, it contains only a 2D representation of the world. When we look at it, our brain is able to reconstruct the 3D information from it. Now, it is possible for an AI to do the same, and even go all the way to create a 3D version of this photo.

A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

In today's world, everybody can make Deepfakes by recording a voice sample. So let's understand what this new method can do by example. Let’s watch the below short clip of a speech, and make sure to pay attention to the fact that the louder voice is the English translator. If you pay attention, you can hear the chancellor’s original voice in the background too. So what is the problem here? Honestly, there is no problem here, this is just the way the speech was recorded.

Deformable Neural Radiance Fields

In March 2020, a paper on Neural Radiance Fields (NeRF) appeared. With this technique, we could take a bunch of input photos and train a neural network to learn them, and then synthesize new, previously unseen views of not just the materials in the scene but the entire scene itself. We can learn and reproduce entire real-world scenes from only a few views by using neural networks. However, it had some limitations, such as trouble with scenes with variable lighting conditions and occlusions.

COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder

The research field of image translation with the aid of learning algorithms is improving at a fast speed. For example, this earlier technique would look at a large number of animal faces and could interpolate between them, or in other words, blend one kind of dog into another breed. It could even transform dogs into cats, or even further transform cats into cheetahs. The results were very good, but it would only work on the domains it was trained on i.e. it could only translate to and from species that it took the time to learn about.

IoT for indoor air quality monitoring (IAQ)

In the last blog post about indoor air quality monitoring we discussed the importance of monitoring indoor environments for indoor air pollutants and went into detail about the most commonly occurring indoor air pollutants. This post will shed light on how we can use IoT to solve the problem of monitoring indoor air quality in real time. The global indoor air quality monitoring market is expected to grow from USD 2.5 billion in 2015 to USD 4.6 billion by 2022[1]. There are many solutions for this problem available in the market today, from vendors like Honeywell as well as lesser-known startups.

Clustering in ML – Part 3: Density-Based Clustering

In the previous post, we learned about k-means which is easy to understand and implement in practice. That algorithm has no notion of outliers, so all points are assigned to a cluster even if they do not belong in any. In the domain of anomaly detection, this causes problems as anomalous points will be assigned to the same cluster as “normal” data points.
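
The difference can be sketched in a few lines of plain Python. The snippet below is a toy illustration under stated assumptions: `nearest_centroid` only mimics the assignment step of k-means with hand-picked centroids, `dbscan` is a compact, unoptimized version of the density-based idea, and the points and the `eps` and `min_pts` values are made up for this example.

```python
import math

NOISE = -1


def nearest_centroid(point, centroids):
    """k-means style assignment: every point goes to its closest centroid."""
    return min(range(len(centroids)), key=lambda k: math.dist(point, centroids[k]))


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id, or NOISE if it lies in a sparse region."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE            # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # core point: keep growing the cluster
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),          # dense group A
          (10, 10), (10, 11), (11, 10), (11, 11),  # dense group B
          (5, 30)]                                 # isolated point
centroids = [(0.5, 0.5), (10.5, 10.5)]

print("centroid assignment of outlier:", nearest_centroid(points[-1], centroids))
print("density-based labels:", dbscan(points, eps=1.5, min_pts=3))
```

The centroid-based assignment has to place the isolated point in one of the two clusters, whereas the density-based labelling can mark it as noise.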

The impact of environmental toxins on cellular and molecular pathways in Neural Crest development

At present, the world uses many environmental chemicals to improve economic profit in various sectors such as agriculture, aviation, pharmaceuticals, mining, and many polymer companies, resulting in a severe negative impact on human health and the environment. Over the last two decades, people have often been exposed to chemicals that lead to many disorders such as cancer, neurodegeneration, and congenital diseases.

Baby Spinach-based Minimal Modified Sensor (BSMS) for nucleic acid analysis

Baby Spinach-based Minimal Modified Sensor (BSMS) is used for detection of small molecules. It serves as a platform for detection of an array of biomolecules such as small molecules, nucleic acids, peptides, proteins, etc. The compact nature of the BSMS sensor provides flexibility to introduce many different changes in the design. BSMS is a highly sensitive sensor, detecting targets present in small amounts, even in the nanomolar (nM) range.

Object Detection – Part 6: Mask R-CNN

In our last blog post, we went through the Faster R-CNN architecture for Object Detection, which remains one of the State-of-the-Art architectures to date! The Faster R-CNN has a very low inference time per image of just ~0.2s (5 fps), which was a huge improvement from the ~45-50s per image of the R-CNN. So far, we have understood the evolution of R-CNN into Fast R-CNN and Faster R-CNN in terms of simplifying the architecture, reducing training and inference times and increasing the mAP (Mean Average Precision). This article is about taking a step further from Object Detection to Instance Segmentation. Instance Segmentation is the identification of the boundaries of the detected objects at the pixel level. It is a step further from Semantic Segmentation, which groups similar entities and gives them a common mask to differentiate them from other objects. Instance segmentation labels each object of the same class as a separate instance.

Object Detection – Part 5: Faster R-CNN

In our previous articles, we understood a few limitations of R-CNN and how SPP-net & Fast R-CNN solved these issues to a great extent, leading to an enormous decrease in inference time to ~2s per test image, which is an improvement over the ~45-50s of the R-CNN. But even after such a speedup, there are still some flaws, as well as enhancements that can be made, before deploying it on a real-time 30fps or 60fps video feed. As we know from our previous blog, the Fast R-CNN & SPP-net still use multi-stage training and rely on the Selective Search Algorithm for generating the regions. This is a huge bottleneck for the entire system because it takes plenty of time for the Selective Search Algorithm to generate ~2000 region proposals. This problem was solved in Faster R-CNN - the widely used State-of-the-Art version in the R-CNN family of Object Detectors. We’ve seen the evolution of architectures in the R-CNN family where the main improvements were computational efficiency, accuracy, and reduction of test time per image. Let's dive into Faster R-CNN now!

Discovery of novel molecular pathways linked to Insulin Resistance

Insulin resistance (IR) is a clinical and major pathological condition that occurs due to inappropriate cell response to insulin hormone and abnormal secretion in the body. The decrease in insulin sensitivity leads to the progression of many metabolic disorders such as auto-immune diseases, type-1 diabetes mellitus (T1DM), obesity, atherosclerosis, cardiovascular diseases, etc. In some cases …