Welcome to the Applied Singularity blog. Use this Contents post to browse through the full list of articles and Guided Learning Modules we have created or find specific topics of interest.

NLP Tutorials — Part 4: Word2Vec Embedding

Welcome back to the NLP Tutorials! Hope y’all had a good time reading my previous articles and were able to make progress on your journey to NLP proficiency! In our previous post we looked at a project, Document Similarity, using two vectorizers: CountVectorizer and Tf-Idf Vectorizer. I hope you tried your hand at Document Similarity with various other techniques and datasets. In this article we dive deep into the world of Text Embeddings, which are more advanced and sophisticated ways of representing text in vector form. There are many word embeddings out there, but in this article we shall have an overview of Word2Vec, one of the earliest and most famous, developed and published by Google. Let’s get started then!
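To make the idea concrete, here is a toy skip-gram-with-negative-sampling sketch in plain NumPy. This is an illustrative re-implementation of the training idea, not the gensim or Google API; the corpus, dimensions, and learning rate are all made up for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = "the quick brown fox jumps over the lazy dog the quick fox".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D, window, lr = len(vocab), 8, 2, 0.05

W_in = rng.normal(scale=0.1, size=(V, D))   # target (input) vectors
W_out = rng.normal(scale=0.1, size=(V, D))  # context (output) vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for epoch in range(200):
    for pos, word in enumerate(corpus):
        t = idx[word]
        for off in range(-window, window + 1):
            c_pos = pos + off
            if off == 0 or c_pos < 0 or c_pos >= len(corpus):
                continue
            c = idx[corpus[c_pos]]
            # positive pair: push target and true context together
            g = sigmoid(W_in[t] @ W_out[c]) - 1.0
            g_in, g_out = g * W_out[c], g * W_in[t]
            W_in[t] -= lr * g_in
            W_out[c] -= lr * g_out
            # one random negative sample: push target and noise word apart
            # (it may occasionally hit a true context word; fine for a sketch)
            n = rng.integers(V)
            gn = sigmoid(W_in[t] @ W_out[n])
            gn_in, gn_out = gn * W_out[n], gn * W_in[t]
            W_in[t] -= lr * gn_in
            W_out[n] -= lr * gn_out

def cos(a, b):
    """Cosine similarity between two learned word vectors."""
    a, b = W_in[idx[a]], W_in[idx[b]]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos("quick", "fox"))
```

Real Word2Vec adds subsampling of frequent words, a unigram noise distribution, and far larger corpora, but the two dot-product updates above are the heart of the algorithm.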

NLP Tutorials — Part 3: Document Similarity

Welcome back to the NLP Tutorials! In our previous posts we had a detailed look at Text Representation and Word Embeddings, which are ways to accurately convert text into vector form. A corpus in vector form is easy to store and access, and can be used further for solving the NLP problem at hand. In this article, we shall try our hand at a small NLP problem: Document Similarity (also called Text Similarity). Without wasting much time, let’s quickly get started!
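As a taste of what the project covers, here is a minimal sketch of document similarity using scikit-learn's TfidfVectorizer plus cosine similarity (the documents are invented for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "a cat was sitting on a mat",
    "stock markets rallied on strong earnings",
]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(docs)      # sparse matrix: (3 docs, vocab size)
sims = cosine_similarity(X)        # 3x3 pairwise similarity matrix

print(sims.round(2))
```

The two cat-and-mat documents score far higher with each other than either does with the finance sentence, which is exactly the behavior a document-similarity system exploits.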

NLP Tutorials — Part 2: Text Representation & Word Embeddings

Hello and welcome back to the NLP Tutorials series! Today we move forward on the road to NLP proficiency and delve into Text Representation and Word Embeddings. In simple terms, Text Representation is a way to convert text from its natural form into vector form, the numeric form that machines actually understand and work with. This is the second step in an NLP pipeline, after Text Pre-processing. Let’s get started with a sample corpus: pre-process it, then keep it ready for Text Representation.

NLP Tutorials – Part 1: Beginner’s Guide to Text Pre-Processing

Natural Language Processing is a subdomain of Artificial Intelligence that deals with processing natural language data like text and speech. We can also describe NLP as the "art of extracting information from text". Recently there has been a lot of activity in this field, with amazing research coming out every day! The truly revolutionary work was the "Transformer", which opened up avenues to build massive Deep Learning models that come very close to human-level performance on tasks like Summarization and Question Answering. Then came the GPTs and BERTs, massive models with billions of parameters, trained on huge datasets, that can be fine-tuned for a variety of NLP tasks and problem statements.


Welcome back to another interesting read on the latest Advanced Object Detector architectures: YOLOv4. YOLOv4 is one of the strongest state-of-the-art object detectors in the industry today. Without wasting much time, let's get straight into YOLOv4 and understand why and how it became the new state-of-the-art, with an mAP of ~45% at 65 fps, which is comfortably real-time with good accuracy.


Hello and welcome back to another article in the Advanced Object Detection series! In our last post, we ventured out of the YOLO detectors a bit and touched on the RetinaNet architecture, which introduced a novel loss function called Focal Loss (and its 𝛂-balanced variant) and solved the huge class-imbalance problem observed in single-stage object detectors. Now, let's come back to the YOLO object detectors, specifically YOLOv3. YOLOv3 made some minor updates on top of YOLOv2 which made it better and stronger, but sadly not as fast: the authors traded speed for accuracy. It matches the accuracy of SSD while being 3x faster, at ~22 ms inference time, and even at a higher input resolution (416x416) it stays comfortably real-time at over 30 fps. It even comes close to RetinaNet in accuracy while being way faster. Let’s dig deep and understand the improvements made over YOLOv2 and why it’s slower but more accurate.


Welcome back to the Advanced Object Detection blog post series! In our previous posts, we built a thorough understanding of YOLOv1, YOLOv2, YOLO9000 and the SSD MultiBox detector, all state-of-the-art detectors that outperform each other brilliantly. In this post, we shall talk about another one: RetinaNet. RetinaNet is quite different from the YOLOs and SSD in a few aspects, the main one being the loss function. RetinaNet employs a Focal Loss function that down-weights easy negatives and focuses training on hard samples, addressing the class-imbalance problem observed when training single-stage object detectors. The architecture uses an FPN (Feature Pyramid Network) with ResNet as the backbone CNN, outperforms Faster R-CNN, and won the Best Student Paper Award at ICCV (International Conference on Computer Vision) 2017.
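The down-weighting of easy examples is visible directly in the formula FL(p_t) = -𝛂_t (1 - p_t)^𝛄 log(p_t). A minimal NumPy sketch of the binary case (the probabilities below are made-up examples):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """alpha-balanced focal loss for binary classification.
    p: predicted probability of the positive class; y: label in {0, 1}."""
    p_t = np.where(y == 1, p, 1.0 - p)          # prob of the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of well-classified examples
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# An easy, confident negative contributes far less than a hard one:
easy = focal_loss(np.array([0.01]), np.array([0]))  # p_t = 0.99
hard = focal_loss(np.array([0.9]),  np.array([0]))  # p_t = 0.1
print(float(easy), float(hard))
```

With 𝛄 = 2, the easy negative's loss is scaled by (1 - 0.99)² = 0.0001, so the flood of easy background anchors no longer swamps the gradient.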

SSD: Single Shot MultiBox Detector

Hi, welcome to the Advanced Object Detection blog post series on single-stage object detectors, notably the YOLO and SSD class of detectors. In our previous posts, we had a detailed look at YOLO and YOLOv2 (and YOLO9000 too). Now, it’s time to introduce another state-of-the-art object detector: the Single Shot Detector. It has a few similarities to YOLOv2, dividing the image into grid cells and using the anchor-box approach for detection. The SSD paper was published at ECCV (European Conference on Computer Vision) 2016, and the YOLOv2 paper at CVPR (Conference on Computer Vision and Pattern Recognition) 2017. The SSD achieved 74.3% mAP at 59 fps, better than both Faster R-CNN (73% mAP at 7 fps) and YOLOv1 (63.4% mAP at 45 fps). Let’s get into the details of SSD, its architecture and loss function, comparing it with YOLOv1 at various points in this article. Let’s assume that YOLOv2 is not yet out, so our context throughout is Faster R-CNN and YOLOv1 as the most recent state-of-the-art object detectors. Well, let’s dive into SSD for now.


In our previous article, we had a detailed look at the YOLO architecture and why it is famous and performs so well: up to 45 fps with an mAP of more than 65%! Like any other architecture, it had some flaws which needed to be solved to break the 45 fps barrier and to better that 65% mAP. Coming to the drawbacks of YOLOv1, it used to get the localization wrong when objects appeared at unusual aspect ratios, and it failed to detect multiple small objects, like a flock of birds. Let’s see how YOLOv1 was improved into a better, faster, and stronger YOLO with an mAP of more than 78%, a huge improvement in accuracy, albeit at a slower 40 fps; at 67 fps, YOLOv2 still achieves 76% mAP trained on PASCAL VOC 2007. And we will also understand why the paper was titled YOLO9000!

YOLO: You Only Look Once

Hello, I’m back with an Advanced Object Detector article! We are now past the amateur stage, provided you have read through all our previous articles on Object Detection: R-CNN, Fast R-CNN, SPPnet, Faster R-CNN and Mask R-CNN. Although Mask R-CNN is also a tad advanced, we covered it as part of the R-CNN family of object detectors. We are on the right track to mastering Object Detection, having learned the building blocks of an object detector and gained a fair intuition for how the various parts are put together into a fully functional detector. We progressed through the R-CNN, Fast R-CNN and Faster R-CNN architectures and understood how each evolution improved accuracy and reduced the inference time per image.

Learning Temporal Coherence via Self-Supervision for GAN-based Video Generation (TecoGAN)

When learning-based algorithms were not nearly as good as they are today, this problem was mainly handled by handcrafted techniques, but they had their limits: after all, if we can’t see something well, how can we tell what’s there? This is where new learning-based methods, especially TecoGAN, come into play. This is a hard enough problem even for a still image, yet this technique does it really well even for videos.

Egocentric Videoconferencing

In this short article, we will look at the state of egocentric videoconferencing. Now, this doesn’t mean that only we get to speak during a meeting; it means we are wearing a camera, which sees us as shown in the Input in the video below. The goal is to use a learning algorithm to synthesize a frontal view of us: the recorded reference footage is reality (Ground Truth), and this is what the algorithm has to synthesize (Predicted). If we could pull that off, we could add a low-cost egocentric camera to smart glasses that pretends to see us from the front, which would be amazing for hands-free videoconferencing.

One-Shot 3D Photography

We all know that a photo taken with a normal smartphone is two-dimensional, but can we convert it into a three-dimensional photo? The answer is 'YES'. The image below is a standard color photo made with a smartphone and hence, as we discussed, it contains only a 2D representation of the world. When we look at it, our brain is able to reconstruct the 3D information from it. Now, it is possible for an AI to do the same, and even go all the way to create a 3D version of this photo.

A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

In today's world, anybody can make deepfakes from a recorded voice sample. So let's understand what this new method can do with an example. Watch the short clip of a speech below, and pay attention to the fact that the louder voice is the English translator; if you listen closely, you can hear the chancellor’s original voice in the background too. So what is the problem here? Honestly, there is no problem, this is just the way the speech was recorded.

Deformable Neural Radiance Fields

In March 2020, a paper named Neural Radiance Fields (NeRF) appeared. With this technique, we could take a bunch of input photos, train a neural network on them, and then synthesize new, previously unseen views of not just the materials in the scene but the entire scene itself. We can learn and reproduce entire real-world scenes from only a few views by using neural networks. However, it had some limitations, such as trouble with scenes with variable lighting conditions and occlusions.

COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder

The research field of image translation with the aid of learning algorithms is improving at a rapid pace. For example, an earlier technique could look at a large number of animal faces and interpolate between them, or in other words, blend one breed of dog into another. It could even transform dogs into cats, or further transform cats into cheetahs. The results were very good, but it would only work on the domains it was trained on, i.e. it could only translate to and from species that it had taken the time to learn about.

IoT for indoor air quality monitoring (IAQ)

In the last blog post about indoor air quality monitoring we discussed the importance of monitoring indoor environments for air pollutants, and went into detail about the most commonly occurring indoor air pollutants. This post will shed light on how we can use IoT to monitor indoor air quality in real time. The global indoor air quality monitoring market is expected to grow from USD 2.5 billion in 2015 to USD 4.6 billion by 2022[1]. There are many solutions in the market today for this problem, from vendors like Honeywell as well as lesser-known startups.

Clustering in ML – Part 3: Density-Based Clustering

In the previous post, we learned about k-means, which is easy to understand and implement in practice. That algorithm has no notion of outliers, so all points are assigned to a cluster even if they do not belong in any. In the domain of anomaly detection, this causes problems, as anomalous points will be assigned to the same cluster as “normal” data points.
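A density-based method like DBSCAN handles this directly: points in sparse regions are labeled as noise instead of being forced into a cluster. A minimal sketch with scikit-learn on synthetic data (the blob positions and eps/min_samples values are illustrative):

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
# two dense blobs plus one far-away outlier
blob1 = rng.normal(loc=0.0, scale=0.3, size=(30, 2))
blob2 = rng.normal(loc=5.0, scale=0.3, size=(30, 2))
outlier = np.array([[20.0, 20.0]])
X = np.vstack([blob1, blob2, outlier])

# a point needs >= min_samples neighbors within eps to seed a cluster
labels = DBSCAN(eps=1.0, min_samples=5).fit_predict(X)
print(labels[-1])   # the outlier gets label -1, i.e. noise
```

k-means with k=2 would have assigned that outlier to the nearest blob (and dragged its centroid toward it); DBSCAN simply marks it -1.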

The impact of environmental toxins on cellular and molecular pathways in Neural Crest development

At present, many environmental chemicals are used to improve economic profit in sectors such as agriculture, aviation, pharmaceuticals, mining, and polymer manufacturing, with severe negative impacts on human health and the environment. Over the last two decades, people have often been exposed to chemicals linked to many disorders such as cancer, neurodegeneration, and congenital diseases.

Baby Spinach-based Minimal Modified Sensor (BSMS) for nucleic acid analysis

The Baby Spinach-based Minimal Modified Sensor (BSMS) is used for the detection of small molecules. It serves as a platform for detecting an array of biomolecules such as small molecules, nucleic acids, peptides, and proteins. The compact nature of the BSMS provides flexibility to introduce many different changes in the design. BSMS is a highly sensitive sensor, detecting targets present in small amounts, even in the nanomolar (nM) range.

Object Detection – Part 6: Mask R-CNN

In our last blog post, we went through the Faster R-CNN architecture for Object Detection, which remains one of the state-of-the-art architectures to date! Faster R-CNN has a very low inference time of just ~0.2 s per image (5 fps), a huge improvement over the ~45-50 s per image of R-CNN. So far, we have followed the evolution of R-CNN into Fast R-CNN and Faster R-CNN in terms of simplifying the architecture, reducing training and inference times, and increasing the mAP (Mean Average Precision). This article takes a step further, from Object Detection to Instance Segmentation. Instance Segmentation identifies the boundaries of detected objects at the pixel level. It is a step beyond Semantic Segmentation, which groups similar entities under a common mask to differentiate them from other objects; instance segmentation labels each object of the same class as a separate instance.

Object Detection – Part 5: Faster R-CNN

In our previous articles, we understood a few limitations of R-CNN and how SPP-net and Fast R-CNN solved the issues to a great extent, leading to an enormous decrease in inference time to ~2 s per test image, an improvement over the ~45-50 s of R-CNN. But even after such a speedup, there are still flaws, as well as enhancements to be made, before deploying it on a real-time 30 fps or 60 fps video feed. As we know from our previous blog, Fast R-CNN and SPP-net still require multi-stage training and involve the Selective Search algorithm for generating regions. This is a huge bottleneck for the entire system because the Selective Search algorithm takes plenty of time to generate ~2000 region proposals. This problem was solved in Faster R-CNN, the widely used state-of-the-art version in the R-CNN family of object detectors. We’ve seen the evolution of architectures in the R-CNN family where the main improvements were computational efficiency, accuracy, and reduction of test time per image. Let's dive into Faster R-CNN now!

Interactive Video Stylization Using Few-Shot Patch-Based Training

Style transfer is an interesting problem in machine learning research where we have two input images, one for content and another for style, and the output is our content image re-imagined with this new style. The content can be a photo straight from our camera, and the style can be a painting, which leads to …

Ensemble Learning in ML – Part 3: Stacking

In the previous posts, we discussed Bagging and Boosting ensemble learning in ML and how they are useful. We also discussed the algorithms based on them, i.e. AdaBoost and Gradient Boosting. In this part, we will discuss another ensemble learning technique known as Stacking. We also discuss a bit about Blending (another …
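As a quick preview of the idea, here is a minimal stacking sketch with scikit-learn's StackingClassifier: base learners produce out-of-fold predictions, and a meta-learner (here logistic regression) learns to combine them. The dataset and model choices are illustrative only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

stack = StackingClassifier(
    # diverse base learners, each fit via internal cross-validation
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("dt", DecisionTreeClassifier(random_state=0))],
    # meta-learner trained on the base learners' out-of-fold predictions
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_tr, y_tr)
acc = stack.score(X_te, y_te)
print(acc)
```

The cross-validated out-of-fold predictions are what distinguish stacking from blending, which instead uses a single held-out split to train the meta-learner.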

Ensemble Learning in ML – Part 2: Boosting

In the last part, we discussed what ensemble learning in ML is and how it is useful. We also discussed one ensemble learning technique, Bagging, and the algorithms based on it, i.e. the Bagging meta-estimator and Random Forest. In this part, we will discuss another ensemble learning technique, which is known as Boosting, …

Ensemble Learning in ML – Part 1: Bagging

Let’s understand ensemble learning with an example. Suppose you have a startup idea and you want to know whether it is good enough to move ahead with. You want to take preliminary feedback on it before committing money and your precious time. So you may ask one of your …

Workflow and Implications of membrane lipids

Membrane lipids play diverse roles in cellular function. On the membrane, they act as structural elements, serve as a pool of secondary messengers, and act as a platform for membrane proteins. Phosphatidylinositol (PI) plays an important role in signal transduction and membrane trafficking. The PI on the plasma membrane is phosphorylated to PIP and PIP2 …

Discovery of novel molecular pathways linked to Insulin Resistance

Insulin resistance (IR) is a major clinical and pathological condition that occurs when cells respond inappropriately to the hormone insulin or when its secretion is abnormal. The decrease in insulin sensitivity leads to the progression of many metabolic disorders such as auto-immune diseases, type-1 diabetes mellitus (T1DM), obesity, atherosclerosis, cardiovascular diseases, etc. In some cases …

PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

Today, a variety of techniques exist that can take an image containing humans and perform pose estimation on it. This gives us skeletons that show the current posture of the subjects in the given images. Having a skeleton opens up the possibility of many cool applications; for instance, it’s great for …

CLEVRER: CoLlision Events for Video REpresentation and Reasoning

I recently came across the paper "CLEVRER" ("CoLlision Events for Video REpresentation and Reasoning", by Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, and Joshua B. Tenenbaum). It intrigued me, so I wanted to share some thoughts about it. With the advancements in NN-based learning algorithms, many of us are wondering …