Blog

Contents

Welcome to the Applied Singularity blog. Use this Contents post to browse the full list of articles and Guided Learning Modules we have created, or to find specific topics of interest.

One-Shot 3D Photography

We all know that a photo taken with a normal smartphone is two-dimensional, but can we convert it into a three-dimensional photo? The answer is 'yes'. The image below is a standard color photo taken with a smartphone, so, as we discussed, it contains only a 2D representation of the world. When we look at it, our brain is able to reconstruct the 3D information from it. It is now possible for an AI to do the same, and even go all the way to create a 3D version of this photo.

A Lip Sync Expert Is All You Need for Speech to Lip Generation In The Wild

In today's world, anybody can make a Deepfake from a recorded voice sample. So let's understand what this new method can do through an example. Watch the short clip of a speech below, and pay attention to the fact that the louder voice is the English translator. If you listen carefully, you can hear the chancellor’s original voice in the background too. So what is the problem here? Honestly, there is no problem; this is just the way the speech was recorded.

Deformable Neural Radiance Fields

In March 2020, a paper named Neural Radiance Fields (NeRF) appeared. With this technique, we could take a bunch of input photos, train a neural network on them, and then synthesize new, previously unseen views of not just the materials in the scene but the entire scene itself. In other words, neural networks can learn and reproduce entire real-world scenes from only a few views. However, the technique had some limitations, such as trouble with scenes that have variable lighting conditions and occlusions.

COCO-FUNIT: Few-Shot Unsupervised Image Translation with a Content Conditioned Style Encoder

The research field of image translation with the aid of learning algorithms is advancing rapidly. For example, an earlier technique could look at a large number of animal faces and interpolate between them, or in other words, blend one breed of dog into another. It could even transform dogs into cats, or go further and transform cats into cheetahs. The results were very good, but it would only work on the domains it was trained on, i.e., it could only translate to and from species that it had taken the time to learn about.

IoT for indoor air quality monitoring (IAQ)

In the last blog post about indoor air quality monitoring, we discussed the importance of monitoring indoor environments for air pollutants and went into detail about the most commonly occurring ones. This post will shed light on how we can use IoT to monitor indoor air quality in real time. The global indoor air quality monitoring market is expected to grow from USD 2.5 billion in 2015 to USD 4.6 billion by 2022[1]. Many solutions to this problem are available in the market today, from established vendors like Honeywell to lesser-known startups.

Clustering in ML – Part 3: Density-Based Clustering

In the previous post, we learned about k-means, which is easy to understand and implement in practice. That algorithm has no notion of outliers, so all points are assigned to a cluster even if they do not belong in any. In the domain of anomaly detection, this causes problems, as anomalous points will be assigned to the same cluster as “normal” data points.
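To make the contrast concrete, here is a minimal, illustrative sketch of a DBSCAN-style density-based clusterer in plain Python. This is not the post's code, and the point set and the `eps` and `min_pts` values are made up for the demo; the key behavior is that points in sparse regions come out labeled -1 (noise), which k-means cannot do:

```python
import math

def dbscan(points, eps=1.0, min_pts=3):
    """Minimal DBSCAN: returns one cluster label per point; -1 marks noise."""
    labels = [None] * len(points)
    cluster = -1

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:          # not a core point
            labels[i] = -1               # tentatively noise
            continue
        cluster += 1                     # start a new cluster from this core point
        labels[i] = cluster
        seeds = list(nbrs)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:          # noise reachable from a core point -> border
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:   # j is also a core point: keep expanding
                seeds.extend(j_nbrs)
    return labels

# Two dense blobs plus one far-away point: k-means would force the outlier
# into a cluster; DBSCAN labels it -1 (noise).
pts = [(0, 0), (0.2, 0.1), (0.1, 0.3), (5, 5), (5.1, 4.9), (4.9, 5.2), (20, 20)]
print(dbscan(pts, eps=1.0, min_pts=3))  # -> [0, 0, 0, 1, 1, 1, -1]
```

The outlier at (20, 20) has too few neighbors to be a core point and is never reached from one, so it stays noise.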

The impact of environmental toxins on cellular and molecular pathways in Neural Crest development

At present, environmental chemicals are used extensively to improve economic profit in sectors such as agriculture, aviation, pharmaceuticals, mining, and polymer manufacturing, with severe negative impacts on human health and the environment. Over the last two decades, frequent exposure to these chemicals has been linked to many disorders such as cancer, neurodegeneration, and congenital diseases.

Baby Spinach-based Minimal Modified Sensor (BSMS) for nucleic acid analysis

The Baby Spinach-based Minimal Modified Sensor (BSMS) serves as a platform for detecting an array of biomolecules: small molecules, nucleic acids, peptides, proteins, etc. The compact nature of the sensor provides the flexibility to introduce many different changes in the design. BSMS is also highly sensitive, detecting targets present in small amounts, even in the nanomolar (nM) range.

Object Detection – Part 6: Mask R-CNN

In our last blog post, we went through the Faster R-CNN architecture for Object Detection, which remains one of the State-of-the-Art architectures to date! Faster R-CNN has a very low inference time of just ~0.2s per image (5 fps), a huge improvement over the ~45-50s per image of R-CNN. So far, we have understood the evolution of R-CNN into Fast R-CNN and Faster R-CNN in terms of simplifying the architecture, reducing training and inference times, and increasing the mAP (Mean Average Precision). This article takes a step further, from Object Detection to Instance Segmentation. Instance Segmentation is the identification of the boundaries of detected objects at the pixel level. It goes a step beyond Semantic Segmentation, which groups similar entities under a common mask to differentiate them from other objects; instance segmentation labels each object of the same class as a separate instance.
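The semantic-vs-instance distinction can be illustrated with a toy example. The sketch below is plain Python with an invented mask; a 4-connected flood fill stands in for what Mask R-CNN's learned mask head achieves. It splits a single-class semantic mask (everything labeled "dog") into separate per-object instance ids:

```python
# A toy "semantic" mask: 1 = pixel belongs to class "dog", 0 = background.
# Two separate dogs share the same semantic label.
semantic = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]

def instance_masks(mask):
    """Split a single-class semantic mask into per-instance ids using
    4-connected flood fill; 0 stays background, instances get 1, 2, ..."""
    h, w = len(mask), len(mask[0])
    inst = [[0] * w for _ in range(h)]
    next_id = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and inst[r][c] == 0:
                next_id += 1                 # found a new, unvisited object
                stack = [(r, c)]
                while stack:
                    y, x = stack.pop()
                    if 0 <= y < h and 0 <= x < w and mask[y][x] and inst[y][x] == 0:
                        inst[y][x] = next_id
                        stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]
    return inst

print(instance_masks(semantic))
# -> [[1, 1, 0, 0, 2], [1, 1, 0, 0, 2], [0, 0, 0, 0, 0]]
```

Semantic segmentation would stop at the input mask (one blob of "dog" pixels); instance segmentation tells the two dogs apart as instance 1 and instance 2.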

Object Detection – Part 5: Faster R-CNN

In our previous articles, we understood a few limitations of R-CNN and how SPP-net & Fast R-CNN solved them to a great extent, leading to an enormous decrease in inference time to ~2s per test image, an improvement over the ~45-50s of R-CNN. But even after such a speedup, there are still some flaws, as well as enhancements that can be made, before deploying it on a real-time 30 fps or 60 fps video feed. As we know from our previous blog, Fast R-CNN & SPP-net still require multi-stage training and rely on the Selective Search Algorithm for generating regions. This is a huge bottleneck for the entire system, because Selective Search takes plenty of time to generate ~2000 region proposals. This problem was solved in Faster R-CNN, the widely used State-of-the-Art version in the R-CNN family of Object Detectors. We’ve seen the evolution of architectures in the R-CNN family, where the main improvements were computational efficiency, accuracy, and reduction of test time per image. Let's dive into Faster R-CNN now!

Object Detection – Part 4: Spatial Pyramid Pooling in Deep Convolution Networks (SPPnet)

Alongside R-CNN and Fast R-CNN, covered in our recent blog posts, there was one more famous architecture for Image Classification, Object Detection & Localization. It was the first runner-up in Object Detection, second runner-up in Image Classification, and fifth place in the Localization task at ILSVRC 2014! This feat makes it one of the major architectures to study on the subject of Object Detection and Image Classification. The architecture is SPPnet - the Spatial Pyramid Pooling network. In this article, we shall delve into SPPnet from an Object Detection perspective only.
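The core idea of spatial pyramid pooling can be sketched in a few lines. This is a simplified single-channel, max-pooling illustration with made-up feature maps, not the paper's implementation: each pyramid level divides the feature map into an n x n grid and keeps the maximum of each cell, so any input size yields the same fixed-length output vector.

```python
def spatial_pyramid_pool(fmap, levels=(1, 2)):
    """Max-pool a single-channel feature map of ANY size into a fixed-length
    vector: level n splits the map into an n x n grid and keeps each cell's
    max. Output length = sum(n*n over levels), regardless of input size."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                # Cell bounds; max(...) guards against empty cells on tiny maps.
                r0, r1 = i * h // n, max((i + 1) * h // n, i * h // n + 1)
                c0, c1 = j * w // n, max((j + 1) * w // n, j * w // n + 1)
                out.append(max(fmap[r][c] for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out

# Two different input sizes -> same output length (1*1 + 2*2 = 5 values),
# which is what lets a fixed-size fully connected layer follow any input.
a = [[1, 2], [3, 4]]
b = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
print(spatial_pyramid_pool(a))       # -> [4, 1, 2, 3, 4]
print(len(spatial_pyramid_pool(b)))  # -> 5
```

This fixed-length property is why SPPnet can classify regions of arbitrary size without warping or cropping them first.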

Object Detection – Part 3: Fast R-CNN

In the previous post, we had an in-depth overview of the Region-based Convolutional Neural Network (R-CNN), one of the fundamental architectures in the two-stage Object Detection pipeline approach. During the ramp-up of the Deep Learning era around 2012, when AlexNet was published, the approach to solving the Object Detection problem changed from hand-crafted features like Haar features and Histograms of Oriented Gradients to neural-network-based approaches, mainly CNN-based architectures. Over time, the problem has been solved via a two-stage approach, where the first stage generates region proposals and the second stage classifies each proposed region.

Object Detection – Part 2: Region Based CNN (R-CNN)

In our last post, we had a quick overview of Object Detection and the various approaches and methods used to tackle this problem in Computer Vision. Now, it's time to dive deep into the popular methods of building a State-of-the-Art Object Detector. In particular, we shall focus on one of the earliest methods - the Region-Based Convolutional Neural Network family of Object Detectors. It is called R-CNN because the Object Detector was built by modifying a CNN architecture, restructuring it and adding auxiliary networks or layers, albeit without achieving state-of-the-art performance. R-CNN is the best way to start in the Object Detection space.

Object Detection – Part 1: Introduction

Object Detection is one of the most sought-after sub-disciplines of Computer Vision. The fact that it is extensively utilized in major real-world applications has made it extremely important. When humans perceive, we draw on an innate cognitive intelligence, trained daily, to recognize and understand what we see through our eyes. Object Detection is one of the advanced ways a computer tries to match this power to perceive and understand its surroundings, with Image Classification and Localization as the primary steps. Each object has its own set of varying characteristics that are challenging for a Deep Learning model or architecture, and building an efficient and accurate Object Detector is a different ball game altogether. Let's take a quick tour of the extensions and key concepts under Computer Vision before diving deep into Object Detection.

NF-Nets: Normalizer Free Nets

The beginning of the downfall for Batch Normalization? DeepMind released a new family of state-of-the-art networks for Image Classification that has surpassed the previous best - EfficientNet - by quite a margin. The main idea behind the new architecture is the use of Normalizer Free Neural Nets or NF-Nets to train networks instead of batch …

Castle in the Sky

Dynamic Sky Replacement and Harmonization in Videos. Through the power of neural network-based learning algorithms, it is possible today to perform video-to-video translation. For instance, in goes a daytime video, and out comes a nighttime version of the same footage. This work, ‘Castle in the Sky’, proposes a vision-based method for video sky …

Interactive Video Stylization Using Few-Shot Patch-Based Training

Style transfer is an interesting problem in machine learning research where we have two input images, one for content and another for style, and the output is our content image re-imagined with this new style. The content can be a photo straight from our camera, and the style can be a painting, which leads to …

Ensemble Learning in ML – Part 3: Stacking

In the previous posts, we discussed Bagging and Boosting ensemble learning in ML and how they are useful. We also discussed the algorithms based on them, i.e., AdaBoost and Gradient Boosting. In this part, we will discuss another ensemble learning technique known as Stacking. We also discuss a bit about Blending (another …

Ensemble Learning in ML – Part 2: Boosting

In the last part, we discussed what ensemble learning in ML is and how it is useful. We also discussed one ensemble learning technique - Bagging - and the algorithms based on it, i.e., the Bagging meta-estimator and Random Forest. In this part, we will discuss another ensemble learning technique, which is known as Boosting, …

Ensemble Learning in ML – Part 1: Bagging

Let’s understand ensemble learning with an example. Suppose you have a startup idea and you want to know whether it is good enough to move ahead with. You want to take preliminary feedback on it before committing money and your precious time. So you may ask one of your …

Workflow and Implications of membrane lipids

Membrane lipids play diverse roles in cellular function. On the membrane, they act as structural elements, serve as a pool of secondary messengers, and act as a platform for membrane proteins. Phosphatidylinositol (PI) plays an important role in signal transduction and membrane trafficking. The PI on the plasma membrane is phosphorylated to P1 and P2 …

Discovery of novel molecular pathways linked to Insulin Resistance

Insulin resistance (IR) is a major clinical and pathological condition that occurs due to an inappropriate cellular response to the hormone insulin and its abnormal secretion in the body. The decrease in insulin sensitivity leads to the progression of many metabolic disorders such as auto-immune diseases, type-1 diabetes mellitus (T1DM), obesity, atherosclerosis, cardiovascular diseases, etc. In some cases …

PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

Today, a variety of techniques exist that can take an image containing humans and perform pose estimation on it. This gives us interesting skeletons that show the current posture of the subjects in the given images. Having a skeleton opens up the possibility of many cool applications; for instance, it’s great for …

CLEVRER: CoLlision Events for Video REpresentation and Reasoning

I recently came across the paper "CLEVRER: CoLlision Events for Video REpresentation and Reasoning" by Kexin Yi, Chuang Gan, Yunzhu Li, Pushmeet Kohli, Jiajun Wu, Antonio Torralba, and Joshua B. Tenenbaum. It intrigued me, so I wanted to share some thoughts about it. With the advancements in NN-based learning algorithms, many of us are wondering …