We all know that a photo taken with a normal smartphone is two-dimensional, but can we convert it into a three-dimensional photo? The answer is yes. The image below is a standard color photo taken with a smartphone, so, as discussed, it contains only a 2D representation of the world. When we look at it, our brain is able to reconstruct the 3D information from it. It is now possible for an AI to do the same, and even to go all the way to creating a 3D version of the photo.
In our previous articles, we looked at a few limitations of R-CNN and how SPP-net and Fast R-CNN addressed them to a great extent, bringing inference time down to ~2s per test image from the ~45-50s of R-CNN. Even after such a speedup, there are still flaws to fix and enhancements to make before the detector can run on a real-time 30fps or 60fps video feed. As we know from our previous blog, Fast R-CNN and SPP-net are still multi-stage pipelines that rely on the Selective Search algorithm to generate regions. This is a huge bottleneck for the entire system, because Selective Search takes a long time to generate its ~2000 region proposals. This problem was solved in Faster R-CNN - the widely used State-of-the-Art version in the R-CNN family of Object Detectors. We’ve seen the evolution of architectures in the R-CNN family, where the main improvements were in computational efficiency, accuracy, and test time per image. Let's dive into Faster R-CNN now!
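To put the timings quoted above in perspective, here is a minimal back-of-the-envelope sketch. It uses only the approximate figures from the text (~45-50s for R-CNN, ~2s for Fast R-CNN); the midpoint of 47.5s and the helper names `frame_budget_ms` and `times_over_budget` are assumptions made for illustration, not measurements.

```python
# Approximate per-image latencies quoted in the text.
RCNN_SECONDS = 47.5       # assumed midpoint of the ~45-50 s range
FAST_RCNN_SECONDS = 2.0   # ~2 s per test image


def frame_budget_ms(fps):
    """Time available to process one frame at a given frame rate, in ms."""
    return 1000.0 / fps


def times_over_budget(latency_seconds, fps):
    """How many frame intervals elapse while one image is processed."""
    return latency_seconds * fps


# Speedup of Fast R-CNN over R-CNN under the assumed midpoint.
speedup = RCNN_SECONDS / FAST_RCNN_SECONDS

if __name__ == "__main__":
    print(f"Speedup over R-CNN: about {speedup:.1f}x")
    for fps in (30, 60):
        budget = frame_budget_ms(fps)
        factor = times_over_budget(FAST_RCNN_SECONDS, fps)
        print(f"{fps} fps: budget {budget:.1f} ms per frame, "
              f"Fast R-CNN is about {factor:.0f}x too slow")
```

Under these assumptions, a ~2s detector falls short of a 30fps feed by roughly a factor of 60, which is why removing the region proposal bottleneck matters.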
In the previous post, we took an in-depth look at Region-based Convolutional Neural Networks (R-CNN), one of the fundamental architectures in the Two-Stage Object Detection pipeline. As the Deep Learning era ramped up around 2012, when AlexNet was published, the approach to Object Detection shifted from hand-built features such as Haar features and Histograms of Oriented Gradients to Neural Network-based approaches, mainly CNN-based architectures. Over time, the problem has been tackled with a 2-Stage approach, where the first stage generates Region Proposals and the second stage classifies each proposed region.
In our last post, we had a quick overview of Object Detection and the various approaches and methods used to tackle this problem in Computer Vision. Now it's time to dive deep into the popular methods of building a State-of-the-Art Object Detector. In particular, we shall focus on one of the earliest methods - the Region-Based Convolutional Neural Network family of Object Detectors. It is called R-CNN because the Object Detector was built by modifying a CNN architecture, either by restructuring it or by adding auxiliary networks or layers, albeit without achieving state-of-the-art performance. R-CNN is a good place to start in the Object Detection space.
Dynamic Sky Replacement and Harmonization in Videos. Through the power of neural network-based learning algorithms, it is now possible to perform video-to-video translation. For instance, in goes a daytime video, and out comes a nighttime version of the same footage. This work, ‘Castle in the Sky’, proposes a vision-based method for video sky …