YOLOv4

Welcome back to another interesting read on the latest Advanced Object Detector architectures - YOLOv4. YOLOv4 is the latest and one of the strongest state-of-the-art object detectors in the industry. Without wasting much time, let's get straight into YOLOv4 and understand why and how it became the new state of the art, with an mAP of ~45% @ 65 fps: real-time speed combined with strong accuracy.

YOLOv3

Hello and welcome back to another article in the Advanced Object Detection series! In our last post, we ventured out of the YOLO detectors a bit and touched on the RetinaNet architecture, which introduced a novel loss function called Focal Loss (and α-balanced Focal Loss) to address the severe class-imbalance problem observed in single-stage object detectors. Now, let's come back to the YOLO object detectors, specifically YOLOv3. YOLOv3 made some minor updates on top of YOLOv2 that made it better and stronger, but sadly not as fast: the authors traded speed for accuracy. It matches the accuracy of SSD while running 3x faster, at ~22 ms inference time, and with larger input images (416x416) inference time stays just under 30 ms. It even comes close to RetinaNet in accuracy but is much faster. Let's dig deep and understand the improvements made to YOLOv2 and why YOLOv3 is slower but more accurate.

RetinaNet

Welcome back to the Advanced Object Detection blog post series! In our previous posts, we built a thorough understanding of YOLOv1, YOLOv2, YOLO9000 & the SSD MultiBox detector. All are state-of-the-art detectors that keep outperforming one another. In this post, we shall talk about another one of them - RetinaNet. RetinaNet differs from the YOLOs & SSD in a few aspects, the main one being the loss function. RetinaNet employs a Focal Loss function that focuses less on easy negatives and more on hard samples, which addresses the class imbalance problem observed when training single-stage object detectors. The architecture uses an FPN (Feature Pyramid Network) with ResNet as the backbone CNN and outperforms Faster R-CNN. The paper won the Best Student Paper Award at ICCV (International Conference on Computer Vision) 2017.
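To make the "focuses less on easy negatives" idea concrete, here is a minimal sketch of the α-balanced Focal Loss for a single binary prediction, written in plain Python. The function names are ours, and the defaults gamma=2 and alpha=0.25 are assumed from the values commonly quoted for the RetinaNet paper; this is an illustration under those assumptions, not the implementation used in any particular library.

```python
import math


def cross_entropy(p, y):
    """Standard binary cross-entropy for one prediction.

    p: predicted probability of the positive (object) class, in (0, 1).
    y: ground-truth label, 1 for object and 0 for background.
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Alpha-balanced focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    gamma and alpha defaults are assumptions for illustration.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)**gamma factor shrinks the loss of well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy negative: background predicted as background with high confidence.
easy_negative = focal_loss(0.01, 0)
# A hard negative: background wrongly predicted as an object.
hard_negative = focal_loss(0.90, 0)

print(f"easy negative loss: {easy_negative:.6f}")
print(f"hard negative loss: {hard_negative:.6f}")
```

The easy negative contributes a loss several orders of magnitude smaller than the hard one, so the thousands of easy background anchors no longer dominate the total. Setting gamma to 0 removes the modulating factor and leaves α-weighted cross-entropy.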

SSD: Single Shot MultiBox Detector

Hi, welcome to the Advanced Object Detection blog post series, which covers single-stage object detectors - notably the YOLO & SSD class of object detectors. In our previous posts, we had a detailed look at YOLO and YOLOv2 (and YOLO9000 too). Now, it's time to introduce another state-of-the-art object detector - the Single Shot Detector. It has a few similarities to YOLO v2 - dividing the image into grid cells and using the anchor box approach for detection. The SSD paper was published at ECCV (European Conference on Computer Vision) 2016, while the YOLOv2 paper appeared as a preprint in late 2016 and was published at CVPR (Conference on Computer Vision and Pattern Recognition) 2017. SSD achieved 74.3% mAP @ 59 fps, which is better than both Faster R-CNN (73% mAP @ 7 fps) & YOLO v1 (63.4% mAP @ 45 fps). Let's get into the details of SSD, its architecture and loss function, and compare it with YOLO v1 at various places in this article. Let's assume that YOLOv2 is not yet out, so that Faster R-CNN & YOLO v1 are the most recent state-of-the-art object detectors throughout. Well, let's dive into SSD for now.

YOLO9000

In our previous article, we had a detailed look at the YOLO architecture and at why it is famous and performs so well - up to 45 fps with an mAP of 63.4%! Like any other architecture, it also had some flaws that needed to be solved to break the 45 fps barrier and to improve on that mAP. Coming to the drawbacks of YOLO v1: it tended to get the localization wrong when objects appeared in an unusual aspect ratio, and it failed to detect multiple small objects such as a flock of birds. Let's see how YOLOv1 was improved into a better, faster, and stronger YOLO with an mAP of more than 78%, which is a huge improvement over YOLO v1 in accuracy but comes at a slower speed of 40 fps. At 67 fps, however, YOLO v2 still reaches 76% mAP on PASCAL VOC 2007. And we will also understand why the title is YOLO9000!