SSD: Single Shot MultiBox Detector

Hi, and welcome to the Advanced Object Detection blog post series on single-stage object detectors, notably the YOLO and SSD families. In our previous posts, we took a detailed look at YOLO and YOLOv2 (and YOLO9000 too). Now it’s time to introduce another state-of-the-art object detector: the Single Shot Detector (SSD). It has a few similarities to YOLOv2: it divides the image into grid cells and uses anchor boxes for detection. SSD was published at ECCV (European Conference on Computer Vision) 2016; the YOLOv2 paper appeared on arXiv at the end of 2016 and was presented at CVPR (Conference on Computer Vision and Pattern Recognition) 2017.

SSD reached 74.3% mAP at 59 fps on the PASCAL VOC2007 test set, which is both more accurate and faster than Faster R-CNN (73.2% mAP at 7 fps) and YOLO v1 (63.4% mAP at 45 fps). In this article we will go through SSD’s architecture and loss function, comparing it with YOLO v1 at various points. For that purpose, let’s assume that YOLOv2 is not yet out, so that Faster R-CNN and YOLO v1 are the most recent state-of-the-art object detectors. Well, let’s dive into SSD.