Hello and welcome back to another article in the Advanced Object Detection series! In our last post, we ventured out of the YOLO detectors a bit and touched on RetinaNet architecture which introduced a novel loss function called FocalLoss (& 𝛂-balanced FocalLoss) and solved the huge class-imbalance problem observed in single-stage object detectors. Now, let's come back to the YOLO object detectors, specifically the YOLOv3. The YOLOv3 had some minor updates on top of the YOLOv2 which made it better and stronger, but sadly not as fast. The authors traded the speed with accuracy - accurate but not so fast. It matched the accuracy of the SSD by 3x faster @ ~22s inference time and higher scaled images (416x416) pushing it to sub 30fps inference times. It even comes close to RetinaNet in accuracy but is way faster. Let’s dig deep and understand the improvements made to YOLOv2 and why it’s slower but more accurate.

YOLO: You Only Look Once

Hello, I’m back with an Advanced Object Detector article! We are now past the amateur stage if and only if you have read through all our previous articles on Object Detection, R-CNN, Fast R-CNN, SPPnet, Faster R-CNN & Mask R-CNN. Although Mask R-CNN is also a tad advanced, we considered learning related to the R-CNN Family of Object Detectors. We are on the right track to mastering Object Detection after learning the building blocks of an Object Detector and have a fair intuition about the mechanism of how various parts are put together for a fully functional Object Detector. We progressed through the R-CNN, Fast R-CNN & Faster R-CNN architectures and understood how evolution happens to improve the accuracy and reduce the inference time taken per image.