The performance of deep learning models has improved significantly on several computer vision tasks, yet supervised models still rely on large numbers of labeled images. High-quality annotations are expensive to obtain, and this motivates research in other directions, including Active Learning.
Let’s suppose we are trying to solve an object detection problem where we have some number of labeled images and can afford to annotate N more images from the remaining unlabeled data.
Now the question is, which N images should we annotate out of the unlabeled data?
Here ‘Active Learning’ can help us – it can suggest which N images we should select and annotate so that annotating those images yields the maximal increase in model accuracy.
One such Active Learning approach is ‘Learning Loss for Active Learning’. It attaches a small parametric module to the target network and learns to predict the target losses of unlabeled inputs. This module can help us decide which N images to annotate, by suggesting images for which the target model is likely to produce a wrong prediction.
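The selection step above can be sketched as follows: once the loss prediction module is trained, score every unlabeled image with its predicted loss and annotate the N highest-scoring ones. A minimal sketch (the function and variable names here are hypothetical, not from the paper):

```python
import numpy as np

def select_images_to_annotate(predicted_losses, n):
    """Pick the indices of the n unlabeled images with the highest
    predicted loss -- the ones the target model is most likely to get wrong."""
    predicted_losses = np.asarray(predicted_losses)
    return np.argsort(predicted_losses)[::-1][:n].tolist()

# Toy predicted losses for 6 unlabeled images (made-up values).
scores = [0.12, 0.91, 0.05, 0.44, 0.87, 0.30]
print(select_images_to_annotate(scores, n=2))  # -> [1, 4]
```

Images 1 and 4 have the largest predicted losses, so they are the ones we would send for annotation in this round.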
Given an input, the target model outputs a target prediction, and the loss prediction module outputs a predicted loss. The target prediction and the target annotation are then used to compute a target loss to learn the target model. The target loss is then regarded as a ground-truth loss for the loss prediction module and used to compute the loss-prediction loss.
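The text above doesn’t spell out how the loss-prediction loss is computed. The paper argues that directly regressing the loss (e.g. with MSE) is unstable because the scale of the target loss shrinks as training progresses, and instead compares pairs of samples with a margin ranking loss. A rough sketch of that pairwise idea:

```python
def pairwise_ranking_loss(pred_i, pred_j, target_i, target_j, margin=1.0):
    """Margin ranking loss on a pair of samples: the loss prediction module
    is penalized unless it orders its two predictions the same way as the
    true target losses, with at least `margin` of separation."""
    sign = 1.0 if target_i > target_j else -1.0
    return max(0.0, -sign * (pred_i - pred_j) + margin)

# Sample i truly has the larger loss (target_i > target_j), and the module
# agrees by a clear margin -> no penalty.
print(pairwise_ranking_loss(pred_i=2.5, pred_j=0.3, target_i=1.8, target_j=0.2))  # -> 0.0

# The module orders the pair the wrong way -> positive penalty.
print(pairwise_ranking_loss(pred_i=0.3, pred_j=2.5, target_i=1.8, target_j=0.2))
```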
The above image shows the architecture of this model. Features from several layers of the target model are tapped; each passes through Global Average Pooling, an FC layer (with output width 128), and a ReLU. The outputs of all these branches are concatenated and passed through another FC layer, which predicts the final loss.
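To make that architecture concrete, here is a minimal numpy forward pass: GAP → FC(128) → ReLU per tapped layer, concatenate, then one FC layer down to a scalar. The parameter names, layer shapes, and random weights are all illustrative assumptions, not the paper’s actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def gap(feature_map):
    # Global Average Pooling: (C, H, W) -> (C,)
    return feature_map.mean(axis=(1, 2))

def loss_prediction_module(feature_maps, params, hidden=128):
    """Predict a scalar loss from intermediate feature maps.

    feature_maps: list of arrays of shape (C_i, H_i, W_i), one per tapped layer.
    params: dict of per-branch FC weights plus a final FC layer
            (hypothetical names; in practice these would be learned).
    """
    branches = []
    for i, fmap in enumerate(feature_maps):
        pooled = gap(fmap)                               # (C_i,)
        h = params[f"W{i}"] @ pooled + params[f"b{i}"]   # FC to width 128
        branches.append(np.maximum(h, 0.0))              # ReLU
    concat = np.concatenate(branches)                    # (num_layers * 128,)
    return float(params["W_out"] @ concat + params["b_out"])  # scalar predicted loss

# Toy feature maps from three tapped layers of a hypothetical backbone.
feature_maps = [rng.normal(size=(64, 32, 32)),
                rng.normal(size=(128, 16, 16)),
                rng.normal(size=(256, 8, 8))]
params = {}
for i, fmap in enumerate(feature_maps):
    c = fmap.shape[0]
    params[f"W{i}"] = rng.normal(size=(128, c)) * 0.01
    params[f"b{i}"] = np.zeros(128)
params["W_out"] = rng.normal(size=(3 * 128,)) * 0.01
params["b_out"] = 0.0

predicted_loss = loss_prediction_module(feature_maps, params)
print(predicted_loss)  # a single scalar
```

Because the module only consumes pooled features and small FC layers, it adds very little compute on top of the target network.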
This method is task-agnostic, which is why it can also be used for image classification and human pose estimation problems 🙂