The research field of image translation with the aid of learning algorithms is advancing rapidly. For example, an earlier technique would look at a large number of animal faces and could interpolate between them, or in other words, blend one kind of dog into another breed. It could even transform dogs into cats, or go further and transform cats into cheetahs. The results were very good, but it would only work on the domains it was trained on, i.e., it could only translate to and from species it had taken the time to learn about.
This new method offers something more than that: it can handle multiple domains, or multiple breeds, even ones it hadn't seen previously. It sounds impossible, so let's have a look at the results. The dog in the image is used as content, so the output should have a similar pose. Its breed has to be changed, but the problem is that the AI has never seen this breed before. This is a very challenging task because we only see the head of the dog used for style (the topmost dog image in the video). So should the body of the dog also get curly hair? You can only tell if you know this particular breed, or if you can infer the missing information by looking at other kinds of dogs. Look at the result: those remnants also remain in the output.
However, this is not the first technique to attempt to solve this, so let's see how it performs against a previous method. In the older method's output (the baseline), we get two dogs that seem to be a mix of the content and the style dog. While the new method still seems to have some structural issues, the dog type and the pose are indeed correct, and the results appear to be significantly better.
That earlier method, a few-shot unsupervised image-to-image translation framework (FUNIT), leveraged example-guided episodic training to generate realistic images of unseen domains given a few reference images. However, that framework is limited in one aspect: it frequently produces unsatisfactory translation outputs when applied to objects with diverse appearances, such as animals with very different body poses. The domain-invariant content that is supposed to remain unchanged disappears after translation, as shown above. The authors call this issue the content loss problem. To solve it, they propose a novel network architecture built around a new style encoder, the content-conditioned style encoder. It hinders the transmission of task-irrelevant appearance information into the image translation process: in contrast to existing style encoders, the style code is computed by conditioning on the input content image, as sketched below.
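To make that last idea concrete, here is a minimal PyTorch sketch of a content-conditioned style encoder. This is an illustrative sketch under assumptions, not the paper's exact architecture: the layer sizes, the concatenation-based fusion, and all names (`ContentConditionedStyleEncoder`, `conv_trunk`, `fuse`) are made up for this example. The one property it does demonstrate is the key one: the style code is a function of both the style image and the content image, rather than of the style image alone.

```python
import torch
import torch.nn as nn


class ContentConditionedStyleEncoder(nn.Module):
    """Toy content-conditioned style encoder (illustrative, not the paper's)."""

    def __init__(self, style_dim: int = 64):
        super().__init__()

        def conv_trunk() -> nn.Sequential:
            # Small downsampling CNN ending in a global average pool.
            return nn.Sequential(
                nn.Conv2d(3, 64, kernel_size=4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),  # -> (B, 128, 1, 1)
            )

        self.style_trunk = conv_trunk()    # encodes the style (reference) image
        self.content_trunk = conv_trunk()  # encodes the content image
        # Fusing both features is what makes the style code content-conditioned.
        self.fuse = nn.Linear(128 + 128, style_dim)

    def forward(self, style_img: torch.Tensor, content_img: torch.Tensor) -> torch.Tensor:
        s = self.style_trunk(style_img).flatten(1)      # (B, 128)
        c = self.content_trunk(content_img).flatten(1)  # (B, 128)
        # A plain style encoder would return a function of `s` alone;
        # here the style code also depends on the content features `c`.
        return self.fuse(torch.cat([s, c], dim=1))      # (B, style_dim)


if __name__ == "__main__":
    encoder = ContentConditionedStyleEncoder()
    style = torch.randn(1, 3, 128, 128)    # few-shot reference image
    content = torch.randn(1, 3, 128, 128)  # image whose pose we keep
    style_code = encoder(style, content)
    print(style_code.shape)  # torch.Size([1, 64])
```

Because the style code depends on the content features, the encoder has a chance to suppress appearance details in the style image that do not apply to the content's pose, which is exactly the kind of task-irrelevant information that causes the content loss problem.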
Let’s see where this research in image translation goes in the future 🙂
References
- COCO-FUNIT paper: https://nvlabs.github.io/COCO-FUNIT/
Author
Shubham Bindal