A photo taken with an ordinary smartphone is two-dimensional, but can we convert it into a three-dimensional photo? The answer is yes. The image below is a standard color photo taken with a smartphone, so it contains only a 2D representation of the world. When we look at it, our brain is able to reconstruct the 3D information from it. It is now possible for an AI to do the same, and even go all the way to creating a 3D version of the photo.
Today, almost anyone can make a deepfake from a recorded voice sample. Let's see what this new method can do by way of an example. Watch the short clip of a speech below, and note that the louder voice is the English translator. If you listen closely, you can hear the chancellor’s original voice in the background too. So what is the problem here? Honestly, there is none; this is just the way the speech was recorded.
In March 2020, a paper on Neural Radiance Fields (NeRF) appeared. With this technique, we can take a set of input photos, train a neural network on them, and then synthesize new, previously unseen views of not just the materials in the scene but the entire scene itself. In other words, neural networks can learn and reproduce entire real-world scenes from only a few views. However, the technique had some limitations: it struggled with scenes that have variable lighting conditions or occlusions.
Research on image translation with learning algorithms is advancing rapidly. For example, this earlier technique would look at a large number of animal faces and could interpolate between them, or in other words, blend one breed of dog into another. It could even transform dogs into cats, or go further and transform cats into cheetahs. The results were very good, but it only worked on the domains it was trained on, i.e., it could only translate to and from species it had taken the time to learn about.
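To make "interpolating between faces" concrete, here is a minimal sketch. It assumes each image has already been encoded into a latent vector, and that blending is done by linear interpolation between two such vectors. The text does not say how the earlier technique blends images, so this is only an illustration; `lerp`, `dog_a`, and `dog_b` are hypothetical names, and the tiny vectors stand in for real latent codes.

```python
def lerp(a, b, t):
    """Linearly interpolate between two equal-length vectors a and b.

    t = 0.0 returns a, t = 1.0 returns b, values in between blend the two.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


# Hypothetical latent codes for two dog faces (assumed, not from a real model).
dog_a = [0.0, 1.0, 2.0]
dog_b = [1.0, 3.0, 2.0]

# Five evenly spaced blends, starting at dog_a and ending at dog_b.
blend_steps = [lerp(dog_a, dog_b, i / 4) for i in range(5)]
```

In a real system, each blended vector would be passed through a trained decoder or generator to produce the in-between image. That step is omitted here because it depends on the specific model.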
The performance of deep learning models has improved significantly on several computer vision tasks, yet supervised learning models still rely on a large number of labeled images. High-quality annotations are expensive to obtain, and this motivates research in other directions, including Active Learning.
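One common way to reduce labeling cost in Active Learning is uncertainty sampling: ask annotators to label only the images the current model is least sure about. The sketch below is an illustration under assumptions, not a method from any specific paper. It ranks unlabeled samples by the entropy of their predicted class probabilities; `select_most_uncertain` and the `predictions` values are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(predictions, budget):
    """Return indices of the `budget` samples with the highest entropy."""
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled images (three classes each).
predictions = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.40, 0.35, 0.25],  # fairly uncertain
    [0.70, 0.20, 0.10],  # moderately confident
    [0.34, 0.33, 0.33],  # close to uniform, most uncertain
]

# Indices of the two images to send to annotators first.
to_label = select_most_uncertain(predictions, budget=2)
```

With a labeling budget of two, the near-uniform predictions are chosen and the confident ones are skipped, so annotation effort goes where the model is expected to learn the most.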