We all know that a photo taken with a normal smartphone is two-dimensional, but can we convert it into a three-dimensional one? The answer is ‘YES’. The image below is a standard color photo taken with a smartphone, so, as discussed, it contains only a 2D representation of the world. When we look at it, our brain reconstructs the 3D information from it. It turns out an AI can do the same, and even go all the way and create a 3D version of the photo.

This new learning-based method promises exactly that, so let’s see whether it can live up to its promise. We feed in a photograph as the input and get a 3D photo as the output. You can even rotate it around, since almost every smartphone is now equipped with a gyroscope. Notice that we can even peek behind the person if we want to. But how is that possible, when that content was never part of the original photo? How does this work, and what kind of phone do we need for it? Do we need a depth sensor?

[Video]

The answer: we just need a normal smartphone, with no depth sensor required. The algorithm takes a single color photograph as input and creates a depth map by itself. This depth map tells the algorithm how far each part of the image is from the camera. With this depth information, it now understands what is where in the image and splits it into layers. But still, how do we obtain the content hidden behind the person? For this, the authors use image inpainting to fill in the occluded regions with plausible content.
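
To make the role of the depth map concrete, here is a toy sketch (not the paper’s code): given a depth map, it marks pixels where depth jumps sharply between neighbors. Those are exactly the foreground/background boundaries that get exposed when the camera moves, i.e. where inpainting must invent content. The grid values and threshold are made-up illustration data.

```python
# Toy sketch: find depth discontinuities where hidden content would be
# exposed under parallax. Not the paper's actual algorithm.
def find_disocclusion_edges(depth, threshold=1.0):
    """Mark pixels whose depth jumps sharply to a 4-neighbor."""
    h, w = len(depth), len(depth[0])
    edges = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    # A large jump means this pixel sits on a
                    # foreground/background boundary.
                    if abs(depth[y][x] - depth[ny][nx]) > threshold:
                        edges[y][x] = True
    return edges

# A person (depth 1) standing in front of a far wall (depth 5):
depth = [
    [5, 5, 5, 5],
    [5, 1, 1, 5],
    [5, 1, 1, 5],
]
edges = find_disocclusion_edges(depth)
```

Every pixel touching the person’s silhouette is flagged; the interior of the wall, far from the person, is not.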

Remarkably, this whole process takes roughly one second: depth estimation takes about a quarter of a second, inpainting about half a second, and the remaining quarter of a second is spent on layer generation and meshing.

Under the hood, this approach has four stages:

1- Depth Estimation: A dense depth map is estimated from the input image by a neural network built from efficient building blocks and optimized with automatic architecture search and int8 quantization for fast inference on mobile devices.
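
The int8-quantization idea can be sketched in a few lines: store the network’s float weights as 8-bit integers plus a single scale factor, which shrinks the model and lets the phone use fast integer arithmetic. This is a minimal symmetric-quantization toy, not the authors’ actual scheme.

```python
# Toy sketch of symmetric int8 quantization: floats become small integers
# in [-127, 127] plus one shared scale factor.
def quantize_int8(weights):
    """Map float weights to int8 values with a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in quantized]

weights = [0.5, -1.27, 0.001]
q, scale = quantize_int8(weights)       # q stays within [-127, 127]
recovered = dequantize(q, scale)        # close to the original floats
```

The price is a small rounding error (0.001 collapses to 0 here), which quantization-aware training and architecture search help keep from hurting depth accuracy.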

2- Layer Generation: The pixels are lifted onto a layered depth image with newly synthesized geometry in parallax regions using carefully designed heuristic algorithms.

3- Color Inpainting: Colors for the newly synthesized geometry of the LDI are produced by an inpainting neural network. A novel set of neural modules transforms a standard 2D CNN into one that can be applied directly to the layered depth image structure.

4- Meshing: Finally, they create a compact representation that can be efficiently rendered even on low-end devices and effectively transferred over poor network connections.
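
The steps above can be capped with a toy meshing sketch: unproject each pixel to a 3D point using its depth and a pinhole camera model, then connect neighboring pixels into two triangles per grid cell. The focal length here is an arbitrary assumption, and the paper builds a far more compact mesh than this dense per-pixel one.

```python
# Toy sketch: turn a depth map into a triangle mesh. One vertex per pixel,
# two triangles per grid cell. Much denser than the paper's compact mesh.
def depth_to_mesh(depth, focal=1.0):
    h, w = len(depth), len(depth[0])
    cx, cy = (w - 1) / 2, (h - 1) / 2  # principal point at image center
    # Pinhole unprojection: pixel (x, y) with depth z -> 3D point.
    verts = [((x - cx) * depth[y][x] / focal,
              (y - cy) * depth[y][x] / focal,
              depth[y][x])
             for y in range(h) for x in range(w)]
    tris = []
    for y in range(h - 1):
        for x in range(w - 1):
            i = y * w + x
            tris.append((i, i + 1, i + w))          # upper-left triangle
            tris.append((i + 1, i + w + 1, i + w))  # lower-right triangle
    return verts, tris

verts, tris = depth_to_mesh([[1, 1], [1, 1]])
```

Once the layers are meshed like this, rendering the 3D photo is just standard textured-triangle rasterization, which even low-end phone GPUs handle easily.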

And then, we have a 3D photo!! 🙂

### References

1. One Shot 3D Photography: https://arxiv.org/abs/2008.12298

Shubham Bindal