Neural Neural Textures Make Sim2Real Consistent

Ryan Burgert, Jinghuan Shang, Xiang Li, Michael S. Ryoo

Stony Brook University

[Paper] [Code-Soon]


We propose TRITON (Texture Recovering Image Translation Network): an unpaired image translation algorithm that takes the UV maps and object labels of a 3D scene and renders a realistic image. TRITON combines differentiable rendering with image translation, using surface consistency losses and neural neural textures to achieve temporal consistency over indefinite timescales.

Check out the videos below to see how TRITON works!


TRITON was trained using two views (the rows labeled "Camera1" and "Camera2"), but was also evaluated on an unseen camera angle ("Unseen Camera"). "Sim (UVL)" is the input image containing UV maps and labels, and "Real GT" contains real photographs of the robot arm in the matching poses.


TRITON makes simulated images realistic while being more consistent than other image translation algorithms. The top row of the video shows input images, and the bottom row shows TRITON's output images. None of these object placements were seen in real life. Note how the surfaces of the objects remain consistent throughout the video: the cubes keep their slight shadows underneath, and the apple and soda cans stay shiny.
From TRITON's outputs, we can recover textures for each object. This image shows the three recovered texture sets corresponding to the video above.


We compare TRITON to other image translation algorithms by moving the objects around. Although each individual frame might look realistic, CycleGAN and CUT let the top of each cube shift randomly between frames, whereas it remains the same with TRITON.

Methodology

TRITON's goal is to turn simulated 3D renderings into realistic fake photographs, trained without any matching image pairs, while maintaining high surface consistency. It does this by simultaneously learning both an image translator and a set of realistic textures: TRITON adds a learnable neural neural texture and two novel surface consistency losses to an existing image translator.
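To make the idea of a surface consistency loss concrete, here is a minimal PyTorch sketch. It is an assumption-laden illustration, not the paper's exact formulation: the function name `surface_consistency_loss`, the tensor shapes, and the use of `grid_sample` to look up a shared learned texture at each pixel's UV coordinate are all hypothetical. The intuition it captures is that the translated image should agree with one fixed per-object texture wherever that surface is visible, regardless of pose.

```python
import torch
import torch.nn.functional as F

def surface_consistency_loss(texture, translated, uv_map, mask):
    """Hypothetical sketch of a surface consistency penalty.

    texture:    (1, C, Th, Tw) learned texture shared across all frames
    translated: (B, C, H, W)   translator output for a batch of frames
    uv_map:     (B, 2, H, W)   per-pixel UV coordinates in [-1, 1]
    mask:       (B, 1, H, W)   1 where a textured object covers the pixel
    """
    # Look up the texture color at each pixel's UV coordinate.
    # grid_sample expects the sampling grid as (B, H, W, 2).
    sampled = F.grid_sample(
        texture.expand(translated.size(0), -1, -1, -1),
        uv_map.permute(0, 2, 3, 1),
        align_corners=False,
    )
    # L1 disagreement between the translated image and the texture lookup,
    # averaged over object pixels only.
    diff = (sampled - translated).abs() * mask
    return diff.sum() / mask.sum().clamp(min=1)
```

Because the same texture is sampled in every frame, minimizing this term pushes the translator to render each surface point identically no matter where the object moves.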

See the figure below for how each loss is applied, and refer to our paper for more details.

The code will be released soon!

Neural Neural Texture

Previous works called these learnable textures "neural textures" and parameterized them by a discrete grid of differentiable texels. In contrast, we call our learnable textures neural neural textures, because the textures themselves are represented as a neural network: a function parameterized continuously over UV space. Using this representation instead of discrete texels lets TRITON learn faster and yields better results.
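A texture represented as a continuous function of UV coordinates can be sketched as a small MLP over Fourier-encoded (u, v), in the spirit of coordinate networks. This is an illustrative sketch only: the class name, layer sizes, and number of frequencies are assumptions, not TRITON's actual architecture.

```python
import math
import torch

class NeuralNeuralTexture(torch.nn.Module):
    """Hypothetical sketch: a texture as a continuous function of UV space.

    Instead of storing a discrete grid of texels, an MLP maps a Fourier
    encoding of (u, v) directly to a color, so the texture can be queried
    at any coordinate and resolution.
    """

    def __init__(self, num_freqs=8, hidden=64, out_channels=3):
        super().__init__()
        self.num_freqs = num_freqs
        in_dim = 4 * num_freqs  # sin and cos of u and v at each frequency
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(in_dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, out_channels),
        )

    def forward(self, uv):
        # uv: (..., 2) with coordinates in [0, 1]
        freqs = (2.0 ** torch.arange(self.num_freqs, dtype=torch.float32,
                                     device=uv.device)) * math.pi
        ang = uv.unsqueeze(-1) * freqs                       # (..., 2, F)
        feats = torch.cat([ang.sin(), ang.cos()], dim=-1)    # (..., 2, 2F)
        return self.mlp(feats.flatten(-2))                   # (..., C)
```

Because the texture is a function rather than a fixed-size grid, it can be rasterized at any resolution after training, e.g. `tex(torch.rand(512, 512, 2))` for a 512x512 render.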