Geometry Aware Texturing

2023-12-30

3D garments re-textured using the proposed method with a reference image as input.

  • The research was conducted at Ready Player Me Labs and became part of the Asset Designer product.
  • Technical details can be found in the publication (SIGGRAPH Asia 2023, posters).
  • This work was also presented at Real Time Live! at SIGGRAPH Asia 2023. See the recording on YouTube and try the SIGGRAPH demo.

Intro

Stable Diffusion and ControlNet can be adapted to generate high-quality 3D textures for any asset. In this work we demonstrate the idea on 3D outfits for Ready Player Me avatars.

1. Generated diffuse and PBR materials from the prompt "Steampunk"; 2. Original textures created by a 3D artist.

1. Original asset, back view; 2. Generated diffuse and PBR materials, back view

By changing the PBR materials of a low-poly 3D mesh, the look and feel of the asset can be completely transformed. This technology can be used for:

  • Creating asset variations
  • Stylizing assets to match a particular artistic style
  • Giving artists a tool to iterate quickly over ideas
  • Powering user-generated content

We wanted a solution that works fast and produces high-quality results that require minimal manual edits.

ML Problem

3D mesh with existing UVs + prompt or image as input → PBR UV textures as output

Solution

Our solution is to generate the texture in UV space from the very beginning, with some smart guidance. As that guidance we use a linear combination of the object-space normal map and positions in 3D space. Encoding the 3D mesh into an image for ControlNet conditioning gives the model geometry awareness: it follows the object's symmetry and understands the positions of pockets, zippers and other small details.
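
As an illustration, below is a minimal sketch of how such a conditioning image could be assembled, assuming the object-space normal map and the 3D positions have already been baked into UV space as floating-point arrays (e.g. in Blender); the function name and the blend weight are placeholders, not the exact values used in the method.

```python
import numpy as np

def make_conditioning_image(normal_map: np.ndarray,
                            position_map: np.ndarray,
                            alpha: float = 0.5) -> np.ndarray:
    """Blend an object-space normal map with normalized 3D positions,
    both baked into UV space, into a single RGB conditioning image.

    normal_map:   (H, W, 3) object-space normals in [-1, 1]
    position_map: (H, W, 3) 3D position of the surface for each texel
    alpha:        blend weight between the two signals (assumed value)
    """
    # Remap normals from [-1, 1] to [0, 1] so they fit an 8-bit image.
    normals_01 = (normal_map + 1.0) * 0.5

    # Normalize positions per axis to [0, 1] over the asset's bounding box.
    pmin = position_map.reshape(-1, 3).min(axis=0)
    pmax = position_map.reshape(-1, 3).max(axis=0)
    positions_01 = (position_map - pmin) / np.maximum(pmax - pmin, 1e-8)

    # Linear combination of the two geometry signals.
    cond = alpha * normals_01 + (1.0 - alpha) * positions_01
    return (np.clip(cond, 0.0, 1.0) * 255.0).astype(np.uint8)
```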

Example #1: conditioning image and texture produced from the prompt "Plants Queen".

Example #2: conditioning image and textures generated from various prompts.

To train ControlNet we created a dataset of ~1k assets from the Ready Player Me asset library and annotated them automatically using renders and BLIP-2. As a result, Stable Diffusion with the trained ControlNet produces textures of quality similar to those in the dataset. The method generalizes to unseen outfits, but a conditioning image has to be prepared for them.
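
As a rough sketch of the annotation step, BLIP-2 captioning of asset renders can be reproduced with the checkpoints available in Hugging Face transformers; the renders directory, model choice and generation settings below are illustrative assumptions, not our exact setup.

```python
# Caption rendered previews of each asset with BLIP-2 (sketch, assumed paths).
from pathlib import Path
from PIL import Image
import torch
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda"
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained(
    "Salesforce/blip2-opt-2.7b", torch_dtype=torch.float16
).to(device)

captions = {}
for render_path in Path("renders").glob("*.png"):
    image = Image.open(render_path).convert("RGB")
    inputs = processor(images=image, return_tensors="pt").to(device, torch.float16)
    out = model.generate(**inputs, max_new_tokens=30)
    captions[render_path.stem] = processor.decode(out[0], skip_special_tokens=True).strip()
```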

Baked shadows and lights

In our experiments it became evident that the data used for ControlNet training infuses a certain stylistic bias into the generated images. With Stable Diffusion v1.5 we clearly see generations that stylistically resemble the majority of Ready Player Me textures: dark shadows in clothing creases and shadows under the arms. We did not experiment with alternative datasets of 3D models, but we assume that if the training data has no pronounced shadows baked into its diffuse textures, the trained ControlNet will inherit that bias instead.

Avatars wearing outfits textured entirely using the proposed method.

Image input

For image input we use IP-Adapter to augment Stable Diffusion with an image prompt. It works well in combination with ControlNet, allowing us to transfer the concept of the input image onto the UV texture of the garment. However, reproducing the exact style of the input is not always achievable, since the base Stable Diffusion model used for generation has limitations with certain input styles and concepts.
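
For illustration, wiring IP-Adapter together with the geometry ControlNet in the diffusers library could look roughly like the sketch below; the ControlNet checkpoint path, file names, adapter scale and prompt are placeholders rather than our production configuration.

```python
# Sketch: geometry ControlNet + IP-Adapter image prompt in diffusers.
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

controlnet = ControlNetModel.from_pretrained(
    "path/to/uv-geometry-controlnet", torch_dtype=torch.float16  # placeholder path
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# IP-Adapter adds image-prompt conditioning on top of the text prompt.
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models",
                     weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.7)  # assumed blend weight between text and image prompt

cond_image = Image.open("conditioning_uv.png")    # normals + positions in UV space
style_image = Image.open("medieval_reference.png")

texture = pipe(
    prompt="medieval outfit, diffuse texture",
    image=cond_image,              # ControlNet conditioning
    ip_adapter_image=style_image,  # reference image via IP-Adapter
    num_inference_steps=30,
).images[0]
texture.save("generated_uv_texture.png")
```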

Example of textures generated from an input image of a medieval outfit.

Limitations

  1. The method does not work perfectly on seams. Geometric parts of a 3D asset that sit close together are often split into different UV islands, and misalignment then appears around the UV edges.
  2. The method is not suitable for creating fully coherent textures for assets with heavily fragmented UVs.
  3. Without keywords indicating what type of asset it is (t-shirt/jacket, pants, shoes), the model can get confused and hallucinate the wrong details.
  4. Using image prompts that contain faces may cause faces to appear on the garments.
  5. Beautiful, coherent and useful generations happen roughly once per 5 attempts, which is not a very efficient use of compute; this should be addressed through model alignment or by other means.

a) Render of pants with generated textures demonstrating misalignment in the seam region. b) The original asset created by a 3D artist. c) Generated UV texture. d) Conditioning image demonstrating how the UV islands are separated.

Demo

The main stable version is available in Asset Designer for a limited number of assets, with only diffuse textures. In the SIGGRAPH demo you can try the latest SDXL-based model with PBR materials. In the first-version web demo you can try the same model that runs in Asset Designer, but with PBR materials and for full-body assets.

Asset Designer with an asset textured using the predefined Medieval prompt.

Future improvements

Addressing seams and consistency issues

Recently, Xianfang Zeng et al. released a paper introducing Paint3D. The described method uses depth conditioning and several model views to generate a new desired look for the 3D object. It then re-projects the texture onto the original UVs and applies inpainting and refinement steps to complete the texture. The inpainting step uses a technique similar to the one proposed in our method: the authors also trained a ControlNet conditioned on positional encodings embedded in UV space to produce diffuse maps without baked lighting. The produced textures inherit this lightless bias from the training set.

Main takeaways

  • More diverse meshes are necessary for training a higher-quality ControlNet model that can generalize to diverse assets. We had ~1k meshes; Paint3D used ~105k meshes sourced from the Objaverse dataset.
  • Creating a coarse texture using depth-conditioned reprojections should greatly improve coherence, reduce the seam issues and help the model generalize better to any asset.

Real-time performance

To speed up the generation process it makes sense to experiment with training a custom LCM (Latent Consistency Model) on UV textures. Using smaller and faster models instead of vanilla Stable Diffusion should also speed up the process significantly, although this requires re-training ControlNet for those models.
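
As a sketch of a low-effort variant of this idea (not something validated in production), an off-the-shelf LCM-LoRA can be attached to the existing SD 1.5 + ControlNet pipeline to cut the step count to around 4; the checkpoint paths and prompt below are placeholders.

```python
# Sketch: off-the-shelf LCM-LoRA on top of the SD 1.5 + ControlNet pipeline.
# A custom LCM trained on UV textures would replace the generic LoRA below.
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, LCMScheduler

controlnet = ControlNetModel.from_pretrained(
    "path/to/uv-geometry-controlnet", torch_dtype=torch.float16  # placeholder path
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Swap in the LCM scheduler and load the distilled LoRA weights.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

cond_image = Image.open("conditioning_uv.png")  # UV-space geometry conditioning

# LCM needs only a handful of steps and low classifier-free guidance.
texture = pipe(
    prompt="steampunk outfit, diffuse texture",
    image=cond_image,
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
texture.save("fast_uv_texture.png")
```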