TexSpot: 3D Texture Enhancement with Spatially-uniform Point Latent Representation
Abstract
High-quality 3D texture generation remains a fundamental challenge due to the view inconsistency inherent in current mainstream multi-view diffusion pipelines. Existing representations rely either on UV maps, which suffer from distortion during unwrapping, or on point-based methods, which tightly couple texture fidelity to geometric density and thereby limit high-resolution texture generation. To address these limitations, we introduce TexSpot, a diffusion-based texture enhancement framework. At its core is Texlet, a novel 3D texture representation that merges the geometric expressiveness of point-based 3D textures with the compactness of UV-based representations. Each Texlet latent vector encodes a local texture patch via a 2D encoder, and the latents are further aggregated by a 3D encoder to incorporate global shape context. A cascaded 3D-to-2D decoder reconstructs high-quality texture patches, which enables learning of the Texlet space. Leveraging this representation, we train a diffusion transformer conditioned on Texlets to refine and enhance textures produced by multi-view diffusion methods. Extensive experiments demonstrate that TexSpot significantly improves visual fidelity, geometric consistency, and robustness over existing state-of-the-art 3D texture generation and enhancement approaches.
Texture reconstruction results from our VAE, compared with the ground-truth (input) textures.
Qualitative comparison with state-of-the-art methods on 3D texture super-resolution. TexSpot achieves the best texture quality and global consistency. The PBR-SR results shown here come from our own re-implementation.
Texture enhancement results for generated textured 3D meshes.
Texture enhancement results of TexSpot on scanned meshes of objects and scenes.
Video Presentation
Methodology
TexSpot Pipeline: Overview of the TexSpot pipeline. It consists of (i) a texture patch partitioning step that divides the surface texture into small, spatially uniform patches; (ii) a TexSpot VAE with a two-stage local-global architecture that encodes all texture patches into a compact 3D latent space; and (iii) a conditional TexSpot DiT based on flow matching for texture enhancement.
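To make the flow-matching component of stage (iii) more concrete, the sketch below shows a generic conditional flow-matching objective and an Euler sampler in NumPy. It is an illustration under stated assumptions, not the TexSpot implementation: the straight-line path from noise to data, the `model(x_t, t, cond)` signature, the latent shape `(batch, num_texlets, latent_dim)`, and all function names are hypothetical, and the page does not specify the actual path, conditioning mechanism, or sampler.

```python
import numpy as np

def flow_matching_pair(x0, x1, t):
    """Point on the straight path from x0 (noise) to x1 (data) at time t,
    plus the constant velocity of that path, used as the regression target.
    The linear path is an assumption, not taken from the source."""
    t = t.reshape(-1, *([1] * (x0.ndim - 1)))  # broadcast t over latent dims
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target

def flow_matching_loss(model, x1, cond, rng):
    """Mean squared error between predicted and target velocity.
    `model` stands in for a conditional network (hypothetical signature)."""
    x0 = rng.standard_normal(x1.shape)   # noise endpoint of the path
    t = rng.uniform(size=x1.shape[0])    # one time value per sample
    x_t, v_target = flow_matching_pair(x0, x1, t)
    v_pred = model(x_t, t, cond)
    return float(np.mean((v_pred - v_target) ** 2))

def euler_sample(model, cond, shape, steps, rng):
    """Integrate dx/dt = model(x, t, cond) from t=0 to t=1 with Euler steps."""
    x = rng.standard_normal(shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = np.full(shape[0], i * dt)
        x = x + dt * model(x, t, cond)
    return x
```

In this sketch, `cond` would play the role of the latents of a low-quality input texture and `x1` the latents of the corresponding high-quality texture; how TexSpot actually injects the condition into its DiT is not described here.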