explaingit

tencentarc/pixal3d

⭐ Rising 1,314 | Python | Audience · researcher | Complexity · 5/5 | Active | License | Setup · hard

TLDR

Research project from Tencent ARC and Tsinghua that turns a single image into a textured 3D model by back-projecting each pixel into 3D space. SIGGRAPH 2026 paper, ships training code and a Gradio demo.
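
To make "back-projecting each pixel into 3D space" concrete, here is a minimal NumPy sketch of the general technique: each pixel is lifted along its camera ray with a pinhole model and then dropped into a voxel grid. The camera intrinsics (fx, fy, cx, cy), the constant depth map, the grid bounds, and both function names are illustrative assumptions. This is not Pixal3D's code, which may treat cameras and depth quite differently.

```python
import numpy as np

def backproject_pixels(depth, fx, fy, cx, cy):
    """Lift every pixel (u, v) with depth z to a camera-space point (x, y, z)."""
    h, w = depth.shape
    # u runs along image columns, v along image rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def voxelize(points, grid_size, lo, hi):
    """Map 3D points to integer voxel indices inside the cube [lo, hi)^3."""
    scaled = (points - lo) / (hi - lo) * grid_size
    idx = np.floor(scaled).astype(int)
    inside = np.all((idx >= 0) & (idx < grid_size), axis=1)
    return idx[inside]

# Toy input (assumed values): a 4x4 image looking at a flat surface at z = 2.
depth = np.full((4, 4), 2.0)
points = backproject_pixels(depth, fx=4.0, fy=4.0, cx=2.0, cy=2.0)
voxels = voxelize(points, grid_size=32, lo=-4.0, hi=4.0)
print(points.shape, voxels.shape)  # (16, 3) (16, 3)
```

Every pixel ends up with a definite voxel address, which is the kind of direct 2D-to-3D correspondence the project description contrasts with looser attention-based conditioning.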

Mindmap

mindmap
  root((Pixal3D))
    Inputs
      Single image
      Inference config
      Training data
    Outputs
      Textured GLB mesh
      PBR materials
      Training checkpoints
    Use Cases
      Image to 3D
      Train custom model
      Gradio demo
    Tech Stack
      Python
      PyTorch
      CUDA
      Trellis.2
      Gradio

Things people build with this

USE CASE 1

Upload an image to the Hugging Face Space demo and download the resulting textured GLB mesh.

USE CASE 2

Run inference.py locally on an image to produce a 3D model with PBR textures.

USE CASE 3

Retrain the three-stage cascade on ObjaverseXL data with the included data toolkit.

USE CASE 4

Switch between the main branch (Trellis.2 backbone) and paper branch (Direct3D-S2) to reproduce SIGGRAPH submission numbers.

Tech stack

Python · PyTorch · CUDA · Gradio

Getting it running

Difficulty · hard | Time to first run · 1 day+

Requires Trellis.2 setup, a CUDA-matched natten build, and a custom utils3d wheel before inference will run.
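
Because several native dependencies have to be in place first, a small preflight check can save a failed run. The sketch below only tests whether modules are importable and picks the sdpa attention backend when flash_attn is absent. The import names torch, natten, utils3d and flash_attn are assumed to match the installed packages, and the helper functions are hypothetical, not part of the repository.

```python
import importlib.util
import os

def preflight(modules=("torch", "natten", "utils3d")):
    """Return the names of required modules that cannot be imported."""
    return [m for m in modules if importlib.util.find_spec(m) is None]

def choose_attn_backend(env=None):
    """Set ATTN_BACKEND=sdpa in env when flash_attn is not installed."""
    env = os.environ if env is None else env
    if importlib.util.find_spec("flash_attn") is None:
        env["ATTN_BACKEND"] = "sdpa"
    return env.get("ATTN_BACKEND")

missing = preflight()
print("missing modules:", missing or "none")
# Work on a copy here so the check does not modify the real environment.
print("ATTN_BACKEND:", choose_attn_backend(dict(os.environ)))
```

An importable module is not proof that a natten build matches your CUDA architecture, so treat a clean result as a first filter only.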

MIT license. Free to use, modify, and redistribute including for commercial purposes, with attribution.

In plain English

Pixal3D is a research project from Tencent ARC Lab and Tsinghua University that turns a single image into a textured 3D model. The paper has been accepted to SIGGRAPH 2026. The headline idea is that earlier image-to-3D systems fed image features into a 3D network only loosely, through attention layers, while Pixal3D instead lifts each pixel into 3D space by back-projection. This gives the network a direct correspondence between what is in the 2D image and where it should sit in the 3D volume, which the authors say produces geometry and PBR textures close to the quality of a full 3D reconstruction.

There is a hosted demo on Hugging Face Spaces where you can upload an image in a browser and download the resulting GLB mesh without installing anything locally. The repository ships two branches: main, the improved version built on the Trellis.2 backbone, and paper, the original implementation on Direct3D-S2 used to produce the numbers in the SIGGRAPH submission.

Local installation starts by following the Trellis.2 setup, then installing the project's own requirements, a natten build matched to your CUDA architecture, and a small utils3d wheel. Inference is run through inference.py with an image path and an output GLB path. A low_vram flag drops the default resolution from 1536 to 1024 and loads model components on demand, and setting ATTN_BACKEND=sdpa lets you skip flash_attn if it is not installed. There is also a Gradio web demo launched via app.py.

For people who want to retrain the model, the training code is included and organised as a three-stage cascade. Stage 1 trains a sparse structure model at 32 then 64 voxel resolution, stage 2 trains a shape model from 256 up to 1024, and stage 3 trains a texture model on the same resolution ladder. Each stage uses pixel-aligned projection conditioning and two view aligned latents by default.
A separate data toolkit prepares O-Voxel data and rendered condition images from a source such as ObjaverseXL, and each higher resolution step is launched by pointing its config's finetune_ckpt at the checkpoint produced by the previous step. The repository is released under the MIT license.
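
The checkpoint chaining described above can be planned out in a few lines. This is a sketch under stated assumptions: the stage keys, the output path layout, and the plan_runs helper are hypothetical, and only the resolutions named in the text are listed (any intermediate steps between 256 and 1024 are not given here). It also assumes the first step of each stage starts without a finetune checkpoint, which the text does not confirm.

```python
# Resolution ladders as described in the text; only stated values are listed.
LADDERS = {
    "sparse_structure": [32, 64],
    "shape": [256, 1024],
    "texture": [256, 1024],
}

def plan_runs(ladders, ckpt_root="outputs"):
    """Return one entry per training step, chaining finetune_ckpt within a stage."""
    runs = []
    for stage, resolutions in ladders.items():
        previous_ckpt = None  # assumption: a stage's first step has nothing to finetune from
        for res in resolutions:
            out = f"{ckpt_root}/{stage}_{res}/latest.pt"  # hypothetical path layout
            runs.append({
                "stage": stage,
                "resolution": res,
                "finetune_ckpt": previous_ckpt,
                "output_ckpt": out,
            })
            previous_ckpt = out
    return runs

for run in plan_runs(LADDERS):
    print(run["stage"], run["resolution"], "<-", run["finetune_ckpt"])
```

Writing the plan down first makes it easier to see which config needs which finetune_ckpt value before launching each step.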

Copy-paste prompts

Prompt 1
Install Pixal3D on a single H100. Walk me through Trellis.2 setup, the natten build for my CUDA arch, and the utils3d wheel.
Prompt 2
Run Pixal3D inference on a product photo with low_vram set and ATTN_BACKEND=sdpa. Show the full command and where the GLB lands.
Prompt 3
Compare Pixal3D vs Hunyuan3D vs TripoSR for converting an iPhone photo of a chair into a textured mesh. Focus on geometry quality and texture sharpness.
Prompt 4
Prepare an ObjaverseXL subset with the Pixal3D data toolkit and train stage 1 sparse structure model at 32 then 64 voxel resolution. What configs do I edit?
Prompt 5
Use the Gradio app.py to host a Pixal3D demo on a single 4090. What ports and env vars do I set?

Generated 2026-05-22 · Model: sonnet-4-6 · Verify against the repo before relying on details.