tencentarc/pixal3d

Analysis updated 2026-06-24

★ 1,279 | Python | Audience · researcher | Complexity · 5/5 | License · MIT | Setup · hard

Mindmap

mindmap
  root((Pixal3D))
    Inputs
      Single image
      Inference config
      Training data
    Outputs
      Textured GLB mesh
      PBR materials
      Training checkpoints
    Use Cases
      Image to 3D
      Train custom model
      Gradio demo
    Tech Stack
      Python
      PyTorch
      CUDA
      Trellis.2
      Gradio

What do people build with it?

USE CASE 1

Upload an image to the Hugging Face Space demo and download the resulting textured GLB mesh.

USE CASE 2

Run inference.py locally on an image to produce a 3D model with PBR textures.

USE CASE 3

Retrain the three-stage cascade on ObjaverseXL data with the included data toolkit.

USE CASE 4

Switch from the main branch (Trellis.2 backbone) to the paper branch (Direct3D-S2) to reproduce the numbers in the SIGGRAPH submission.

What is it built with?

Python · PyTorch · CUDA · Gradio

How does it compare?

	tencentarc/pixal3d	yynxxxxx/codex-5.5-codex-instruct-5.5	claudiodrews/memory-os
Stars	1,279	1,285	1,222
Language	Python	Python	Python
Last pushed	—	2026-07-03	2026-06-10
Maintenance	—	Active	Active
Setup difficulty	hard	easy	moderate
Complexity	5/5	2/5	4/5
Audience	researcher	developer	developer

Figures from each repo's GitHub metadata at analysis time.

How do you get it running?

Difficulty · hard | Time to first run · 1 day+

Requires Trellis.2 setup, a CUDA-matched natten build, and a custom utils3d wheel before inference will run.
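
A quick preflight check can show which of these pieces is still missing before inference is attempted. The sketch below is not part of the repository. It assumes the packages import as `torch`, `natten`, `utils3d` and `flash_attn`, and it relies on the summary's note that setting ATTN_BACKEND=sdpa lets you skip flash_attn. The helper names are made up for illustration.

```python
import importlib.util
import os

# Assumed import names for the dependencies named in the setup notes.
REQUIRED = ("torch", "natten", "utils3d")


def missing_modules(names, find_spec=importlib.util.find_spec):
    """Return the names that cannot be located on the current Python path."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            found = False
        if not found:
            missing.append(name)
    return missing


def attention_env(env, find_spec=importlib.util.find_spec):
    """Copy env, falling back to ATTN_BACKEND=sdpa when flash_attn is absent."""
    env = dict(env)
    if missing_modules(["flash_attn"], find_spec) and "ATTN_BACKEND" not in env:
        env["ATTN_BACKEND"] = "sdpa"
    return env


if __name__ == "__main__":
    absent = missing_modules(REQUIRED)
    if absent:
        print("Missing before inference can run:", ", ".join(absent))
    else:
        print("Core dependencies found.")
    print("ATTN_BACKEND:", attention_env(os.environ).get("ATTN_BACKEND", "(unset)"))
```

The check only confirms that the modules can be found. It does not verify that the natten build matches your CUDA architecture, so a passing result is a necessary condition rather than a guarantee.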

MIT license. Free to use, modify, and redistribute including for commercial purposes, with attribution.

In plain English

Pixal3D is a research project from Tencent ARC Lab and Tsinghua University that turns a single image into a textured 3D model. The paper has been accepted to SIGGRAPH 2026. The headline idea is that earlier image-to-3D systems passed image features into a 3D network loosely, through attention layers, while Pixal3D instead lifts each pixel into 3D space by back projection. This gives the network a direct correspondence between what is in the 2D image and where it should sit in the 3D volume, which the authors say produces geometry and PBR textures close to the quality of a full 3D reconstruction.

There is a hosted demo on Hugging Face Spaces where you can upload an image in a browser and download the resulting GLB mesh, without installing anything locally. The repository ships two branches: main, which is the improved version built on the Trellis.2 backbone, and paper, which is the original implementation on Direct3D-S2 used to produce the numbers in the SIGGRAPH submission.

Local installation starts by following the Trellis.2 setup, then installing the project's own requirements, a natten build matched to your CUDA architecture, and a small utils3d wheel. Inference is run through inference.py with an image path and an output GLB path. A low_vram flag drops the default resolution from 1536 to 1024 and loads model components on demand, and setting ATTN_BACKEND=sdpa lets you skip flash_attn if it is not installed. There is also a Gradio web demo launched via app.py.

For people who want to retrain the model, the training code is included and organised as a three-stage cascade. Stage 1 trains a sparse structure model at 32 then 64 voxel resolution, stage 2 a shape model going from 256 up to 1024, and stage 3 a texture model on the same resolution ladder. Each stage uses pixel-aligned projection conditioning and two view aligned latents by default.
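
To make the back-projection idea concrete, here is a minimal sketch of lifting one pixel into a voxel grid with a pinhole camera. It is not the authors' implementation. The camera model, the grid placement, and every name in it (`Pinhole`, `back_project`, `voxel_index`, `lift_pixel_along_ray`) are assumptions chosen for illustration. Because a single image carries no depth, the sketch samples several depths along the pixel's ray.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pinhole:
    """Hypothetical pinhole intrinsics, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def back_project(u, v, depth, cam):
    """Lift pixel (u, v) to a camera-space point at the given depth."""
    x = (u - cam.cx) * depth / cam.fx
    y = (v - cam.cy) * depth / cam.fy
    return (x, y, depth)


def voxel_index(point, origin, voxel_size, resolution):
    """Map a 3D point to integer grid coordinates, or None if it is outside."""
    idx = tuple(int((p - o) // voxel_size) for p, o in zip(point, origin))
    if all(0 <= i < resolution for i in idx):
        return idx
    return None


def lift_pixel_along_ray(u, v, cam, depths, origin, voxel_size, resolution):
    """Collect the distinct voxels a pixel's ray reaches at the sampled depths."""
    hits = []
    for d in depths:
        idx = voxel_index(back_project(u, v, d, cam), origin, voxel_size, resolution)
        if idx is not None and idx not in hits:
            hits.append(idx)
    return hits


# A 32-voxel cube spanning [-1, 1] in x and y and [1, 3] in z (assumed layout).
cam = Pinhole(fx=32.0, fy=32.0, cx=16.0, cy=16.0)
origin, voxel_size, resolution = (-1.0, -1.0, 1.0), 2.0 / 32, 32
centre_ray = lift_pixel_along_ray(16, 16, cam, [1.0, 1.5, 2.5, 3.0],
                                  origin, voxel_size, resolution)
```

Every voxel on `centre_ray` could receive the feature of the same source pixel, which is the kind of direct 2D-to-3D correspondence the summary contrasts with attention-based conditioning.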
A separate data toolkit prepares O-Voxel data and rendered condition images from a source such as ObjaverseXL, and each higher resolution step is launched by pointing its config's finetune_ckpt at the checkpoint produced by the previous step. The repository is released under the MIT license.
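
The checkpoint hand-off between resolution steps can be sketched as a small config builder. Only the `finetune_ckpt` key comes from the description above. The `resolution` key, the checkpoint paths, and the helper names are hypothetical, and the real configs will contain many more fields.

```python
def chain_resolution_steps(resolutions, ckpt_for):
    """Build one config dict per resolution step, linked through finetune_ckpt."""
    configs = []
    previous_ckpt = None
    for res in resolutions:
        cfg = {"resolution": res}  # hypothetical key name
        if previous_ckpt is not None:
            # Each higher-resolution step starts from the previous step's output.
            cfg["finetune_ckpt"] = previous_ckpt
        configs.append(cfg)
        previous_ckpt = ckpt_for(res)
    return configs


def stage1_ckpt(res):
    # Hypothetical checkpoint layout, for illustration only.
    return f"outputs/stage1_res{res}/last.ckpt"


# Stage 1 trains at 32 and then 64 voxel resolution.
stage1 = chain_resolution_steps([32, 64], stage1_ckpt)
```

The first step has no `finetune_ckpt` because it has no earlier step to start from. The same helper could be reused for stages 2 and 3 once their intermediate resolutions are read from the repository's configs.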

Copy-paste prompts

Prompt 1

Install Pixal3D on a single H100. Walk me through Trellis.2 setup, the natten build for my CUDA arch, and the utils3d wheel.

Prompt 2

Run Pixal3D inference on a product photo with low_vram set and ATTN_BACKEND=sdpa. Show the full command and where the GLB lands.

Prompt 3

Compare Pixal3D vs Hunyuan3D vs TripoSR for converting an iPhone photo of a chair into a textured mesh. Focus on geometry quality and texture sharpness.

Prompt 4

Prepare an ObjaverseXL subset with the Pixal3D data toolkit and train stage 1 sparse structure model at 32 then 64 voxel resolution. What configs do I edit?

Prompt 5

Use the Gradio app.py to host a Pixal3D demo on a single 4090. What ports and env vars do I set?

Frequently asked questions

What is pixal3d?

Research project from Tencent ARC and Tsinghua that turns a single image into a textured 3D model by back-projecting each pixel into 3D space. SIGGRAPH 2026 paper, ships training code and a Gradio demo.

What language is pixal3d written in?

Mainly Python. The stack also includes PyTorch, CUDA, and Gradio.

What license does pixal3d use?

MIT license. Free to use, modify, and redistribute including for commercial purposes, with attribution.

How hard is pixal3d to set up?

Setup difficulty is rated hard, with roughly a day or more to a first successful run.

Who is pixal3d for?

Mainly researchers.

Verify against the repo before relying on details.