Analysis updated 2026-05-18
Record a video of people moving and generate BVH motion-capture files to drive character animations in Blender without special motion-capture hardware.
Extract 70-joint 3D body and hand positions from a video clip and export them as CSV data for analysis in a research project.
Integrate the compiled shared library into a Python script using ctypes to get real-time 3D body pose data from a camera feed.
Use the multi-person tracking to capture synchronized BVH files for multiple performers in the same scene from a single ordinary camera.
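The multi-person use case depends on keeping each performer's identity stable from frame to frame, and the portion of the README shown does not say how the tool does this. A common technique is greedy bounding-box matching by intersection-over-union (IoU) between consecutive frames. The Python sketch below illustrates only that technique. `GreedyIouTracker`, `iou`, the box format, and the 0.3 threshold are hypothetical; they are not sam3dbody-cpp's API and not necessarily its method.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical box format: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (0.0 when disjoint)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Hypothetical tracker that gives each person box a persistent integer ID."""

    def __init__(self, min_iou: float = 0.3) -> None:
        self.min_iou = min_iou
        self.tracks: Dict[int, Box] = {}  # track ID -> box from the previous frame
        self.next_id = 0

    def update(self, detections: List[Box]) -> List[int]:
        """Return one track ID per detection, in the same order as `detections`."""
        # Score every (track, detection) pair; the highest overlap is matched first.
        pairs = sorted(
            ((iou(box, det), tid, di)
             for tid, box in self.tracks.items()
             for di, det in enumerate(detections)),
            reverse=True,
        )
        ids = [-1] * len(detections)
        used: Set[int] = set()
        for score, tid, di in pairs:
            if score < self.min_iou:
                break  # remaining pairs overlap too little to be the same person
            if tid in used or ids[di] != -1:
                continue
            ids[di] = tid
            used.add(tid)
        # Unmatched detections are treated as new people.
        for di, tid in enumerate(ids):
            if tid == -1:
                ids[di] = self.next_id
                self.next_id += 1
        # Only tracks seen in this frame are kept in this simple sketch.
        self.tracks = {tid: detections[di] for di, tid in enumerate(ids)}
        return ids


if __name__ == "__main__":
    tracker = GreedyIouTracker()
    print(tracker.update([(0, 0, 10, 20), (50, 0, 60, 20)]))  # [0, 1]
    # Same two people, listed in the opposite order and shifted slightly:
    print(tracker.update([(51, 1, 61, 21), (1, 0, 11, 20)]))  # [1, 0]
```

Because this sketch forgets any track missing from a single frame, a briefly occluded person would come back with a new ID; practical trackers usually keep unmatched tracks alive for a few frames.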
| | ammarkov/sam3dbody-cpp | fractalfir/crustc | facex-engine/facex |
|---|---|---|---|
| Stars | 563 | 331 | 189 |
| Language | C | C | C |
| Setup difficulty | hard | hard | moderate |
| Complexity | 5/5 | 5/5 | 4/5 |
| Audience | researcher | developer | developer |
Figures from each repo's GitHub metadata at analysis time.
Requires a CUDA-capable GPU and downloading approximately 5 GB of ONNX model files from HuggingFace before building with CMake.
SAM3DBody-cpp is a C++ program that takes video from a single ordinary camera and reconstructs, in real time, a three-dimensional model of the body and hands of every person visible in each frame. It does not require depth cameras, multi-camera rigs, special sensors, or a Python installation: the entire inference pipeline runs as compiled C++ code.
For each detected person, it writes a BVH file, a standard motion-capture format that records how every joint in the body, including the hands, moves from frame to frame. These files open directly in animation software such as Blender, and a bundled Blender plugin drives a character rig from the results. In a multi-person scene each person gets a separate BVH file, and the software tracks identities across frames so they do not swap.
The system runs a sequence of neural network models. A detection model finds people in the image. A large visual-understanding model then analyzes each person's crop and produces a compact description of their pose. A final set of lightweight models decodes that description into 519 numbers describing a full-body skeleton, including hand poses and basic facial expressions. A linear blend skinning step then converts those numbers into an 18,439-vertex body mesh plus 70 labeled joint positions.
The GPU-accelerated version requires a CUDA-capable graphics card and about 5 GB of model files downloaded from HuggingFace. A CPU-only version exists, but it takes 5 to 15 seconds per frame depending on the machine, which is impractical for video. Building from source requires CMake and a C++ compiler; ONNX Runtime and ggml handle model execution. Python frontends are included for users who want to call the compiled library from Python scripts without writing any C++.
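Linear blend skinning, the final step named above, moves each rest-pose vertex by a weighted average of its joints' rigid transforms: posed vertex v'_i = Σ_j w_ij (R_j v_i + t_j), where the weights for each vertex usually sum to 1. The plain-Python sketch below shows only that basic formula on a toy two-joint example. The function names and data layout are hypothetical, and the model's actual rig (joint count, weights, any corrective terms) is not reproduced here.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]  # row-major 3x3 rotation matrix


def apply_rigid(rot: Mat3, trans: Vec3, v: Vec3) -> Vec3:
    """Rigidly transform a point: rot @ v + trans."""
    x, y, z = (
        sum(rot[r][c] * v[c] for c in range(3)) + trans[r] for r in range(3)
    )
    return (x, y, z)


def linear_blend_skinning(
    rest_vertices: Sequence[Vec3],
    weights: Sequence[Sequence[float]],  # weights[i][j]: influence of joint j on vertex i
    joint_transforms: Sequence[Tuple[Mat3, Vec3]],  # (R_j, t_j): rest pose -> posed
) -> List[Vec3]:
    """Posed vertex i = sum_j weights[i][j] * (R_j @ v_i + t_j)."""
    posed: List[Vec3] = []
    for v, w in zip(rest_vertices, weights):
        acc = [0.0, 0.0, 0.0]
        for w_j, (rot, trans) in zip(w, joint_transforms):
            if w_j == 0.0:
                continue  # this joint has no influence on this vertex
            moved = apply_rigid(rot, trans, v)
            for k in range(3):
                acc[k] += w_j * moved[k]
        posed.append((acc[0], acc[1], acc[2]))
    return posed


if __name__ == "__main__":
    I: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    # Joint 0 stays put; joint 1 is lifted 2 units along y.
    transforms = [(I, (0.0, 0.0, 0.0)), (I, (0.0, 2.0, 0.0))]
    rest = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
    w = [[0.5, 0.5], [1.0, 0.0]]  # first vertex shared equally, second follows joint 0
    print(linear_blend_skinning(rest, w, transforms))  # [(1.0, 1.0, 0.0), (0.0, 0.0, 0.0)]
```

The shared vertex lands halfway between where each joint alone would put it, which is the averaging behavior that gives LBS its name.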
A CSV exporter is also available for the 70 joint positions if you need the data in spreadsheet form.
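The exporter's actual column layout is not described here. As a hypothetical sketch of how 70 joint positions per person per frame can be flattened into spreadsheet rows, the example below uses Python's standard `csv` module and writes one row per (frame, person). The `write_joint_csv` function and the `joint_NN_x/y/z` column names are assumptions for illustration only.

```python
import csv
import io
from typing import Iterable, List, Sequence, TextIO, Tuple

NUM_JOINTS = 70  # joint count stated in the README
XYZ = Tuple[float, float, float]


def write_joint_csv(out: TextIO, rows: Iterable[Tuple[int, int, Sequence[XYZ]]]) -> None:
    """Write one CSV row per (frame, person): frame, person_id, then x/y/z per joint.

    The column layout is a hypothetical example, not the repository's exporter format.
    """
    header: List[str] = ["frame", "person_id"]
    for j in range(NUM_JOINTS):
        header += [f"joint_{j:02d}_x", f"joint_{j:02d}_y", f"joint_{j:02d}_z"]
    writer = csv.writer(out)
    writer.writerow(header)
    for frame, person_id, joints in rows:
        if len(joints) != NUM_JOINTS:
            raise ValueError(f"expected {NUM_JOINTS} joints, got {len(joints)}")
        # Flatten [(x, y, z), ...] into x0, y0, z0, x1, y1, z1, ...
        writer.writerow([frame, person_id, *(c for xyz in joints for c in xyz)])


if __name__ == "__main__":
    buf = io.StringIO()
    # One frame, one person, every joint at the same toy position.
    write_joint_csv(buf, [(0, 1, [(0.0, 1.0, 2.0)] * NUM_JOINTS)])
    print(buf.getvalue().splitlines()[1][:20])  # 0,1,0.0,1.0,2.0,0.0,
```

When writing to a real file instead of a `StringIO`, open it with `newline=""`, as the `csv` module documentation recommends.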
A C++ real-time engine that reconstructs 3D full-body poses and hand poses from a single camera, exporting per-person BVH motion-capture files ready to use in Blender or other 3D software.
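The exact skeleton sam3dbody-cpp writes (joint names, channel order, units, frame rate) is not described here. As a hedged illustration of the BVH format itself, the Python sketch below writes a minimal file: a HIERARCHY section of nested joints, each with an OFFSET and a CHANNELS line, followed by a MOTION section with one line of channel values per frame. The `Joint` class, `to_bvh` function, two-joint toy skeleton, and ZXY rotation order are assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Joint:
    name: str
    offset: Tuple[float, float, float]  # rest-pose position relative to the parent joint
    children: List["Joint"] = field(default_factory=list)


def count_channels(j: Joint, is_root: bool = True) -> int:
    """Root has 3 position + 3 rotation channels; every other joint has 3 rotations."""
    return (6 if is_root else 3) + sum(count_channels(c, False) for c in j.children)


def _write_joint(j: Joint, depth: int, is_root: bool, out: List[str]) -> None:
    pad = "\t" * depth
    out.append(f"{pad}{'ROOT' if is_root else 'JOINT'} {j.name}")
    out.append(pad + "{")
    out.append(f"{pad}\tOFFSET {j.offset[0]:.6f} {j.offset[1]:.6f} {j.offset[2]:.6f}")
    if is_root:
        out.append(f"{pad}\tCHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation")
    else:
        out.append(f"{pad}\tCHANNELS 3 Zrotation Xrotation Yrotation")
    for c in j.children:
        _write_joint(c, depth + 1, False, out)
    if not j.children:
        # Leaf joints end with an End Site; a zero-length tip keeps the sketch short.
        out += [pad + "\tEnd Site", pad + "\t{", pad + "\t\tOFFSET 0.000000 0.000000 0.000000", pad + "\t}"]
    out.append(pad + "}")


def to_bvh(root: Joint, frames: Sequence[Sequence[float]], frame_time: float) -> str:
    """frames[f] lists every channel value for frame f in depth-first hierarchy order."""
    n = count_channels(root)
    out: List[str] = ["HIERARCHY"]
    _write_joint(root, 0, True, out)
    out += ["MOTION", f"Frames: {len(frames)}", f"Frame Time: {frame_time:.6f}"]
    for values in frames:
        if len(values) != n:
            raise ValueError(f"expected {n} channel values per frame, got {len(values)}")
        out.append(" ".join(f"{v:.6f}" for v in values))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    skeleton = Joint("Hips", (0.0, 0.0, 0.0), [Joint("Spine", (0.0, 10.0, 0.0))])
    # Two frames at 30 fps: rest pose, then the spine rotated 15 degrees about Z.
    print(to_bvh(skeleton, [[0.0] * 9, [0.0] * 6 + [15.0, 0.0, 0.0]], 1 / 30))
```

A full-body exporter would list every body and hand joint in the hierarchy and write one such file per tracked person.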
Mainly C. The stack also includes C++ and ONNX Runtime.
The README does not state a license directly; check the repository for a license file before use.
Setup difficulty is rated hard, with roughly a day or more to a first successful run.
Audience: mainly researchers.
Verify against the repo before relying on details.