Generate 3D scene representations from single photos at near-real-time speed for use in augmented reality or 3D content pipelines.
Reproduce and extend Apple's SHARP research paper results using the provided command-line tool and pretrained model weights.
Feed SHARP output files into existing 3D Gaussian rendering tools to produce video walkthroughs of a scene from a single input photo.
Benchmark SHARP against other single-image novel-view synthesis methods on standard image quality metrics.
Video rendering requires an NVIDIA GPU. Model weights are downloaded automatically on first run.
SHARP is a research project from Apple that takes a single photograph as input and generates realistic images of the same scene from nearby camera angles. In other words, you give it one picture and it produces what the scene would look like from slightly different positions, creating a sense of three-dimensional depth from a flat image.

A trained neural network processes the photo and predicts a three-dimensional representation of the scene in a single pass, using a technique called 3D Gaussian splatting. This representation stores the scene as a large collection of small fuzzy blobs in three-dimensional space, each with color and opacity information. Once that representation is built, a rendering engine can produce new viewpoints in real time. The whole process from photo to 3D representation takes under a second on a standard graphics card, which the paper describes as three orders of magnitude faster than previous approaches. The output files are compatible with existing 3D Gaussian rendering tools.

The project accompanies a research paper and includes a command-line tool called sharp. After installing the Python dependencies, you point it at a folder of input images and it writes the resulting 3D representation to an output folder. The model weights are downloaded automatically on the first run. A separate render command can then produce video along a camera path, though that step currently requires an NVIDIA GPU.

The representation uses real-world scale, so camera movements correspond to actual distances rather than arbitrary units. The authors report that SHARP improves on previous methods on several image quality benchmarks. The code and model are released under separate licenses, each with its own terms.
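To make the "fuzzy blobs with color and opacity" idea concrete, here is a minimal sketch of how a set of Gaussians could be blended into one pixel color along a single camera ray. It is not SHARP's code or file format: the `Gaussian` class, the `render_ray` function, and the isotropic (single-radius) blobs are all simplifying assumptions made for illustration. Real 3D Gaussian splatting renderers use full covariance matrices, project each blob onto the image plane, and run on the GPU.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    """Hypothetical, simplified blob: isotropic instead of full covariance."""
    mean: Vec3      # center of the blob (metres, assuming metric scale)
    sigma: float    # standard deviation of the blob (metres)
    color: Vec3     # RGB in [0, 1]
    opacity: float  # peak opacity in [0, 1]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def render_ray(origin: Vec3, direction: Vec3, blobs: List[Gaussian]) -> Vec3:
    """Blend blobs front to back along a ray; `direction` must be unit length."""
    hits = []
    for g in blobs:
        offset = (g.mean[0] - origin[0], g.mean[1] - origin[1], g.mean[2] - origin[2])
        t = _dot(offset, direction)  # depth of the blob center along the ray
        if t <= 0.0:
            continue  # blob is behind the camera
        # Squared distance from the blob center to the closest point on the ray.
        r2 = _dot(offset, offset) - t * t
        alpha = g.opacity * math.exp(-max(r2, 0.0) / (2.0 * g.sigma ** 2))
        hits.append((t, alpha, g.color))

    hits.sort(key=lambda h: h[0])  # nearest blob first
    rgb = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet blocked
    for _, alpha, color in hits:
        for i in range(3):
            rgb[i] += transmittance * alpha * color[i]
        transmittance *= 1.0 - alpha
    return (rgb[0], rgb[1], rgb[2])


# A half-transparent red blob 2 m away, in front of an opaque blue blob 3 m away.
scene = [
    Gaussian(mean=(0.0, 0.0, 3.0), sigma=0.1, color=(0.0, 0.0, 1.0), opacity=1.0),
    Gaussian(mean=(0.0, 0.0, 2.0), sigma=0.1, color=(1.0, 0.0, 0.0), opacity=0.5),
]
pixel = render_ray((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), scene)
print(pixel)  # (0.5, 0.0, 0.5)
```

Because the blobs are sorted by depth and blended with a running transmittance, moving the ray origin changes which blobs overlap and how strongly, which is what produces a new viewpoint from the same stored scene.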
Verify against the repo before relying on details.