Abstract
SHARP synthesizes photorealistic views from a single image using a 3D Gaussian representation, achieving state-of-the-art results with rapid processing.
We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements. Experimental results demonstrate that SHARP delivers robust zero-shot generalization across datasets. It sets a new state of the art on multiple datasets, reducing LPIPS by 25-34% and DISTS by 21-43% versus the best prior model, while lowering the synthesis time by three orders of magnitude. Code and weights are provided at https://github.com/apple/ml-sharp
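The abstract does not say how each Gaussian is parameterized. As a hedged illustration, the sketch below uses the common 3D Gaussian Splatting convention: a mean, per-axis scales, a unit rotation quaternion, an opacity and a color, with the covariance built as Σ = R S Sᵀ Rᵀ where S = diag(scale). The names `Gaussian3D`, `quat_to_rotmat` and `covariance` are hypothetical and are not taken from the ml-sharp code. SHARP's actual parameterization may differ.

```python
import math
from dataclasses import dataclass

Vec3 = tuple[float, float, float]
Mat3 = list[list[float]]


@dataclass
class Gaussian3D:
    """One splat primitive in the common 3DGS layout (hypothetical, not SHARP's)."""
    mean: Vec3                                   # centre; in metres if the scene is metric
    scale: Vec3                                  # per-axis standard deviations
    rotation: tuple[float, float, float, float]  # quaternion (w, x, y, z), normalized below
    opacity: float                               # in [0, 1]
    color: Vec3                                  # RGB in [0, 1]


def quat_to_rotmat(q: tuple[float, float, float, float]) -> Mat3:
    """Convert a (w, x, y, z) quaternion to a 3x3 rotation matrix."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def covariance(g: Gaussian3D) -> Mat3:
    """Return Sigma = R S S^T R^T, with S = diag(g.scale)."""
    R = quat_to_rotmat(g.rotation)
    # M = R S: column j of R is multiplied by scale[j].
    M = [[R[i][j] * g.scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M M^T, which is symmetric positive semi-definite by construction.
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)] for i in range(3)]
```

With the identity rotation `(1, 0, 0, 0)`, the covariance is diagonal with entries `scale[i] ** 2`. A rotation of 90 degrees about z swaps the x and y variances. Building Σ from a rotation and scales, instead of regressing six free covariance entries, keeps every predicted Gaussian valid.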
Community
Sharp Monocular View Synthesis in Less Than a Second
https://huggingface.co/papers/2512.10685
Real-time photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. Synthesis takes less than a second on a standard GPU via a single feedforward pass through a neural network. The synthesized representation is then rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements. Robust zero-shot generalization. SOTA on multiple datasets while lowering the synthesis time by three orders of magnitude.
Learn more at and https://huggingface.co/apple/Sharp
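Because the representation has absolute scale, a camera offset can be written directly in metres. The sketch below assumes an OpenCV-style 4×4 camera-to-world pose (x right, y down, z forward). It moves the camera along its own axes by a metric offset and inverts the pose into the world-to-camera (view) matrix that a renderer would use. The function names and the axis convention are assumptions for illustration and are not taken from the SHARP code.

```python
Mat4 = list[list[float]]


def translate_camera(cam_to_world: Mat4, offset_m: tuple[float, float, float]) -> Mat4:
    """Move a camera by offset_m (metres, in the camera's own frame).

    The new centre is c + R @ offset_m, where R is the 3x3 rotation block of
    cam_to_world. The orientation is unchanged and the input is not modified.
    """
    R = [row[:3] for row in cam_to_world[:3]]
    c = [cam_to_world[i][3] for i in range(3)]
    delta = [sum(R[i][k] * offset_m[k] for k in range(3)) for i in range(3)]
    moved = [row[:] for row in cam_to_world]
    for i in range(3):
        moved[i][3] = c[i] + delta[i]
    return moved


def world_to_camera(cam_to_world: Mat4) -> Mat4:
    """Invert a rigid pose: [R | c]^-1 = [R^T | -R^T c]."""
    R = [row[:3] for row in cam_to_world[:3]]
    c = [cam_to_world[i][3] for i in range(3)]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t = [-sum(Rt[i][k] * c[k] for k in range(3)) for i in range(3)]
    return [Rt[0] + [t[0]], Rt[1] + [t[1]], Rt[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]
```

For example, `translate_camera(pose, (0.2, 0.0, 0.0))` moves the camera 20 cm to its right under this convention. With a non-metric (up-to-scale) reconstruction, the same call would move the camera by an unknown physical distance.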
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Depth Anything 3: Recovering the Visual Space from Any Views (2025)
- Novel View Synthesis from A Few Glimpses via Test-Time Natural Video Completion (2025)
- Diff4Splat: Controllable 4D Scene Generation with Latent Dynamic Reconstruction Models (2025)
- Splatent: Splatting Diffusion Latents for Novel View Synthesis (2025)
- Blur2Sharp: Human Novel Pose and View Synthesis with Generative Prior Refinement (2025)
- VGD: Visual Geometry Gaussian Splatting for Feed-Forward Surround-view Driving Reconstruction (2025)
- CloseUpShot: Close-up Novel View Synthesis from Sparse-views via Point-conditioned Diffusion Model (2025)
Models citing this paper: 1
Datasets citing this paper: 0
Spaces citing this paper: 1
Collections including this paper: 0
