Spatia: Video Generation with Updatable Spatial Memory
Abstract
Spatia, a spatial memory-aware video generation framework, maintains long-term spatial and temporal consistency by preserving and updating a 3D scene point cloud, enabling realistic video generation and interactive editing.
Existing video generation models struggle to maintain long-term spatial and temporal consistency due to the dense, high-dimensional nature of video signals. To overcome this limitation, we propose Spatia, a spatial memory-aware video generation framework that explicitly preserves a 3D scene point cloud as persistent spatial memory. Spatia iteratively generates video clips conditioned on this spatial memory and continuously updates it through visual SLAM. This dynamic-static disentanglement design enhances spatial consistency throughout the generation process while preserving the model's ability to produce realistic dynamic entities. Furthermore, Spatia enables applications such as explicit camera control and 3D-aware interactive editing, providing a geometrically grounded framework for scalable, memory-driven video generation.
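The generate-then-update loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not Spatia's implementation. `generate_clip` and `estimate_points` are hypothetical stand-ins for the video generator and the visual SLAM back end. The memory is assumed to be a voxel-deduplicated set of static 3D points, and a hypothetical per-point `is_dynamic` mask keeps moving entities out of it, which mirrors the dynamic-static disentanglement the abstract describes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float, float]
Frame = Tuple[int, int]  # (clip index, frame index): placeholder for an image

@dataclass
class SpatialMemory:
    """Static-scene point cloud, deduplicated on a voxel grid (an assumed design)."""
    voxel_size: float = 0.05
    voxels: Dict[Tuple[int, int, int], Point] = field(default_factory=dict)

    def _key(self, p: Point) -> Tuple[int, int, int]:
        # Floor each coordinate to its voxel index.
        return (int(p[0] // self.voxel_size),
                int(p[1] // self.voxel_size),
                int(p[2] // self.voxel_size))

    def update(self, points: List[Point], is_dynamic: List[bool]) -> None:
        # Write only static points; dynamic entities are never stored.
        for p, dyn in zip(points, is_dynamic):
            if not dyn:
                self.voxels.setdefault(self._key(p), p)

    def points(self) -> List[Point]:
        return list(self.voxels.values())


def generate_long_video(
    num_clips: int,
    generate_clip: Callable[[List[Point], int], List[Frame]],
    estimate_points: Callable[[List[Frame]], Tuple[List[Point], List[bool]]],
    memory: SpatialMemory,
) -> List[Frame]:
    video: List[Frame] = []
    for i in range(num_clips):
        clip = generate_clip(memory.points(), i)  # condition on current memory
        pts, dyn = estimate_points(clip)          # SLAM-style reconstruction (stub)
        memory.update(pts, dyn)                   # persist static geometry only
        video.extend(clip)
    return video


# Hypothetical stubs standing in for the real generator and SLAM system.
def toy_generate_clip(memory_points: List[Point], clip_index: int) -> List[Frame]:
    return [(clip_index, k) for k in range(2)]

def toy_estimate_points(clip: List[Frame]) -> Tuple[List[Point], List[bool]]:
    pts: List[Point] = []
    dyn: List[bool] = []
    for i, k in clip:
        pts.append((float(i), 0.01 * k, 1.0))  # static: both frames share one voxel
        dyn.append(False)
        pts.append((float(i), 5.0, 1.0))       # dynamic entity: filtered out
        dyn.append(True)
    return pts, dyn


memory = SpatialMemory()
video = generate_long_video(3, toy_generate_clip, toy_estimate_points, memory)
print(len(video), len(memory.points()))  # 6 3
```

Each clip contributes one static voxel to the memory while its dynamic point is discarded, so the memory grows with explored space rather than with the number of frames.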
Community
Spatia enables applications such as:
- Explicit Camera Control
- 3D-Aware Interactive Editing
- Long-horizon Scene Exploration
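Explicit camera control with a point-cloud memory usually comes down to rendering the stored points from a user-chosen camera. This page does not specify the exact conditioning signal Spatia uses, so the sketch below shows only the generic technique: pinhole projection with a z-buffer. The helper `render_depth`, the yaw-only camera parameterization, and the sparse depth-map output are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Pixel = Tuple[int, int]

def render_depth(points: List[Point], cam_pos: Point, yaw: float, focal: float,
                 width: int, height: int, near: float = 1e-3) -> Dict[Pixel, float]:
    """Z-buffered pinhole projection of a point cloud (hypothetical helper).

    The camera sits at cam_pos and is rotated by `yaw` radians about the world
    y-axis; at yaw = 0 it looks down +z.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    cx, cy = width / 2.0, height / 2.0
    depth: Dict[Pixel, float] = {}
    for px, py, pz in points:
        dx, dy, dz = px - cam_pos[0], py - cam_pos[1], pz - cam_pos[2]
        # World -> camera: apply the transpose (inverse) of the yaw rotation.
        xc = c * dx - s * dz
        yc = dy
        zc = s * dx + c * dz
        if zc <= near:
            continue  # behind (or too close to) the camera
        u = math.floor(focal * xc / zc + cx)
        v = math.floor(focal * yc / zc + cy)
        if not (0 <= u < width and 0 <= v < height):
            continue  # projects outside the image
        # Z-buffer: keep only the nearest point per pixel.
        if (u, v) not in depth or zc < depth[(u, v)]:
            depth[(u, v)] = zc
    return depth


memory_points = [(0.0, 0.0, 2.0), (0.0, 0.0, 4.0), (1.0, 0.0, 2.0),
                 (0.0, 0.0, -1.0), (3.0, 0.0, 0.0)]
front = render_depth(memory_points, (0.0, 0.0, 0.0), 0.0, 100.0, 64, 48)
side = render_depth(memory_points, (0.0, 0.0, 0.0), math.pi / 2, 100.0, 64, 48)
print(front)  # {(32, 24): 2.0}
print(side)   # {(32, 24): 3.0}
```

Changing only the camera pose changes which remembered geometry is visible: looking down +z, the nearer of two occluding points wins the pixel, while after a 90° yaw the camera faces +x and sees the point at (3, 0, 0) instead.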
arXiv lens breakdown of this paper: https://arxivlens.com/PaperView/Details/spatia-video-generation-with-updatable-spatial-memory-7151-fc1e45e1
- Executive Summary
- Detailed Breakdown
- Practical Applications
arXiv explained breakdown of this paper: https://arxivexplained.com/papers/spatia-video-generation-with-updatable-spatial-memory
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- VideoSSM: Autoregressive Long Video Generation with Hybrid State-Space Memory (2025)
- Endless World: Real-Time 3D-Aware Long Video Generation (2025)
- WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion (2025)
- GeoVideo: Introducing Geometric Regularization into Video Generation Model (2025)
- Captain Safari: A World Engine (2025)
- StoryMem: Multi-shot Long Video Storytelling with Memory (2025)
- Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation (2025)