ViSTA-SLAM: Visual SLAM with Symmetric Two-view Association

ViSTA-SLAM is a real-time monocular visual SLAM system that does not require camera intrinsics. Its frontend is a lightweight symmetric two-view association (STA) model that, given only two RGB images, simultaneously estimates the relative camera pose and regresses local pointmaps.

Method Overview

In the backend, ViSTA-SLAM constructs a specially designed Sim(3) pose graph that incorporates loop closures to correct accumulated drift. Extensive experiments show that this approach outperforms current methods in both camera tracking and dense 3D reconstruction quality.
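
To make the role of Sim(3) concrete: a monocular system cannot observe absolute scale, so each relative transform carries a scale factor in addition to rotation and translation, and a loop closure checks whether the transforms chained around a loop return to the identity. The following is a minimal, generic sketch of that idea in plain Python. It is not ViSTA-SLAM's implementation or API; the names `Sim3`, `rot_z`, and `loop_residual` are hypothetical, and the actual graph design and optimizer used by the authors are not reproduced here.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]
Mat = Tuple[Vec, Vec, Vec]


def mat_vec(R: Mat, v: Vec) -> Vec:
    return tuple(sum(R[i][k] * v[k] for k in range(3)) for i in range(3))


def mat_mul(A: Mat, B: Mat) -> Mat:
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def transpose(R: Mat) -> Mat:
    return tuple(tuple(R[j][i] for j in range(3)) for i in range(3))


def rot_z(theta: float) -> Mat:
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


@dataclass(frozen=True)
class Sim3:
    """Similarity transform p -> s * R @ p + t (illustrative, not the paper's class)."""
    s: float
    R: Mat
    t: Vec

    @staticmethod
    def identity() -> "Sim3":
        return Sim3(1.0, rot_z(0.0), (0.0, 0.0, 0.0))

    def apply(self, p: Vec) -> Vec:
        Rp = mat_vec(self.R, p)
        return tuple(self.s * Rp[i] + self.t[i] for i in range(3))

    def compose(self, other: "Sim3") -> "Sim3":
        # self(other(p)) = s1*s2 * R1*R2 * p + (s1 * R1 * t2 + t1)
        return Sim3(self.s * other.s, mat_mul(self.R, other.R), self.apply(other.t))

    def inverse(self) -> "Sim3":
        # q = s*R*p + t  =>  p = (1/s) * R^T * q - (1/s) * R^T * t
        Rt = transpose(self.R)
        inv_s = 1.0 / self.s
        Rtt = mat_vec(Rt, self.t)
        return Sim3(inv_s, Rt, tuple(-inv_s * x for x in Rtt))


def loop_residual(edges: Sequence[Sim3]) -> Tuple[float, float, float]:
    """Chain the relative transforms around a loop and measure how far the
    result is from the identity: (log-scale, rotation angle, translation norm)."""
    total = Sim3.identity()
    for edge in edges:
        total = total.compose(edge)
    trace = sum(total.R[i][i] for i in range(3))
    angle = math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))
    t_norm = math.sqrt(sum(x * x for x in total.t))
    return math.log(total.s), angle, t_norm


if __name__ == "__main__":
    # Toy square trajectory: move 1 unit forward, turn 90 degrees, four times.
    step = (1.0, 0.0, 0.0)
    clean = [Sim3(1.0, rot_z(math.pi / 2), step)] * 4
    drifting = [Sim3(1.02, rot_z(math.pi / 2), step)] * 4  # 2% scale drift per edge
    print("clean loop residual:   ", loop_residual(clean))
    print("drifting loop residual:", loop_residual(drifting))
```

In this toy example the drift-free loop closes exactly, while a small per-edge scale error leaves a non-zero scale and translation residual. A pose graph optimizer would distribute such a residual over the edges of the loop; an SE(3) graph has no scale variable with which to absorb it, which is the usual motivation for optimizing monocular trajectories over Sim(3).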

Citation

@misc{zhang2025vistaslam,
      title={{ViSTA-SLAM}: Visual {SLAM} with Symmetric Two-view Association}, 
      author={Ganlin Zhang and Shenhan Qian and Xi Wang and Daniel Cremers},
      year={2025},
      eprint={2509.01584},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2509.01584}, 
}