The Pulse of Motion: Measuring Physical Frame Rate from Visual Dynamics
Abstract
Generative video models suffer from temporal ambiguity because they are trained on videos with inconsistent frame rates; this work introduces a Visual Chronometer that estimates physical frames per second directly from visual dynamics.
While recent generative video models have achieved remarkable visual realism and are being explored as world models, true physical simulation requires mastering both space and time. Current models can produce visually smooth kinematics, yet they lack a reliable internal motion pulse to ground that motion in a consistent, real-world time scale. This temporal ambiguity stems from the common practice of indiscriminately training on videos with vastly different real-world speeds that have been forced into standardized frame rates. The result is what we term chronometric hallucination: generated sequences exhibit ambiguous, unstable, and uncontrollable physical motion speeds. To address this, we propose the Visual Chronometer, a predictor that recovers the Physical Frames Per Second (PhyFPS) directly from the visual dynamics of an input video. Trained via controlled temporal resampling, our method estimates the true temporal scale implied by the motion itself, bypassing unreliable metadata. To systematically quantify the issue, we establish two benchmarks, PhyFPS-Bench-Real and PhyFPS-Bench-Gen. Our evaluations reveal a harsh reality: state-of-the-art video generators suffer from severe PhyFPS misalignment and temporal instability. Finally, we demonstrate that applying PhyFPS corrections significantly improves the human-perceived naturalness of AI-generated videos. Our project page is https://xiangbogaobarry.github.io/Visual_Chronometer/.
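The training signal behind controlled temporal resampling is easy to sketch. Below is a minimal, illustrative Python helper (the name `resample_clip`, the `speed_factor` parameter, and the exact label formula are our assumptions, not the paper's released implementation): subsampling a clip at a given stride speeds up its apparent motion, and the frame rate at which the resampled clip would play back at real-world speed becomes the PhyFPS supervision target.

```python
import numpy as np

def resample_clip(frames: np.ndarray, native_fps: float,
                  speed_factor: float, num_out: int = 16):
    """Temporally resample a (T, H, W, C) clip and derive its PhyFPS label.

    Keeping frames at stride `speed_factor` makes on-screen motion look
    `speed_factor`x faster. The frame rate that would replay the resampled
    clip at true real-world speed is therefore native_fps / speed_factor,
    which serves as the supervision target for a PhyFPS predictor.
    """
    idx = np.round(np.arange(num_out) * speed_factor).astype(int)
    idx = np.clip(idx, 0, len(frames) - 1)           # stay inside the clip
    phyfps_label = native_fps / speed_factor         # e.g. 30 fps / 2 = 15
    return frames[idx], phyfps_label

# Example: a 30 fps source subsampled at stride 2 yields a label of 15;
# played back at 30 fps it would show motion at twice real-world speed.
frames = np.zeros((120, 64, 64, 3), dtype=np.uint8)  # dummy 4-second clip
clip, label = resample_clip(frames, native_fps=30.0, speed_factor=2.0)
assert label == 15.0 and clip.shape[0] == 16
```

Sweeping `speed_factor` over a range during training exposes the model to many implied temporal scales from the same source footage, which is what lets it estimate PhyFPS from motion alone rather than from container metadata.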
Community
timely and important work!
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- PhysAlign: Physics-Coherent Image-to-Video Generation through Feature and 3D Representation Alignment (2026)
- DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning (2026)
- EgoForge: Goal-Directed Egocentric World Simulator (2026)
- 3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation (2026)
- Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding (2026)
- Physion-Eval: Evaluating Physical Realism in Generated Video via Human Reasoning (2026)
- Egocentric World Model for Photorealistic Hand-Object Interaction Synthesis (2026)