Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning Paper • 2601.06943 • Published 2 days ago • 128
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios Paper • 2507.20198 • Published Jul 27, 2025 • 27
SSR Collection Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning • 6 items • Updated Jul 7, 2025 • 2
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published May 27, 2025 • 109
HoliTom: Holistic Token Merging for Fast Video Large Language Models Paper • 2505.21334 • Published May 27, 2025 • 21
PiTe: Pixel-Temporal Alignment for Large Video-Language Model Paper • 2409.07239 • Published Sep 11, 2024 • 15
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning Paper • 2505.12448 • Published May 18, 2025 • 10