Papers - Video - Understanding
Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding
Paper • 2403.09626
VideoAgent: Long-form Video Understanding with Large Language Model as Agent
Paper • 2403.10517
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Paper • 2403.13501
LITA: Language Instructed Temporal-Localization Assistant
Paper • 2403.19046
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Paper • 2404.03413
Pegasus-v1 Technical Report
Paper • 2404.14687
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Paper • 2406.08407
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Paper • 2407.03320
LLaVA-OneVision: Easy Visual Task Transfer
Paper • 2408.03326
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper • 2412.10360