Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published 13 days ago • 82
StereoPilot: Learning Unified and Efficient Stereo Conversion via Generative Priors Paper • 2512.16915 • Published 13 days ago • 37
DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation Paper • 2511.19365 • Published Nov 24, 2025 • 63
UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios Paper • 2511.18050 • Published Nov 22, 2025 • 37
FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution Paper • 2510.12747 • Published Oct 14, 2025 • 37
UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist Paper • 2511.08521 • Published Nov 11, 2025 • 37
VideoSSR: Video Self-Supervised Reinforcement Learning Paper • 2511.06281 • Published Nov 9, 2025 • 24
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 210
Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs Paper • 2510.24514 • Published Oct 28, 2025 • 21
AlphaFlow: Understanding and Improving MeanFlow Models Paper • 2510.20771 • Published Oct 23, 2025 • 7
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing Paper • 2510.19808 • Published Oct 22, 2025 • 29
MoGA: Mixture-of-Groups Attention for End-to-End Long Video Generation Paper • 2510.18692 • Published Oct 21, 2025 • 40
MUG-V 10B: High-efficiency Training Pipeline for Large Video Generation Models Paper • 2510.17519 • Published Oct 20, 2025 • 9
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset Paper • 2510.15742 • Published Oct 17, 2025 • 50
BLIP3o-NEXT: Next Frontier of Native Image Generation Paper • 2510.15857 • Published Oct 17, 2025 • 24