- Controlling Multimodal LLMs via Reward-guided Decoding — arXiv:2508.11616, published Aug 15, 2025
- JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse — arXiv:2503.16365, published Mar 20, 2025
- SRMT: Shared Memory for Multi-agent Lifelong Pathfinding — arXiv:2501.13200, published Jan 22, 2025
- Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos — arXiv:2501.13826, published Jan 23, 2025
- Video Depth Anything: Consistent Depth Estimation for Super-Long Videos — arXiv:2501.12375, published Jan 21, 2025
- 3DIS-FLUX: Simple and Efficient Multi-Instance Generation with DiT Rendering — arXiv:2501.05131, published Jan 9, 2025
- MangaNinja: Line Art Colorization with Precise Reference Following — arXiv:2501.08332, published Jan 14, 2025
- MiniMax-01: Scaling Foundation Models with Lightning Attention — arXiv:2501.08313, published Jan 14, 2025
- Towards Best Practices for Open Datasets for LLM Training — arXiv:2501.08365, published Jan 14, 2025
- AnyDressing: Customizable Multi-Garment Virtual Dressing via Latent Diffusion Models — arXiv:2412.04146, published Dec 5, 2024
- MotionShop: Zero-Shot Motion Transfer in Video Diffusion Models with Mixture of Score Guidance — arXiv:2412.05355, published Dec 6, 2024
- DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation — arXiv:2412.07589, published Dec 10, 2024