Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2601.22060

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Paper • 2601.22060 • Published 13 days ago • 149

GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21, 2025 • 133
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7, 2025 • 141
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21, 2025 • 64
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22, 2025 • 160

about 20 hours ago

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 169
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

Paper • 2505.22453 • Published May 28, 2025 • 46
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

Paper • 2505.23380 • Published May 29, 2025 • 22
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Paper • 2505.21523 • Published May 23, 2025 • 13

facebook/w2v-bert-2.0

Feature Extraction • 0.6B • Updated Jan 25, 2024 • 2.8M • 202
facebook/metaclip-h14-fullcc2.5b

Zero-Shot Image Classification • 1.0B • Updated Jan 11, 2024 • 22.6k • 49
openai/clip-vit-large-patch14

Zero-Shot Image Classification • 0.4B • Updated Sep 15, 2023 • 7.99M • 1.96k
Salesforce/blip-image-captioning-large

Image-to-Text • 0.5B • Updated Feb 3, 2025 • 684k • 1.45k

Vision-DeepResearch

Osilly/Vision-DeepResearch-Toy-SFT-Data

Viewer • Updated 10 days ago • 1k • 78
Osilly/Vision-DeepResearch-Toy-RL-Data

Viewer • Updated 10 days ago • 1k • 34
Osilly/VDR-Bench

Viewer • Updated 9 days ago • 2k • 49
Osilly/VDR-Bench-testmini

Viewer • Updated 9 days ago • 500 • 26

Applications and Uses

ComfyUI-R1: Exploring Reasoning Models for Workflow Generation

Paper • 2506.09790 • Published Jun 11, 2025 • 53
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance

Paper • 2506.06444 • Published Jun 6, 2025 • 73
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

Paper • 2506.11763 • Published Jun 13, 2025 • 74
Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research

Paper • 2502.04644 • Published Feb 7, 2025 • 4

Multimodal Agent

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25, 2025 • 30
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18, 2025 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49

Vision-DeepResearch: Incentivizing DeepResearch Capability in Multimodal Large Language Models

Paper • 2601.22060 • Published 13 days ago • 149

Vision-DeepResearch

Osilly/Vision-DeepResearch-Toy-SFT-Data

Viewer • Updated 10 days ago • 1k • 78
Osilly/Vision-DeepResearch-Toy-RL-Data

Viewer • Updated 10 days ago • 1k • 34
Osilly/VDR-Bench

Viewer • Updated 9 days ago • 2k • 49
Osilly/VDR-Bench-testmini

Viewer • Updated 9 days ago • 500 • 26

GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21, 2025 • 133
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7, 2025 • 141
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21, 2025 • 64
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22, 2025 • 160

Applications and Uses

ComfyUI-R1: Exploring Reasoning Models for Workflow Generation

Paper • 2506.09790 • Published Jun 11, 2025 • 53
Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance

Paper • 2506.06444 • Published Jun 6, 2025 • 73
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents

Paper • 2506.11763 • Published Jun 13, 2025 • 74
Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research

Paper • 2502.04644 • Published Feb 7, 2025 • 4

about 20 hours ago

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26, 2025 • 169
Unsupervised Post-Training for Multi-Modal LLM Reasoning via GRPO

Paper • 2505.22453 • Published May 28, 2025 • 46
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning

Paper • 2505.23380 • Published May 29, 2025 • 22
More Thinking, Less Seeing? Assessing Amplified Hallucination in Multimodal Reasoning Models

Paper • 2505.21523 • Published May 23, 2025 • 13

Multimodal Agent

Gemini Robotics: Bringing AI into the Physical World

Paper • 2503.20020 • Published Mar 25, 2025 • 30
Magma: A Foundation Model for Multimodal AI Agents

Paper • 2502.13130 • Published Feb 18, 2025 • 58
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 51
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents

Paper • 2410.23218 • Published Oct 30, 2024 • 49

facebook/w2v-bert-2.0

Feature Extraction • 0.6B • Updated Jan 25, 2024 • 2.8M • 202
facebook/metaclip-h14-fullcc2.5b

Zero-Shot Image Classification • 1.0B • Updated Jan 11, 2024 • 22.6k • 49
openai/clip-vit-large-patch14

Zero-Shot Image Classification • 0.4B • Updated Sep 15, 2023 • 7.99M • 1.96k
Salesforce/blip-image-captioning-large

Image-to-Text • 0.5B • Updated Feb 3, 2025 • 684k • 1.45k

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs