- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 189
- COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
  Paper • 2401.00849 • Published • 17
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
  Paper • 2311.00571 • Published • 43
Collections
Collections including paper arxiv:2411.19930

- On Domain-Specific Post-Training for Multimodal Large Language Models
  Paper • 2411.19930 • Published • 31
- VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
  Paper • 2412.10704 • Published • 16
- Multi-task retriever fine-tuning for domain-specific and efficient RAG
  Paper • 2501.04652 • Published • 10
- M-A-D/Mixed-Arabic-Datasets-Repo
  Viewer • Updated • 209M • 571 • 38

- On Domain-Specific Post-Training for Multimodal Large Language Models
  Paper • 2411.19930 • Published • 31
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 81
- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 96

- instruction-pretrain/finance-Llama3-8B
  Text Generation • 8B • Updated • 230 • 74
- AdaptLLM/finance-chat
  Text Generation • 7B • Updated • 1.09k • 100
- On Domain-Specific Post-Training for Multimodal Large Language Models
  Paper • 2411.19930 • Published • 31
- HuggingFaceM4/Idefics3-8B-Llama3
  Image-Text-to-Text • 8B • Updated • 137k • 302

- On Domain-Specific Post-Training for Multimodal Large Language Models
  Paper • 2411.19930 • Published • 31
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 113
- Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer
  Paper • 2503.02495 • Published • 9
- Fine-Tuning Small Language Models for Domain-Specific AI: An Edge AI Perspective
  Paper • 2503.01933 • Published • 13

- PUMA: Empowering Unified MLLM with Multi-granular Visual Generation
  Paper • 2410.13861 • Published • 56
- JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation
  Paper • 2411.07975 • Published • 31
- Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
  Paper • 2411.10442 • Published • 87
- Multimodal Autoregressive Pre-training of Large Vision Encoders
  Paper • 2411.14402 • Published • 47