Deping Zhang
Deping
AI & ML interests
Deep Reinforcement Learning, Computer Vision, Large Language Models (especially their "emergent" capabilities), Theoretical Condensed Matter Physics (superconductivity, ferromagnetism)
Organizations
None yet
LLM_VLM_R1
- Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning
  Paper • 2502.19655 • Published
- MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
  Paper • 2502.19634 • Published • 63
- R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
  Paper • 2502.19735 • Published • 9
- AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
  Paper • 2502.14669 • Published • 15
LLM_Infra
Video_MLLMS
VisionExpertModels
- facebook/dinov2-giant
  Image Feature Extraction • 1B • Updated • 106k • 54
- openai/clip-vit-large-patch14-336
  Zero-Shot Image Classification • Updated • 4.83M • 282
- microsoft/layoutlmv3-large
  Updated • 61.8k • 124
- laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft-soup
  Zero-Shot Image Classification • Updated • 73.2k • 21
LLMs
VLMS
- PsiPi/liuhaotian_llava-v1.5-13b-GGUF
  Image-Text-to-Text • 13B • Updated • 740 • 37
- TRI-ML/prismatic-vlms
  Image-to-Text • Updated • 26
- bczhou/tiny-llava-v1-hf
  Image-Text-to-Text • 1B • Updated • 1.49k • 57
- ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
  Paper • 2402.06118 • Published • 15
VideoEncoder
Video Understanding, Video Embedding, Video Tasks
- Video as the New Language for Real-World Decision Making
  Paper • 2402.17139 • Published • 21
- VideoPrism: A Foundational Visual Encoder for Video Understanding
  Paper • 2402.13217 • Published • 38
- World Model on Million-Length Video And Language With RingAttention
  Paper • 2402.08268 • Published • 40
- microsoft/xclip-base-patch16-zero-shot
  Video Classification • 0.2B • Updated • 13.3k • 26
VLM_Datasets
GeneralDetector
MM_Datasets