HuggingFaceTB/SmolVLM2-500M-Video-Instruct • Image-Text-to-Text • 0.5B • Updated Apr 8, 2025 • 79.9k • 112
microsoft/Phi-4-multimodal-instruct • Automatic Speech Recognition • 6B • Updated about 1 month ago • 183k • 1.56k