meta-llama/Llama-3.2-11B-Vision-Instruct Image-Text-to-Text • 11B • Updated Dec 4, 2024 • 144k • 1.59k
meta-llama/Llama-3.2-90B-Vision-Instruct Image-Text-to-Text • 89B • Updated Mar 4, 2025 • 12.3k • 355
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated Dec 10, 2025 • 348k • 1.59k
meta-llama/Llama-4-Maverick-17B-128E-Instruct Image-Text-to-Text • 402B • Updated May 22, 2025 • 30.6k • 479
Multimodal RAG with Granite Vision 🚀 Running on Zero • Agents • 39 • RAG example using Granite [vision, embedding, instruct]
MatchAnything 🏢 Running on Zero • Agents • Featured • 255 • Match images to find similar pictures instantly