Visual Question Answering
Transformers
Safetensors
English
videollama3_qwen2
text-generation
multi-modal
large-language-model
video-language-model
custom_code
Instructions to use DAMO-NLP-SG/VideoLLaMA3-7B-Image with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use DAMO-NLP-SG/VideoLLaMA3-7B-Image with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("visual-question-answering", model="DAMO-NLP-SG/VideoLLaMA3-7B-Image", trust_remote_code=True)

# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("DAMO-NLP-SG/VideoLLaMA3-7B-Image", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
{
  "auto_map": {
    "AutoImageProcessor": "image_processing_videollama3.Videollama3ImageProcessor",
    "AutoProcessor": "processing_videollama3.Videollama3Qwen2Processor"
  },
  "do_convert_rgb": true,
  "do_normalize": true,
  "do_rescale": true,
  "do_resize": true,
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_processor_type": "Videollama3ImageProcessor",
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "max_tokens": 16384,
  "min_tokens": 16,
  "patch_size": 14,
  "processor_class": "Videollama3Qwen2Processor",
  "resample": 3,
  "rescale_factor": 0.00392156862745098
}
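To make the numeric fields in the config above concrete, here is a minimal Python sketch of what they imply for a single pixel and for the patch grid. It assumes the conventional order used by standard Transformers image processors (rescale by `rescale_factor`, then apply `(x - mean) / std`); the custom `Videollama3ImageProcessor` code may differ in detail. The helper names `normalize_pixel` and `patch_grid` are hypothetical and do not come from the repository, and the sketch does not model how `min_tokens` and `max_tokens` bound the resized image. For reference, `resample: 3` is the integer value of Pillow's bicubic filter.

```python
import json

# Subset of the preprocessor config shown above.
CONFIG = json.loads("""
{
  "image_mean": [0.5, 0.5, 0.5],
  "image_std": [0.5, 0.5, 0.5],
  "rescale_factor": 0.00392156862745098,
  "patch_size": 14
}
""")


def normalize_pixel(rgb, config=CONFIG):
    """Rescale 8-bit channel values to [0, 1], then apply (x - mean) / std.

    Assumption: rescaling happens before normalization, as in the
    standard Transformers image processors.
    """
    scale = config["rescale_factor"]  # approximately 1 / 255
    return [
        (value * scale - mean) / std
        for value, mean, std in zip(rgb, config["image_mean"], config["image_std"])
    ]


def patch_grid(height, width, config=CONFIG):
    """Hypothetical helper: count whole patch_size x patch_size patches."""
    p = config["patch_size"]
    return height // p, width // p


# Black maps to about -1 and full intensity to about +1 on each channel.
print(normalize_pixel([0, 255, 128]))

# A 448 x 448 image splits into a 32 x 32 grid of 14-pixel patches.
print(patch_grid(448, 448))
```

With a mean and standard deviation of 0.5 on every channel, the two steps together map 8-bit values from [0, 255] to roughly [-1, 1].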