How to use prithivMLmods/DREX-062225-exp with libraries and local apps.

Libraries

How to use prithivMLmods/DREX-062225-exp with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="prithivMLmods/DREX-062225-exp")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("prithivMLmods/DREX-062225-exp")
model = AutoModelForImageTextToText.from_pretrained("prithivMLmods/DREX-062225-exp")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use prithivMLmods/DREX-062225-exp with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "prithivMLmods/DREX-062225-exp"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/DREX-062225-exp",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
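
The same request can also be sent from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on `localhost:8000`. The helper names `build_payload` and `post_chat` are illustrative and not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "prithivMLmods/DREX-062225-exp"


def build_payload(prompt: str, image_url: str, model: str = MODEL_ID) -> dict:
    """Build an OpenAI-style chat request with one text part and one image part."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def post_chat(payload: dict, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to /chat/completions and return the first reply's text."""
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server, so it is left commented out):
# payload = build_payload(
#     "Describe this image in one sentence.",
#     "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
# )
# print(post_chat(payload))
```

For a server on a different port, such as the SGLang setup on port 30000, pass `base_url="http://localhost:30000/v1"`.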

Use Docker

docker model run hf.co/prithivMLmods/DREX-062225-exp

SGLang

How to use prithivMLmods/DREX-062225-exp with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "prithivMLmods/DREX-062225-exp" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "prithivMLmods/DREX-062225-exp",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "prithivMLmods/DREX-062225-exp" \
        --host 0.0.0.0 \
        --port 30000

Docker Model Runner
How to use prithivMLmods/DREX-062225-exp with Docker Model Runner:
```
docker model run hf.co/prithivMLmods/DREX-062225-exp
```

DREX-062225-exp

DREX-062225-exp (Document Retrieval and Extraction eXpert) is an experimental fine-tune of docscopeOCR-7B-050425-exp, optimized for document retrieval, content extraction, and analysis recognition. It is built on the Qwen2.5-VL architecture and was trained on the Opendoc2-Analysis-Recognition dataset to improve document analysis and information extraction.

DREX: Document Retrieval and Extraction eXpert [ experimental ]

Key Enhancements

- Advanced Document Retrieval: Specialized capabilities for locating and retrieving specific information from complex document structures and layouts.
- Enhanced Content Extraction: Optimized for extracting structured data, key information, and relevant content from diverse document types including reports, forms, and technical documentation.
- Superior Analysis Recognition: Fine-tuned recognition abilities for document analysis tasks, pattern identification, and contextual understanding of document hierarchies.
- Inherited OCR Excellence: Maintains all advanced OCR capabilities from the base docscopeOCR model including mathematical LaTeX formatting and multi-language support.
- Document-Centric Understanding: Specialized training for understanding document relationships, cross-references, and contextual dependencies within complex document sets.

Quick Start with Transformers

# Requires: pip install transformers qwen-vl-utils
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "prithivMLmods/DREX-062225-exp", torch_dtype="auto", device_map="auto"
)

processor = AutoProcessor.from_pretrained("prithivMLmods/DREX-062225-exp")

messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg",
            },
            {"type": "text", "text": "Extract and analyze the key information from this document."},
        ],
    }
]

text = processor.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
# Move the inputs to the device chosen for the model by device_map="auto".
inputs = inputs.to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
generated_ids_trimmed = [
    out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
output_text = processor.batch_decode(
    generated_ids_trimmed, skip_special_tokens=True, clean_up_tokenization_spaces=False
)
print(output_text)
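
The Quick Start reads its image from a URL. For documents stored on disk, a minimal sketch follows. It assumes that `process_vision_info` from `qwen_vl_utils` accepts `file://` URIs in the `image` field, as described in the upstream Qwen2.5-VL usage notes. `build_messages` is a hypothetical helper and `scanned_invoice.png` is a placeholder file name.

```python
from pathlib import Path


def build_messages(image_path: str, prompt: str) -> list:
    """Return a single-turn chat message that pairs a local image with a prompt."""
    # Resolve to an absolute path and convert it to a file:// URI.
    image_uri = Path(image_path).expanduser().resolve().as_uri()
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image_uri},
                {"type": "text", "text": prompt},
            ],
        }
    ]


# "scanned_invoice.png" is a placeholder, not a file shipped with the model.
messages = build_messages(
    "scanned_invoice.png",
    "Extract and analyze the key information from this document.",
)
print(messages[0]["content"][0]["image"])
```

The returned list can be passed to `processor.apply_chat_template` and `process_vision_info` in place of the `messages` list used in the Quick Start.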

Training Details

| Parameter | Value |
|---|---|
| Dataset | Opendoc2-Analysis-Recognition |
| Dataset Size | 6,910 samples |
| Base Model | docscopeOCR-7B-050425-exp |
| Model Architecture | `Qwen2_5_VLForConditionalGeneration` |
| Hardware | 2 × A40 (19 vCPUs) |
| Total Disk | 280,000 MB |
| Training Time | 3,407 seconds (~0.95 hours) |
| Warmup Steps | 250 |
| Precision | bfloat16 |

This model builds upon the robust foundation of docscopeOCR-7B-050425-exp with specialized training for document retrieval and extraction tasks.

Intended Use

This model is specifically designed for:

- Document Retrieval: Efficiently locating specific information within large document collections and complex layouts.
- Content Extraction: Precise extraction of structured data, tables, forms, and key information from various document types.
- Analysis Recognition: Advanced recognition and analysis of document patterns, structures, and contextual relationships.
- Enterprise Document Processing: Automated processing of business documents, reports, contracts, and administrative forms.
- Research Document Analysis: Academic paper analysis, citation extraction, and research document comprehension.
- Regulatory Compliance: Processing of compliance documents, regulatory filings, and standardized reporting formats.

Limitations

- Inherits computational requirements from the base docscopeOCR model, requiring substantial resources for optimal performance.
- Performance may vary on document types significantly different from the Opendoc2-Analysis-Recognition training dataset.
- May show reduced accuracy on extremely specialized or domain-specific document formats not covered in training.
- Long document processing requires adequate memory allocation and may not be suitable for real-time streaming applications.
- Optimal performance depends on proper visual token configuration and input preprocessing.
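
The visual token configuration mentioned in the last limitation can be sketched as follows. This assumes the Qwen2.5-VL processor convention that one visual token covers a 28 × 28 pixel area, and that `AutoProcessor.from_pretrained` accepts `min_pixels` and `max_pixels`, as in the upstream Qwen2.5-VL examples. The 256 to 1280 token range is an illustrative choice, not a setting recommended by this model card, and `pixel_budget` and `load_processor` are hypothetical helper names.

```python
PIXELS_PER_TOKEN = 28 * 28  # assumed: one visual token per 28x28 pixel area


def pixel_budget(min_tokens: int, max_tokens: int) -> dict:
    """Convert a visual-token range into min_pixels/max_pixels keyword arguments."""
    if not 0 < min_tokens <= max_tokens:
        raise ValueError("expected 0 < min_tokens <= max_tokens")
    return {
        "min_pixels": min_tokens * PIXELS_PER_TOKEN,
        "max_pixels": max_tokens * PIXELS_PER_TOKEN,
    }


def load_processor(
    model_id: str = "prithivMLmods/DREX-062225-exp",
    min_tokens: int = 256,
    max_tokens: int = 1280,
):
    """Load the processor with a bounded image resolution (downloads model files)."""
    from transformers import AutoProcessor  # imported lazily; needs `transformers`

    return AutoProcessor.from_pretrained(model_id, **pixel_budget(min_tokens, max_tokens))


print(pixel_budget(256, 1280))
# → {'min_pixels': 200704, 'max_pixels': 1003520}
```

Lowering `max_tokens` reduces memory use on long or high-resolution pages at the cost of fine detail, which matters most for small print and dense tables.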

References

- Base Model: docscopeOCR-7B-050425-exp https://huggingface.co/prithivMLmods/docscopeOCR-7B-050425-exp
- DocVLM: Make Your VLM an Efficient Reader https://arxiv.org/pdf/2412.08746v1
- YaRN: Efficient Context Window Extension of Large Language Models https://arxiv.org/pdf/2309.00071
- Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution https://arxiv.org/pdf/2409.12191
- Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond https://arxiv.org/pdf/2308.12966
- A Comprehensive and Challenging OCR Benchmark for Evaluating Large Multimodal Models in Literacy https://arxiv.org/pdf/2412.02210