Model card for keras-dots-ocr-finetuned-v1

This model is a fine-tuned version of the Dots OCR model, specialized for extracting drug names from images of prescriptions. It was trained on 6k prescription images whose drug names were annotated and verified by pharmacists.

Model Details

Model Type: Vision-Language Model
Base Model: Dots OCR
Fine-tuning Method: Supervised Fine-Tuning (SFT)
Training Data: 6k images of prescriptions with annotated drug names
Intended Use Case: Extracting drug names from prescription images for pharmacy and healthcare applications

Requirements

transformers == 4.51.3
torch == 2.7.0
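torchvision < 0.23.0
qwen-vl-utils (provides the qwen_vl_utils module imported in the usage code; no version is specified)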
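A quick way to confirm that an environment matches the pins listed above is to compare installed versions before loading the model. The following is a minimal sketch using only the standard library; the helper names (`version_tuple`, `satisfies`, `check_environment`) are hypothetical, and the comparison is deliberately simplified: it looks only at the first three numeric components and ignores local build tags such as `+cu126` and pre-release suffixes.

```python
from importlib import metadata
import re

# Pins copied from the Requirements list: (package, operator, version).
PINS = [
    ("transformers", "==", "4.51.3"),
    ("torch", "==", "2.7.0"),
    ("torchvision", "<", "0.23.0"),
]


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '2.7.0+cu126' into (2, 7, 0); local build tags are ignored."""
    public = version.split("+", 1)[0]
    return tuple(int(part) for part in re.findall(r"\d+", public)[:3])


def satisfies(installed: str, op: str, required: str) -> bool:
    """Support only the two operators used in PINS: '==' and '<'."""
    have, want = version_tuple(installed), version_tuple(required)
    return have == want if op == "==" else have < want


def check_environment() -> list[str]:
    """Return a list of human-readable problems; empty means all pins hold."""
    problems = []
    for package, op, required in PINS:
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if not satisfies(installed, op, required):
            problems.append(f"{package} {installed} does not satisfy {op} {required}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

For stricter, PEP 440-aware comparisons, a dedicated version-parsing library would be a better fit than this simplified tuple comparison.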

Usage

This model expects a very specific prompt for extracting drug names. Do not change the prompt structure, even slightly, as doing so may lead to suboptimal results.

from transformers import AutoModelForCausalLM, AutoProcessor
import torch
from qwen_vl_utils import process_vision_info


model = AutoModelForCausalLM.from_pretrained(
    "KeraCare/keras-dots-ocr-finetuned-v1",
    trust_remote_code=True,
    attn_implementation="sdpa", # If the GPU supports flash attention, use "flash_attention_2"
    torch_dtype=torch.bfloat16,
).to("cuda")

processor = AutoProcessor.from_pretrained(
    "KeraCare/keras-dots-ocr-finetuned-v1",
    trust_remote_code=True,
)

# Prepare inputs
prompt = """
You are an assistant that extracts drug names from prescription images (primarily French, sometimes English), even if noisy, blurry, or with background clutter.

Rules: return only drug names. Normalize spelling to the closest valid INN/brand as written (e.g., preserve brand vs. generic identity and combination names), deduplicate, and sort the drug names in lexical order. Do not invent or map to equivalents; if none are found, return an empty list.
Strip accent marks and special characters, and convert to lowercase.

Output strict JSON only in the following format:

{
    "drug_names": [
        "<drug_name_1>",
        "<drug_name_2>",
    ]
}
"""
image_path = "path_to_your_image.jpg"

messages = [
    {"role": "system", "content": prompt},
    {"role": "user", "content": [{"type": "image", "image": image_path}]},
]

text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=False  # training on full dialogue
)

images, _ = process_vision_info(messages)

inputs = processor(
    text=[text],
    images=images,
    padding=True,
    return_tensors="pt",
).to("cuda")

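with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        do_sample=False,  # greedy, deterministic decoding
        repetition_penalty=1.1,
        eos_token_id=processor.tokenizer.eos_token_id,
    )

# Decode only the newly generated tokens (everything after the prompt)
generated_ids = outputs[:, inputs["input_ids"].shape[1]:]
response = processor.tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)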
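
Because the prompt asks for strict JSON with a `drug_names` list, the generated text can be turned into a Python list once it has been decoded to a string. The sketch below is one possible post-processing step, not part of the published usage code. The function names are hypothetical. It assumes the model may echo the trailing comma shown in the prompt's format example, so it removes trailing commas before parsing. It also re-applies the prompt's normalization rules (lowercase, no accents, no special characters, deduplicated, sorted) as a safety net; deleting special characters outright, rather than replacing them with spaces, is a guess at what "strip" means in the prompt.

```python
import json
import re
import unicodedata


def extract_json_object(raw_text: str) -> dict:
    """Parse the span from the first '{' to the last '}' as JSON; {} on failure."""
    start = raw_text.find("{")
    end = raw_text.rfind("}")
    if start == -1 or end <= start:
        return {}
    candidate = raw_text[start : end + 1]
    # Assumption: tolerate a trailing comma before ']' or '}', as in the
    # prompt's format example. This would also alter a string containing ",]".
    candidate = re.sub(r",\s*([\]}])", r"\1", candidate)
    try:
        parsed = json.loads(candidate)
    except json.JSONDecodeError:
        return {}
    return parsed if isinstance(parsed, dict) else {}


def normalize_name(name: str) -> str:
    """Lowercase, strip accent marks, and delete special characters."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]+", "", no_accents.lower())
    return " ".join(cleaned.split())


def parse_drug_names(raw_text: str) -> list[str]:
    """Return a sorted, deduplicated list of normalized drug names."""
    names = extract_json_object(raw_text).get("drug_names", [])
    if not isinstance(names, list):
        return []
    normalized = {normalize_name(n) for n in names if isinstance(n, str)}
    normalized.discard("")
    return sorted(normalized)
```

Returning an empty list for unparseable output mirrors the prompt's rule for prescriptions with no drug names, so callers that need to tell the two cases apart should check the raw text as well.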

Model Card Authors

Mitiku Yohannes (kmitiku@kera.health)

Model Card Contact

For questions or issues regarding this model, please contact Mitiku Yohannes at kmitiku@kera.health.

Model size: 3B params
Tensor type: BF16 (Safetensors)

Model tree for KeraCare/keras-dots-ocr-finetuned-v1

Base model: rednote-hilab/dots.ocr
Fine-tuned model: KeraCare/keras-dots-ocr-finetuned-v1 (this model)