Model card for keras-dots-ocr-finetuned-v1
This model is a fine-tuned version of the Dots OCR model, trained specifically to extract drug names from images of prescriptions. It was trained on 6k prescription images whose drug names were annotated and verified by pharmacists.
Model Details
- Model Type: Vision-Language Model
- Base Model: Dots OCR
- Fine-tuning Method: Supervised Fine-Tuning (SFT)
- Training Data: 6k images of prescriptions with annotated drug names
- Intended Use Case: Extracting drug names from prescription images for pharmacy and healthcare applications
Requirements
- transformers == 4.51.3
- torch == 2.7.0
- torchvision < 0.23.0
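The pins above can be checked before the model is loaded. The following is a minimal sketch that uses only the standard library; the helper names (`release_tuple`, `satisfies`, `check_pins`) are hypothetical and not part of this model's code. It compares only the numeric release part of each version, so local suffixes such as `+cu126` and pre-release tags are ignored.

```python
from importlib import metadata

# Pins copied from the Requirements list: (package, operator, version).
PINS = [
    ("transformers", "==", "4.51.3"),
    ("torch", "==", "2.7.0"),
    ("torchvision", "<", "0.23.0"),
]


def release_tuple(version: str) -> tuple:
    """Turn '2.7.0+cu126' into (2, 7, 0), keeping only leading digits of each part."""
    public = version.split("+", 1)[0]
    parts = []
    for piece in public.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies(installed: str, op: str, required: str) -> bool:
    """Check a single '==' or '<' pin against an installed version string."""
    have, want = release_tuple(installed), release_tuple(required)
    # Pad with zeros so that '0.23' and '0.23.0' compare as equal.
    width = max(len(have), len(want))
    have += (0,) * (width - len(have))
    want += (0,) * (width - len(want))
    if op == "==":
        return have == want
    if op == "<":
        return have < want
    raise ValueError(f"unsupported operator: {op}")


def check_pins(pins=PINS) -> list:
    """Return human-readable problems; an empty list means every pin holds."""
    problems = []
    for name, op, required in pins:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed (need {op} {required})")
            continue
        if not satisfies(installed, op, required):
            problems.append(f"{name} {installed} does not satisfy {op} {required}")
    return problems


if __name__ == "__main__":
    for line in check_pins():
        print(line)
```

Running the script prints one line per unmet pin and nothing when the environment matches.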
Usage
This model uses a very specific prompt to extract drug names. Do not change the prompt structure, even slightly, as doing so may lead to suboptimal results.
from transformers import AutoModelForCausalLM, AutoProcessor
import torch
# qwen_vl_utils is provided by the separate qwen-vl-utils package
from qwen_vl_utils import process_vision_info

# Load the fine-tuned model in bfloat16 and move it to the GPU
model = AutoModelForCausalLM.from_pretrained(
    "KeraCare/keras-dots-ocr-finetuned-v1",
    trust_remote_code=True,
    attn_implementation="sdpa",  # If the GPU supports flash attention, use "flash_attention_2"
    torch_dtype=torch.bfloat16,
).to("cuda")
processor = AutoProcessor.from_pretrained(
    "KeraCare/keras-dots-ocr-finetuned-v1",
    trust_remote_code=True,
)
# Prepare inputs
prompt = """
You are an assistant that extracts drug names from prescription images (primarily French, sometimes English), even if noisy, blurry, or with background clutter.
Rules: return only drug names. Normalize spelling to the closest valid INN/brand as written (e.g., preserve brand vs. generic identity and combination names), deduplicate, and sort the drug names in lexical order. Do not invent or map to equivalents; if none are found, return an empty list.
Strip accent marks and special characters, and convert to lowercase.
Output strict JSON only in the following format:
{
"drug_names": [
"<drug_name_1>",
"<drug_name_2>",
]
}
"""
image_path = "path_to_your_image.jpg"
messages = [
    {"role": "system", "content": prompt},
    {"role": "user", "content": [{"type": "image", "image": image_path}]},
]
text = processor.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=False,  # training on full dialogue
)
images, _ = process_vision_info(messages)
inputs = processor(
    text=[text],
    images=images,
    padding=True,
    return_tensors="pt",
).to("cuda")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=512,
        temperature=0.0,
        top_p=0.9,
        repetition_penalty=1.1,
        eos_token_id=processor.tokenizer.eos_token_id,
    )
# Assumed final step (not in the original snippet): decode only the newly generated tokens
generated = outputs[:, inputs["input_ids"].shape[1]:]
result = processor.tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
print(result)
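The prompt asks for strict JSON with a `drug_names` list, but generated text can still arrive with surrounding words or a trailing comma like the one in the format template. The following is a minimal post-processing sketch, assuming the decoded generation is available as a plain string. The helper names (`parse_drug_names`, `normalize`, `clean_drug_names`) are hypothetical and not part of the model's code. The normalization re-applies only the accent stripping, lowercasing, deduplication, and sorting rules from the prompt as a safety net. It does not remove other special characters.

```python
import json
import re
import unicodedata


def parse_drug_names(raw: str) -> list:
    """Extract the "drug_names" list from raw model text; return [] on any failure."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        return []
    candidate = raw[start:end + 1]
    # Drop trailing commas before a closing bracket or brace, which strict JSON
    # rejects. This simple regex assumes drug names do not contain ", ]" or ", }".
    candidate = re.sub(r",\s*([\]}])", r"\1", candidate)
    try:
        payload = json.loads(candidate)
    except json.JSONDecodeError:
        return []
    if not isinstance(payload, dict):
        return []
    names = payload.get("drug_names", [])
    if not isinstance(names, list):
        return []
    return [n for n in names if isinstance(n, str)]


def normalize(name: str) -> str:
    """Strip accent marks, lowercase, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def clean_drug_names(raw: str) -> list:
    """Parse, normalize, deduplicate, and sort the drug names in raw model text."""
    cleaned = {normalize(n) for n in parse_drug_names(raw)}
    cleaned.discard("")
    return sorted(cleaned)
```

Passing the decoded generation text to `clean_drug_names` yields a plain Python list, or an empty list when no valid JSON object is found.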
Model Card Authors
- Mitiku Yohannes (kmitiku@kera.health)
Model Card Contact
For questions or issues regarding this model, please contact Mitiku Yohannes at kmitiku@kera.health.
Model tree for KeraCare/keras-dots-ocr-finetuned-v1
- Base model: rednote-hilab/dots.ocr