
# Qwen3.5-4B GUI Grounding – v1 (SFT LoRA)

LoRA adapter for Qwen3.5-4B fine-tuned on GUI grounding: given a screenshot and a natural-language instruction, the model predicts the (x, y) click coordinate of the target UI element.

## Results – ScreenSpot-V2

| Split   | Accuracy |
|---------|----------|
| Overall | 92.5%    |

## Training Data

~23.5K samples drawn from 3 GUI grounding datasets covering desktop, web, and mobile platforms.

## Output Format

The model emits the predicted click point as:

```
<|box_start|>(x,y)<|box_end|>
```

Coordinates are in [0, 1000] normalized space. To convert to pixel coordinates:

```python
pixel_x = x / 1000 * image_width
pixel_y = y / 1000 * image_height
```
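The parse-and-convert step above can be sketched as a small helper. `parse_click` is a hypothetical function written for this card, not part of the model files; it assumes the output contains exactly one box in the format shown.

```python
import re

def parse_click(output: str, image_width: int, image_height: int):
    """Parse a <|box_start|>(x,y)<|box_end|> string and convert the
    [0, 1000]-normalized point to pixel coordinates."""
    m = re.search(r"<\|box_start\|>\((\d+)\s*,\s*(\d+)\)<\|box_end\|>", output)
    if m is None:
        raise ValueError(f"no box found in output: {output!r}")
    x, y = int(m.group(1)), int(m.group(2))
    # Normalized [0, 1000] -> pixel space
    return x / 1000 * image_width, y / 1000 * image_height

# Example: a 1920x1080 screenshot
px, py = parse_click("<|box_start|>(500,250)<|box_end|>", 1920, 1080)
# px = 960.0, py = 270.0
```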

## Usage

Requires `transformers>=5.2.0` and `peft`.

```python
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration
from peft import PeftModel
import torch

# Load the base model in bfloat16, then attach the LoRA adapter.
base = Qwen3_5ForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3.5-4B", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "dabism23/qwen35-gui-grounding")
processor = AutoProcessor.from_pretrained("Qwen/Qwen3.5-4B")
```
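With the model and processor loaded, inference might look like the sketch below. The chat-template message layout and the instruction phrasing are assumptions (the card does not document the exact prompt format used in training), so treat this as a starting point rather than a verified recipe.

```python
from PIL import Image

# Hypothetical inference sketch; prompt format is an assumption.
image = Image.open("screenshot.png")
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": image},
        {"type": "text", "text": "Click the search button."},
    ],
}]

# Build the prompt with the processor's chat template, then generate.
text = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)

# Decode only the newly generated tokens; the answer should contain
# <|box_start|>(x,y)<|box_end|> per the Output Format section.
answer = processor.batch_decode(
    out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=False
)[0]
print(answer)
```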

## Access

Model weights are gated; request access to download. Training configuration details are included with the model files.

