# Qwen3.5-4B GUI Grounding v3 (SFT LoRA + Unfrozen ViT)
LoRA adapter for Qwen3.5-4B fine-tuned on GUI grounding: given a screenshot and a natural language instruction, predict the (x, y) click coordinate of the target UI element.
v3 was an experimental run that unfroze the vision encoder and increased the input resolution. It regressed relative to v2, demonstrating that unfreezing the full ViT on only ~23.5K samples degrades the pretrained visual features.
## Results: ScreenSpot-V2
| Split | Accuracy |
|---|---|
| Overall | 92.7% |
## Training Data

~23.5K samples from three GUI grounding datasets covering desktop, web, and mobile platforms.
## Output Format

```
<|box_start|>(x,y)<|box_end|>
```
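Responses in this format can be extracted with a small regex. A minimal sketch; the helper name is illustrative, and the assumption that coordinates are emitted as integers is not stated in the model card:

```python
import re

# Matches the special-token format above, e.g. "<|box_start|>(412,87)<|box_end|>".
# Pattern and helper name are illustrative; integer coordinates are assumed.
_BOX_RE = re.compile(r"<\|box_start\|>\((\d+)\s*,\s*(\d+)\)<\|box_end\|>")

def parse_click(text: str):
    """Return the (x, y) click point from a model response, or None."""
    m = _BOX_RE.search(text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))
```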
## Usage

Requires `transformers>=5.2.0` and `peft`.
```python
from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration
from peft import PeftModel
import torch

# Load the base model in bfloat16, then attach the LoRA adapter on top.
base = Qwen3_5ForConditionalGeneration.from_pretrained(
    "Qwen/Qwen3.5-4B", torch_dtype=torch.bfloat16
)
model = PeftModel.from_pretrained(base, "dabism23/qwen35-gui-grounding_v3")

# The processor (tokenizer + image preprocessing) comes from the base model.
processor = AutoProcessor.from_pretrained("Qwen/Qwen3.5-4B")
```
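An end-to-end call might look like the sketch below. The chat-message layout follows the common Qwen-VL convention, and the function names and generation settings are assumptions, not part of this model card; it is not a definitive implementation:

```python
def build_messages(image, instruction):
    """Package a screenshot and instruction in the Qwen-VL-style chat
    format expected by apply_chat_template (assumed layout)."""
    return [{
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": instruction},
        ],
    }]

def predict_click(model, processor, image_path, instruction):
    # Not executed here: requires the gated weights (see the Access section).
    from PIL import Image  # Pillow, for loading the screenshot
    messages = build_messages(Image.open(image_path), instruction)
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    )
    out = model.generate(**inputs, max_new_tokens=32)
    # Decode only the newly generated tokens; keep special tokens so the
    # <|box_start|>(x,y)<|box_end|> span survives for parsing.
    return processor.batch_decode(
        out[:, inputs["input_ids"].shape[1]:], skip_special_tokens=False
    )[0]
```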
## Key Findings
- Unfreezing the full ViT on ~23.5K samples caused overfitting and degraded performance
- Higher input resolution (2M pixels) did not compensate for ViT degradation
- Frozen ViT remains the better approach at this dataset scale
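The frozen-ViT recipe the findings point back to amounts to a simple parameter filter. A framework-agnostic sketch, assuming vision-encoder parameters carry a `visual.` or `vision_tower.` prefix as in Qwen-VL-style checkpoints (an assumption, not confirmed by the card):

```python
# Assumed ViT parameter-name prefixes; adjust to the actual checkpoint.
VISION_PREFIXES = ("visual.", "vision_tower.")

def is_trainable(param_name: str) -> bool:
    """Frozen-ViT recipe: train (or inject LoRA into) everything
    except vision-encoder weights."""
    return not param_name.startswith(VISION_PREFIXES)
```

In a training loop this filter would drive `param.requires_grad` for each named parameter, or restrict which modules LoRA targets.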
## Version History
## Access
Model weights are gated. Request access to download. Training configuration details are included with the model files.