You need to agree to share your contact information to access this model

This repository is publicly accessible, but you have to accept the conditions to access its files and content.

Log in or Sign Up to review the conditions and access this model content.

Qwen3.5-4B GUI Grounding β€” v3 (SFT LoRA + Unfrozen ViT)

LoRA adapter for Qwen3.5-4B fine-tuned on GUI grounding: given a screenshot and a natural language instruction, predict the (x, y) click coordinate of the target UI element.

v3 was an experimental run that unfroze the vision encoder and increased input resolution. It resulted in a regression vs v2, demonstrating that unfreezing the full ViT on ~23.5K samples degrades pretrained visual features.

Results β€” ScreenSpot-V2

Split Accuracy
Overall 92.7%

Training Data

~23.5K samples from 3 GUI grounding datasets covering desktop, web, and mobile platforms.

Output Format

<|box_start|>(x,y)<|box_end|>

Usage

Requires transformers>=5.2.0 and peft.

from transformers import AutoProcessor, Qwen3_5ForConditionalGeneration
from peft import PeftModel
import torch

base = Qwen3_5ForConditionalGeneration.from_pretrained("Qwen/Qwen3.5-4B", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "dabism23/qwen35-gui-grounding_v3")
processor = AutoProcessor.from_pretrained("Qwen/Qwen3.5-4B")

Key Findings

  • Unfreezing the full ViT on ~23.5K samples caused overfitting and degraded performance
  • Higher input resolution (2M pixels) did not compensate for ViT degradation
  • Frozen ViT remains the better approach at this dataset scale

Version History

Version ScreenSpot-V2 Notes
v1 92.5% Baseline
v2 93.4% Best
v3 92.7% Regression (unfrozen ViT)

Access

Model weights are gated. Request access to download. Training configuration details are included with the model files.

Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for mdabis/qwen35-gui-grounding_v3

Finetuned
Qwen/Qwen3.5-4B
Adapter
(38)
this model

Datasets used to train mdabis/qwen35-gui-grounding_v3