arxiv:2601.09136

SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL

Published on Jan 14 · Submitted by CharlesChen on Jan 15

Abstract

SkinFlow introduces a novel framework for dermatological vision-language modeling that improves diagnostic accuracy through optimized visual information transmission efficiency rather than parameter scaling alone.

AI-generated summary

General-purpose Large Vision-Language Models (LVLMs), despite their massive scale, often falter in dermatology due to "diffuse attention", the inability to disentangle subtle pathological lesions from background noise. In this paper, we challenge the assumption that parameter scaling is the only path to medical precision. We introduce SkinFlow, a framework that treats diagnosis as an optimization of visual information transmission efficiency. Our approach utilizes a Virtual-Width Dynamic Vision Encoder (DVE) to "unfold" complex pathological manifolds without physical parameter expansion, coupled with a two-stage Reinforcement Learning strategy. This strategy sequentially aligns explicit medical descriptions (Stage I) and reconstructs implicit diagnostic textures (Stage II) within a constrained semantic space. Furthermore, we propose a clinically grounded evaluation protocol that prioritizes diagnostic safety and hierarchical relevance over rigid label matching. Empirical results are compelling: our 7B model establishes a new state-of-the-art on the Fitzpatrick17k benchmark, achieving a +12.06% gain in Top-1 accuracy and a +28.57% boost in Top-6 accuracy over massive general-purpose models (e.g., Qwen3-VL-235B and GPT-5.2). These findings demonstrate that optimizing geometric capacity and information flow yields superior diagnostic reasoning compared to raw parameter scaling.

Community

SkinFlow: When Scaling Fails in Dermatology

Despite the rapid scaling of foundation models, dermatological diagnosis remains unexpectedly difficult.
SkinFlow shows that the bottleneck is not reasoning capacity, but visual information loss.

Key Insight

We reinterpret dermatological VLMs as information transmission systems.
Critical diagnostic errors originate from inefficient visual encoding, where subtle pathological textures are irreversibly lost before reasoning begins.

Our Approach

Dynamic Visual Encoding (DVE)
Adaptively unfolds pathological manifolds, increasing the visual signal-to-noise ratio (SNR) without increasing parameters.

Two-Stage Training
Describe → Diagnose: structured medical descriptions first, followed by top-K ranking-based diagnosis.

Clinically Aligned Evaluation
Metrics grounded in disease hierarchy and diagnostic safety, beyond Top-1 accuracy.
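
To make the idea of hierarchy-grounded scoring concrete, here is a minimal Python sketch of one way such a metric could look. It is not the evaluation protocol from the paper: the `PARENT` mapping, the category names, and the `partial` credit value are all hypothetical, chosen only to illustrate giving partial credit when a prediction lands in the same disease family as the ground truth.

```python
# Hypothetical two-level hierarchy: condition -> parent category.
# The labels and groupings are illustrative assumptions, not from the paper.
PARENT = {
    "psoriasis": "inflammatory",
    "eczema": "inflammatory",
    "melanoma": "malignant",
    "basal cell carcinoma": "malignant",
}


def hierarchical_score(pred, truth, parent=PARENT, partial=0.5):
    """Score one prediction: 1.0 if exact, `partial` if same parent, else 0.0."""
    if pred == truth:
        return 1.0
    pred_parent = parent.get(pred)
    truth_parent = parent.get(truth)
    # Unknown labels have no parent and therefore earn no partial credit.
    if pred_parent is not None and pred_parent == truth_parent:
        return partial
    return 0.0


def mean_hierarchical_score(preds, truths, parent=PARENT, partial=0.5):
    """Average the per-sample hierarchical score over a dataset."""
    if len(preds) != len(truths):
        raise ValueError("preds and truths must have the same length")
    if not truths:
        raise ValueError("need at least one sample")
    total = sum(hierarchical_score(p, t, parent, partial)
                for p, t in zip(preds, truths))
    return total / len(truths)
```

Under this sketch, predicting "eczema" for a "psoriasis" case scores higher than predicting "melanoma", which a rigid label match would treat as equally wrong. A safety-oriented variant might additionally penalize confusing a malignant condition with a benign one; that weighting is left out here because the source does not specify it.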

Results

With only 7B parameters, SkinFlow outperforms 100B+ general-purpose models (e.g., GPT-5.2, Qwen3-VL-235B) on Fitzpatrick17k.
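
For readers unfamiliar with the Top-1 and Top-6 figures quoted for Fitzpatrick17k, the following Python sketch shows how top-k accuracy is commonly computed from ranked diagnosis lists. The function name, the input format, and the toy labels are assumptions made for illustration; the paper's own evaluation code may differ.

```python
def top_k_accuracy(ranked_preds, truths, k):
    """Fraction of samples whose true label is among the first k ranked predictions.

    ranked_preds: one list per sample, ordered from most to least likely.
    truths: one ground-truth label per sample.
    """
    if len(ranked_preds) != len(truths):
        raise ValueError("ranked_preds and truths must have the same length")
    if not truths:
        raise ValueError("need at least one sample")
    hits = sum(truth in ranked[:k] for ranked, truth in zip(ranked_preds, truths))
    return hits / len(truths)


# Toy data (hypothetical labels, not drawn from Fitzpatrick17k).
ranked = [
    ["eczema", "psoriasis"],
    ["melanoma", "nevus"],
    ["acne", "rosacea"],
]
truths = ["psoriasis", "melanoma", "rosacea"]

top1 = top_k_accuracy(ranked, truths, k=1)  # only the second sample is a hit
top2 = top_k_accuracy(ranked, truths, k=2)  # all three samples are hits
```

Top-1 rewards only the single highest-ranked diagnosis, while a larger k credits a model whose ranked differential contains the correct condition anywhere in its first k entries.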

Takeaway

Scaling is not enough.
For expert visual diagnosis, how information is encoded and transmitted matters more than model size.

perfect!!!

SkinFlow is an excellent work that combines strong clinical applicability with academic innovation. By optimizing information flow and geometric architectural design, it achieves breakthroughs in dermatological diagnosis with a small number of parameters, challenging the conventional belief that large models are necessary for high performance. This work offers a lightweight and safety-oriented paradigm for medical AI, and with further advances in generalization and interpretability, it has the potential to become a representative model in specialized domains.
