Qwen 2.5 7B SoftLabel

LoRA adapter for Big Five personality prediction, fine-tuned with a KL-divergence loss against soft probability distributions produced by a Bayesian Graded Response Model (GRM) teacher.
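
The GRM teacher is what turns a fitted psychometric model into per-item soft labels. The sketch below assumes the standard Samejima graded response model (cumulative logistic curves with ordered thresholds). It also assumes that "Bayesian" means averaging category probabilities over posterior draws of the respondent's trait score θ. The function names, parameter values, and averaging scheme are illustrative assumptions, not the author's pipeline.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Samejima GRM: P(X = k | theta) for k = 1..len(thresholds) + 1.

    Assumes a > 0 and strictly increasing thresholds, so every category
    probability is non-negative.
    """
    # Cumulative curve P(X >= k): 1 for the lowest category, 0 beyond the highest.
    cumulative = [1.0] + [sigmoid(a * (theta - b)) for b in thresholds] + [0.0]
    # Each category's probability is the drop between adjacent cumulative values.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def posterior_soft_label(theta_draws: Sequence[float], a: float,
                         thresholds: Sequence[float]) -> List[float]:
    """Average the GRM category probabilities over posterior draws of theta (hypothetical teacher)."""
    totals = [0.0] * (len(thresholds) + 1)
    for theta in theta_draws:
        for k, p in enumerate(grm_category_probs(theta, a, thresholds)):
            totals[k] += p
    return [t / len(theta_draws) for t in totals]


# Hypothetical values: four posterior draws of one respondent's trait score,
# plus one item's discrimination and four ordered thresholds (5 categories).
draws = [-0.5, 0.0, 0.4, 1.1]
soft_label = posterior_soft_label(draws, a=1.7, thresholds=[-2.0, -0.8, 0.3, 1.5])
```

Because the category probabilities telescope, each draw yields a distribution that sums to 1. Their average is therefore also a valid five-way soft label.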

Unlike standard cross-entropy training on hard (one-hot) labels, KL-divergence training against the teacher's full distribution preserves its uncertainty, with the aim of producing calibrated probability estimates over the five Likert response options.
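
To see why soft targets preserve uncertainty, compare the two losses on a single item. The teacher distribution and both student distributions below are made-up numbers. Hard-label cross-entropy rewards a student that collapses onto the argmax, while KL divergence against the full teacher distribution is minimized only by reproducing it.

```python
import math
from typing import Sequence


def cross_entropy_hard(student: Sequence[float], label: int) -> float:
    # One-hot target: only the probability on the single "correct" option counts.
    return -math.log(student[label])


def kl_divergence(teacher: Sequence[float], student: Sequence[float]) -> float:
    # KL(teacher || student) = sum_k p_k * (log p_k - log q_k); p_k = 0 terms are 0.
    return sum(p * (math.log(p) - math.log(q)) for p, q in zip(teacher, student) if p > 0)


# Hypothetical teacher soft label over responses 1..5 (indices 0..4).
teacher = [0.05, 0.20, 0.35, 0.30, 0.10]
hard_label = teacher.index(max(teacher))  # index 2, i.e. response 3

hedged = list(teacher)                       # student that copies the teacher's uncertainty
confident = [0.01, 0.01, 0.96, 0.01, 0.01]   # student that collapses onto the argmax

for name, student in [("hedged", hedged), ("confident", confident)]:
    print(name,
          "hard CE:", round(cross_entropy_hard(student, hard_label), 3),
          "KL:", round(kl_divergence(teacher, student), 3))
```

Hard-label cross-entropy is lower for the confident student, while KL is exactly zero only for the hedged one. Because KL(p‖q) = H(p, q) − H(p) and the teacher entropy H(p) does not depend on the student, minimizing KL is equivalent to minimizing cross-entropy against the soft labels. Only the one-hot version discards the teacher's uncertainty.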

Training


Base model	Qwen 2.5 7B Instruct
Loss	KL Divergence (batchmean)
Precision	bf16
Infrastructure	University cluster (SLURM), 4× NVIDIA RTX A6000 (48 GB)
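
The "batchmean" entry refers to the reduction option of PyTorch's torch.nn.functional.kl_div. That option sums the pointwise KL terms and divides by the batch size rather than by the total number of elements; the PyTorch docs note that reduction="mean" does not give the true KL value. A dependency-free sketch of that computation follows. It assumes each row holds the student's logits over the five answer tokens "1" to "5" at one scored answer position, which is an assumption about how the adapter was trained.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_batchmean(student_logits: Sequence[Sequence[float]],
                 teacher_probs: Sequence[Sequence[float]]) -> float:
    """Sum of KL(teacher || student) over all rows, divided by the number of rows.

    Mirrors torch.nn.functional.kl_div(log_q, p, reduction="batchmean"),
    where log_q = log_softmax(student_logits) and p = teacher_probs.
    """
    total = 0.0
    for logits, target in zip(student_logits, teacher_probs):
        log_q = log_softmax(logits)
        # Terms with p = 0 contribute 0 by convention (0 * log 0 = 0).
        total += sum(p * (math.log(p) - lq) for p, lq in zip(target, log_q) if p > 0)
    return total / len(student_logits)


# Hypothetical batch: one row per scored answer, logits over tokens "1".."5".
teacher_probs = [[0.05, 0.20, 0.35, 0.30, 0.10],
                 [0.60, 0.20, 0.10, 0.05, 0.05]]
student_logits = [[-1.0, 0.5, 1.2, 1.0, -0.2],
                  [1.5, 0.3, -0.4, -1.0, -1.2]]
loss = kl_batchmean(student_logits, teacher_probs)
```

Dividing by the row count keeps the loss on the same scale as a per-example KL, independent of the number of answer options.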

Data

11,250 train / 1,250 valid / 3,125 test episodes
Each episode: a multi-turn IPIP-50 personality questionnaire with soft-label targets over responses 1–5
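
One possible in-memory representation of an episode is shown below, with a check that every soft label is a valid distribution over the five responses. The class and field names are hypothetical, since the card does not describe the file format. The split arithmetic at the end only shows that the stated counts form a 72 / 8 / 20 split of 15,625 episodes, which matches an 80 / 20 train+valid vs. test split followed by a 90 / 10 train vs. valid split.

```python
from dataclasses import dataclass, field
from typing import List

NUM_RESPONSES = 5  # Likert options 1..5


@dataclass
class Turn:
    """One questionnaire turn: an IPIP-50 item and the teacher's soft label (hypothetical schema)."""
    item_text: str
    soft_label: List[float]  # probabilities for responses 1, 2, 3, 4, 5

    def validate(self, tol: float = 1e-6) -> None:
        if len(self.soft_label) != NUM_RESPONSES:
            raise ValueError(f"expected {NUM_RESPONSES} probabilities, got {len(self.soft_label)}")
        if any(p < 0 for p in self.soft_label):
            raise ValueError("probabilities must be non-negative")
        if abs(sum(self.soft_label) - 1.0) > tol:
            raise ValueError("probabilities must sum to 1")


@dataclass
class Episode:
    """A multi-turn questionnaire episode (hypothetical schema)."""
    turns: List[Turn] = field(default_factory=list)

    def validate(self) -> None:
        for turn in self.turns:
            turn.validate()


episode = Episode(turns=[
    Turn("Am the life of the party.", [0.10, 0.25, 0.30, 0.25, 0.10]),
    Turn("Feel little concern for others.", [0.40, 0.35, 0.15, 0.07, 0.03]),
])
episode.validate()

# Split sizes stated in the card.
splits = {"train": 11_250, "valid": 1_250, "test": 3_125}
total = sum(splits.values())
fractions = {name: count / total for name, count in splits.items()}
```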

Hyperparameters


LoRA r / alpha / dropout	16 / 16 / 0.05
Target modules	q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Learning rate	1.5e-4 (cosine schedule, 100 warmup steps)
Effective batch size	32 (2 per GPU × 4 GPUs × 4 gradient-accumulation steps)
Max epochs	3 (early stopping, patience=5)
Optimizer	AdamW fused (weight decay 0.01)
Max sequence length	4096
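
The batch-size and schedule entries can be checked with a little arithmetic. This sketch assumes the cosine-schedule shape used by Hugging Face transformers: linear warmup to the peak, then a half-cosine decay to zero over the remaining steps. It further assumes the schedule is defined over the full 3 epochs even if early stopping halts training sooner, and roughly ceil(11,250 / 32) = 352 optimizer steps per epoch. The exact count depends on how the trainer handles the final partial batch across 4 GPUs. Under these assumptions, 3 epochs is about 1,056 steps, so the best checkpoint at step 1000 falls late in the third epoch (about 2.84 epochs).

```python
import math

# Values from the hyperparameter table.
PER_DEVICE_BATCH = 2
NUM_GPUS = 4
GRAD_ACCUM = 4
PEAK_LR = 1.5e-4
WARMUP_STEPS = 100
EPOCHS = 3
TRAIN_EPISODES = 11_250  # from the Data section

effective_batch = PER_DEVICE_BATCH * NUM_GPUS * GRAD_ACCUM  # 2 x 4 x 4 = 32

# Assumption: the last partial batch counts as a step; real trainers may differ by one.
steps_per_epoch = math.ceil(TRAIN_EPISODES / effective_batch)
total_steps = steps_per_epoch * EPOCHS


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to 0 at total_steps."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


best_step = 1000
print("effective batch:", effective_batch)
print("steps/epoch:", steps_per_epoch, "total steps:", total_steps)
print("best checkpoint at epoch ~", round(best_step / steps_per_epoch, 2))
print("LR at best checkpoint:", lr_at(best_step))
```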

Results


Best eval loss (KL div)	0.000756
Final train loss	0.0008
Best checkpoint	Step 1000
Test accuracy	Not reported
Teacher ceiling	51.28%
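
The card does not define "teacher ceiling". One plausible reading is the accuracy obtained by taking the argmax of the teacher's soft label and comparing it with the observed Likert response. The hypothetical helpers below compute that accuracy for any set of predicted distributions, so the same function could give the missing test accuracy for the adapter (student) and the ceiling for the teacher. The toy data are invented.

```python
from typing import Sequence


def argmax_response(probs: Sequence[float]) -> int:
    # Index 0..4 of the most probable option, mapped back to a Likert response 1..5.
    # Ties go to the lowest response because max() returns the first maximum.
    return max(range(len(probs)), key=lambda k: probs[k]) + 1


def accuracy(predicted: Sequence[Sequence[float]], observed: Sequence[int]) -> float:
    """Fraction of items where the argmax response equals the observed response."""
    hits = sum(argmax_response(p) == r for p, r in zip(predicted, observed))
    return hits / len(observed)


# Invented toy data: teacher soft labels for four items and the responses actually given.
teacher_dists = [
    [0.10, 0.20, 0.40, 0.20, 0.10],
    [0.60, 0.20, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.10, 0.30, 0.50],
    [0.20, 0.30, 0.20, 0.20, 0.10],
]
observed = [3, 1, 4, 2]
teacher_ceiling = accuracy(teacher_dists, observed)
```

Under this reading, a student trained only to match the teacher's distribution has no training signal that pushes its argmax accuracy above the teacher's. Its test accuracy would be expected to land near the ceiling rather than above it.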

Model tree for DavidL123/qwen-2.5-7b-SoftLabel

Base model	Qwen/Qwen2.5-7B
Fine-tuned	Qwen/Qwen2.5-7B-Instruct
Adapter	this model