How to use DavidL123/qwen-2.5-7b-SoftLabel with PEFT:
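```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model in bf16, matching the training precision
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct", torch_dtype=torch.bfloat16)
# Attach the LoRA adapter weights on top of the base model
model = PeftModel.from_pretrained(base_model, "DavidL123/qwen-2.5-7b-SoftLabel")
```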
Qwen 2.5 7B SoftLabel
LoRA adapter fine-tuned with KL divergence against soft probability distributions from a Bayesian Graded Response Model (GRM) teacher for Big Five personality prediction.
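Unlike standard cross-entropy training on hard (one-hot) labels, training with KL divergence against the teacher's soft targets preserves the teacher's uncertainty: the adapter learns to output a probability distribution over the five Likert responses rather than a single answer, with the aim of producing calibrated probability estimates.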
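To make the contrast with hard-label training concrete, here is a minimal pure-Python sketch of both losses for a batch of 5-way Likert predictions. It assumes the student's distribution comes from five logits, one per response option; the function names are illustrative and not taken from this adapter's training code. With a one-hot teacher target, KL divergence reduces to ordinary cross-entropy, because KL(p || q) = CE(p, q) - H(p) and a one-hot p has zero entropy.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_batchmean(teacher_probs, student_logits):
    """KL(teacher || student), summed over the Likert options and
    divided by the batch size (the 'batchmean' reduction)."""
    total = 0.0
    for p, logits in zip(teacher_probs, student_logits):
        q = softmax(logits)
        # Terms with p_k == 0 contribute nothing (0 * log 0 is taken as 0)
        total += sum(pk * (math.log(pk) - math.log(qk)) for pk, qk in zip(p, q) if pk > 0)
    return total / len(teacher_probs)

def cross_entropy_batchmean(targets, student_logits):
    """Hard-label cross-entropy; targets are 0-based indices (0-4 for responses 1-5)."""
    total = 0.0
    for y, logits in zip(targets, student_logits):
        total -= math.log(softmax(logits)[y])
    return total / len(targets)

# Hypothetical teacher distribution and student logits for one item
teacher = [[0.05, 0.15, 0.40, 0.30, 0.10]]
student = [[0.0, 1.0, 2.0, 1.5, 0.5]]
print(kl_batchmean(teacher, student))
```

In PyTorch the same quantity is usually computed with `torch.nn.functional.kl_div(student_log_probs, teacher_probs, reduction="batchmean")`, where the first argument must already be log-probabilities; this card does not state whether the training code calls it exactly that way.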
Training
| Setting | Value |
|---|---|
| Base model | Qwen 2.5 7B Instruct |
| Loss | KL divergence (`batchmean` reduction) |
| Precision | bf16 |
| Infrastructure | University cluster (SLURM), 4× NVIDIA RTX A6000 (48 GB) |
Data
- 11,250 train / 1,250 valid / 3,125 test episodes
- Each episode: a multi-turn IPIP-50 personality questionnaire with soft-label targets over Likert responses 1 to 5
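The card does not describe how the Bayesian GRM teacher turns its posterior into soft labels. As an illustration only, the sketch below uses Samejima's graded response model: each item has a discrimination `a` (assumed positive) and four strictly increasing thresholds, the cumulative probabilities P(X >= k) are logistic in the trait level theta, and each category probability is the difference of adjacent cumulative terms. Averaging those probabilities over posterior draws of theta gives a soft target that carries the teacher's uncertainty about the respondent; a fuller Bayesian treatment would also average over draws of the item parameters. All parameter values and draws here are made up.

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def grm_category_probs(theta, a, thresholds):
    """Samejima's graded response model for one item.

    Returns P(X = k | theta) for k = 1..K, with K = len(thresholds) + 1.
    Assumes a > 0 and strictly increasing thresholds.
    """
    if any(b1 >= b2 for b1, b2 in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative curve: P(X >= 1) = 1, P(X >= k) = sigmoid(a * (theta - b_{k-1})), P(X >= K+1) = 0
    cum = [1.0] + [sigmoid(a * (theta - b)) for b in thresholds] + [0.0]
    # Category probabilities are differences of adjacent cumulative terms
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

def posterior_predictive(theta_draws, a, thresholds):
    """Average the category probabilities over posterior draws of theta."""
    n = len(theta_draws)
    soft = [0.0] * (len(thresholds) + 1)
    for theta in theta_draws:
        for k, p in enumerate(grm_category_probs(theta, a, thresholds)):
            soft[k] += p / n
    return soft

# Hypothetical item parameters and posterior draws, for illustration only
soft_label = posterior_predictive([-0.3, 0.1, 0.4, 0.2, -0.1], a=1.5, thresholds=[-2.0, -0.7, 0.5, 1.8])
print([round(p, 3) for p in soft_label])
```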
Hyperparameters
| Hyperparameter | Value |
|---|---|
| LoRA r / alpha / dropout | 16 / 16 / 0.05 |
| Target modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Learning rate | 1.5e-4 (cosine schedule, 100 warmup steps) |
| Effective batch size | 32 (2 per GPU × 4 GPUs × 4 gradient-accumulation steps) |
| Max epochs | 3 (early stopping, patience=5) |
| Optimizer | AdamW, fused (weight decay 0.01) |
| Max sequence length | 4096 |
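The batch-size arithmetic and learning-rate schedule in the table can be checked with a few lines of Python. This sketch assumes one training example per episode, that a final partial batch still counts as an optimizer step, and a half-cosine decay to zero after linear warmup (the shape produced by `transformers.get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`); the card does not confirm these details.

```python
import math

# Values taken from the hyperparameter table
PER_GPU_BATCH = 2
NUM_GPUS = 4
GRAD_ACCUM = 4
TRAIN_EPISODES = 11_250
MAX_EPOCHS = 3
PEAK_LR = 1.5e-4
WARMUP_STEPS = 100

effective_batch = PER_GPU_BATCH * NUM_GPUS * GRAD_ACCUM

# Assumption: one example per episode; a final partial batch counts as a step
steps_per_epoch = math.ceil(TRAIN_EPISODES / effective_batch)
max_steps = steps_per_epoch * MAX_EPOCHS

def lr_at(step, total_steps=max_steps, warmup=WARMUP_STEPS, peak=PEAK_LR):
    """Linear warmup to `peak`, then half-cosine decay to zero at `total_steps`."""
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))

print(effective_batch, steps_per_epoch, max_steps)
print(f"lr at step 1000: {lr_at(1000):.2e}")
```

Under these assumptions an epoch is roughly 352 optimizer steps, so the best checkpoint (step 1000) falls late in the third and final epoch, when the learning rate has decayed to about 1% of its peak.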
Results
| Metric | Value |
|---|---|
| Best eval loss (KL divergence) | 0.000756 |
| Final train loss | 0.0008 |
| Best checkpoint | Step 1000 |
| Test accuracy | Not reported |
| Teacher ceiling | 51.28% |
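The table leaves test accuracy unreported and does not define how accuracy (or the 51.28% teacher ceiling) is measured. One common reading, assumed here, is top-1 agreement between the most probable Likert option and the respondent's actual answer. The sketch also assumes the model's distribution is obtained by renormalizing next-token logits over the answer tokens "1" to "5"; the token strings and the dict input are hypothetical stand-ins for logits read from the model's output.

```python
import math

# Hypothetical answer-token strings for Likert responses 1-5
LIKERT_OPTIONS = ["1", "2", "3", "4", "5"]

def likert_distribution(token_logits):
    """Softmax restricted to the five answer tokens.

    `token_logits` maps token strings to next-token logits; any other
    tokens in the mapping are ignored.
    """
    logits = [token_logits[o] for o in LIKERT_OPTIONS]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def top1_accuracy(pred_dists, observed):
    """Share of items whose most probable option equals the observed response (1-5)."""
    hits = 0
    for dist, y in zip(pred_dists, observed):
        predicted = max(range(len(dist)), key=dist.__getitem__) + 1  # back to 1-based
        hits += int(predicted == y)
    return hits / len(observed)

# Hypothetical logits for two questionnaire items and the observed answers
dists = [
    likert_distribution({"1": -1.0, "2": 0.5, "3": 3.0, "4": 1.0, "5": 0.0}),
    likert_distribution({"1": 2.0, "2": 1.0, "3": 0.0, "4": -0.5, "5": -1.0}),
]
print(top1_accuracy(dists, [3, 2]))
```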