# emotion_urgency_classifier
A fine-tuned DeBERTa-v3-base model with two independent classification heads that simultaneously predict the urgency level and the emotion level of UK telecoms customer complaints.
Each head outputs one of three classes: Low, Medium, or High.
## Model Description
| Property | Value |
|---|---|
| Base model | microsoft/deberta-v3-base |
| Task | Dual-head text classification |
| Language | English (UK) |
| Input | Customer complaint text |
| Urgency output | Low / Medium / High |
| Emotion output | Low / Medium / High |
## Architecture

The backbone is DeBERTa-v3-base. The `[CLS]` token representation feeds into two independent linear heads:

```
DeBERTa-v3-base
       │
     [CLS]
  ┌────┴────┐
urgency   emotion
 Linear    Linear
(768→3)   (768→3)
```
## Training Data
5,000 synthetic UK telecoms customer complaints generated with GPT-5-mini, covering a 3×3 urgency × emotion grid. Each complaint was assigned a specific scenario, writing style, customer profile, and complaint history before generation to ensure diversity and label consistency.
| Urgency | Count | Share |
|---|---|---|
| Low | 1,750 | 35% |
| Medium | 2,000 | 40% |
| High | 1,250 | 25% |
Emotion is distributed evenly within each urgency group (one third of each urgency count per emotion level, e.g. ~667 per emotion level within Medium urgency).
The data covers 20 complaint scenarios, including Complete Service Outage, Fraud & Scams, Poor Network Coverage, Hidden Fees & Charges, and Unfulfilled Fix Promises, among others.
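The urgency shares and the even emotion split imply a per-cell target for each of the nine grid cells. A minimal sketch of that arithmetic (the constant and helper names are illustrative, not taken from the generation code):

```python
# Per-cell targets for the 3x3 urgency x emotion grid, derived from the
# urgency shares in the table above and an even emotion split within each
# urgency group. Names here are illustrative, not from the project code.
TOTAL = 5000
URGENCY_SHARES = {"Low": 0.35, "Medium": 0.40, "High": 0.25}
EMOTIONS = ["Low", "Medium", "High"]

def cell_targets(total=TOTAL, shares=URGENCY_SHARES, emotions=EMOTIONS):
    grid = {}
    for urgency, share in shares.items():
        per_cell = round(total * share / len(emotions))
        for emotion in emotions:
            grid[(urgency, emotion)] = per_cell
    return grid

grid = cell_targets()
# e.g. grid[("Medium", "High")] == 667 and grid[("High", "Low")] == 417
```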
## Training Procedure
| Hyperparameter | Value |
|---|---|
| Max sequence length | 192 tokens |
| Batch size | 16 |
| Optimizer | AdamW |
| Learning rate | 2e-5 |
| LR schedule | Linear warmup (10%) + decay |
| Max epochs | 10 |
| Early stopping patience | 3 (on val urgency macro F1) |
| Urgency class weights | [1.0, 1.5, 1.2] — Low / Medium / High |
| Train / Val / Test split | 70 / 15 / 15 (stratified on urgency×emotion cell) |
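The class weights above plug into a weighted cross-entropy on the urgency head. A plausible sketch of the dual-head objective, assuming the two losses are summed with equal weight (the card does not state the exact combination):

```python
import torch
import torch.nn as nn

# Weighted CE on the urgency head (weights from the table above), plain CE
# on the emotion head. Summing the two losses equally is an assumption.
urgency_weights = torch.tensor([1.0, 1.5, 1.2])  # Low / Medium / High
urgency_loss_fn = nn.CrossEntropyLoss(weight=urgency_weights)
emotion_loss_fn = nn.CrossEntropyLoss()

def dual_head_loss(urg_logits, emo_logits, urg_labels, emo_labels):
    return (urgency_loss_fn(urg_logits, urg_labels)
            + emotion_loss_fn(emo_logits, emo_labels))

# Dummy batch of two examples, three classes per head
urg_logits = torch.tensor([[2.0, 0.1, -1.0], [0.0, 1.5, 0.5]])
emo_logits = torch.tensor([[0.3, 0.3, 0.4], [1.0, -0.5, 0.0]])
loss = dual_head_loss(urg_logits, emo_logits,
                      torch.tensor([0, 1]), torch.tensor([2, 0]))
```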
## Evaluation Results

### Test set (held-out 15%)
| Head | Low F1 | Medium F1 | High F1 | Macro F1 |
|---|---|---|---|---|
| Urgency | 0.823 | 0.736 | 0.844 | 0.801 |
| Emotion | 0.891 | 0.825 | 0.841 | 0.852 |
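As a quick sanity check, the Macro F1 column is the unweighted mean of the three per-class F1 scores (what scikit-learn's `f1_score` with `average="macro"` computes from full predictions):

```python
# The Macro F1 column is the unweighted mean of the per-class F1 scores
# in the table above (Low / Medium / High order).
urgency_f1 = [0.823, 0.736, 0.844]
emotion_f1 = [0.891, 0.825, 0.841]

def macro(scores):
    return round(sum(scores) / len(scores), 3)

# macro(urgency_f1) == 0.801 and macro(emotion_f1) == 0.852, matching the table
```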
### Adversarial test (10 hand-crafted edge cases)
6 of 10 cases passed. The model correctly handles urgency/emotion decoupling, technical jargon, and cheerful-but-urgent complaints. Known failure cases: sarcasm (predicted as neutral emotion) and cold, formal legal language (predicted as Low emotion despite high emotional intensity).
## How to Use

### Load and run inference
```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel, AutoConfig


class DeBERTaMultiHead(nn.Module):
    def __init__(self, model_dir, num_classes=3):
        super().__init__()
        config = AutoConfig.from_pretrained(model_dir)
        self.backbone = AutoModel.from_config(config)
        hidden_size = config.hidden_size
        self.urgency_head = nn.Linear(hidden_size, num_classes)
        self.emotion_head = nn.Linear(hidden_size, num_classes)

    def forward(self, input_ids, attention_mask, token_type_ids=None):
        out = self.backbone(input_ids=input_ids, attention_mask=attention_mask,
                            token_type_ids=token_type_ids)
        cls = out.last_hidden_state[:, 0, :]  # [CLS] representation
        return self.urgency_head(cls), self.emotion_head(cls)


LABEL_NAMES = ["Low", "Medium", "High"]
MODEL_DIR = "path/to/model_output"  # or snapshot_download("yuansheng-tao/emotion_urgency_classifier")

tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = DeBERTaMultiHead(MODEL_DIR)
model.load_state_dict(torch.load(f"{MODEL_DIR}/model_weights.pt", map_location="cpu"))
model.eval()

text = "My broadband has been down for three days and nobody is responding to my calls."
enc = tokenizer(text, max_length=192, padding="max_length",
                truncation=True, return_tensors="pt")
with torch.no_grad():
    urg_logits, emo_logits = model(**enc)

print("Urgency:", LABEL_NAMES[urg_logits.argmax().item()])
print("Emotion:", LABEL_NAMES[emo_logits.argmax().item()])
```
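If you need per-class probabilities rather than hard labels, each head's logits can be passed through a softmax. A small extension of the snippet above (the helper is illustrative, not part of the released code):

```python
import torch

LABEL_NAMES = ["Low", "Medium", "High"]

def to_prediction(logits):
    """Return (label, probability) for a single example's head logits."""
    probs = torch.softmax(logits, dim=-1).squeeze(0)
    idx = int(probs.argmax())
    return LABEL_NAMES[idx], probs[idx].item()

# Dummy logits for illustration; in practice pass urg_logits or emo_logits
label, confidence = to_prediction(torch.tensor([[0.2, 1.5, 3.1]]))
# label == "High"
```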
### Download via script (from the project repo)

```bash
python model_training/download_model.py
```
## Limitations
- Trained on synthetic data — may not fully generalise to real customer complaints, particularly for nuanced emotional expression.
- Sarcasm is not reliably detected: sarcastic praise is predicted as neutral or Low emotion.
- Cold formal language (e.g. legal threats) is consistently predicted as Low emotion despite High emotional intent.
- Domain-specific to UK telecoms — performance on other industries is untested.