Opposition Detector

Binary classifier for parliamentary sentences: does a sentence express Opposition toward the European Union (label 1), or does it not (label 0, covering both Neutral and Support stances)?

This is the first stage of a two-step stance-detection cascade. A sentence is first passed through this Opposition detector; if it is classified as Non-Opposition, it is then passed to a separate Support detector (Support vs Neutral). Both stages use a 0.5 decision threshold.
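The cascade logic can be sketched as follows. This is a minimal illustration that assumes both stage probabilities have already been computed. The function name `cascade_label` and its argument names are hypothetical, and treating a probability of exactly 0.5 as positive is an assumption, since the card states only the threshold value.

```python
def cascade_label(p_opposition: float, p_support: float, threshold: float = 0.5) -> str:
    """Combine the two binary stages into one 3-way stance label.

    p_opposition: P(Opposition) from the first-stage detector.
    p_support:    P(Support) from the second-stage detector.
    Treating a probability equal to the threshold as positive is an assumption.
    """
    # Stage 1: Opposition vs Non-Opposition.
    if p_opposition >= threshold:
        return "Opposition"
    # Stage 2: only reached for Non-Opposition sentences (Support vs Neutral).
    if p_support >= threshold:
        return "Support"
    return "Neutral"


print(cascade_label(0.91, 0.40))
print(cascade_label(0.12, 0.77))
print(cascade_label(0.12, 0.20))
```

In practice the Support detector only needs to be run on sentences that the first stage classifies as Non-Opposition.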

Fine-tuned from jhu-clsp/mmBERT-base on hand-annotated parliamentary speeches from AUS, CZE, DEU, DNK, ESP, GBR, NLD, and SWE.

Labels

  • 0: Non-Opposition (Neutral or Support)
  • 1: Opposition

Training data

  • Source: hand-annotated parliamentary sentences labelled Neutral, Support, or Opposition.
  • Binarised for this model as Opposition vs the other two classes.
  • File: Stance_Retrain_undersampled.csv (undersampled to address class imbalance).
  • Split: leakage-safe StratifiedGroupKFold (n_splits=10) on country × speech_ID, so no speech appears in more than one fold. Realised allocation: 8 folds train / 1 fold val / 1 fold test (~80/10/10). The Opposition and Support detectors share the same underlying stance split for consistent cascade evaluation.

Hyperparameters

  • Base model: jhu-clsp/mmBERT-base
  • Max sequence length: 320
  • Learning rate: 1.5e-05
  • Epochs: 3
  • Batch size: 16 (gradient accumulation is enabled only for larger base models)
  • Warmup ratio: 0.2
  • Weight decay: 0.05
  • LR scheduler: cosine
  • Optimizer: AdamW (HF Trainer default)
  • Mixed precision: fp16
  • Early stopping patience: 2 (monitoring f1_positive on val)
  • Class weights: balanced (sklearn compute_class_weight)
  • Focal loss: disabled (plain weighted cross-entropy)
  • Random seed: 123
  • Model selection: best checkpoint by validation f1_positive (minority-class F1)
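The "balanced" class-weight setting and the weighted cross-entropy loss can be illustrated in plain Python. This is a sketch, not the training code: the helper names are hypothetical, the toy labels are invented, and normalising the loss by the summed target weights is an assumption that mirrors how PyTorch's `CrossEntropyLoss` averages when class weights are supplied.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Weights w_c = n_samples / (n_classes * count_c), the 'balanced' heuristic."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted cross-entropy, normalised by the summed weights of the targets.

    probs[i][c] is the predicted probability of class c for example i.
    """
    num = sum(-weights[y] * math.log(p[y]) for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den


# Toy labels (hypothetical): three Non-Opposition sentences, one Opposition.
y = [0, 0, 0, 1]
w = balanced_class_weights(y)
print(w)

# Toy predictions (hypothetical): a confident model that is wrong on the minority example.
p = [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1], [0.9, 0.1]]
print(weighted_cross_entropy(p, y, w))
```

The rarer class receives the larger weight, so errors on Opposition sentences contribute more to the loss than errors on the majority class.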

Input format

The input is a single sentence, with no surrounding context window. Inputs longer than 320 tokens are truncated.

Usage

from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the fine-tuned classifier from the Hub.
tok = AutoTokenizer.from_pretrained("LBenoit/opposition-detector-mmbert")
mdl = AutoModelForSequenceClassification.from_pretrained("LBenoit/opposition-detector-mmbert")
mdl.eval()  # inference mode (dropout disabled)

text = "This directive from Brussels undermines our national sovereignty."
# Truncate to the 320-token maximum used during training.
enc = tok(text, truncation=True, max_length=320, return_tensors="pt")
with torch.no_grad():
    # Index 1 of the softmax output is the Opposition class (label 1).
    prob_opp = torch.softmax(mdl(**enc).logits, dim=-1)[0, 1].item()
print("P(Opposition) =", prob_opp)

Intended use

Research on parliamentary stance toward the EU. Designed to be used as the first stage of an Opposition → Support cascade for full 3-way stance classification (Neutral / Support / Opposition). Outputs reflect the training corpus and annotation scheme; downstream prevalence estimates should ideally be calibrated against a base-rate-representative sample.
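One common way to perform such a calibration is the adjusted classify-and-count correction, sketched below. It is not described in this card and is offered only as an illustration. The function name and the example numbers are hypothetical, and the true-positive and false-positive rates are assumed to have been estimated on a labelled sample that reflects real base rates.

```python
def adjusted_prevalence(observed_rate: float, tpr: float, fpr: float) -> float:
    """Correct a raw predicted-positive rate using the classifier's error rates.

    Derived from: observed_rate = p * tpr + (1 - p) * fpr, solved for p.
    tpr and fpr are assumed to come from a base-rate-representative labelled sample.
    """
    if tpr <= fpr:
        raise ValueError("correction is only defined when tpr > fpr")
    estimate = (observed_rate - fpr) / (tpr - fpr)
    # Clip to the valid range, since sampling noise can push the estimate outside [0, 1].
    return min(1.0, max(0.0, estimate))


# Hypothetical numbers: 30% of sentences flagged, TPR 0.80, FPR 0.10.
print(adjusted_prevalence(0.30, tpr=0.80, fpr=0.10))
```

The correction matters most when the model was trained on undersampled data, because the share of sentences flagged as Opposition then need not match the true share in a representative corpus.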

Limitations

  • Trained on parliamentary register; performance on social media, journalism, or other domains is not guaranteed.
  • Coverage limited to the eight countries listed above; generalisation to other parliaments is untested.
  • Sentence-level only; longer-range discourse context is not modelled.