XLM-RoBERTa-large + CRF for Situation-Entity Segmentation

Fine-tuned XLM-RoBERTa-large with a linear classifier and a CRF output layer for situation-entity segmentation.

The model assigns BIO tags (B-EDU, I-EDU, O) to each token. These tags mark the boundaries and spans of situation-entity segments, which are contiguous clause-level segments that each describe a single situation.
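The following minimal sketch shows how the tagging scheme works. It assumes that gold segments are given as word-index spans. The helper `spans_to_tags` and the example segmentation are hypothetical and do not come from the corpus or the training code. Whether punctuation is tagged O or included in a segment depends on the annotation. Here the final period is left as O only for illustration.

```python
from typing import List, Sequence, Tuple

# Label ids as listed under Architecture: B-EDU (0), I-EDU (1), O (2).
TAG2ID = {"B-EDU": 0, "I-EDU": 1, "O": 2}


def spans_to_tags(n_words: int, spans: Sequence[Tuple[int, int]]) -> List[str]:
    """Hypothetical helper: turn (start, end) word spans, end exclusive, into BIO tags."""
    tags = ["O"] * n_words  # words outside every segment stay O
    for start, end in spans:
        if not 0 <= start < end <= n_words:
            raise ValueError(f"invalid span {(start, end)}")
        tags[start] = "B-EDU"  # first word of the segment
        for i in range(start + 1, end):
            tags[i] = "I-EDU"  # remaining words of the segment
    return tags


# Toy sentence with two hand-picked clause segments; the final "." is left
# outside both segments purely for illustration.
words = ["I", "stayed", "home", "because", "it", "rained", "."]
tags = spans_to_tags(len(words), [(0, 3), (3, 6)])
print(list(zip(words, tags)))
print([TAG2ID[t] for t in tags])  # → [0, 1, 1, 0, 1, 1, 2]
```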

We use the multilingual variant of RoBERTa (XLM-RoBERTa) to allow zero-shot transfer of situation-entity segmentation to other languages and language varieties.

Architecture

XLM-RoBERTa-large encoder  →  Linear(1024 → 3)  →  CRF(3 tags)
  • Encoder: FacebookAI/xlm-roberta-large
  • Classifier: single linear layer mapping the encoder's hidden states to 3 tag logits
  • Decoder: Viterbi decoding via a linear-chain CRF (pytorch-crf)
  • Labels: B-EDU (0), I-EDU (1), O (2)
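The CRF layer replaces a per-token argmax with a joint decision over the whole tag sequence. Below is a minimal plain-Python sketch of linear-chain Viterbi decoding. It assumes the same score structure as pytorch-crf's `CRF`, which has start, transition and end scores. The `decode` method of that class performs the batched equivalent in PyTorch. The hand-set forbidden transitions and toy scores are illustrative assumptions. In a trained pytorch-crf layer, these scores are learned parameters.

```python
from typing import List, Sequence

NEG = -1e4  # large negative score used to (softly) forbid a transition


def viterbi_decode(
    emissions: Sequence[Sequence[float]],    # [seq_len][num_tags] classifier scores
    transitions: Sequence[Sequence[float]],  # transitions[i][j]: score for tag i -> tag j
    start: Sequence[float],                  # score for starting a sequence in each tag
    end: Sequence[float],                    # score for ending a sequence in each tag
) -> List[int]:
    """Return the highest-scoring tag-index sequence of a linear-chain CRF."""
    num_tags = len(start)
    # score[j]: best total score of any path that ends in tag j at the current token
    score = [start[j] + emissions[0][j] for j in range(num_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            # best previous tag i to move from into tag j at token t
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
        backpointers.append(pointers)
        score = new_score
    # add end scores, pick the best final tag, then follow the back-pointers
    best_last = max(range(num_tags), key=lambda j: score[j] + end[j])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Tags: 0 = B-EDU, 1 = I-EDU, 2 = O. Toy scores, not the trained model's.
start = [0.0, NEG, 0.0]            # a sentence should not start inside a segment
end = [0.0, 0.0, 0.0]
transitions = [[0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0],
               [0.0, NEG, 0.0]]    # O -> I-EDU is an invalid BIO transition
emissions = [[1.0, 2.0, 0.0],      # a per-token argmax would pick I-EDU here
             [0.0, 1.0, 0.5],
             [2.0, 0.0, 0.0],
             [0.0, 1.5, 0.0]]
print(viterbi_decode(emissions, transitions, start, end))  # → [0, 1, 0, 1]
```

A per-token argmax over the same emissions would yield I-EDU, I-EDU, B-EDU, I-EDU, which opens the sentence inside a segment. The start and transition scores instead steer Viterbi to B-EDU, I-EDU, B-EDU, I-EDU.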

Training Data

Fine-tuned on the situation entity annotated corpus from:

Annemarie Friedrich, Alexis Palmer and Manfred Pinkal. Situation entity types: automatic classification of clause-level aspect. ACL 2016. (GitHub)

The dataset is licensed under the Apache License 2.0. Per the terms of the Apache 2.0 license, notice is hereby given that these weights represent a modified derivative work based on that data.

The corpus contains English text with clause-level situation-entity annotations. The standard train/dev/test split from the original paper is used.

Training Details

Hyperparameter              Value
Base model                  FacebookAI/xlm-roberta-large
Learning rate               4e-5
Epochs (max)                20
Batch size                  64
Weight decay                0.001
Early stopping patience     3 (B-EDU F1 on dev)
Floating-point precision    fp16

Please find further training details in our code on GitHub.
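The early-stopping rule in the table can be sketched as follows: patience 3 on dev B-EDU F1, with at most 20 epochs. The sketch assumes that an epoch counts as an improvement only if it strictly beats the best dev score so far. `EarlyStopping` and the dev scores are hypothetical; the actual implementation is in the training code. If training uses the Hugging Face `Trainer`, you can get the same behaviour from `transformers.EarlyStoppingCallback(early_stopping_patience=3)` together with `load_best_model_at_end=True` and `metric_for_best_model`.

```python
class EarlyStopping:
    """Hypothetical helper: stop when dev B-EDU F1 has not improved for `patience` epochs."""

    def __init__(self, patience: int = 3) -> None:
        self.patience = patience
        self.best = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0  # consecutive epochs without improvement

    def step(self, epoch: int, dev_f1: float) -> bool:
        """Record one epoch's dev score; return True if training should stop."""
        if dev_f1 > self.best:  # strict improvement resets the counter
            self.best, self.best_epoch, self.bad_epochs = dev_f1, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


MAX_EPOCHS = 20
toy_dev_f1 = [0.50, 0.60, 0.70, 0.65, 0.70, 0.69, 0.80]  # made-up scores, not real logs
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, dev_f1 in enumerate(toy_dev_f1[:MAX_EPOCHS]):
    # ... one epoch of training and dev evaluation would happen here ...
    if stopper.step(epoch, dev_f1):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_epoch, stopper.best)  # → 5 2 0.7
```

With the toy scores, the best epoch is 2. Epochs 3 to 5 do not improve on it, so training stops after epoch 5. One would typically keep the checkpoint from `best_epoch`.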

Results

All scores are on the held-out test set. The table reports the best single run and the mean ± standard deviation across 5 random seeds for the best hyperparameter configuration (lr = 4e-5, wd = 0.001). We ran a full grid search over 4 configurations × 5 seeds (20 runs in total). All configurations reached a similar mean B-EDU F1 between 0.902 and 0.904.

Metric                    Best run   Mean ± std (5 seeds)
B-EDU F1                  0.907      0.904 ± 0.002
B-EDU Precision           0.901      0.898 ± 0.010
B-EDU Recall              0.914      0.911 ± 0.009
Token Accuracy            0.982      0.979 ± 0.002
WindowDiff (↓)            0.075      0.077 ± 0.002
Exact Match (sentence)    0.753      0.742 ± 0.007
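The B-EDU scores can be read as token-level precision, recall and F1 of the B-EDU label, i.e. of predicted segment starts. This reading is an assumption. The exact evaluation code is in the repository and may, for example, aggregate differently across sentences. As a consistency check, F1 is the harmonic mean of precision and recall. For the best run, 2 · 0.901 · 0.914 / (0.901 + 0.914) ≈ 0.907, which matches the table. Below is a minimal sketch with toy tag sequences, using the hypothetical helpers `label_prf` and `token_accuracy`.

```python
from typing import Sequence, Tuple


def label_prf(gold: Sequence[str], pred: Sequence[str], label: str = "B-EDU") -> Tuple[float, float, float]:
    """Hypothetical helper: token-level precision, recall and F1 for a single label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(g == label and p == label for g, p in pairs)  # correctly predicted segment starts
    fp = sum(g != label and p == label for g, p in pairs)  # spurious segment starts
    fn = sum(g == label and p != label for g, p in pairs)  # missed segment starts
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def token_accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Hypothetical helper: fraction of tokens whose predicted tag equals the gold tag."""
    if not gold or len(gold) != len(pred):
        raise ValueError("need equally long, non-empty tag sequences")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


gold = ["B-EDU", "I-EDU", "I-EDU", "B-EDU", "I-EDU", "O"]
pred = ["B-EDU", "I-EDU", "B-EDU", "I-EDU", "I-EDU", "O"]
print(label_prf(gold, pred))       # → (0.5, 0.5, 0.5)
print(token_accuracy(gold, pred))  # 4 of 6 tags match
```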

WindowDiff (Pevzner & Hearst, 2002) slides a window of k positions over the text, where k is half the average reference segment length. It reports the proportion of windows in which the reference and predicted segmentations contain a different number of boundaries (lower is better). Exact Match is the fraction of sentences whose full predicted tag sequence matches the gold sequence.
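Below is a minimal sketch of both metrics. It assumes that boundaries are read off the B-EDU tags and that k is computed from the reference. The sketch follows one common, unweighted formulation of WindowDiff, similar to `nltk.metrics.segmentation.windowdiff`. This card does not state whether the reported scores were computed per sentence, per document, or over the concatenated test set. All helper names are hypothetical.

```python
from typing import List, Sequence


def boundaries(tags: Sequence[str]) -> List[int]:
    """Hypothetical helper: 1 where a segment starts (B-EDU), else 0."""
    return [1 if t == "B-EDU" else 0 for t in tags]


def window_size(ref: Sequence[int]) -> int:
    """k = half the average reference segment length (rounded, at least 1)."""
    n_segments = max(1, sum(ref))
    return max(1, round(len(ref) / n_segments / 2))


def windowdiff(ref: Sequence[int], hyp: Sequence[int], k: int) -> float:
    """Share of length-k windows in which ref and hyp contain different boundary counts."""
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same length")
    if not 1 <= k <= len(ref):
        raise ValueError("k must lie between 1 and the sequence length")
    n_windows = len(ref) - k + 1
    errors = sum(sum(ref[i:i + k]) != sum(hyp[i:i + k]) for i in range(n_windows))
    return errors / n_windows


def exact_match(gold_sents: Sequence[Sequence[str]], pred_sents: Sequence[Sequence[str]]) -> float:
    """Fraction of sentences whose whole predicted tag sequence equals the gold one."""
    if not gold_sents or len(gold_sents) != len(pred_sents):
        raise ValueError("need equally many (non-zero) gold and predicted sentences")
    return sum(list(g) == list(p) for g, p in zip(gold_sents, pred_sents)) / len(gold_sents)


gold = ["B-EDU", "I-EDU", "I-EDU", "B-EDU", "I-EDU", "I-EDU"]
pred = ["B-EDU", "I-EDU", "B-EDU", "I-EDU", "I-EDU", "I-EDU"]
ref, hyp = boundaries(gold), boundaries(pred)
k = window_size(ref)  # 6 tokens / 2 segments / 2 = 1.5, rounded to 2
print(k, windowdiff(ref, hyp, k))  # → 2 0.4
print(exact_match([gold, ["B-EDU", "I-EDU"]], [pred, ["B-EDU", "I-EDU"]]))  # → 0.5
```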

Usage

Requirements

pip install transformers torch pytorch-crf

spaCy is not a hard dependency, but it is recommended for sentence splitting and word tokenisation (matching the training setup):

pip install spacy && python -m spacy download en_core_web_sm

Loading the model

from transformers import AutoConfig, AutoModel, AutoTokenizer

config    = AutoConfig.from_pretrained("xaver-maria-krueckl/situation-entity-segmenter", trust_remote_code=True)
model     = AutoModel.from_pretrained("xaver-maria-krueckl/situation-entity-segmenter", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("FacebookAI/xlm-roberta-large")

model.eval()

Inference

The model was trained on spaCy-tokenised, sentence-split input (en_core_web_sm), so inference should follow the same setup. Split your input text into sentences using spaCy first, then call model.predict_text(words, tokenizer) with the word tokens for each sentence:

import spacy

nlp = spacy.load("en_core_web_sm")

text    = "The cat sat on the mat. It looked around the room."
results = []

for sent in nlp(text).sents:
    words = [token.text for token in sent]
    results.extend(model.predict_text(words, tokenizer))

for word, tag in results:
    print(f"{word:20s} {tag}")

B-EDU marks the start of a new situation-entity segment; I-EDU marks its continuation; O marks tokens outside any segment.
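To turn the tag sequence into readable segments, you can group the `(word, tag)` pairs collected in `results` at each B-EDU. The minimal sketch below assumes that `predict_text` returns such pairs, as the printing loop suggests. `tags_to_segments` is hypothetical and not part of the model's code. A stray I-EDU without a preceding B-EDU is treated as opening a new segment, which is one of several reasonable repair choices.

```python
from typing import List, Sequence, Tuple


def tags_to_segments(pairs: Sequence[Tuple[str, str]]) -> List[List[str]]:
    """Hypothetical helper: group (word, tag) pairs into segments of words."""
    segments: List[List[str]] = []
    current: List[str] = []
    for word, tag in pairs:
        if tag == "B-EDU":
            if current:  # a new start closes the open segment
                segments.append(current)
            current = [word]
        elif tag == "I-EDU":
            # continues the open segment; a stray I-EDU with no open segment starts one
            current.append(word)
        else:  # "O": the word belongs to no segment and closes any open one
            if current:
                segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


# Hand-written tags for illustration, not actual model output.
pairs = [("I", "B-EDU"), ("stayed", "I-EDU"), ("home", "I-EDU"),
         ("because", "B-EDU"), ("it", "I-EDU"), ("rained", "I-EDU"), (".", "O")]
for segment in tags_to_segments(pairs):
    print(" ".join(segment))
# I stayed home
# because it rained
```

With the real model, `tags_to_segments(results)` would give one word list per predicted segment.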

Limitations

  • Trained and evaluated on ~40,000 English situation-entity segments.
  • Performance may vary on out-of-domain text.
  • Inputs longer than 512 sub-word tokens (the XLM-RoBERTa maximum input length) must be chunked before inference; typical single sentences are well below this limit.
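For the rare input that exceeds the limit, here is a minimal sketch of word-level chunking. It assumes chunks are built from the same word lists passed to `model.predict_text` and that sub-tokens can be counted per word. `chunk_words` is a hypothetical helper, not part of the model's code. With the real tokenizer, `len(tokenizer.tokenize(w))` only approximates how XLM-RoBERTa splits words in context.

```python
from typing import Callable, List, Sequence


def chunk_words(
    words: Sequence[str],
    n_subtokens: Callable[[str], int],
    max_len: int = 512,
    n_special: int = 2,  # assumption: <s> and </s> are added around each chunk
) -> List[List[str]]:
    """Hypothetical helper: split words into chunks that fit the sub-token budget."""
    budget = max_len - n_special
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for word in words:
        cost = n_subtokens(word)
        if cost > budget:
            raise ValueError(f"word {word!r} alone exceeds the sub-token budget")
        if current and used + cost > budget:  # word would overflow: start a new chunk
            chunks.append(current)
            current, used = [], 0
        current.append(word)
        used += cost
    if current:
        chunks.append(current)
    return chunks


# Toy check: count characters as "sub-tokens" and use a tiny limit of 8 (budget 6).
print(chunk_words(["abc", "de", "fgh", "i"], n_subtokens=len, max_len=8))
# → [['abc', 'de'], ['fgh', 'i']]

# With the real tokenizer (approximate, since words are counted in isolation):
# for chunk in chunk_words(words, lambda w: len(tokenizer.tokenize(w))):
#     results.extend(model.predict_text(chunk, tokenizer))
```

Chunk edges may cut through a segment, so boundary predictions right at an edge are likely less reliable. Splitting into sentences first, as in the inference example, avoids this in most cases.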

Acknowledgement

We gratefully acknowledge the scientific support and HPC resources provided by the Erlangen National High Performance Computing Center (NHR@FAU) of the Friedrich-Alexander Universität Erlangen-Nürnberg (FAU) under the NHR project v110ee. NHR funding is provided by federal and Bavarian state authorities. NHR@FAU hardware is partially funded by the German Research Foundation (DFG) – 440719683.

Citation

Please cite our paper when using the model:

@inproceedings{schmueck-etal-2026,
    title = "Cross-Linguistic Situation Entity Segmentation for Discourse Analysis in Diachronic English and German Text",
    author = "we will update when published :)"
}

Please also cite the original annotation paper:

@inproceedings{friedrich-etal-2016-situation,
    title = "Situation entity types: automatic classification of clause-level aspect",
    author = "Friedrich, Annemarie  and
      Palmer, Alexis  and
      Pinkal, Manfred",
    editor = "Erk, Katrin  and
      Smith, Noah A.",
    booktitle = "Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2016",
    address = "Berlin, Germany",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/P16-1166/",
    doi = "10.18653/v1/P16-1166",
    pages = "1757--1768"
}