roberta-base-pii-guardrails

A finetuned RoBERTa-base model for Personally Identifiable Information (PII) detection via Named Entity Recognition (NER), trained as part of a multi-layer enterprise AI guardrails system. The model identifies and classifies 57 types of PII entities in text using BIO tagging.

Model Description

This model is the PII detection component of a defense-in-depth guardrails gateway designed to prevent sensitive data exfiltration through LLM deployments. It operates alongside a jailbreak classifier and a prompt injection detector to form a multi-signal security layer.

Property Value
Base model roberta-base (125M params)
Task Token classification (BIO NER)
Entity types 57 PII categories
Max input length 256 tokens
Training framework PyTorch + HuggingFace Transformers
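
With BIO tagging, each of the 57 entity types gets a B- (begin) and I- (inside) label, plus a single O label for non-PII tokens, giving 115 labels in total. A minimal sketch of building the label maps (the four entity names shown are a truncated stand-in for the full list given below):

```python
# Truncated for illustration; the full model uses all 57 entity types listed
# under "Supported Entity Types" below.
ENTITY_TYPES = ["EMAIL", "FIRSTNAME", "LASTNAME", "SSN"]

# BIO scheme: one O label, then B-/I- pairs per entity type.
labels = ["O"] + [f"{prefix}-{ent}" for ent in ENTITY_TYPES for prefix in ("B", "I")]
id2label = dict(enumerate(labels))
label2id = {label: i for i, label in enumerate(labels)}

# With all 57 entity types: len(labels) == 2 * 57 + 1 == 115
print(len(labels))  # → 9 (for the 4-type stand-in)
```

The `id2label`/`label2id` maps follow the convention stored in the model's `config.json`, which is what `model.config.id2label` reads from in the Usage section.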

Training Data

Trained on the English subset of ai4privacy/pii-masking-200k, a large, diverse dataset of synthetic text with fine-grained PII annotations across 57 entity types.

Dataset split (English only):

Split Examples
Train ~34,800
Validation ~4,350
Test ~4,351

Training Details

Optimizer:          AdamW
Learning rate:      3e-5
Epochs:             5 (best checkpoint: epoch 4)
Batch size:         16
Warmup ratio:       0.1
Weight decay:       0.01
Grad clipping:      1.0
Max seq length:     256
Label alignment:    First subword only (-100 for subsequent subwords)
Padding:            Dynamic (DataCollatorForTokenClassification)
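
The "first subword only" alignment above can be sketched without any model dependency: `word_ids` is what a fast tokenizer's `BatchEncoding.word_ids()` returns (here hard-coded for illustration), with `None` marking special tokens and repeated ids marking subwords of the same word.

```python
def align_labels(word_labels, word_ids, ignore_index=-100):
    """Assign each word's label to its first subword; mask everything else with -100."""
    aligned, prev = [], None
    for wid in word_ids:
        if wid is None:
            aligned.append(ignore_index)      # special tokens (<s>, </s>)
        elif wid != prev:
            aligned.append(word_labels[wid])  # first subword keeps the word's label
        else:
            aligned.append(ignore_index)      # subsequent subwords are masked
        prev = wid
    return aligned

# A word split into three subwords (word id 2 repeated); label ids are illustrative.
word_ids = [None, 0, 1, 2, 2, 2, None]
labels   = [0, 5, 9]
print(align_labels(labels, word_ids))  # → [-100, 0, 5, 9, -100, -100, -100]
```

Positions set to -100 are ignored by PyTorch's cross-entropy loss, so only first subwords contribute to training.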

Evaluation Results

Overall Metrics (Test Set)

Metric Value
PII Recall 98.21% (most critical for security)
PII F1 97.96%
PII Precision 97.71%
F1 Micro (entity-level) 93.77%
Loss 0.0876

Training Curve

Epoch Train Loss Val F1 Micro PII Recall
1 0.5546 0.8905 97.12%
2 0.0980 0.9252 97.72%
3 0.0784 0.9291 98.00%
4 0.0672 0.9380 98.07%
5 0.0565 0.9355 98.03%

Per-Entity-Type F1 (Test Set)

Entity F1 Entity F1
NEARBYGPSCOORDINATE 1.000 CREDITCARDISSUER 0.975
SEX 0.998 CITY 0.974
EMAIL 0.993 VEHICLEVIN 0.974
ORDINALDIRECTION 0.993 FIRSTNAME 0.973
URL 0.991 JOBTYPE 0.968
ACCOUNTNAME 0.990 LASTNAME 0.967
MAC 0.990 HEIGHT 0.965
VEHICLEVRM 0.990 BIC 0.961
USERNAME 0.990 STREET 0.959
IBAN 0.989 MIDDLENAME 0.949
GENDER 0.989 ACCOUNTNUMBER 0.948
USERAGENT 0.989 BITCOINADDRESS 0.946
SSN 0.988 CURRENCYSYMBOL 0.945
ETHEREUMADDRESS 0.987 EYECOLOR 0.945
PHONENUMBER 0.985 AMOUNT 0.944
PHONEIMEI 0.984 PIN 0.939
PASSWORD 0.982 CREDITCARDCVV 0.930
COMPANYNAME 0.982 PREFIX 0.929
SECONDARYADDRESS 0.981 BUILDINGNUMBER 0.918
JOBAREA 0.980 ZIPCODE 0.905
COUNTY 0.979 CREDITCARDNUMBER 0.901
TIME 0.979 LITECOINADDRESS 0.899
JOBTITLE 0.977 AGE 0.895
STATE 0.977 DATE 0.880
CURRENCYCODE 0.880 IP / IPV4 / IPV6 0.15–0.81

Harder entities: IP addresses (F1 = 0.15–0.81), DOB (0.71), CURRENCYNAME (0.42), and MASKEDNUMBER (0.84) score lower due to high ambiguity: these formats appear frequently in non-PII contexts.

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "shashidharbabu/roberta-base-pii-guardrails",
    add_prefix_space=True,  # required by RoBERTa's fast tokenizer for pre-tokenized (is_split_into_words) input
)
model = AutoModelForTokenClassification.from_pretrained("shashidharbabu/roberta-base-pii-guardrails")
model.eval()

def detect_pii(text):
    """Split on whitespace and report the predicted label of each word's first subword."""
    words = text.split()
    enc   = tokenizer(
        words,
        is_split_into_words=True,
        return_tensors="pt",
        truncation=True,
        max_length=256,
    )
    with torch.no_grad():
        logits = model(**enc).logits
    preds    = logits.argmax(dim=-1)[0].tolist()
    word_ids = enc.word_ids(0)
    id2label = model.config.id2label

    results, prev_wid = [], None
    for wid, pred in zip(word_ids, preds):
        if wid is None or wid == prev_wid:
            prev_wid = wid
            continue
        label = id2label[pred]
        if label != "O":
            results.append({"word": words[wid], "entity": label})
        prev_wid = wid
    return results

# Example
text = "Hi, my name is John Smith and my email is john.smith@gmail.com"
print(detect_pii(text))
# → [
#     {'word': 'John',                 'entity': 'B-FIRSTNAME'},
#     {'word': 'Smith',                'entity': 'B-LASTNAME'},
#     {'word': 'john.smith@gmail.com', 'entity': 'B-EMAIL'},
#   ]

Supported Entity Types (57 total)

ACCOUNTNAME · ACCOUNTNUMBER · AGE · AMOUNT · BIC · BITCOINADDRESS · BUILDINGNUMBER · CITY · COMPANYNAME · COUNTY · CREDITCARDCVV · CREDITCARDISSUER · CREDITCARDNUMBER · CURRENCY · CURRENCYCODE · CURRENCYNAME · CURRENCYSYMBOL · DATE · DOB · EMAIL · ETHEREUMADDRESS · EYECOLOR · FIRSTNAME · GENDER · HEIGHT · IBAN · IP · IPV4 · IPV6 · JOBAREA · JOBTITLE · JOBTYPE · LASTNAME · LITECOINADDRESS · MAC · MASKEDNUMBER · MIDDLENAME · NEARBYGPSCOORDINATE · ORDINALDIRECTION · PASSWORD · PHONEIMEI · PHONENUMBER · PIN · PREFIX · SECONDARYADDRESS · SEX · SSN · STATE · STREET · TIME · URL · USERAGENT · USERNAME · VEHICLEVIN · VEHICLEVRM · ZIPCODE

Limitations

  • English only: Trained exclusively on English text. Performance on non-English inputs is untested and likely poor
  • IP address detection: F1 of 0.15 for the generic IP label; bare IP addresses are highly ambiguous and frequently appear in non-PII contexts
  • DOB / date ambiguity: Date of birth (F1=0.71) is confused with general dates; context-free date strings are hard to classify
  • Credit card vs SSN confusion: Long numeric sequences (e.g. credit card numbers in space-separated format) are sometimes misclassified as SSN
  • Synthetic training data: Trained on synthetically generated text; performance may differ on real-world informal text styles
  • Max 256 tokens: Long documents must be chunked; PII spanning chunk boundaries may be missed
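
One way to mitigate the boundary problem is to split long inputs into overlapping word windows, so that an entity cut off at one window's edge appears whole in the next. A minimal sketch; the window and stride values are illustrative, not from the model card:

```python
def chunk_words(words, window=200, stride=150):
    """Yield (start_index, word_window) pairs with window - stride words of overlap."""
    if len(words) <= window:
        yield 0, words
        return
    start = 0
    while start < len(words):
        yield start, words[start:start + window]
        if start + window >= len(words):
            break  # final window reaches the end of the document
        start += stride

# 500 words with window=200, stride=150 → windows starting at 0, 150, 300.
starts = [s for s, _ in chunk_words(["w"] * 500)]
print(starts)  # → [0, 150, 300]
```

Each window can then be passed to `detect_pii`, with detected word positions offset by the window's start index; duplicate hits in the overlap regions can be deduplicated by absolute word position.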

Intended Use

This model is designed as a pre-filter and post-filter in an LLM serving pipeline to:

  • Detect PII in user inputs before they reach the base model
  • Detect PII in model outputs before they are returned to the user
  • Flag or redact sensitive information in enterprise AI deployments
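
As a post-filter, the detections can drive redaction before text is returned to the user. A minimal sketch, assuming a `detect_pii`-style function that returns `{"word", "entity"}` records as in the Usage section (the placeholder format is an illustrative choice):

```python
def redact(text, detections):
    """Replace each detected word with a bracketed placeholder for its entity type."""
    flagged = {d["word"]: d["entity"] for d in detections}
    out = []
    for word in text.split():
        if word in flagged:
            entity = flagged[word].split("-", 1)[-1]  # strip the B-/I- prefix
            out.append(f"[{entity}]")
        else:
            out.append(word)
    return " ".join(out)

detections = [{"word": "John", "entity": "B-FIRSTNAME"},
              {"word": "john.smith@gmail.com", "entity": "B-EMAIL"}]
print(redact("Hi, my name is John and my email is john.smith@gmail.com", detections))
# → Hi, my name is [FIRSTNAME] and my email is [EMAIL]
```

A production gateway would redact by character span rather than exact word match, but the whitespace version mirrors the word-level output of `detect_pii` above.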

Not intended for:

  • Legal compliance decisions without human review
  • Languages other than English
  • Real-time systems requiring sub-millisecond latency without optimization

Citation

@misc{guardrails2026,
  title  = {Multi-Layer LLM Security Gateway with Specialized Finetuned Models},
  author = {Shashidhar Babu et al.},
  year   = {2026},
  note   = {San Jose State University, Graduate Project}
}

Project

This model is part of the Guardrails Gateway project at San Jose State University, a multi-layer LLM security system combining a jailbreak classifier, a prompt injection detector, and this PII detector.

Tracked with Weights & Biases.
