Reducing Privacy Risks in Online Self-Disclosures with Language Models
Paper: arXiv:2311.09538
The model classifies whether a given sentence contains a self-disclosure. It is a binary sentence-level classifier: label 1 means the sentence contains a self-disclosure, and label 0 means it does not.
For more details, please read the paper: Reducing Privacy Risks in Online Self-Disclosures with Language Models.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig

config = AutoConfig.from_pretrained("douy/roberta-large-self-disclosure-sentence-classification")
tokenizer = AutoTokenizer.from_pretrained("douy/roberta-large-self-disclosure-sentence-classification")
model = AutoModelForSequenceClassification.from_pretrained(
    "douy/roberta-large-self-disclosure-sentence-classification",
    config=config,
    device_map="cuda:0",
).eval()

sentences = [
    "I am a 23-year-old who is currently going through the last leg of undergraduate school.",
    "There is a joke in the design industry about that.",
    "My husband and I live in US.",
    "I was messing with advanced voice the other day and I was like, 'Oh, I can do this.'",
]

inputs = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True).to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits

# Take the argmax of each row to get the predicted class:
# 1 means the sentence contains self-disclosure,
# 0 means the sentence does not contain self-disclosure.
predicted_class = logits.argmax(dim=-1)
# predicted_class: tensor([1, 0, 1, 0], device='cuda:0')
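To turn the raw predictions into readable output, you can pair each sentence with its predicted class. This is a minimal follow-up sketch that assumes the snippet above has already run; the label names are written out by hand from the 0/1 mapping described earlier rather than read from the model config.

# Map each sentence to its predicted label (assumes model, tokenizer, sentences,
# and predicted_class from the snippet above are in scope).
label_names = {0: "no self-disclosure", 1: "self-disclosure"}
for sentence, pred in zip(sentences, predicted_class.tolist()):
    print(f"[{label_names[pred]}] {sentence}")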
The model achieves 88.6% accuracy.
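If you want to compute an accuracy number on your own labeled data, a simple loop like the one below works. The eval_sentences and eval_labels here are hypothetical placeholders for illustration, not data from the paper, and the snippet reuses the model and tokenizer loaded above.

# Hypothetical labeled examples; replace with your own evaluation data.
eval_sentences = [
    "I just turned 30 and moved to Berlin last month.",  # contains self-disclosure
    "The weather forecast predicts rain tomorrow.",      # no self-disclosure
]
eval_labels = torch.tensor([1, 0])

batch = tokenizer(eval_sentences, return_tensors="pt", padding=True, truncation=True).to(model.device)
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1).cpu()

# Accuracy = fraction of predictions that match the reference labels.
accuracy = (preds == eval_labels).float().mean().item()
print(f"Accuracy: {accuracy:.1%}")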
@article{dou2023reducing,
  title={Reducing Privacy Risks in Online Self-Disclosures with Language Models},
  author={Dou, Yao and Krsek, Isadora and Naous, Tarek and Kabra, Anubha and Das, Sauvik and Ritter, Alan and Xu, Wei},
  journal={arXiv preprint arXiv:2311.09538},
  year={2023}
}
Base model: FacebookAI/roberta-large