Spanish Fake News Classifier: A Sequential Fine-Tuning Approach
Paper: https://openreview.net/forum?id=ARX3eBpHgq
Overview
This model is a binary classifier (real vs. fake) for Spanish long-form news articles.
It uses a sequential fine-tuning approach to overcome dataset imbalance between short and long articles:
- Stage 1: Pre-fine-tuning on ~57k short Spanish news articles to learn general fake-news patterns.
- Stage 2: Target fine-tuning on ~2k long-form articles to adapt to real-world journalism structure.
The base architecture is bert-base-spanish-wwm-uncased.
Model Details
- Base model: dccuchile/bert-base-spanish-wwm-uncased
- Max tokens: 512
- Stage 1 strategy: Last 3 encoder layers + pooler + classifier unfrozen
- Stage 2 strategy: Last 2 encoder layers + pooler + classifier unfrozen
- Intended use: Fake news detection in long-form Spanish journalism
- Language: Spanish (es)
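The two freezing strategies listed above can be sketched as a rule over parameter names. This is a minimal illustration, not the training code behind this model: `is_trainable`, `apply_freezing`, and `UNFROZEN_LAYERS` are hypothetical names, and the sketch assumes the standard Hugging Face BERT parameter naming (`bert.encoder.layer.<i>...`, `bert.pooler...`, `classifier...`).

```python
import re

# Matches the encoder layer index in Hugging Face BERT parameter names,
# e.g. "bert.encoder.layer.11.attention.self.query.weight".
_LAYER_INDEX = re.compile(r"encoder\.layer\.(\d+)\.")

# Number of top encoder layers left trainable in each stage (from the list above).
# The dictionary itself is a hypothetical helper, not part of the released model.
UNFROZEN_LAYERS = {"stage1": 3, "stage2": 2}


def is_trainable(param_name: str, num_layers: int, num_unfrozen: int) -> bool:
    """Return True if the parameter should be updated during fine-tuning."""
    # The pooler and the classification head are trainable in both stages.
    if "pooler." in param_name or param_name.startswith("classifier."):
        return True
    match = _LAYER_INDEX.search(param_name)
    if match is None:
        # Embeddings and anything else outside the encoder stack stay frozen.
        return False
    # Only the last `num_unfrozen` encoder layers are trainable.
    return int(match.group(1)) >= num_layers - num_unfrozen


def apply_freezing(model, stage: str) -> None:
    """Set requires_grad on every parameter of a BERT sequence classifier."""
    num_layers = model.config.num_hidden_layers
    num_unfrozen = UNFROZEN_LAYERS[stage]
    for name, param in model.named_parameters():
        param.requires_grad = is_trainable(name, num_layers, num_unfrozen)
```

With a 12-layer base model, `apply_freezing(model, "stage1")` would leave encoder layers 9 to 11 trainable, and `apply_freezing(model, "stage2")` would leave layers 10 and 11 trainable, in both cases together with the pooler and classifier.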
Final Evaluation Metrics (Test Set)
| Metric | Value |
|---|---|
| Accuracy | 0.8205 |
| Precision | 0.7835 |
| Recall | 0.8702 |
| F1-score | 0.8246 |
| Loss | 0.4183 |
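As a quick consistency check, the F1-score can be recomputed from the reported precision and recall using the standard harmonic-mean formula. The snippet below only uses the values from the table above.

```python
# Values copied from the evaluation table.
precision = 0.7835
recall = 0.8702

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 recomputed from precision and recall: {f1:.4f}")  # 0.8246
```

The recomputed value agrees with the reported F1-score to four decimal places.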
Usage
Installation
You can install the required dependencies using pip:
pip install transformers torch
Loading the Model
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
model = AutoModelForSequenceClassification.from_pretrained("Juanillaberia/spanish-fake-news-classifier")
tokenizer = AutoTokenizer.from_pretrained("Juanillaberia/spanish-fake-news-classifier")
Predict Function
def predict_article(article_text: str) -> str:
    """
    Predicts whether a given article text is 'Real' or 'Fake' using the fine-tuned model.

    Args:
        article_text (str): The text of the article to classify.

    Returns:
        str: 'Real' if the article is predicted as real, 'Fake' otherwise.
    """
    # Tokenize the input text (articles longer than 512 tokens are truncated)
    inputs = tokenizer(article_text, truncation=True, max_length=512, padding="max_length", return_tensors="pt")
    # Make prediction without tracking gradients
    with torch.no_grad():
        outputs = model(**inputs)
    # Get logits and convert to probabilities
    logits = outputs.logits
    probabilities = torch.softmax(logits, dim=-1)
    # Get predicted label (1 for Real, 0 for Fake)
    predicted_label_id = torch.argmax(probabilities, dim=-1).item()
    # Map label ID to 'Real' or 'Fake'
    return "Real" if predicted_label_id == 1 else "Fake"
Making Predictions
text = "Your Spanish article"
predicted_label = predict_article(text)
print(f"Predicted Label: {predicted_label}")
Note: Label 1 corresponds to "Real" articles and label 0 to "Fake" articles, matching the labels the model was trained with.
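If a confidence score is useful alongside the label, the label mapping from the note can be made explicit. The following is a minimal sketch in plain Python: `ID2LABEL` and `describe_prediction` are hypothetical names introduced here for illustration and are not part of the model repository.

```python
import math

# Label mapping as stated in the note: 0 is "Fake", 1 is "Real".
ID2LABEL = {0: "Fake", 1: "Real"}


def describe_prediction(logits):
    """Return (label, confidence) for one article's two raw logits."""
    # Numerically stable softmax: subtract the largest logit before exponentiating.
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probabilities = [value / total for value in exps]
    # Pick the class with the highest probability.
    label_id = max(range(len(probabilities)), key=probabilities.__getitem__)
    return ID2LABEL[label_id], probabilities[label_id]


label, confidence = describe_prediction([-1.0, 2.0])
print(f"{label} ({confidence:.2%})")
```

With the loaded model, the logits for a single article could be passed in as `describe_prediction(outputs.logits[0].tolist())`.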
License
Apache License 2.0
Acknowledgments
Thanks to DCC UChile for the base Spanish BERT model.