Spanish Fake News Classifier: A Sequential Fine-Tuning Approach

Paper: https://openreview.net/forum?id=ARX3eBpHgq

Overview

This model is a binary classifier (real vs. fake) for Spanish long-form news articles.
It is fine-tuned in two sequential stages to compensate for the scarcity of labeled long-form articles relative to short ones:

Stage 1: Pre-fine-tuning on ~57k short Spanish news articles to learn general fake-news patterns.
Stage 2: Target fine-tuning on ~2k long-form articles to adapt to real-world journalism structure.

The base architecture is bert-base-spanish-wwm-uncased.

Model Details

Base model: dccuchile/bert-base-spanish-wwm-uncased
Max tokens: 512
Stage 1 strategy: Last 3 encoder layers + pooler + classifier unfrozen
Stage 2 strategy: Last 2 encoder layers + pooler + classifier unfrozen
Intended use: Fake news detection in long-form Spanish journalism
Language: Spanish (es)
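
The Stage 1 and Stage 2 strategies listed above amount to freezing every parameter and then re-enabling gradients for the top encoder layers, the pooler, and the classifier head. The following is a minimal sketch of that idea, not the author's training code. The helper name unfreeze_top_layers is hypothetical, and a tiny randomly initialized BERT stands in for the real checkpoint so the snippet runs without a download. It assumes the attribute layout of BertForSequenceClassification in transformers (bert.encoder.layer, bert.pooler, classifier).

```python
from transformers import BertConfig, BertForSequenceClassification


def unfreeze_top_layers(model, num_encoder_layers):
    """Freeze the whole model, then unfreeze the last N encoder layers,
    the pooler, and the classifier head (hypothetical helper)."""
    # Start from a fully frozen model.
    for param in model.parameters():
        param.requires_grad = False

    # Re-enable gradients for the last N encoder layers.
    # The guard avoids layer[-0:], which would select every layer.
    if num_encoder_layers > 0:
        for layer in model.bert.encoder.layer[-num_encoder_layers:]:
            for param in layer.parameters():
                param.requires_grad = True

    # The pooler and the classification head stay trainable in both stages.
    for module in (model.bert.pooler, model.classifier):
        for param in module.parameters():
            param.requires_grad = True
    return model


# Tiny stand-in model (assumption: illustration only, not the real checkpoint).
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=4,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,
)
tiny_model = BertForSequenceClassification(config)

# Stage 1 setting from the list above: last 3 encoder layers unfrozen.
unfreeze_top_layers(tiny_model, num_encoder_layers=3)
trainable_names = [n for n, p in tiny_model.named_parameters() if p.requires_grad]
```

For Stage 2, the same helper would be called with num_encoder_layers=2 on the model produced by Stage 1.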

Final Evaluation Metrics (Test Set)

Metric	Value
Accuracy	0.8205
Precision	0.7835
Recall	0.8702
F1-score	0.8246
Loss	0.4183
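
As a quick consistency check on the table above, the F1-score can be recomputed from the reported precision and recall. This assumes the reported F1 is the standard binary F1, the harmonic mean of precision and recall.

```python
# Values copied from the metrics table above.
precision = 0.7835
recall = 0.8702

# Harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 = {f1:.4f}")  # F1 = 0.8246, matching the reported value
```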

Usage

Installation

You can install the required dependencies using pip:

pip install transformers torch

Loading the Model

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("Juanillaberia/spanish-fake-news-classifier")
tokenizer = AutoTokenizer.from_pretrained("Juanillaberia/spanish-fake-news-classifier")

Predict Function

def predict_article(article_text: str):
    """
    Predicts whether a given article text is 'Real' or 'Fake' using the fine-tuned model.

    Args:
        article_text (str): The text of the article to classify.

    Returns:
        str: 'Real' if the article is predicted as real, 'Fake' otherwise.
    """
    # Tokenize the input text
    inputs = tokenizer(article_text, truncation=True, max_length=512, padding="max_length", return_tensors="pt")

    # Make prediction
    with torch.no_grad():
        outputs = model(**inputs)

    # Get logits and convert to probabilities
    logits = outputs.logits
    probabilities = torch.softmax(logits, dim=-1)

    # Get predicted label (1 for Real, 0 for Fake)
    predicted_label_id = torch.argmax(probabilities, dim=-1).item()

    # Map label ID to 'Real' or 'Fake'
    return "Real" if predicted_label_id == 1 else "Fake"

Making Predictions

text = "Your Spanish article"
predicted_label = predict_article(text)
print(f"Predicted Label: {predicted_label}")

Note: Label 1 denotes "Real" articles and label 0 denotes "Fake" articles. This is the encoding the model was trained with.
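
If a confidence score is useful alongside the label, the softmax probabilities can be returned together with the class name. The sketch below works directly on a logits tensor, so it runs without loading the model. The helper name logits_to_predictions and the ID_TO_LABEL dictionary are hypothetical, and the mapping follows the 1 = "Real", 0 = "Fake" convention stated in this card.

```python
import torch

# Label convention from this model card: 1 = Real, 0 = Fake.
ID_TO_LABEL = {0: "Fake", 1: "Real"}


def logits_to_predictions(logits: torch.Tensor):
    """Convert a (batch, 2) logits tensor into (label, confidence) pairs.

    Hypothetical helper: confidence is the softmax probability of the
    predicted class.
    """
    probabilities = torch.softmax(logits, dim=-1)
    confidences, label_ids = probabilities.max(dim=-1)
    return [
        (ID_TO_LABEL[label_id.item()], confidence.item())
        for label_id, confidence in zip(label_ids, confidences)
    ]


# Made-up logits for two articles, for illustration only.
example_logits = torch.tensor([[0.0, 2.0], [1.5, -0.5]])
predictions = logits_to_predictions(example_logits)
for label, confidence in predictions:
    print(f"{label} (confidence {confidence:.2f})")
```

In practice, the tensor passed in would be outputs.logits from the model call in predict_article.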

License

Apache License 2.0

Acknowledgments

Thanks to DCC UChile for the base Spanish BERT model.
