Spanish Fake News Classifier: A Sequential Fine-Tuning Approach

Paper: https://openreview.net/forum?id=ARX3eBpHgq

Overview

This model is a binary classifier (real vs. fake) for Spanish long-form news articles.
It uses a sequential fine-tuning approach to compensate for the imbalance between the large short-article dataset and the much smaller long-article dataset:

  1. Stage 1: Pre-fine-tuning on ~57k short Spanish news articles to learn general fake-news patterns.
  2. Stage 2: Target fine-tuning on ~2k long-form articles to adapt to real-world journalism structure.
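A minimal sketch of the two-stage schedule follows. It assumes a hypothetical train_stage(model, stage) callback that fine-tunes and returns the model. The Stage class, the dataset labels, and run_sequential are illustrative names, not part of the released training code. The point it shows is that stage 2 starts from the weights produced by stage 1 rather than from the base checkpoint.

```python
from dataclasses import dataclass
from typing import Callable, List, TypeVar

M = TypeVar("M")  # any model type; kept generic for the sketch


@dataclass(frozen=True)
class Stage:
    name: str
    dataset: str          # hypothetical dataset label, not a real path
    unfrozen_layers: int  # top encoder layers left trainable (pooler + classifier always trainable)


# Values taken from this model card; dataset labels are placeholders.
STAGES = [
    Stage("pre-fine-tuning", "short_news_~57k", 3),
    Stage("target fine-tuning", "long_news_~2k", 2),
]


def run_sequential(model: M, stages: List[Stage], train_stage: Callable[[M, Stage], M]) -> M:
    """Run the stages in order, feeding each stage the model returned by the previous one."""
    for stage in stages:
        model = train_stage(model, stage)
    return model
```

With a stand-in callback that only records stage names, run_sequential([], STAGES, lambda m, s: m + [s.name]) returns ["pre-fine-tuning", "target fine-tuning"], confirming the order in which the stages are applied.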

The base architecture is bert-base-spanish-wwm-uncased.

Model Details

  • Base model: dccuchile/bert-base-spanish-wwm-uncased
  • Max tokens: 512
  • Stage 1 strategy: Last 3 encoder layers + pooler + classifier unfrozen
  • Stage 2 strategy: Last 2 encoder layers + pooler + classifier unfrozen
  • Intended use: Fake news detection in long-form Spanish journalism
  • Language: Spanish (es)
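The partial unfreezing used in each stage can be expressed as a predicate over parameter names. This sketch assumes the standard Hugging Face BertForSequenceClassification naming (bert.encoder.layer.<i>., bert.pooler., classifier.) and the 12 encoder layers of a BERT-base model. is_trainable is a hypothetical helper, not taken from the authors' code.

```python
import re

# Matches encoder-layer parameters such as "bert.encoder.layer.10.attention.self.query.weight"
_LAYER_RE = re.compile(r"bert\.encoder\.layer\.(\d+)\.")


def is_trainable(param_name: str, num_unfrozen: int, num_layers: int = 12) -> bool:
    """Return True if the parameter belongs to the pooler, the classifier,
    or one of the top `num_unfrozen` encoder layers; everything else stays frozen."""
    if param_name.startswith(("bert.pooler.", "classifier.")):
        return True
    match = _LAYER_RE.match(param_name)
    if match:
        return int(match.group(1)) >= num_layers - num_unfrozen
    return False  # embeddings and any other parameters remain frozen
```

With a loaded model it would be applied as `for name, param in model.named_parameters(): param.requires_grad = is_trainable(name, num_unfrozen=3)` for stage 1 and `num_unfrozen=2` for stage 2. Passing `model.config.num_hidden_layers` as `num_layers` avoids hard-coding 12.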

Final Evaluation Metrics (Test Set)

Metric       Value
Accuracy     0.8205
Precision    0.7835
Recall       0.8702
F1-score     0.8246
Loss         0.4183
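F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the reported precision and recall as a consistency check. It also shows how accuracy, precision, recall and F1 follow from confusion-matrix counts. The card does not state which class was treated as positive for precision and recall, so the positive=1 ("Real") default is an assumption. binary_metrics is a hypothetical helper.

```python
from typing import Dict, Sequence


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall and F1 for one positive class (assumed to be 1 = Real)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_from_precision_recall(precision, recall) if precision + recall else 0.0
    return {"accuracy": correct / len(y_true), "precision": precision, "recall": recall, "f1": f1}


# Consistency check against the table above.
print(round(f1_from_precision_recall(0.7835, 0.8702), 4))  # → 0.8246
```

The reported F1-score therefore agrees with the reported precision and recall. The loss value cannot be recovered this way, since it depends on the model's predicted probabilities.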

Usage

Installation

You can install the required dependencies using pip:

pip install transformers torch

Loading the Model

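import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned classifier and its tokenizer from the Hugging Face Hub
model_id = "Juanillaberia/spanish-fake-news-classifier"
model = AutoModelForSequenceClassification.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.eval()  # inference mode (disables dropout); from_pretrained already does this, kept explicit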

Predict Function

def predict_article(article_text: str):
    """
    Predicts whether a given article text is 'Real' or 'Fake' using the fine-tuned model.

    Args:
        article_text (str): The text of the article to classify.

    Returns:
        str: 'Real' if the article is predicted as real, 'Fake' otherwise.
    """
    # Tokenize the input text
    inputs = tokenizer(article_text, truncation=True, max_length=512, padding="max_length", return_tensors="pt")

    # Make prediction
    with torch.no_grad():
        outputs = model(**inputs)

    # Get logits and convert to probabilities
    logits = outputs.logits
    probabilities = torch.softmax(logits, dim=-1)

    # Get predicted label (1 for Real, 0 for Fake)
    predicted_label_id = torch.argmax(probabilities, dim=-1).item()

    # Map label ID to 'Real' or 'Fake'
    return "Real" if predicted_label_id == 1 else "Fake"

Making Predictions

text = "Your Spanish article text"
predicted_label = predict_article(text)
print(f"Predicted Label: {predicted_label}")

Note: Label 1 corresponds to "Real" articles and label 0 to "Fake" articles; the model was trained with this mapping.
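The label mapping can be seen in isolation by replaying the post-processing step of predict_article on plain Python lists. This is a stdlib-only illustration. ID2LABEL mirrors the note above (1 = Real, 0 = Fake), and logits_to_label is a hypothetical helper. Because softmax is monotonic, the argmax of the probabilities is the same as the argmax of the raw logits. The probabilities are only needed if you also want a confidence score.

```python
import math
from typing import List, Tuple

# Label IDs as documented in this model card
ID2LABEL = {0: "Fake", 1: "Real"}


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_label(logits: List[float]) -> Tuple[str, float]:
    """Return the predicted label and its probability for a single article's logits."""
    probs = softmax(logits)
    label_id = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[label_id], probs[label_id]
```

For example, logits_to_label([0.0, math.log(3.0)]) gives probabilities of 0.25 and 0.75, so it returns ("Real", 0.75) up to floating-point rounding.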

License

Apache License 2.0

Acknowledgments

Thanks to DCC UChile for the base Spanish BERT model.

Model Weights

  • Format: Safetensors
  • Model size: 0.1B params
  • Tensor type: F32