# Model Card for trainer_output

This model is a fine-tuned version of `hbpkillerX/gemma3-100m`. It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start

```python
from transformers import pipeline

question = "classify the following comment into Hate, Extreme Hate or Not Hate: Kya karr rhe ho tum log"
generator = pipeline("text-generation", model="hbpkillerX/trainer_output", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
## Training procedure

**Dataset:** The dataset contains Indonesian text comments, each labeled as 'Hate', 'Extreme Hate', or 'Not Hate'.

**Procedure:** The dataset is formatted into a chat-based prompt structure, in which the user asks the model to classify a comment and the assistant's expected response is the correct label. The `trl` library's `SFTTrainer` is then used to perform supervised fine-tuning of the `gemma3-100m` model on this formatted dataset. The goal is to teach the model to act as a classifier for the given categories.
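As a concrete illustration, each labeled row can be turned into the chat structure described above. A minimal sketch, assuming the raw columns are named `comment` and `label` (the actual field names in the dataset may differ):

```python
# Hypothetical sketch: map one labeled row to the chat-based prompt structure
# used for SFT. The "comment"/"label" field names are assumptions.
def to_chat_example(row):
    prompt = (
        "classify the following comment into Hate, Extreme Hate "
        f"or Not Hate: {row['comment']}"
    )
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": row["label"]},
        ]
    }

example = to_chat_example({"comment": "Kya karr rhe ho tum log", "label": "Not Hate"})
```

A dataset mapped this way into a `messages` column is the conversational format that `trl`'s `SFTTrainer` accepts for supervised fine-tuning.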
## Results & Analysis
The primary objective of this experiment was not to achieve state-of-the-art classification performance. Instead, the goal was to explore the process of pre-training a small language model from scratch and then applying Supervised Fine-Tuning (SFT) to adapt it for a downstream classification task.
### Dataset Imbalance
The Indo-HateSpeech dataset used for fine-tuning is highly imbalanced, which significantly impacts the training and evaluation. The class distribution is as follows:
- Not Hate: 64,194 samples
- Hate: 11,034 samples
- Extreme Hate: 2,698 samples
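The skew is easier to see as proportions; a quick back-of-the-envelope computation over the counts above:

```python
# Class counts from the Indo-HateSpeech fine-tuning data (listed above).
counts = {"Not Hate": 64_194, "Hate": 11_034, "Extreme Hate": 2_698}
total = sum(counts.values())

# Share of the data held by each class.
shares = {label: n / total for label, n in counts.items()}
for label, share in shares.items():
    print(f"{label}: {share:.1%}")

# How badly the rarest class is outnumbered by the majority class.
ratio = counts["Not Hate"] / counts["Extreme Hate"]
```

'Not Hate' accounts for roughly 82% of the data while 'Extreme Hate' is under 4%, an imbalance of about 24:1 between the majority and rarest class.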
### Evaluation Metrics
The model was evaluated on a test set split from the original data. The results show a high overall accuracy, but a closer look at the per-class metrics reveals the effect of the data imbalance.
**Classification Report:**
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Not Hate | 0.98 | 0.99 | 0.98 | 3211 |
| Hate | 0.75 | 0.88 | 0.81 | 557 |
| Extreme Hate | 0.00 | 0.00 | 0.00 | 129 |
| Accuracy | | | 0.94 | 3897 |
| Macro Avg | 0.57 | 0.62 | 0.60 | 3897 |
| Weighted Avg | 0.91 | 0.94 | 0.92 | 3897 |
**Confusion Matrix:**
| True / Predicted | Not Hate | Hate | Extreme Hate |
|---|---|---|---|
| Not Hate | 3166 | 45 | 0 |
| Hate | 69 | 488 | 0 |
| Extreme Hate | 9 | 120 | 0 |
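The per-class figures in the report can be recovered directly from this confusion matrix; a small sanity-check sketch (rows are true labels, columns are predictions):

```python
# Confusion matrix from the table above: rows = true class, columns = predicted.
labels = ["Not Hate", "Hate", "Extreme Hate"]
cm = [
    [3166, 45, 0],
    [69, 488, 0],
    [9, 120, 0],
]

metrics = {}
for i, label in enumerate(labels):
    tp = cm[i][i]
    predicted = sum(row[i] for row in cm)   # everything predicted as this class
    actual = sum(cm[i])                     # everything truly in this class
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual
    metrics[label] = (round(precision, 2), round(recall, 2))

# Overall accuracy: trace of the matrix over the total sample count.
accuracy = sum(cm[i][i] for i in range(3)) / sum(map(sum, cm))
```

These reproduce the report above, e.g. precision 0.75 and recall 0.88 for 'Hate', and exact zeros for 'Extreme Hate', which is never predicted (its column in the matrix is all zeros).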
### Analysis
The model performs exceptionally well on the 'Not Hate' class, which is expected as it is the vast majority class. Performance on the 'Hate' class is reasonable, with a good recall (0.88) but lower precision (0.75).
The model completely fails to identify the 'Extreme Hate' class. The precision, recall, and F1-score are all zero. The confusion matrix shows that not a single 'Extreme Hate' sample was correctly classified. This is a classic symptom of training on severely imbalanced data; the model learns that it can achieve higher overall accuracy by never predicting the rarest class.
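One common mitigation, not applied in this experiment, is to rebalance the training data before SFT, for example by randomly oversampling the minority classes. A minimal, hypothetical sketch:

```python
import random

# Hypothetical sketch of random oversampling, a standard mitigation for the
# imbalance described above (not used in this experiment).
random.seed(0)

def oversample(examples, target_count):
    """Duplicate randomly chosen examples until the class reaches target_count."""
    if len(examples) >= target_count:
        return list(examples)
    extra = random.choices(examples, k=target_count - len(examples))
    return list(examples) + extra

# Toy data standing in for the real 'Extreme Hate' split.
extreme_hate = [{"comment": f"sample {i}", "label": "Extreme Hate"} for i in range(3)]
balanced = oversample(extreme_hate, 10)
```

Alternatives include undersampling the majority class or weighting the loss per class; any of these would give the rarest class more influence during fine-tuning.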
In conclusion, while the experiment was successful in demonstrating the SFT process on a custom-trained small model, the results underscore the critical challenge posed by imbalanced datasets, especially for minority classes.
## Framework versions
- TRL: 0.23.0.dev0
- Transformers: 4.56.1
- Pytorch: 2.8.0
- Datasets: 4.0.0
- Tokenizers: 0.22.0
## Citations

Cite the Indo-HateSpeech dataset as:

```bibtex
@misc{kaware2024indo,
    title={Indo-HateSpeech},
    author={Kaware, Pravin},
    year={2024},
    publisher={Mendeley Data},
    version={V1},
    doi={10.17632/snc7mxpj6t.1}
}
```
Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title={{TRL: Transformer Reinforcement Learning}},
    author={Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year={2020},
    journal={GitHub repository},
    publisher={GitHub},
    howpublished={\url{https://github.com/huggingface/trl}}
}
```