# Model Card for trainer_output

This model is a fine-tuned version of `hbpkillerX/gemma3-100m`. It has been trained using [TRL](https://github.com/huggingface/trl).
## Quick start

```python
from transformers import pipeline

question = "classify the following comment into Hate, Extreme Hate or Not Hate: Kya karr rhe ho tum log"
generator = pipeline("text-generation", model="hbpkillerX/trainer_output", device="cuda")
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```
## Training procedure

**Dataset:** The dataset contains Indonesian text comments, each labeled as 'Hate', 'Extreme Hate', or 'Not Hate'.

**Procedure:** The dataset is formatted into a chat-based prompt structure, in which the user asks the model to classify a comment and the assistant's expected response is the correct label. The `trl` library's `SFTTrainer` is then used to perform supervised fine-tuning of the `gemma3-100m` model on this formatted dataset. The goal is to teach the model to act as a classifier for the given categories.
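As a concrete illustration, each labeled row can be turned into the chat structure described above. A minimal sketch, assuming the raw columns are named `comment` and `label` (the actual field names in the dataset may differ):

```python
# Hypothetical sketch: map one labeled row to the chat-based prompt structure
# used for SFT. The "comment"/"label" field names are assumptions.
def to_chat_example(row):
    prompt = (
        "classify the following comment into Hate, Extreme Hate "
        f"or Not Hate: {row['comment']}"
    )
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": row["label"]},
        ]
    }

example = to_chat_example({"comment": "Kya karr rhe ho tum log", "label": "Not Hate"})
```

A dataset mapped this way into a `messages` column is the conversational format that `trl`'s `SFTTrainer` accepts for supervised fine-tuning.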
## Results & Analysis
The primary objective of this experiment was not to achieve state-of-the-art classification performance. Instead, the goal was to explore the process of pre-training a small language model from scratch and then applying Supervised Fine-Tuning (SFT) to adapt it for a downstream classification task.
### Dataset Imbalance
The Indo-HateSpeech dataset used for fine-tuning is highly imbalanced, which significantly impacts the training and evaluation. The class distribution is as follows:
- Not Hate: 64,194 samples
- Hate: 11,034 samples
- Extreme Hate: 2,698 samples
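The skew is easier to see as proportions; a quick back-of-the-envelope computation over the counts above:

```python
# Class counts from the Indo-HateSpeech fine-tuning data (listed above).
counts = {"Not Hate": 64_194, "Hate": 11_034, "Extreme Hate": 2_698}
total = sum(counts.values())

# Share of the data held by each class.
shares = {label: n / total for label, n in counts.items()}
for label, share in shares.items():
    print(f"{label}: {share:.1%}")

# How badly the rarest class is outnumbered by the majority class.
ratio = counts["Not Hate"] / counts["Extreme Hate"]
```

'Not Hate' accounts for roughly 82% of the data while 'Extreme Hate' is under 4%, an imbalance of about 24:1 between the majority and rarest class.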
### Evaluation Metrics
The model was evaluated on a test set split from the original data. The results show a high overall accuracy, but a closer look at the per-class metrics reveals the effect of the data imbalance.
**Classification Report:**
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Not Hate | 0.98 | 0.99 | 0.98 | 3211 |
| Hate | 0.75 | 0.88 | 0.81 | 557 |
| Extreme Hate | 0.00 | 0.00 | 0.00 | 129 |
| Accuracy | | | 0.94 | 3897 |
| Macro Avg | 0.57 | 0.62 | 0.60 | 3897 |
| Weighted Avg | 0.91 | 0.94 | 0.92 | 3897 |
**Confusion Matrix:**
| True / Predicted | Not Hate | Hate | Extreme Hate |
|---|---|---|---|
| Not Hate | 3166 | 45 | 0 |
| Hate | 69 | 488 | 0 |
| Extreme Hate | 9 | 120 | 0 |
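The per-class figures in the report can be recovered directly from this confusion matrix; a small sanity-check sketch (rows are true labels, columns are predictions):

```python
# Confusion matrix from the table above: rows = true class, columns = predicted.
labels = ["Not Hate", "Hate", "Extreme Hate"]
cm = [
    [3166, 45, 0],
    [69, 488, 0],
    [9, 120, 0],
]

metrics = {}
for i, label in enumerate(labels):
    tp = cm[i][i]
    predicted = sum(row[i] for row in cm)   # everything predicted as this class
    actual = sum(cm[i])                     # everything truly in this class
    precision = tp / predicted if predicted else 0.0
    recall = tp / actual
    metrics[label] = (round(precision, 2), round(recall, 2))

# Overall accuracy: trace of the matrix over the total sample count.
accuracy = sum(cm[i][i] for i in range(3)) / sum(map(sum, cm))
```

These reproduce the report above, e.g. precision 0.75 and recall 0.88 for 'Hate', and exact zeros for 'Extreme Hate', which is never predicted (its column in the matrix is all zeros).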
### Analysis
The model performs exceptionally well on the 'Not Hate' class, which is expected as it is the vast majority class. Performance on the 'Hate' class is reasonable, with a good recall (0.88) but lower precision (0.75).
The model completely fails to identify the 'Extreme Hate' class. The precision, recall, and F1-score are all zero. The confusion matrix shows that not a single 'Extreme Hate' sample was correctly classified. This is a classic symptom of training on severely imbalanced data; the model learns that it can achieve higher overall accuracy by never predicting the rarest class.
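One common mitigation, not applied in this experiment, is to rebalance the training data before SFT, for example by randomly oversampling the minority classes. A minimal, hypothetical sketch:

```python
import random

# Hypothetical sketch of random oversampling, a standard mitigation for the
# imbalance described above (not used in this experiment).
random.seed(0)

def oversample(examples, target_count):
    """Duplicate randomly chosen examples until the class reaches target_count."""
    if len(examples) >= target_count:
        return list(examples)
    extra = random.choices(examples, k=target_count - len(examples))
    return list(examples) + extra

# Toy data standing in for the real 'Extreme Hate' split.
extreme_hate = [{"comment": f"sample {i}", "label": "Extreme Hate"} for i in range(3)]
balanced = oversample(extreme_hate, 10)
```

Alternatives include undersampling the majority class or weighting the loss per class; any of these would give the rarest class more influence during fine-tuning.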
In conclusion, while the experiment was successful in demonstrating the SFT process on a custom-trained small model, the results underscore the critical challenge posed by imbalanced datasets, especially for minority classes.
## Framework versions
- TRL: 0.23.0.dev0
- Transformers: 4.56.1
- Pytorch: 2.8.0
- Datasets: 4.0.0
- Tokenizers: 0.22.0
## Citations

Cite the Indo-HateSpeech dataset as:

```bibtex
@misc{kaware2024indo,
    title={Indo-HateSpeech},
    author={Kaware, Pravin},
    year={2024},
    publisher={Mendeley Data},
    version={V1},
    doi={10.17632/snc7mxpj6t.1}
}
```
Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title={{TRL: Transformer Reinforcement Learning}},
    author={Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year={2020},
    journal={GitHub repository},
    publisher={GitHub},
    howpublished={\url{https://github.com/huggingface/trl}}
}
```