Genre-Classifier-1-20M-BASE-BF16
A lightweight base model for first-name gender classification.
Overview
A lightweight 20M-parameter CausalLM built for first-name gender classification. This is a base model, designed to be fine-tuned on downstream tasks rather than used directly.
Model Details
| Property | Value |
|---|---|
| Architecture | FirstNameGenderForCausalLM |
| Parameters | ~20M |
| Context Length | 20 tokens |
| Layers | 4 |
| Attention Heads | 4 |
| Hidden Size | 384 |
| Vocab Size | 32,768 |
| Tensor Type | BF16 |
| License | Apache 2.0 |
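As a sanity check on the ~20M figure, the arithmetic below rebuilds the parameter count from the table values. It is a sketch under assumptions the card does not state: learned positional embeddings over the 20-token context, a fused QKV projection, linear layers with biases, a 4x MLP expansion and a final LayerNorm. The one structural fact taken from the card is that the output head is tied to the token embedding. The helper names (`linear`, `layer_norm`, `estimate_params`) are hypothetical.

```python
VOCAB, D, LAYERS, CTX = 32_768, 384, 4, 20  # values from the Model Details table
FFN = 4 * D  # assumption: standard 4x MLP expansion


def linear(n_in, n_out, bias=True):
    """Parameters of a dense layer: weight matrix plus optional bias."""
    return n_in * n_out + (n_out if bias else 0)


def layer_norm(d):
    """LayerNorm has a scale and a shift vector of size d."""
    return 2 * d


def estimate_params():
    # Token embedding plus (assumed) learned positional embedding
    embeddings = VOCAB * D + CTX * D
    per_layer = (
        layer_norm(D) + linear(D, 3 * D) + linear(D, D)      # pre-LN attention: QKV + output projection
        + layer_norm(D) + linear(D, FFN) + linear(FFN, D)    # pre-LN MLP
    )
    final_norm = layer_norm(D)  # assumption: LayerNorm before the head
    head = 0  # tied to the token embedding, so it adds no parameters
    return embeddings + LAYERS * per_layer + final_norm + head


print(f"{estimate_params():,}")
```

Under these assumptions the total is 19,689,216 parameters (about 19.7M). The tied token embedding alone accounts for 12,582,912 of them, so an untied output head would add roughly another 12.6M.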
Special Tokens
| Token | ID |
|---|---|
| F_ID | 42 |
| M_ID | 49 |
| PAD_ID | 0 |
The model uses a causal language modeling objective with weight tying between the input embedding and the output head (`head.weight = tok_emb.weight`).
Architecture
A lightweight GPT-style decoder with:
- 4 transformer layers
- 4 attention heads with a head dimension of 96
- SageAttention support (falls back to PyTorch attention if `sageattention` is not installed)
- GELU activations in the MLP blocks
- LayerNorm before each attention and MLP block
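The architecture list above can be illustrated with a minimal PyTorch decoder that uses the same hyperparameters. This is not the repository's `FirstNameGenderForCausalLM` code; `Block` and `TinyDecoder` are hypothetical names. Beyond what the card states (pre-LN, GELU, 4 layers, 4 heads of dimension 96, tied head, SageAttention with a PyTorch fallback), it assumes learned positional embeddings, a fused QKV projection, a 4x MLP and a final LayerNorm. The `sageattn(q, k, v, tensor_layout="HND", is_causal=True)` call follows the SageAttention project's documented usage and is only reached on CUDA when the package is installed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

try:  # optional fast kernel; without it the PyTorch path below is used
    from sageattention import sageattn
except ImportError:
    sageattn = None


class Block(nn.Module):
    """Pre-LN transformer block: LayerNorm -> attention, LayerNorm -> GELU MLP."""

    def __init__(self, d_model=384, n_heads=4):
        super().__init__()
        self.n_heads = n_heads
        self.ln1 = nn.LayerNorm(d_model)
        self.qkv = nn.Linear(d_model, 3 * d_model)  # fused Q, K, V projection (assumed)
        self.proj = nn.Linear(d_model, d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(  # 4x expansion is an assumption
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def attention(self, x):
        B, T, C = x.shape
        q, k, v = self.qkv(x).split(C, dim=-1)
        # (B, T, C) -> (B, heads, T, head_dim); head_dim = 384 / 4 = 96
        q, k, v = (t.view(B, T, self.n_heads, C // self.n_heads).transpose(1, 2) for t in (q, k, v))
        if sageattn is not None and q.is_cuda:
            y = sageattn(q, k, v, tensor_layout="HND", is_causal=True)
        else:  # PyTorch fallback; is_causal=True applies the causal mask
            y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.proj(y.transpose(1, 2).reshape(B, T, C))

    def forward(self, x):
        x = x + self.attention(self.ln1(x))  # residual around attention
        return x + self.mlp(self.ln2(x))     # residual around the MLP


class TinyDecoder(nn.Module):
    def __init__(self, vocab=32768, d_model=384, n_layers=4, n_heads=4, ctx=20):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d_model)
        self.pos_emb = nn.Embedding(ctx, d_model)  # learned positions (assumed)
        self.blocks = nn.ModuleList(Block(d_model, n_heads) for _ in range(n_layers))
        self.ln_f = nn.LayerNorm(d_model)
        self.head = nn.Linear(d_model, vocab, bias=False)
        self.head.weight = self.tok_emb.weight  # weight tying, as stated in the card

    def forward(self, idx):
        pos = torch.arange(idx.size(1), device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        for block in self.blocks:
            x = block(x)
        return self.head(self.ln_f(x))  # logits: (batch, seq_len, vocab)
```

Calling `TinyDecoder()(torch.randint(0, 32768, (1, 20)))` returns logits of shape `(1, 20, 32768)`. Because the head shares the embedding matrix, `model.parameters()` yields that matrix only once.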
Usage
```python
from transformers import AutoModel, AutoTokenizer

# The custom architecture ships with the repository, hence trust_remote_code=True
model = AutoModel.from_pretrained(
    "SpiceeChat/Genre-Classifier-1-20M-BASE-BF16",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "SpiceeChat/Genre-Classifier-1-20M-BASE-BF16",
    trust_remote_code=True,
)
```
Inference
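The model provides a dedicated `predict_gender()` method that takes the input IDs and returns the predicted class index together with the class probabilities: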
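```python
import torch

model.eval()  # inference mode (disables dropout, if the model uses any)
inputs = tokenizer("Arjun", return_tensors="pt")
with torch.no_grad():
    pred_idx, probs = model.predict_gender(inputs.input_ids)
gender = "M" if pred_idx.item() == 1 else "F"  # index 1 = M, index 0 = F
print(gender)  # M
```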
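The card does not document how `predict_gender()` turns logits into `pred_idx` and `probs`. One plausible reading, consistent with the index convention above (0 = F, 1 = M), is a softmax restricted to the final-position logits of the F and M special tokens (IDs 42 and 49). The sketch below assumes exactly that; `two_way_readout` is a hypothetical name, and it works on a plain list of vocabulary logits so it runs without the model.

```python
import math

F_ID, M_ID = 42, 49  # special token IDs from the Special Tokens table


def two_way_readout(last_logits):
    """Hypothetical readout: softmax over the F/M logits at the final position.

    last_logits: sequence of vocabulary logits for the last input position.
    Returns (pred_idx, probs) with index 0 = F and index 1 = M.
    """
    pair = [last_logits[F_ID], last_logits[M_ID]]
    m = max(pair)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in pair]
    total = sum(exps)
    probs = [e / total for e in exps]
    pred_idx = 0 if probs[0] >= probs[1] else 1
    return pred_idx, probs


logits = [0.0] * 32768
logits[M_ID] = 2.0
print(two_way_readout(logits))
```

Restricting the softmax to two tokens ignores the rest of the 32,768-token vocabulary, which is why the two probabilities always sum to 1.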
Training Data
This base model was pre-trained on a large-scale first-name dataset. It is not fine-tuned for any specific downstream task; it is meant to be used as a starting point.
Fine-Tuning
To fine-tune this model on your own dataset:
- Load the base model with `trust_remote_code=True`.
- Use a causal LM loss on the last token (the model was designed to predict the gender token at the final position).
- Use the special token IDs (42 = F, 49 = M) as targets.
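The last-token loss from the steps above can be sketched in PyTorch as a cross-entropy over the full vocabulary, with token 42 or 49 as the target. The function name `last_token_loss` and the `"F"`/`"M"` label strings are hypothetical. It also assumes right-padded batches and therefore reads the logits at each sequence's last non-padding position (`lengths - 1`); for unpadded batches of equal length this is the same as `logits[:, -1]`.

```python
import torch
import torch.nn.functional as F

F_ID, M_ID = 42, 49  # special token IDs used as targets


def last_token_loss(logits, lengths, labels):
    """Cross-entropy on each sequence's final real position.

    logits:  (batch, seq_len, vocab) output of the causal LM
    lengths: (batch,) long tensor with the number of non-padding tokens
    labels:  list of "F" / "M" strings, one per sequence
    """
    batch = torch.arange(logits.size(0), device=logits.device)
    last_logits = logits[batch, lengths - 1]  # (batch, vocab)
    targets = torch.tensor(
        [M_ID if y == "M" else F_ID for y in labels], device=logits.device
    )
    return F.cross_entropy(last_logits, targets)
```

Using the full-vocabulary cross-entropy keeps the objective a plain causal LM loss, so the fine-tuned head stays compatible with the tied embedding.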
Note: The model expects input sequences of length ≤ 20 tokens. Longer names will be truncated.
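If you prepare token IDs yourself, a fixed-length batch can be built along these lines. This is a sketch under two assumptions the card does not confirm: truncation keeps the first 20 tokens, and padding with `PAD_ID = 0` is added on the right. `fit_to_context` is a hypothetical helper; it also returns the true length so the prediction can be read at position `length - 1`.

```python
PAD_ID = 0     # from the Special Tokens table
MAX_LEN = 20   # context length from the Model Details table


def fit_to_context(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Truncate to max_len, right-pad with pad_id, and return (ids, true_length)."""
    ids = list(token_ids)[:max_len]  # assumption: keep the first max_len tokens
    length = len(ids)
    return ids + [pad_id] * (max_len - length), length


print(fit_to_context([5, 6, 7]))
```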
Dependencies
| Package | Version |
|---|---|
| transformers | >= 4.30.0 |
| torch | >= 2.0.0 |
| sageattention | optional, for faster attention |
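To check an environment against the table above, a standard-library sketch using `importlib.metadata` might look like this. The helper names are hypothetical, and `release_tuple` deliberately keeps only the leading numeric components, so pre-release or local tags such as `rc1` or `+cu118` are ignored.

```python
import re
from importlib import metadata

MIN_VERSIONS = {"transformers": (4, 30, 0), "torch": (2, 0, 0)}


def release_tuple(version):
    """Leading numeric components, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    parts = []
    for piece in version.split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break
        parts.append(int(m.group()))
        if m.end() != len(piece):  # stop at suffixes such as '0rc1' or '0+cu118'
            break
    return tuple(parts)


def check_dependencies():
    report = {}
    for pkg, minimum in MIN_VERSIONS.items():
        try:
            found = release_tuple(metadata.version(pkg))
        except metadata.PackageNotFoundError:
            report[pkg] = "missing"
            continue
        found = found + (0,) * (len(minimum) - len(found))  # '2.1' -> (2, 1, 0)
        report[pkg] = "ok" if found >= minimum else "too old"
    try:
        metadata.version("sageattention")
        report["sageattention"] = "installed"
    except metadata.PackageNotFoundError:
        report["sageattention"] = "not installed (PyTorch attention fallback)"
    return report


print(check_dependencies())
```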
Acknowledgements
Built by PhysiQuanty for SpiceeChat.
This is a base model. For a production-ready fine-tuned version, see FirstName-Genre-Classifier-30M-SFT.
Built with a lot of caffeine by SpiceeChat.