A newer version of this model is available: SpiceeChat/FirstName-Genre-Classifier-30M-SFT

SpiceeChat

Genre-Classifier-1-20M-BASE-BF16

A lightweight base model for first-name gender classification.


Overview

A lightweight 20M-parameter CausalLM built for first-name gender classification. This is a base model, designed to be fine-tuned on downstream tasks rather than used directly.


Model Details

Property          Value
Architecture      FirstNameGenderForCausalLM
Parameters        ~20M
Context Length    20 tokens
Layers            4
Attention Heads   4
Hidden Size       384
Vocab Size        32,768
Tensor Type       BF16
License           Apache 2.0

Special Tokens

Token     ID
F_ID      42
M_ID      49
PAD_ID    0

The model uses a causal language modeling objective with weight tying between the input embedding and output head (head.weight = tok_emb.weight).
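
Because the output head reuses the embedding matrix, it adds no parameters of its own. The back-of-the-envelope count below is a sketch, not the author's accounting: it assumes learned positional embeddings, a 4x MLP expansion, biases on every linear layer, and a final LayerNorm, none of which are stated in this card.

```python
# Rough parameter count for the configuration in the Model Details table.
# Assumptions (not confirmed by the model card): learned positional
# embeddings, 4x MLP expansion, biases on all linear layers, final LayerNorm.

VOCAB_SIZE = 32_768
HIDDEN = 384
LAYERS = 4
CONTEXT = 20
MLP_RATIO = 4  # assumed


def linear(n_in: int, n_out: int) -> int:
    """Weights plus bias of a dense layer."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """Scale and shift vectors of a LayerNorm."""
    return 2 * dim


def estimate_params() -> int:
    tok_emb = VOCAB_SIZE * HIDDEN  # shared with the output head (weight tying)
    pos_emb = CONTEXT * HIDDEN     # assumed learned positions
    attn = linear(HIDDEN, 3 * HIDDEN) + linear(HIDDEN, HIDDEN)
    mlp = linear(HIDDEN, MLP_RATIO * HIDDEN) + linear(MLP_RATIO * HIDDEN, HIDDEN)
    block = attn + mlp + 2 * layer_norm(HIDDEN)
    return tok_emb + pos_emb + LAYERS * block + layer_norm(HIDDEN)


total = estimate_params()
print(f"{total:,} parameters (~{total / 1e6:.1f}M)")
```

Under these assumptions the estimate is 19,689,216 parameters, in line with the ~20M listed above. An untied output head would add another 32,768 x 384 = 12,582,912 weights, so tying accounts for most of the savings at this scale.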


Architecture

A lightweight GPT-style decoder with:

  • 4 transformer layers
  • 4 attention heads with a head dimension of 96
  • SageAttention support (falls back to PyTorch attention if sageattention is not installed)
  • GELU activations in the MLP blocks
  • LayerNorm before each attention and MLP block
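
As a rough illustration of the list above, here is a minimal pre-LayerNorm block with an optional SageAttention path. It is a sketch rather than the repository's code: the class and function names, the 4x MLP ratio, and the exact `sageattn` call (its arguments and the CUDA/half-precision guard) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

try:  # optional dependency, see the Dependencies section
    from sageattention import sageattn
except ImportError:
    sageattn = None


def causal_attention(q, k, v):
    """q, k, v have shape (batch, heads, seq, head_dim)."""
    half = q.dtype in (torch.float16, torch.bfloat16)
    if sageattn is not None and q.is_cuda and half:
        # Assumed call signature; check the sageattention docs before relying on it.
        return sageattn(q, k, v, tensor_layout="HND", is_causal=True)
    # Fallback: PyTorch's built-in attention (torch >= 2.0).
    return F.scaled_dot_product_attention(q, k, v, is_causal=True)


class Block(nn.Module):
    """Hypothetical pre-LN transformer block: LN -> attention, LN -> MLP."""

    def __init__(self, hidden=384, heads=4, mlp_ratio=4):
        super().__init__()
        self.heads = heads
        self.ln1 = nn.LayerNorm(hidden)
        self.qkv = nn.Linear(hidden, 3 * hidden)
        self.proj = nn.Linear(hidden, hidden)
        self.ln2 = nn.LayerNorm(hidden)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, mlp_ratio * hidden),
            nn.GELU(),
            nn.Linear(mlp_ratio * hidden, hidden),
        )

    def forward(self, x):
        b, t, c = x.shape
        q, k, v = self.qkv(self.ln1(x)).chunk(3, dim=-1)
        # (b, t, c) -> (b, heads, t, head_dim); head_dim = 384 / 4 = 96
        q, k, v = (
            z.view(b, t, self.heads, c // self.heads).transpose(1, 2)
            for z in (q, k, v)
        )
        attn = causal_attention(q, k, v).transpose(1, 2).reshape(b, t, c)
        x = x + self.proj(attn)            # residual around attention
        return x + self.mlp(self.ln2(x))   # residual around the MLP


x = torch.randn(2, 20, 384)  # batch of 2, context length 20
y = Block()(x)
print(y.shape)  # torch.Size([2, 20, 384])
```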

Usage

from transformers import AutoModel, AutoTokenizer

MODEL_ID = "SpiceeChat/Genre-Classifier-1-20M-BASE-BF16"

# trust_remote_code=True lets transformers load the custom model code
# (FirstNameGenderForCausalLM) from the model repository.
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

Inference

The model provides a dedicated predict_gender() method:

inputs = tokenizer("Arjun", return_tensors="pt")
# pred_idx is the predicted class index (1 = M, anything else = F)
pred_idx, probs = model.predict_gender(inputs.input_ids)
gender = "M" if pred_idx.item() == 1 else "F"
print(gender)  # M
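
The internals of predict_gender() are not documented in this card. One plausible reading, given the special token IDs, is that it compares the logits of F_ID and M_ID at the final position and normalizes them into two probabilities. The helper below is hypothetical and only illustrates that idea on toy logits.

```python
import math

F_ID, M_ID = 42, 49  # special token IDs from the model card


def two_way_gender_probs(last_logits):
    """Softmax restricted to the F and M token logits.

    last_logits: sequence of vocabulary logits at the final position.
    Returns (pred_idx, (p_f, p_m)) with 0 = F and 1 = M, matching the
    index convention used in the inference snippet.
    """
    f, m = last_logits[F_ID], last_logits[M_ID]
    top = max(f, m)  # subtract the max for numerical stability
    e_f, e_m = math.exp(f - top), math.exp(m - top)
    p_f, p_m = e_f / (e_f + e_m), e_m / (e_f + e_m)
    return (1 if p_m > p_f else 0), (p_f, p_m)


# Toy logits: everything zero except the two gender tokens.
logits = [0.0] * 32_768
logits[F_ID], logits[M_ID] = 0.5, 2.0
pred_idx, probs = two_way_gender_probs(logits)
print("M" if pred_idx == 1 else "F", probs)
```

Restricting the softmax to two tokens ignores the rest of the vocabulary, so the two probabilities always sum to 1 even when the base model spreads mass over other tokens.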

Training Data

This base model was pre-trained on a large-scale first-name dataset. It is not fine-tuned for any specific downstream task; it is meant to be used as a starting point.


Fine-Tuning

To fine-tune this model on your own dataset:

  1. Load the base model with trust_remote_code=True
  2. Use a causal LM loss on the last token (the model was designed to predict the gender token at the final position)
  3. The special token IDs (42 = F, 49 = M) can be used as targets

Note: The model expects input sequences of length ≤ 20 tokens. Longer names will be truncated.
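
The loss described in step 2 could be sketched as follows. This is an illustration under assumptions, not the author's training code: it assumes right-padded batches, that PAD_ID never occurs inside a name, and it uses random logits in place of a real forward pass. The function name is hypothetical.

```python
import torch
import torch.nn.functional as F

F_ID, M_ID, PAD_ID = 42, 49, 0  # from the model card


def last_token_loss(logits, input_ids, labels):
    """Cross-entropy at the last non-pad position of each sequence.

    logits:    (batch, seq, vocab) model outputs
    input_ids: (batch, seq), right-padded with PAD_ID (assumed)
    labels:    (batch,) tensor holding F_ID or M_ID
    """
    lengths = (input_ids != PAD_ID).sum(dim=1)  # real tokens per row
    last = (lengths - 1).clamp(min=0)           # index of the last real token
    rows = torch.arange(input_ids.size(0))
    # Upcast to float32 so the loss stays stable with BF16 weights.
    return F.cross_entropy(logits[rows, last].float(), labels)


# Toy batch: two right-padded names, random logits instead of a model call.
torch.manual_seed(0)
input_ids = torch.tensor([[5, 6, 7, 0], [8, 9, 0, 0]])
logits = torch.randn(2, 4, 32_768)
labels = torch.tensor([M_ID, F_ID])
loss = last_token_loss(logits, input_ids, labels)
print(loss.item())
```

When tokenizing real data, passing truncation=True and max_length=20 to the tokenizer keeps inputs within the context length stated in the note above.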


Dependencies

Package          Version
transformers     >= 4.30.0
torch            >= 2.0.0
sageattention    optional, for faster attention

Acknowledgements

Built by PhysiQuanty for SpiceeChat.


📌 This is a base model. For a production-ready fine-tuned version, see SpiceeChat/FirstName-Genre-Classifier-30M-SFT.

Built with a lot of caffeine ☕ by SpiceeChat

Safetensors model size: 19.7M params, tensor type BF16