A newer version of this model is available: SpiceeChat/FirstName-Genre-Classifier-30M-SFT

SpiceeChat

Genre-Classifier-1-20M-BASE-BF16

A lightweight base model for first-name gender classification.

Overview

A compact causal language model (CausalLM) of roughly 20M parameters, built for first-name gender classification. This is a base model, intended as a starting point for fine-tuning on downstream tasks rather than for direct use.

Model Details

Property	Value
Architecture	`FirstNameGenderForCausalLM`
Parameters	~20M
Context Length	20 tokens
Layers	4
Attention Heads	4
Hidden Size	384
Vocab Size	32,768
Tensor Type	BF16
License	Apache 2.0

Special Tokens

Token	ID
`F_ID`	42
`M_ID`	49
`PAD_ID`	0

The model is trained with a causal language modeling objective and ties the weights of the input embedding and the output head (`head.weight = tok_emb.weight`).
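As a rough consistency check on the ~20M figure, the sketch below counts parameters for a GPT-style decoder with the dimensions in the Model Details table. The 4× MLP expansion, the presence of biases, learned absolute position embeddings over the 20-token context, and the absence of a head bias are all assumptions, not details confirmed by this card. Under those assumptions the count comes to 19,689,216, in line with the 19.7M reported for the checkpoint, and the tied embedding matrix alone accounts for about 12.6M of it.

```python
VOCAB, D, N_LAYERS, CTX = 32_768, 384, 4, 20  # from the Model Details table
MLP_RATIO = 4  # assumption: standard GPT-style 4x MLP expansion

def block_params(d=D, ratio=MLP_RATIO):
    """Parameters in one transformer block (weights + biases, assumed)."""
    ln = 2 * d                                  # LayerNorm weight + bias
    qkv = d * 3 * d + 3 * d                     # fused Q/K/V projection
    attn_out = d * d + d                        # attention output projection
    fc = d * ratio * d + ratio * d              # MLP up-projection
    proj = ratio * d * d + d                    # MLP down-projection
    return 2 * ln + qkv + attn_out + fc + proj  # two LayerNorms per block

def total_params(tied=True):
    tok_emb = VOCAB * D
    pos_emb = CTX * D  # assumption: learned absolute position embeddings
    final_ln = 2 * D
    # With tying, head.weight reuses tok_emb.weight, so it adds nothing
    head = 0 if tied else VOCAB * D
    return tok_emb + pos_emb + N_LAYERS * block_params() + final_ln + head

print(total_params())            # 19689216
print(total_params(tied=False))  # 32272128
```

Without tying, a separate 32,768 × 384 output head would add another 12,582,912 parameters, which is why tying matters so much for a model this small.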

Architecture

A lightweight GPT-style decoder with:

4 transformer layers
4 attention heads with a head dimension of 96
SageAttention support (falls back to standard PyTorch attention if the `sageattention` package is not installed)
GELU activations in the MLP blocks
LayerNorm before each attention and MLP block (pre-LN)
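The block structure listed above (pre-LayerNorm, causal multi-head attention, GELU MLP, residual connections) can be sketched in NumPy as a single forward pass. This is an illustration under stated assumptions, not the repository's implementation: the 4× MLP width, the tanh approximation of GELU, the omitted biases and LayerNorm affine parameters, and the random weights are all choices made here for brevity. It does use the card's hidden size of 384 split into 4 heads of 96 dimensions.

```python
import numpy as np

D_MODEL, N_HEADS = 384, 4
HEAD_DIM = D_MODEL // N_HEADS  # 96, as listed in the card
MLP_RATIO = 4  # assumption: standard GPT-style 4x MLP width

def layer_norm(x, eps=1e-5):
    # Normalize each position's features; learnable scale/shift omitted for brevity
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def gelu(x):
    # tanh approximation of GELU (assumption; the model may use the exact erf form)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def split_heads(m, t):
    # (T, D) -> (heads, T, head_dim)
    return m.reshape(t, N_HEADS, HEAD_DIM).transpose(1, 0, 2)

def causal_self_attention(x, w_qkv, w_out):
    t = x.shape[0]
    q, k, v = (split_heads(m, t) for m in np.split(x @ w_qkv, 3, axis=-1))
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(HEAD_DIM)  # (heads, T, T)
    future = np.triu(np.ones((t, t), dtype=bool), k=1)  # True where key comes after query
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    out = (weights @ v).transpose(1, 0, 2).reshape(t, D_MODEL)  # merge heads
    return out @ w_out

def pre_ln_block(x, p):
    # Pre-LN: LayerNorm before each sub-layer, followed by a residual add
    x = x + causal_self_attention(layer_norm(x), p["w_qkv"], p["w_out"])
    return x + gelu(layer_norm(x) @ p["w_fc"]) @ p["w_proj"]

def init_params(rng, scale=0.02):
    # Random weights for illustration only; biases omitted
    return {
        "w_qkv": rng.normal(0.0, scale, (D_MODEL, 3 * D_MODEL)),
        "w_out": rng.normal(0.0, scale, (D_MODEL, D_MODEL)),
        "w_fc": rng.normal(0.0, scale, (D_MODEL, MLP_RATIO * D_MODEL)),
        "w_proj": rng.normal(0.0, scale, (MLP_RATIO * D_MODEL, D_MODEL)),
    }

rng = np.random.default_rng(0)
params = init_params(rng)
x = rng.normal(size=(20, D_MODEL))  # 20 positions = the context length
y = pre_ln_block(x, params)
print(y.shape)  # (20, 384)
```

Because of the causal mask, changing the input at the last position leaves every earlier output row unchanged, so the final position is the only one that sees the whole name.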

Usage

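```python
from transformers import AutoModel, AutoTokenizer

# trust_remote_code=True is required because FirstNameGenderForCausalLM
# is defined by custom code shipped in the model repository
model = AutoModel.from_pretrained(
    "SpiceeChat/Genre-Classifier-1-20M-BASE-BF16",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "SpiceeChat/Genre-Classifier-1-20M-BASE-BF16",
    trust_remote_code=True,
)
```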

Inference

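The model provides a dedicated `predict_gender()` method:

```python
import torch

inputs = tokenizer("Arjun", return_tensors="pt")
with torch.no_grad():  # inference only, no gradients needed
    pred_idx, probs = model.predict_gender(inputs.input_ids)
gender = "M" if pred_idx.item() == 1 else "F"  # index 1 = M, index 0 = F
print(gender)  # M
```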
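One plausible way to turn the final-position logits into the `(pred_idx, probs)` pair returned by `predict_gender()` is a two-way softmax over the logits of `F_ID` (42) and `M_ID` (49). The card does not document how `predict_gender()` is implemented, so the helper below (`readout_gender`, a hypothetical name) is only a sketch that follows the convention of the inference example, where index 1 means M.

```python
import math

F_ID, M_ID = 42, 49  # special token IDs from the Special Tokens table

def readout_gender(last_logits):
    """Hypothetical sketch: two-way softmax over the F/M logits at the final position.

    last_logits: full-vocabulary logits for the last input position.
    Returns (pred_idx, probs) where index 0 = F and index 1 = M.
    """
    pair = [last_logits[F_ID], last_logits[M_ID]]
    m = max(pair)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in pair]
    total = sum(exps)
    probs = [e / total for e in exps]
    pred_idx = 1 if probs[1] > probs[0] else 0
    return pred_idx, probs

# Toy logits in which M is favored over F at the last position
logits = [0.0] * 32_768
logits[F_ID], logits[M_ID] = 0.5, 2.0
pred_idx, probs = readout_gender(logits)
print("M" if pred_idx == 1 else "F")  # M
```

Restricting the softmax to these two tokens ignores every other vocabulary entry, so the two probabilities always sum to 1.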

Training Data

This base model was pre-trained on a large-scale first-name dataset. It has not been fine-tuned for any specific downstream task and is meant to be used as a starting point.

Fine-Tuning

To fine-tune this model on your own dataset:

Load the base model with trust_remote_code=True
Use a causal LM loss on the last token (the model was designed to predict the gender token at the final position)
The special token IDs (42 = F, 49 = M) can be used as targets
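The recipe above amounts to a standard next-token cross-entropy evaluated only at the final position, with 42 or 49 as the target. A minimal pure-Python sketch, assuming you already have the full-vocabulary logits for the last position of each example; `last_token_loss` and `batch_loss` are hypothetical names, not part of the model's code.

```python
import math

F_ID, M_ID = 42, 49
LABEL_TO_ID = {"F": F_ID, "M": M_ID}

def last_token_loss(last_logits, label):
    """Cross-entropy of the target gender token at the final position (full vocabulary)."""
    target = LABEL_TO_ID[label]
    m = max(last_logits)  # log-sum-exp trick for numerical stability
    log_z = m + math.log(sum(math.exp(v - m) for v in last_logits))
    return log_z - last_logits[target]

def batch_loss(batch):
    """Mean loss over (last_logits, label) pairs."""
    return sum(last_token_loss(l, y) for l, y in batch) / len(batch)

uniform = [0.0] * 32_768
print(round(last_token_loss(uniform, "F"), 4))  # 10.3972, i.e. log(32768)
```

Taking the softmax over the full 32,768-entry vocabulary matches the "causal LM loss" wording; in a real PyTorch training loop the same quantity is what `torch.nn.functional.cross_entropy` computes on the gathered last-position logits.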

Note: The model expects input sequences of length ≤ 20 tokens. Longer names will be truncated.
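To respect the 20-token limit, token IDs can be truncated and right-padded with `PAD_ID` (0) to a fixed length while keeping track of the last real token, the position whose logits carry the prediction. This sketch assumes right-side truncation and padding, which the card does not specify; `pad_or_truncate` is a hypothetical helper and the token IDs in the usage line are made up.

```python
PAD_ID = 0
MAX_LEN = 20  # context length from the Model Details table

def pad_or_truncate(ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Hypothetical helper: fix a token-ID list to max_len (right truncation/padding assumed).

    Returns (input_ids, attention_mask, last_pos), where last_pos is the index of
    the final real token, i.e. the position whose logits hold the prediction.
    """
    if not ids:
        raise ValueError("need at least one token")
    kept = list(ids[:max_len])  # drop anything beyond the context length
    n_pad = max_len - len(kept)
    input_ids = kept + [pad_id] * n_pad
    attention_mask = [1] * len(kept) + [0] * n_pad
    return input_ids, attention_mask, len(kept) - 1

ids, mask, last_pos = pad_or_truncate([101, 2054, 3000])  # illustrative IDs
print(len(ids), sum(mask), last_pos)  # 20 3 2
```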

Dependencies

Package	Version
`transformers`	>= 4.30.0
`torch`	>= 2.0.0
`sageattention`	optional, for faster attention
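Because `sageattention` is optional, the model code needs a check at load time and a fallback to PyTorch attention when the package is missing. A minimal standard-library sketch of such a check; how the model's remote code actually performs it is not documented here, and `pick_attention_backend` is a hypothetical name.

```python
import importlib.util

def sageattention_available():
    """Return True if the optional `sageattention` package can be imported."""
    return importlib.util.find_spec("sageattention") is not None

def pick_attention_backend():
    # Prefer SageAttention when installed; otherwise fall back to PyTorch attention,
    # mirroring the fallback described in the Architecture section
    return "sageattention" if sageattention_available() else "pytorch"

print(pick_attention_backend())
```

`find_spec` only checks that the package is importable without importing it, so the check stays cheap when the dependency is absent.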

Acknowledgements

Built by PhysiQuanty for SpiceeChat.

📌 This is a base model. For a production-ready fine-tuned version, see FirstName-Genre-Classifier-30M-SFT.

Built with a lot of caffeine ☕ by SpiceeChat

Model size: 19.7M params (Safetensors, BF16). Finetunes: 1 model.
