# train_hellaswag_456_1768397600
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set:
- Loss: 0.0876
- Num Input Tokens Seen: 99684352
## Model description
More information needed
## Intended uses & limitations
More information needed
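The card does not document intended uses, but the framework versions listed below indicate this is a PEFT adapter trained on top of Meta-Llama-3-8B-Instruct. The following is a minimal loading sketch, not a documented workflow: the adapter repo id is assumed from the model name, and the prompt and generation settings are illustrative only.

```python
# Minimal sketch: load the PEFT adapter on top of the base model for inference.
# Assumptions: the adapter lives at "rbelanec/train_hellaswag_456_1768397600"
# and you have access to the gated Meta-Llama-3 base weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_hellaswag_456_1768397600"  # assumption: adapter repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# HellaSwag-style sentence-completion prompt.
prompt = "A man puts a tray of food into the oven. Then he"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```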
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (reproduced as a configuration sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 456
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
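As a reproducibility aid, the listed values map onto `transformers.TrainingArguments` roughly as below. This is a sketch, not the training script: the dataset pipeline, PEFT/LoRA configuration, and any gradient-accumulation or precision settings are not documented in this card.

```python
# Sketch: the hyperparameters above expressed as TrainingArguments.
# output_dir is an assumption (taken from the run name); anything the card
# does not list (LoRA config, data collator, precision) is omitted here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_hellaswag_456_1768397600",  # assumption: matches run name
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```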
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.3995 | 0.5000 | 8979 | 0.1717 | 4988144 |
| 0.0257 | 1.0001 | 17958 | 0.1163 | 9975408 |
| 0.0393 | 1.5001 | 26937 | 0.1007 | 14958096 |
| 0.2131 | 2.0001 | 35916 | 0.0926 | 19946752 |
| 0.0005 | 2.5001 | 44895 | 0.1090 | 24920064 |
| 0.0876 | 3.0002 | 53874 | 0.0876 | 29906480 |
| 0.002 | 3.5002 | 62853 | 0.1251 | 34898064 |
| 0.0091 | 4.0002 | 71832 | 0.1022 | 39881968 |
| 0.0014 | 4.5003 | 80811 | 0.1163 | 44857344 |
| 0.059 | 5.0003 | 89790 | 0.1221 | 49845424 |
| 0.0029 | 5.5003 | 98769 | 0.1300 | 54836592 |
| 0.0012 | 6.0003 | 107748 | 0.1204 | 59819808 |
| 0.0 | 6.5004 | 116727 | 0.1324 | 64788176 |
| 0.0 | 7.0004 | 125706 | 0.1354 | 69784880 |
| 0.0001 | 7.5004 | 134685 | 0.1433 | 74764224 |
| 0.0 | 8.0004 | 143664 | 0.1540 | 79745680 |
| 0.1557 | 8.5005 | 152643 | 0.1564 | 84731648 |
| 0.0 | 9.0005 | 161622 | 0.1593 | 89712960 |
| 0.0 | 9.5005 | 170601 | 0.1609 | 94706800 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4