train_hellaswag_456_1768397600

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0876
  • Num Input Tokens Seen: 99684352
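
Since this is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, it has to be loaded together with the base model. Below is a minimal usage sketch, assuming the adapter is hosted at rbelanec/train_hellaswag_456_1768397600 (the repo this card belongs to) and that you have access to the gated base weights; the prompt is only illustrative.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_hellaswag_456_1768397600"  # assumed repo id for this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the fine-tuned adapter weights to the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "A man is sitting on a roof. He"  # illustrative HellaSwag-style context
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```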

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
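
The original training script is not included in this card, so the following is only a sketch of how the hyperparameters above map onto transformers TrainingArguments, not the exact configuration that was used.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_hellaswag_456_1768397600",
    learning_rate=5e-5,             # learning_rate: 5e-05
    per_device_train_batch_size=2,  # train_batch_size: 2
    per_device_eval_batch_size=2,   # eval_batch_size: 2
    seed=456,                       # seed: 456
    optim="adamw_torch",            # AdamW (torch implementation)
    adam_beta1=0.9,                 # betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,              # epsilon: 1e-08
    lr_scheduler_type="cosine",     # cosine schedule
    warmup_ratio=0.1,               # lr_scheduler_warmup_ratio: 0.1
    num_train_epochs=10,            # num_epochs: 10
)
```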

Training results

| Training Loss | Epoch  | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:------:|:---------------:|:-----------------:|
| 0.3995        | 0.5000 | 8979   | 0.1717          | 4988144           |
| 0.0257        | 1.0001 | 17958  | 0.1163          | 9975408           |
| 0.0393        | 1.5001 | 26937  | 0.1007          | 14958096          |
| 0.2131        | 2.0001 | 35916  | 0.0926          | 19946752          |
| 0.0005        | 2.5001 | 44895  | 0.1090          | 24920064          |
| 0.0876        | 3.0002 | 53874  | 0.0876          | 29906480          |
| 0.002         | 3.5002 | 62853  | 0.1251          | 34898064          |
| 0.0091        | 4.0002 | 71832  | 0.1022          | 39881968          |
| 0.0014        | 4.5003 | 80811  | 0.1163          | 44857344          |
| 0.059         | 5.0003 | 89790  | 0.1221          | 49845424          |
| 0.0029        | 5.5003 | 98769  | 0.1300          | 54836592          |
| 0.0012        | 6.0003 | 107748 | 0.1204          | 59819808          |
| 0.0           | 6.5004 | 116727 | 0.1324          | 64788176          |
| 0.0           | 7.0004 | 125706 | 0.1354          | 69784880          |
| 0.0001        | 7.5004 | 134685 | 0.1433          | 74764224          |
| 0.0           | 8.0004 | 143664 | 0.1540          | 79745680          |
| 0.1557        | 8.5005 | 152643 | 0.1564          | 84731648          |
| 0.0           | 9.0005 | 161622 | 0.1593          | 89712960          |
| 0.0           | 9.5005 | 170601 | 0.1609          | 94706800          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4