train_svamp_789_1768397604

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the svamp dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1569
  • Num Input Tokens Seen: 685888
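The card does not document usage, but since PEFT appears in the framework versions below, the repository presumably hosts a parameter-efficient adapter rather than full model weights. A minimal inference sketch under that assumption, loading the adapter on top of the gated meta-llama/Meta-Llama-3-8B-Instruct base (the example prompt, dtype, and generation settings are illustrative, and the prompt format used during training is not documented):

```python
# Hedged sketch: assumes this repo is a PEFT adapter for Meta-Llama-3-8B-Instruct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"          # gated; requires access approval
adapter_id = "rbelanec/train_svamp_789_1768397604"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)       # attach the fine-tuned adapter
model.eval()

# Illustrative SVAMP-style math word problem (not from the dataset itself).
messages = [{"role": "user", "content": "Dan has 32 green marbles. Mike took 23 of them. How many green marbles does Dan have now?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    out = model.generate(input_ids, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True))
```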

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
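These settings map roughly onto the following transformers.TrainingArguments configuration. This is a hypothetical reconstruction for reproducibility only; the actual training script, dataset preprocessing, and PEFT adapter configuration are not documented in this card, and the output directory is a placeholder:

```python
# Hedged reconstruction of the listed hyperparameters; the Trainer call,
# dataset loading, and PEFT setup are omitted because they are undocumented.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_svamp_789_1768397604",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```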

Training results

Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen
------------- | ------ | ---- | --------------- | -----------------
1.6184        | 0.5016 |  158 | 1.6024          |  34528
0.3558        | 1.0032 |  316 | 0.3049          |  68640
0.0946        | 1.5048 |  474 | 0.2215          | 103344
0.1005        | 2.0063 |  632 | 0.1901          | 137520
0.0814        | 2.5079 |  790 | 0.1839          | 171568
0.1122        | 3.0095 |  948 | 0.1744          | 206288
0.0742        | 3.5111 | 1106 | 0.1666          | 240800
0.1734        | 4.0127 | 1264 | 0.1569          | 275088
0.1127        | 4.5143 | 1422 | 0.1650          | 309712
0.0637        | 5.0159 | 1580 | 0.1595          | 344000
0.088         | 5.5175 | 1738 | 0.1586          | 378480
0.0549        | 6.0190 | 1896 | 0.1610          | 412816
0.0067        | 6.5206 | 2054 | 0.1666          | 447072
0.1563        | 7.0222 | 2212 | 0.1688          | 481536
0.0135        | 7.5238 | 2370 | 0.1680          | 516016
0.0093        | 8.0254 | 2528 | 0.1733          | 550416
0.076         | 8.5270 | 2686 | 0.1762          | 585008
0.0225        | 9.0286 | 2844 | 0.1746          | 619312
0.0078        | 9.5302 | 3002 | 0.1729          | 653744

The reported evaluation loss (0.1569) corresponds to the step-1264 checkpoint (epoch ≈ 4.01); validation loss plateaus around 0.16–0.18 after that point while training loss keeps falling.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4