train_cb_123_1768397587

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set (see the loading sketch after the list):

  • Loss: 0.1729
  • Num Input Tokens Seen: 318872
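
Since this repository contains only a PEFT adapter rather than full model weights, it must be loaded on top of the base model. A minimal, untested sketch, assuming the adapter is published on the Hub as rbelanec/train_cb_123_1768397587 (the id in the model tree below) and that you have approved access to the gated Llama 3 base model:

```python
# Minimal sketch: load the base model, then apply this PEFT adapter on top.
# Assumes the adapter repo id rbelanec/train_cb_123_1768397587; access to
# meta-llama/Meta-Llama-3-8B-Instruct requires an approved HF token.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cb_123_1768397587"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```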

Model description

This repository contains a PEFT adapter (not full model weights) for meta-llama/Meta-Llama-3-8B-Instruct, fine-tuned on the cb dataset. No further details are provided.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
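
The card does not say which cb dataset was used. If it is the CommitmentBank task from SuperGLUE, as the name suggests (an assumption, not confirmed by the card), it could be loaded roughly like this:

```python
# Hypothetical: assumes "cb" is SuperGLUE CommitmentBank. Note that
# datasets 4.x no longer runs dataset loading scripts, so a parquet
# mirror of super_glue may be needed instead of the original script.
from datasets import load_dataset

cb = load_dataset("super_glue", "cb")
print(cb["train"][0])  # premise / hypothesis / label fields
```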

Training procedure

Training hyperparameters

The following hyperparameters were used during training (mirrored in the code sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
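
The training script itself is not included in this card; the following TrainingArguments sketch simply mirrors the values listed above (output_dir is a placeholder):

```python
# Sketch of TrainingArguments matching the listed hyperparameters.
# output_dir is a placeholder; dataset and Trainer setup are omitted.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_cb_123_1768397587",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```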

Training results

The final evaluation loss of 0.1729 matches the minimum validation loss in the table below, reached at step 456 (epoch 4.04).

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.3701        | 0.5044 | 57   | 0.3213          | 17264             |
| 0.3664        | 1.0088 | 114  | 0.1825          | 32600             |
| 0.4096        | 1.5133 | 171  | 0.1899          | 49000             |
| 0.1741        | 2.0177 | 228  | 0.1937          | 64424             |
| 0.2235        | 2.5221 | 285  | 0.1757          | 80280             |
| 0.0016        | 3.0265 | 342  | 0.1788          | 96824             |
| 0.055         | 3.5310 | 399  | 0.1816          | 113160            |
| 0.1124        | 4.0354 | 456  | 0.1729          | 129560            |
| 0.0015        | 4.5398 | 513  | 0.1750          | 144920            |
| 0.0939        | 5.0442 | 570  | 0.2100          | 161104            |
| 0.0008        | 5.5487 | 627  | 0.2388          | 177936            |
| 0.3414        | 6.0531 | 684  | 0.2104          | 193600            |
| 0.0118        | 6.5575 | 741  | 0.2093          | 210320            |
| 0.0003        | 7.0619 | 798  | 0.2310          | 225808            |
| 0.0126        | 7.5664 | 855  | 0.2324          | 241424            |
| 0.0013        | 8.0708 | 912  | 0.2356          | 257632            |
| 0.0009        | 8.5752 | 969  | 0.2362          | 273792            |
| 0.0166        | 9.0796 | 1026 | 0.2397          | 289768            |
| 0.0006        | 9.5841 | 1083 | 0.2401          | 305784            |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model tree for rbelanec/train_cb_123_1768397587

This model is one of the adapters built on meta-llama/Meta-Llama-3-8B-Instruct.