train_svamp_789_1768397604

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the svamp dataset. It achieves the following results on the evaluation set:

Loss: 0.1569
Num Input Tokens Seen: 685888

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10
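As a rough illustration of how the cosine scheduler and the 0.1 warmup ratio above shape the learning rate, here is a minimal sketch in plain Python. It assumes about 315 optimizer steps per epoch (inferred from step 158 landing at epoch 0.5016 in the results table, not stated on the card), so roughly 3150 steps in total. The names `lr_at`, `STEPS_PER_EPOCH`, and `WARMUP_STEPS` are illustrative, and the formula is the common linear-warmup plus half-cosine decay, not necessarily the exact code path used during training.

```python
import math

PEAK_LR = 5e-05          # learning_rate from the card
WARMUP_RATIO = 0.1       # lr_scheduler_warmup_ratio from the card
NUM_EPOCHS = 10          # num_epochs from the card
STEPS_PER_EPOCH = 315    # ASSUMPTION: inferred from the results table
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS
WARMUP_STEPS = int(round(TOTAL_STEPS * WARMUP_RATIO))


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        # Ramp linearly from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the peak down to 0.
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, WARMUP_STEPS, 1264, TOTAL_STEPS):
        print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate peaks at the end of the first epoch and decays toward zero by the final step.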

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
1.6184	0.5016	158	1.6024	34528
0.3558	1.0032	316	0.3049	68640
0.0946	1.5048	474	0.2215	103344
0.1005	2.0063	632	0.1901	137520
0.0814	2.5079	790	0.1839	171568
0.1122	3.0095	948	0.1744	206288
0.0742	3.5111	1106	0.1666	240800
0.1734	4.0127	1264	0.1569	275088
0.1127	4.5143	1422	0.1650	309712
0.0637	5.0159	1580	0.1595	344000
0.088	5.5175	1738	0.1586	378480
0.0549	6.0190	1896	0.1610	412816
0.0067	6.5206	2054	0.1666	447072
0.1563	7.0222	2212	0.1688	481536
0.0135	7.5238	2370	0.1680	516016
0.0093	8.0254	2528	0.1733	550416
0.076	8.5270	2686	0.1762	585008
0.0225	9.0286	2844	0.1746	619312
0.0078	9.5302	3002	0.1729	653744
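
The headline evaluation loss of 0.1569 matches the lowest validation loss in the table above. The short sketch below copies the (step, validation loss) pairs from the table and finds that minimum. Whether the published adapter actually corresponds to that checkpoint is not stated on the card, so treat that link as an assumption; the variable names are illustrative.

```python
# (step, validation loss) pairs copied from the training results table.
VALIDATION_LOSS = [
    (158, 1.6024), (316, 0.3049), (474, 0.2215), (632, 0.1901),
    (790, 0.1839), (948, 0.1744), (1106, 0.1666), (1264, 0.1569),
    (1422, 0.1650), (1580, 0.1595), (1738, 0.1586), (1896, 0.1610),
    (2054, 0.1666), (2212, 0.1688), (2370, 0.1680), (2528, 0.1733),
    (2686, 0.1762), (2844, 0.1746), (3002, 0.1729),
]

# Pick the evaluation point with the smallest validation loss.
best_step, best_loss = min(VALIDATION_LOSS, key=lambda row: row[1])

# Validation loss at the last logged evaluation, for comparison.
last_step, last_loss = VALIDATION_LOSS[-1]

if __name__ == "__main__":
    print(f"best: step {best_step}, validation loss {best_loss}")
    print(f"last: step {last_step}, validation loss {last_loss}")
```

Validation loss rises slightly after that point while training loss keeps falling, which would be consistent with mild overfitting in the later epochs.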

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.1+cu128
Datasets 4.0.0
Tokenizers 0.21.4
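
Since the card lists PEFT among the framework versions and names a base model, loading the adapter might look like the sketch below. It assumes the adapter is published under the repository ID shown on this page, that `transformers`, `peft`, and `torch` are installed, and that you have access to the base model weights. `load_model` is a hypothetical helper name, and the choice of `bfloat16` is an assumption rather than something the card specifies.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_svamp_789_1768397604"  # ASSUMPTION: repo ID from this page


def load_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter on top of it."""
    # Imported lazily so the heavy dependencies are only needed when called.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,  # ASSUMPTION: dtype is not given on the card
    )
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_model()` downloads the full base model, so it needs network access and enough memory for an 8B-parameter model.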


Model tree for rbelanec/train_svamp_789_1768397604

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model