train_stsb_42_1767887010

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the stsb dataset. It achieves the following results on the evaluation set:

Loss: 0.4508
Num Input Tokens Seen: 3928080

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10
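The cosine schedule with a 10% warmup ratio can be sketched in plain Python. The sketch mirrors the shape of the Transformers helper `get_cosine_schedule_with_warmup` with its default half cycle: a linear ramp up to the peak learning rate, then a half-cosine decay to zero. It is an illustrative reimplementation, not the code used for this run. Two values are assumptions. The first is 2587 optimizer steps per epoch, inferred from the Step and Epoch columns of the training-results table (step 2588 falls at epoch 1.0004). The second is that the warmup step count is rounded up from ratio × total steps.

```python
import math

# Values from the hyperparameter list above.
PEAK_LR = 5e-05
WARMUP_RATIO = 0.1
NUM_EPOCHS = 10

# Assumption: ~2587 optimizer steps per epoch, inferred from the Step and
# Epoch columns of the training-results table (step 2588 at epoch 1.0004).
STEPS_PER_EPOCH = 2587
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS
# Assumption: warmup steps are rounded up from ratio * total steps.
WARMUP_STEPS = math.ceil(WARMUP_RATIO * TOTAL_STEPS)


def learning_rate(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Print the schedule at a few steps that appear in the results table.
for step in (0, 1294, WARMUP_STEPS, 15528, TOTAL_STEPS):
    print(f"step {step:>5}: lr = {learning_rate(step):.3e}")
```

Under these assumptions, the learning rate peaks near the end of the first epoch. It has decayed to roughly half its peak around epoch 5.5.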

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.7515	0.5002	1294	0.6846	197040
0.9202	1.0004	2588	0.5452	392608
0.5493	1.5006	3882	0.5131	588592
0.3837	2.0008	5176	0.4846	785728
0.3415	2.5010	6470	0.4859	982048
0.3315	3.0012	7764	0.4833	1178784
0.4845	3.5014	9058	0.4625	1374176
0.6275	4.0015	10352	0.4571	1571952
0.5573	4.5017	11646	0.4721	1768848
0.5374	5.0019	12940	0.4582	1964960
0.348	5.5021	14234	0.4588	2161632
0.8313	6.0023	15528	0.4508	2358288
0.4394	6.5025	16822	0.4532	2554352
0.6566	7.0027	18116	0.4547	2750912
0.3465	7.5029	19410	0.4567	2947664
0.3684	8.0031	20704	0.4522	3144128
0.4798	8.5033	21998	0.4562	3339904
0.2915	9.0035	23292	0.4530	3537024
0.2465	9.5037	24586	0.4512	3733152
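The headline evaluation loss (0.4508) equals the lowest validation loss in this table, which was logged at step 15528 (epoch ≈ 6.0). The card does not state whether the published adapter is that checkpoint or the final one. The short Python sketch below parses the rows copied from the table. It locates that minimum and derives two averages: steps per epoch and input tokens per optimizer step. The field names are illustrative.

```python
# Rows copied from the training-results table above:
# training loss, epoch, step, validation loss, input tokens seen
TABLE = """\
0.7515 0.5002 1294 0.6846 197040
0.9202 1.0004 2588 0.5452 392608
0.5493 1.5006 3882 0.5131 588592
0.3837 2.0008 5176 0.4846 785728
0.3415 2.5010 6470 0.4859 982048
0.3315 3.0012 7764 0.4833 1178784
0.4845 3.5014 9058 0.4625 1374176
0.6275 4.0015 10352 0.4571 1571952
0.5573 4.5017 11646 0.4721 1768848
0.5374 5.0019 12940 0.4582 1964960
0.348 5.5021 14234 0.4588 2161632
0.8313 6.0023 15528 0.4508 2358288
0.4394 6.5025 16822 0.4532 2554352
0.6566 7.0027 18116 0.4547 2750912
0.3465 7.5029 19410 0.4567 2947664
0.3684 8.0031 20704 0.4522 3144128
0.4798 8.5033 21998 0.4562 3339904
0.2915 9.0035 23292 0.4530 3537024
0.2465 9.5037 24586 0.4512 3733152
"""


def parse_rows(text):
    """Turn whitespace-separated table rows into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, tokens = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "tokens": int(tokens),
        })
    return rows


rows = parse_rows(TABLE)
best = min(rows, key=lambda r: r["val_loss"])
print(f"lowest validation loss {best['val_loss']} at step {best['step']} (epoch {best['epoch']})")

# Averages derived from the last logged row.
last = rows[-1]
steps_per_epoch = last["step"] / last["epoch"]
tokens_per_step = last["tokens"] / last["step"]
print(f"~{steps_per_epoch:.0f} steps per epoch, ~{tokens_per_step:.0f} input tokens per step")
```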

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.1+cu128
Datasets 4.0.0
Tokenizers 0.21.4
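To compare a local environment against the versions above, one option is the standard-library `importlib.metadata` module. The sketch below assumes the PyPI distribution names `peft`, `transformers`, `torch`, `datasets`, and `tokenizers`. It performs an exact string comparison, so a CPU or different-CUDA build of PyTorch reports "differs" because the `+cu128` local tag does not match.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions", keyed by PyPI distribution name.
EXPECTED = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.1+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def check_versions(expected=EXPECTED):
    """Return {package: (expected_version, installed_version_or_None)}."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed)
    return report


for package, (wanted, installed) in check_versions().items():
    status = "ok" if installed == wanted else "differs"
    print(f"{package:<12} expected {wanted:<12} installed {installed or 'missing':<12} {status}")
```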


Model tree

Base model: meta-llama/Meta-Llama-3-8B-Instruct
This model (rbelanec/train_stsb_42_1767887010): adapter on the base model