train_cb_123_1768397587

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cb dataset. It achieves the following results on the evaluation set (see the loading sketch after the list):

  • Loss: 0.1729
  • Num Input Tokens Seen: 318872
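
Since this repository contains only a PEFT adapter rather than full model weights, it must be loaded on top of the base model. A minimal, untested sketch, assuming the adapter is published on the Hub as rbelanec/train_cb_123_1768397587 (the id in the model tree below) and that you have approved access to the gated Llama 3 base model:

```python
# Minimal sketch: load the base model, then apply this PEFT adapter on top.
# Assumes the adapter repo id rbelanec/train_cb_123_1768397587; access to
# meta-llama/Meta-Llama-3-8B-Instruct requires an approved HF token.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cb_123_1768397587"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```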

Model description

This repository contains a PEFT adapter (not full model weights) for meta-llama/Meta-Llama-3-8B-Instruct, fine-tuned on the cb dataset. No further details are provided.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
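
The card does not say which cb dataset was used. If it is the CommitmentBank task from SuperGLUE, as the name suggests (an assumption, not confirmed by the card), it could be loaded roughly like this:

```python
# Hypothetical: assumes "cb" is SuperGLUE CommitmentBank. Note that
# datasets 4.x no longer runs dataset loading scripts, so a parquet
# mirror of super_glue may be needed instead of the original script.
from datasets import load_dataset

cb = load_dataset("super_glue", "cb")
print(cb["train"][0])  # premise / hypothesis / label fields
```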

Training procedure

Training hyperparameters

The following hyperparameters were used during training (mirrored in the code sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10
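
The training script itself is not included in this card; the following TrainingArguments sketch simply mirrors the values listed above (output_dir is a placeholder):

```python
# Sketch of TrainingArguments matching the listed hyperparameters.
# output_dir is a placeholder; dataset and Trainer setup are omitted.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_cb_123_1768397587",
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```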

Training results

The final evaluation loss of 0.1729 matches the minimum validation loss in the table below, reached at step 456 (epoch 4.04).

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:------:|:----:|:---------------:|:-----------------:|
| 0.3701        | 0.5044 | 57   | 0.3213          | 17264             |
| 0.3664        | 1.0088 | 114  | 0.1825          | 32600             |
| 0.4096        | 1.5133 | 171  | 0.1899          | 49000             |
| 0.1741        | 2.0177 | 228  | 0.1937          | 64424             |
| 0.2235        | 2.5221 | 285  | 0.1757          | 80280             |
| 0.0016        | 3.0265 | 342  | 0.1788          | 96824             |
| 0.055         | 3.5310 | 399  | 0.1816          | 113160            |
| 0.1124        | 4.0354 | 456  | 0.1729          | 129560            |
| 0.0015        | 4.5398 | 513  | 0.1750          | 144920            |
| 0.0939        | 5.0442 | 570  | 0.2100          | 161104            |
| 0.0008        | 5.5487 | 627  | 0.2388          | 177936            |
| 0.3414        | 6.0531 | 684  | 0.2104          | 193600            |
| 0.0118        | 6.5575 | 741  | 0.2093          | 210320            |
| 0.0003        | 7.0619 | 798  | 0.2310          | 225808            |
| 0.0126        | 7.5664 | 855  | 0.2324          | 241424            |
| 0.0013        | 8.0708 | 912  | 0.2356          | 257632            |
| 0.0009        | 8.5752 | 969  | 0.2362          | 273792            |
| 0.0166        | 9.0796 | 1026 | 0.2397          | 289768            |
| 0.0006        | 9.5841 | 1083 | 0.2401          | 305784            |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.1+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4

Model tree for rbelanec/train_cb_123_1768397587

This model is one of the adapters built on meta-llama/Meta-Llama-3-8B-Instruct.