llada-1.0-s1

This model is a PEFT adapter fine-tuned from GSAI-ML/LLaDA-8B-Instruct on an unspecified dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2898

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • total_train_batch_size: 8
  • total_eval_batch_size: 64
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 20
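
The hyperparameters above map directly onto Hugging Face TrainingArguments. Below is a minimal reconstruction sketch, assuming the standard Trainer API was used; the output_dir is a placeholder, and the 100-step eval cadence is inferred from the results table rather than reported.

```python
# Hedged reconstruction of the training configuration above using
# transformers.TrainingArguments; paths and eval cadence are assumptions.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llada-1.0-s1",      # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=1,  # x 8 GPUs -> total train batch size of 8
    per_device_eval_batch_size=8,   # x 8 GPUs -> total eval batch size of 64
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=20,
    eval_strategy="steps",          # assumed: the table logs eval every 100 steps
    eval_steps=100,
)
```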

Training results

Training Loss   Epoch     Step   Validation Loss
0.0763          0.8065    100    0.3922
0.1548          1.6129    200    0.3422
0.661           2.4194    300    0.3234
0.6585          3.2258    400    0.3156
0.1586          4.0323    500    0.3227
0.1371          4.8387    600    0.3219
2.4102          5.6452    700    0.3070
1.9522          6.4516    800    0.3320
0.2021          7.2581    900    0.3156
0.4729          8.0645    1000   0.3109
0.2006          8.8710    1100   0.3117
0.3131          9.6774    1200   0.2914
0.2943          10.4839   1300   0.3234
0.0529          11.2903   1400   0.2883
1.0432          12.0968   1500   0.2820
0.0808          12.9032   1600   0.2891
0.3329          13.7097   1700   0.2883
0.0928          14.5161   1800   0.3102
0.0672          15.3226   1900   0.3047
0.1119          16.1290   2000   0.2961
1.6034          16.9355   2100   0.3187
0.212           17.7419   2200   0.2937
0.2682          18.5484   2300   0.2883
0.2163          19.3548   2400   0.2898

Framework versions

  • PEFT 0.15.1
  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.3.2
  • Tokenizers 0.21.4
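
Because this repository contains a PEFT adapter rather than full model weights, it must be loaded on top of the base model. The following is a minimal sketch, assuming the adapter in this repo (JakeOh/llada-1.0-s1) applies directly to GSAI-ML/LLaDA-8B-Instruct; LLaDA ships custom modeling code, so trust_remote_code=True is assumed to be required.

```python
# Hedged sketch: load the base LLaDA model, then apply this PEFT adapter.
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

base = AutoModel.from_pretrained(
    "GSAI-ML/LLaDA-8B-Instruct",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # assumed: LLaDA uses custom modeling code
)
tokenizer = AutoTokenizer.from_pretrained(
    "GSAI-ML/LLaDA-8B-Instruct", trust_remote_code=True
)
model = PeftModel.from_pretrained(base, "JakeOh/llada-1.0-s1")
```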