Instructions to use mrcuddle/Tiny-Darkllama3.2-1B-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use mrcuddle/Tiny-Darkllama3.2-1B-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="mrcuddle/Tiny-Darkllama3.2-1B-Instruct")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mrcuddle/Tiny-Darkllama3.2-1B-Instruct")
model = AutoModelForCausalLM.from_pretrained("mrcuddle/Tiny-Darkllama3.2-1B-Instruct")
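
The snippets above stop once the model is loaded. A minimal sketch of the next step, sampling a continuation for a prompt, might look like the following. The `generate` helper, its default arguments, and the lazy imports are illustrative assumptions rather than part of the model card; the temperature of 0.5 simply mirrors the server examples on this page.

```python
MODEL_ID = "mrcuddle/Tiny-Darkllama3.2-1B-Instruct"


def generate(prompt: str, max_new_tokens: int = 64) -> str:
    """Load the model and return the prompt plus a sampled continuation.

    Hypothetical helper for illustration; not an API of the model or library.
    """
    # Imported lazily so this module can be imported without the heavy
    # dependencies being loaded up front.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    # Tokenize the prompt into PyTorch tensors.
    inputs = tokenizer(prompt, return_tensors="pt")

    # Sample up to max_new_tokens new tokens without tracking gradients.
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.5,
        )

    # Decode the full sequence (prompt and continuation) back to text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Once upon a time,")` downloads the weights on first use and returns the prompt followed by the sampled text.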

Local Apps

vLLM

How to use mrcuddle/Tiny-Darkllama3.2-1B-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "mrcuddle/Tiny-Darkllama3.2-1B-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mrcuddle/Tiny-Darkllama3.2-1B-Instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
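
The same request can be sent from Python. The sketch below uses only the standard library and assumes the server returns the usual OpenAI-style completions response, with the generated text at `choices[0].text`. The helper names `build_completion_request` and `complete` are hypothetical.

```python
import json
import urllib.request

MODEL_ID = "mrcuddle/Tiny-Darkllama3.2-1B-Instruct"


def build_completion_request(prompt, base_url="http://localhost:8000",
                             max_tokens=512, temperature=0.5):
    """Build the POST request that the curl command above sends."""
    payload = {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the request and return the generated text.

    Assumes an OpenAI-style response body: {"choices": [{"text": ...}]}.
    """
    request = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` returns the continuation. Passing `base_url="http://localhost:30000"` targets a server on port 30000 instead, as in the SGLang examples on this page.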

Use Docker

docker model run hf.co/mrcuddle/Tiny-Darkllama3.2-1B-Instruct

SGLang

How to use mrcuddle/Tiny-Darkllama3.2-1B-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "mrcuddle/Tiny-Darkllama3.2-1B-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mrcuddle/Tiny-Darkllama3.2-1B-Instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "mrcuddle/Tiny-Darkllama3.2-1B-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "mrcuddle/Tiny-Darkllama3.2-1B-Instruct",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use mrcuddle/Tiny-Darkllama3.2-1B-Instruct with Docker Model Runner:
```
docker model run hf.co/mrcuddle/Tiny-Darkllama3.2-1B-Instruct
```

See axolotl config

axolotl version: 0.6.0

base_model: unsloth/Llama-3.2-1B
bf16: false
dataset_prepared_path: last_run_prepared
datasets:
- chat_template: alpaca
  field_messages: conversations
  message_field_content: value
  message_field_role: from
  path: ChaoticNeutrals/Luminous_Opus
  split: train
  type: chat_template
debug: null
deepspeed: null
early_stopping_patience: null
evals_per_epoch: null
flash_attention: false
fp16: false
fsdp: null
fsdp_config: null
gradient_accumulation_steps: 1
gradient_checkpointing: true
group_by_length: false
hub_model_id: mrcuddle/Tiny-Darkllama3.2-1B-Instruct
is_llama_derived_model: true
learning_rate: 0.0002
load_in_4bit: false
load_in_8bit: false
local_rank: null
logging_steps: 1
lr_scheduler: linear
max_steps: 20
micro_batch_size: 1
mlflow_experiment_name: colab-example
model_type: LlamaForCausalLM
num_epochs: 4
optimizer: adamw_torch
output_dir: ./llama2
pad_to_sequence_len: true
resume_from_checkpoint: null
sample_packing: true
saves_per_epoch: null
sequence_len: 1096
special_tokens: null
strict: false
tf32: false
tokenizer_type: LlamaTokenizer
train_on_inputs: false
wandb_entity: null
wandb_log_model: null
wandb_name: null
wandb_project: null
wandb_watch: null
warmup_steps: 10
weight_decay: 0.0
xformers_attention: null
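
The config sets both `num_epochs: 4` and `max_steps: 20`. When a positive step cap is given it ends training first, and the log further down reports 35 packed batches per epoch (`gather_len_batches: [35]`) with a final epoch value of 0.57. The arithmetic can be sketched as follows, assuming each optimizer step consumes `gradient_accumulation_steps` packed batches; `epochs_covered` is a hypothetical helper.

```python
def epochs_covered(max_steps, batches_per_epoch, grad_accum=1):
    """Fraction of an epoch seen after `max_steps` optimizer steps.

    Assumes one optimizer step consumes `grad_accum` packed batches.
    """
    steps_per_epoch = batches_per_epoch / grad_accum
    return max_steps / steps_per_epoch


# max_steps and gradient_accumulation_steps come from the config;
# 35 is the gather_len_batches value printed in the training log.
fraction = epochs_covered(max_steps=20, batches_per_epoch=35, grad_accum=1)
print(f"{fraction:.2f} epochs")
```

Under these assumptions the run covers a little over half of one pass through the packed dataset, so the `num_epochs: 4` setting never takes effect.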

Tiny-Darkllama3.2-1B-Instruct

This model was fine-tuned from unsloth/Llama-3.2-1B on the ChaoticNeutrals/Luminous_Opus, Synthetic-Dark-RP, and Synthetic-RP datasets.

Training and evaluation data

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0002
train_batch_size: 1
eval_batch_size: 1
seed: 42
optimizer: adamw_torch (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999) and epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 10
training_steps: 20
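
These values imply a learning rate that rises linearly to 0.0002 over the first 10 steps and then falls linearly to zero at step 20. The sketch below writes out the usual linear warmup and decay rule and compares it with a few learning rates from the training log. The function name `linear_warmup_decay` is hypothetical, and the formula is an assumption about how the `linear` scheduler behaves, chosen because it reproduces the logged values.

```python
def linear_warmup_decay(step, peak_lr=2e-4, warmup_steps=10, total_steps=20):
    """Learning rate after `step` optimizer steps (linear warmup, linear decay)."""
    if step < warmup_steps:
        # Warmup: ramp from 0 up to peak_lr.
        return peak_lr * step / warmup_steps
    # Decay: ramp from peak_lr down to 0 at total_steps.
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return peak_lr * max(0.0, remaining)


# Learning rates copied from the training log, keyed by step number.
LOGGED = {1: 2e-05, 5: 1e-04, 10: 2e-04, 11: 1.8e-04, 20: 0.0}

for step, logged_lr in LOGGED.items():
    computed = linear_warmup_decay(step)
    print(f"step {step:2d}: computed {computed:.2e}, logged {logged_lr:.2e}")
```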

Training results

[2025-02-11 13:09:27,300] [INFO] [axolotl.train.train:173] [PID:7240] [RANK:0] Starting trainer...
[2025-02-11 13:09:27,706] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:203] [PID:7240] [RANK:0] gather_len_batches: [35]
[2025-02-11 13:09:27,761] [INFO] [axolotl.callbacks.on_train_begin:39] [PID:7240] [RANK:0] The Axolotl config has been saved to the MLflow artifacts.
{'loss': 3.4922, 'grad_norm': 9.877531051635742, 'learning_rate': 2e-05, 'epoch': 0.03}
5% 1/20 [00:02<00:37, 1.98s/it]
[2025-02-11 13:09:31,221] [INFO] [axolotl.callbacks.on_step_end:127] [PID:7240] [RANK:0] cuda memory usage while training: 12.320GB (+8.604GB cache, +0.565GB misc)
{'loss': 3.3057, 'grad_norm': 11.661816596984863, 'learning_rate': 4e-05, 'epoch': 0.06}
{'loss': 2.4733, 'grad_norm': 8.751928329467773, 'learning_rate': 6e-05, 'epoch': 0.09}
{'loss': 2.9842, 'grad_norm': 10.503549575805664, 'learning_rate': 8e-05, 'epoch': 0.11}
{'loss': 2.6624, 'grad_norm': 12.645892143249512, 'learning_rate': 0.0001, 'epoch': 0.14}
{'loss': 2.7616, 'grad_norm': 10.691230773925781, 'learning_rate': 0.00012, 'epoch': 0.17}
{'loss': 2.9891, 'grad_norm': 10.076760292053223, 'learning_rate': 0.00014, 'epoch': 0.2}
{'loss': 2.3745, 'grad_norm': 10.034379959106445, 'learning_rate': 0.00016, 'epoch': 0.23}
{'loss': 2.4965, 'grad_norm': 9.778562545776367, 'learning_rate': 0.00018, 'epoch': 0.26}
{'loss': 2.3811, 'grad_norm': 19.146963119506836, 'learning_rate': 0.0002, 'epoch': 0.29}
{'loss': 3.3611, 'grad_norm': 14.556534767150879, 'learning_rate': 0.00018, 'epoch': 0.31}
{'loss': 2.9619, 'grad_norm': 16.88424301147461, 'learning_rate': 0.00016, 'epoch': 0.34}
{'loss': 2.121, 'grad_norm': 9.94941520690918, 'learning_rate': 0.00014, 'epoch': 0.37}
{'loss': 2.1042, 'grad_norm': 23.178285598754883, 'learning_rate': 0.00012, 'epoch': 0.4}
{'loss': 2.4722, 'grad_norm': 10.403461456298828, 'learning_rate': 0.0001, 'epoch': 0.43}
{'loss': 2.7434, 'grad_norm': 11.339975357055664, 'learning_rate': 8e-05, 'epoch': 0.46}
{'loss': 2.2349, 'grad_norm': 202.98793029785156, 'learning_rate': 6e-05, 'epoch': 0.49}
{'loss': 2.3479, 'grad_norm': 10.250885009765625, 'learning_rate': 4e-05, 'epoch': 0.51}
{'loss': 2.4169, 'grad_norm': 14.021651268005371, 'learning_rate': 2e-05, 'epoch': 0.54}
{'loss': 3.4686, 'grad_norm': 10.988056182861328, 'learning_rate': 0.0, 'epoch': 0.57}
{'train_runtime': 172.0118, 'train_samples_per_second': 0.116, 'train_steps_per_second': 0.116, 'train_loss': 2.707640600204468, 'epoch': 0.57}
100% 20/20 [02:52<00:00, 8.65s/it]
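
The per-step metrics in this log are Python dict literals, so they can be extracted with a regular expression and `ast.literal_eval`. The sketch below parses the last five step records and the summary line, copied from the log, and flags unusually large gradient norms. The helper names and the threshold of five times the median are arbitrary assumptions for illustration.

```python
import ast
import re
from statistics import median

# Last five step records and the summary line, copied from the log above.
LOG_EXCERPT = """
{'loss': 2.7434, 'grad_norm': 11.339975357055664, 'learning_rate': 8e-05, 'epoch': 0.46}
{'loss': 2.2349, 'grad_norm': 202.98793029785156, 'learning_rate': 6e-05, 'epoch': 0.49}
{'loss': 2.3479, 'grad_norm': 10.250885009765625, 'learning_rate': 4e-05, 'epoch': 0.51}
{'loss': 2.4169, 'grad_norm': 14.021651268005371, 'learning_rate': 2e-05, 'epoch': 0.54}
{'loss': 3.4686, 'grad_norm': 10.988056182861328, 'learning_rate': 0.0, 'epoch': 0.57}
{'train_runtime': 172.0118, 'train_samples_per_second': 0.116, 'train_steps_per_second': 0.116, 'train_loss': 2.707640600204468, 'epoch': 0.57} 100% 20/20 [02:52<00:00, 8.65s/it]
"""


def parse_metrics(log_text):
    """Return every dict literal found in the log text, in order."""
    records = []
    for match in re.finditer(r"\{[^{}]*\}", log_text):
        try:
            record = ast.literal_eval(match.group(0))
        except (ValueError, SyntaxError):
            continue  # Skip brace-delimited text that is not a valid literal.
        if isinstance(record, dict):
            records.append(record)
    return records


def grad_norm_spikes(step_records, factor=5.0):
    """Return step records whose grad_norm exceeds `factor` times the median."""
    typical = median(r["grad_norm"] for r in step_records)
    return [r for r in step_records if r["grad_norm"] > factor * typical]


records = parse_metrics(LOG_EXCERPT)
steps = [r for r in records if "loss" in r]               # per-step records
summary = next(r for r in records if "train_loss" in r)   # final summary line
spikes = grad_norm_spikes(steps)

for spike in spikes:
    print(f"grad_norm spike at epoch {spike['epoch']}: {spike['grad_norm']:.1f}")

# 20 is the number of training steps in this run.
print(f"steps per second: {20 / summary['train_runtime']:.3f}")
```

Run over the full log, the same parser yields 20 step records whose losses average about 2.7076, in line with the reported `train_loss`.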