Llama-3.1-Minitron-4B-Depth-Chat

This is a supervised fine-tuned (SFT) chat model based on
nvidia/Llama-3.1-Minitron-4B-Depth-Base.

It was trained to improve instruction following and conversational ability, using a standard chat format compatible with the Transformers chat API. Fine-tuning was performed on a single H100 GPU and took roughly one hour.


πŸ“Œ Model Details

  • Base model: nvidia/Llama-3.1-Minitron-4B-Depth-Base
  • Fine-tuning method: Supervised Fine-Tuning (TRL)
  • Trainer: trl.SFTTrainer
  • Dataset: OpenHermes-2.5 (a quality-filtered subset of 64k samples)
  • Precision: bf16
  • Context length: up to 4096 tokens
  • Chat format: ChatML-style (<|im_start|>role\ncontent<|im_end|>)

This model aims to better follow human instructions and generate more natural responses than the base model.
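The ChatML-style format listed above can be illustrated with a minimal sketch. This is an assumption-laden illustration only (role names "system"/"user"/"assistant" are assumed); the tokenizer's built-in chat template, via apply_chat_template, is the authoritative source.

```python
# Sketch of the ChatML-style layout: <|im_start|>role\ncontent<|im_end|>
# per message, with an open assistant turn appended for generation.
def to_chatml(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
]) + "\n<|im_start|>assistant\n"
print(prompt)
```

In practice you should not build this string by hand; use tokenizer.apply_chat_template as shown in the usage example below, which applies the template stored with the model.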


Example Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("jonny-vr/Llama-3.1-Minitron-4B-Depth-Chat")
model = AutoModelForCausalLM.from_pretrained(
    "jonny-vr/Llama-3.1-Minitron-4B-Depth-Chat",
    torch_dtype=torch.bfloat16,
).to("cuda")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain how to do SFT training."}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
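Note that model.generate returns the prompt tokens followed by the newly generated ones, so decoding output[0] directly also reprints the prompt. To show only the assistant's reply, slice off the first inputs["input_ids"].shape[1] tokens before decoding. The slicing idiom, sketched here on plain lists with made-up ids:

```python
# Toy illustration of dropping the prompt prefix from a generate() output:
# in the real example, prompt_len would be inputs["input_ids"].shape[1]
# and output_ids would be output[0].
prompt_len = 3                       # number of prompt tokens
output_ids = [10, 11, 12, 42, 43]    # prompt ids followed by generated ids
new_ids = output_ids[prompt_len:]    # only the generated part
print(new_ids)
```

With the real tensors, the equivalent is tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).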