Llama-3.1-Minitron-4B-Depth-Chat

This is a supervised fine-tuned (SFT) chat model based on
nvidia/Llama-3.1-Minitron-4B-Depth-Base.

It was trained to improve instruction following and conversational ability, using a standard chat format compatible with the Transformers chat API. Fine-tuning was performed on a single H100 GPU and took roughly one hour.


πŸ“Œ Model Details

  • Base model: nvidia/Llama-3.1-Minitron-4B-Depth-Base
  • Fine-tuning method: Supervised Fine-Tuning (TRL)
  • Trainer: trl.SFTTrainer
  • Dataset: OpenHermes-2.5 (a quality-filtered subset of 64k samples)
  • Precision: bf16
  • Context length: up to 4096 tokens
  • Chat format: ChatML-style (<|im_start|>role\ncontent<|im_end|>)

This model aims to better follow human instructions and generate more natural responses than the base model.
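The ChatML-style format listed above can be illustrated with a minimal sketch. This is an assumption-laden illustration only (role names "system"/"user"/"assistant" are assumed); the tokenizer's built-in chat template, via apply_chat_template, is the authoritative source.

```python
# Sketch of the ChatML-style layout: <|im_start|>role\ncontent<|im_end|>
# per message, with an open assistant turn appended for generation.
def to_chatml(messages):
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi!"},
]) + "\n<|im_start|>assistant\n"
print(prompt)
```

In practice you should not build this string by hand; use tokenizer.apply_chat_template as shown in the usage example below, which applies the template stored with the model.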


Example Usage

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("jonny-vr/Llama-3.1-Minitron-4B-Depth-Chat")
model = AutoModelForCausalLM.from_pretrained(
    "jonny-vr/Llama-3.1-Minitron-4B-Depth-Chat",
    torch_dtype=torch.bfloat16,
).to("cuda")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain how to do SFT training."}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
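Note that model.generate returns the prompt tokens followed by the newly generated ones, so decoding output[0] directly also reprints the prompt. To show only the assistant's reply, slice off the first inputs["input_ids"].shape[1] tokens before decoding. The slicing idiom, sketched here on plain lists with made-up ids:

```python
# Toy illustration of dropping the prompt prefix from a generate() output:
# in the real example, prompt_len would be inputs["input_ids"].shape[1]
# and output_ids would be output[0].
prompt_len = 3                       # number of prompt tokens
output_ids = [10, 11, 12, 42, 43]    # prompt ids followed by generated ids
new_ids = output_ids[prompt_len:]    # only the generated part
print(new_ids)
```

With the real tensors, the equivalent is tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True).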