# Llama-3.1-Minitron-4B-Depth-Chat
This is a supervised fine-tuned (SFT) chat model based on nvidia/Llama-3.1-Minitron-4B-Depth-Base.
It was trained to improve instruction following and conversational ability, using a standard chat format compatible with the Transformers chat API. Fine-tuning was conducted on a single H100 GPU and took around one hour.
## Model Details
- Base model: nvidia/Llama-3.1-Minitron-4B-Depth-Base
- Fine-tuning method: Supervised Fine-Tuning (TRL)
- Trainer: `trl.SFTTrainer`
- Dataset: OpenHermes-2.5 (64k quality-filtered samples)
- Precision: bf16
- Context length: up to 4096 tokens
- Chat format: ChatML-style (`<|im_start|>role\ncontent<|im_end|>`)
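As a minimal sketch of the ChatML-style format listed above, the snippet below renders a message list by hand. This is only an illustration; the tokenizer's own chat template (via `apply_chat_template`) is the authoritative formatter.

```python
def render_chatml(messages):
    """Render a list of {role, content} messages in ChatML style."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>")
    # Open an assistant turn so the model continues as the assistant.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)

prompt = render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```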
This model aims to better follow human instructions and generate more natural responses than the base model.
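The training setup above can be sketched with TRL's `SFTTrainer`. This is a hypothetical configuration sketch, not the published training script: the actual hyperparameters were not released, the 64k quality-filtering step is omitted, and exact argument names vary across TRL versions. The dataset ID `teknium/OpenHermes-2.5` is the common Hub location of OpenHermes-2.5.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Illustrative hyperparameters only; the released card does not publish them.
dataset = load_dataset("teknium/OpenHermes-2.5", split="train")

config = SFTConfig(
    output_dir="minitron-4b-depth-chat-sft",
    max_seq_length=4096,              # context length stated in Model Details
    bf16=True,                        # precision stated in Model Details
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

trainer = SFTTrainer(
    model="nvidia/Llama-3.1-Minitron-4B-Depth-Base",
    args=config,
    train_dataset=dataset,
)
trainer.train()
```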
## Example Usage
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("jonny-vr/Llama-3.1-Minitron-4B-Depth-Chat")
model = AutoModelForCausalLM.from_pretrained(
    "jonny-vr/Llama-3.1-Minitron-4B-Depth-Chat",
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place the model on GPU if one is available
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain how to do SFT training."},
]

# Render the messages with the model's chat template, ending with an
# open assistant turn (add_generation_prompt=True).
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
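Because `skip_special_tokens=True` strips the ChatML markers, you may instead decode with special tokens kept and pull out only the final assistant turn. A minimal string-handling sketch (it assumes the ChatML markers are present in the decoded text):

```python
def last_assistant_reply(decoded: str) -> str:
    """Extract the content of the final assistant turn from ChatML text."""
    marker = "<|im_start|>assistant\n"
    # Take everything after the last assistant marker...
    reply = decoded.rsplit(marker, 1)[-1]
    # ...and stop at the closing tag, if present.
    return reply.split("<|im_end|>", 1)[0].strip()

sample = (
    "<|im_start|>user\nHi<|im_end|>\n"
    "<|im_start|>assistant\nHello there!<|im_end|>"
)
print(last_assistant_reply(sample))  # Hello there!
```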