GSM8K Full Parameter Fine-tuned Llama 3.2 3B Instruct

Llama 3.2 3B Instruct fine-tuned on the GSM8K dataset with full-parameter fine-tuning to improve mathematical reasoning.

Model Details

  • Base Model: meta-llama/Llama-3.2-3B-Instruct
  • Training Method: Full Parameter Fine-tuning (All weights updated)
  • Training Dataset: GSM8K
  • Training Date: 2026-02-04
  • Model Type: Causal Language Model
  • Framework: Transformers + TRL (SFTTrainer)

Training Configuration

Full Parameter Training

  • Method: All model parameters updated (not LoRA)
  • Total Parameters: ~3B (all trainable)
  • Training Samples: 7,473
  • Epochs: 3
  • Batch Size: 2
  • Gradient Accumulation Steps: 4
  • Effective Batch Size: 8
  • Learning Rate: 2e-5
  • Optimizer: AdamW 8-bit
  • Scheduler: Cosine
  • Warmup Ratio: 0.0
  • Max Length: 512
  • Dtype: bfloat16
  • Gradient Checkpointing: Enabled
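The hyperparameters above roughly correspond to a TRL `SFTConfig` like the following. This is a sketch of the implied configuration, not the original training script; argument names can differ slightly between TRL releases.

```python
from trl import SFTConfig

# Sketch of the configuration implied by the table above;
# exact field names may vary between TRL versions.
config = SFTConfig(
    output_dir="./gsm8k_llama3_full_finetune",
    num_train_epochs=3,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,   # effective batch size: 2 * 4 = 8
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.0,
    optim="adamw_8bit",              # 8-bit AdamW via bitsandbytes
    bf16=True,
    gradient_checkpointing=True,
    max_length=512,                  # called max_seq_length in older TRL releases
)
```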

Performance

  • GSM8K Test Accuracy: 40.00% (20/50 samples)
  • Training Time: ~44 minutes
  • Hardware: NVIDIA GPU (CUDA)

Usage

Basic Inference

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "kmseong/Llama3.2-3B-GSM8K-FullParam-SFT-Model",
    torch_dtype=torch.bfloat16,
    device_map="auto"  # trust_remote_code is not needed for the standard Llama architecture
)
tokenizer = AutoTokenizer.from_pretrained("kmseong/Llama3.2-3B-GSM8K-FullParam-SFT-Model")

# Prepare prompt
question = "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"

prompt = f"""Solve this math problem step by step:

{question}

Provide your final answer in the format:
[reasoning steps]
####
[final answer (just the number)]"""

# Generate response
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False  # greedy decoding; set do_sample=True to enable temperature/top_p sampling
)

response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[-1]:], skip_special_tokens=True)
print(response)

Extract Answer

import re

def extract_answer(text: str) -> str | None:
    """Extract the numerical answer from a model response."""
    if '####' in text:
        answer_part = text.split('####')[-1].strip()
        numbers = re.findall(r'-?\d+\.?\d*', answer_part)
        if numbers:
            return numbers[0]

    # Fall back to the last number found anywhere in the text
    numbers = re.findall(r'-?\d+\.?\d*', text)
    if numbers:
        return numbers[-1]
    return None

# Use after generation
answer = extract_answer(response)
print(f"Final Answer: {answer}")

Batch Inference

from datasets import load_dataset
from tqdm import tqdm

# Load GSM8K test set
test_dataset = load_dataset('openai/gsm8k', 'main', split='test')

correct = 0
total = 0

def create_prompt(question: str) -> str:
    """Build the same prompt used in Basic Inference above."""
    return (
        "Solve this math problem step by step:\n\n"
        f"{question}\n\n"
        "Provide your final answer in the format:\n"
        "[reasoning steps]\n"
        "####\n"
        "[final answer (just the number)]"
    )

for sample in tqdm(test_dataset.select(range(100))):
    question = sample['question']
    expected = extract_answer(sample['answer'])

    # Generate
    prompt = create_prompt(question)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[-1]:], skip_special_tokens=True)

    predicted = extract_answer(response)

    if predicted is not None and expected is not None and float(predicted) == float(expected):
        correct += 1
    total += 1

accuracy = (correct / total) * 100
print(f"Accuracy: {accuracy:.2f}%")

Training Details

Dataset Format

The model was trained on GSM8K with the following format:

Question: [math problem]
Answer: [step-by-step solution]
####
[final numerical answer]
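A raw GSM8K sample can be mapped into this layout with a small helper. The function name `format_sample` is hypothetical; the actual preprocessing code is not shown in this card. GSM8K stores the reasoning and the final answer together in the `answer` field, separated by `####`.

```python
def format_sample(question: str, answer: str) -> str:
    """Render one GSM8K sample in the Question/Answer training layout.

    Splits the raw GSM8K answer at its '####' marker so the reasoning
    and the final numerical answer land on separate lines.
    """
    reasoning, _, final = answer.rpartition("####")
    return (
        f"Question: {question}\n"
        f"Answer: {reasoning.strip()}\n"
        f"####\n"
        f"{final.strip()}"
    )

example = format_sample(
    "Natalia sold clips to 48 friends in April and half as many in May. "
    "How many clips did she sell altogether?",
    "April: 48. May: 48 / 2 = 24. Total: 48 + 24 = 72. #### 72",
)
print(example)
```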

Training Script

python finetune_gsm8k_full_params.py \
    --num_train_samples 7473 \
    --num_eval_samples 0 \
    --batch_size 2 \
    --epochs 3 \
    --learning_rate 2e-5 \
    --max_length 512 \
    --output_dir ./gsm8k_llama3_full_finetune \
    --cache_dir ./cache \
    --model_path meta-llama/Llama-3.2-3B-Instruct \
    --dtype bfloat16

Model Architecture

This is a full parameter fine-tuned model, meaning:

  • ✅ All 3B parameters were updated during training
  • ✅ No adapter/LoRA - this is the complete model
  • ✅ Can be used directly without PEFT library
  • ✅ Better performance than LoRA when training data is sufficient
  • ❌ Larger file size (~6GB)
  • ❌ Longer training time

Differences from LoRA

Aspect            Full Parameter              LoRA
Trainable Params  100% (3B)                   ~0.1% (~3M)
Training Speed    Slower                      Faster
Memory Usage      Higher                      Lower
Model Size        ~6GB                        Base + ~10MB
Performance       Better with enough data     Good with limited data
Use Case          Production, large datasets  Research, quick experiments

Limitations

  • Trained only on GSM8K (grade school math problems)
  • May not generalize well to other mathematical domains
  • Performance degrades on non-math tasks
  • Requires GPU for inference (recommended: 16GB+ VRAM)

Evaluation Results

GSM8K Test Set (50 samples)

  • ✅ Correct: 20
  • ❌ Incorrect: 30
  • 📊 Accuracy: 40.00%
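With only 50 evaluation samples, the 40% figure carries substantial sampling uncertainty. A quick Wilson score interval, added here as a sanity check rather than part of the original evaluation, illustrates the spread:

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion (z=1.96 -> 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

lo, hi = wilson_interval(20, 50)
print(f"95% CI for 20/50: {lo:.1%} - {hi:.1%}")  # roughly 28% - 54%
```

In other words, the true GSM8K accuracy could plausibly lie anywhere in that range; a larger evaluation set would tighten the estimate.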

Example Predictions

Correct Example:

Question: Janet's ducks lay 16 eggs per day. She eats three for breakfast every morning and bakes muffins for her friends every day with four. She sells the remainder at the farmers' market daily for $2 per fresh duck egg. How much in dollars does she make every day at the farmers' market?
Expected: 18
Predicted: 18 ✅

Incorrect Example:

Question: A robe takes 2 bolts of blue fiber and half that much white fiber. How many bolts in total does it take?
Expected: 3
Predicted: 267 ❌

Citation

@misc{gsm8k-fullparam-llama32-3b,
  title={GSM8K Full Parameter Fine-tuned Llama 3.2 3B Instruct},
  author={Kim, Min-Seong},
  year={2026},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/kmseong/Llama3.2-3B-GSM8K-FullParam-SFT-Model}}
}

License

This model is built on Llama 3.2 3B Instruct and follows the Llama 3.2 Community License.

Acknowledgments

  • Base Model: Meta AI's Llama 3.2 3B Instruct
  • Dataset: OpenAI's GSM8K
  • Framework: HuggingFace Transformers & TRL

Contact

For questions or issues, please open an issue on the model repository.


Note: This is a full parameter fine-tuned model. Unlike LoRA models, all weights have been updated and the model can be used directly without any adapter libraries.
