LFM2.5-230M-Synth

A fine-tune of LiquidAI/LFM2.5-230M on a large synthetic reasoning dataset, with chain-of-thought (thinking) traces preserved.

Model Details

| Property | Value |
|---|---|
| Base model | LiquidAI/LFM2.5-230M (Lfm2ForCausalLM) |
| Architecture | LFM2, hybrid conv+attention (8 conv + 6 full attention layers, 14 total) |
| Parameters | ~229M (tied embeddings) |
| Hidden size | 1024 |
| Attention heads | 16 (8 KV heads, GQA) |
| Vocab size | 64,402 |
| Max context | 128K (trained at 2048) |
| Precision | bfloat16 |
| Model size | 457 MB (safetensors) |
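As a quick consistency check on the figures above, the checkpoint size follows from the parameter count and the 2-byte width of bfloat16. The sketch below assumes roughly 229M stored parameters and that the per-head dimension is the hidden size divided by the number of attention heads; the card states neither the exact count nor the head dimension, so treat both as assumptions. Variable names are illustrative.

```python
# Back-of-the-envelope checks on the model card figures.
PARAMS = 229e6           # approximate parameter count from the card
BYTES_PER_PARAM = 2      # bfloat16 stores each weight in 2 bytes

size_mb = PARAMS * BYTES_PER_PARAM / 1e6
print(f"Estimated weight size: {size_mb:.0f} MB")

HIDDEN_SIZE = 1024
NUM_HEADS = 16
NUM_KV_HEADS = 8

# Assumption: head_dim = hidden_size / num_heads (not stated in the card).
head_dim = HIDDEN_SIZE // NUM_HEADS
# With grouped-query attention, each KV head serves this many query heads.
queries_per_kv = NUM_HEADS // NUM_KV_HEADS
print(f"head_dim={head_dim}, query heads per KV head={queries_per_kv}")
```

The estimate lands within about 1 MB of the reported 457 MB, which is what one would expect given that the parameter count is rounded.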

Training Details

| Setting | Value |
|---|---|
| Dataset | Synthetic reasoning mix (1.63M conversations, multi-turn with chain-of-thought) |
| Dataset size | ~6.23 GiB (Arrow) |
| Training tokens | ~2.88B (22,000 steps × effective batch 64 × seq 2048) |
| Epochs | ~0.86 (partial epoch at checkpoint-22000) |
| Effective batch size | 64 (per-device 8 × grad-accum 8) |
| Learning rate | 5e-5, cosine schedule, 2% warmup |
| Optimizer | AdamW (PyTorch fused) |
| Sequence length | 2048 |
| Hardware | NVIDIA L40 48GB |
| Precision | bf16 + torch.compile |
| Framework | HuggingFace Transformers + TRL SFTTrainer |
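To make the learning-rate setting concrete, here is a minimal sketch of a cosine schedule with 2% linear warmup and a peak of 5e-5. It assumes the schedule spans the planned 36,500 steps, that warmup is linear from zero, and that the decay runs to zero, which is the common form of this schedule; the card does not confirm these details, and `lr_at` is a hypothetical helper rather than the training code.

```python
import math

PEAK_LR = 5e-5
TOTAL_STEPS = 36_500                        # assumed schedule horizon (planned run length)
WARMUP_STEPS = round(0.02 * TOTAL_STEPS)    # 2% warmup

def lr_at(step: int) -> float:
    """Linear warmup from zero, then cosine decay to zero (assumed form)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, WARMUP_STEPS, 22_000, TOTAL_STEPS):
    print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under these assumptions the checkpoint at step 22,000 sits partway down the decay, so the learning rate there is well below its peak but not yet near zero.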

Training Results

| Step | Train Loss | Eval Loss |
|---|---|---|
| 500 | 2.71 | 1.8661 |
| 3,000 | 1.75 | 1.7121 |
| 5,500 | 1.69 | 1.6835 |
| 8,000 | 1.67 | 1.6687 |
| 10,500 | 1.66 | 1.6602 |
| 13,000 | 1.65 | 1.6542 |
| 15,500 | 1.65 | 1.6501 |
| 18,000 | 1.65 | 1.6478 |
| 20,500 | 1.65 | 1.6460 |
| 22,000 | 1.655 | 1.6457 |

Best eval loss: 1.6457 at step 22,000 (still improving at checkpoint time).

Eval loss fell from 1.866 at step 500 to 1.646 at step 22,000, a relative reduction of about 12% over roughly 2.88B training tokens. Gains had slowed by the checkpoint (0.0003 over the final 1,500 steps) but had not reversed.
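The token count and the relative reduction can be reproduced from the numbers reported above. The sketch below copies the eval-loss values from the results table and treats the token figure as an upper bound, assuming every training sequence fills all 2048 positions, which the card does not state.

```python
# (step, eval loss) pairs copied from the results table.
eval_loss = [
    (500, 1.8661), (3_000, 1.7121), (5_500, 1.6835), (8_000, 1.6687),
    (10_500, 1.6602), (13_000, 1.6542), (15_500, 1.6501), (18_000, 1.6478),
    (20_500, 1.6460), (22_000, 1.6457),
]

# Upper bound on tokens seen: steps x effective batch x sequence length.
tokens_seen = 22_000 * 64 * 2048
print(f"tokens seen: {tokens_seen / 1e9:.2f}B")

first, last = eval_loss[0][1], eval_loss[-1][1]
rel_reduction = (first - last) / first
print(f"relative reduction: {rel_reduction:.1%}")

# Eval-loss improvement per 1,000 steps between consecutive evaluations.
for (s0, l0), (s1, l1) in zip(eval_loss, eval_loss[1:]):
    rate = (l0 - l1) / (s1 - s0) * 1000
    print(f"{s0:>6} -> {s1:>6}: {rate:.4f} per 1k steps")
```

The per-interval rates make the flattening of the curve visible: each interval improves less than the one before it.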

Chat Template

The model uses the Liquid/LFM2 chat template with preserve_thinking=True. Before tokenization, reasoning traces in the dataset's reasoning_content field are mapped to the model's native thinking field.
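The field mapping described above might look like the following. This is a sketch only: the actual preprocessing code is not shown in this card, `map_reasoning_to_thinking` is a hypothetical name, and it assumes the template reads a message key literally called `thinking` on assistant turns.

```python
from copy import deepcopy

def map_reasoning_to_thinking(messages):
    """Return a copy of `messages` with `reasoning_content` renamed to `thinking`.

    Only assistant turns are changed; other messages pass through unchanged.
    """
    mapped = []
    for message in messages:
        message = deepcopy(message)  # leave the caller's data untouched
        if message.get("role") == "assistant" and "reasoning_content" in message:
            message["thinking"] = message.pop("reasoning_content")
        mapped.append(message)
    return mapped

example = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "reasoning_content": "2 + 2 = 4.", "content": "4"},
]
mapped_example = map_reasoning_to_thinking(example)
print(mapped_example[1])
```

Copying each message keeps the original dataset rows intact, which helps when the same rows are reused with a different template.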

from transformers import AutoTokenizer, AutoModelForCausalLM

# Repo id as shown on the model page.
tokenizer = AutoTokenizer.from_pretrained("mkurman/LFM2.5-230M-SYNTH", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("mkurman/LFM2.5-230M-SYNTH", trust_remote_code=True, dtype="bfloat16")

messages = [{"role": "user", "content": "Explain quantum entanglement simply."}]

# preserve_thinking=True so the model generates reasoning before its answer
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    preserve_thinking=True,
)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
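Since the model emits its reasoning before the answer, downstream code often needs to separate the two. The helper below is a sketch that assumes the reasoning is wrapped in `<think>...</think>` tags; that delimiter is an assumption not confirmed by this card, so check the tokenizer's chat template for the markers actually used. If those markers are registered as special tokens, decoding with skip_special_tokens=True may strip them, in which case decode with skip_special_tokens=False first. `split_thinking` is a hypothetical name.

```python
import re

# Assumption: reasoning is delimited by <think>...</think>.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_thinking(generated: str):
    """Return (thinking, answer) extracted from decoded model output."""
    match = THINK_PATTERN.search(generated)
    if match is None:
        # No reasoning block found: treat the whole text as the answer.
        return "", generated.strip()
    thinking = match.group(1).strip()
    answer = generated[match.end():].strip()
    return thinking, answer

sample = (
    "<think>Entangled particles share one joint state.</think>"
    "Measuring one particle tells you about the other."
)
thinking, answer = split_thinking(sample)
print("thinking:", thinking)
print("answer:", answer)
```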

Intended Use

This is a research model fine-tuned on synthetic reasoning data. It is intended for:

  • Experimentation with small-model reasoning capabilities
  • Chain-of-thought / thinking-trace generation
  • Evaluation of synthetic data quality at small scale (230M)
  • On-device or edge reasoning model prototyping

Limitations

  • Partial training: This checkpoint is at ~0.86 epochs (step 22,000 / 36,500 planned). The full run continues.
  • Small model: 230M parameters — not suitable for production deployment without further evaluation.
  • Synthetic data only: Trained exclusively on synthetic reasoning traces; may exhibit distribution biases from the data generation pipeline.
  • Limited context at training: Trained at seq=2048 despite the architecture supporting 128K. Long-context behavior is untested.
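The epoch figure and the step counts in the first limitation measure different things, which a short calculation makes explicit. The sketch assumes one conversation per training sample (no sequence packing), which the card does not state; under that assumption the planned 36,500 steps correspond to more than one pass over the data.

```python
CONVERSATIONS = 1_630_000   # dataset size from the card (1.63M)
EFFECTIVE_BATCH = 8 * 8     # per-device batch 8 x gradient accumulation 8
CHECKPOINT_STEP = 22_000
PLANNED_STEPS = 36_500

# Assumption: one conversation per training sample (no packing).
steps_per_epoch = CONVERSATIONS / EFFECTIVE_BATCH

epochs_at_checkpoint = CHECKPOINT_STEP / steps_per_epoch
epochs_planned = PLANNED_STEPS / steps_per_epoch
fraction_of_run = CHECKPOINT_STEP / PLANNED_STEPS

print(f"steps per epoch:      {steps_per_epoch:,.0f}")
print(f"epochs at checkpoint: {epochs_at_checkpoint:.2f}")
print(f"epochs when finished: {epochs_planned:.2f}")
print(f"fraction of run done: {fraction_of_run:.1%}")
```

So the checkpoint covers about 0.86 of one epoch while being roughly 60% of the way through the planned steps.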

Checkpoint Info

This is checkpoint-22000 from a 36,500-step training run. The checkpoint includes:

  • model.safetensors — 457 MB (weights only, no optimizer state)
  • Full tokenizer files + chat template
  • config.json with architecture details

Citation

If you use this model, please cite the base model:

@misc{liquid_lfm2,
  title={LFM2: Liquid Foundation Models},
  author={Liquid AI},
  year={2025},
  url={https://huggingface.co/LiquidAI/LFM2.5-230M}
}