LFM2.5-230M-Synth
Fine-tuned LiquidAI/LFM2.5-230M on a large synthetic reasoning dataset with preserved chain-of-thought (thinking) traces.
Model Details
| Property | Value |
|---|---|
| Base model | LiquidAI/LFM2.5-230M (Lfm2ForCausalLM) |
| Architecture | LFM2 — hybrid conv+attention (8 conv + 6 full attention layers, 14 total) |
| Parameters | ~229M (tied embeddings) |
| Hidden size | 1024 |
| Attention heads | 16 (8 KV heads, GQA) |
| Vocab size | 64,402 |
| Max context | 128K (trained at 2048) |
| Precision | bfloat16 |
| Model size | 457 MB (safetensors) |
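The parameter count, precision, and file size in the table can be cross-checked with simple arithmetic. The sketch below is illustrative only: it treats "~229M" as exactly 229,000,000, uses decimal megabytes, and assumes the per-head dimension is the hidden size divided by the number of attention heads, which the table does not state.

```python
# Cross-check of the Model Details table (approximate figures).
PARAMS = 229_000_000      # "~229M" from the table, rounded
BYTES_PER_PARAM = 2       # bfloat16 stores each value in 2 bytes
HIDDEN_SIZE = 1024
ATTENTION_HEADS = 16
KV_HEADS = 8


def weights_size_mb(params: int, bytes_per_param: int) -> float:
    """Raw weight size in decimal megabytes (1 MB = 10**6 bytes)."""
    return params * bytes_per_param / 1e6


size_mb = weights_size_mb(PARAMS, BYTES_PER_PARAM)

# Assumption: heads split the hidden size evenly (not stated in the table).
head_dim = HIDDEN_SIZE // ATTENTION_HEADS

# With grouped-query attention, several query heads share one KV head.
queries_per_kv_head = ATTENTION_HEADS // KV_HEADS

print(f"estimated weights: {size_mb:.0f} MB")
print(f"head_dim={head_dim}, query heads per KV head={queries_per_kv_head}")
```

The estimate lands within about 1 MB of the listed 457 MB, which is consistent with the parameter count being rounded.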
Training Details
| Setting | Value |
|---|---|
| Dataset | Synthetic reasoning mix (1.63M conversations, multi-turn with chain-of-thought) |
| Dataset size | ~6.23 GiB (Arrow) |
| Training tokens | ~2.88B (22,000 steps × effective batch 64 × seq 2048) |
| Epochs | ~0.86 (partial epoch at checkpoint-22000) |
| Effective batch size | 64 (per-device 8 × grad-accum 8) |
| Learning rate | 5e-5, cosine schedule, 2% warmup |
| Optimizer | AdamW (PyTorch fused) |
| Sequence length | 2048 |
| Hardware | NVIDIA L40 48GB |
| Precision | bf16 + torch.compile |
| Framework | HuggingFace Transformers + TRL SFTTrainer |
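The token budget and learning-rate settings above can be reproduced with a few lines of arithmetic. This is a minimal sketch, not the training script: it assumes every sequence is filled to 2048 tokens (so the token count is an upper bound), a linear warmup, a cosine decay to zero, and a schedule horizon equal to the planned 36,500 steps. `lr_at` is a hypothetical helper written for this illustration.

```python
import math

STEPS_DONE = 22_000       # checkpoint step
PLANNED_STEPS = 36_500    # planned length of the full run
PER_DEVICE_BATCH = 8
GRAD_ACCUM = 8
SEQ_LEN = 2048
PEAK_LR = 5e-5
WARMUP_RATIO = 0.02

# Effective batch = per-device batch x gradient accumulation (one GPU listed).
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM

# Upper bound on tokens seen: assumes every sequence is full length.
tokens_upper_bound = STEPS_DONE * effective_batch * SEQ_LEN


def lr_at(step: int, total_steps: int = PLANNED_STEPS) -> float:
    """Learning rate at a step: linear warmup, then cosine decay to zero.

    Assumption: this mirrors the common warmup + cosine shape; the exact
    scheduler used for the run is not shown in the model card.
    """
    warmup_steps = round(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"effective batch: {effective_batch}")
print(f"tokens (upper bound): {tokens_upper_bound / 1e9:.2f}B")
print(f"lr at checkpoint step: {lr_at(STEPS_DONE):.2e}")
```

Under these assumptions the warmup covers the first 730 steps, and the learning rate at step 22,000 is already well below its peak.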
Training Results
| Step | Train Loss | Eval Loss |
|---|---|---|
| 500 | 2.71 | 1.8661 |
| 3,000 | 1.75 | 1.7121 |
| 5,500 | 1.69 | 1.6835 |
| 8,000 | 1.67 | 1.6687 |
| 10,500 | 1.66 | 1.6602 |
| 13,000 | 1.65 | 1.6542 |
| 15,500 | 1.65 | 1.6501 |
| 18,000 | 1.65 | 1.6478 |
| 20,500 | 1.65 | 1.6460 |
| 22,000 | 1.655 | 1.6457 |
Best eval loss: 1.6457 at step 22,000 (still improving at checkpoint time).
Eval loss decreased from 1.866 (step 500) to 1.646 (step 22,000) over roughly 2.88B training tokens, a relative reduction of about 12%, and was still trending downward at the checkpoint boundary.
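The relative reduction can be recomputed directly from the eval-loss column of the table. The sketch below uses only the values listed there; the variable names are illustrative.

```python
# (step, eval loss) pairs copied from the Training Results table.
eval_losses = [
    (500, 1.8661), (3_000, 1.7121), (5_500, 1.6835), (8_000, 1.6687),
    (10_500, 1.6602), (13_000, 1.6542), (15_500, 1.6501),
    (18_000, 1.6478), (20_500, 1.6460), (22_000, 1.6457),
]

first_loss = eval_losses[0][1]
last_loss = eval_losses[-1][1]

# Relative reduction between the first and last logged evaluations.
relative_reduction = (first_loss - last_loss) / first_loss

# Change between consecutive evaluations; negative means the loss went down.
deltas = [b[1] - a[1] for a, b in zip(eval_losses, eval_losses[1:])]

print(f"relative reduction: {relative_reduction:.1%}")
print(f"loss decreased at every logged step: {all(d < 0 for d in deltas)}")
```

Every logged evaluation is lower than the one before it, although the size of each improvement shrinks as training proceeds.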
Chat Template
Uses the Liquid/LFM2 chat template with preserve_thinking=True. Reasoning traces from the dataset's reasoning_content field are mapped to the model's native thinking field before tokenization.
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the repository this card describes
tokenizer = AutoTokenizer.from_pretrained("mkurman/LFM2.5-230M-SYNTH", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("mkurman/LFM2.5-230M-SYNTH", trust_remote_code=True, dtype="bfloat16")

messages = [{"role": "user", "content": "Explain quantum entanglement simply."}]

# preserve_thinking=True so the model generates reasoning before its answer
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    preserve_thinking=True,
)

inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=512)

# Decodes the full sequence (prompt plus generated text)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
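The field mapping described above (dataset `reasoning_content` to the template's `thinking` field) might look something like the following. This is a hedged sketch, not the author's preprocessing code: the key names come from the description, while the per-message dictionary layout and the helper name `map_reasoning_to_thinking` are assumptions made for illustration.

```python
from copy import deepcopy


def map_reasoning_to_thinking(messages: list[dict]) -> list[dict]:
    """Return a copy of `messages` where each assistant turn's
    'reasoning_content' value is moved to a 'thinking' key.

    Assumption: reasoning is stored per message as a plain string.
    """
    mapped = deepcopy(messages)  # leave the caller's data untouched
    for message in mapped:
        if message.get("role") == "assistant" and "reasoning_content" in message:
            message["thinking"] = message.pop("reasoning_content")
    return mapped


# Hypothetical dataset row, for illustration only.
example = [
    {"role": "user", "content": "What is 2 + 2?"},
    {
        "role": "assistant",
        "reasoning_content": "Add the two numbers.",
        "content": "4",
    },
]

mapped = map_reasoning_to_thinking(example)
print(mapped[1])
```

The mapped conversation could then be passed to `tokenizer.apply_chat_template` with `preserve_thinking=True`, as in the snippet above.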
Intended Use
This is a research model fine-tuned on synthetic reasoning data. It is intended for:
- Experimentation with small-model reasoning capabilities
- Chain-of-thought / thinking-trace generation
- Evaluation of synthetic data quality at small scale (230M)
- On-device or edge reasoning model prototyping
Limitations
- Partial training: This checkpoint is at ~0.86 epochs (step 22,000 / 36,500 planned). The full run continues.
- Small model: 230M parameters — not suitable for production deployment without further evaluation.
- Synthetic data only: Trained exclusively on synthetic reasoning traces; may exhibit distribution biases from the data generation pipeline.
- Limited context at training: Trained at seq=2048 despite the architecture supporting 128K. Long-context behavior is untested.
Checkpoint Info
This is checkpoint-22000 from a 36,500-step training run. The checkpoint includes:
- model.safetensors: 457 MB (weights only, no optimizer state)
- Full tokenizer files + chat template
- config.json with architecture details
Citation
If you use this model, please cite the base model:
@misc{liquid_lfm2,
title={LFM2: Liquid Foundation Models},
author={Liquid AI},
year={2025},
url={https://huggingface.co/LiquidAI/LFM2.5-230M}
}