mmBERT: a modern multilingual encoder
mmBERT is trained on 3T tokens spanning over 1800 languages, achieving state-of-the-art benchmark scores and exceptional low-resource performance.
This repository contains the raw training checkpoints for the mmBERT models. Each model has three subfolders, one per training phase: decay, ext, and pretrain.
These checkpoints work with Composer and contain all state needed to resume pre-training. Please see the ModernBERT repository for usage details.
If you use mmBERT, please cite:

@misc{marone2025mmbertmodernmultilingualencoder,
title={mmBERT: A Modern Multilingual Encoder with Annealed Language Learning},
author={Marc Marone and Orion Weller and William Fleshman and Eugene Yang and Dawn Lawrie and Benjamin Van Durme},
year={2025},
eprint={2509.06888},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.06888},
}