Scaling Laws for Code: Every Programming Language Matters Paper • 2512.13472 • Published Dec 15, 2025 • 12
Running • 16 • The Jagged AI Frontier is a Data Frontier • Why AI capabilities are shaped by data availability
Article Saving Memory Using Padding-Free Transformer Layers during Finetuning Jun 11, 2024 • 21
Article Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models Dec 15, 2025 • 106
Article Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand Dec 4, 2025 • 63
Fantastic Pretraining Optimizers and Where to Find Them Paper • 2509.02046 • Published Sep 2, 2025 • 13
Running on CPU Upgrade • Featured • 2.89k • The Smol Training Playbook • The secrets to building world-class LLMs
Running • 77 • Unlocking On-Policy Distillation for Any Model Family • Apply on-policy distillation to any model family
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published Oct 6, 2025 • 505
ServiceNow-AI/Apriel-1.5-15b-Thinker Image-Text-to-Text • 15B • Updated Oct 6, 2025 • 401 • 463