view article Article Continuous batching from first principles +1 ror, ArthurZ, mcpotato • Nov 25, 2025 • 378
view article Article Llasa Goes RL: Training LLaSA with GRPO for Improved Prosody and Expressiveness Steveeeeeeen • Nov 5, 2025 • 12
view article Article Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand qgallouedec • Dec 4, 2025 • 69
view article Article KV Caching Explained: Optimizing Transformer Inference Efficiency not-lain • Jan 30, 2025 • 326
view article Article We Got Claude to Build CUDA Kernels and teach open models! +2 burtenshaw, evalstate, merve, pcuenq • Jan 28 • 156
view article Article Scaling Real-Time Voice Agents with Cache-Aware Streaming ASR nvidia • Jan 5 • 86
An Anatomy of Vision-Language-Action Models: From Modules to Milestones and Challenges Paper • 2512.11362 • Published Dec 12, 2025 • 23
view article Article Tiny Agents in Python: a MCP-powered agent in ~70 lines of code +2 celinah, julien-c, Wauplin, evalstate • May 23, 2025 • 172
view article Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face +3 abidlabs, znation, nouamanetazi, sasha, qgallouedec • Jul 29, 2025 • 223
view article Article SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 773