trl internal testing

company


AI & ML interests

Internal testing artifact management for the TRL library

Recent Activity

qgallouedec new activity 12 days ago

trl-internal-testing/tiny-DbrxForCausalLM: This was causing test failures due to stricter config typing


qgallouedec

updated a model 4 days ago

trl-internal-testing/tiny-Gemma4ForConditionalGeneration

Image-Text-to-Text • 13.9M • Updated 4 days ago • 1.38k

qgallouedec

published a model 4 days ago

trl-internal-testing/tiny-Gemma4ForConditionalGeneration

Image-Text-to-Text • 13.9M • Updated 4 days ago • 1.38k

sergiopaniego

posted an update 5 days ago

Post

2581

Gemma 4 💎 is here and it’s strong!

to celebrate, we’re rolling out in TRL:

> support for multimodal tool responses for environments (OpenEnv)
> an example to train it in CARLA for autonomous driving with image-based tool calls

go check it out 🏎️🏎️

blog: https://huggingface.co/blog/gemma4
script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/carla_vlm_gemma.py

sergiopaniego

posted an update 7 days ago

Post

1894

TRL is officially an adult 🥳

excited to announce TRL v1.0❗️

head to the blog to see how we got here and what’s next for this post-training library, designed to keep pace with the field

https://huggingface.co/blog/trl-v1

2 replies

qgallouedec

posted an update 7 days ago

Post

2130

TRL v1.0 is out!

Hugging Face's TRL library is downloaded 3 million times a month. Over 130k models trained with it are public on the Hub, and major projects like @unsloth and @axolotl-ai-co build directly on top of it. v1.0 is the moment we acknowledged that responsibility explicitly, with a real stability contract.

The field hasn't settled. Building stable software in a domain that keeps invalidating its own assumptions is the actual problem we're solving. The answer is a design that can absorb the next shift without breaking what people rely on.

What's in v1.0:
Deep Hugging Face integration, low infrastructure burden
What's next: asynchronous GRPO, better scaling support, and making training legible enough that agents can inspect and steer it.

pip install --upgrade trl

This was causing test failures due to stricter config typing

#2 opened 12 days ago by

tomaarsen

sergiopaniego

updated a model 20 days ago

trl-internal-testing/tiny-NemotronHForCausalLM

Text Generation • 4.22M • Updated 20 days ago • 3.49k

qgallouedec

in trl-internal-testing/tiny-Llama4ForCausalLM 21 days ago

attn_temperature_tuning should be bool

#1 opened 22 days ago by

BenjaminB
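
For context on this kind of fix, here is a minimal sketch of a strict type check on config fields. It is not the validation code used by transformers or TRL; `check_field_types` is a hypothetical helper and the integer value is made up for illustration. The point it shows is that an exact type comparison rejects an integer where a bool is expected.

```python
def check_field_types(config, expected_types):
    """Return one message per field whose value is not exactly the expected type.

    Hypothetical helper for illustration only, not a transformers or TRL API.
    """
    errors = []
    for name, expected in expected_types.items():
        if name not in config:
            continue  # missing fields are out of scope for this sketch
        actual = type(config[name])
        # Exact comparison on purpose: an int must not pass as a bool.
        if actual is not expected:
            errors.append(f"{name}: expected {expected.__name__}, got {actual.__name__}")
    return errors


# The integer below is an invented value, used only to trigger the check.
bad_config = {"attn_temperature_tuning": 4}
good_config = {"attn_temperature_tuning": True}
schema = {"attn_temperature_tuning": bool}

print(check_field_types(bad_config, schema))
print(check_field_types(good_config, schema))
```

A tiny test checkpoint with a loosely typed field can load fine under lenient parsing and start failing once typing becomes stricter, which is why fixes like this one land in the test repos.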

sergiopaniego

posted an update 25 days ago

Post

721

ICYMI, great blog by @kashif and @stas on Ulysses Sequence Parallelism: train with million-token contexts

on 4×H100s: 12x longer sequences, 3.7x throughput

learn how to integrate it with Accelerate, Transformers, and TRL ⤵️
https://huggingface.co/blog/ulysses-sp

sergiopaniego

posted an update 26 days ago

Post

412

We just released a big blog surveying 16 OSS frameworks for async RL training of LLMs!

We're building a new async GRPO trainer for TRL, and as a first step we needed to understand how the ecosystem solves this problem today.

The problem: in synchronous RL training, generation dominates wall-clock time. 32K-token rollouts on a 32B model take hours while training GPUs sit completely idle. With reasoning models and agentic RL making rollouts longer and more variable, this only gets worse.

The ecosystem converged on the same fix: separate inference and training onto different GPU pools, connect them with a rollout buffer, and sync weights asynchronously.

We compared 16 frameworks across 7 axes: orchestration, buffer design, weight sync, staleness management, partial rollouts, LoRA, and MoE support.
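
To make the pattern concrete, here is a toy sketch of the generator/trainer split with a bounded rollout buffer and a staleness check. It is not the TRL trainer or any of the surveyed frameworks: threads stand in for GPU pools, an integer counter stands in for model weights, and every name (`run_async_demo`, `max_staleness`, and so on) is an assumption made for illustration.

```python
import queue
import threading


def run_async_demo(num_rollouts=8, buffer_size=4, max_staleness=2):
    """Toy async RL loop: one thread generates rollouts, another trains on them."""
    buffer = queue.Queue(maxsize=buffer_size)  # rollout buffer between the two pools
    policy_version = [0]                       # stand-in for the latest synced weights
    lock = threading.Lock()
    trained, dropped = [], []

    def generator():
        for i in range(num_rollouts):
            with lock:
                version = policy_version[0]    # weights this rollout was sampled with
            buffer.put({"id": i, "version": version})  # blocks while the buffer is full
        buffer.put(None)                       # sentinel: generation is finished

    def trainer():
        while True:
            rollout = buffer.get()
            if rollout is None:
                break
            with lock:
                staleness = policy_version[0] - rollout["version"]
                if staleness > max_staleness:
                    dropped.append(rollout["id"])  # too far off-policy, skip it
                    continue
                trained.append(rollout["id"])
                policy_version[0] += 1         # stand-in for optimizer step + weight sync

    threads = [threading.Thread(target=generator), threading.Thread(target=trainer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return trained, dropped, policy_version[0]


trained, dropped, final_version = run_async_demo()
print(f"trained on {len(trained)} rollouts, dropped {len(dropped)}, final version {final_version}")
```

How many rollouts get dropped depends on thread timing, which is the point: the staleness policy decides what happens when generation runs ahead of training.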

This survey is step one. The async GRPO trainer for TRL is next!

https://huggingface.co/blog/async-rl-training-landscape

sergiopaniego

published a model 27 days ago

trl-internal-testing/tiny-NemotronHForCausalLM

Text Generation • 4.22M • Updated 20 days ago • 3.49k

sergiopaniego

posted an update 27 days ago

Post

391

Nemotron 3 Super by @nvidia is here! NVIDIA's hybrid Mamba2/Transformer models are now natively supported in transformers (no trust_remote_code needed)

Fine-tune them with TRL in just a few lines of code. Notebook + script included to get started right away. goooo!

- Notebook: https://colab.research.google.com/github/huggingface/trl/blob/main/examples/notebooks/sft_nemotron_3.ipynb
- Script: https://github.com/huggingface/trl/blob/main/examples/scripts/sft_nemotron_3.py
- Collection with all the models: https://huggingface.co/collections/nvidia/nvidia-nemotron-v3

sergiopaniego

posted an update about 1 month ago

Post

623

did you know you can train agentic models with RL deploying the environments on HF Spaces? 🤗

with TRL + OpenEnv, your training script connects to remote environments hosted as Spaces

want to train faster? → just add more Spaces (TRL handles the parallelization natively)

we used this to train a model to solve the trolley problem in CARLA. 2 HF Spaces running a full driving simulator, each on a T4 GPU
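
a rough sketch of the idea, assuming nothing about how TRL or OpenEnv implement it: prompts are spread round-robin over a list of environment endpoints and run concurrently. `fake_rollout`, `collect_rollouts`, and the URLs are hypothetical placeholders; a real setup would call the remote Space instead of the stand-in function.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle


def fake_rollout(env_url, prompt):
    """Stand-in for a remote environment call (hypothetical, no network access)."""
    return {"env": env_url, "prompt": prompt, "reward": float(len(prompt) % 3)}


def collect_rollouts(env_urls, prompts, rollout_fn=fake_rollout):
    """Assign prompts to environments round-robin and run them in parallel."""
    assignments = list(zip(cycle(env_urls), prompts))
    # One worker per environment, so adding an endpoint adds a concurrent rollout.
    with ThreadPoolExecutor(max_workers=len(env_urls)) as pool:
        return list(pool.map(lambda pair: rollout_fn(*pair), assignments))


# Placeholder endpoints, not real Spaces.
env_urls = ["https://example.invalid/space-a", "https://example.invalid/space-b"]
results = collect_rollouts(env_urls, ["p0", "p1", "p2", "p3"])
for item in results:
    print(item["env"], item["prompt"], item["reward"])
```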

full write-up with code and results → https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl

qgallouedec

updated a model about 1 month ago

trl-internal-testing/tiny-Qwen3_5ForConditionalGeneration

Image-Text-to-Text • 4.64M • Updated Mar 2 • 225k

qgallouedec

published a model about 1 month ago

trl-internal-testing/tiny-Qwen3_5ForConditionalGeneration

Image-Text-to-Text • 4.64M • Updated Mar 2 • 225k

sergiopaniego

posted an update about 1 month ago

Post

479

Qwen3.5 dense (smol 🤏) models just dropped

- natively multimodal
- 0.8B · 2B · 4B · 9B (+ base variants)
- 262K context extensible to 1M
- built-in thinking

fine-tune them with TRL out of the box → SFT, GRPO, DPO and more!

examples: https://huggingface.co/docs/trl/example_overview
collection: https://huggingface.co/collections/Qwen/qwen35

sergiopaniego

posted an update about 1 month ago

Post

2470

What happens when you make an LLM drive a car where physics are real and actions can't be undone?

I ported CARLA, the autonomous driving simulator, to OpenEnv and added training support via TRL + Hugging Face Spaces.

The model interacts with the simulator through tool calls (observe, brake, change lane) and learns from a reward signal.
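
As a toy illustration of that loop (not the CARLA environment or the OpenEnv API), the sketch below dispatches the three tool names to a tiny one-dimensional scenario and returns a reward at the end of the episode. The class, the distances, and the reward values are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ToyDrivingEnv:
    """Hypothetical 1-D scenario: a pedestrian stands ahead in the car's lane."""
    distance_to_pedestrian: float = 30.0  # metres
    speed: float = 10.0                   # metres travelled per tool call
    lane: int = 0
    pedestrian_lane: int = 0
    done: bool = False
    reward: float = 0.0

    def _advance(self):
        # Every tool call costs one simulator tick.
        self.distance_to_pedestrian -= self.speed
        if self.speed == 0 or self.lane != self.pedestrian_lane:
            self.done, self.reward = True, 1.0    # stopped or swerved: pedestrian avoided
        elif self.distance_to_pedestrian <= 0:
            self.done, self.reward = True, -1.0   # collision

    def _state(self):
        return {"distance": self.distance_to_pedestrian, "speed": self.speed, "lane": self.lane}

    def observe(self):
        self._advance()
        return self._state()

    def brake(self):
        self.speed = max(0.0, self.speed - 5.0)
        self._advance()
        return self._state()

    def change_lane(self):
        self.lane = 1 - self.lane
        self._advance()
        return self._state()

    def call_tool(self, name):
        tools = {"observe": self.observe, "brake": self.brake, "change_lane": self.change_lane}
        if name not in tools:
            raise ValueError(f"unknown tool: {name}")
        return tools[name]()


def run_episode(env, tool_calls):
    """Apply a sequence of tool calls and return the final reward."""
    for name in tool_calls:
        env.call_tool(name)
        if env.done:
            break
    return env.reward


print(run_episode(ToyDrivingEnv(), ["observe", "change_lane"]))
print(run_episode(ToyDrivingEnv(), ["observe", "observe", "observe"]))
```

In training, the sequence of tool calls would come from the model's generations and the returned reward would feed the RL update; here both are hard-coded.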

In 50 training steps, Qwen 0.6B learns to swerve and brake to avoid pedestrians in emergency situations.

The project supports text and vision (VLMs can see through a camera sensor), open-world driving with traffic, and multiple driving scenarios.

This builds on the carla-env project by sinatras, which originally placed LLMs inside CARLA for evaluation. We extended it with vision, new scenarios, and rubric-based rewards, and made it trainable end-to-end.

Blog: https://huggingface.co/blog/sergiopaniego/bringing-carla-to-openenv-trl/
CARLA env in OpenEnv: https://github.com/meta-pytorch/OpenEnv/tree/main/envs/carla_env
Training script: https://github.com/huggingface/trl/blob/main/examples/scripts/openenv/carla.py

albertvillanova

posted an update about 1 month ago

Post

2267

🚀 TRL v0.29.0 introduces trl-training: an agent-native training skill.

This makes the TRL CLI a structured, agent-readable capability, allowing AI agents to reliably execute training workflows such as:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Group Relative Policy Optimization (GRPO)
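
For a sense of what an agent ends up executing, here is a small sketch that assembles a TRL CLI invocation as an argument list. It is not the trl-training skill itself; `build_trl_command` is a hypothetical helper, and the model, dataset, and output directory are illustrative placeholders. Individual subcommands may need further arguments (GRPO, for instance, also needs a reward definition), so `trl <method> --help` remains the reference.

```python
import shlex


def build_trl_command(method, model, dataset, output_dir):
    """Assemble a `trl` CLI call for one training method (hypothetical helper)."""
    if method not in {"sft", "dpo", "grpo"}:
        raise ValueError(f"unsupported method: {method}")
    return [
        "trl", method,
        "--model_name_or_path", model,
        "--dataset_name", dataset,
        "--output_dir", output_dir,
    ]


# Placeholder values chosen for illustration only.
command = build_trl_command("sft", "Qwen/Qwen3-0.6B", "trl-lib/Capybara", "sft-output")
print(shlex.join(command))  # pass `command` to subprocess.run(...) to actually launch it
```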

We’re excited to see what the community builds on top of this.

If you’re working on AI agents, alignment research, or scalable RL training infrastructure: give TRL v0.29.0 a try! 🤗

The future of ML tooling is agent-native.
🔗 https://github.com/huggingface/trl/releases/tag/v0.29.0