OpenTinker: Separating Concerns in Agentic Reinforcement Learning
Abstract
OpenTinker provides a modular infrastructure for reinforcement learning of large language model agents, built from separated components and a managed execution runtime.
We introduce OpenTinker, an infrastructure for reinforcement learning (RL) of large language model (LLM) agents built around a separation of concerns across algorithm design, execution, and agent-environment interaction. Rather than relying on monolithic, end-to-end RL pipelines, OpenTinker decomposes agentic learning systems into lightweight, composable components with clearly defined abstraction boundaries. Users specify agents, environments, and interaction protocols, while inference and training are delegated to a managed execution runtime. OpenTinker introduces a centralized scheduler that manages workloads over shared resources, spanning LoRA-based and full-parameter RL, supervised fine-tuning, and inference. We further discuss design principles for extending OpenTinker to multi-agent training. Finally, we present a set of RL use cases that demonstrate the effectiveness of the framework in practical agentic learning scenarios.
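To make this separation concrete, below is a minimal, self-contained sketch of the abstraction boundary described above: the user supplies agent and environment objects behind small interfaces, while a rollout-collection routine stands in for the work the managed runtime performs. The names here (Env, Agent, Trajectory, collect_rollout) are illustrative assumptions, not OpenTinker's actual API.

```python
# Illustrative sketch only: interface and function names are assumptions,
# not OpenTinker's public API.
from dataclasses import dataclass, field
from typing import Protocol


class Env(Protocol):
    """User-defined environment: text observations in, text actions out."""
    def reset(self) -> str: ...
    def step(self, action: str) -> tuple[str, float, bool]: ...  # (next_obs, reward, done)


class Agent(Protocol):
    """User-defined policy wrapper; generation would be served by the managed runtime."""
    def act(self, observation: str) -> str: ...


@dataclass
class Trajectory:
    observations: list[str] = field(default_factory=list)
    actions: list[str] = field(default_factory=list)
    rewards: list[float] = field(default_factory=list)


def collect_rollout(agent: Agent, env: Env, max_turns: int = 8) -> Trajectory:
    """Interaction loop the execution runtime would run on the user's behalf:
    the user only provides `agent` and `env`; inference and training live elsewhere."""
    traj = Trajectory()
    obs, done = env.reset(), False
    for _ in range(max_turns):
        action = agent.act(obs)                  # in practice, a delegated inference call
        next_obs, reward, done = env.step(action)
        traj.observations.append(obs)
        traj.actions.append(action)
        traj.rewards.append(reward)
        obs = next_obs
        if done:
            break
    return traj
```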
Community
Introducing OpenTinker
A scalable RL infrastructure for LLM agents that separates what you build (agents + environments) from how it runs (training + inference)!
Composable RL-as-a-Service
No more monolithic RL pipelines. OpenTinker decomposes agentic learning into lightweight, modular components with clean abstraction boundaries. Plug in new agents, environments, and interaction protocols with minimal friction.
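As a hypothetical illustration of what plugging in a new environment and agent might look like, the toy example below wires a guessing-game environment to a stand-in policy. The task, class names, and reward logic are invented for this sketch and are not taken from OpenTinker.

```python
# Hypothetical plug-in example; in OpenTinker the agent's action would come
# from the managed LLM inference runtime rather than random sampling.
import random


class GuessNumberEnv:
    """Toy multi-turn environment: the agent must guess a hidden integer."""

    def reset(self) -> str:
        self.target = random.randint(1, 10)
        return "Guess a number between 1 and 10."

    def step(self, action: str) -> tuple[str, float, bool]:
        try:
            guess = int(action.strip())
        except ValueError:
            return "Please answer with a single number.", 0.0, False
        if guess == self.target:
            return "Correct!", 1.0, True
        hint = "higher" if guess < self.target else "lower"
        return f"Wrong, try {hint}.", 0.0, False


class RandomGuesser:
    """Stand-in policy used only to make the example runnable."""

    def act(self, observation: str) -> str:
        return str(random.randint(1, 10))


env, agent = GuessNumberEnv(), RandomGuesser()
obs, total_reward = env.reset(), 0.0
for _ in range(20):
    obs, reward, done = env.step(agent.act(obs))
    total_reward += reward
    if done:
        break
print(total_reward)
```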
Unified Runtime for Training + Inference
A centralized scheduler manages shared compute across workloads like RL (LoRA / full-parameter), SFT, and high-throughput inference. Built for multi-tenant scaling and real-world iteration speed.
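The sketch below illustrates the kind of bookkeeping such a scheduler performs: heterogeneous jobs (LoRA RL, full-parameter RL, SFT, inference) are queued and admitted against a shared GPU pool by priority. The job kinds, field names, and greedy packing policy are assumptions made for this example and do not describe OpenTinker's actual scheduler.

```python
# Assumed, simplified model of a centralized scheduler over a shared GPU pool.
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int                             # lower value = scheduled earlier
    seq: int                                  # tie-breaker: submission order
    kind: str = field(compare=False)          # "lora_rl" | "full_rl" | "sft" | "inference"
    gpus_needed: int = field(compare=False)


class Scheduler:
    def __init__(self, total_gpus: int):
        self.free_gpus = total_gpus
        self.queue: list[Job] = []
        self._counter = itertools.count()

    def submit(self, kind: str, gpus_needed: int, priority: int = 1) -> None:
        heapq.heappush(self.queue, Job(priority, next(self._counter), kind, gpus_needed))

    def schedule(self) -> list[Job]:
        """Greedily admit the highest-priority jobs that fit in the free pool."""
        admitted, deferred = [], []
        while self.queue:
            job = heapq.heappop(self.queue)
            if job.gpus_needed <= self.free_gpus:
                self.free_gpus -= job.gpus_needed
                admitted.append(job)
            else:
                deferred.append(job)
        for job in deferred:                   # keep jobs that did not fit queued
            heapq.heappush(self.queue, job)
        return admitted


sched = Scheduler(total_gpus=8)
sched.submit("inference", gpus_needed=2, priority=0)   # latency-sensitive, served first
sched.submit("lora_rl", gpus_needed=4)
sched.submit("sft", gpus_needed=4)
print([job.kind for job in sched.schedule()])          # ['inference', 'lora_rl']
```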
Multi-Agent Ready by Design
OpenTinker supports coordinator-driven multi-agent interaction. Each agent can optimize independently while coordination emerges through environment dynamics. This keeps MARL scalable, flexible, and system-friendly.
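A minimal sketch of this pattern, under the assumption that a round-robin coordinator routes turns through a shared environment and keeps per-agent trajectories so each policy can be updated independently; the coordinator, agents, and toy environment below are all hypothetical.

```python
# Hypothetical coordinator-driven multi-agent loop; names are illustrative only.
from dataclasses import dataclass, field


@dataclass
class AgentState:
    name: str
    trajectory: list[tuple[str, str, float]] = field(default_factory=list)  # (obs, action, reward)


class EchoAgent:
    """Stand-in policy; a real agent would call the managed inference runtime."""

    def __init__(self, name: str):
        self.name = name

    def act(self, observation: str) -> str:
        return f"{self.name} responds to: {observation}"


class SharedEnv:
    """Toy shared environment: coordination arises only through its dynamics."""

    def __init__(self, max_turns: int = 4):
        self.turn, self.max_turns = 0, max_turns

    def step(self, agent_name: str, action: str) -> tuple[str, float, bool]:
        self.turn += 1
        return f"turn {self.turn}", 1.0, self.turn >= self.max_turns


def coordinate(agents, env, first_obs: str = "start"):
    """Round-robin coordinator: routes observations to agents and logs per-agent
    trajectories so each policy can be optimized independently downstream."""
    states = {agent.name: AgentState(agent.name) for agent in agents}
    obs, done = first_obs, False
    while not done:
        for agent in agents:
            action = agent.act(obs)
            next_obs, reward, done = env.step(agent.name, action)
            states[agent.name].trajectory.append((obs, action, reward))
            obs = next_obs
            if done:
                break
    return states


states = coordinate([EchoAgent("planner"), EchoAgent("executor")], SharedEnv())
print({name: len(state.trajectory) for name, state in states.items()})
```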
Links:
Paper (arXiv): https://arxiv.org/pdf/2601.07376
GitHub: https://github.com/open-tinker/OpenTinker
Librarian Bot found the following papers similar to this paper, recommended by the Semantic Scholar API:
- RollArt: Scaling Agentic RL Training via Disaggregated Infrastructure (2025)
- RLLaVA: An RL-central Framework for Language and Vision Assistants (2025)
- INTELLECT-3: Technical Report (2025)
- Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning (2025)
- SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent (2025)
- CoDA: A Context-Decoupled Hierarchical Agent with Reinforcement Learning (2025)
- Scaling Agentic Reinforcement Learning for Tool-Integrated Reasoning in VLMs (2025)