Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
Abstract
Multi-Agent Test-Time Reinforcement Learning (MATTRL) enhances multi-agent reasoning through structured textual experience injection and consensus-based decision making at inference time.
Multi-agent systems have evolved into practical LLM-driven collaborators for many applications, gaining robustness from diversity and cross-checking. However, multi-agent RL (MARL) training is resource-intensive and unstable: co-adapting teammates induce non-stationarity, and rewards are often sparse and high-variance. Therefore, we introduce Multi-Agent Test-Time Reinforcement Learning (MATTRL), a framework that injects structured textual experience into multi-agent deliberation at inference time. MATTRL forms a multi-expert team of specialists for multi-turn discussions, retrieves and integrates test-time experiences, and reaches consensus for final decision-making. We also study credit assignment for constructing a turn-level experience pool, then reinjecting it into the dialogue. Across challenging benchmarks in medicine, math, and education, MATTRL improves accuracy by an average of 3.67\% over a multi-agent baseline, and by 8.67\% over comparable single-agent baselines. Ablation studies examine different credit-assignment schemes and provide a detailed comparison of how they affect training outcomes. MATTRL offers a stable, effective and efficient path to distribution-shift-robust multi-agent reasoning without tuning.
Community
Collaborative Multi-Agent Test-Time Reinforcement Learning for Reasoning
Excellent work!
nice~
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- PRISMA: Reinforcement Learning Guided Two-Stage Policy Optimization in Multi-Agent Architecture for Open-Domain Multi-Hop Question Answering (2026)
- DRAFT-RL: Multi-Agent Chain-of-Draft Reasoning for Reinforcement Learning-Enhanced LLMs (2025)
- StackPlanner: A Centralized Hierarchical Multi-Agent System with Task-Experience Memory Management (2026)
- Multi-Agent Collaborative Reward Design for Enhancing Reasoning in Reinforcement Learning (2025)
- Beyond Monolithic Architectures: A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search (2026)
- Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization (2025)
- FutureWeaver: Planning Test-Time Compute for Multi-Agent Systems with Modularized Collaboration (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper