arxiv:2601.16276

GameTalk: Training LLMs for Strategic Conversation

Published on Jan 22 · Submitted by Max Ruiz Luyten on Jan 26

Abstract

GameTalk is a framework that trains large language models to make strategic decisions through multi-turn dialogue, optimizing a global objective with reward signals computed over full conversations and outperforming untrained models in complex game scenarios.

AI-generated summary

Strategic decision-making in multi-agent settings is a key challenge for large language models (LLMs), particularly when coordination and negotiation must unfold over extended conversations. While recent work has explored the use of LLMs in isolated decision tasks, little attention has been given to optimizing long-term objectives through dialogue. We introduce GameTalk, a framework for training LLMs to make strategic decisions via multi-turn interactions. Unlike prior work that focuses on single-turn objectives or static action prediction, we train LLMs to optimize a global objective across full conversations. We achieve this by adapting fine-tuning methods like GRPO, DPO, and STaR to incorporate reward signals that depend on the entire interaction. We evaluate this approach on a suite of increasingly complex games, designed to stress different aspects of reasoning, coordination, and opponent modeling. Our results show that GameTalk significantly outperforms untrained models, especially under reward shaping, with DPO consistently yielding the strongest gains. These findings position conversational fine-tuning as a promising path for LLMs to reason, negotiate, and act in interactive environments.
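The key training signal described here is conversation-level rather than per-turn. As a rough illustration of what that could look like in practice (the paper's exact recipe may differ), the sketch below scores whole rollouts with a global reward and turns them into DPO preference pairs; names like `Conversation` and `make_dpo_pairs` are assumptions for illustration, not identifiers from the paper.

```python
# Hypothetical sketch: score each rollout by a reward computed on the *whole*
# conversation, then build DPO (chosen, rejected) pairs by contrasting
# high-reward rollouts with low-reward ones from the same game.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Conversation:
    turns: List[str]   # the model's messages across the full dialogue
    reward: float      # global objective evaluated on the entire interaction

def make_dpo_pairs(rollouts: List[Conversation]) -> List[Tuple[Conversation, Conversation]]:
    """Pair high-reward conversations (chosen) with low-reward ones (rejected)."""
    ranked = sorted(rollouts, key=lambda c: c.reward, reverse=True)
    half = len(ranked) // 2
    return list(zip(ranked[:half], ranked[half:]))
```

The same conversation-level returns could in principle feed GRPO-style group-relative advantages or STaR-style filtering of successful rollouts; the pairing step above is just the simplest concrete instance.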

Community


I'm a game dev, and stuff like this is exactly what I need to be reading. Glad people are working on this. We're looking into distilling models for things like "engine-assisted coaching" in complicated strategy games, and getting those distilled models running on low-end hardware / iGPUs.

Added to my personal collection of papers that'll help us figure out how to get where we need to go :) https://huggingface.co/collections/YellowjacketGames/papers-gameplay-optimization


Thank you! We really appreciate you adding this to your collection.

We noticed a large disparity: there are many frameworks for multi-agent orchestration, but very few methods for improving the models' underlying strategic reasoning. This matters because most training (instruction tuning, RLHF, and RLVR) is devoted to static tasks that ignore the dynamics of interaction, so we should not take for granted that LLMs will be good at this out of the box.

In this work, we wanted to see whether we could get models to model the opponent (inferring intent, steering behavior, and anticipating moves) and act accordingly, which we achieved mainly through reward shaping.
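For readers curious how reward shaping for opponent modeling might be wired up, here is a deliberately simple, hypothetical example (not the paper's exact scheme): the final game payoff is augmented with a small per-turn bonus whenever the model's prediction of the opponent's next move turns out to be correct.

```python
# Illustrative reward shaping, not the paper's exact scheme: add a small per-turn
# bonus for each correctly anticipated opponent move on top of the final payoff.
def shaped_reward(final_payoff: float,
                  predicted_moves: list[str],
                  actual_moves: list[str],
                  bonus: float = 0.1) -> float:
    hits = sum(p == a for p, a in zip(predicted_moves, actual_moves))
    return final_payoff + bonus * hits

# Example: won the game (+1.0) and anticipated 2 of 3 opponent moves -> 1.2
print(shaped_reward(1.0, ["defect", "defect", "cooperate"],
                         ["defect", "defect", "defect"]))
```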

I hope GameTalk serves as a good foundation for your strategy coaching agents.
