arxiv:2601.16276

GameTalk: Training LLMs for Strategic Conversation

Published on Jan 22 · Submitted by Max Ruiz Luyten on Jan 26

Abstract

GameTalk is a framework that trains large language models to make strategic decisions through multi-turn dialogue, optimizing a global objective with reward signals computed over full conversations and outperforming untrained models in complex game scenarios.

AI-generated summary

Strategic decision-making in multi-agent settings is a key challenge for large language models (LLMs), particularly when coordination and negotiation must unfold over extended conversations. While recent work has explored the use of LLMs in isolated decision tasks, little attention has been given to optimizing long-term objectives through dialogue. We introduce GameTalk, a framework for training LLMs to make strategic decisions via multi-turn interactions. Unlike prior work that focuses on single-turn objectives or static action prediction, we train LLMs to optimize a global objective across full conversations. We achieve this by adapting fine-tuning methods like GRPO, DPO, and STaR to incorporate reward signals that depend on the entire interaction. We evaluate this approach on a suite of increasingly complex games, designed to stress different aspects of reasoning, coordination, and opponent modeling. Our results show that GameTalk significantly outperforms untrained models, especially under reward shaping, with DPO consistently yielding the strongest gains. These findings position conversational fine-tuning as a promising path for LLMs to reason, negotiate, and act in interactive environments.
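The key training signal described here is conversation-level rather than per-turn. As a rough illustration of what that could look like in practice (the paper's exact recipe may differ), the sketch below scores whole rollouts with a global reward and turns them into DPO preference pairs; names like `Conversation` and `make_dpo_pairs` are assumptions for illustration, not identifiers from the paper.

```python
# Hypothetical sketch: score each rollout by a reward computed on the *whole*
# conversation, then build DPO (chosen, rejected) pairs by contrasting
# high-reward rollouts with low-reward ones from the same game.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Conversation:
    turns: List[str]   # the model's messages across the full dialogue
    reward: float      # global objective evaluated on the entire interaction

def make_dpo_pairs(rollouts: List[Conversation]) -> List[Tuple[Conversation, Conversation]]:
    """Pair high-reward conversations (chosen) with low-reward ones (rejected)."""
    ranked = sorted(rollouts, key=lambda c: c.reward, reverse=True)
    half = len(ranked) // 2
    return list(zip(ranked[:half], ranked[half:]))
```

The same conversation-level returns could in principle feed GRPO-style group-relative advantages or STaR-style filtering of successful rollouts; the pairing step above is just the simplest concrete instance.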

Community


I'm a game dev, and stuff like this is exactly what I need to be reading. Glad people are working on this. We're looking into distilling models for things like "engine-assisted coaching" in complicated strategy games, and getting those distilled models running on low-end hardware / iGPUs.

Added to my personal collection of papers that'll help us figure out how to get where we need to go :) https://huggingface.co/collections/YellowjacketGames/papers-gameplay-optimization


Thank you! We really appreciate you adding this to your collection.

We noticed a large disparity: there are many frameworks for multi-agent orchestration, but very few methods for improving the models' underlying strategic reasoning. This matters because most training (instruction tuning, RLHF, and RLVR) is devoted to static tasks that ignore the dynamics of interaction, so we should not take for granted that LLMs will be good at this out of the box.

In this work, we wanted to see whether we could get models to model the opponent (inferring intent, steering behavior, and anticipating moves) and act accordingly, which we achieved mainly through reward shaping.
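For readers curious how reward shaping for opponent modeling might be wired up, here is a deliberately simple, hypothetical example (not the paper's exact scheme): the final game payoff is augmented with a small per-turn bonus whenever the model's prediction of the opponent's next move turns out to be correct.

```python
# Illustrative reward shaping, not the paper's exact scheme: add a small per-turn
# bonus for each correctly anticipated opponent move on top of the final payoff.
def shaped_reward(final_payoff: float,
                  predicted_moves: list[str],
                  actual_moves: list[str],
                  bonus: float = 0.1) -> float:
    hits = sum(p == a for p, a in zip(predicted_moves, actual_moves))
    return final_payoff + bonus * hits

# Example: won the game (+1.0) and anticipated 2 of 3 opponent moves -> 1.2
print(shaped_reward(1.0, ["defect", "defect", "cooperate"],
                         ["defect", "defect", "defect"]))
```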

I hope GameTalk serves as a good foundation for your strategy coaching agents.
