arxiv:2606.29961

DuoMem: Towards Capable On-Device Memory Agents via Dual-Space Distillation

Published on Jun 29

Submitted by obohdal

Authors:

Abstract

DuoMem is a dual-space distillation framework that transfers procedural problem-solving ability from large language models to compact student models through context-space and parameter-space distillation, achieving high task success with few additional parameters and faster inference.

Generated by Qwen/Qwen2.5-Coder-32B-Instruct

Large Language Model (LLM)-based agents can solve complex procedural tasks by interacting with environments over multiple turns, but this ability typically depends on large models, long contexts, and repeated inference calls. This makes advanced memory-augmented agents difficult to deploy on resource-constrained devices. We introduce DuoMem, a dual-space distillation framework that transfers procedural problem-solving ability from a large teacher model to compact student models. DuoMem distils in two complementary spaces: (1) context-space distillation, which replaces student-generated memories with higher-quality teacher-generated procedural memories prepended to the student's input, and (2) parameter-space distillation, which fine-tunes lightweight LoRA adapters on successful teacher trajectories. Evaluated on ALFWorld, a challenging embodied decision-making benchmark, DuoMem boosts a 4B-parameter model from 4.3% to 77.9% task success rate, closing most of the gap to a 72B teacher model (87.1%), while adding fewer than 10M trainable parameters and only a few megabytes of pre-computed teacher memories. Moreover, the DuoMem-enhanced 4B model completes tasks over 3x faster than the 72B teacher in wall-clock time, making it viable for real-time edge deployment, which would be challenging for the teacher. Extensive ablations across eight models spanning 2B-72B parameters reveal that both distillation axes contribute complementary
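To make the two distillation axes concrete, here is a minimal sketch of how each might consume teacher outputs, assuming a plain-text observation/action prompt format. Context-space distillation is shown as prepending a pre-computed teacher memory to the student's input; parameter-space distillation is shown as filtering teacher trajectories down to successful ones and turning each step into a (prompt, target action) pair for LoRA fine-tuning. All names (`TEACHER_MEMORIES`, `build_student_prompt`, `sft_pairs`) and the prompt layout are hypothetical illustrations, not DuoMem's code, and whether the fine-tuning prompts also carry the teacher memory is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Step:
    observation: str
    action: str


@dataclass
class Trajectory:
    task_type: str
    steps: list[Step]
    success: bool


# Hypothetical pre-computed teacher memories, keyed by task type (assumption).
TEACHER_MEMORIES: dict[str, str] = {
    "pick_and_place": "Find the object, take it, go to the target receptacle, put it there.",
}


def build_student_prompt(task_type: str, history: list[Step], observation: str,
                         memories: dict[str, str]) -> str:
    """Context-space distillation (sketch): prepend the teacher's procedural
    memory, instead of a student-written one, to the student's input."""
    parts = []
    memory = memories.get(task_type)
    if memory:
        parts.append(f"Procedural memory:\n{memory}\n")
    for step in history:  # earlier turns of the episode
        parts.append(f"Observation: {step.observation}\nAction: {step.action}")
    parts.append(f"Observation: {observation}\nAction:")
    return "\n".join(parts)


def sft_pairs(trajectories: list[Trajectory],
              memories: dict[str, str]) -> list[tuple[str, str]]:
    """Parameter-space distillation data (sketch): keep only successful teacher
    trajectories and emit one (prompt, target_action) pair per step."""
    pairs = []
    for traj in trajectories:
        if not traj.success:
            continue  # failed teacher runs are discarded
        for i, step in enumerate(traj.steps):
            prompt = build_student_prompt(traj.task_type, traj.steps[:i],
                                          step.observation, memories)
            pairs.append((prompt, step.action))
    return pairs


# Usage with toy data: one successful and one failed teacher run.
trajs = [
    Trajectory("pick_and_place",
               [Step("You see a mug.", "take mug"), Step("You hold a mug.", "go to shelf")],
               True),
    Trajectory("pick_and_place", [Step("You see a mug.", "go to sink")], False),
]
pairs = sft_pairs(trajs, TEACHER_MEMORIES)
print(len(pairs))   # 2
print(pairs[1][1])  # go to shelf
```

The failed trajectory contributes nothing, so only the teacher's successful behaviour becomes a fine-tuning target; the resulting pairs would then be fed to a LoRA trainer of your choice.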


Community

obohdal

Paper submitter about 14 hours ago

We are excited to release DuoMem, a dual-space distillation framework for on-device AI agents!

DuoMem combines context-space and parameter-space distillation to transfer knowledge from a large teacher model into a small on-device model.
It boosts a 4B model's success rate on ALFWorld from 4.3% to 77.9%, closing most of the gap to a 72B teacher model (87.1%) with 3× faster inference.
It adds minimal overhead (fewer than 10M trainable parameters), making it suitable for edge deployment.
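The parameter budget and headline numbers above can be checked with a little arithmetic. The sketch below uses the standard LoRA formulation, y = W x + (alpha / r) · B(A x), where W stays frozen and only A (r × d_in) and B (d_out × r) are trained, so each adapter adds r · (d_in + d_out) parameters. Which layers DuoMem adapts, its rank, and its scaling are not stated here, so the model dimensions below are a hypothetical configuration chosen only to show how a budget well under 10M arises. The gap-closure figure is derived from the quoted success rates: (77.9 − 4.3) / (87.1 − 4.3) = 8/9.

```python
def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha: float, rank: int) -> list[float]:
    """Standard LoRA layer: y = W x + (alpha / rank) * B (A x).
    W (d_out x d_in) is frozen; only A (rank x d_in) and B (d_out x rank) train."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base, delta)]


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by one adapter: rank*d_in (A) + d_out*rank (B)."""
    return rank * (d_in + d_out)


def gap_closed(before: float, after: float, teacher: float) -> float:
    """Fraction of the student-to-teacher gap recovered."""
    return (after - before) / (teacher - before)


# Toy 2-d example of the LoRA forward pass.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[0.5], [0.0]]
print(lora_forward(W, A, B, [1.0, 2.0], alpha=1.0, rank=1))  # [2.5, 2.0]

# Hypothetical config (NOT from the paper): 36 layers, hidden size 2560,
# rank-8 adapters on two square projections per layer.
total = 36 * 2 * lora_params(2560, 2560, rank=8)
print(total)  # 2949120

# ALFWorld success rates quoted above (%).
print(f"{gap_closed(4.3, 77.9, 87.1):.1%}")  # 88.9%
```

Because only A and B are stored, an adapter of this size is a few megabytes on disk, and the frozen base weights can be shared across tasks.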
