DuoMem: Towards Capable On-Device Memory Agents via Dual-Space Distillation
Abstract
DuoMem is a dual-space distillation framework that transfers procedural problem-solving from large language models to compact student models through context-space and parameter-space distillation, achieving high performance with minimal additional parameters and improved inference speed.
Large Language Model (LLM)-based agents can solve complex procedural tasks by interacting with environments over multiple turns, but this ability typically depends on large models, long contexts, and repeated inference calls. This makes advanced memory-augmented agents difficult to deploy on resource-constrained devices. We introduce DuoMem, a dual-space distillation framework that transfers procedural problem-solving ability from a large teacher model to compact student models. DuoMem distils in two complementary spaces: (1) context-space distillation, which replaces student-generated memories with higher-quality teacher-generated procedural memories prepended to the student's input, and (2) parameter-space distillation, which fine-tunes lightweight LoRA adapters on successful teacher trajectories. Evaluated on ALFWorld, a challenging embodied decision-making benchmark, DuoMem boosts a 4B-parameter model from 4.3% to 77.9% task success rate, closing most of the gap to a 72B teacher model (87.1%), while adding fewer than 10M trainable parameters and only a few megabytes of pre-computed teacher memories. Moreover, the DuoMem-enhanced 4B model completes tasks over 3x faster than the 72B teacher in wall-clock time, making it viable for real-time edge deployment, which would be challenging for the teacher. Extensive ablations across eight models spanning 2B-72B parameters reveal that both distillation axes contribute complementary
Community
We are excited to release DuoMem, a dual-space distillation framework for on-device AI agents!
- DuoMem combines context-space and parameter-space distillation to transfer knowledge from a large teacher model into a small on-device model.
- It raises a 4B model's task success rate on ALFWorld from 4.3% to 77.9%, closing most of the gap to the 72B teacher model (87.1%) while completing tasks over 3× faster in wall-clock time.
- It adds fewer than 10M trainable parameters and only a few megabytes of pre-computed teacher memories, which keeps it suitable for edge deployment.
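To make the two distillation spaces concrete, here is a minimal Python sketch. It is not the authors' implementation. The `TeacherMemory` schema, the selection of memories by task type, the prompt layout, and the layer dimensions in the usage example are all hypothetical choices made for illustration. The only parts taken from the abstract are that teacher memories are prepended to the student's input and that the trainable parameters are LoRA adapters; the parameter formula is the standard LoRA count of two low-rank factors per adapted weight matrix.

```python
from dataclasses import dataclass


@dataclass
class TeacherMemory:
    """One pre-computed procedural memory written by the teacher (hypothetical schema)."""
    task_type: str
    text: str


def build_student_prompt(observation: str, task_type: str,
                         memories: list[TeacherMemory],
                         max_memories: int = 2) -> str:
    """Context-space step: prepend teacher memories to the student's input.

    Selecting memories by exact task type is an assumption of this sketch;
    the source does not say how memories are matched to tasks.
    """
    relevant = [m.text for m in memories if m.task_type == task_type][:max_memories]
    body = f"Current observation: {observation}"
    if not relevant:
        return body
    header = "\n".join(f"Memory {i + 1}: {text}" for i, text in enumerate(relevant))
    return f"{header}\n\n{body}"


def lora_param_count(d_in: int, d_out: int, rank: int, num_matrices: int) -> int:
    """Parameter-space step: trainable parameters added by LoRA adapters.

    Each adapted d_out x d_in weight gains two factors,
    A (rank x d_in) and B (d_out x rank), so rank * (d_in + d_out) parameters.
    """
    return num_matrices * rank * (d_in + d_out)


# Usage with made-up values (not taken from the paper).
memories = [
    TeacherMemory("pick_and_place", "Locate the object before walking to the target."),
    TeacherMemory("heat", "Carry the object to the microwave, then heat it."),
]
print(build_student_prompt("You are in a kitchen.", "heat", memories))

# Hypothetical model shape: 36 layers, two square 2560 x 2560 projections each, rank 8.
print(lora_param_count(d_in=2560, d_out=2560, rank=8, num_matrices=72))
```

With these hypothetical dimensions the adapter count stays well under the 10M figure quoted above, which illustrates why low-rank adapters add so little to a 4B-parameter student.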