Align-TI
This repository contains the official weights for a model distilled using Align-TI, a knowledge distillation (KD) framework introduced in the paper "Beyond Next-Token Alignment: Distilling Multimodal Large Language Models via Token Interactions."
Align-TI is designed to compress Multimodal Large Language Models (MLLMs) by focusing on dynamic token interactions rather than static next-token alignment alone; the paper details its two primary components.
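For context, the conventional "next-token alignment" baseline that Align-TI moves beyond matches the student's per-token output distribution to the teacher's, typically with a KL-divergence loss. Below is a minimal plain-Python sketch of that baseline objective on toy distributions; it is illustrative only and does not implement the Align-TI token-interaction losses, and all function names here are hypothetical.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions over the vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def next_token_distill_loss(teacher_probs, student_probs):
    """Average per-position KL(teacher || student): the static
    next-token alignment objective used as a KD baseline."""
    per_token = [kl_divergence(t, s) for t, s in zip(teacher_probs, student_probs)]
    return sum(per_token) / len(per_token)

# Toy example: 3 sequence positions, vocabulary of size 4.
teacher = [[0.7, 0.1, 0.1, 0.1],
           [0.25, 0.25, 0.25, 0.25],
           [0.1, 0.1, 0.1, 0.7]]
student = [[0.4, 0.2, 0.2, 0.2],
           [0.25, 0.25, 0.25, 0.25],
           [0.2, 0.2, 0.2, 0.4]]

loss = next_token_distill_loss(teacher, student)
```

Distilling only this per-position signal treats each token independently, which is the limitation the token-interaction objectives in the paper are designed to address.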
The framework achieves state-of-the-art performance for parameter-efficient MLLMs, with the 2B version even outperforming significantly larger models like LLaVA-1.5-7B.
If you find this work useful, please cite the paper:
@article{chen2026alignti,
  title={Beyond Next-Token Alignment: Distilling Multimodal Large Language Models via Token Interactions},
  author={Lin Chen and Xiaoke Zhao and Kun Ding and Weiwei Feng and Changtao Miao and Zili Wang and Wenxuan Guo and Ying Wang and Kaiyuan Zheng and Bo Zhang and Zhe Li and Shiming Xiang},
  journal={arXiv preprint arXiv:2602.09483},
  year={2026}
}
Base model: Qwen/Qwen2.5-0.5B