Submitted by Niels Rogge. VidEoMT: Your ViT is Secretly Also a Video Segmentation Model. Mobile Perception Systems Lab.