Open to Collab

Muhammad Umair

umair894

AI & ML interests

Multimodal Re-identification | Feature Upscaling | Cross-modal Alignment | Robust Generalization | PhD, UESTC

Recent Activity

liked a model about 22 hours ago

openai/privacy-filter

liked a Space 2 days ago

facebook/sapiens2-seg

upvoted a paper 3 days ago

From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

upvoted a paper 3 days ago

From Skills to Talent: Organising Heterogeneous Agents as a Real-World Company

Paper • 2604.22446 • Published 8 days ago • 114

upvoted a paper 4 days ago

Externalization in LLM Agents: A Unified Review of Memory, Skills, Protocols and Harness Engineering

Paper • 2604.08224 • Published 23 days ago • 51

upvoted a paper 16 days ago

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

Paper • 2604.11784 • Published 19 days ago • 141

upvoted a paper 18 days ago

WildDet3D: Scaling Promptable 3D Detection in the Wild

Paper • 2604.08626 • Published 23 days ago • 244

upvoted a paper 24 days ago

AURA: Always-On Understanding and Real-Time Assistance via Video Streams

Paper • 2604.04184 • Published 27 days ago • 50

upvoted a paper 25 days ago

A Simple Baseline for Streaming Video Understanding

Paper • 2604.02317 • Published 30 days ago • 73

upvoted an article 25 days ago

Welcome Gemma 4: Frontier multimodal intelligence on device

Article • 30 days ago • 884

upvoted 7 papers about 1 month ago

Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models

Paper • 2603.15618 • Published Mar 16 • 21

Make it SING: Analyzing Semantic Invariants in Classifiers

Paper • 2603.14610 • Published Mar 15 • 16

AI Can Learn Scientific Taste

Paper • 2603.14473 • Published Mar 15 • 426

upvoted 6 papers about 2 months ago

Spatial-TTT: Streaming Visual-based Spatial Intelligence with Test-Time Training

Paper • 2603.12255 • Published Mar 12 • 91

Geometry-Guided Reinforcement Learning for Multi-view Consistent 3D Scene Editing

Paper • 2603.03143 • Published Mar 3 • 146

OpenClaw-RL: Train Any Agent Simply by Talking

Paper • 2603.10165 • Published Mar 10 • 153

LoGeR: Long-Context Geometric Reconstruction with Hybrid Memory

Paper • 2603.03269 • Published Mar 3 • 63

Holi-Spatial: Evolving Video Streams into Holistic 3D Spatial Intelligence

Paper • 2603.07660 • Published Mar 8 • 86

EmbodiedSplat: Online Feed-Forward Semantic 3DGS for Open-Vocabulary 3D Scene Understanding

Paper • 2603.04254 • Published Mar 4 • 1
