Anwar

abdoali5672

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper about 20 hours ago

The Y-Combinator for LLMs: Solving Long-Context Rot with λ-Calculus

Paper • 2603.20105 • Published 4 days ago • 28

upvoted a paper 3 days ago

MDM-Prime-v2: Binary Encoding and Index Shuffling Enable Compute-optimal Scaling of Diffusion Language Models

Paper • 2603.16077 • Published 8 days ago • 1

upvoted 2 papers 4 days ago

Large Language Model Reasoning Failures

Paper • 2602.06176 • Published Feb 5 • 12

Efficient Reasoning with Balanced Thinking

Paper • 2603.12372 • Published 12 days ago • 139

upvoted 2 papers 6 days ago

MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification

Paper • 2603.15726 • Published 8 days ago • 178

Qianfan-OCR: A Unified End-to-End Model for Document Intelligence

Paper • 2603.13398 • Published 13 days ago • 145

upvoted 4 papers 7 days ago

Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models

Paper • 2603.13985 • Published 10 days ago • 10

Mixture-of-Depths Attention

Paper • 2603.15619 • Published 8 days ago • 77

Attention Residuals

Paper • 2603.15031 • Published 8 days ago • 155

AI Can Learn Scientific Taste

Paper • 2603.14473 • Published 9 days ago • 397

upvoted 3 papers 8 days ago

Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning

Paper • 2505.03318 • Published May 6, 2025 • 94

Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities

Paper • 2505.02567 • Published May 5, 2025 • 82

Cheers: Decoupling Patch Details from Semantic Representations Enables Unified Multimodal Comprehension and Generation

Paper • 2603.12793 • Published 11 days ago • 37

upvoted 3 papers 9 days ago

HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing

Paper • 2602.03560 • Published Feb 3 • 49

IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse

Paper • 2603.12201 • Published 12 days ago • 52

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

Paper • 2603.04597 • Published 20 days ago • 206

upvoted 3 papers 12 days ago

Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs

Paper • 2603.09906 • Published 14 days ago • 72

Omni-Diffusion: Unified Multimodal Understanding and Generation with Masked Discrete Diffusion

Paper • 2603.06577 • Published 18 days ago • 48

Flash-KMeans: Fast and Memory-Efficient Exact K-Means

Paper • 2603.09229 • Published 15 days ago • 79

upvoted an article 14 days ago

Mixture of Experts Explained

Article • Dec 11, 2023 • 1.1k