Multimodal Benchmarking IR

university

Recent Activity

ychenNLP authored a paper 15 days ago

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

zhangysk authored a paper 22 days ago

VeRA: Verified Reasoning Data Augmentation at Scale

zhangysk authored a paper 22 days ago

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

authored a paper 15 days ago

Nemotron-Cascade 2: Post-Training LLMs with Cascade RL and Multi-Domain On-Policy Distillation

Paper • 2603.19220 • Published 18 days ago • 66

authored 4 papers 22 days ago

VeRA: Verified Reasoning Data Augmentation at Scale

Paper • 2602.13217 • Published Jan 23

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Paper • 2602.22675 • Published Feb 26 • 23

$OneMillion-Bench: How Far are Language Agents from Human Experts?

Paper • 2603.07980 • Published 29 days ago • 27

Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining

Paper • 2603.11103 • Published 27 days ago • 8

authored a paper about 1 month ago

Visual-Aware CoT: Achieving High-Fidelity Visual Consistency in Unified Models

Paper • 2512.19686 • Published Dec 22, 2025

authored a paper about 2 months ago

BABE: Biology Arena BEnchmark

Paper • 2602.05857 • Published Feb 5 • 10

authored a paper about 2 months ago

Context Forcing: Consistent Autoregressive Video Generation with Long Context

Paper • 2602.06028 • Published Feb 5 • 36

authored 2 papers about 2 months ago

Context Forcing: Consistent Autoregressive Video Generation with Long Context

Paper • 2602.06028 • Published Feb 5 • 36

Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities

Paper • 2601.21937 • Published Jan 29 • 19

authored a paper 2 months ago

ConceptMoE: Adaptive Token-to-Concept Compression for Implicit Compute Allocation

Paper • 2601.21420 • Published Jan 29 • 42

authored 6 papers 3 months ago

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Paper • 2512.12730 • Published Dec 14, 2025 • 51

AutoMV: An Automatic Multi-Agent System for Music Video Generation

Paper • 2512.12196 • Published Dec 13, 2025 • 7

Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements

Paper • 2512.24867 • Published Dec 31, 2025 • 1

Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space

Paper • 2512.24617 • Published Dec 31, 2025 • 65

AInsteinBench: Benchmarking Coding Agents on Scientific Repositories

Paper • 2512.21373 • Published Dec 24, 2025

The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning

Paper • 2601.06002 • Published Jan 9 • 58

authored a paper 4 months ago

Nemotron-Cascade: Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models

Paper • 2512.13607 • Published Dec 15, 2025 • 38

submitted a paper to Daily Papers 4 months ago

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Paper • 2512.12730 • Published Dec 14, 2025 • 51

authored a paper 4 months ago

From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence

Paper • 2511.18538 • Published Nov 23, 2025 • 303