-
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
Paper • 2601.15369 • Published • 21 -
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
Paper • 2601.15892 • Published • 53 -
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
Paper • 2601.16208 • Published • 55 -
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
Paper • 2601.11004 • Published • 30
Collections
Discover the best community collections!
Collections including paper arxiv:2603.01562
-
RubricBench: Aligning Model-Generated Rubrics with Human Standards
Paper • 2603.01562 • Published • 63 -
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning
Paper • 2603.03790 • Published • 121 -
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
Paper • 2505.20411 • Published • 96 -
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale
Paper • 2602.23866 • Published • 88
-
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise
Paper • 2602.12783 • Published • 170 -
MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
Paper • 2602.22638 • Published • 107 -
CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty
Paper • 2601.22027 • Published • 85 -
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development
Paper • 2601.11077 • Published • 67
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 24 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75
-
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
Paper • 2505.13227 • Published • 45 -
facebook/natural_reasoning
Viewer • Updated • 1.15M • 1.48k • 554 -
nvidia/OpenMathReasoning
Viewer • Updated • 5.68M • 17.5k • 451 -
Search Arena: Analyzing Search-Augmented LLMs
Paper • 2506.05334 • Published • 18
-
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 106 -
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Paper • 2310.11511 • Published • 79 -
In-Context Learning Creates Task Vectors
Paper • 2310.15916 • Published • 43 -
Matryoshka Diffusion Models
Paper • 2310.15111 • Published • 45
-
OpenVision 3: A Family of Unified Visual Encoder for Both Understanding and Generation
Paper • 2601.15369 • Published • 21 -
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model
Paper • 2601.15892 • Published • 53 -
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
Paper • 2601.16208 • Published • 55 -
NAACL: Noise-AwAre Verbal Confidence Calibration for LLMs in RAG Systems
Paper • 2601.11004 • Published • 30
-
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis
Paper • 2505.13227 • Published • 45 -
facebook/natural_reasoning
Viewer • Updated • 1.15M • 1.48k • 554 -
nvidia/OpenMathReasoning
Viewer • Updated • 5.68M • 17.5k • 451 -
Search Arena: Analyzing Search-Augmented LLMs
Paper • 2506.05334 • Published • 18
-
RubricBench: Aligning Model-Generated Rubrics with Human Standards
Paper • 2603.01562 • Published • 63 -
T2S-Bench & Structure-of-Thought: Benchmarking and Prompting Comprehensive Text-to-Structure Reasoning
Paper • 2603.03790 • Published • 121 -
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents
Paper • 2505.20411 • Published • 96 -
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale
Paper • 2602.23866 • Published • 88
-
SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise
Paper • 2602.12783 • Published • 170 -
MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios
Paper • 2602.22638 • Published • 107 -
CAR-bench: Evaluating the Consistency and Limit-Awareness of LLM Agents under Real-World Uncertainty
Paper • 2601.22027 • Published • 85 -
ABC-Bench: Benchmarking Agentic Backend Coding in Real-World Development
Paper • 2601.11077 • Published • 67
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 24 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 104 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75
-
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 106 -
Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
Paper • 2310.11511 • Published • 79 -
In-Context Learning Creates Task Vectors
Paper • 2310.15916 • Published • 43 -
Matryoshka Diffusion Models
Paper • 2310.15111 • Published • 45