- MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information
  Paper • 2510.03632 • Published • 42
- Alphazero-like Tree-Search can Guide Large Language Model Decoding and Training
  Paper • 2309.17179 • Published • 2
- First Finish Search: Efficient Test-Time Scaling in Large Language Models
  Paper • 2505.18149 • Published • 1
Collections
Collections including paper arxiv:2510.03632
- Open Data Synthesis For Deep Research
  Paper • 2509.00375 • Published • 71
- Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training
  Paper • 2509.03403 • Published • 23
- LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations
  Paper • 2509.03405 • Published • 24
- SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs
  Paper • 2509.00930 • Published • 5

- Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
  Paper • 2310.08582 • Published • 3
- Autonomous Tree-search Ability of Large Language Models
  Paper • 2310.10686 • Published • 2
- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
  Paper • 2310.04406 • Published • 10
- PathFinder: Guided Search over Multi-Step Reasoning Paths
  Paper • 2312.05180 • Published • 10

- Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
  Paper • 2309.10150 • Published • 25
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts
  Paper • 2402.09727 • Published • 38
- MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information
  Paper • 2510.03632 • Published • 42

- BroRL: Scaling Reinforcement Learning via Broadened Exploration
  Paper • 2510.01180 • Published • 19
- MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information
  Paper • 2510.03632 • Published • 42
- Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
  Paper • 2509.25849 • Published • 48
- Beyond the Exploration-Exploitation Trade-off: A Hidden State Approach for LLM Reasoning in RLVR
  Paper • 2509.23808 • Published • 47

- lusxvr/nanoVLM-222M
  Image-Text-to-Text • 0.2B • Updated • 123 • 98
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 38
- AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
  Paper • 2505.24863 • Published • 97
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
  Paper • 2505.17667 • Published • 88

- Ada-Instruct: Adapting Instruction Generators for Complex Reasoning
  Paper • 2310.04484 • Published • 5
- Diversity of Thought Improves Reasoning Abilities of Large Language Models
  Paper • 2310.07088 • Published • 5
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 81
- Democratizing Reasoning Ability: Tailored Learning from Large Language Model
  Paper • 2310.13332 • Published • 16

- Contrastive Decoding Improves Reasoning in Large Language Models
  Paper • 2309.09117 • Published • 39
- Prometheus: Inducing Fine-grained Evaluation Capability in Language Models
  Paper • 2310.08491 • Published • 56
- Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding
  Paper • 2411.04282 • Published • 37
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
  Paper • 2411.14432 • Published • 25