🧠 reasoning datasets - a saracandu Collection

Updated Dec 15, 2025

A collection of reasoning tasks for benchmarking model abilities.

openai/gsm8k

Benchmark • Updated Dec 20, 2025 • 17.6k rows • 487k downloads • 1.15k likes
qintongli/GSM-Plus

Updated Jul 7, 2024 • 13k rows • 1.72k downloads • 17 likes

Note: for each GSM8K example, it adds 8 variations; all queries are in English.
juletxara/mgsm

Updated Oct 9, 2025 • 2.84k rows • 8.39k downloads • 40 likes

Note: multilingual; a small "train" split and 250 items in the "test" split.
maveriq/bigbenchhard

Updated Sep 29, 2023 • 6.51k rows • 1.19k downloads • 38 likes

Note: 23 hard logical/mathematical tasks.
ucinlp/drop

Updated Jan 17, 2024 • 86.9k rows • 5.92k downloads • 66 likes

Note: joint task that requires extracting the relevant information from a text and then performing a few mathematical operations to reach the result.
deepmind/aqua_rat

Updated Jan 9, 2024 • 196k rows • 3.4k downloads • 72 likes

Note: (for NeuroHike) modify it to remove the multiple-choice answer format.
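One way to drop the multiple-choice format is to replace the option letter with the text of the correct option. This is a minimal sketch, assuming the raw AQuA-RAT schema: `question`, `options` as strings such as `"A)21"`, `rationale`, and `correct` as a single option letter. The helper names `to_open_ended` and `is_answerable_without_options` are hypothetical, and the "which of the following" filter is an illustrative heuristic, not something the dataset provides.

```python
import re

# Assumed option format: "<LETTER>)<answer text>", e.g. "E)23".
OPTION_RE = re.compile(r"\s*([A-E])\s*\)\s*(.*)")


def to_open_ended(example: dict) -> dict:
    """Map one AQuA-RAT record (assumed raw schema) to an open-ended QA pair."""
    letter = example["correct"].strip()
    answer = None
    for opt in example["options"]:
        m = OPTION_RE.match(opt)
        if m and m.group(1) == letter:
            # Keep the option's text, discarding the letter label.
            answer = m.group(2).strip()
            break
    if answer is None:
        raise ValueError(f"no option labelled {letter!r}")
    # The options list is deliberately not copied into the output.
    return {
        "question": example["question"],
        "rationale": example["rationale"],
        "answer": answer,
    }


def is_answerable_without_options(example: dict) -> bool:
    """Heuristic (hypothetical): questions that point at the options cannot stand alone."""
    return "which of the following" not in example["question"].lower()


if __name__ == "__main__":
    sample = {
        "question": "What is 20 + 3?",
        "options": ["A)21", "B)21.5", "C)22", "D)22.5", "E)23"],
        "rationale": "20 + 3 = 23. Answer: E",
        "correct": "E",
    }
    print(to_open_ended(sample)["answer"])  # prints "23"
```

The rationale is passed through unchanged, so rationales that end with a letter reference (as in the sample's "Answer: E") would still need separate cleaning before they are used as open-ended targets.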
HuggingFaceH4/MATH-500

Updated Dec 15, 2025 • 500 rows • 106k downloads • 285 likes
yale-nlp/FOLIO

Updated Dec 21, 2023 • 1.2k rows • 948 downloads • 60 likes
saracandu/implications

Updated Dec 10, 2025 • 19.9k rows • 4 downloads
HuggingFaceH4/aime_2024

Updated Jan 26, 2025 • 30 rows • 39.6k downloads • 59 likes
opencompass/AIME2025

Updated Feb 25, 2025 • 30 rows • 10.6k downloads • 50 likes
MathArena/hmmt_feb_2025

Updated May 14, 2025 • 30 rows • 3.86k downloads • 7 likes