openai/gsm8k
A collection of reasoning tasks to benchmark model abilities
Note For each example, it adds 8 variations (English queries)
Note Multiple languages; small "train" split, 250 items in the "test" split
Note 23 (hard) logical/mathematical tasks
Note Joint task: retain the correct information from a text and perform a couple of mathematical operations to reach the result
Note (for NeuroHike) modify it and remove the "multiple choice" style of answer
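
The last note mentions removing the "multiple choice" style of answer. A minimal sketch of one way that could be done is shown below. The record layout (`question`, `choices`, `answer` as an index) and the helper name `to_open_ended` are assumptions made for illustration, not the schema of any dataset listed here.

```python
def to_open_ended(item):
    """Turn a multiple-choice record into an open-ended one.

    Assumed (hypothetical) schema:
      {"question": str, "choices": list[str], "answer": int}
    where "answer" is the index of the correct choice.
    """
    question = item["question"].strip()
    # Keep only the text of the correct option as the gold answer.
    gold = item["choices"][item["answer"]]
    return {"question": question, "answer": gold}


# Hypothetical record, used only to illustrate the conversion.
example = {
    "question": "What is 7 * 6?",
    "choices": ["36", "42", "48", "54"],
    "answer": 1,
}
open_item = to_open_ended(example)
```

With this layout the model no longer sees the candidate options, so it has to produce the answer itself rather than select one. Real datasets may embed the options inside the question text, in which case the question string would also need cleaning.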