arxiv:2502.08946

The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding

Published on Feb 13 · Submitted by AK on Feb 14 · #1 Paper of the day
Authors: Mo Yu et al.

Abstract

A study investigates whether large language models understand physical concepts through a grid-based task, showing that they significantly underperform compared to humans and highlighting the stochastic parrot phenomenon.

AI-generated summary

We systematically investigate a widely asked question: do LLMs really understand what they say? This question relates to the more familiar term "stochastic parrot." To this end, we propose a summative assessment over a carefully designed physical concept understanding task, PhysiCo. Our task alleviates the memorization issue by using grid-format inputs that abstractly describe physical phenomena. The grids represent varying levels of understanding, from the core phenomenon and application examples to analogies with other abstract patterns in the grid world. A comprehensive study on our task demonstrates that: (1) state-of-the-art LLMs, including GPT-4o, o1, and Gemini 2.0 Flash Thinking, lag behind humans by ~40%; (2) the stochastic parrot phenomenon is present in LLMs, as they fail on our grid task but can describe and recognize the same concepts well in natural language; (3) our task challenges LLMs because of its intrinsic difficulty rather than the unfamiliar grid format, as in-context learning and fine-tuning on data in the same format added little to their performance.
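To make the idea of a grid-format input concrete, here is a minimal sketch in Python. It is not taken from the paper or the PhysiCo dataset. The grid encoding (0 for empty, 1 for an object), the function names `falling_object_frames`, `render`, and `build_prompt`, and the candidate concept labels are all hypothetical and serve only to illustrate how a physical phenomenon might be described abstractly as a sequence of small grids.

```python
from typing import List

Grid = List[List[int]]


def falling_object_frames(height: int = 4, width: int = 3, col: int = 1) -> List[Grid]:
    """Hypothetical example: a single object (1) moves down one row per frame.

    This is an assumed encoding for illustration, not the PhysiCo format.
    """
    frames: List[Grid] = []
    for row in range(height):
        grid = [[0] * width for _ in range(height)]
        grid[row][col] = 1  # place the object in the current row
        frames.append(grid)
    return frames


def render(grid: Grid) -> str:
    """Turn a grid into plain text, one row per line, cells separated by spaces."""
    return "\n".join(" ".join(str(cell) for cell in row) for row in grid)


def build_prompt(frames: List[Grid], options: List[str]) -> str:
    """Assemble a multiple-choice style question (an assumed task shape)."""
    parts = [f"Frame {i}:\n{render(g)}" for i, g in enumerate(frames)]
    choices = "\n".join(f"({chr(65 + i)}) {opt}" for i, opt in enumerate(options))
    return "\n\n".join(parts) + "\n\nWhich concept do these grids illustrate?\n" + choices


if __name__ == "__main__":
    frames = falling_object_frames()
    # The option labels below are illustrative placeholders only.
    print(build_prompt(frames, ["gravity", "reflection", "buoyancy"]))
```

A text-only description such as "an object falls under gravity" can be memorized from training data, whereas a grid sequence like this one requires mapping an abstract pattern back to the concept, which is the kind of gap the abstract describes.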

Community

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/the-stochastic-parrot-on-llm-s-shoulder-a-summative-assessment-of-physical-concept-understanding-9087-4a7a2379

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications
