2026-06-16 · paper-digest
Li et al.: Can LLMs Play and Beat 2D Games? - Fukai Reads GVGAI-LLM
A paper by Li et al. (NYU and others) proposing GVGAI-LLM, a benchmark in which language models play 118 2D games to measure reasoning and spatial grounding. With game boards translated into ASCII maps and played zero-shot, GPT-4o-mini had a 0% win rate on 477 of 540 levels and a 10.27% overall win rate, falling short of classic search algorithms. I unpack it as problem, method, findings, use cases, and limitations.