TAG
#academic
0 reviews · 2 essays
Related essays
Where 'Solvable' and 'Fun' Diverge — PuzzleJAX Hands 500+ PuzzleScript Games to the Machines (arXiv, Aug 2025)
One article today: "PuzzleJAX: A Benchmark for Reasoning and Learning" (arXiv preprint, August 2025) by Sam Earle, Graham Todd, Ahmed Khalifa, Julian Togelius and others, at NYU, the University of Malta, the University of the Witwatersrand (South Africa), and Microsoft. They reimplement PuzzleScript, the puzzle-authoring language Stephen Lavelle (increpare) released in 2013, on the GPU and hand 500+ human-authored games to tree search, reinforcement learning, and large language models. Read as a designer, the core is one observation: 'solvable by a machine' and 'interesting to a human' are not the same thing. Tree search brute-forces simple games but stalls as soon as they get richer; LLMs score 0% on most games. The authors even note that PuzzleScript's own creator hesitated to embed an auto-solver in the IDE, which reads as a caution against measuring difficulty by search.
"Difficulty is structural" — a study that exactly decomposes the difficulty of arithmetic puzzles (4OPS, arXiv / accepted at AIED 2026, March 2026)
One article today. Yunus E. Zeytuncu's paper "4OPS: Structural Difficulty Modeling in Integer Arithmetic Puzzles" (University of Michigan-Dearborn; preprint from March 2026, accepted at AIED 2026) studies the Countdown / Des chiffres et des lettres style numbers puzzle, where you combine given integers with the four operations to reach a target. Using an exact dynamic-programming solver over 3.4 million instances, the author shows that difficulty is not explained by surface features (the size of the numbers or the target) but is fully determined by the number of inputs a minimal solution must use, a 'minimal sufficient statistic' for difficulty. I read it less as player criticism than as a piece that speaks directly to how designers can define and sequence puzzle difficulty.
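To make "the number of inputs a minimal solution must use" concrete, here is a minimal sketch of a subset dynamic program for this kind of puzzle. It is not the paper's solver: the function names are hypothetical, and the rules are assumed to be Countdown-style (positive inputs, each used at most once, subtraction must stay positive, division must be exact).

```python
def reachable_by_subset(numbers):
    """Map each non-empty subset of inputs (as a bitmask) to the set of
    values obtainable by using every number in that subset exactly once.
    Assumes all inputs are positive integers (Countdown-style rules)."""
    n = len(numbers)
    reach = {}
    for mask in range(1, 1 << n):
        # A single input reaches only itself.
        if mask & (mask - 1) == 0:
            reach[mask] = {numbers[mask.bit_length() - 1]}
            continue
        values = set()
        # Enumerate every split of the subset into two non-empty halves.
        sub = (mask - 1) & mask
        while sub:
            rest = mask ^ sub
            if sub < rest:  # visit each unordered split once
                for a in reach[sub]:
                    for b in reach[rest]:
                        hi, lo = max(a, b), min(a, b)
                        values.add(a + b)
                        values.add(a * b)
                        if hi > lo:            # keep results positive
                            values.add(hi - lo)
                        if hi % lo == 0:       # exact division only
                            values.add(hi // lo)
            sub = (sub - 1) & mask
        reach[mask] = values
    return reach


def min_inputs_needed(numbers, target):
    """Smallest number of inputs any solution must use, or None if the
    target cannot be reached under the assumed rules."""
    reach = reachable_by_subset(numbers)
    sizes = [bin(mask).count("1")
             for mask, values in reach.items() if target in values]
    return min(sizes) if sizes else None


if __name__ == "__main__":
    print(min_inputs_needed([1, 2, 3, 4], 24))   # 3, e.g. 2 * 3 * 4
    print(min_inputs_needed([1, 2, 3, 4], 100))  # None (unreachable)
```

With inputs 1, 2, 3, 4, the target 12 needs two inputs (3 × 4) while 24 needs three (2 × 3 × 4), even though both targets look similar on the surface. That gap between surface size and structural depth is the kind of distinction the essay is pointing at.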