TAG
#procedural-generation
0 reviews · 10 essays
Related essays
Where 'Solvable' and 'Fun' Diverge — PuzzleJAX Hands 500+ PuzzleScript Games to the Machines (arXiv, Aug 2025)
One article today: "PuzzleJAX: A Benchmark for Reasoning and Learning" (arXiv preprint, August 2025) by researchers at NYU, the University of Malta, the University of the Witwatersrand (South Africa), and Microsoft (Sam Earle, Graham Todd, Ahmed Khalifa, Julian Togelius and others). They reimplement PuzzleScript, the puzzle-authoring language Stephen Lavelle (increpare) released in 2013, on the GPU and hand 500+ human-authored games to tree search, reinforcement learning, and large language models. Read as a designer, the core is one observation: 'solvable by a machine' and 'interesting to a human' are not the same thing. Tree search brute-forces simple games but stalls the moment they get richer; LLMs score 0% on most games. The authors even note that PuzzleScript's own creator hesitated to embed an auto-solver in the IDE, a caution against measuring difficulty by search.
Nasir et al.: Evolving the Rules of Play Themselves — Fukai Reads MORTAR
A paper on automatic game design by Nasir, Togelius and colleagues. Instead of levels, MORTAR evolves game mechanics themselves using a quality-diversity algorithm paired with a large language model, judging quality by whether stronger AI agents reliably beat weaker ones. Running on GPT-4o-mini, it generates diverse, playable games and even quantifies each mechanic's contribution.
Jiang et al.: Can a Sentence Build a Playable Game? — Fukai Reads OpenGame
A paper by Yilei Jiang et al. (CUHK) on OpenGame, an agent that generates whole 2D web games from natural language. Reusable skeletons and a 'living debug protocol' curb integration errors, and the agent sets a new state of the art across 150 tasks, though puzzles remained its weakest genre.
McConnell & Zhao: Generating Just-Right Puzzles in Real Time with a Genetic Algorithm — Fukai Reads
A paper by McConnell and Zhao on adaptive puzzle generation using a genetic algorithm. It generates Cosmic Express-style path puzzles in real time (about 7 seconds each) to match a player model built from how the player solves them, and shows in an 18-person study that a version adapting on time alone lags on felt difficulty and sense of progression.
Li et al.: Can LLMs Play and Beat 2D Games? - Fukai Reads GVGAI-LLM
A paper by Li et al. (NYU and others) proposing GVGAI-LLM, a benchmark that has language models play 118 2D games to measure reasoning and spatial grounding. With boards translated into ASCII maps and solved zero-shot, GPT-4o-mini scored 0% on 477 of 540 levels and reached a 10.27% overall win rate, falling short of classic search algorithms. I unpack it as problem, method, findings, use cases, and limitations.
Kar: Using Autonomous Agents to Check at Runtime Whether Generated Levels Are Actually Playable — Fukai Reads
A PCG (procedural content generation) paper by Rishabh Kar of King's College London. It proposes Momentum, a mechanism that validates whether a generated course is actually traversable inside the same runtime loop, without pausing the game. Two autonomous agents run ahead of the player and inspect the path via geometric checks from the air and NavMesh checks on the ground. The evaluation is presented as structural estimates derived from the code.
Xu et al.: Promoting Game Mechanics to Coordinates to Generate Solvable Levels — Fukai Reads
A PCG (level generation) paper by Xu and Verbrugge of McGill University. Against geometry-first prior methods, it proposes HDPCG, which runs pathfinding on a dimension-expanded graph that promotes mechanics such as gravity inversion and moving platforms to coordinates, guaranteeing solvability during generation, and reproduces playable levels in Unity.
Feng et al.: Can AI Generate Counter-Intuitive Chess Puzzles? — Fukai Reads
A study, led by a Google DeepMind team, on generating creative chess puzzles with AI. A generative model trained on Lichess data is tuned with reinforcement learning, raising the rate of counter-intuitive puzzles from 0.22% to 2.5% (about tenfold). The highlight is how they reduce creativity to numbers a machine can measure.
Can AI Build a Whole Puzzle Game? ScriptDoctor and Its Generate-Playtest-Repair Loop
ScriptDoctor has a large language model write an entire puzzle game — rules, sprites, levels — then lets a compiler and a search-based agent inspect the result and demand revisions. The testbed is PuzzleScript, a language indie developers know well. I walk through the paper in five parts — problem, method, findings, where you can use it, limitations — covering why human-authored examples boost success rates, why reasoning models win, and the distance between 'solvable' and 'fun'.
How to make 'just-right' difficulty — letting a machine fit it to the player (a Canadian study) vs. a human authoring it through meaning (a US developer)
A version rebuilt with credible sources only. Two pieces today, both answering 'how do you deliver just-right difficulty?' from opposite directions. The first is a research paper by Canadian researchers Matthew McConnell and Richard Zhao (September 2025, arXiv): a system that generates puzzles in real time with a genetic algorithm and auto-tunes difficulty per player, validated in a user study. Its key finding: using 'time-on-task' alone as the adaptivity metric fails. The second is an interview with game designer Michael Hicks (Game Developer), who argues that churning out hard, time-consuming puzzles is easy; the truly hard part is finding interesting ideas to explore. One is a machine fitting difficulty to the player, the other a human authoring difficulty through meaning. The sources are peer-reviewed research and professional media respectively, the kind makers can cite with confidence.