TAG
#research
0 reviews · 10 essays
Related essays
Li et al.: AutoBG, an AI that supports board game design end-to-end from ideation to finish — Fukai Reads
A paper (arXiv preprint) by Zizhen Li et al. on AutoBG, a board game design assistant that covers the whole workflow: ideation, rulebook generation, and individualized feedback. It relies on Verifier-Gated Iteration, which separates the generator from the critic; the critic, BG-Critic, is reported to outperform GPT-5.4 on diagnostic quality.
Nasir et al.: Evolving the Rules of Play Themselves — Fukai Reads MORTAR
A paper on automatic game design by Nasir, Togelius and colleagues. Instead of levels, MORTAR evolves game mechanics themselves using a quality-diversity algorithm paired with a large language model, judging quality by whether stronger AI agents reliably beat weaker ones. Running on GPT-4o-mini, it generates diverse, playable games and even quantifies each mechanic's contribution.
Jiang et al.: Can a Sentence Build a Playable Game? — Fukai Reads OpenGame
A paper by Yilei Jiang et al. (CUHK) on OpenGame, an agent that generates whole 2D web games from natural language. Reusable skeletons and a 'living debug protocol' curb integration errors, setting a new state of the art across 150 tasks, though puzzles remained its weakest genre.
McConnell & Zhao: Generating Just-Right Puzzles in Real Time with a Genetic Algorithm — Fukai Reads
A paper by McConnell and Zhao on adaptive puzzle generation using a genetic algorithm. It generates Cosmic Express-style path puzzles in real time (about 7 seconds each) to match a player model built from how the player solves puzzles, and shows in an 18-person study that a time-only version lags on felt difficulty and sense of progression.
Li et al.: Can LLMs Play and Beat 2D Games? - Fukai Reads GVGAI-LLM
A paper by Li et al. (NYU and others) proposing GVGAI-LLM, a benchmark that has language models play 118 2D games to measure reasoning and spatial grounding. With boards translated into ASCII maps and played zero-shot, GPT-4o-mini scored 0% on 477 of 540 levels and reached a 10.27% overall win rate, falling short of classic search algorithms. I unpack it as problem, method, findings, use cases, and limitations.
Kar: Using Autonomous Agents to Check at Runtime Whether Generated Levels Are Actually Playable — Fukai Reads
A PCG (procedural content generation) paper by Rishabh Kar of King's College London. It proposes Momentum, a mechanism that validates whether a generated course is actually traversable inside the same runtime loop, without pausing the game. Two autonomous agents run ahead of the player and inspect the path via geometric checks from the air and NavMesh checks on the ground. The evaluation is presented as structural estimates derived from the code.
Xu et al.: Promoting Game Mechanics to Coordinates to Generate Solvable Levels — Fukai Reads
A PCG (level generation) paper by Xu and Verbrugge of McGill University. In contrast to geometry-first prior methods, it proposes HDPCG, which runs pathfinding on a dimension-expanded graph that promotes mechanics such as gravity inversion and moving platforms to coordinates. This guarantees solvability during generation, and the paper reproduces playable levels in Unity.
Sun et al.: Why Do Players Lose Themselves in Punishingly Hard Games? — Fukai Reads
A paper by Sun et al. on difficulty design in Soulslike games. Through a qualitative analysis of 600 Steam reviews it asks why players immerse themselves in punishingly hard games, and proposes 'resilient flow' — absorption sustained by meaningfully framing frustration.
Feng et al.: Can AI Generate Counter-Intuitive Chess Puzzles? — Fukai Reads
A study, led by a Google DeepMind team, on generating creative chess puzzles with AI. A generative model trained on Lichess data is tuned with reinforcement learning, raising the rate of counter-intuitive puzzles from 0.22% to 2.5% (about tenfold). The highlight is how they reduce creativity to numbers a machine can measure.
Can AI Build a Whole Puzzle Game? ScriptDoctor and Its Generate-Playtest-Repair Loop
ScriptDoctor has a large language model write an entire puzzle game — rules, sprites, levels — then lets a compiler and a search-based agent inspect the result and demand revisions. The testbed is PuzzleScript, a language indie developers know well. I walk through the paper in five parts — problem, method, findings, where you can use it, limitations — covering why human-authored examples boost success rates, why reasoning models win, and the distance between 'solvable' and 'fun'.