TAG
#llm
0 reviews · 5 essays
Related essays
Li et al.: AutoBG, an AI that supports board game design end-to-end from ideation to finish — Fukai Reads
A paper (arXiv preprint) by Zizhen Li et al. on AutoBG, a board game design assistant that covers the whole workflow: ideation, rulebook generation, and individualized feedback. It relies on Verifier-Gated Iteration, which splits the generator from the critic; the critic, BG-Critic, is reported to outperform GPT-5.4 on diagnostic quality.
Nasir et al.: Evolving the Rules of Play Themselves — Fukai Reads MORTAR
A paper on automatic game design by Nasir, Togelius and colleagues. Instead of levels, MORTAR evolves game mechanics themselves using a quality-diversity algorithm paired with a large language model, judging quality by whether stronger AI agents reliably beat weaker ones. Running on GPT-4o-mini, it generates diverse, playable games and even quantifies each mechanic's contribution.
Jiang et al.: Can a Sentence Build a Playable Game? — Fukai Reads OpenGame
A paper by Yilei Jiang et al. (CUHK) on OpenGame, an agent that generates whole 2D web games from natural language. Reusable skeletons and a 'living debug protocol' curb integration errors, setting a new state of the art across 150 tasks, though puzzles remained its weakest genre.
Li et al.: Can LLMs Play and Beat 2D Games? - Fukai Reads GVGAI-LLM
A paper by Li et al. (NYU and others) proposing GVGAI-LLM, a benchmark that has language models play 118 2D games to measure reasoning and spatial grounding. With boards translated into ASCII maps and play attempted zero-shot, GPT-4o-mini scored 0% on 477 of 540 levels and reached a 10.27% overall win rate, falling short of classic search algorithms. I unpack it as problem, method, findings, use cases, and limitations.
Can AI Build a Whole Puzzle Game? ScriptDoctor and Its Generate-Playtest-Repair Loop
ScriptDoctor has a large language model write an entire puzzle game — rules, sprites, levels — then lets a compiler and a search-based agent inspect the result and demand revisions. The testbed is PuzzleScript, a language indie developers know well. I walk through the paper in five parts — problem, method, findings, where you can use it, limitations — covering why human-authored examples boost success rates, why reasoning models win, and the distance between 'solvable' and 'fun'.