PAPER-DIGEST · 2026-06-14
Kar: Using Autonomous Agents to Check at Runtime Whether Generated Levels Are Actually Playable — Fukai Reads
Procedural Content Generation / runtime evaluation / autonomous agents
TL;DR
Today I introduce a paper about a mechanism for checking, at runtime and without pausing the game, whether a course produced by PCG (Procedural Content Generation, the practice of building game courses and terrain with algorithms rather than by hand) is actually traversable from start to finish. Rather than splitting generation and validation into separate stages, the author makes them live inside the same game loop.
The system is called Momentum. It is an endless runner (a game where you keep running forward) in which two autonomous agents (self-directed inspectors) run a little ahead of the player and check, in advance, whether the upcoming path is blocked or contains obstacles placed in an impassable way. When a problem is found it is recorded, and depending on the settings the obstacle can be removed.
To state the conclusion up front: this is a paper that demonstrates the design philosophy of "validate while you generate" rather than "generate and you're done," backed by an implementation and by structural estimates derived from the code. It is most accurately read not as a player study, but as a paper that reasons out how much the design can guarantee.
Introduction
The author is Rishabh Kar, of the Department of Informatics at King's College London. The work is an arXiv preprint (arXiv:2605.01783v1, submitted 3 May 2026, cs.AI), and at this point it has not passed peer review (prior expert review). It should therefore be read as something "just posted, not yet widely discussed."
Why did I pick it today? I have a habit of scanning the arXiv new listings each morning, and most PCG papers lean toward "how to make varied, interesting courses." This one steps slightly aside from that and faces head-on a plainer but more practical question: how to guarantee that the course you built is not broken. For people who actually ship games, this question is often the more pressing one.
The other thing that drew me in is that it is implemented as a game that actually runs on Unity (a widely used game engine). It is built not just from theory but from tools you can touch directly in practice, such as the engine's NavMesh (a traversable surface for pathfinding, explained later) and ray casting.
Background
PCG began in early games such as Rogue as a trick for building large worlds within limited memory. Today it auto-generates terrain, levels, vegetation, weather and more, and it underpins the "different world every time" feel of titles like Minecraft and No Man's Sky. Against this history, the author stresses one weakness of PCG.
That weakness is this: automatic generation guarantees neither that the result "looks coherent" nor that it is "actually playable." An algorithm can place an obstacle where it blocks the path, or produce a stretch that simply cannot be cleared. And because randomness is involved, reproducing a defect requires reproducing the exact seed (the random seed) and parameters that caused it, which makes debugging hard.
Hence the starting point: generation needs validation. Conventional validation means either checking a level statically after it is finished or having an autonomous agent play through it to confirm it works. But in a game that keeps running, the course is still being built while the player advances through it. There is no room to validate "all at once, later." This is the motivation for validating at runtime.
Approach / Method
Let me follow the author's method in plain terms. Momentum is a 3D endless runner whose ground is streamed forward by chaining together tiles of a fixed length (96 metres in the paper). For placing objects it does not use full Wave Function Collapse (WFC, a method that decides a value for each cell one at a time while keeping it compatible with its neighbours) but a simplified version that borrows only its idea. The lateral row is split into cells; when an obstacle is placed, its neighbouring cells are marked occupied to prevent clustering; and a traversable lane is always kept open via a dedicated clearance parameter.
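To make the placement rule concrete, here is a minimal sketch of one lateral row under my own assumptions. The function name place_row, the cell counts and the random choice of lane position are all hypothetical; the paper's Unity code is not reproduced here. It only illustrates the three ideas from the text: cells, neighbour marking, and a lane held open by a clearance parameter.

```python
import random

def place_row(num_cells, requested, clearance, rng):
    """Place up to `requested` obstacles in one lateral row (illustrative only)."""
    # Reserve a contiguous open lane of `clearance` cells at a random offset.
    lane_start = rng.randrange(num_cells - clearance + 1)
    unavailable = set(range(lane_start, lane_start + clearance))
    obstacles = []
    candidates = [c for c in range(num_cells) if c not in unavailable]
    rng.shuffle(candidates)
    for cell in candidates:
        if len(obstacles) >= requested:
            break
        if cell in unavailable:
            continue  # already excluded by an earlier neighbour marking
        obstacles.append(cell)
        # Mark the cell and both neighbours as occupied to prevent clustering.
        unavailable.update((cell - 1, cell, cell + 1))
    lane = (lane_start, lane_start + clearance - 1)
    return sorted(obstacles), lane
```

Passing a seeded generator such as random.Random(42) makes the row reproducible, which is exactly the property the debugging discussion above depends on.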
The crucial validation is handled by two agents. The first is an "aerial scanner" that travels overhead, combining ray casting (firing an invisible line into the scene and reporting the first surface it hits), volumetric sweeps (which test whether a box-shaped 3D region overlaps any solid object), and a filter that looks only at the obstacle layer, to inspect the geometric clearance of the corridor ahead. The second is a ground "traversal agent" that checks the same stretch from the standpoint of the NavMesh (the traversable surface the engine provides for pathfinding) to confirm it can actually be passed.
The point is that the two eyes inspect different properties. The aerial agent looks at "is the space physically open," while the ground agent looks at "can it actually be walked as a route." When a blockage is found, the system records the block details, the player's state, the generation parameters and the offending object, and compiles them into a report (PDF export is supported) for later analysis. Depending on the settings, the aerial scanner can also remove an obstacle before the player reaches it.
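Why two inspectors rather than one? A toy model helps. In the sketch below I replace ray casting and the NavMesh with a plain occupancy grid (rows run forward, columns run sideways, 1 means obstacle). This is a swapped-in simplification of my own, not the author's implementation: aerial_clear stands in for the geometric clearance check, and ground_reachable stands in for the route check using breadth-first search.

```python
from collections import deque

def aerial_clear(grid, clearance):
    """Geometric check: every row has `clearance` contiguous free cells."""
    for row in grid:
        run = best = 0
        for cell in row:
            run = run + 1 if cell == 0 else 0
            best = max(best, run)
        if best < clearance:
            return False
    return True

def ground_reachable(grid):
    """Route check: BFS from any free cell in the first row to the last row."""
    rows, cols = len(grid), len(grid[0])
    queue = deque((0, c) for c in range(cols) if grid[0][c] == 0)
    seen = set(queue)
    while queue:
        r, c = queue.popleft()
        if r == rows - 1:
            return True
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c - 1), (r, c + 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in seen:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return False

# Each row has an opening, but the openings do not connect.
staggered = [[0, 1, 1],
             [1, 1, 0]]
# A walkable route exists, but it is narrower than a clearance of 2.
narrow = [[0, 1, 1],
          [0, 1, 1]]
```

With a clearance of 1, the staggered grid passes the aerial check and fails the ground check. With a clearance of 2, the narrow grid fails the aerial check and passes the ground check. Neither check subsumes the other, which is the reason for running both.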
The NavMesh is continuously re-baked asynchronously for 600 metres ahead of the player and 50 metres behind, so that it stays consistent with the streaming world. To tolerate transient gaps in the ground, up to nine respawns are allowed per run. The paper describes such details faithfully in line with the code.
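The re-bake window is easy to picture as arithmetic over tile indices. The sketch below uses the three figures given in the text (96-metre tiles, 600 metres ahead, 50 metres behind); the indexing scheme, in which tile i covers the range from i × 96 up to (i + 1) × 96 metres starting at zero, and the function name are my assumptions.

```python
import math

TILE_LENGTH = 96.0  # metres, from the paper
AHEAD = 600.0       # metres baked ahead of the player, from the paper
BEHIND = 50.0       # metres baked behind the player, from the paper

def tiles_in_bake_window(player_z):
    """Return the tile indices overlapped by the re-bake window (assumed indexing)."""
    first = math.floor((player_z - BEHIND) / TILE_LENGTH)
    last = math.floor((player_z + AHEAD) / TILE_LENGTH)
    # Tiles before the start of the course do not exist, so clamp at zero.
    return range(max(first, 0), last + 1)
```

At player_z = 0 the window covers tiles 0 through 6. The window size stays constant however far the player runs, so under this model the baking work does not grow with distance travelled.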
Findings
First I want to be precise about what "results" means in this paper. The author did not run a large player study; instead, the system's behaviour is estimated from the code itself and from first principles (reasoning it out from basic premises). Following the taxonomy of Cook and colleagues, the evaluation is organised along four axes: playability, diversity, controllability and performance.
On controllability there is an interesting observation. Even when the slider that raises obstacle density is pushed to its maximum, the number actually placed saturates well short of the requested value, the author states. The reason is that the lane-clearance requirement and the "fill the neighbour" constraint bind first. The author calls this "a structural ceiling rather than a parametric one." In other words, the density knob has a ceiling that the design simply cannot exceed.
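The saturation can be reasoned out with a small calculation. The sketch below is a toy model of my own, not the paper's numbers: a row of num_cells cells, a contiguous open lane of clearance cells, and the rule that no two obstacles may sit in adjacent cells. Under those assumptions a stretch of m free cells holds at most ceil(m / 2) obstacles, which gives a hard upper bound regardless of the requested density.

```python
import math

def structural_ceiling(num_cells, clearance):
    """Largest obstacle count any lane position allows in this toy model."""
    best = 0
    for lane_start in range(num_cells - clearance + 1):
        left = lane_start                            # free cells left of the lane
        right = num_cells - clearance - lane_start   # free cells right of the lane
        best = max(best, math.ceil(left / 2) + math.ceil(right / 2))
    return best

def max_placeable(requested, num_cells, clearance):
    """Upper bound on obstacles placed for a given request."""
    return min(requested, structural_ceiling(num_cells, clearance))
```

In this model, with 9 cells and a clearance of 2 the ceiling is 4, so requests of 5, 6 or 7 all place at most 4. A random greedy placement can land below this bound, so the ceiling is an upper limit rather than a prediction.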
On performance, the author gives an estimate that the cost for an agent to inspect one segment stays within a small constant that does not grow as the run lengthens. This is because the inspection is gated by an integer tile index, and the heavy volumetric sweep is amortised across many ray probes. The author discusses this against Unity's frame budget of 16.66 ms per frame for 60 FPS and 33.33 ms for 30 FPS.
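The gating idea can be sketched in a few lines. The class below is hypothetical (GatedInspector and its callback are my own names): it runs the expensive inspection only when the agent crosses into a tile with a higher integer index, so the number of inspections depends on tiles entered rather than on frames rendered. The frame budget helper is plain arithmetic.

```python
TILE_LENGTH = 96.0  # metres, from the paper

class GatedInspector:
    """Run `inspect(tile_index)` once per newly entered tile (illustrative)."""

    def __init__(self, inspect):
        self.inspect = inspect
        self.last_tile = -1  # no tile inspected yet

    def update(self, agent_z):
        tile = int(agent_z // TILE_LENGTH)
        if tile <= self.last_tile:
            return False  # same tile as before: skip the expensive work
        self.last_tile = tile
        self.inspect(tile)
        return True

def frame_budget_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps
```

Stepping the agent one metre per frame for 300 frames, starting at zero, triggers four inspections (tiles 0 to 3) rather than 300. Note that this sketch inspects only the tile the agent currently occupies; an agent that jumps over a whole tile in one frame would need extra handling.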
On coverage, the author reasons that because the aerial and ground agents catch different defects, the union of their two reports is necessarily larger than either alone. The degree of blockage is quantified as "the fraction of inspected segments that were blocked" (I omit the formula itself, but the idea is a plain division). What I want to stress here is that the first question (RQ1), whether validation reduces blockages, is posed as a hypothesis, not as a measured result.
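The "plain division" and the union argument fit in a few lines. The segment numbers below are made up for illustration and are not results from the paper; blocked_fraction is my own name for the metric described in the text.

```python
def blocked_fraction(inspected, blocked):
    """Fraction of inspected segments that were reported as blocked."""
    if not inspected:
        return 0.0
    return len(blocked & inspected) / len(inspected)

# Hypothetical reports: each set holds the indices of blocked segments.
inspected = set(range(20))
aerial_report = {3, 7}
ground_report = {7, 12}
combined = aerial_report | ground_report  # union of both agents' findings
```

Here each agent alone reports 2 of 20 segments, while the union reports 3 of 20. The union gains over a single report only to the extent that the two agents catch defects the other misses.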
Where to use it
Let me list concretely how a game maker could use this paper. First, if you are building PCG for an endless runner or a hyper-casual game (a lightweight mobile game played in short bursts), you can borrow the design of dropping a traversability check right after generation, abandoning the idea that "validation is a separate stage." The way it places a safety valve, removing blockages before the player arrives, is a useful implementation reference.
Second, if you are building a Sokoban-like (a box-pushing puzzle) or a roguelite (a genre of repeatedly clearing auto-generated dungeons), you can apply the idea of "confirm reachability with a scouting agent," replacing ray casting with a search solver (a mechanism that actually searches to judge whether something is solvable). The core of this paper is less the technique itself than the design pattern of "putting generation and validation in the same loop," and that pattern ports easily across genres.
Third, the crash-report design is practical. A mechanism that records the seed, parameters, offending object and player state all together when a blockage occurs directly helps with bug reports for PCG, which is hard to reproduce because of randomness. It is worth adopting in your own project as a habit of logging generation defects "in a reproducible form."
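If you want to adopt the habit in your own project, a record like the following is one possible shape. Every field name here is my assumption, chosen to mirror the items listed in the text (seed, parameters, offending object, player state); the paper's actual report format and its PDF export are not reproduced.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class BlockageReport:
    """One reproducible record of a generation defect (hypothetical schema)."""
    seed: int                # random seed that produced the level
    tile_index: int          # where the blockage was found
    generation_params: dict  # e.g. density and clearance settings
    offending_object: str    # identifier of the blocking obstacle
    player_state: dict       # e.g. position and speed at detection time
    detected_by: str         # "aerial" or "ground"

def to_json_line(report):
    """Serialise a report as one JSON line, suitable for an append-only log."""
    return json.dumps(asdict(report), sort_keys=True)

example = BlockageReport(
    seed=1234,
    tile_index=17,
    generation_params={"density": 0.8, "clearance": 2},
    offending_object="crate_03",
    player_state={"z": 1580.0, "speed": 12.5},
    detected_by="ground",
)
```

The point of keeping the seed and parameters in the same record as the symptom is that a single log line is then enough to regenerate the failing stretch.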
Fourth, the finding that the control knob has a structural ceiling is a lesson for tuning work. If you run into the phenomenon that raising the density parameter has no effect, you can suspect that it is not a bug but a tug-of-war between constraints.
Limitations
Let me state the limitations, separating what the author acknowledges from what I noticed in reading. First, an important framing on the author's side: the paper's figures are not obtained from a study with human players but are structural estimates derived from the code and from reasoning (this is not my paraphrase; the author states "from first principles" in the abstract). So there is no measured value here for "what percentage of blockages were reduced in actual play."
What I would point out myself is the narrowness of scope. The target is a lane-based endless runner of fixed, height-invariant tiles. To handle more complex notions of "playability," such as terrain undulation or whether a puzzle is logically solvable, ray casting and NavMesh alone would likely not suffice. The author, too, uses WFC only as a placement idea and not as a full constraint solver.
One more thought from my reading: the "evaluation" here leans toward reachability (is it blocked or not) and does not step into experiential quality, such as whether the course is fun or appropriately difficult. "Not broken" and "good" are separate problems, and this paper addresses the former. It is also worth keeping in mind that this is a pre-review preprint that has not yet accrued citations.
Fukai's Reading
From here on is my own reading. I want to place this study within the drift of PCG's interest from "how richly can we generate" toward "how do we trust what we generated." In the vocabulary of design criticism, it can be read as an attempt to fold part of playtesting (the process of having people actually play and check) into the same timeline as generation. The stance of making validation a resident component of the generation loop, rather than a gate in a later stage, matters more than the details of the technique, as I read it. That said, whether it truly works should eventually be confirmed by validation that involves people.
Closing
Finally, let me hand over a map. If you want to know PCG evaluation more systematically, reading Cook, Withington and Tokarchuk's "On the Evaluation of Procedural Level Generation Systems" — which this paper itself relies on — gives you a layout of the four axes (playable, diverse, controllable, performant). If you want to know WFC's lineage, Karth and Smith's argument that "WFC is constraint solving in the wild" is a good starting point.
And the idea of putting generation and validation in one loop is not limited to endless runners. If you pull it toward whatever you are building now and ask again, "the moment I generate it, can I assert it is not broken?", the reach of this paper should come into view. I asked myself the same this morning, as I finished a cup of strong drip coffee.
References
Papers and related materials referenced in this article:
・Related (cited by this paper for its four-axis evaluation framework): On the Evaluation of Procedural Level Generation Systems — Cook, Withington & Tokarchuk (from this paper's references)
・Related (a starting point for understanding WFC): WaveFunctionCollapse is Constraint Solving in the Wild — Karth & Smith, 2017 (from this paper's references)