Isla Verde
What it is testing
Isla Verde is a small interactive world built to probe a specific question: what happens to a game when the NPCs are not running scripted dialogue trees, but a language model that has been told who they are, what they know, and what they are trying to hide? You walk around a top-down map, interview characters, examine objects, and have to piece together what happened in a mystery whose suspects all have their own agendas.
The framing is a detective game because detective games are exactly the place this question gets interesting. Every conversation has stakes: you are not just chatting with an NPC, you are trying to extract a fact from a character who has a reason to lie. Whether a game-shaped LLM can hold up under that pressure is the experiment.
Bartering and persuading the model
What makes the project interesting to play, not just to watch, is that the LLMs do more than answer your questions. Each NPC has a private secret and a cooperation level that moves as you talk to them. Approach a suspect with bare accusations and they shut down. Bring up something they care about (their daughter, the broadcast tape, the chemical plant) and they soften. Confront them with a piece of evidence drawn from someone else's testimony and they may flip the entire scenario open.
That dynamic is the design centre of the project. You are bartering for information against a model that is trying to protect its character's secret, persuading it the way you would persuade a person, and watching it respond in kind. The richer the persuasion, the more cooperative the model gets. The flimsier the pretext, the more it stonewalls. It is the closest I have seen a generative NPC come to feeling like a real social interaction inside a game.
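One way to picture the cooperation level is as a clamped score that each player turn nudges up or down. The following is a minimal sketch, not the project's actual code: the class name, the 0 to 100 scale, and every delta are assumptions chosen for illustration, and it assumes some upstream step has already labelled the player's turn as an accusation, a mention of a cared-about topic, or a presentation of evidence.

```python
from dataclasses import dataclass


@dataclass
class NpcState:
    """Hypothetical per-NPC state; field names and numbers are illustrative."""
    name: str
    cared_topics: set
    cooperation: int = 30  # 0 = stonewalling, 100 = fully open (assumed scale)

    def nudge(self, delta: int) -> None:
        # Clamp so repeated pressure or rapport cannot leave the 0..100 range.
        self.cooperation = max(0, min(100, self.cooperation + delta))


def apply_player_turn(npc, *, accusation=False, topics=(), evidence_presented=False):
    """Update cooperation from one player turn and return the new level."""
    if accusation and not evidence_presented:
        npc.nudge(-20)  # a bare accusation makes the character shut down
    for topic in topics:
        if topic in npc.cared_topics:
            npc.nudge(15)  # touching something they care about softens them
    if evidence_presented:
        npc.nudge(40)  # concrete evidence is the strongest lever
    return npc.cooperation


suspect = NpcState("Example Suspect", {"daughter", "broadcast tape"})
```

In a setup like this, the resulting level would be passed into the model's prompt each turn (for example as "you are currently guarded" versus "you are starting to trust the player"), so the model still writes the line while the score shapes how forthcoming it is.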
Holding the model accountable
The risk with a free-form LLM detective is that the player can talk a model into agreeing with any plausible-sounding accusation. That kills the genre. Isla Verde addresses this by separating writing from judging. The LLM writes every line of dialogue; a structured case file (suspects, motives, evidence) judges the final accusation against the scenario's hidden truth. The model can be persuaded to say a lot of things, but it cannot decide who actually did it. That is fixed by the scenario, and whether an accusation holds depends on which evidence the player surfaced.
The split matters more than it sounds. It is what lets the world feel alive (the model is free to improvise) without letting the player win by simply being a good talker. Underneath the conversations is a deterministic logic problem; on top is an LLM-driven cast.
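The judging half of that split can be sketched as a pure function that never consults the model. This is an illustrative assumption about how such a check might look: the `CaseFile` fields, the identifiers in the example, and the rule that an accusation needs both the right suspect and a required set of evidence are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseFile:
    """Hypothetical hidden truth for one scenario."""
    culprit: str
    required_evidence: frozenset


def judge_accusation(case, accused, surfaced_evidence):
    """Return True only if the accusation matches the hidden truth
    and the player has surfaced every required piece of evidence."""
    if accused != case.culprit:
        return False
    # Subset check: extra evidence is fine, missing evidence is not.
    return case.required_evidence <= set(surfaced_evidence)


# Placeholder identifiers, not taken from the actual scenario.
case = CaseFile(culprit="suspect_a", required_evidence=frozenset({"tape", "ledger"}))
```

Because the function only sees the case file and the evidence log, no amount of persuasive dialogue can change its verdict.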
Building worlds and simulations this way
The longer-term reason I built this is that it points at a much larger design space than detective games. Once you accept that an NPC's behaviour can be specified as "a personality, a knowledge boundary, a memory, and a private agenda," you can build all kinds of worlds the same way. Town simulations where every shopkeeper has commercial interests they are trying to advance. Strategy games whose advisors will push you toward decisions that suit their own factions. Negotiation training tools whose counterparties have real reservation prices. Therapeutic role-play scenarios where a patient figure has a history they are reluctant to disclose.
The Phaser + Python wiring underneath is small enough that the actual creative work is in the scenario brief, which is a few hundred words of structured JSON. A new scenario takes me an evening. A new game genre, in principle, takes about the same time once the brief format is fixed. The bottleneck moves from engineering to writing, which is exactly the trade I want.
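To make the idea of a scenario brief concrete, here is a rough sketch of how a JSON brief could be loaded and turned into a per-character system prompt. The schema is an assumption: the keys (`personality`, `knows`, `memory`, `agenda`) mirror the four ingredients listed above but are not the project's real format, and the shopkeeper entry is invented for illustration.

```python
import json
from dataclasses import dataclass, field


@dataclass
class NpcBrief:
    """Hypothetical brief entry: personality, knowledge boundary, memory, agenda."""
    name: str
    personality: str
    knows: list
    agenda: str
    memory: list = field(default_factory=list)


def load_npcs(raw_json):
    """Parse a scenario brief and return one NpcBrief per character."""
    data = json.loads(raw_json)
    return [NpcBrief(**entry) for entry in data["npcs"]]


def build_system_prompt(npc):
    """Render the brief as instructions for the language model."""
    facts = "\n".join(f"- {fact}" for fact in npc.knows)
    return (
        f"You are {npc.name}. Personality: {npc.personality}\n"
        f"You only know the following facts:\n{facts}\n"
        f"Private agenda (never state it outright): {npc.agenda}"
    )


# Invented example entry; not part of any real scenario.
EXAMPLE_BRIEF = """
{"npcs": [{"name": "Shopkeeper",
           "personality": "friendly but evasive about money",
           "knows": ["The harbour closes at dusk"],
           "agenda": "steer customers toward the expensive stock"}]}
"""

npcs = load_npcs(EXAMPLE_BRIEF)
prompt = build_system_prompt(npcs[0])
```

With a loader along these lines, writing a new scenario means editing the JSON and leaving the code alone, which is where the shift from engineering to writing comes from.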
What I want to push on next
The piece I most want to improve is the planner inside each NPC. Right now characters answer turn-by-turn from a static brief, which means they react well but do not proactively pursue their goals. The next version gives them explicit goals (protect Caldwell, find out what the journalist knows, get the player to leave town) and a small per-scene action budget, so they can choose to lie, deflect, or change the subject in service of those goals rather than only respond to the question in front of them. That moves the project from "NPCs with good dialogue" toward something closer to a generative-agent setup, where the persuasion problem gets meaningfully harder because the cast is playing its own game in parallel with yours.
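As a rough illustration of the goals-plus-budget idea, the sketch below lets an NPC spend a limited number of evasive moves per scene when a question touches a topic one of its goals is protecting. Everything here is an assumption rather than the planned design: the class names, the fixed rotation of evasive actions, and the topic-matching rule are placeholders, and a real planner would likely ask the model to pick among the options.

```python
from dataclasses import dataclass

# Assumed action vocabulary, taken from the options named in the text.
EVASIVE_ACTIONS = ("deflect", "change_subject", "lie")


@dataclass
class Goal:
    description: str
    sensitive_topics: frozenset


@dataclass
class ScenePlanner:
    """Hypothetical per-scene planner with a small action budget."""
    goals: list
    budget: int = 2
    used: int = 0

    def choose_action(self, question_topics):
        """Pick an evasive action if a goal is threatened and budget remains."""
        threatened = any(g.sensitive_topics & set(question_topics) for g in self.goals)
        if threatened and self.budget > 0:
            # Rotate through the evasive options so the NPC does not repeat itself.
            action = EVASIVE_ACTIONS[self.used % len(EVASIVE_ACTIONS)]
            self.budget -= 1
            self.used += 1
            return action
        # No threatened goal, or budget exhausted: answer the question directly.
        return "answer"


planner = ScenePlanner([Goal("protect Caldwell", frozenset({"caldwell"}))], budget=2)
```

The chosen action would then be handed to the model as an instruction for that turn. The budget is what keeps an NPC from stonewalling forever: once it runs out, the character has to answer.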