I finished Tactical Breach Wizards. It’s a lovely game with a fun setting, sardonic dialogue, and excellently paced progression. Its gameplay scratches my XCOM itch: you have up to 5 characters moving through a small grid and using their abilities to deal with enemies, à la Chimera Squad but with greater variety and more targeted puzzles. It is also quite similar to Into the Breach, though again I find the curated levels more rewarding than the procedurally generated stuff.
So you might have the grizzled Zan prepare a “Predictive Shot”; then Jan pops around the board and uses her chain shot ability to push an enemy into Zan’s path to be vaporized, and then pushes another enemy out the window (racking up another defenestration). These missions are puzzles in disguise, and to make it even more fun you have sub-objectives which keep you from cheesing the scenarios.
As I played I thought about how one would write a program to solve the puzzles, and how that compares to my own cognition. The obvious brute-force approach would be a full tree search, trying every single sequence of actions. This could probably work. There is a pretty big branching factor: each character can move to maybe 25 tiles and take one action from a couple of choices at maybe half a dozen enemies, so roughly 25 × 2 × 6 = 300 options per character. But with most of the missions being fairly straightforward, the depth is quite limited. Indeed the game naturally suggests this tree exploration: it provides an undo button you can use liberally, which effectively pops you up a node in the tree.
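Here is a minimal sketch of that brute-force search, with the undo button played by the recursion returning to its parent node. Everything in it is a made-up stand-in rather than the game's actual rules: a one-dimensional corridor of five tiles with a window past the last one, a single wizard, a single enemy, and a hypothetical `push` that knocks an adjacent enemy two tiles to the right.

```python
from typing import Iterator, List, Optional, Tuple

WIDTH = 5  # tiles 0..4; anything pushed past tile 4 goes out the window

# (wizard tile, enemy tile), with None meaning the enemy was defenestrated.
State = Tuple[int, Optional[int]]


def actions(state: State) -> Iterator[Tuple[str, State]]:
    """Yield every legal (action name, resulting state) pair from a state."""
    wizard, enemy = state
    for name, step in (("left", -1), ("right", 1)):
        target = wizard + step
        # The wizard stays on the board and cannot walk through the enemy.
        if 0 <= target < WIDTH and target != enemy:
            yield name, (target, enemy)
    # Toy rule: push an enemy standing directly to the right two tiles over.
    if enemy is not None and enemy == wizard + 1:
        pushed = enemy + 2
        yield "push", (wizard, pushed if pushed < WIDTH else None)


def solve(state: State, depth: int) -> Optional[List[str]]:
    """Depth-limited exhaustive search; returns a list of actions or None."""
    if state[1] is None:
        return []  # enemy is out the window: solved
    if depth == 0:
        return None  # out of budget on this branch
    for name, nxt in actions(state):
        rest = solve(nxt, depth - 1)
        if rest is not None:
            return [name] + rest
    return None  # every child failed: "undo" and let the parent try its next move


plan = solve((0, 2), depth=5)
```

Even in this tiny corridor the search wanders back and forth before it finds a plan, which is exactly the waste a smarter exploration strategy would try to cut.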
We could of course be slightly smarter about how we spend our exploration budget. This reminds me of Antithesis’s series on playing NES games using their testing framework. They use a hypervisor to turn playing the game into a tree search: they can record and restart the program from any point and provide new inputs. Their testing framework then provides an exploration strategy for which states to continue exploring from. With ever smarter exploration strategies they can beat an impressive series of games.
You could imagine a similar approach here, setting up heuristics to explore the tree. Q-learning is one way to pick the most promising paths down the tree. You use reinforcement learning to estimate the expected reward of taking a given move from a given state: in this case, whether we go on to solve the puzzle. Then when exploring the tree you prioritize moves which have higher values of Q.
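A rough sketch of what that could look like, using plain tabular Q-learning. The puzzle is an invented toy (a five-tile corridor, one wizard, one enemy, a hypothetical `push` that moves an adjacent enemy two tiles right), and the reward of 1 for a defenestration, the learning rate, the discount, and the purely random exploration are all my assumptions, not anything from the game or from a particular library.

```python
import random
from typing import Dict, Iterator, List, Optional, Tuple

WIDTH = 5  # tiles 0..4; anything pushed past tile 4 goes out the window
State = Tuple[int, Optional[int]]  # (wizard tile, enemy tile or None if gone)
QTable = Dict[Tuple[State, str], float]


def actions(state: State) -> Iterator[Tuple[str, State]]:
    """Yield every legal (action name, resulting state) pair from a state."""
    wizard, enemy = state
    for name, step in (("left", -1), ("right", 1)):
        target = wizard + step
        if 0 <= target < WIDTH and target != enemy:
            yield name, (target, enemy)
    if enemy is not None and enemy == wizard + 1:
        pushed = enemy + 2
        yield "push", (wizard, pushed if pushed < WIDTH else None)


def train_q(start: State, episodes: int = 500, max_steps: int = 30,
            alpha: float = 0.5, gamma: float = 0.9, seed: int = 0) -> QTable:
    """Learn Q(state, action) from random play (Q-learning is off-policy)."""
    rng = random.Random(seed)
    q: QTable = {}
    for _ in range(episodes):
        state = start
        for _ in range(max_steps):
            name, nxt = rng.choice(list(actions(state)))
            solved = nxt[1] is None
            reward = 1.0 if solved else 0.0
            # Best value reachable from the next state (zero once solved).
            future = 0.0 if solved else max(
                (q.get((nxt, a), 0.0) for a, _ in actions(nxt)), default=0.0)
            old = q.get((state, name), 0.0)
            q[(state, name)] = old + alpha * (reward + gamma * future - old)
            if solved:
                break
            state = nxt
    return q


def solve_guided(state: State, depth: int, q: QTable) -> Optional[List[str]]:
    """Depth-limited search that tries the highest-Q moves first."""
    if state[1] is None:
        return []
    if depth == 0:
        return None
    ranked = sorted(actions(state),
                    key=lambda move: q.get((state, move[0]), 0.0),
                    reverse=True)
    for name, nxt in ranked:
        rest = solve_guided(nxt, depth - 1, q)
        if rest is not None:
            return [name] + rest
    return None


q_table = train_q((0, 2))
guided_plan = solve_guided((0, 2), depth=8, q=q_table)
```

The search itself is the same depth-limited tree walk as before; the only change is the order in which children are tried, so a good Q table means the first branch explored is usually the right one and the undo path is rarely taken.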
The problem with Q-learning is that formulating the reward is tricky. It would work for one level, or maybe one level type. But then when faced with a different objective, all your past learning won’t help. Your best bet is maybe some transfer learning, but you’d still need to retrain on the new objective. More damningly, you also have to relearn each time you get a new or upgraded ability.
As I play the game I find myself approaching it from a different angle. Instead of starting at the top of the tree I look for intermediate states, often ones suggested by the sub-objectives: “I need to be here to switch this terminal but can’t be shot by this guy but there is a nice window he could be defenestrated from but only Jan has sufficient pushback to do it.” I’d label this the strategy layer: very similar to Q-learning, I am prioritizing states, but at the start rather than as I go down the tree. This gives me a natural set of subgoals to try to achieve in the “tactics” phase, where I do more brute-force checking of how to reach those states. There is natural slippage: I just need to be in a state close enough to my target state, and what counts as close enough often reveals itself further into this process: “maybe that last enemy can be shot by Zan because I need the mana boost.”
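A hedged sketch of that two-layer idea: the strategy layer is just an ordered list of subgoal predicates chosen up front, and the tactics layer is a shallow breadth-first search that reaches each one in turn. The corridor, the `push` rule, and the two subgoals are invented for illustration. Writing subgoals as predicates rather than exact states is my stand-in for the slippage: any state that satisfies the predicate is close enough.

```python
from collections import deque
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

WIDTH = 5  # tiles 0..4; anything pushed past tile 4 goes out the window
State = Tuple[int, Optional[int]]  # (wizard tile, enemy tile or None if gone)
Goal = Callable[[State], bool]


def actions(state: State) -> Iterator[Tuple[str, State]]:
    """Yield every legal (action name, resulting state) pair from a state."""
    wizard, enemy = state
    for name, step in (("left", -1), ("right", 1)):
        target = wizard + step
        if 0 <= target < WIDTH and target != enemy:
            yield name, (target, enemy)
    if enemy is not None and enemy == wizard + 1:
        pushed = enemy + 2
        yield "push", (wizard, pushed if pushed < WIDTH else None)


def reach(start: State, goal: Goal,
          max_depth: int) -> Optional[Tuple[List[str], State]]:
    """Tactics: shallow breadth-first search to any state satisfying goal."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal(state):
            return plan, state
        if len(plan) == max_depth:
            continue  # budget spent on this branch
        for name, nxt in actions(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [name]))
    return None


def plan_via_subgoals(start: State, subgoals: Sequence[Goal],
                      max_depth: int) -> Optional[List[str]]:
    """Strategy: chain shallow searches through the subgoals in order."""
    state, full_plan = start, []
    for goal in subgoals:
        found = reach(state, goal, max_depth)
        if found is None:
            return None  # this subgoal ordering does not work; rethink it
        plan, state = found
        full_plan += plan
    return full_plan


# Hypothetical subgoals: get the enemy next to the window, then push him out.
subgoals = [lambda s: s[1] == 4, lambda s: s[1] is None]
staged_plan = plan_via_subgoals((0, 2), subgoals, max_depth=3)
direct = reach((0, 2), lambda s: s[1] is None, max_depth=3)
```

With a budget of three moves per search, going straight for the final objective fails, while the staged version succeeds because each leg is short. The catch, of course, is that picking good subgoals is the part I do by intuition and the sketch takes as given.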
What this exercise has revealed to me is that I definitely need to read up more on reinforcement learning and game AI.