Alloy GGP

UCT Charts

2017-03-12T23:36:00.000-07:00

Today’s topic is a little different: it’s as much an art project as it is engineering or science. Pretty graphs below, but first, an introduction.

UCT is messy by computational standards. It wends its way towards perfect play, but the answers you get depend on how long you waited before asking and how fast the hardware was running, not to mention a stream of pseudo-random numbers. An antivirus program running in the background, a network blip on another continent, bubbles in your CPU’s thermal paste – they can all affect what a general game player ends up returning as its move.

So what happens if we take some of that messiness away?

Most of these factors impact the general game player by increasing or decreasing the number of iteration UCT goes through in a game. Suppose we use a single-threaded player and execute an exact number of UCT iterations each turn; then the random numbers become the only factor that change the outcome.

All of a sudden, the choices of moves become probability distributions. And statistics gives us plenty of tools for understanding a distribution by sampling it.

(Technical note: There is still the potential problem of using a random number generator that is sufficiently flawed to skew the results. I took most of the samples in the following charts with Java’s default RNG, which is known to have problems. I’ll be taking future samples with a better source of randomness.)

So here’s a simple question based on that observation: How does the number of iterations of UCT by each player affect the average goal values for a given game? If I double one player’s time to think, how much does that improve their chances of winning?

Statistical theory says that if we get enough samples for each point of interest, we’ll get a decent approximation of that expected value. So we can get a good approximate answer to this question by simply running lots of games under these conditions; and GGP makes this easy to generalize across lots of different games.

Some notes on execution:

All the games shown here are two-player zero-sum games, as these lend themselves perfectly to this type of visualization. The axis on the left is the iteration count of the first player, and the axis on the right is the iteration count of the second player. (I’ve also tried this with some game-theoretic games, but those can only display the goal result of one player and so may be misleading. I’ve omitted them here.)

I've been playing around with the color scheme; currently I have a patriotic blue-to-white-to-red, where blue indicates the first player winning, red the second player winning, and white a draw (on average).

These graphs use a logarithmic scale. Each step to the right or down doubles the iteration count. The choice of 10 as a starting point is somewhat arbitrary, but partly it is because the results of UCT are not very interesting when there are fewer iterations than legal moves each turn.

These graphs may also be sensitive to the exact implementations of UCT used. The one used here is a fairly simple version without transposition tables, used in some versions of Alloy 0.10.

Each cell in each graph is the average of at least 93 sample matches.

So let’s take a look:

Tic-Tac-Toe: This is a good chart for seeing how intuition maps to the graph. The first player has a clear advantage, as we would expect; but we can also see the region of playing ability where the players will draw in every match.

Connect Four Suicide: If anything, I had expected the bias to be more obvious; but we can see that it reaches about a 4:1 time handicap by the bottom of the graph.

Connect Four: This game, on the other hand, does appear to give an advantage to the first player at this "skill level", though slighter. (With perfect play, this board size is known to favor the second player.)

Sheep and Wolf: Unfortunately, this doesn’t quite show what I really wanted to show: At some point, the second player should consistently get a forced win, resulting in a huge curve in the results. This graph would make it even more obvious that only the skill of the sheep player, not the wolf player, matters (as pointed out in the game’s Tiltyard statistics). However, this graph mostly just shows the region where the wolf player dominates.

Bidding Tic-Tac-Toe: This is by far the weirdest graph from this set. The second player gets better as expected until around 320-640 iterations, at which point they get worse again. I suspect that this comes down to the second player choosing an appropriate first move of the game in that middle section, then switching to favor a losing first move as its guess of X’s bid changes. (The game is a draw under perfect play by both players.)

As mentioned before, this first move would have a well-defined distribution for any given iteration count, and I could even generate UCT charts conditional on certain starting moves; but I haven’t done this yet.

Dots and Boxes: Of the games I’ve examined so far, this has the “steepest” shift in advantages in the lower right, suggesting that it is the most responsive to increasing iteration counts. It would be nice to know if this continues further down as iteration counts continue to rise.

Dots and Boxes Suicide: By contrast, this variant sees more upset wins; it has no cells in which one player won every game.

You can currently find the full set of generated graphs here: https://alexlandau.github.io/alloybox/abfResultCharts.html

This project has already seen months of computation time on an 8-core machine, and each additional row at the edge of the chart roughly doubles the amount of time needed, so care is needed in many cases to extend this to more games or higher counts. (I had to abandon Speed Chess in particular for being too slow to simulate so many times.)

On a practical level, after working on this project, I’d argue that we should consider using a fixed number of iterations per turn when testing variants of a player against each other. First, it eliminates a number of hardware concerns that could subtly bias results or make them vary from machine to machine. Second, it allows players to be handicapped, to check if a new approach can overcome even an opponent with a computational advantage.

Third, and perhaps most importantly, it helps us test out a new idea – perhaps radically different – and see if it leads to more wins in the games we care about. Often new ideas aren’t as effective as they seem, or require drastic modifications to become useful. Using fixed-iteration tests helps because we can write the new approach with little regard for speed – even console logging can be left in if desired. This way, we can see whether the technique will be effective before we bother to fine-tune it.

2016 Tiltyard Open format

2016-12-01T23:21:00.001-08:00

It's time to reveal the types of matches at this year's Tiltyard Open tournament. A few things have changed from last year, so use this as your guide to what's ahead.

Swiss rounds

As before, players will earn a number of points in each match equal to their goal value times a pre-determined weight for that match. The four players with the highest number of points at the end move to the finals.

Matches in these rounds will use a play clock of 15 seconds. The single-player game and "two-player puzzle game" will use a start clock of 240 seconds; the start clock for the remaining games will be 120 seconds.

In some cases, the number of players will not divide evenly among the match assignments (unless we get lucky with the number of participants). In that case, some players will receive byes. In most cases, a player receiving a bye will be among the lowest-performing players at that point, and will receive the highest score of any player in that round (usually 100). Some non-fixed-sum games have their players matched up randomly; in those cases, the points for a bye are instead the average of what other players received.

Single-player game 1: One match, weight 2
Two-player game 1: Three rounds with two matches each, weight 0.5 per match
Two-player, non-zero-sum game ("two-player puzzle"): One match with each player in each role. One role is guaranteed a score of 100 in each match, one is not. Weight 2 per match
Two-player game 2: Three rounds with two matches each, weight 0.5 per match
Single-player game 2: One match, weight 2
Two-player game 3: Three rounds with two matches each, weight 0.5 per match
Three-player game: Four matches, weight 0.5 per match
Two-player game 4: Three rounds with two matches each, weight 0.5 per match
Four-player game: Four matches, weight 0.5 per match
Two-player game 5: Three rounds with two matches each, weight 0.5 per match

The total weights for two-player zero-sum games are 3 each, and for the remaining games are 2 each, plus an automatic 200 points from the third game. The highest possible weighted total score would be 2700.

Final rounds

The four players that rank highest after the Swiss rounds will play on Saturday for the championship in a single-elimination bracket. Matches in these rounds will use a slightly longer start clock of 180 seconds, but the play clock will remain at 15 seconds.

The semifinals are a "best-of-five" and the finals are a "best-of-seven", but there is some ambiguity about what that means when players can draw matches. In this case, the winner of the semifinals is the first player to accumulate at least 251 points in their matches and have more points than the opponent. The finals work similarly, but with a goal line of 351 points.

There will be specific games lined up for the players to play; in the event that they finish the first five (or seven) matches tied, additional matches will be scheduled until one player outperforms the other. (All games will be fixed-sum.)

The winner gets bragging rights and Internet immortality; if they RSVPed, they may also receive a small prize.

Alloy 0.10.0 graphical log streaming is live

2016-05-25T21:06:00.003-07:00

You can now get some insight into what Alloy is thinking by watching a live graphical representation of its logs as it plays on Tiltyard.

I decided to do this primarily as a way to better understand my player and UCT. I had moved Alloy to run on a dedicated computer so it can run continuously, but this made it difficult to view information directly from the program. Streaming this publicly makes it easier to see regardless of where I am, and it also allows others to watch this information and gather their own insights about game AI.

By moving from a few lines of logging at a time to a full screen of graphical information, I can also afford to include a wider array of information useful for debugging or for understanding my player's weaknesses.

I now expect to do a lot of tweaking to make the UI useful and easy for a trained eye to parse quickly, so I won't explain all the current aspects of the UI. (Two things I should point out are Alloy's role index in the top row -- that's zero-indexed, so 0 is the first player in the game, 1 is the second, and so on -- and the E[v] graph in the top left that tracks Alloy's current expected value.)

Instead, I want to explain today the technical aspects of how I did this for anyone interested in applying this to their own players.

The player's backend process generates logs in a structured format, taking advantage of a library I wrote called Escape Rope. This makes it relatively easy to go back and forth between objects and "ropes" (each rope is a string or a list of ropes) and between ropes and individual strings. Those strings are then placed in the logs in lieu of custom, non-machine-readable log lines.

The UI is displayed by a Java program in a separate process that reads these logs as they come in using the Apache Commons IO Tailer class. (Warning: Before version 2.5 of the library, Tailer does not correctly handle files containing any real variety of characters, such as the outputs of Escape Rope.) It uses one master log file to determine when a new game has started, so it can switch to new games automatically. The log lines are converted back into their original objects for easy handling.

The display itself is just a giant JPanel with a custom paint() method. As it doesn't need to accept any user input, I can get away with using fixed coordinates and low-level graphical commands in lieu of actual Swing components, which are difficult to arrange. (This is also likely easier on the CPU.)

This is all running on a Linux box, so as far as I can tell there's only one good option for streaming it: OBS Studio. (I haven't found any libraries that would allow me to directly stream video to a website from Java.) Unfortunately, this is GUI-based, and I'm trying to run this on a headless machine that I can restart and control remotely. Fortunately, I can do the required setup once with a monitor attached, and do subsequent runs from the command line using the --startstreaming argument. The one non-obvious aspect of my OBS setup is that I reduced the frame rate from 30 FPS to 10 FPS to try to reduce my bandwidth consumption.

Now we get into the really "fun" technical difficulties. Take my explanations below with a grain of salt, and my solutions with a shakerful, because I don't know nearly enough about these aspects of the operating system to speak with any authority.

First is an easy one: OBS Studio requires OpenGL 3.2 or higher, which generally requires a graphics card. No big problem there: I repurposed an old graphics card from another machine for that purpose. (And by "old", I mean my main desktop now has a shiny new graphics card.) By default the machine will use an open-source driver called Nouveau for Nvidia cards, which only supports earlier versions of OpenGL; but after installing the official drivers from the nvidia-current package, it can handle OBS Studio just fine. (The mesa-utils package includes the glxinfo command, which is very useful for determining the driver in use and the supported OpenGL version.)

Now we start running into a bunch of problems that are harder to solve, because our operating system is too clever and won't load the resources needed for graphical programs if it doesn't think anyone is going to use them.

First that means installing a bunch of software that isn't included in the Ubuntu Server distribution by default. That includes, at the minimum, xorg and a window manager of some kind. I used fluxbox, but I wouldn't be surprised if all this were easier just using Gnome and GDM, e.g. by just starting from an Ubuntu Desktop installation. (I hear that allows you to set up automatic login, which bypasses some of the later steps here.)

Next we have to trick the system into thinking it is hooked up to a monitor. It appears a software-only solution for this exists, but as I was having problems beyond that, I just got a professionally made dummy monitor plug. (It's actually quite possible to make one of these yourself, and is one of the cooler hacks I've seen to fix a software problem.)

Now we have to actually get the X server running. Unfortunately, it doesn't want you to start it (via the startx command) over SSH, and will fail if you try to do so. Since I'm too lazy to stick an actual keyboard and monitor into the machine every time I restart it, instead I have the computer's startup scripts do this for me. This is fraught with additional problems, as the X server has its own permissions system that will cause failures if it is run by the wrong user -- and the way I'm running things on startup will run it as the wrong user. But I did find a way to hack around this.

So I have a script in the /etc/init.d/ folder called startx-custom, copied from "skeleton" and with the following line at the end (you'll also need to chmod a+x the file to make it executable):

XAUTHORITY=/home/<my username>/.Xauthority startx

Then, to have this run during normal startup (runlevel 5), I have a symlink to that script in /etc/rc5.d/ named S05startx-custom. (Naming it S05 has it run after all the other scripts in that folder, which are called S04 or earlier in my case.)

I mentioned that running startx here causes the X server and fluxbox to be run by root. (This could possibly cause security concerns, but I'm not really worried about this particular server.) There's an "X authority" file that the server and any applications that want to run against it need to agree on, which is usually the problem in this case; the X server will pick some random file that's inaccessible to my user if it is run as root. As you can see in the script above, I instead have it use ~/.Xauthority for that purpose, which is what it would be if my user were running it, and so all my applications will use it. This leaves me with one, much more solvable problem: After startup, the ~/.Xauthority file will be inaccessible to my user due to being owned by root. This is solved by a simple manual command: sudo chown <my username> ~/.Xauthority. (This doesn't work when I stick it in the startup script for some reason, but I don't mind typing in one extra line over SSH.)

After that, I can run my programs in screen sessions by telling them the X display to use:

DISPLAY=:0 ./gradlew runViz
DISPLAY=:0 obs --startstreaming

(If I had an actual monitor hooked up to the computer, this would cause the programs to pop up on that monitor.)

So with those steps, I was able to get the stream up and running from a headless Linux box, offering more insights into my new player's flaws and shortcomings. Hopefully I can use it to improve the player a bit before I start sinking time into my next bright idea.

The GDL engine performance-testing framework

2016-04-01T21:22:00.002-07:00

The Game Description Language is declarative, not imperative. It defines the rules of a game, but not the process by which a program should apply those rules to produce facts. This has led to a wide variety of implementations of GDL interpreters, or engines, that handle this computation, with a wide range in their speed and accuracy across games.

A paper presented at the GIGA'13 conference by Yngvi Björnsson and Stephan Schiffel compared the performance of six such engines on 12 games. (To access the paper, select "proceedings" on the conference website and scroll to page 55.) To follow up on this experiment, and to test my own creations, I decided to write an open-source framework for this type of performance testing.

The gdl-perf project

The following were design goals of the framework:

It should support engines written in any programming language.

Corollary: The engine being tested should run in a separate process from the framework code.

It should be able to set up testing and let it run for as long as it needs with minimal handholding from the user. If the computer it runs on is shut down, it should be able to pick up from where it left off.
Results should be stored in a format usable by different kinds of analysis.
It should be easy to test a large number of games.

The framework itself is written in Java. The main loop for running tests:

Find an engine/game pair that hasn't been tested yet.
Start a process that will run a test for the given engine. As arguments, give it the name of a file containing the game to run, the name of a file to write results to, and the length to run the test for.

It is the test process's responsibility to run the test and write results to the file.
The time taken to run does not include one-time initialization. The framework assumes that this may take some additional time. The test process is responsible for recording and reporting the actual time taken.

The test waits for the created process to complete. If it takes much longer than expected to receive a response, the framework terminates the created process and records an error. (This prevents a single bad case from causing the overall testing to hang.)

Tests themselves consist of continually exploring the outcomes of random series of moves and computing the goal values of the end states. This is done for 30 seconds, to provide hopefully adequate time for cache warmups and on-the-fly optimizations (including Java's JIT compiler). This is slightly different from the original paper, where there was a similar "Monte Carlo" (MC) statistic but also a "Minimax" (MM) statistic that had a different balance of operations between finding legal moves and exploring future states. As Monte Carlo methods are currently dominant, and as this form of testing explores a better balance of early- and late-game states, I've used only a Monte Carlo-type method for testing here.

The statistics collected include the number of state changes made, the number of rollouts made, and the length of time taken. The main statistic used on the analysis pages is the average number of state changes per second.

The games used are nearly all of the games available from ggp.org's game repositories, including the Base, Dresden, and Stanford repositories. Some games are blacklisted for violating parts of the GDL specification (for example, having unsafe rules, or having terminal states with more than one goal value for a role). Since these games may have their rules changed while their names remain the same, a hash is attached to the game ID to ensure we do not conflate two different versions.

There are instructions for trying out the framework yourself in the repository's README.

Findings

I recently ran a set of tests with the current version of my framework on dedicated hardware. The results are here. This includes most of the engines tested in the GIGA'13 paper, including Fluxplayer and CadiaPlayer. There are some substantial differences in my findings, due to a mix of new engines, updates to old engines, and the significantly broader range of games.

I should note that it's also possible that I've misconfigured some engines, or wrote their test code poorly. CadiaPlayer in particular required a bit of finagling to get it to build on my machine, and it currently experiences segmentation faults on many games. Anyone experienced with these players is free to update the code I used to test them (Fluxplayer, CadiaPlayer) or to install and run the players on their own machines.

Another caveat is that this currently does not test the accuracy of these engines' results; I'm experimenting with a separate approach to verifying correctness across engines, but I haven't done any serious testing with this method.

That said, here are the qualitative impressions from the results:

The most stable engine on all games tested is the GGP-Base Prover, which had no errors on any game tested. This is at least in part due to changes made since 2013 to improve its correctness, and in particular make it handle complex types of recursion correctly. Its performance has also improved in this range of time, though not enough to make it competitive with the high-performance engines.

Most engines have trouble on at least a few games, and unsurprisingly, these games often overlap. Even when only checking for crashing and not for correctness bugs, this framework is quite useful for tracking down these bad cases.

The highest-performance game engine on nearly all games is Sancho's engine, based on propositional networks and written in Java.

This particular build was taken from the version used in the 2015 Tiltyard Open. A bug in propnet optimization has been found and fixed since then, so the current version may have fewer errors.

After Sancho, the next two fastest engines are from Alloy: one is based on propositional networks, and the other is a "compiled prover". Both are written in Java. The compiled prover generates additional Java code implementing the game rules that is then compiled and run.

The compiled prover engine, in particular, may be suitable for open-sourcing as a ready-to-use component for other gamers.

2015 Tiltyard Open format

2015-12-01T22:18:00.000-08:00

Here's a somewhat more in-depth explanation of the sets of matches that will be played in the 2015 Tiltyard Open. Hopefully this will help make sense of the overall tournament structure and how match results affect the final results.

We expect to use version 0.0.7 of the ggp-tournament library for scheduling the matches, so you can check out that library for details beyond the explanations here.

Swiss rounds

The Swiss format will use a number of 1-, 2-, and 4-player games. Players will earn a number of points in each match equal to their goal value times a pre-determined weight for that match. The four players with the highest number of points at the end move to the finals.

There will be a page to keep an eye on the rankings, but the link isn't quite available yet.

Matches in these rounds will use a start clock of 120 seconds and a play clock of 15 seconds.

In some cases, the number of players will not divide evenly among the match assignments (unless we get lucky with the number of participants). In that case, some players will receive byes. In most cases, a player receiving a bye will be among the lowest-performing players at that point, and will receive the highest score of any player in that round (usually 100). Some non-fixed-sum games have their players matched up randomly; in those cases, the points for a bye are instead the average of what other players received.

Single-player game 1: One match, weight 2
Two-player game 1: Three rounds with two matches each, weight 0.5 per match
Single-player game 2: One match each of two variants, weight 1 per match
Two-player game 2: Three rounds with two matches each, weight 0.5 per match
Single-player game 3: One match, weight 2
Two-player game 3: Three rounds with two matches each, weight 0.5 per match
Four-player game 1: Four matches, weight 0.5 per match
Two-player game 4: Three rounds with one match each, weight 1 per match
Four-player game 2: Four matches, weight 0.5 per match
Two-player game 5: Three rounds with one match each, weight 1 per match

Note that the total weights for two-player games are 3 each, and for non-two-player games are 2 each. The highest possible weighted total score would be 2500.

One more thing: I'm using an unorthodox definition of fixed-sum games when more than two players are involved. There are some details in the README of the ggp-tournament library. The upshot is that some games that have fixed-sum goal values but also have kingmaking-like interactions between the players will be treated as non-fixed-sum, and so will have random player assignments.

Final rounds

The four players that rank highest after the Swiss rounds will play on Saturday for the championship in a single-elimination bracket. Matches in these rounds will use a slightly longer start clock of 180 seconds, but the play clock will remain at 15 seconds.

The semifinals are a "best-of-five" and the finals are a "best-of-seven", but there is some ambiguity about what that means when players can draw matches. In this case, the winner of the semifinals is the first player to accumulate at least 251 points in their matches and have more points than the opponent. The finals work similarly, but with 351 points.

There will be specific games lined up for the players to play; in the event that they finish the first five (or seven) matches tied, additional matches will be scheduled until one player outperforms the other. (All games will be fixed-sum.)

The winner gets bragging rights; if they RSVPed, they may also receive a small prize (TBD).

Bases and inputs

2015-11-04T21:19:00.000-08:00

I don't know if there are any clear full descriptions of how base and input sentences work in GDL. (They were added to GDL since the last full specification was written.) So, here's a brief write-up of how they work.

The base and input keywords are used to specify the possible true sentences (i.e., components of the game state) and legal moves that could occur over the course of the game.

Unlike true sentences and legal moves, the sentences that begin with base and input do not depend on the state of the game; they are constant. This means they can be easily and quickly queried.

If a game description uses base sentences, then every game state (true) sentence that could be true over the course of the game must have a corresponding true base sentence. For example, if some series of events in a game of chess could lead (true (cell d 5 black knight)) to be true, then (base (cell d 5 black knight)) must always be true regardless of the state of the game.

Similarly, if it's possible for (legal white (castle queenside)) to ever be true, then (input white (castle queenside)) must always be true (if the game uses input sentences).

What about the converse, if a base or input sentence is true but the corresponding game state sentence or move can't ever be true in game play? This is discouraged but may be unavoidable, depending on the game. Game designers should keep the bases and inputs as close to the actual set of possibilities as is reasonably achievable.

By contrast, if a base sentence is missing but its corresponding true sentence can be true, this is a bug in the game description. (Unless, of course, the game does not include any base sentences.) This is not intended to be a way to impose new restrictions on possible sentences or moves.

The Validator in GGP-Base includes a BasesInputsValidator that looks for validations of this type, where a base or input sentence is missing for a true sentence or legal move.

What's the value of being able to ask about these? Well, there are approaches to analyzing the game (including propnets, but also heuristic methods) where it helps to start with a complete set of possibilities. Furthermore, it can be difficult or time-consuming to compute this directly from the game description, and this process could result in a prohibitively high false positive rate. The OptimizingPropNetFactory, for example, takes advantage of base sentences as a starting point for propnet construction. The alternative would be to create a much larger initial propnet and pare it down afterwards, which is both slower and unreliable.

Although this is not in the specification, the level of adoption is fairly high; I believe these are used in all games in rotation on Tiltyard, and are used consistently in Stanford's tournaments. They will also be used in games in the upcoming Tiltyard Open.

GDL perf framework preview

2015-11-04T00:26:00.000-08:00

I've been working on and off on a performance-testing framework for interpreters of GDL games (i.e. rule engines). There's still more work to be done before it's sufficiently complete and well-documented for others to use, but here's a sneak peek of output. Suggestions are welcome on the Github issues page.

A few notes:

Although all the engines currently represented are Java-based, it's designed to support engines written in any language. Engines are run in separate processes and all communication is through either the file system or standard output.
There's a separate mechanism for testing correctness. This is less polished and not yet represented in the output.
Collected data is stored in CSV-like files. There are some built-in tools for analyzing this data, like the one that generated the web pages I linked to above.
Engines are versioned; you could collect statistics from different versions of an engine over time and compare them.
Similarly, game IDs include hashes of their contents, so if a game is changed we won't get results from different versions mixed up. All games come via the game repositories on ggp.org, including its mirrors of the Dresden and Stanford Gamemaster game servers.
Games that are known to have bugs are excluded via a blacklist. (STANFORD/kono might need to go on the blacklist; an earlier version is already on it.)
To capture the effects of caching and JIT compiler optimizations, each game is run in each engine for 30 seconds before returning performance statistics. This means it takes a few hours per engine, and a few days to collect the full set of results.
There are many more engines I hope to add, especially those used by Fluxplayer, Cadiaplayer, and Sancho. The current set are mostly those that were easiest to add.
This was inspired by a paper in the GIGA '13 proceedings by Yngvi Björnsson and Stephan Schiffel on relative performance of GDL interpreters. Hopefully this will make future comparison efforts much easier and enable wider adoption of faster engines.

RuleEngine interface

2015-08-02T18:44:00.000-07:00

I've mostly been forgoing work on the current version of Alloy in favor of a newer, better-factored framework with better logging support. Partly I'm working to support a new interface to replace StateMachine.

Here's what the RuleEngine interface looks like (note the use of generics for Move and State), along with the interfaces it relies on:

public interface RuleEngine<Move, State extends RuleEngineState<Move, State>> {
 State getInitialState();

 int getNumRoles();
 List<Role> getRoles();

 boolean isTerminal(State state);

 int getGoal(State state, int roleIndex) throws GameDescriptionException;
 default ImmutableIntArray getGoals(State state) throws GameDescriptionException;

 List<Move> getLegalMoves(State state, int roleIndex) throws GameDescriptionException;

 State getNextState(State state, List<Move> jointMoves) throws GameDescriptionException;

 Translator<Move, State> getTranslator();

 default State toNativeState(RuleEngineState<?, ?> otherState);

 default List<Move> getRandomJointMove(State state) throws GameDescriptionException;
 default State getRandomNextState(State state) throws GameDescriptionException; 
default List<List<Move>> getLegalMovesByRole(State state) throws GameDescriptionException;
 default List<List<GdlTerm>> getGdlLegalMovesByRole(State state) throws GameDescriptionException;

 default ImmutableIntArray doRandomPlayout(State state) throws GameDescriptionException;
}

public interface RuleEngineState<Move, State extends RuleEngineState<Move, State>> {
 /**
  * It is recommended that every state return the same Translator instance
  * when this is called, so that there is exactly one Translator per
  * originating RuleEngine. RuleEngine#toNativeState relies on this to
  * operate efficiently.
  */
 Translator<Move, State> getTranslator();

 default Set<GdlSentence> toGdlState();
}

public interface Translator<M, S> {
 public GdlTerm getGdlMove(M move);
 public M getNativeMove(GdlTerm move);

 public Set<GdlSentence> getGdlState(S state);
 public S getNativeState(Set<GdlSentence> state);

 default List<GdlTerm> getGdlMoves(List<M> moves);
 default List<M> getNativeMoves(List<GdlTerm> moves);
}

Note that the implementations of the default methods are omitted for brevity.

What's different from StateMachine? The first change I made (and have done in my StateMachine implementations for a while) is to remove the initialize() method. Java concurrency is tricky, and usually the best way to deal with it is to use immutable objects wherever possible. In addition, an initialize() method introduces the possibility of initializing a type twice or not at all before trying to use it. Instead, I use static create() methods to do whatever processing is needed to create the internal state, and a RuleEngineFactory anywhere we need to specify a type of engine to be created. This lets most of the internals of RuleEngine implementations be immutable.

The method inputs are also different. Instead of requiring use of the Move, Role, and MachineState classes, we use inputs that don't have the associated overhead of extra classes. Role inputs become role indices, which most implementations use internally anyway, and are faster to convert into Roles than vice versa.

Moves and states use the RuleEngine's own preferred types. With MachineState as the input, methods would typically begin with an instanceof check to see if the state was the preferred subclass, and then either try to translate the state itself or throw an exception to tell the caller to do so. Now methods can jump into actually using the state.

A single GameDescriptionException type covers three different exception types used in the StateMachine. They all boil down to the same thing: the GDL did something bad.

Finally, to deal with the fact that we have multiple types of moves and states, we introduce a Translator object that is responsible for converting between native and standard representations for moves and states. The standard type for moves is GdlTerm; the standard type for states is Set<GdlSentence>. Each RuleEngine instance is expected to have its own Translator object, and each state a reference to that object, so that determining if a state is native to a RuleEngine is a simple instance equality check. (Unlike the MachineState/instanceof approach, this also lets us translate between multiple instances of the same class that have different encodings of state.)

One downside is that clients have to use parameterized methods to pass native moves and states into the engine, and these can be tricky to write if you're unfamiliar with the syntax. (This also causes Eclipse to disable many of its auto-fix features, like making a method signature for a missing method.)

Amazons and Efficient Recursion

2015-06-20T21:17:00.000-07:00

I recently rewrote Amazons to make it more efficient, and thus more attractive as a GGP game. I've also been doing some unrelated work on a compiled prover rules engine, which is much faster than the ProverStateMachine GGP-Base. In the process, I've had some ideas about how to deal with recursion efficiently in provers, which could have implications for many other important games.

Amazons is a two-player board game ostensibly related to chess, though the gameplay and goals are quite different. Both players have four pieces resembling chess queens; these are called "amazons". They start scattered across a 10x10 board. Unlike in chess, there are no captures; each turn, the player moves one of their amazons (in the manner of a non-capturing queen move) and then fires an arrow from that amazon (anywhere a queen could move, possibly into the space just vacated). Arrows stay in their target square for the remainder of the game and block both movement and fired arrows, gradually cluttering the board. The first player to be unable to move loses.

The current version of amazons (and its variants amazonsSuicide and amazonsTorus) is rather difficult for rules engines to handle, with this difficulty almost entirely coming from the number of legal moves available on each turn. (amazonsTorus has a couple of other peculiarities, in its use of recursion and a bug allowing pieces to effectively not move, or fire an arrow into their own space.) The new version separates the movement and arrow-firing portions of the move into separate turns; the game remains the same, but simulation is much faster, and the rules are simpler. (On the other hand, the game tree expands due to the additional intermediate nodes; but gamers have more freedom to cope with this as part of their playing algorithms.)

However, there's one part of the rules that still works inefficiently for a backward-chaining prover like the ProverStateMachine. That is the step of simply determining which moves are legal. The rule needs to look for empty spaces adjacent to the piece, then empty spaces next to those, and so on.

This is a problem that's also seen in chess and Reversi. (For backward-chaining reasoners, the difficult part of Reversi is not the complex-looking goal calculation -- which works well as long as query answers are cached* -- but determining which pieces get "flipped" after each move.)

Recursion and Backwards-Chaining

If this were written in an imperative language, the pseudocode for a linear implementation would be easy:

val result = []

for (Direction dir : allDirections()) {

    var (x, y) = (xInput, yInput)

    while (empty(nextSpaceInDir(x, y, dir))) {

        (x, y) = nextSpaceInDir(x, y, dir)

        result.add(move(xInput, yInput, x, y))

    }

}

However, the "direct" translation for that approach into GDL will often break provers, which will get stuck in an infinite loop:

(<= (openPath ?x1 ?y1 ?x2 ?y2 ?dir) ; rule 1.1

    (oneInDir ?x1 ?y1 ?x2 ?y2 ?dir)

    (not (occupied ?x2 ?y2)))

(<= (openPath ?x1 ?y1 ?x3 ?y3 ?dir) ; rule 1.2

    (openPath ?x1 ?y1 ?x2 ?y2 ?dir) ; site of infinite recursion

    (oneInDir ?x2 ?y2 ?x3 ?y3 ?dir)

    (not (occupied ?x3 ?y3)))

Instead, we tend to use this:

(<= (openPath ?x1 ?y1 ?x2 ?y2 ?dir) ; rule 2.1

    (oneInDir ?x1 ?y1 ?x2 ?y2 ?dir)

    (not (occupied ?x2 ?y2)))

(<= (openPath ?x1 ?y1 ?x3 ?y3 ?dir) ; rule 2.2

    (oneInDir ?x1 ?y1 ?x2 ?y2 ?dir) ; start with oneInDir instead

    (not (occupied ?x2 ?y2))

    (openPath ?x2 ?y2 ?x3 ?y3 ?dir))

What's wrong with this second technique? Well, consider the set of openPath sentences we need to compute, say, in the diagonal starting at (1, 1). (For convenience, we can use just one of the coordinates.)

In the first technique, the values of openPath that we compute are (1, 2), (1, 3), (1, 4), ..., (1, 10). The first coordinate is always the same. This number is linear in the length of the line.

For the second, (1, 3) relies on (2, 3); (1, 4) relies on (2, 4), which in turn relies on (2, 3) and (3, 4); and so on. In other words, the number of sentences we have to discover is quadratic in the length of the line, so even an ideal prover will have quadratic performance here.

(Note that forward-chaining techniques -- like most propnet implementations -- will maintain even those openPath sentences that aren't being asked for by the current state's move legality query, so there's no real difference in implementation for those, unless we add the piece's presence as a precondition for both rules.)

Recursion Handling in the ProverStateMachine

The ProverStateMachine got an upgrade in March that allows it to handle recursion like that used in the first approach. It gives correct results according to the semantics of GDL, even though it maintains the order of those conjuncts. However, its approach is not efficient.

When we come across a sentence that we're already asking, we want to return (in the recursive subquery) any sentences that can be produced by other methods of asking the original query. The way we do that is essentially writing down that the original query is recursive, which tells us we need to revisit its subqueries later with additional results. (This happens when we run into rule 1.2 above. Note that this doesn't happen for rule 2.2, as the "query" includes the first two arguments to openPath, which are different for the recursive subquery in approach 2, as opposed to being the same in approach 1.)

When we see that a query we've just finished asking is recursive, we usually need to revisit it to gather additional results, by plugging any new answers we've obtained into the identical subqueries. We repeat this until the set of answers obtained is the same as the set of answers we originally had. (GDL language restrictions ensure that this set only gets larger, not smaller, and will eventually converge.) This means when we do come across a query we're already asking, our return value is whatever's in previousResults.

(Also, caching is disabled while we're in the middle of handling a recursive query, since we'll be getting incomplete results most of the time.)

So how efficient will this method be for that first approach? Unfortunately, each iteration of this process adds only one new value to the inputs, and each time it retries every value it's found so far; again, this is a quadratic procedure relative to the line length.

There's a solution to this, already used by the forward-chaining reasoner GGP-Base uses in its ConstantChecker generation: Only feed in new values to the rules. Indeed, that would work in this case to give linear behavior. So why don't we use this approach? Because it's not that simple. Handling a single recursive call in a rule is easy; handling multiple recursive calls is where the trouble lies.

Let's say we only run the rule with new sentences as the input to any recursive references we see. Then consider what happens if that rule makes multiple recursive references. There might be answers to the query that are only generated if one of the references takes on a new value, and another takes on an older value. This variable assignment wouldn't have been made the first time around (with only old values), and it wouldn't be made the next time either (with only new values). If we're going to limit ourselves to new values, we need to limit this constraint to just one recursive reference at a time; and that could be any recursive reference, not just the first one we come across. This is made extra-complicated by the fact that these recursive references may be indirect or appear in other rules.

The forward-chaining reasoner used by the ConstantChecker isn't currently set up to handle more complex recursion across rules and sentence types. The ProverStateMachine has to meet a somewhat higher standard, as the reference state machine; it should be able to give correct results in all cases, which more efficient implementations can be checked against. This is why I preferred the slower, but more widely applicable approach.

But there's certainly room for faster provers that don't have to work that way. So what approach should they use?

Continuations

My idea (which I'm sure is already in the theorem-prover literature somewhere) also uses this notion of revisiting recursive subqueries later, but in a slightly different way, with major performance implications. Whenever we encounter a recursive call to something we're already asking, we store a continuation -- a set of the information necessary to essentially recreate that point in the query process, with an answer to that query filled in. (This is my own name for the idea; it's something like a closure, and may be able to be implemented using closures.) Before the original call ends, we ensure that each continuation has been run with each available input, and the results collected.

Why doesn't this encounter the problem I mentioned before about only looking at new sentences? It's because if multiple recursive subqueries are encountered, each one will get its own set of continuations -- and some of these will reflect the older sentences having been passed in to earlier queries.

Meanwhile, it only provides each sentence once to each continuation, so it should give linear performance for the first approach, where there is only one continuation.

There's still a little more room for improvement: not all inputs will match the conditions of all continuations. A constant or bound variable in the recursive call can conflict with the actual contents of the sentence. In this case, there should be some efficient filtering (probably involving HashMaps) to deliver each new sentence to only the set of continuations for which it is appropriate. New continuations may also need their own efficient filters over the set of sentences seen so far.

I still haven't implemented this yet, but the compiled prover should be a good place to try it out. It's actually a natural fit; the part of the code corresponding to the rule leading up to the recursion is in one place, and the code afterwards is split out later, where we're processing the continuations. That probably sounds confusing, but I may have a code sample later to better illustrate what's going on.

Numbers

Anyway, here are perf numbers to compare the Amazons versions (and highlight the performance of the compiled prover). As usual, numbers are approximate state changes per second during a 30-second period of repeated random playthroughs. "Rule 1" and "rule 2" refer to the two different implementations of the recursive rule discussed above; "rule 2" is the version currently on Tiltyard.

Rules engine	Old Amazons	New Amazons, rule 2	New Amazons, rule 1
Prover	0.2	165	40
Tuple Prover	14	890	1090
Compiled Prover	640	22500	error
Propnet	172	10500	11800
Propnet gen time	486 s	5 s	10 s
Propnet size, no opt.	3076k links, 1502k components	78k links, 39k components	78k links, 39k components
Propnet size, with opt.	1814k links, 585k components	64k links, 26k components	64k links, 26k components

Note that two state changes in new Amazons are equivalent to one state change in old Amazons, and the numbers haven't been scaled to reflect that. Even so, it's clear that the simpler approach is a huge improvement. (Aside from the numbers above, propnet generation for the old Amazons game required a memory allocation of around 3 GB to succeed.)

I expect to post again sometime later to explain why the compiled prover I'm using is so much more efficient than the other provers, despite using the same core approach to handling rules. Suffice it to say that it is, on average, about 20 times faster than the GGP-Base prover (the first one listed above), and can greatly exceed this on complicated games like the Amazons variants. This makes it competitive with the propnet rule engine I'm using for these comparisons, though certainly much faster propnet-based rule engines exist. (Alloy used that propnet engine in versions before 0.8, though it now has a faster propnet-based approach; I may also do some comparisons with Sancho's engine in the future.)

* The rules could, and perhaps should, be slightly rewritten so the caching is unnecessary.

Griddle: a GDL editor for Eclipse

2015-06-07T20:04:00.000-07:00

Griddle is an Eclipse plugin that provides a specialized GDL editor for game files. The editor makes it easier to write new games while avoiding typos and syntax errors.

In addition to syntax highlighting, Griddle includes a range of errors and warnings that can detect issues like mismatched parentheses, inconsistent capitalization, and undefined sentence types, as well as more complex GDL syntax errors like unsafe rules.

It even assists with good indentation style; new lines are prepended with four spaces per level of parenthetical nesting.

More information, including installation instructions, is available in the README shown on its Github page.

Implication deduction: Logic over propnets

2015-05-10T23:22:00.001-07:00

Today I'd like to share a simple and surprisingly useful algorithm for making logical inferences about games using propositional networks.

I developed this as part of the implementation for a more specific feature (one of the applications listed below). Although it was developed in the context of a larger feature, I wrote the code for this step to stand alone, and it has proved useful in quickly writing other features that involve understanding games more deeply.

The question that it answers is simple: given a set of components that we know to be true or false in a propnet, which other components are also guaranteed to be true or false? I call this implication deduction.

The algorithmically astute among you may notice that a complete solution to this problem is equivalent to the Satisfiability problem, and so would be computationally intractable. The good news is that we don't need to solve this problem completely. We can usually make do with a subset of the components that we could, in theory, identify as having fixed values. In fact, it can make sense to use multiple implementations with different tradeoffs between speed and completeness. In some applications, this will be run once per game; in others, it may be run once per legal move, or per state of the game.

There's also a twist we apply to make this more useful: We want to be able to track these implications across multiple turns. This involves pairing our references to components with logical turn numbers. (These don't have to match real turn numbers; if I'm considering two consecutive turns, I'll always use 0 to indicate the earlier turn and 1 the later turn.) Luckily, due to the nature of the propnet's construction, this does not make the algorithm significantly more complicated. We can take advantage of the transition components that mark the boundary between turns, and use these to note when we cross from one turn to another (or when we reach the boundary of our time span of interest).

The algorithm

We refer to the possible component types in a propnet with the capitalized names Proposition, Transition, Or, And, and Not, to match the class names used in GGP-Base. (Support for Constants can easily be added.)

We use a TimedComponent type that pairs a reference to a component in the propnet with a logical turn number. Its inputs and outputs are the TimedComponents corresponding to the component's inputs and outputs; these are on the same turn, except where a Transition is involved. Moving from a Transition to its input decreases the turn number by one, and vice versa.

Inputs to the algorithm are an initial assignment of TimedComponents to values and a range of turn numbers of interest.

Initialize an “assignmentSoFar” mapping TimedComponents to boolean values. Initialize this with the inputs to the algorithm, i.e. what we already know is true.
Initialize a “componentsToExamine” set. (This could be a queue or stack, but it's advantageous to remove duplicates as new components are added.) The initial values are the TimedComponents in assignmentSoFar.
While componentsToExamine is non-empty, remove a “curComponent” from it and apply the following operations:
1. If curComponent is outside the time range of interest, ignore it and move on to the next component.
2. If curComponent is not in assignmentSoFar, apply the following techniques to check for reasons it might be assigned a value. If one applies, add that value to the assignmentSoFar mapping.
  1. If a Proposition or Transition output of curComponent has a value, curComponent takes the same value.
  2. If a Not output of curComponent has a value, curComponent takes the opposite value.
  3. If an And output of curComponent is true, curComponent is also true. Optionally, if the And output is false and all its other parents are true, curComponent is false.
  4. If an Or output of curComponent is false, curComponent is also false. Optionally, if the Or output is true and all its other parents are false, curComponent is true.
  5. If curComponent is a Proposition or Transition and it has an input with a value, curComponent takes the same value.
  6. If curComponent is a Not and its input has a value, curComponent takes the opposite value.
  7. If curComponent is an And and one of its inputs is false, curComponent is false. If all of its inputs are true, curComponent is true.
  8. If curComponent is an Or and one of its inputs is true, curComponent is true. If all of its inputs are false, curComponent is false.
  9. Optionally, if curComponent is a legal Proposition and the corresponding does Proposition is true, curComponent is true.
  10. Optionally, if curComponent is a does Proposition and the corresponding legal Proposition is false, curComponent is false.
3. If assignmentSoFar contains curComponent (either because it was in the initial mapping or just added), do the following:
  1. Add to componentsToExamine any inputs to curComponent that are not already in assignmentSoFar.
  2. Add to componentsToExamine any outputs of curComponent that are not already in assignmentSoFar.
  3. Optionally, if the assigned value is true, for any output of curComponent that is an And gate assigned the value false, check if all but one input to the child is true. If so, add the last parent to componentsToExamine if it's not already in assignmentSoFar. (This corresponds with the optional technique in 3.2.3 above.)
  4. Optionally, if the assigned value is false, for any output of curComponent that is an Or gate assigned the value true, check if all but one input to the child is false. If so, add the last parent to componentsToExamine if it's not already in assignmentSoFar. (This corresponds with the optional technique in 3.2.4 above.)
  5. Optionally, if curComponent is a legal Proposition and the assigned value is false, add the corresponding does Proposition to componentsToExamine if it's not already in assignmentSoFar. (This corresponds with the optional technique in 3.2.10 above.)
  6. Optionally, if curComponent is a does Proposition and the assigned value is true, add the corresponding legal Proposition to componentsToExamine if it's not already in assignmentSoFar. (This corresponds with the optional technique in 3.2.9 above.)

This is a bit long, but it should be clear what each step accomplishes. We start with the initial assumptions and spread across the directed graph in both directions, upstream and downstream, looking for components with values that can be deduced using one of the techniques in step 3.2.

So what are some of the applications of this technique?

Winning/losing move detection

The original reason I wrote this algorithm was to generate a “generalized endgame book” (GEB); that is, a way to look at a state and see from a subset of the state that a given move will lead to victory. (This is contrasted with a normal endgame database, which requires a complete match on the entire state.) The simplest incarnation of this idea is finding moves that in every case immediately end the game and give the player a score of 100. The benefit here is that this can be applied not just during the first phase of an MCTS playthrough, but during the random descent through previously unexplored states.

(I was planning to do a larger post on GEBs, but when I (finally) re-added GEBs to the latest version of Alloy, I was unable to reproduce my past results showing their benefits in earlier versions. It can be very tricky to tell which advanced features will actually result in higher winning percentages in the context of a larger game player.)

To generate GEBs, implication deduction is normally just one part of a larger, long-running algorithm. However, if we're just looking for moves that immediately cause a player to win or lose, we can use a trick with logical converses to detect these moves quickly.

If you want to find the set of moves that will cause xplayer to win immediately, run the algorithm in the following two configurations:

Time range: Turns 0 and 1

Inputs: <terminal, 0> = false (optional), <terminal, 1> = false

Outputs to watch for: <

(does xplayer
?move)

, 0> = false

Time range: Turns 0 and 1

Inputs: <terminal, 0> = false (optional), <(goal xplayer 100), 1> = false

Outputs to watch for: <

(does xplayer
?move)

, 0> = false

The logic here is that if ?move is a winning move, then <(does xplayer ?move), 0> being true will cause terminal and (goal xplayer 100) to be true on the following turn. The converse also holds: either of the latter being false will cause the original move to be false. By taking advantage of this converse, we can test all moves at once instead of running tests one at a time.

Joint latch detection

We can also use this algorithm to easily find the subset of a given state that will not change for the rest of the game. Unlike most methods of latch detection, this will handle combinations of state that are not latches individually, but keep each other in place due to their combination.

This can be used to reduce the size of a propnet for more efficient simulations; reduce the size of game states as certain sentences no longer need to be included; or identify states that are unwinnable (or unloseable).

To do this latch detection, run the algorithm repeatedly with the turn range [0, 1]. In the first step, set the sentences in turn 0 according to the state of interest (setting the base propositions should suffice). In subsequent steps, take the intersection of the components in turns 0 and 1 and put that in step 0. (If a component is included in both turns but its truth values differ, ignore it.) Repeat this process until the components in 0 and 1 are identical and have the same values; these are the latches.

Goal proximity heuristics

This will only be useful for a few games, but it's also possible to use this type of ternary logic to determine the minimum number of turns before (goal player 100) can become true. This can be used as an admissible heuristic for A* search in puzzles.

A somewhat contrived example (but potentially interesting as a puzzle for GGP in any case) is the made-up “Knight's Journey” puzzle. Unlike the “Knight's Tour”, this simply requires a knight to cross the board from one corner to the other in the minimum number of turns, giving 100 points if successful and 0 otherwise. If the board is large enough, MCTS alone will not find a solution, but goal proximity (measured by number of turns) is one of several heuristic approaches that can guide the player to the solution quickly.

The process for generating the heuristic is like the joint latch detection algorithm, but the output is the number of iterations needed until (goal player 100) is no longer guaranteed to be false. (Note that you'll need to check for the convergence of the algorithm, as that sentence being false may be part of the latch.)

Imperfect information games

I don't have plans just yet to extend Alloy to work in GDL-II. However, this technique would certainly have important applications in that realm. It would allow players to squeeze extra information about past and future turns out of the information they've collected over the course of the match. For example, a Minesweeper player would be able to rule out certain initial mine placement locations in the early turns of the game, then use the remaining possible initial sequences to consider possible placements.

I believe this technique is also easier to reason about (for most programmers) than traditional logic techniques that need to be applied directly to the game rules, like most GDL-II reasoning techniques so far. For example, the current extent of the implications found can be visualized by displaying the propnet colored according to its assignments. This visualization can suggest further refinements to more fully capture what is known.

Extensions

There are many ways this algorithm could be improved further. As mentioned above, a perfect implication deducer would require solving the Satisfiability problem completely. This is difficult in the general case, but it's possible that using a good SAT solver directly would be reasonable and effective.

As a perhaps more practical matter, there are probably relatively simple changes to the algorithm that can make it more efficient or powerful. In particular, there must be more efficient ways to recompute the implication set after slightly tweaking the inputs, which would be useful for several applications. There are also game-specific facts that could assist the algorithm, such as mutex awareness.

Rewriting chess (again)

2014-05-16T20:55:00.002-07:00

One of the embarrassing questions about GGP: "So, how good are they at chess?"

Not very good. Not very good at all.

There are several reasons for this.
First, traditional chess programs rely strongly on heuristic-based state evaluation. No one in GGP seems to have cracked the problem of creating and evaluating similarly nuanced heuristics for arbitrary games at runtime, nor is it trivial to integrate heuristic results with UCT. (I believe Fluxplayer does make effective use of heuristics -- I doubt anyone could beat it at Breakthrough Suicide for precisely this reason -- but it seems to have other weaknesses against strong simulation-based players.)

Without the use of heuristics, we run into several factors that hurt players' performance as game complexity increases from tic-tac-toe to connect four to checkers to chess. First, there are more possible moves on a given turn. Second, games take more moves to finish. Third, and more subtly and insidiously, the game rules get more complicated and difficult to process. Not only does a single additional state of the game tell you less and less, it takes longer to compute.

All these reasons, and perhaps more (e.g. bugginess), have led to chess being excluded from the otherwise-comprehensive list of games in the Dresden server lineup. (A variant that leaves out the concept of check, capture_the_king, is included. Fluxplayer, using heuristics, looks like a poor amateur human there -- which is to say, far superior to every other general game player.)

We can't reduce the branching factor of chess or the length of the game (barring a draconian step counter). We can, however, do something about the rules complexity. I realized this a while ago, when I tried to write a version of chess that would be safe for propnets (and, most likely, faster for theorem prover-based implementations as well).

The basic design principle here was that it would be essentially safe to have relations that expressed relationships between two squares because there were only 64 * 64 = 4096 of them, a relatively small number. Relations that were related to three squares would instead have 64 * 64 * 64 = 262144 possibilities, a much scarier number, and one that pretty much doomed any propnet generator to OOM.

The mistake I made was in my decision to give the pieces IDs, rather than reference them primarily by their squares. This was originally an attempt to reduce the number of possible relations of each type, but it backfired.

How many squares are on a chess board? 64.

How many IDs would I need? 32, one for each piece on the board.

Except that each piece is associated with a type in a hard-coded way, and promoting a pawn meant adding a new ID to the board. If a player promoted eight pawns to queens, I had to support that -- meaning eight more IDs per player.

In addition, one change I wanted to make from the original chess game was that it only supported pawns being promoted to queens; there are certainly situations where a knight is needed instead. (Situations calling for bishops or rooks are significantly rarer.)

How many IDs did I end up needing? 64. There was no benefit from using IDs over locations.

This led to a point of failure when I needed a "between" relation saying that one piece was collinear with and between two other pieces. If the pieces were specified by their locations, this actually wouldn't be so bad; for each piece in the middle and each direction, there are at most 3 * 4 possibilities for the locations of the other two pieces. This would give an upper bound of 64 * 8 * 12 = 6144 relations of this type.

Instead, we have IDs, which could be combined pretty much arbitrarily. This gives us the full 262144 possibilities, for each direction. That pretty much ruins the potential of the game description, at least for propnets.

Take two

I figured I would write a replacement for chess in two steps: first, write a "speed chess" version that ignores check and allows direct capture of the king; then, write a full version of chess. (Players' strategies should in theory be roughly the same as in normal chess, though this does depend on players reaching a certain level of play.)

I've finished the first of these versions, and run it through some performance testing relative to other versions of chess. (Caveat: It's still possible there are bugs that will be exposed by additional play.)

Here are some of the design decisions and implementation details for this version:

Piece names are given by two separate constants instead of one: "white pawn" instead of "wp", for example. This eliminates the need for relations that specify "ownerOf" and "pieceType". (This doesn't matter for propnets, but helps provers and overall readability, and may assist in the discovery of heuristics.) An example cell relation is "(true (cell a 8 black rook))".
Empty spaces are not explicitly represented.
Since we don't use piece IDs, we can freely include promotions to any piece type, not just queen or knight. (Players are free to choose incorrectly at their leisure.)
The type of piece listed in the move is the type of piece that will end up on the given square. This greatly simplifies the logic for handling promotion. Castling involves separate types of moves.
The game description uses an "affected" relation to handle piece persistence. (That is, there are a few rules to specify which spaces are affected by the current move, and any piece in a square that is not affected stays where it is.) This greatly reduces the size of propnets.
There are a few inexpensive extra bits of state for dealing with castling and en passant.
This version of the game (in accordance with the "speed chess" theme) has a flat 200-step counter for ensuring termination. The "full" version of chess will likely use a more official rule, i.e. counting steps since the last capture or pawn move.
Base and input relations are included.

Let's look at how the new version, speedChess, compares to other versions and variants of chess available, performance-wise (all figures are average state changes per second):

Game	Prover	TupleProver	Propnet	Propnet construction stats
~~speedChess~~	~~170~~	~~770~~	~~10500~~	~~5 seconds, 27K components, 56K links~~
speedChess (fixed, see edit 2 below)	55	490	4050	7 seconds, 35K components, 82K links
capture_the_king	100	280	4600	17 seconds, 205K components, 779K links
brawl (only kings, queens, rooks)	145	340	3600	15 seconds, 97K components, 566K links
chess	15	100	N/A	Propnet creation ran out of memory
endgame (both kings, 1 rook, 15 steps)	570	1150	9500	2.4 seconds, 34K components, 144K links
skirmish	40	180	2600	8 seconds, 117K components, 344K links
slaughter	20	145	N/A	Propnet creation ran out of memory

(Note that these numbers for the ProverStateMachine and TupleProverStateMachine include recent improvements to the ProverStateMachine. This makes them faster relative to the propnet-based version, compared to earlier comparisons.)

speedChess is behind endgame's performance in most respects, but not by much; it's a small price to pay for getting the other 29 pieces on the board. Performance is better than all the other chess variants (though capture_the_king isn't too shabby).

The ProverStateMachine still won't work very well (the step counter limit is 200), but testing this with Alloy and using propnets, it managed to do a couple of things that looked vaguely rational. Maybe now we'll be able to get this running on Tiltyard and encourage the use of techniques that work better than basic UCT on games of this complexity.

Next up will be adding check to the game.

EDIT 2014-05-28: A couple of issues have been discovered since the original post, one in the game description itself and one in the OptimizingPropNetFactory. If your gamer is using the OPNF, you'll need to get a version of GGP-Base with this commit or apply the fix yourself for propnet generation to work properly on speedChess. (The bug is non-deterministic; the performance stats above were taken while it happened to be working correctly, and should still be accurate.)

EDIT 2014-06-05: A bigger issue came up: apparently bishops, rooks, and queens were unable to move more than one space due to an undefined relation type being used in a rule. I've added a new warning type to the StaticValidator in GGP-Base to prevent this kind of error from happening again. The problem has been fixed, and the fixed version of the game is now in rotation on Tiltyard. (This changes the performance statistics.)

Sancho gets a blog

2014-05-13T22:19:00.001-07:00

Steve Draper and Andrew Rose have put together a blog describing their work on their (very good) general game player, Sancho. Sancho is the successor to Quixote, the winner of the first MOOC iteration of Stanford's General Game Playing class.

I fully expect their blog will update more frequently than mine.

The Annotated Tic-Tac-Toe

2014-04-02T23:11:00.002-07:00

Tic-tac-toe is a classic and widely-understood game. Introductions to GDL typically use it as a sample "first game", as the rules are fairly simple and usually already known by the reader. However, the GDL representation can still be foreign enough to be difficult to understand. This post is my attempt to flesh it out with explanations of what each component of the rules is doing. This isn't my first attempt at an introduction to GDL, but different people will find different approaches better.

There are a number of different GDL representations of Tic-Tac-Toe out in the wild. For this exercise, I've chosen the one that's seen the most use lately, Tiltyard's Tic-Tac-Toe. I've taken some liberties with rearranging the rules and improving the formatting, but the rules themselves are untouched. GDL keywords and language features (aside from the <= operator) are in bold. Enjoy!

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Tictactoe
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Roles
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

; There are two roles to be assigned to players in tic-tac-toe: one that plays
; X marks and one that plays O marks. The ordering that we list these roles is
; important to the GDL communication protocol, but has no impact on the rules
; of the game; turn ordering, for example, is defined later. As a game author,
; we could swap these two statements to no effect. 

(role xplayer)
(role oplayer)

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Initial State
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

; In the initial state of the game, each cell of the game board is blank, and
; xplayer is about to move. (Note that "cell" and "control" don't have intrinsic
; meaning in GDL; their meaning comes from how they affect the game in rules
; below.)

(init (cell 1 1 b))
(init (cell 1 2 b))
(init (cell 1 3 b))
(init (cell 2 1 b))
(init (cell 2 2 b))
(init (cell 2 3 b))
(init (cell 3 1 b))
(init (cell 3 2 b))
(init (cell 3 3 b))
(init (control xplayer))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Dynamic Components
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

; If the cell (x, y) is blank, and it's a player's turn, then that player may
; play the move "(mark x y)", which represents drawing their mark in the cell
; (x, y). x and y are variables, so actual examples of moves will look like
; (mark 1 1), (mark 2 3), and so on.

(<= (legal ?w (mark ?x ?y))
    (true (cell ?x ?y b))
    (true (control ?w)))

; If it's oplayer's turn, then xplayer can make a noop move. (Noop comes from the
; phrase "no operation" from assembly language, so a noop move does nothing. Games
; are required to give each player at least one possible move each turn, so players
; with nothing to do are given noop moves.)

(<= (legal xplayer noop)
    (true (control oplayer)))

; If it's xplayer's turn, then oplayer can make a noop move.

(<= (legal oplayer noop)
    (true (control xplayer)))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;; Cell

; If xplayer takes the action (mark m n), and the cell (m, n) is blank this turn,
; then next turn the cell (m, n) will contain an X. 

(<= (next (cell ?m ?n x))
    (does xplayer (mark ?m ?n))
    (true (cell ?m ?n b)))

; If oplayer takes the action (mark m n), and the cell (m, n) is blank this turn,
; then next turn the cell (m, n) will contain an O.

(<= (next (cell ?m ?n o))
    (does oplayer (mark ?m ?n))
    (true (cell ?m ?n b)))

; If the cell (m, n) has a mark w in it this turn, and the mark is not "blank",
; the same mark will still be in cell (m, n) next turn.

(<= (next (cell ?m ?n ?w))
    (true (cell ?m ?n ?w))
    (distinct ?w b))

; If either player takes the action (mark j k) to mark the cell (j, k), and the
; cell (m, n) is blank this turn, and the two cells are different (either m != j
; or n != k), then next turn (m, n) will still be blank.

(<= (next (cell ?m ?n b))
    (does ?w (mark ?j ?k))
    (true (cell ?m ?n b))
    (or (distinct ?m ?j)
        (distinct ?n ?k)))

; If this turn oplayer was in control, then next turn xplayer will be in control.

(<= (next (control xplayer))
    (true (control oplayer)))

; If this turn xplayer was in control, then next turn oplayer will be in control.

(<= (next (control oplayer))
    (true (control xplayer)))


; If the cells (m, 1), (m, 2), and (m, 3) all contain the same mark, then we say
; there's a row of that mark at row m.

(<= (row ?m ?x)
    (true (cell ?m 1 ?x))
    (true (cell ?m 2 ?x))
    (true (cell ?m 3 ?x)))

; If the cells (1, n), (2, n), and (3, n) all contain the same mark, then we say
; there's a column of that mark at column n.

(<= (column ?n ?x)
    (true (cell 1 ?n ?x))
    (true (cell 2 ?n ?x))
    (true (cell 3 ?n ?x)))

; If the cells (1, 1), (2, 2), and (3, 3) all contain the same mark, then we say
; there's a diagonal with that mark.

(<= (diagonal ?x)
    (true (cell 1 1 ?x))
    (true (cell 2 2 ?x))
    (true (cell 3 3 ?x)))

; If the cells (1, 3), (2, 2), and (3, 1) all contain the same mark, then we say
; there's a diagonal with that mark.

(<= (diagonal ?x)
    (true (cell 1 3 ?x))
    (true (cell 2 2 ?x))
    (true (cell 3 1 ?x)))


; If there's a row, column, or diagonal with a mark, then we say there's a line
; with that mark.

(<= (line ?x)
    (row ?m ?x))
(<= (line ?x)
    (column ?m ?x))
(<= (line ?x)
    (diagonal ?x))


; If there's some cell (m, n) on the board that's blank this turn, then we say
; the board is open.

(<= open
    (true (cell ?m ?n b)))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

; If there's a line of X marks, then xplayer gets a score of 100 (win).

(<= (goal xplayer 100)
    (line x))

; If the board is not open and there is neither a line of Xs nor a line of Os,
; then xplayer gets a score of 50 (draw).

(<= (goal xplayer 50)
    (not (line x))
    (not (line o))
    (not open))

; If there's a line of O marks, then xplayer gets a score of 0 (loss).

(<= (goal xplayer 0)
    (line o))

; If there's a line of O marks, then oplayer gets a score of 100 (win).

(<= (goal oplayer 100)
    (line o))

; If the board is not open and there is neither a line of Xs nor a line of Os,
; then oplayer gets a score of 50 (draw).

(<= (goal oplayer 50)
    (not (line x))
    (not (line o))
    (not open))

; If there's a line of X marks, then oplayer gets a score of 0 (loss).

(<= (goal oplayer 0)
    (line x))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

; If there's a line of X marks or a line of O marks, the game is over.

(<= terminal
    (line x))
(<= terminal
    (line o))

; If the board is not open, the game is over.

(<= terminal
    (not open))

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; Base & Input
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

; This is an optional part of the game description listing all the types of things
; that can be part of the state of the game on some turn of the game. These rules
; have no effect on the game rules, but some gamers can take advantage of definitions
; like these.

; Note that we only list things that are part of the game state: things that are
; defined using "init" and "next" and referenced using "true". Things that are
; logical consequences of the state, such as "row", "line", and "open", don't get
; listed here.

; 1, 2, and 3 are possible indices of spaces on the game board. We define "index" here
; as a constant set of possibilities that we can reference elsewhere in the game
; description.

(index 1)
(index 2)
(index 3)

; Here we take advantage of the indices listed above to quickly define a set of true
; statements. (We could have done the same when defining the cells that are initially
; blank.)

; A cell (x, y) can be blank, for any valid index x and any valid index y.

(<= (base (cell ?x ?y b))
    (index ?x)
    (index ?y))

; A cell (x, y) can have an x mark or an o mark, for any valid index x and any valid
; index y.

(<= (base (cell ?x ?y x))
    (index ?x)
    (index ?y))
(<= (base (cell ?x ?y o))
    (index ?x)
    (index ?y))

; For each role, it may be that role's turn.

(<= (base (control ?p))
    (role ?p))

; This is another optional part of the game description. Where "base" relations dealt
; with states of the game, "input" relations deal with possible moves. 

; For any role, it is possible for that role to play (mark x y) for any valid
; index x and any valid index y.

(<= (input ?p (mark ?x ?y))
    (index ?x)
    (index ?y)
    (role ?p))
    
; For any role, it is possible for that role to play the noop move.
    
(<= (input ?p noop)
    (role ?p))

The General Game Playing Forum

2014-04-02T22:04:00.000-07:00

Just a short note and link: There is now a long-term forum for discussion of General Game Playing, born from the discussions out of the GGP Coursera course last fall. (Speaking of which, a new session is starting right now.)

Improving Chinese checkers performance

2013-12-22T17:23:00.000-08:00

This is another post on improving the performance of game simulations by modifying the GDL by hand. Today, we look at chineseCheckers4, a game known for its particularly poor performance, especially for propnets.

I wanted to study this game's performance in a more step-by-step manner than previous posts, so I stopped to gather performance statistics after each step. Each number represents the number of state changes that can be computed per second when running random depth charges over the course of 60 seconds. (The numbers aren't quite as consistent as the number of significant figures implies, so take the exact values with a grain of salt.)

Here are the baseline values for my three rules engines:

Prover: 410
Tuple-based prover: 1100
Propnet: 3800
Propnet size: 282K components, 984K links, 20 seconds

The prover numbers are roughly in line with performance on English Draughts and better than Reversi; of course, this is also a relatively simple game, much simpler than Reversi. The propnet numbers, on the other hand, are much worse, and the generated propnet is enormous. This led me to believe that the game was simply written in a way that leads to poor performance.

One reasonable question here is whether the GDL can be automatically rewritten along the lines of what I'm going to do. As we'll see, that isn't always so easy.

The first thing I tried changing is the following couple of rules for deciding when a piece should be left alone:

(<= (next (cell ?x ?z))
    (true (cell ?x ?z))
    (does ?player (move ?b ?e))
    (distinct ?x ?b)
    (distinct ?x ?e))

(<= (next (cell ?x ?z))
    (true (cell ?x ?z))
    (does yellow noop)
    (does green noop)
    (does blue noop)
    (does magenta noop))

Rules written like the first of this pair are a common cause of bloated propnets. This is because the rule has to be instantiated once for every combination of possible cell, possible role, and possible move. The following is much more propnet-friendly:

(<= (next (cell ?x ?z))
    (true (cell ?x ?z))
    (not (affected ?x)))

(<= (affected ?x)
    (does ?player (move ?x ?z)))
(<= (affected ?z)
    (does ?player (move ?x ?z)))

This version has one set of "affected" nodes that tell whether cells have been affected by the move made (with size linear to the number of possible moves). These are then used to check if a cell should keep its old state (with size linear to the number of cells). No unfortunate cross-products are necessary.

Since this is such a problem, can this kind of substitution be made automatically by a player at runtime? Actually, it would be quite difficult. This substitution is only safe to make because of additional information I have about the game; namely, that only one sentence of the form (does ?player (move ?b ?e)) can be true each turn. If there were two or more such sentences true, then the two sets of rules would behave differently. Automatically extracting and exploiting this kind of "mutual exclusion" information is beyond my capabilities at the moment.

Anyway, what happens to the performance when this substitution is made? (All percentages throughout this post are relative to the original version of the game.)

Prover: 350 (-15%)
Tuple-based prover: 940 (-15%)
Propnet: 6750 (+78%)
Propnet size: 44K components (-84%), 279K links (-72%), 8 seconds (-60%)

As expected, propnet size has drastically decreased, and propnet performance has also improved. (The propnet rules engine being used is differential, so it can skip over unchanged sections of the propnet; otherwise, the speed improvement would be even more dramatic.) However, the prover performance has actually gotten worse. What gives?

The prover implementation in GGP-Base treats sentences in rules as queries, and in particular, will fill in variables in the sentences if their values are already known. In addition, within a single query, it will cache the results of subqueries and reuse them if it needs to repeat them. Let's take a look at that first version of the rule again:

(<= (next (cell ?x ?z))
    (true (cell ?x ?z))
    (does ?player (move ?b ?e))
    (distinct ?x ?b)
    (distinct ?x ?e))

For the first conjunct (true (cell ?x ?z)), the prover will find all the available values and then, for each one, ask (does ?player (move ?b ?e)), finding the results in the cache. Then it will confirm whether the ?x value is distinct from the cells in the single move found, a quick comparison. As for the other version:

(<= (next (cell ?x ?z))
    (true (cell ?x ?z))
    (not (affected ?x)))

(<= (affected ?x)
    (does ?player (move ?x ?z)))
(<= (affected ?z)
    (does ?player (move ?x ?z)))

For each cell, it asks for the affected-ness of a particular cell, checking each one individually. A query for (affected c3) will be treated differently from a query for (affected c2), so we never find answers in the cache. Instead, we check two separate rules for affected-ness, with additional function call overhead and possibly checks against noop moves.

Okay, enough about how the prover works. What's the next thing to try?

One other inefficiency I noticed is how blank cells are explicitly represented. This generally isn't necessary. Starting over again from the original set of rules (to isolate their effect), I removed the init and base propositions associated with them, and then had to make some subtle changes to how pieces in a bucket are counted to avoid unsafe rules. That led to the following performance:

Prover: 540 (+32%)
Tuple-based prover: 1450 (+32%)
Propnet: 4400 (+16%)
Propnet size: 244K components (-13%), 870K links (-12%), 13 seconds (-35%)

It's worth noting that in addition to the minor performance improvement, this version of the game uses less state to represent each turn of the game, which means MachineStates will use less memory.

Combining both of the above approaches gives us:

Prover: 525 (+28%)
Tuple-based prover: 1410 (+28%)
Propnet: 7000 (+84%)
Propnet size: 40K components (-86%), 262K links (-73%), 7 seconds (-65%)

Note that the negative impact on the prover's performance isn't as large with the blank cells removed. This would affect it in a couple of ways: firstly, we get to skip all the blank cells when deciding which values to preserve; secondly, we get to drop one of the affected rules, since we no longer have to worry about the cell that a piece is moving into.

The next thing I noticed was that the base and input rules were incorrect, including many states and moves that were impossible. This wouldn't affect the provers, and my propnet rules engine uses a post-optimization that fixes this anyway, but it has an impact on the size and speed of the initial propnet construction. With fixed bases and inputs:

Propnet size: 22K components (-92%), 150K links (-85%), 5 seconds (-75%)

Finally, from looking at the output from the verbose mode of the OptimizingPropNetFactory, I discovered that it was spending most of its time generating rules for when players could make noop rules, as defined by the following rule:

(<= (legal ?player noop)
    (true (control ?player))
    (true (cell ?b1 ?player))
    (stuck ?b1 ?player)
    (true (cell ?b2 ?player))
    (distinct ?b1 ?b2)
    (stuck ?b2 ?player)
    (true (cell ?b3 ?player))
    (distinct ?b2 ?b3)
    (distinct ?b3 ?b1)
    (stuck ?b3 ?player))

This is a rather roundabout way of saying that a player should be able to make a noop move if they can't make any other moves. Specifically, this rule says they can make such a move if they have three pieces and all of them are stuck (have no legal moves associated with them). In the propnet, this rule needs to be instantiated for every combination of three cells on the board.

The much simpler way to express this is by asking if the player has any legal moves at all, directly:

(<= (legal ?player noop)
    (true (control ?player))
    (not (hasAnyLegalMove ?player)))

(<= (hasAnyLegalMove ?player)
    (movable ?player ?b ?e))

This turns out to be much more efficient across the board. (Note that like before, this is only a valid substitution because we have extra knowledge of the game. If a player could have more or fewer than three pieces at once, the two sets of rules wouldn't be equivalent.) Adding this on top of all our other improvements:

Prover: 700 (+71%)
Tuple-based prover: 1800 (+64%)
Propnet: 90000 (+2300%)
Propnet size: 3152 components (-99%), 5554 links (-99.4%), 2 seconds (-90%)

I'll work on getting these changes applied to all the Chinese checkers games in the Tiltyard Base repository.

Arithmetic in GDL

2013-10-17T20:43:00.000-07:00

As a rather minimalist language, GDL doesn't have any native support for arithmetic. This hasn't stopped games from making use of arithmetic as needed.

In practice, arithmetic is built up in games from a successor function. Everything in GDL has to work on a limited range of values (so we don't end up with an infinite number of true sentences), so the successor function defines the range our arithmetic will work over. It's usually given the name succ (and in practice, is often copied from game to game). It looks like this:

(succ 0 1)
(succ 1 2)
(succ 2 3)
; etc.

Once we have our successor function defined, we can use recursive rules to build up the other arithmetic functions we need, in much the same way they're built up in set theory or analysis courses. Most of these have already been created for various games.

From Reversi, we get a lessThan function:

(<= (lessThan ?x ?y)
    (succ ?x ?y))
(<= (lessThan ?x ?z)
    (succ ?x ?y)
    (lessThan ?y ?z))

And from ad_game_2x2, we get a summation function:

(<= (add ?x 0 ?x)
 (number ?x))
(<= (add ?x ?y ?z)
 (succ ?x ?x1)
 (succ ?y1 ?y)
 (add ?x1 ?y1 ?z))
; number can look like this:
(number 0)
(<= (number ?y)
    (succ ?x ?y))

I don't know of any games that currently use multiplication, but it's not much different from summation:

(<= (mult ?x 0 0)
    (number ?x)
(<= (mult ?x ?y ?z)
    (succ ?ym1 ?y) ; read as "y minus 1"
    (add ?x ?zmx ?z) ; read as "z minus x"
    (mult ?x ?ym1 ?zmx))

Note that these are generally constant throughout a game, so a good rules engine should be able to precompute these values once and avoid the costs associated with recursion. GGP-Base has pre-built functionality that works particularly well with recursive arithmetic rules like these: see the ConstantCheckerFactory.

The one place where arithmetic gets complicated is counting. There is no keyword to find the number of currently true sentences that match a given template, such as the number of one player's pieces on the board. It's easy to check if this number is 0 or not, but for any other comparison it gets harder. (This is the one place where I might advocate adding functionality to GDL to make arithmetic easier. It's an open question how this could be added while keeping recursion in check.)

The first approach that can be used for counting a turn-dependent value is keeping track of the value each turn and modifying it according to the expected change in value each turn. Here's an example from checkers-newgoals (adding some comments to explain):

; Each player starts with twelve pieces. 
(init (piece_count red c12))
(init (piece_count black c12))

; The active player's piece count doesn't change. 
; (This could have been written more simply by
; checking against (true (control ?player)).) 

(<= (next (piece_count ?player ?n))
(or (does ?player (move ?p ?u ?v ?x ?y))
(does ?player (doublejump ?p ?u ?v ?x3 ?y3 ?x ?y))
(does ?player (triplejump ?p ?u ?v ?x3 ?y3 ?x4 ?y4 ?x ?y)))
(true (piece_count ?player ?n)))

; If black made a non-capture move, red's piece count stays the same.
(<= (next (piece_count red ?n))
(does black (move ?p ?x1 ?y1 ?x2 ?y2))
(kingmove black ?x1 ?y1 ?x2 ?y2)
(true (piece_count red ?n)))

; If black made a single capture, red's piece count drops by one.
(<= (next (piece_count red ?lower))
(does black (move ?p ?x1 ?y1 ?x2 ?y2))
(single_jump_capture black ?x1 ?y1 ?x ?y ?x2 ?y2)
(true (piece_count red ?higher))
(minus1 ?higher ?lower))

; If black made a double capture, red's piece count drops by two.
(<= (next (piece_count red ?lower))
(does black (doublejump ?p ?u ?v ?x3 ?y3 ?x ?y))
(true (piece_count red ?higher))
(minus2 ?higher ?lower))

; If black made a triple capture, red's piece count drops by three.
(<= (next (piece_count red ?lower))
(does black (triplejump ?p ?u ?v ?x3 ?y3 ?x4 ?y4 ?x ?y))
(true (piece_count red ?higher))
(minus3 ?higher ?lower))

; ... and similarly for black. We could have used variables for the player names with (opponent _ _) sentences to keep this shorter.

The second approach can do the counting within a single turn. The idea here is that we have some ordering over the possible sentences that can be true: for example, if we want to count the pieces of a certain color on a board, we order all the spaces on the board. Then, we write rules for a class of sentences that basically says, for each possible n, "Of the sentences 0 through n, m of them are true." It's easy to tell for sentence 0, and given the value for n - 1, it's easy to tell the value for n. Then we can just ask for the value once all have been determined.

I used this approach in Reversi, partly due to the fact that many pieces can change colors in a given turn, so the amount to change values in a given turn is already hard to determine. Here's how it looked there (with some extra comments):

; The count starts out at either 0 or 1 at space (1, 1), depending on if it's occupied by our color.
(<= (pieceCount 1 1 ?color 1)
    (true (cell 1 1 ?color)))
(<= (pieceCount 1 1 ?color 0)
    (role ?color)
    (not (true (cell 1 1 ?color))))

; Add one more space along the current column. Add one to n if it's occupied by our color.
(<= (pieceCount ?x ?yp1 ?color ?np1)
    (add ?y p1 ?yp1)
    (pieceCount ?x ?y ?color ?n)
    (succ ?n ?np1)
    (true (cell ?x ?yp1 ?color)))
(<= (pieceCount ?x ?yp1 ?color ?n)
    (add ?y p1 ?yp1)
    (pieceCount ?x ?y ?color ?n)
    (not (true (cell ?x ?yp1 ?color))))

; When we get to the end of a column (y = 8), go to the first space in the next column.
(<= (pieceCount ?xp1 1 ?color ?np1)
    (add ?x p1 ?xp1)
    (pieceCount ?x 8 ?color ?n)
    (succ ?n ?np1)
    (true (cell ?xp1 1 ?color)))
(<= (pieceCount ?xp1 1 ?color ?n)
    (add ?x p1 ?xp1)
    (pieceCount ?x 8 ?color ?n)
    (not (true (cell ?xp1 1 ?color)))) 

; The total count is the value at the last space, (8, 8). 
(<= (totalCount ?player ?n)
    (pieceCount 8 8 ?player ?n))

The Dresden server, plus rewriting Othello

2013-05-04T16:02:00.002-07:00

I recently found that the Dresden server is up and capable of hosting round-robin matches again. (It doesn't have a hostname at the moment, and I don't want to link directly to an IP address, so reach it via the first link on this page.)

The Dresden server is different from Tiltyard in that it has a wider variety of games, somewhat less curated than Tiltyard's selection in terms of the GDL and ease of processing the game descriptions. Unlike Tiltyard, Dresden will have gamers play pretty much every game it has.

I've found it to be a good stress test in terms of making sure that a gamer won't crash under a wide variety of conditions. One game in particular, laikLee_hex, has a tendency to crash not only my propnet generation code, but my simplest theorem-prover-based rules engines as well. Ultimately, I added a feature to blacklist the game and prevent Alloy from trying to play it at all!

I put Alloy back on the site with its new cross-process framework in a new configuration, and I'm already seeing some stability issues worth figuring out. A few of its crashes so far have involved the two remaining versions of Othello on the site. It looks like most of the issues boil down to: 1) the propnet for the game is very large; and 2) the state computations are very slow.

There are no doubt solutions I can find to keep my player more responsive and improve, but given the success of rewriting checkers, I thought I'd give Othello another go. The new version is called Reversi.

Here are the performance gains this time, in the same format as last time (average state changes per second, during 60 seconds of random playthroughs):

Rules engine	othelloSuicide	reversi
Prover	1	10
Tuple-based prover	14	71
Propnet	320	10,500

Note that, like last time, the choice of rules engine matters more than the choice of game description. Even so, we see a huge increase in performance with the new GDL, even moreso than last time: the speedup is between 5 and 30. (Propnet generation now takes 5 seconds instead of 2 minutes, and the result is about 1/35th the size; this is mainly due to the piece count computation dealing with one player at a time instead of both.)

I believe forward-chaining rules engines have a large advantage on Othello because of the heavy use of recursion. When a player plays a tile, the engine has to compute every tile that must be flipped as a consequence. In a forward-chaining rules engine, this computation can start at the tile that was played and expand outward, only covering the relevant tiles. In a backwards-chaining rules engine, we instead have to start at each tile on the board and ask, "Can I get from this tile to the tile that was played?" This is a recursive question that will often require looking at multiple tiles. (We could speed this up by checking immediately if the tile and the played tile are on the same line or diagonal, but only if the prover precomputes all constants, which the ProverStateMachine does not.)

Rewriting checkers

2013-04-21T00:19:00.002-07:00

I had a suspicion a while back that some GDL games were poorly written in terms of performance, and could be improved upon. Recently, I was looking over the checkers rules and decided they would be a good test case for this treatment.

For example, to figure out when a piece should stay in place undisturbed, we have the following fairly complicated rule, where single_jump_capture is itself substantial and dependent on the current state of the board:

(<= (next (cell ?x ?y ?p))
    (does ?player (move ?piece ?x1 ?y1 ?x2 ?y2))
    (true (cell ?x ?y ?p))
    (not (single_jump_capture ?player ?x1 ?y1 ?x ?y ?x2 ?y2))
    (different_cells ?x ?y ?x1 ?y1)
    (different_cells ?x ?y ?x2 ?y2))

Checkers was a simple enough case for me to try rewriting it in a fairly short time (unlike chess). I had already considered rewriting it to deal with the messiness of how it implements triple jumps. Dealing with sentences with upwards of eight variables can be difficult for a rules engine to deal with, and the old rules don't allow for more than three jumps in a row. The new approach deals with chained jumps over multiple turns: if a player makes a jump and can make additional jumps with the same piece, they retain control and can make those jumps.

I used the rules from the World Checkers Draughts Federation to decide how to handle edge cases. For example, if a pawn is kinged during a capture, it can't continue a capture chain, even if it's available. The ending conditions for the game have also changed: the game now only ends in a draw after 20 turns with no pawn moves or captures (loosely resembling the real rules). When it is a draw, there is no advantage for the player with more pieces; victories must be complete.

Here's a link to the new set of game rules. To avoid confusion with the old rulesheets and keep a more international perspective, I've called this version of the game English draughts.

Here are the performance results. The first rules engine is the out-of-the-box ProverStateMachine from GGP-Base. The second is a modified, heavily optimized version of the ProverStateMachine. The third is the propnet-based rules engine that Alloy uses. Each was run for 60 seconds on random depth charges through the games in question. The unit is average number of state changes per second:

Rules engine	checkers	checkers-mustjump	englishDraughts
Prover	35	32	190
Tuple-based prover	310	280	1100
Propnet	9300	8500	22000

Note that the prover-based and propnet-based rules engines here chain their logic in different directions, so it's not obvious that an improvement for one would be an improvement for the other. In this case, there are substantial improvements for both kinds, roughly between 2.5x and 5x.

(The generated propnet is also smaller and takes less time to build than the propnets for the other two games, even though I neglected to add base and input rules.)

What techniques and principles led to performance improvements? Without A/B testing of one feature at a time, I don't have an evidence-based answer. However, any of the following may have helped:

Track only which spaces have pieces in them, not which spaces are blank
Don't have rules involving a large number of variables at once (thanks to eliminating the double and triple jumps)
Stop using a turn-dependent "single_jump_capture" relation all over the place; use constants and minimize state checking
Generally, keep the way the prover works in mind when writing rules, to make sure it short-circuits quickly and only goes down a minimal number of paths
Don't keep track of the number of pieces on the board (thanks to the new rules for ending the game)

I suspect that chess also has opportunities for performance improvements of this sort. I tried to rewrite it a few years back, which went poorly, but now I know more and I hope to give it another go at some point.

A quick guide to GDL terminology

2013-02-25T23:21:00.002-08:00

I'm writing another post which is pretty heavy on the GDL terminology. I may not have the time to make every such post readable for someone who doesn't know a sentence from a term. Hopefully, this will also be of use to anyone trying to sort out the Gdl classes in GGP-Base.

Sentence: A sentence is something that can be true or false on a given turn of a game. It is made up of a name (which is a constant) and (optionally) a body consisting of terms. In GGP-Base, a sentence with no body is called a proposition, while a sentence with a body is called a relation. (Note that this is different from the sense of "proposition" used in propositional network, which is equivalent to a sentence in this terminology.)

Term: A term is a constant, variable, or function. It's what fills one "slot" in a sentence body.

Constant: A constant is a "word" of the language for a particular game, or one of the GDL keywords. If it's a single word without parentheses (and not a variable) it's a constant.

Variable: A "word" in a sentence starting with ?. It can represent a constant or a function.

Function: A complex term that contains other terms. It has a name (which is a constant) and a body consisting of other terms, much like a sentence. Unlike a sentence, it does not have a truth value.

Arity: The number of terms in a sentence body or function body. This should be constant across every use of a sentence name or function name.

Rule: A rule in a GDL description starts with the <= operator, follows it with a sentence (whatever is implied by the rule), and then has a body consisting of literals.

Literal: A literal is a condition for a rule. This can be a sentence; a negated sentence; a declaration that two terms are distinct; or a disjunction of other literals. These last three get their own types in GGP-Base: GdlNot, GdlDistinct, and GdlOr, respectively.

And a few examples:

terminal can be either a constant/term or a sentence/proposition, depending on the context. The constant terminal is the name of the sentence terminal.

(init (cell 1 1 b)) is a sentence/relation. Its name is init. Its arity is one, because the only term in its body is the function (cell 1 1 b). This function in turn has arity three, and its body consists of the constants 1, 1 again, and b.

(true (cell 1 1 b)) is also a sentence/relation. Its name is true. Note that (cell 1 1 b) is still just a function, even though it looks like we're talking about its truth value.

(line ?player) could be either a function/term or a sentence/relation, depending on the context and the game description it appears in. If it appears on its own in the head or tail of a rule, it's a sentence. If it is part of a larger sentence, such as (next (line ?player)), then it's a function instead. Whichever it is, it should be consistently one or the other throughout the game description.

(distinct ?x king) is a literal/distinct, and its two terms are a variable and a constant, respectively. Note that it doesn't get to count as a sentence; instead, it's considered just a language construct.

(Edited on 8/25/13: Added an example of an ambiguous sentence/function.)

What is a propnet?

2013-02-17T20:33:00.001-08:00

Propositional networks, or propnets, are an alternative way of expressing the rules of a game, covered in Stanford's CS 227B course.

A propnet is a graph containing a node for every sentence that can be true according to the game rules. A propnet for tic-tac-toe would contain a node for each of (cell 1 3 x)*, (line o), and (legal xplayer (mark 2 2)). (There are no variables; the rules have been "flattened" into a form dealing only with fully grounded sentences.)

In addition to the proposition nodes, propnets contain nodes representing logic gates. Three types of logic gates suffice: AND gates, OR gates, and NOT gates. The graph is directed, so these gates have inputs and outputs. A gate takes on the value of For example, (row x), (col x), and (diag x) are all inputs to an OR gate, which has (line x) as an output. All the rules of the game are expressed through these logic gates.

In addition, we have transition nodes, which act like flip-flops on a circuit board: they collect outputs from one state that become inputs in the next state. For example, there is a transition node between (next (cell 1 1 o)) and (cell 1 1 o).

If we want to figure out what's going on in a particular state, first we set the base propositions to the appropriate values, then we go through all the other nodes in the graph and set them according to their input values. Then we can go ahead and query the terminal, legal, and goal nodes.

The main use of propnets is to build a rules engine that can be To spread a bit of the conventional wisdom from the aforementioned Stanford class: The players that did best in the class competition tended to involve generating a propnet, then turning it into code which would then be compiled and used as the rules engine.

The propnet-based state machine I originally wrote for the class was roughly as efficient as the ProverStateMachine from GGP-Base, and often slower. (The ProverStateMachine is rather slow relative to Prolog-based rules engines.) When I wrote a new one for Alloy, I knew what I was doing, so the result is in most cases 20 to 100 times faster. When I did an experiment with Java-based compilation, I got similar speedups even though the generated code was much less polished. Sam Schreiber tells me that TurboTurtle's state machine is even faster, by a significant margin.

The main problem with using propnets is building them. This can take a long time and use a lot of memory. In many cases, propnet generation will fail, leaving the gamer to rely on a backup rules engine (assuming it can recover successfully from running out of memory).

Efficient propnet generation can be a difficult problem. I spent a chunk of the summer of 2010 writing the OptimizingPropNetFactory, which will get its own post(s) someday. It can create smaller propnets for a wider variety of games than the original PropNetFactory (both are in GGP-Base), but still runs out of memory on large games like chess or amazons**. There's still room for improvement, but new propnet generation code should leverage what's already been tried and tested instead of starting from scratch. Unfortunately, the OptimizingPropNetFactory code as it stands is poorly factored and difficult to read, and it may be a while before I can adapt it into something more extensible.

Along with using propnets for faster rules engines, I've also experimented with using propnets to reason about games and found it a bit easier to think about than dealing with GDL directly or through a theorem prover. This has included making some reusable tools for reasoning: for example, when a certain set of nodes have certain values, what are the easy-to-find consequences of that (perhaps across multiple turns)? With that tool in place, it's easy to write code to, for example, detect latches (nodes that, once true, are true for the rest of the game). Some of this reasoning has made its way into Alloy.

In short, propnets are a powerful tool for fast rules engines and reasoning about games, but they won't work for every game.

* In the current "standard" for propnet generation, base propositions omit the "true" that normally identifies them. I'm hoping to change this to remove some confusion between sentences and functions. You can do this yourself by commenting out the call to normalizePropositions() in OptimizingPropNetFactory.create().

** The reason it fails on amazons is because of the huge number of moves that are possible: there are 100 squares, and a move specifies three of them (square moved from, square moved to, square that an arrow gets fired at). I believe that if a version of amazons separated the movement and arrow-shooting into separate turns, building a propnet would

Writing a game with GDL: Sim

2013-01-07T22:02:00.001-08:00

I described the Game Description Language in an earlier post, but didn't show an example of a complete game description in practice. Today, I'll write a new game and record the process I use to make sure it's correct.

The game I'll write today is Sim, a simple graph coloring game I found on Wikipedia.
(Its list of abstract strategy games is an excellent source of new games for GGP.)

The first step is to figure out the exact rules of the game as you'd like to write it: figure out if any house rules should be applied, and how to resolve any ambiguities with scoring and the end of the game. (For example, certain games end when both players agree that no more useful moves can be made; this isn't suitable for a GDL game that is required to be finite. Such a game would probably need a step counter to end the game after a set number of turns in its GDL incarnation.)

Fortunately, the rules of Sim are straightforward, and it automatically ends after a certain number of turns, with a guaranteed winner. Here are the game rules, as explained in the Wikipedia entry:

The game of Sim is played by two players on a board consisting of six dots ('vertices'). Each dot is connected to every other dot by a line.

Two players take turns coloring any uncolored lines. One player colors in one color, and the other colors in another color, with each player trying to avoid the creation of a triangle made solely of their color; the player who completes such a triangle loses immediately .

Ramsey theory shows that no game of Sim can end in a tie.

With a clear set of rules, we have to go through a certain number of steps to turn it into a working .kif file.

1. Write a header

The first step I take when creating a .kif file is to write a header explaining what the game is, where to find the game rules, and who the author is. Anything after a semicolon is ignored in a .kif file, so we can include whatever human-readable text we want.

; Sim
; by Alex
; Rules of Sim: http://en.wikipedia.org/wiki/Sim_(pencil_game)

2. Add the roles

The next step is to figure out the roles for the players participating in the game. These are generally at the beginning of the game description. The Sim rules don't specify the colors, so I'll choose a couple arbitrarily.

(role red)
(role blue)

3. Figure out how game state will be expressed

Now it's time to get into the heart of the game. This game state has to include every piece of information that can distinguish between one state of the game and another, but it should also be as minimal as possible in most cases.

(Aside: As an example of why minimalism is good, the currently widespread chess.kif file expresses check as part of the game state, even though it can be computed from the rest of the game state. This means it gets computed on the previous turn while looking ahead to the state on the next turn. This approach left room for a bug in the game: if a player achieves checkmate due to a pawn promotion, the game fails to notice the check and treats the result as a stalemate.)

(Extra aside: When can minimalism be bad? As it turns out, counting and summation are difficult to get right in GDL, and sometimes it may be easier to maintain a running count throughout the game than to calculate the sum all at once, even if all the information is there.)

In the case of Sim, there are only two things we need to keep track of: which player's turn it is and what lines have been colored. Even so, there is a choice to make: Do we explicitly or implicitly keep track of which lines are uncolored?

Explicitly keeping track would mean that we start with true sentences like (line blank 1 3), which remain true until replaced by a sentence representing a color like (line red 1 3). The alternative would be to start with no line state and just add the colors as we go. This makes it a little harder to compute which moves are available, but also results in some tricky rules to determine whether a blank line should stay blank. I find the latter approach to be simpler overall, so let's stick with that.

4. Write the initial state

This gives us enough information to at least figure out the initial state of the game. Choosing to not include sentences for uncolored lines makes this easy. We'll let red be the first player.

(init (control red))

5. Figure out what moves players can make

Our players are going to be coloring lines on each of their turns. A simple way to identify a line is by its two endpoints (with the points numbered 1 through 6). We'll know what color to use based on which player's turn it is, so moves can look like (color 2 5).

However, there's something that could go wrong with the scheme I just suggested. As it stands, a player could make either the move (color 2 5) or (color 5 2). It would be much better to have a unique identifier for any given line (and lead to less confusion down the road). We'll add the requirement that the first number be less than the second number when referring to a line.

There's one more thing: remember, in GDL, every player must have an available move on every turn. That means the non-active player needs to have a move available to make, even if it does nothing. There is a standard name used for this type of move: noop, short for "no operation", a term borrowed from assembly programming.

6. Write the move legality rules

Now that we have some notion of the game state and the possible moves, we can start writing the rules that express how they interact with each other. The first piece to figure out: Given a state of the game, which moves are legal for each player?

Here we have to deal with the problem mentioned before of determining whether a line has already been colored. Your first instinct might be to write a rule that looks like this:

(<= (legal ?player (color ?i ?j))

(true (control ?player)) ; It must be the player's turn

(lt ?i ?j) ; i is less than j
(not (true (line ?anyColor ?i ?j)))) ; The line isn't already colored

There is a problem with this approach: it isn't a valid GDL rule. The StaticValidator from the GGP-Base project will tell us why:

GDL validation error: Unsafe rule [...]: Variable ?anyColor is not defined in a positive relation in the rule's body

We need to define ?anyColor in a non-negated sentence. Now, consider what would happen if we naively added a source for the color as follows:

(<= (legal ?player (color ?i ?j))

(true (control ?player)) ; It must be the player's turn


    (lt ?i ?j) ; i is less than j

    (role ?anyColor) 

    (not (true (line ?anyColor ?i ?j)))) ; The line isn't already colored

The problem here is that the rule will apply for any value of the ?anyColor variable. This means that the active player can color a line unless every player has colored it, certainly not the intent here.

The way to solve this problem (which comes up frequently) is to break the notion of "this line has some color" into its own relation, so that we can reference its negation:

(<= (legal ?player (color ?i ?j))

    (true (control ?player))

    (lt ?i ?j)

    (not (isColored ?i ?j)))

(<= (isColored ?i ?j)

    (true (line ?anyColor ?i ?j)))

Note here that the less-than function provides our sources for the ?i and ?j variables. We would normally need some other relation to define which lines are valid, but in this case the less-than relation is enough.

Similarly, we need to remember to provide a source for the ?player variable in the noop rule:

(<= (legal ?player noop)

    (role ?player)

    (not (true (control ?player))))

7. Write the next-state rules

Now we need to say what the players' moves actually do. Rules defining next sentences will tell us how one state leads to the next. First, let's get the turn-taking rules out of the way:

(<= (next (control red))

    (true (control blue)))

(<= (next (control blue))

    (true (control red)))

We can define the next turn's state in terms of the current one using true sentences. We can also define it in terms of players' moves using does sentences:

(<= (next (line ?player ?i ?j))

    (does ?player (color ?i ?j)))

One last thing: Keep in mind that we need to explicitly say that colored lines stay colored. The only way to keep parts of the game state persistent is to make them so with explicit rules.

(<= (next (line ?player ?i ?j))

    (true (line ?player ?i ?j)))

8. Write the terminal and goal rules

The rules saying when the game ends and what the score is for each player are often closely connected, so I deal with them in a single logical step.

In this case, much like tic-tac-toe, we have a single possible feature of the game state that leads to the end of the game and a player's victory. (Unlike tic-tac-toe, we don't have to deal with the possibility of a closed-board draw at the end of the game; so much the better.) That's a good sign that it should have its own relation:

(<= (triangle ?player)

    (true (line ?player ?a ?b))

    (true (line ?player ?b ?c))

    (true (line ?player ?a ?c)))

Note the advantage that we gain here from keeping our line endpoints in a consistent ordering. Also note that we don't bother to include the location of the triangle in the result: what matters is who made the triangle, not where it is. This makes the remaining rules for ending the game easy:

(<= terminal

    (triangle ?player))



(<= (goal ?player 0)

    (triangle ?player))

(<= (goal ?player 100)

    (triangle ?opponent)

    (role ?player)

    (distinct ?player ?opponent))

9. Write any missing constants

We still haven't defined the less-than relation. Someday I'll write a post on doing arithmetic in GDL, but for now we use some arithmetic functions (successor and less-than) grabbed from other games.

(succ 1 2)

(succ 2 3)

(succ 3 4)

(succ 4 5)

(succ 5 6)

(<= (lt ?a ?b)

    (succ ?a ?b))

(<= (lt ?a ?c)

    (succ ?a ?b)

    (lt ?b ?c))

10. (Optional) Write base and input rules

Writing base and input rules can give players more information about the possible states of the game. This tends to be more relevant when the variety of sentences in the game state and legal moves is very large, so they are less useful here; but we can add them anyway for extra style points.

Notice that the base and input sentences are constants, so they must be defined in terms of constants.

(<= (base (control ?player))

    (role ?player))

(<= (base (line ?player ?i ?j))

    (role ?player) 

    (lt ?i ?j))

(<= (input ?player noop)

    (role ?player))

(<= (input ?player (color ?i ?j))

    (role ?player)

    (lt ?i ?j))

11. Test the game with validation tools

Now that we have finished writing the game, we must test it if we want to be confident that it is correct. I mentioned the StaticValidator tool before; I wrote it a few years ago as a way of catching the fiddly little syntax errors that tend to work their way into games. (The original paper describing GDL had a syntax error in its sample implementation of tic-tac-toe.) Theorem-prover-based rules engines will tend to handle those cases "correctly", but anything dealing with GDL in a more complicated way could choke on the errors.

Our version of Sim passes with flying colors, but we're not done yet:

12. Think through semantic error cases

Although static validation can catch syntax errors, there are a variety of other errors that can't be caught through static analysis:

Is the game always guaranteed to terminate? (If the game state can get stuck in a loop, the answer is "no".)
Does every player have at least one move in every non-terminal state?
In every terminal state, does each player have one and only one goal value?

The first step to catching these errors is to carefully think through each of them. Hopefully you made sure that the answer is "yes" in every case, but it's always possible that your interpretation of the rules changed over the course of writing the game and opened up some room for error.

To eliminate the possibility of further errors, we need to take one more step:

13. Test the game with real gamers

I don't consider my testing of a game to be complete until I've had my gamer play it several times. This lets me look through some sample games and verify that the game behaves the way I expected. If your gamer provides error messages when reaching a violation of one of the semantic errors above, it can also search a wide swath of the game tree for errors.

There are some other valuable things to learn about a game by having a good gamer try it out. For example, it's difficult to know whether a game heavily advantages one player or the other without having a reasonably strong computer player play it against itself. (Sim is supposed to favor the second player, and my testing confirms this.) It's also a test of whether the game rules can be interpreted efficiently enough for intelligent-looking computer play.

(Remember, if you want to test a corner case -- like checkmating your opponent through pawn promotion -- sometimes the best gamer is... you! The PlayerPanel in GGP-Base includes a Human gamer option.)

I wouldn't send new software to a customer without testing it out myself, and neither would I publish a game without testing it. (Indeed, I found a dumb mistake in the rules I had written that I didn't find with simpler validation checks.)

Here is the end result, all together:

; Sim
; by Alex
; Rules of Sim: http://en.wikipedia.org/wiki/Sim_(pencil_game)

(role red)
(role blue)

(init (control red))

(<= (legal ?player (color ?i ?j))
    (true (control ?player))
    (lt ?i ?j)
    (not (isColored ?i ?j)))

(<= (isColored ?i ?j)
    (true (line ?anyColor ?i ?j)))
    
(<= (legal ?player noop)
    (role ?player)
    (not (true (control ?player))))
    
(<= (next (control red))
    (true (control blue)))
(<= (next (control blue))
    (true (control red)))

(<= (next (line ?player ?i ?j))
    (does ?player (color ?i ?j)))

(<= (next (line ?player ?i ?j))
    (true (line ?player ?i ?j)))

(<= (triangle ?player)

    (true (line ?player ?a ?b))

    (true (line ?player ?b ?c))

    (true (line ?player ?a ?c)))

(<= terminal
    (triangle ?player))

(<= (goal ?player 0)
    (triangle ?player))
(<= (goal ?player 100)
    (triangle ?opponent)
    (role ?player)
    (distinct ?player ?opponent))

(succ 1 2)
(succ 2 3)
(succ 3 4)
(succ 4 5)
(succ 5 6)
(<= (lt ?a ?b)
    (succ ?a ?b))
(<= (lt ?a ?c)
    (succ ?a ?b)
    (lt ?b ?c))

(<= (base (control ?player))
    (role ?player))
(<= (base (line ?player ?i ?j))
    (role ?player)
    (lt ?i ?j))
(<= (input ?player noop)
    (role ?player))
(<= (input ?player (color ?i ?j))
    (role ?player)
    (lt ?i ?j))

The Game Description Language

2012-12-27T12:32:00.000-08:00

The Game Description Language is tasked with being able to describe a wide variety of games: chess, checkers, connect four, the iterated Prisoner's Dilemma, and so on. It requires only that the games have a few particular properties:

The game is turn-based.
The game is finite: a player's choice of moves on any given turn is finite, and the game will always end in a finite number of turns.
The game is perfect-information: On every turn, every player knows all there is to know about the state of the game. In practice, this is the biggest restriction on GDL's expressiveness; consider card games where players' hands are hidden, Battleship, or Minesweeper.
The game is deterministic, with no random elements. In practice, there are workarounds for this.

(A variant of GDL, called GDL-II, eliminates the last two requirements, but that will be left for a later post.)

Given this level of generality, what characteristics do all these games still share? Put another way, what questions will a gamer be able to ask the rules?

Game roles: How many players are there, and what are their names, as used by the rest of the rules?
The initial state of the game: What is the game state at the beginning of the game?
Terminality: At a particular game state, is the game over?
Legal moves: At a particular non-terminal game state, what are the legal moves for each player?
Next state: Given a state and a move for each player, what is the next state of the game?
Goal values: At a given terminal state, did each player win, lose, draw, or place somewhere in between? (In fact, goal values may be any integral value between 0 and 100, inclusive.)

It should be a bit clearer now that the language is capable of describing a wide variety of games.

One lingering question is how game state is described. After all, what works for chess (pieces on a board) won't work for a bidding game (quantities of money and resources). The simple solution is to allow each game to specify whatever state it needs in the form of logical propositions. For example, the fact that there is a white knight on f3 may be expressed by the proposition (cell wn f 3). A player's money count may be expressed as (money player1 17). Each game's propositions need only make sense in the context of the rest of the game rules.

GDL is nothing but a set of propositions and rules about propositions that can answer the questions listed above. In fact, the list of keywords is very nearly the same as my list of questions. Here they are, with sample propositions that might appear in chess:

Role: (role white)
Init: (init (cell wp e 2))
Terminal: terminal
Legal: (legal white (move wp e 2 e 4))
Next: (next (cell wp e 4)) -- note the similarity to init.
Goal: (goal white 100)
True: When computing the terminal, legal, next, or goal propositions, we supply the propositions representing the current state of the game in a form like (true (cell wp e 4)). These can be taken from sets of init or next propositions.
Does: When computing next propositions, we supply the moves of each player using does propositions, for example (does white (move wp e 2 e 4)).

My description thus far has left out one (major) aspect of the language: How do the rules work? How do we get from a set of inputs to a set of outputs?

The short answer is, "like Prolog", because GDL is at its heart a variant of Datalog, itself a variant of Prolog. I assume that this answer will be unsatisfactory to most of my readers, so let's dig in.

A GDL rulesheet consists of propositions and rules. The propositions listed this way will always be true over the course of the game. For example, the number of players can't change midway through a game, so role propositions are always listed in this way. If the game uses any sort of arithmetic, a successor function is generally listed. Many init propositions are listed explicitly (though often, rules with variables are a more concise way to specify them).

Rules are far more expressive. Let's look at an example from tic-tac-toe:

(<= (goal xplayer 50)
    (not (line x))
    (not (line o))
    (not open))

Note that the rule is a list with five components: the operator <=, which indicates that it is a rule; the head of the rule, which is the sentence (goal xplayer 50); and three literals, which make up the body of the rule.

In rules without variables, like this one, the head is true if every literal in the body is true. Because of the rule's use of the not operator, this means that (goal xplayer 50) is true if (line x), (line o), and open are all false. In other words, if there's no place left to play on the board and neither x nor o has formed a line, then xplayer will receive only 50 of 100 points -- a typical way to indicate a draw.

There may also be multiple rules for the same sentence, as for terminal in the same game:

(<= terminal
(line x))

(<= terminal
(line o))

(<= terminal
(not open))

Here, terminal is true if any one of the rules is fulfilled. The game ends if either player makes a line or if there is no open space on the board.

Relations and rules provide a way to specify that a sentence is true, but the logicians out there may be asking: How do we know when a sentence is false? Here, we rely on a rule known as negation as failure in the context of logical systems. It says that a sentence is false if it cannot be shown to be true. (In addition, GDL has some restrictions on negation and recursion (not described here) to prevent contradictory rules along the lines of (<= p (not p)).)

Finally, we have the use of variables in rules and the distinct operator, again from tic-tac-toe:

(<= (next (cell ?m ?n b))
(does ?player (mark ?j ?k))
(true (cell ?m ?n b))
(or (distinct ?m ?j) (distinct ?n ?k)))

First, note the use of the or operator. This is not actually part of the GDL standard as originally written, but is an "unofficial standard" of sorts, used widely across GDL rulesheets. As expected, it is true if any of its contained literals are true.

Next we have the distinct operator. This takes two arguments and is true when the arguments are not the same value.

Finally, we have the use of variables throughout the rule. Variables have names starting with a question mark. For every possible assignment of variables to values, the rule applies. For example, in the case where ?m = 2, ?n = 1, ?player = xplayer, ?j = 2, and ?k = 3:

(<= (next (cell 2 1 b))
(does xplayer (mark 2 3))
(true (cell 2 1 b))
(or (distinct 2 2) (distinct 1 3)))

The general form of the rule says that if any cell is blank ("b") and either player makes a move that differs from the cell in either coordinate, then the cell stays blank. (Note that the rule, as written, relies on the assumption that one and only one player makes a mark each turn.)

Every variable must be defined in a sentence literal in the rule, as opposed to a not or distinct literal. This restriction means that all the values of the variables are grounded, and came from somewhere in the original rules of the game. This keeps the set of possible sentences that can be in the game finite. (There is in fact a minor qualification of that statement, which I may post about at some point in the future.)

There's one last pair of keywords to mention: the base and input keywords, which are optional when constructing a game. (These are also not yet an official part of GDL.) These have no effect on the game itself, but provide some metadata which may be useful to a gamer. They explain, respectively, what sentences could possibly be part of the game state and what moves could possibly be made. They would be structured like the true and legal propositions, respectively: (base (cell wp e 4)) and (input white (move wp e 2 e 4)).

Why is this useful? Take the moves of checkers as an example. How many triple jumps are possible for each player? Well, there are 32 squares from which the piece may start, and each jump could be made in one of four directions (as it may be a king). This gives an upper bound of 32 * 4 * 4 * 4 = 2048 triple jumps. By contrast, a player doing simple static analysis might know only that each of the eight indices listed is one of eight values, and consider there to be 8^8 = 16,777,216 triple jumps! Techniques that rely on keeping all these possibilities in memory (like propositional networks) can greatly benefit from the more accurate specification.

There are a number of additional restrictions on games. The GGP-Base project includes a StaticValidator program that can check most of the potential syntactic errors. More important are the semantic restrictions:

The game always ends in a finite number of moves. As a consequence of this, there cannot be any loops in the game state. Many games include an explicit "step counter" that ends the game after a certain number of moves to meet this requirement.
Every player has at least one legal move in every possible state of the game. When players take turns, there is typically a "noop" move for the non-active player(s) to make.
In every possible terminal state of the game, each player has exactly one goal value specified. (A stronger version of this requirement says that this should also be true in non-terminal states, though this isn't always followed. The GDL paper's stated requirement that goal values only increase over time is no longer useful and can be safely ignored.)
A game should be weakly winnable for each player; that is, each player should be theoretically capable of getting a score of 100, though it might require the other players' cooperation.

Note that these deal with "possible" states of the game: These are the states that are actually possible to reach in the game tree. These restrictions don't apply to arbitrary combinations of game state propositions that are unreachable in practice.

That covers most of the syntax and requirements of the language, but there is still much to cover on how to write and test games in practice. That will be the subject of future posts.

What is GGP?

2012-12-26T19:43:00.000-08:00

This blog is aimed mainly at those who are already somewhat familiar with General Game Playing, but for those who are not, an explanation:

General Game Playing is an artificial intelligence (AI) research problem in which computer programs play games against each other. It differs from other similar research problems in that a program must be able to play any game -- rules are specified when the game starts, much too late for the programmer to intervene. This means that techniques used by the AI must be general, not limited to a single game. The intelligence is in the program, not the programmer.

A GGP match is run by a game server, which contacts the game players. At the beginning of the match, the players receive the rules of the game, their role assignment (e.g. white or black), and how much time they'll have to make their moves. Game rules are specified in the Game Description Language, a highly flexible way of expressing games using mathematical logic. Games may have any number of players; may be competitive, cooperative, or anywhere in between; can have arbitrary notions of game state; and, though they must be turn-based, may have simultaneous actions.

An International General Game Playing Competition is held yearly at major AI conferences. In the past two years, other conferences have hosted additional tournaments, all allowing remote competition. Continuously running game servers (Tiltyard and the Dresden server) also offer the chance for programmers to test their gamers' skills against each other.

Some universities are also offering courses in GGP. Stanford's version, taught by Mike Genesereth, the creator of the GGP project and Game Description Language, will be offered on Coursera in April 2013.