Alloy GGP: Rewriting chess (again)

One of the embarrassing questions about GGP: "So, how good are they at chess?"

Not very good. Not very good at all.

There are several reasons for this.
First, traditional chess programs rely strongly on heuristic-based state evaluation. No one in GGP seems to have cracked the problem of creating and evaluating similarly nuanced heuristics for arbitrary games at runtime, nor is it trivial to integrate heuristic results with UCT. (I believe Fluxplayer does make effective use of heuristics -- I doubt anyone could beat it at Breakthrough Suicide for precisely this reason -- but it seems to have other weaknesses against strong simulation-based players.)

Without the use of heuristics, we run into several factors that hurt players' performance as game complexity increases from tic-tac-toe to connect four to checkers to chess. First, there are more possible moves on a given turn. Second, games take more moves to finish. Third, and more subtly and insidiously, the game rules get more complicated and difficult to process. Not only does a single additional state of the game tell you less and less, it takes longer to compute.

All these reasons, and perhaps more (e.g. bugginess), have led to chess being excluded from the otherwise-comprehensive list of games in the Dresden server lineup. (A variant that leaves out the concept of check, capture_the_king, is included. Fluxplayer, using heuristics, looks like a poor amateur human there -- which is to say, far superior to every other general game player.)

We can't reduce the branching factor of chess or the length of the game (barring a draconian step counter). We can, however, do something about the rules complexity. I realized this a while ago, when I tried to write a version of chess that would be safe for propnets (and, most likely, faster for theorem prover-based implementations as well).

The basic design principle here was that it would be essentially safe to have relations that expressed relationships between two squares because there were only 64 * 64 = 4096 of them, a relatively small number. Relations that were related to three squares would instead have 64 * 64 * 64 = 262144 possibilities, a much scarier number, and one that pretty much doomed any propnet generator to OOM.

The mistake I made was in my decision to give the pieces IDs, rather than reference them primarily by their squares. This was originally an attempt to reduce the number of possible relations of each type, but it backfired.

How many squares are on a chess board? 64.

How many IDs would I need? 32, one for each piece on the board.

Except that each piece is associated with a type in a hard-coded way, and promoting a pawn meant adding a new ID to the board. If a player promoted eight pawns to queens, I had to support that -- meaning eight more IDs per player.

In addition, one change I wanted to make from the original chess game was that it only supported pawns being promoted to queens; there are certainly situations where a knight is needed instead. (Situations calling for bishops or rooks are significantly rarer.)

How many IDs did I end up needing? 64. There was no benefit from using IDs over locations.

This led to a point of failure when I needed a "between" relation saying that one piece was collinear with and between two other pieces. If the pieces were specified by their locations, this actually wouldn't be so bad; for each piece in the middle and each direction, there are at most 3 * 4 possibilities for the locations of the other two pieces. This would give an upper bound of 64 * 8 * 12 = 6144 relations of this type.

Instead, we have IDs, which could be combined pretty much arbitrarily. This gives us the full 262144 possibilities, for each direction. That pretty much ruins the potential of the game description, at least for propnets.

Take two

I figured I would write a replacement for chess in two steps: first, write a "speed chess" version that ignores check and allows direct capture of the king; then, write a full version of chess. (Players' strategies should in theory be roughly the same as in normal chess, though this does depend on players reaching a certain level of play.)

I've finished the first of these versions, and run it through some performance testing relative to other versions of chess. (Caveat: It's still possible there are bugs that will be exposed by additional play.)

Here are some of the design decisions and implementation details for this version:

Piece names are given by two separate constants instead of one: "white pawn" instead of "wp", for example. This eliminates the need for relations that specify "ownerOf" and "pieceType". (This doesn't matter for propnets, but helps provers and overall readability, and may assist in the discovery of heuristics.) An example cell relation is "(true (cell a 8 black rook))".
Empty spaces are not explicitly represented.
Since we don't use piece IDs, we can freely include promotions to any piece type, not just queen or knight. (Players are free to choose incorrectly at their leisure.)
The type of piece listed in the move is the type of piece that will end up on the given square. This greatly simplifies the logic for handling promotion. Castling involves separate types of moves.
The game description uses an "affected" relation to handle piece persistence. (That is, there are a few rules to specify which spaces are affected by the current move, and any piece in a square that is not affected stays where it is.) This greatly reduces the size of propnets.
There are a few inexpensive extra bits of state for dealing with castling and en passant.
This version of the game (in accordance with the "speed chess" theme) has a flat 200-step counter for ensuring termination. The "full" version of chess will likely use a more official rule, i.e. counting steps since the last capture or pawn move.
Base and input relations are included.

Let's look at how the new version, speedChess, compares to other versions and variants of chess available, performance-wise (all figures are average state changes per second):

Game	Prover	TupleProver	Propnet	Propnet construction stats
~~speedChess~~	~~170~~	~~770~~	~~10500~~	~~5 seconds, 27K components, 56K links~~
speedChess (fixed, see edit 2 below)	55	490	4050	7 seconds, 35K components, 82K links
capture_the_king	100	280	4600	17 seconds, 205K components, 779K links
brawl (only kings, queens, rooks)	145	340	3600	15 seconds, 97K components, 566K links
chess	15	100	N/A	Propnet creation ran out of memory
endgame (both kings, 1 rook, 15 steps)	570	1150	9500	2.4 seconds, 34K components, 144K links
skirmish	40	180	2600	8 seconds, 117K components, 344K links
slaughter	20	145	N/A	Propnet creation ran out of memory

(Note that these numbers for the ProverStateMachine and TupleProverStateMachine include recent improvements to the ProverStateMachine. This makes them faster relative to the propnet-based version, compared to earlier comparisons.)

speedChess is behind endgame's performance in most respects, but not by much; it's a small price to pay for getting the other 29 pieces on the board. Performance is better than all the other chess variants (though capture_the_king isn't too shabby).

The ProverStateMachine still won't work very well (the step counter limit is 200), but testing this with Alloy and using propnets, it managed to do a couple of things that looked vaguely rational. Maybe now we'll be able to get this running on Tiltyard and encourage the use of techniques that work better than basic UCT on games of this complexity.

Next up will be adding check to the game.

EDIT 2014-05-28: A couple of issues have been discovered since the original post, one in the game description itself and one in the OptimizingPropNetFactory. If your gamer is using the OPNF, you'll need to get a version of GGP-Base with this commit or apply the fix yourself for propnet generation to work properly on speedChess. (The bug is non-deterministic; the performance stats above were taken while it happened to be working correctly, and should still be accurate.)

EDIT 2014-06-05: A bigger issue came up: apparently bishops, rooks, and queens were unable to move more than one space due to an undefined relation type being used in a rule. I've added a new warning type to the StaticValidator in GGP-Base to prevent this kind of error from happening again. The problem has been fixed, and the fixed version of the game is now in rotation on Tiltyard. (This changes the performance statistics.)

4 comments:

Steve DraperMay 20, 2014 at 4:01 PM
Alex, will this be available through any of the online repositories in the near future? (I realize I can get it by importing to my local repository, but I mean specifically the public repositories)
Steve DraperMay 21, 2014 at 9:56 AM
Looks like you did a pretty good job with this ruleset! I ran Sancho's performance analyser against it and got the following results:

Propnet sizes:
X-net: 8574, O-net : 8606 (one or the other of these will be used for most purposes - which depends on the state)
goal-net: 4556 (only used for goal evaluation on terminal states)

Single threaded statemachine playouts/sec from initial: 4891
Multi-threaded MCTS searching from initial - 13772 playouts/sec, 7092 node expansions per sec

Those are better numbers than checkers, so I think your rule optimizations have worked really well.
AlexAugust 6, 2014 at 11:48 PM
I was curious why I was getting worse performance than in capture_the_king, particularly in the Prover, so I set up a test to run through the same sample game using both sets of rules. Here's how time was spent in each type of method within the state machine:

CTK_LEGAL: 1378
CTK_NEXT: 198
CTK_TERMINAL: 6
CTK_GOALS: 0
SC_LEGAL: 423
SC_NEXT: 149
SC_TERMINAL: 442
SC_GOALS: 0

At a glance, it seems the issue is that the terminality check takes longer (much longer). This is because speedChess correctly ends the game when a player controls their king but has no legal moves; capture_the_king does not. (This would be extremely rare, but it's a bug not to handle this case.)

This also explains why speedChess is still faster with the TupleProver. That includes an optimization where it will reuse its internal query result cache if it receives the same state multiple times in a row. If it gets an isTerminal() call followed by a getLegalMoves() call, the latter will take advantage of the computations already performed by the former. This optimization might be worth transferring back to the standard Prover.

Alloy GGP

Friday, May 16, 2014

Rewriting chess (again)

Take two

4 comments:

About Me