Sunday, December 22, 2013

Improving Chinese checkers performance

This is another post on improving the performance of game simulations by modifying the GDL by hand. Today, we look at chineseCheckers4, a game known for its particularly poor performance, especially for propnets.

I wanted to study this game's performance in a more step-by-step manner than previous posts, so I stopped to gather performance statistics after each step. Each number represents the number of state changes that can be computed per second when running random depth charges over the course of 60 seconds. (The numbers aren't quite as consistent as the number of significant figures implies, so take the exact values with a grain of salt.)

Here are the baseline values for my three rules engines:
  • Prover: 410
  • Tuple-based prover: 1100
  • Propnet: 3800
  • Propnet size: 282K components, 984K links, 20 seconds
The prover numbers are roughly in line with performance on English Draughts and better than Reversi; of course, this is also a relatively simple game, much simpler than Reversi. The propnet numbers, on the other hand, are much worse, and the generated propnet is enormous. This led me to believe that the game was simply written in a way that leads to poor performance.

One reasonable question here is whether the GDL can be automatically rewritten along the lines of what I'm going to do. As we'll see, that isn't always so easy.

The first thing I tried changing is the following couple of rules for deciding when a piece should be left alone:

(<= (next (cell ?x ?z))
    (true (cell ?x ?z))
    (does ?player (move ?b ?e))
    (distinct ?x ?b)
    (distinct ?x ?e))

(<= (next (cell ?x ?z))
    (true (cell ?x ?z))
    (does yellow noop)
    (does green noop)
    (does blue noop)
    (does magenta noop))

Rules written like the first of this pair are a common cause of bloated propnets. This is because the rule has to be instantiated once for every combination of possible cell, possible role, and possible move. The following is much more propnet-friendly:

(<= (next (cell ?x ?z))
    (true (cell ?x ?z))
    (not (affected ?x)))

(<= (affected ?x)
    (does ?player (move ?x ?z)))
(<= (affected ?z)
    (does ?player (move ?x ?z)))

This version has one set of "affected" nodes that tell whether cells have been affected by the move made (with size linear to the number of possible moves). These are then used to check if a cell should keep its old state (with size linear to the number of cells). No unfortunate cross-products are necessary.

Since this is such a problem, can this kind of substitution be made automatically by a player at runtime? Actually, it would be quite difficult. This substitution is only safe to make because of additional information I have about the game; namely, that only one sentence of the form (does ?player (move ?b ?e)) can be true each turn. If there were two or more such sentences true, then the two sets of rules would behave differently. Automatically extracting and exploiting this kind of "mutual exclusion" information is beyond my capabilities at the moment.

Anyway, what happens to the performance when this substitution is made? (All percentages throughout this post are relative to the original version of the game.)
  • Prover: 350 (-15%)
  • Tuple-based prover: 940 (-15%)
  • Propnet: 6750 (+78%)
  • Propnet size: 44K components (-84%), 279K links (-72%), 8 seconds (-60%)
As expected, propnet size has drastically decreased, and propnet performance has also improved. (The propnet rules engine being used is differential, so it can skip over unchanged sections of the propnet; otherwise, the speed improvement would be even more dramatic.) However, the prover performance has actually gotten worse. What gives?

The prover implementation in GGP-Base treats sentences in rules as queries, and in particular, will fill in variables in the sentences if their values are already known. In addition, within a single query, it will cache the results of subqueries and reuse them if it needs to repeat them. Let's take a look at that first version of the rule again:

(<= (next (cell ?x ?z))
    (true (cell ?x ?z))
    (does ?player (move ?b ?e))
    (distinct ?x ?b)
    (distinct ?x ?e))

For the first conjunct (true (cell ?x ?z)), the prover will find all the available values and then, for each one, ask (does ?player (move ?b ?e)), finding the results in the cache. Then it will confirm whether the ?x value is distinct from the cells in the single move found, a quick comparison. As for the other version:

(<= (next (cell ?x ?z))
    (true (cell ?x ?z))
    (not (affected ?x)))

(<= (affected ?x)
    (does ?player (move ?x ?z)))
(<= (affected ?z)
    (does ?player (move ?x ?z)))

For each cell, it asks for the affected-ness of a particular cell, checking each one individually. A query for (affected c3) will be treated differently from a query for (affected c2), so we never find answers in the cache. Instead, we check two separate rules for affected-ness, with additional function call overhead and possibly checks against noop moves.

Okay, enough about how the prover works. What's the next thing to try?

One other inefficiency I noticed is how blank cells are explicitly represented. This generally isn't necessary. Starting over again from the original set of rules (to isolate their effect), I removed the init and base propositions associated with them, and then had to make some subtle changes to how pieces in a bucket are counted to avoid unsafe rules. That led to the following performance:
  • Prover: 540 (+32%)
  • Tuple-based prover: 1450 (+32%)
  • Propnet: 4400 (+16%)
  • Propnet size: 244K components (-13%), 870K links (-12%), 13 seconds (-35%)
It's worth noting that in addition to the minor performance improvement, this version of the game uses less state to represent each turn of the game, which means MachineStates will use less memory.

Combining both of the above approaches gives us:
  • Prover: 525 (+28%)
  • Tuple-based prover: 1410 (+28%)
  • Propnet: 7000 (+84%)
  • Propnet size: 40K components (-86%), 262K links (-73%), 7 seconds (-65%)
Note that the negative impact on the prover's performance isn't as large with the blank cells removed. This would affect it in a couple of ways: firstly, we get to skip all the blank cells when deciding which values to preserve; secondly, we get to drop one of the affected rules, since we no longer have to worry about the cell that a piece is moving into.

The next thing I noticed was that the base and input rules were incorrect, including many states and moves that were impossible. This wouldn't affect the provers, and my propnet rules engine uses a post-optimization that fixes this anyway, but it has an impact on the size and speed of the initial propnet construction. With fixed bases and inputs:
  • Propnet size: 22K components (-92%), 150K links (-85%), 5 seconds (-75%)
Finally, from looking at the output from the verbose mode of the OptimizingPropNetFactory, I discovered that it was spending most of its time generating rules for when players could make noop rules, as defined by the following rule:

(<= (legal ?player noop)
    (true (control ?player))
    (true (cell ?b1 ?player))
    (stuck ?b1 ?player)
    (true (cell ?b2 ?player))
    (distinct ?b1 ?b2)
    (stuck ?b2 ?player)
    (true (cell ?b3 ?player))
    (distinct ?b2 ?b3)
    (distinct ?b3 ?b1)
    (stuck ?b3 ?player))

This is a rather roundabout way of saying that a player should be able to make a noop move if they can't make any other moves. Specifically, this rule says they can make such a move if they have three pieces and all of them are stuck (have no legal moves associated with them). In the propnet, this rule needs to be instantiated for every combination of three cells on the board.

The much simpler way to express this is by asking if the player has any legal moves at all, directly:

(<= (legal ?player noop)
    (true (control ?player))
    (not (hasAnyLegalMove ?player)))

(<= (hasAnyLegalMove ?player)
    (movable ?player ?b ?e))

This turns out to be much more efficient across the board. (Note that like before, this is only a valid substitution because we have extra knowledge of the game. If a player could have more or fewer than three pieces at once, the two sets of rules wouldn't be equivalent.) Adding this on top of all our other improvements:
  • Prover: 700 (+71%)
  • Tuple-based prover: 1800 (+64%)
  • Propnet: 90000 (+2300%)
  • Propnet size: 3152 components (-99%), 5554 links (-99.4%), 2 seconds (-90%)
I'll work on getting these changes applied to all the Chinese checkers games in the Tiltyard Base repository.

1 comment:

  1. Interesting. I had noticed just a day or two ago how slow 6-player Chinese Checkers was for me (circa 20 depth charges per second per thread only!), and was planning to look into it. This post is very helpful (though obviously if you update the Tiltyard rules any automated analysis becomes irrelevant for those games - perhaps one variant should be left as an alternate game on Tiltyard [in addition to a fixed version] to serve as a discriminator for players capable of automating some of this analysis.

    ReplyDelete