Saturday, June 20, 2015

Amazons and Efficient Recursion

I recently rewrote Amazons to make it more efficient, and thus more attractive as a GGP game. I've also been doing some unrelated work on a compiled prover rules engine, which is much faster than GGP-Base's ProverStateMachine. In the process, I've had some ideas about how to deal with recursion efficiently in provers, which could have implications for many other important games.

Amazons is a two-player board game ostensibly related to chess, though the gameplay and goals are quite different. Each player has four pieces resembling chess queens; these are called "amazons". They start scattered across a 10x10 board. Unlike in chess, there are no captures; each turn, the player moves one of their amazons (in the manner of a non-capturing queen move) and then fires an arrow from that amazon (to anywhere a queen could move from its new position, possibly into the space just vacated). Arrows stay in their target square for the remainder of the game and block both movement and later arrows, gradually cluttering the board. The first player to be unable to move loses.

The current version of amazons (and its variants amazonsSuicide and amazonsTorus) is rather difficult for rules engines to handle, with this difficulty coming almost entirely from the number of legal moves available on each turn. (amazonsTorus has a couple of other peculiarities: its use of recursion, and a bug that allows pieces to effectively not move, or to fire an arrow into their own space.) The new version separates the movement and arrow-firing portions of the move into separate turns; the game remains the same, but simulation is much faster, and the rules are simpler. (On the other hand, the game tree expands due to the additional intermediate nodes, but players have more freedom to cope with this as part of their playing algorithms.)

However, there's one part of the rules that still works inefficiently for a backward-chaining prover like the ProverStateMachine: simply determining which moves are legal. The rule needs to look for empty spaces adjacent to the piece, then empty spaces next to those, and so on.

This is a problem that's also seen in chess and Reversi. (For backward-chaining reasoners, the difficult part of Reversi is not the complex-looking goal calculation -- which works well as long as query answers are cached* -- but determining which pieces get "flipped" after each move.)

Recursion and Backward-Chaining

If this were written in an imperative language, the pseudocode for a linear implementation would be easy:

val result = mutableListOf<Move>()
for (dir in allDirections()) {
    var pos = Pair(xInput, yInput)
    // Walk outward until we hit an occupied space (or the board edge).
    while (empty(nextSpaceInDir(pos, dir))) {
        pos = nextSpaceInDir(pos, dir)
        result.add(move(xInput, yInput, pos.first, pos.second))
    }
}


However, the "direct" translation of that approach into GDL will often break provers, which get stuck in an infinite loop:

(<= (openPath ?x1 ?y1 ?x2 ?y2 ?dir) ; rule 1.1
    (oneInDir ?x1 ?y1 ?x2 ?y2 ?dir)
    (not (occupied ?x2 ?y2)))
(<= (openPath ?x1 ?y1 ?x3 ?y3 ?dir) ; rule 1.2
    (openPath ?x1 ?y1 ?x2 ?y2 ?dir) ; site of infinite recursion
    (oneInDir ?x2 ?y2 ?x3 ?y3 ?dir)
    (not (occupied ?x3 ?y3)))


Instead, we tend to use this:

(<= (openPath ?x1 ?y1 ?x2 ?y2 ?dir) ; rule 2.1
    (oneInDir ?x1 ?y1 ?x2 ?y2 ?dir)
    (not (occupied ?x2 ?y2)))
(<= (openPath ?x1 ?y1 ?x3 ?y3 ?dir) ; rule 2.2
    (oneInDir ?x1 ?y1 ?x2 ?y2 ?dir) ; start with oneInDir instead
    (not (occupied ?x2 ?y2))
    (openPath ?x2 ?y2 ?x3 ?y3 ?dir))


What's wrong with this second technique? Well, consider the set of openPath sentences we need to compute for, say, the diagonal starting at (1, 1). (For convenience, we can refer to each space along the diagonal by a single coordinate, so that (1, 3) means an open path from space 1 to space 3.)

In the first technique, the openPath sentences we compute are (1, 2), (1, 3), (1, 4), ..., (1, 10); the first coordinate is always the same. The number of sentences is linear in the length of the line.

For the second, (1, 3) relies on (2, 3); (1, 4) relies on (2, 4), which in turn relies on (3, 4); and so on, until every pair of spaces along the line gets its own sentence. In other words, the number of sentences we have to discover is quadratic in the length of the line, so even an ideal prover will have quadratic performance here.
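
To make the counting concrete, here's a small self-contained sketch -- illustrative names only, not actual engine code -- that models a single empty line of n spaces and counts the openPath sentences each formulation derives, assuming an ideal prover that asks each subquery only once:

// Sketch only: models one empty line of n spaces, counting the distinct
// openPath(i, j) facts derived when asking for all paths from space 1.

// Technique 1: every derived sentence starts at space 1.
fun factsUnderRule1(n: Int): Set<Pair<Int, Int>> =
    (2..n).map { j -> 1 to j }.toSet()  // (1,2), (1,3), ..., (1,n)

// Technique 2: asking openPath(1, ?) spawns the subquery openPath(2, ?),
// which spawns openPath(3, ?), and so on; every pair gets derived.
fun factsUnderRule2(n: Int): Set<Pair<Int, Int>> {
    val facts = mutableSetOf<Pair<Int, Int>>()
    fun answersFrom(i: Int): List<Int> {  // answers to openPath(i, ?)
        if (i >= n) return emptyList()
        val reachable = listOf(i + 1) + answersFrom(i + 1)  // rules 2.1, 2.2
        reachable.forEach { j -> facts.add(i to j) }
        return reachable
    }
    answersFrom(1)
    return facts
}

fun main() {
    println(factsUnderRule1(10).size)  // 9: linear in the line length
    println(factsUnderRule2(10).size)  // 45 = 10 * 9 / 2: quadratic
}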

(Note that forward-chaining techniques -- like most propnet implementations -- will maintain even those openPath sentences that aren't needed by the current state's move legality query, so there's no real performance difference between the two formulations for those engines, unless we add the piece's presence as a precondition to both rules.)

Recursion Handling in the ProverStateMachine

The ProverStateMachine got an upgrade in March that allows it to handle recursion like that used in the first approach. It gives correct results according to the semantics of GDL, even while continuing to evaluate conjuncts in the order they are written. However, its approach is not efficient.

When we come across a sentence that we're already asking about, we want the recursive subquery to return any sentences that can be produced by the other ways of answering the original query. The way we do that is essentially to write down that the original query is recursive, which tells us we need to revisit its subqueries later with additional results. (This happens when we run into rule 1.2 above. Note that it doesn't happen for rule 2.2: the "query" includes the first two arguments to openPath, and those differ between the outer query and the recursive subquery in approach 2, whereas they are the same in approach 1.)

When we see that a query we've just finished asking is recursive, we usually need to revisit it to gather additional results, by plugging any new answers we've obtained into the identical subqueries. We repeat this until the set of answers stops growing. (GDL's language restrictions ensure that this set only gets larger, never smaller, and so it will eventually converge.) This means that when we do come across a query we're already asking, our return value is simply whatever's in previousResults.

(Also, caching is disabled while we're in the middle of handling a recursive query, since we'll be getting incomplete results most of the time.)
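
In pseudocode, the overall process looks something like the following fixed-point loop. This is just a sketch; Query, Sentence, and askOnce are hypothetical stand-ins for the actual prover internals.

// Sketch of the revisiting loop described above.
typealias Sentence = String
class Query
fun askOnce(query: Query, previousResults: Set<Sentence>): Set<Sentence> =
    TODO("run the query once; recursive hits on `query` return previousResults")

fun askRecursiveQuery(query: Query): Set<Sentence> {
    var previousResults = setOf<Sentence>()
    while (true) {
        val results = askOnce(query, previousResults)
        if (results == previousResults) {
            return results  // no new answers: the set has converged
        }
        previousResults = results  // GDL guarantees the set only grows
    }
}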

So how efficient is this method for the first approach? Unfortunately, each iteration of the process adds only one new value to the inputs, and each iteration retries every value found so far; once again, this is a quadratic procedure relative to the line length.

There's a solution to this, already used by the forward-chaining reasoner that GGP-Base uses for its ConstantChecker generation: only feed new values into the rules. Indeed, that would give linear behavior in this case. So why don't we use this approach? Because it's not that simple: handling a single recursive call in a rule is easy; handling multiple recursive calls is where the trouble lies.

Let's say we run a rule with only new sentences as the input to any recursive references we see, and consider what happens if that rule makes multiple recursive references. There might be answers to the query that are only generated when one of the references takes on a new value and another takes on an older value. That variable assignment wouldn't have been made the first time around (with only old values), and it won't be made the next time either (with only new values). So if we're going to limit ourselves to new values, we need to apply that constraint to just one recursive reference at a time -- and that could be any recursive reference, not just the first one we come across. This is made extra-complicated by the fact that these recursive references may be indirect, or may appear in other rules.
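
For a rule with a single recursive reference, this "new values only" trick -- essentially what the Datalog literature calls semi-naive evaluation -- is a simple delta loop. Here's a sketch with illustrative names; the comment at the end notes where multiple references break it:

// Sketch with illustrative names. Each round feeds only the newly
// derived sentences (the "delta") into the rule's recursive reference,
// while everything derived so far stays available as context.
typealias Sentence = String

fun fixpoint(
    baseFacts: Set<Sentence>,
    applyRule: (delta: Set<Sentence>, all: Set<Sentence>) -> Set<Sentence>
): Set<Sentence> {
    var all = baseFacts
    var delta = baseFacts
    while (delta.isNotEmpty()) {
        val derived = applyRule(delta, all)
        delta = derived - all  // keep only the genuinely new sentences
        all = all + delta
    }
    return all
}

// With two recursive references, a correct version must run the rule
// once per reference per round -- delta in that reference's slot, the
// full set in the other -- or it misses the mixed new/old assignments.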

The forward-chaining reasoner used by the ConstantChecker isn't currently set up to handle more complex recursion across rules and sentence types. The ProverStateMachine, as the reference state machine, has to meet a somewhat higher standard: it should give correct results in all cases, so that more efficient implementations can be checked against it. This is why I preferred the slower but more widely applicable approach.

But there's certainly room for faster provers that don't have to work that way. So what approach should they use?

Continuations

My idea (which I'm sure is already in the theorem-prover literature somewhere) also uses this notion of revisiting recursive subqueries later, but in a slightly different way, with major performance implications. Whenever we encounter a recursive call to something we're already asking, we store a continuation -- the information necessary to essentially recreate that point in the query process, with an answer to that query filled in. (This is my own name for the idea; it's something like a closure, and may be implementable using closures.) Before the original call ends, we ensure that each continuation has been run with each available input, and collect the results.

Why doesn't this run into the problem I mentioned before about only looking at new sentences? Because if multiple recursive subqueries are encountered, each one gets its own set of continuations -- and some of these will reflect the older sentences that were passed in to earlier queries.

Meanwhile, each sentence is provided to each continuation only once, so this should give linear performance for the first approach, where there is only one continuation.
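
Here's a sketch of how the driver loop might look, with hypothetical names. A continuation is modeled as a closure over the rule's earlier bindings, and each answer is fed to each continuation exactly once:

// Sketch only, with hypothetical names.
typealias Sentence = String
typealias Continuation = (answer: Sentence) -> Set<Sentence>

fun collectAnswers(
    initialAnswers: Set<Sentence>,     // answers derivable without recursion
    continuations: List<Continuation>  // one per recursive call site hit
): Set<Sentence> {
    val answers = initialAnswers.toMutableSet()
    var frontier: List<Sentence> = answers.toList()
    while (frontier.isNotEmpty()) {
        val newlyFound = mutableListOf<Sentence>()
        for (sentence in frontier) {
            for (continuation in continuations) {
                // Each sentence is handed to each continuation exactly
                // once; with one continuation, this is linear overall.
                for (result in continuation(sentence)) {
                    if (answers.add(result)) newlyFound.add(result)
                }
            }
        }
        frontier = newlyFound  // only new answers go around again
    }
    return answers
}
// (Simplified: in a real prover, running a continuation could register
// further continuations, which would then need to be fed every sentence
// seen so far.)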

There's still a little more room for improvement: not every input will match the conditions of every continuation. A constant or bound variable in the recursive call can conflict with the actual contents of the sentence. There should therefore be some efficient filtering (probably involving HashMaps) to deliver each new sentence only to the continuations for which it is appropriate. New continuations may also need their own efficient filters over the set of sentences seen so far.
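
One way that filtering could work, again with hypothetical names: key each continuation by the values that are bound at its call site, so a new sentence is delivered only to the continuations it can actually match.

// Sketch only; assumes every call site binds the same argument
// positions (a real version would group call sites by binding
// pattern). Sentences are modeled as argument lists.
typealias Sentence = List<String>
typealias Continuation = (Sentence) -> Unit

class ContinuationIndex(private val boundPositions: List<Int>) {
    private val byKey = HashMap<List<String>, MutableList<Continuation>>()

    // Key each continuation by the values bound at its call site.
    fun register(boundValues: List<String>, continuation: Continuation) {
        byKey.getOrPut(boundValues) { mutableListOf() }.add(continuation)
    }

    // Deliver a new sentence only to the continuations it can match.
    fun deliver(sentence: Sentence) {
        val key = boundPositions.map { sentence[it] }
        byKey[key]?.forEach { it(sentence) }
    }
}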

I haven't implemented this yet, but the compiled prover should be a good place to try it out. It's actually a natural fit: the code corresponding to the part of the rule leading up to the recursion sits in one place, and the code for everything after it is split out to run later, when we're processing the continuations. That probably sounds confusing, but I may have a fuller code sample later to better illustrate what's going on.
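
In the meantime, here's a very rough sketch -- all names hypothetical -- of the shape such compiled code might take for rule 1.2. Since the recursive conjunct comes first in that rule, the first half just registers a continuation, and the remaining conjuncts become the continuation body:

// Rough sketch only; Dir, QueryState, oneInDir, and occupied are
// hypothetical stand-ins, not the actual compiled prover's API.
enum class Dir { N, NE, E, SE, S, SW, W, NW }
class QueryState {
    fun addContinuation(continuation: (Int, Int) -> Unit) { /* ... */ }
    fun addAnswer(x1: Int, y1: Int, x3: Int, y3: Int, dir: Dir) { /* ... */ }
}
fun oneInDir(x: Int, y: Int, dir: Dir): Pair<Int, Int>? = TODO()
fun occupied(x: Int, y: Int): Boolean = TODO()

// (<= (openPath ?x1 ?y1 ?x3 ?y3 ?dir)
//     (openPath ?x1 ?y1 ?x2 ?y2 ?dir)   <- recursion: register continuation
//     (oneInDir ?x2 ?y2 ?x3 ?y3 ?dir)
//     (not (occupied ?x3 ?y3)))
fun openPathRule12(x1: Int, y1: Int, dir: Dir, state: QueryState) {
    state.addContinuation { x2, y2 ->
        openPathRule12Rest(x1, y1, x2, y2, dir, state)
    }
}

// Runs once per answer (?x2, ?y2) delivered to the continuation.
fun openPathRule12Rest(x1: Int, y1: Int, x2: Int, y2: Int, dir: Dir, state: QueryState) {
    val (x3, y3) = oneInDir(x2, y2, dir) ?: return  // off the board
    if (!occupied(x3, y3)) {
        state.addAnswer(x1, y1, x3, y3, dir)
    }
}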

Numbers

Anyway, here are performance numbers comparing the Amazons versions (and highlighting the performance of the compiled prover). As usual, numbers are approximate state changes per second during a 30-second period of repeated random playthroughs. "Rule 1" and "rule 2" refer to the two different implementations of the recursive rule discussed above; "rule 2" is the version currently on Tiltyard.

Rules engine            | Old Amazons                   | New Amazons, rule 2        | New Amazons, rule 1
Prover                  | 0.2                           | 165                        | 40
Tuple Prover            | 1                             | 4890                       | 1090
Compiled Prover         | 640                           | 22500                      | error
Propnet                 | 172                           | 10500                      | 11800
Propnet gen time        | 486 s                         | 5 s                        | 10 s
Propnet size, no opt.   | 3076k links, 1502k components | 78k links, 39k components  | 78k links, 39k components
Propnet size, with opt. | 1814k links, 585k components  | 64k links, 26k components  | 64k links, 26k components

Note that two state changes in new Amazons are equivalent to one state change in old Amazons, and the numbers haven't been scaled to reflect that. Even so, it's clear that the simpler approach is a huge improvement. (Aside from the numbers above, propnet generation for the old Amazons game required a memory allocation of around 3 GB to succeed.)

I expect to post again sometime later to explain why the compiled prover I'm using is so much more efficient than the other provers, despite using the same core approach to handling rules. Suffice it to say that it is, on average, about 20 times faster than the GGP-Base prover (the first one listed above), and can greatly exceed this on complicated games like the Amazons variants. This makes it competitive with the propnet rule engine I'm using for these comparisons, though certainly much faster propnet-based rule engines exist. (Alloy used that propnet engine in versions before 0.8, though it now has a faster propnet-based approach; I may also do some comparisons with Sancho's engine in the future.)

* The rules could, and perhaps should, be slightly rewritten so the caching is unnecessary.

6 comments:

  1. Why do you want to rewrite the rules to make life easier for players? Why not make them cope?

    1. By "coping", do you mean making do with very slow simulations, or having them rewrite the rules themselves?

      When simulations are slow, there's not much players can do to be intelligent, and randomness is more of a factor. Fast simulations provide more options in terms of how to approach developing a game strategy.

      Why not let players do the job of rewriting the rules? Because this turns out to be really hard in most cases; most transformations of this type require human-level knowledge of the game (e.g. "these two things can't both be true simultaneously") that's difficult to model and confirm in a general way. Also, competitions then turn into contests over who can do the best optimizations at this low level of computation, which is less interesting (from an AI perspective) than higher-level strategy.

      Also, I'd argue that the better play that's enabled makes for greater entertainment value.

    2. I wouldn't say it's less interesting from an AI perspective, just more interesting from a different AI perspective. Being able to do on-the-fly human reasoning about logic is still pretty interesting. But I agree that within the scope of GGP, it's not meant to be the focus, and the better we can do it, the better we can focus on game-playing AI.

  2. I'd like to point out that "continuation" might be a bad word to choose if you're really making it up from scratch, just because it's already used in the context of programming languages. But maybe you knew that? It does actually seem like what you're talking about could be a continuation -- so maybe you chose the name intentionally -- but I don't think I can tell for sure it's a normal continuation without more details. In a completely recursive implementation of backward chaining, the information needed to keep track of recursive calls would be stored on the stack. But if you really are storing all of the context of the recursion, and then applying it to the value returned by the recursive call, then yeah, that's probably just a real continuation. They're not so common in logic languages, I don't think, but in functional languages, they're all over the place.

    1. I wasn't fully aware of the existing concept of continuations, but my subconscious may have dredged up the name from a programming languages class I took years ago.

      I don't think that this maps fully onto the typical use of continuations, as this involves collecting several and potentially running each of them multiple times, with additional information injected. It might still fit a broad theoretical definition of continuations.

      I don't know how much this matters, though, as it describes not a language feature or a way to write GDL, but an implementation strategy for interpreting it efficiently. This would show up in a small number of places as a result. (On the other hand, you could say the same thing about tail recursion, but people also talk about it when writing code that takes advantage of its existence.)

      I'll wait to worry about the name until I actually have a working implementation and a chance to see if it's as effective as I hoped.

  3. This comment has been removed by a blog administrator.
