Alloy GGP: Implication deduction: Logic over propnets

Today I'd like to share a simple and surprisingly useful algorithm for making logical inferences about games using propositional networks.

I developed this as part of the implementation for a more specific feature (one of the applications listed below). Although it was developed in the context of a larger feature, I wrote the code for this step to stand alone, and it has proved useful in quickly writing other features that involve understanding games more deeply.

The question that it answers is simple: given a set of components that we know to be true or false in a propnet, which other components are also guaranteed to be true or false? I call this implication deduction.

The algorithmically astute among you may notice that a complete solution to this problem is equivalent to the Satisfiability problem, and so would be computationally intractable. The good news is that we don't need to solve this problem completely. We can usually make do with a subset of the components that we could, in theory, identify as having fixed values. In fact, it can make sense to use multiple implementations with different tradeoffs between speed and completeness. In some applications, this will be run once per game; in others, it may be run once per legal move, or per state of the game.

There's also a twist we apply to make this more useful: We want to be able to track these implications across multiple turns. This involves pairing our references to components with logical turn numbers. (These don't have to match real turn numbers; if I'm considering two consecutive turns, I'll always use 0 to indicate the earlier turn and 1 the later turn.) Luckily, due to the nature of the propnet's construction, this does not make the algorithm significantly more complicated. We can take advantage of the transition components that mark the boundary between turns, and use these to note when we cross from one turn to another (or when we reach the boundary of our time span of interest).

The algorithm

We refer to the possible component types in a propnet with the capitalized names Proposition, Transition, Or, And, and Not, to match the class names used in GGP-Base. (Support for Constants can easily be added.)

We use a TimedComponent type that pairs a reference to a component in the propnet with a logical turn number. Its inputs and outputs are the TimedComponents corresponding to the component's inputs and outputs; these are on the same turn, except where a Transition is involved. Moving from a Transition to its input decreases the turn number by one, and vice versa.

Inputs to the algorithm are an initial assignment of TimedComponents to values and a range of turn numbers of interest.

Initialize an “assignmentSoFar” mapping TimedComponents to boolean values. Initialize this with the inputs to the algorithm, i.e. what we already know is true.
Initialize a “componentsToExamine” set. (This could be a queue or stack, but it's advantageous to remove duplicates as new components are added.) The initial values are the TimedComponents in assignmentSoFar.
While componentsToExamine is non-empty, remove a “curComponent” from it and apply the following operations:
1. If curComponent is outside the time range of interest, ignore it and move on to the next component.
2. If curComponent is not in assignmentSoFar, apply the following techniques to check for reasons it might be assigned a value. If one applies, add that value to the assignmentSoFar mapping.
  1. If a Proposition or Transition output of curComponent has a value, curComponent takes the same value.
  2. If a Not output of curComponent has a value, curComponent takes the opposite value.
  3. If an And output of curComponent is true, curComponent is also true. Optionally, if the And output is false and all its other parents are true, curComponent is false.
  4. If an Or output of curComponent is false, curComponent is also false. Optionally, if the Or output is true and all its other parents are false, curComponent is true.
  5. If curComponent is a Proposition or Transition and it has an input with a value, curComponent takes the same value.
  6. If curComponent is a Not and its input has a value, curComponent takes the opposite value.
  7. If curComponent is an And and one of its inputs is false, curComponent is false. If all of its inputs are true, curComponent is true.
  8. If curComponent is an Or and one of its inputs is true, curComponent is true. If all of its inputs are false, curComponent is false.
  9. Optionally, if curComponent is a legal Proposition and the corresponding does Proposition is true, curComponent is true.
  10. Optionally, if curComponent is a does Proposition and the corresponding legal Proposition is false, curComponent is false.
3. If assignmentSoFar contains curComponent (either because it was in the initial mapping or just added), do the following:
  1. Add to componentsToExamine any inputs to curComponent that are not already in assignmentSoFar.
  2. Add to componentsToExamine any outputs of curComponent that are not already in assignmentSoFar.
  3. Optionally, if the assigned value is true, for any output of curComponent that is an And gate assigned the value false, check if all but one input to the child is true. If so, add the last parent to componentsToExamine if it's not already in assignmentSoFar. (This corresponds with the optional technique in 3.2.3 above.)
  4. Optionally, if the assigned value is false, for any output of curComponent that is an Or gate assigned the value true, check if all but one input to the child is false. If so, add the last parent to componentsToExamine if it's not already in assignmentSoFar. (This corresponds with the optional technique in 3.2.4 above.)
  5. Optionally, if curComponent is a legal Proposition and the assigned value is false, add the corresponding does Proposition to componentsToExamine if it's not already in assignmentSoFar. (This corresponds with the optional technique in 3.2.10 above.)
  6. Optionally, if curComponent is a does Proposition and the assigned value is true, add the corresponding legal Proposition to componentsToExamine if it's not already in assignmentSoFar. (This corresponds with the optional technique in 3.2.9 above.)

This is a bit long, but it should be clear what each step accomplishes. We start with the initial assumptions and spread across the directed graph in both directions, upstream and downstream, looking for components with values that can be deduced using one of the techniques in step 3.2.

So what are some of the applications of this technique?

Winning/losing move detection

The original reason I wrote this algorithm was to generate a “generalized endgame book” (GEB); that is, a way to look at a state and see from a subset of the state that a given move will lead to victory. (This is contrasted with a normal endgame database, which requires a complete match on the entire state.) The simplest incarnation of this idea is finding moves that in every case immediately end the game and give the player a score of 100. The benefit here is that this can be applied not just during the first phase of an MCTS playthrough, but during the random descent through previously unexplored states.

(I was planning to do a larger post on GEBs, but when I (finally) re-added GEBs to the latest version of Alloy, I was unable to reproduce my past results showing their benefits in earlier versions. It can be very tricky to tell which advanced features will actually result in higher winning percentages in the context of a larger game player.)

To generate GEBs, implication deduction is normally just one part of a larger, long-running algorithm. However, if we're just looking for moves that immediately cause a player to win or lose, we can use a trick with logical converses to detect these moves quickly.

If you want to find the set of moves that will cause xplayer to win immediately, run the algorithm in the following two configurations:

Time range: Turns 0 and 1

Inputs: <terminal, 0> = false (optional), <terminal, 1> = false

Outputs to watch for: <

(does xplayer
?move)

, 0> = false

Time range: Turns 0 and 1

Inputs: <terminal, 0> = false (optional), <(goal xplayer 100), 1> = false

Outputs to watch for: <

(does xplayer
?move)

, 0> = false

The logic here is that if ?move is a winning move, then <(does xplayer ?move), 0> being true will cause terminal and (goal xplayer 100) to be true on the following turn. The converse also holds: either of the latter being false will cause the original move to be false. By taking advantage of this converse, we can test all moves at once instead of running tests one at a time.

Joint latch detection

We can also use this algorithm to easily find the subset of a given state that will not change for the rest of the game. Unlike most methods of latch detection, this will handle combinations of state that are not latches individually, but keep each other in place due to their combination.

This can be used to reduce the size of a propnet for more efficient simulations; reduce the size of game states as certain sentences no longer need to be included; or identify states that are unwinnable (or unloseable).

To do this latch detection, run the algorithm repeatedly with the turn range [0, 1]. In the first step, set the sentences in turn 0 according to the state of interest (setting the base propositions should suffice). In subsequent steps, take the intersection of the components in turns 0 and 1 and put that in step 0. (If a component is included in both turns but its truth values differ, ignore it.) Repeat this process until the components in 0 and 1 are identical and have the same values; these are the latches.

Goal proximity heuristics

This will only be useful for a few games, but it's also possible to use this type of ternary logic to determine the minimum number of turns before (goal player 100) can become true. This can be used as an admissible heuristic for A* search in puzzles.

A somewhat contrived example (but potentially interesting as a puzzle for GGP in any case) is the made-up “Knight's Journey” puzzle. Unlike the “Knight's Tour”, this simply requires a knight to cross the board from one corner to the other in the minimum number of turns, giving 100 points if successful and 0 otherwise. If the board is large enough, MCTS alone will not find a solution, but goal proximity (measured by number of turns) is one of several heuristic approaches that can guide the player to the solution quickly.

The process for generating the heuristic is like the joint latch detection algorithm, but the output is the number of iterations needed until (goal player 100) is no longer guaranteed to be false. (Note that you'll need to check for the convergence of the algorithm, as that sentence being false may be part of the latch.)

Imperfect information games

I don't have plans just yet to extend Alloy to work in GDL-II. However, this technique would certainly have important applications in that realm. It would allow players to squeeze extra information about past and future turns out of the information they've collected over the course of the match. For example, a Minesweeper player would be able to rule out certain initial mine placement locations in the early turns of the game, then use the remaining possible initial sequences to consider possible placements.

I believe this technique is also easier to reason about (for most programmers) than traditional logic techniques that need to be applied directly to the game rules, like most GDL-II reasoning techniques so far. For example, the current extent of the implications found can be visualized by displaying the propnet colored according to its assignments. This visualization can suggest further refinements to more fully capture what is known.

Extensions

There are many ways this algorithm could be improved further. As mentioned above, a perfect implication deducer would require solving the Satisfiability problem completely. This is difficult in the general case, but it's possible that using a good SAT solver directly would be reasonable and effective.

As a perhaps more practical matter, there are probably relatively simple changes to the algorithm that can make it more efficient or powerful. In particular, there must be more efficient ways to recompute the implication set after slightly tweaking the inputs, which would be useful for several applications. There are also game-specific facts that could assist the algorithm, such as mutex awareness.

1 comment:

Steve DraperMay 13, 2015 at 6:04 AM
Coincidentally, I have been working for a few months on using distance analysis via the propnet to aid game search. I intend to write a couple of blog posts about this, but it involves using this kind of analysis to build a move-distance matrix (establishing a metric space for moves), wherein the distance between 2 moves is the smallest number of turns that must elapse before both of them could result in a change to some common base prop (in some follow-on sequence). This allows the definition of an 'efficient sequence' (one where all the moves have some effect on one 'goal' [in a general sense]), and you can then trim the search space to filter out sequences that cannot possibly be 'efficient' in this sense. We use this (prototype only as yet) in some games to look for forced win/losses. It's especially helpful in games like Breakthrough where piece mobility is low, so move distances can be quite large - it means that sequences with pawn moves in one part of the board need not examine moves/responses in 'far away' parts of the board (up to a given search horizon)

Alloy GGP

Sunday, May 10, 2015

Implication deduction: Logic over propnets