Today I'd like to share a simple and
surprisingly useful algorithm for making logical inferences about
games using propositional networks.
I developed this as part of the
implementation for a more specific feature (one of the applications
listed below). Although it was developed in the context of a larger
feature, I wrote the code for this step to stand alone, and it has
proved useful in quickly writing other features that involve
understanding games more deeply.
The question that it answers is simple:
given a set of components that we know to be true or false in a
propnet, which other components are also guaranteed to be true or
false? I call this implication deduction.
The algorithmically astute among you
may notice that a complete solution to this problem is equivalent to
the Satisfiability problem, and so would be computationally
intractable. The good news is that we don't need to solve this
problem completely. We can usually make do with a subset of the
components that we could, in theory, identify as having fixed values.
In fact, it can make sense to use multiple implementations with
different tradeoffs between speed and completeness. In some
applications, this will be run once per game; in others, it may be
run once per legal move, or per state of the game.
There's also a twist we apply to make
this more useful: We want to be able to track these implications
across multiple turns. This involves pairing our references to
components with logical turn numbers. (These don't have to match real
turn numbers; if I'm considering two consecutive turns, I'll always
use 0 to indicate the earlier turn and 1 the later turn.) Luckily,
due to the nature of the propnet's construction, this does not make
the algorithm significantly more complicated. We can take advantage
of the transition components that mark the boundary between turns,
and use these to note when we cross from one turn to another (or when
we reach the boundary of our time span of interest).
The algorithm
We refer to the possible component
types in a propnet with the capitalized names Proposition,
Transition, Or, And, and Not, to match the class names used in GGP-Base. (Support for Constants can easily be
added.)
We use a TimedComponent type that pairs
a reference to a component in the propnet with a logical turn number.
Its inputs and outputs are the TimedComponents corresponding to the
component's inputs and outputs; these are on the same turn, except
where a Transition is involved. Moving from a Transition to its input
decreases the turn number by one, and vice versa.
Inputs to the algorithm are an initial
assignment of TimedComponents to values and a range of turn numbers
of interest.
- Initialize an “assignmentSoFar” mapping TimedComponents to boolean values. Initialize this with the inputs to the algorithm, i.e. what we already know is true.
- Initialize a “componentsToExamine” set. (This could be a queue or stack, but it's advantageous to remove duplicates as new components are added.) The initial values are the TimedComponents in assignmentSoFar.
- While componentsToExamine is non-empty, remove a “curComponent” from it and apply the following operations:
- If curComponent is outside the time range of interest, ignore it and move on to the next component.
- If curComponent is not in assignmentSoFar, apply the following techniques to check for reasons it might be assigned a value. If one applies, add that value to the assignmentSoFar mapping.
- If a Proposition or Transition output of curComponent has a value, curComponent takes the same value.
- If a Not output of curComponent has a value, curComponent takes the opposite value.
- If an And output of curComponent is true, curComponent is also true. Optionally, if the And output is false and all its other parents are true, curComponent is false.
- If an Or output of curComponent is false, curComponent is also false. Optionally, if the Or output is true and all its other parents are false, curComponent is true.
- If curComponent is a Proposition or Transition and it has an input with a value, curComponent takes the same value.
- If curComponent is a Not and its input has a value, curComponent takes the opposite value.
- If curComponent is an And and one of its inputs is false, curComponent is false. If all of its inputs are true, curComponent is true.
- If curComponent is an Or and one of its inputs is true, curComponent is true. If all of its inputs are false, curComponent is false.
- Optionally, if curComponent is a legal Proposition and the corresponding does Proposition is true, curComponent is true.
- Optionally, if curComponent is a does Proposition and the corresponding legal Proposition is false, curComponent is false.
- If assignmentSoFar contains curComponent (either because it was in the initial mapping or just added), do the following:
- Add to componentsToExamine any inputs to curComponent that are not already in assignmentSoFar.
- Add to componentsToExamine any outputs of curComponent that are not already in assignmentSoFar.
- Optionally, if the assigned value is true, for any output of curComponent that is an And gate assigned the value false, check if all but one input to the child is true. If so, add the last parent to componentsToExamine if it's not already in assignmentSoFar. (This corresponds with the optional technique in 3.2.3 above.)
- Optionally, if the assigned value is false, for any output of curComponent that is an Or gate assigned the value true, check if all but one input to the child is false. If so, add the last parent to componentsToExamine if it's not already in assignmentSoFar. (This corresponds with the optional technique in 3.2.4 above.)
- Optionally, if curComponent is a legal Proposition and the assigned value is false, add the corresponding does Proposition to componentsToExamine if it's not already in assignmentSoFar. (This corresponds with the optional technique in 3.2.10 above.)
- Optionally, if curComponent is a does Proposition and the assigned value is true, add the corresponding legal Proposition to componentsToExamine if it's not already in assignmentSoFar. (This corresponds with the optional technique in 3.2.9 above.)
This is a bit long, but it should be clear what each step accomplishes. We start with the initial assumptions and spread across the directed graph in both directions, upstream and downstream, looking for components with values that can be deduced using one of the techniques in step 3.2.
So what are some of the applications of
this technique?
Winning/losing move detection
The original reason I wrote this
algorithm was to generate a “generalized endgame book” (GEB);
that is, a way to look at a state and see from a subset of the state
that a given move will lead to victory. (This is contrasted with a
normal endgame database, which requires a complete match on the
entire state.) The simplest incarnation of this idea is finding moves
that in every case immediately end the game and give the player a
score of 100. The benefit here is that this can be applied not just
during the first phase of an MCTS playthrough, but during the random
descent through previously unexplored states.
(I was planning to do a larger post on
GEBs, but when I (finally) re-added GEBs to the latest version of
Alloy, I was unable to reproduce my past results showing their
benefits in earlier versions. It can be very tricky to tell which
advanced features will actually result in higher winning percentages
in the context of a larger game player.)
To generate GEBs, implication deduction
is normally just one part of a larger, long-running algorithm.
However, if we're just looking for moves that immediately cause a
player to win or lose, we can use a trick with logical converses to
detect these moves quickly.
If you want to find the set of moves
that will cause xplayer to win immediately, run the algorithm in the
following two configurations:
Time range: Turns 0 and 1
Inputs: <
terminal
, 0> = false
(optional), <terminal
, 1> = false
Outputs to watch for: <
(does xplayer
?move)
, 0> = false
Time range: Turns 0 and 1
Inputs: <
terminal
, 0> = false
(optional), <(goal xplayer 100)
, 1> = false
Outputs to watch for: <
(does xplayer
?move)
, 0> = false
The logic here is that if
?move
is a
winning move, then <(does xplayer ?move)
, 0> being true will
cause terminal
and (goal xplayer 100)
to be true on the following
turn. The converse also holds: either of the latter being false will
cause the original move to be false. By taking advantage of this
converse, we can test all moves at once instead of running tests one
at a time.Joint latch detection
We can also use this algorithm to
easily find the subset of a given state that will not change for the
rest of the game. Unlike most methods of latch detection, this will
handle combinations of state that are not latches individually, but
keep each other in place due to their combination.
This can be used to reduce the size of
a propnet for more efficient simulations; reduce the size of game
states as certain sentences no longer need to be included; or
identify states that are unwinnable (or unloseable).
To do this latch detection, run the
algorithm repeatedly with the turn range [0, 1]. In the first step,
set the sentences in turn 0 according to the state of interest
(setting the base propositions should suffice). In subsequent steps,
take the intersection of the components in turns 0 and 1 and put that
in step 0. (If a component is included in both turns but its truth
values differ, ignore it.) Repeat this process until the components
in 0 and 1 are identical and have the same values; these are the
latches.
Goal proximity heuristics
This will only be useful for a few
games, but it's also possible to use this type of ternary logic to
determine the minimum number of turns before (goal player 100) can
become true. This can be used as an admissible heuristic for A*
search in puzzles.
A somewhat contrived example (but
potentially interesting as a puzzle for GGP in any case) is the
made-up “Knight's Journey” puzzle. Unlike the “Knight's Tour”,
this simply requires a knight to cross the board from one corner to
the other in the minimum number of turns, giving 100 points if
successful and 0 otherwise. If the board is large enough, MCTS alone
will not find a solution, but goal proximity (measured by number of
turns) is one of several heuristic approaches that can guide the
player to the solution quickly.
The process for generating the heuristic is like the joint latch detection algorithm, but the output is the number of iterations needed until (goal player 100) is no longer guaranteed to be false. (Note that you'll need to check for the convergence of the algorithm, as that sentence being false may be part of the latch.)
The process for generating the heuristic is like the joint latch detection algorithm, but the output is the number of iterations needed until (goal player 100) is no longer guaranteed to be false. (Note that you'll need to check for the convergence of the algorithm, as that sentence being false may be part of the latch.)
Imperfect information games
I don't have plans just yet to extend
Alloy to work in GDL-II. However, this technique would certainly have
important applications in that realm. It would allow players to
squeeze extra information about past and future turns out of the
information they've collected over the course of the match. For
example, a Minesweeper player would be able to rule out certain
initial mine placement locations in the early turns of the game, then
use the remaining possible initial sequences to consider possible
placements.
I believe this technique is also easier to reason about (for most programmers) than traditional logic techniques that need to be applied directly to the game rules, like most GDL-II reasoning techniques so far. For example, the current extent of the implications found can be visualized by displaying the propnet colored according to its assignments. This visualization can suggest further refinements to more fully capture what is known.
I believe this technique is also easier to reason about (for most programmers) than traditional logic techniques that need to be applied directly to the game rules, like most GDL-II reasoning techniques so far. For example, the current extent of the implications found can be visualized by displaying the propnet colored according to its assignments. This visualization can suggest further refinements to more fully capture what is known.
Extensions
There are many ways this algorithm
could be improved further. As mentioned above, a perfect implication
deducer would require solving the Satisfiability problem completely.
This is difficult in the general case, but it's possible that using a
good SAT solver directly would be reasonable and effective.
As a perhaps more practical matter,
there are probably relatively simple changes to the algorithm that
can make it more efficient or powerful. In particular, there must be
more efficient ways to recompute the implication set after slightly
tweaking the inputs, which would be useful for several applications. There are also game-specific facts that could
assist the algorithm, such as mutex awareness.
Coincidentally, I have been working for a few months on using distance analysis via the propnet to aid game search. I intend to write a couple of blog posts about this, but it involves using this kind of analysis to build a move-distance matrix (establishing a metric space for moves), wherein the distance between 2 moves is the smallest number of turns that must elapse before both of them could result in a change to some common base prop (in some follow-on sequence). This allows the definition of an 'efficient sequence' (one where all the moves have some effect on one 'goal' [in a general sense]), and you can then trim the search space to filter out sequences that cannot possibly be 'efficient' in this sense. We use this (prototype only as yet) in some games to look for forced win/losses. It's especially helpful in games like Breakthrough where piece mobility is low, so move distances can be quite large - it means that sequences with pawn moves in one part of the board need not examine moves/responses in 'far away' parts of the board (up to a given search horizon)
ReplyDelete