Adversarial Search

Adversarial search

Adversarial game - Chess, business, trading, war.
You change state, but then don't control next state.
Opponent will change state in a way:

  1. unpredictable
  2. hostile to you
You only get to change (say) every alternate state.


4.4.1 Minimax (exhaustive)

Minimax


Example where we can do exhaustive search

The Game of Nim: At each move the player must divide a pile of tokens into 2 piles of different sizes. The first player who is unable to make a move loses the game.


The entire state space for a 7-token game.
Image courtesy of Ralph Morelli.
See Luger Fig 4.19


There are precisely 3 end states, 2 where person who starts wins, 1 where person who went 2nd wins:

Green - Person who starts wins.
Blue - Person who went 2nd wins.


Can exhaustively search the space.
Problem is figuring out what opponent will do.
MINIMAX - We are MAX - trying to maximise our score / move to best state.
Opponent is MIN - tries to minimise our score / move to worst state for us.


If opponent MIN goes first, then:
Green - MIN wins - state has value 0 for us.
Blue - MAX wins - state has value 1.



Propagate these values back up.
Image courtesy of Ralph Morelli.
See Luger Fig 4.20
Note 5-1-1 has value 0 not 1, because it is MIN's turn, and they will take us to 4-1-1-1 not to 3-2-1-1.
Whereas 6-1 has value 1 not 0 because it is our turn.

If it is MIN's turn, parent value is the worst of its children.
If it is MAX's turn, parent value is the best of its children.

Note 2nd player always wins!
Person who goes first can never win unless 2nd player makes error.
This is different if no. of tokens different to 7. See discussion.




4.4.2 Minimax (to fixed depth)

If cannot do exhaustive search, do minimax to predefined depth ("minimax to fixed ply depth", "n-ply look-ahead")
Leaves of graph here will probably not terminate in a win/loss. So have to give them heuristic estimate of their value.


Image courtesy of Ralph Morelli.
See Luger Fig 4.21
At MAX level, we will always pick best.
At MIN level, opponent will always pick worst for you.
Here, we (MAX) start. From start state, best we can do in 3 plies is get to state with estimated value 3.


Horizon effect: A state may look promising according to the heuristic (e.g. I capture one of opponent's pieces) but actually lead to disaster later on, beyond the horizon.
In general, cannot avoid such effects and such imperfect heuristics (a perfect heuristic would be an oracle telling you exactly how to win the game).
One strategy: Do some special extra deep searching for states that look exceptionally good.



4.4.3 Alpha-beta pruning

Alpha-beta pruning

Way of pruning the space.
Rather than search entire space to fixed depth, we can get rid of branches if we know they won't be picked.



Image courtesy of Ralph Morelli.
See Luger Fig 4.26

  1. See state B: It's our turn, so B will have value at least 5. But this is more than 3, so we know at state A, opponent will never pick B. B sub-tree can be pruned.
  2. At D, opponent's turn. They will choose 0 or something worse. But that means at C we will never choose D over A. D tree can be pruned.
  3. Likewise, E will be 2 or worse. So at C we will never choose E over A. E tree can be pruned.


It depends on the ordering how much can be pruned. If we search B after searching its sibling (of value 3), we can prune it, but not if it happened to come before.
A lucky ordering and you could prune a lot.
An unlucky ordering and you have to visit all states.

Q. Reverse all orderings.
See what prunings will then be done.