01 March 2019

MCTS Data Flow

In last week's post, Monte Carlo Tree Search (MCTS), I featured a video to explain the fundamental concepts of MCTS. Here's a diagram that takes the explanation several steps further. Note that it's based on AlphaGo Zero, but the concepts apply largely to AlphaZero as well.


Extracted from:
AlphaGo Zero Cheat Sheet
(applied-data.science; big PNG: 8333 x 7500)

I added the red numbers in parentheses to connect the left portion of the diagram, an inverted tree structure ('How AlphaGo Zero chooses its next move'), with the right portion, a step-by-step explanation of the left portion ('First, run the following simulation 1600 times...'). The following text is (mostly) copied from the 'cheat sheet'.

(1) Start at the root node of the tree (the current game state). Choose the action that maximizes 'Q + U'. Continue until a leaf node is reached

(2) The game state of the leaf node is passed into the neural network [NB: the tall stack of blocks; explained elsewhere on the 'cheat sheet'], which outputs predictions about two things:-
      • P - Move probabilities
      • V - Value of the state

(3) The move probabilities (P) are attached to the new feasible actions from the leaf node.

(4) Backup previous edges. Each edge that was traversed to get to the leaf node is updated.

'After 1600 simulations'

(5) The move can either be chosen:-
      • Deterministically (for competitive play)
      • Stochastically (for exploratory play)

Values like 'N', 'W', 'Q', and 'P' are explained in the notes to the diagram. After studying the diagram, I felt that I was finally starting to understand the basic concepts of MCTS. How do they work with AlphaZero? That will require further study.

No comments: