Thinking Fast and Slow with Deep Learning and Tree Search
Thinking, Fast and Slow says that the human mind has two systems: System 1 and System 2.
System 1: a fast, intuitive or heuristic process.
- E.g.: REINFORCE, DQN
System 2: a conscious, explicit and rule-based mode of reasoning.
- Tree Search
Expert Iteration: System 1 (the apprentice) makes fast selections with no look-ahead, while System 2 (the expert) searches over possible continuations and suggests stronger policies.
Imitation Learning: the apprentice is trained to imitate the behaviour of the expert policy (see the sketch below).
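A minimal sketch of the ExIt loop under stated assumptions: `sample_states`, `mcts_policy` (the expert: tree search guided by the current apprentice) and an `apprentice` model with `fit` are hypothetical helpers introduced only for illustration, not the paper's exact implementation.

```python
def expert_iteration(apprentice, num_iterations=10, states_per_iter=1000):
    for _ in range(num_iterations):
        # System 1: the fast apprentice proposes moves without look-ahead
        # while generating positions to learn from.
        states = sample_states(apprentice, n=states_per_iter)

        # System 2: the slow expert (tree search, guided by the apprentice)
        # produces a stronger policy target for each sampled state.
        expert_targets = [mcts_policy(s, prior=apprentice) for s in states]

        # Imitation learning: the apprentice is trained to imitate the expert.
        apprentice.fit(states, expert_targets)
    return apprentice
```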
Reference:
Read Section 2 (Preliminaries) of the paper for a good explanation.
A search tree is built as simulations of the game are run; each simulation has two phases: a tree phase and a rollout phase.
Each simulation consists of two parts. First, a tree phase, where the tree is traversed by taking actions according to a tree policy. Second, a rollout phase, where some default policy is followed until the simulation reaches a terminal game state. The result returned by this simulation can then be used to update estimates of the value of each node traversed in the tree during the first phase.
At each node the algorithm stores n(s), the number of simulations in which the node has been visited so far. Each edge stores both n(s, a), the number of times it has been traversed, and r(s, a), the sum of all rewards obtained in simulations that passed through the edge.
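A minimal MCTS sketch built around these statistics, i.e. n(s) per node and n(s, a), r(s, a) per edge. It assumes a hypothetical `game` object with methods `legal_actions(s)`, `step(s, a)`, `is_terminal(s)` and `reward(s)`; the tree policy here is plain UCB1 and the default policy is uniformly random, which is one common choice rather than the paper's.

```python
import math
import random

class Node:
    def __init__(self, state):
        self.state = state
        self.n = 0                # n(s): visits to this node
        self.n_sa = {}            # n(s, a): traversals of each edge
        self.r_sa = {}            # r(s, a): summed rewards through each edge
        self.children = {}        # action -> child Node

def ucb1(node, a, c=1.41):
    # Mean reward of the edge plus an exploration bonus.
    mean = node.r_sa[a] / node.n_sa[a]
    return mean + c * math.sqrt(math.log(node.n) / node.n_sa[a])

def simulate(root, game):
    # Tree phase: follow the tree policy until a new edge is expanded.
    node, path = root, []
    while not game.is_terminal(node.state):
        actions = game.legal_actions(node.state)
        untried = [a for a in actions if a not in node.children]
        if untried:
            a = random.choice(untried)              # expand one new edge
            child = Node(game.step(node.state, a))
            node.children[a] = child
            node.n_sa[a] = 0
            node.r_sa[a] = 0.0
            path.append((node, a))
            node = child
            break
        a = max(actions, key=lambda a: ucb1(node, a))
        path.append((node, a))
        node = node.children[a]

    # Rollout phase: follow the default (random) policy to a terminal state.
    state = node.state
    while not game.is_terminal(state):
        state = game.step(state, random.choice(game.legal_actions(state)))
    reward = game.reward(state)

    # Backup: update n(s), n(s, a) and r(s, a) along the traversed path.
    node.n += 1
    for parent, a in path:
        parent.n += 1
        parent.n_sa[a] += 1
        parent.r_sa[a] += reward
```

This sketch treats rewards from a single agent's perspective; a two-player game would additionally flip the sign of the backed-up reward at alternating levels of the tree.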