Monte Carlo Tree Search Tutorial | DeepMind AlphaGo
Introduction
A best-of-five game series, $1 million in prize money, a high-stakes shootout. Between 9 and 15 March 2016, the second-highest ranked Go player, Lee Sedol, took on a computer program named AlphaGo.
AlphaGo emphatically outplayed and outclassed Mr. Sedol and won the series 4-1. Designed by Google's DeepMind, the program has spawned many other developments in AI, including AlphaGo Zero. These breakthroughs are widely considered stepping stones towards Artificial General Intelligence (AGI).
In this article, I will introduce you to the algorithm at the heart of AlphaGo: Monte Carlo Tree Search (MCTS). This algorithm has one main purpose: given the state of a game, choose the most promising move.
To give you some context behind AlphaGo, we'll first briefly look at the history of game-playing AI programs. Then, we'll see the components of AlphaGo, the game tree concept, and a few tree search algorithms, before finally diving into how the MCTS algorithm works.
AI is a vast and complex field. But before AI officially became a recognized body of work, early pioneers in computer science wrote game-playing programs to test whether computers could solve tasks that require human-level intelligence.
To give you a sense of where game-playing AI started and how far it has come, I have put together the key historical developments below:
And this is just skimming the surface! There are plenty of other examples where AI programs exceeded expectations. But this should give you a fair idea of where we stand today.
At its core, AlphaGo combines deep neural networks (a policy network and a value network) with the Monte Carlo Tree Search algorithm.
In this blog, we will focus only on the workings of Monte Carlo Tree Search. MCTS is what helps AlphaGo and AlphaGo Zero smartly explore and reach interesting/good states in a finite amount of time, which in turn helps the AI reach human-level performance.
Its application extends beyond games. MCTS can theoretically be applied to any domain that can be described in terms of {state, action} pairs, with simulation used to forecast outcomes. Don't worry if this sounds too complex right now; we'll break down all these concepts in this article.
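To make the {state, action} idea concrete, here is a minimal sketch of the kind of interface such a domain might expose. The class and method names are illustrative assumptions of my own, not AlphaGo's actual API; the later sketches in this article assume something like it.

```python
# A minimal sketch of the {state, action} interface a domain needs for MCTS.
# These class and method names are illustrative, not AlphaGo's actual API.
from typing import List


class GameState:
    """Abstract description of a game (or any {state, action} domain)."""

    def legal_actions(self) -> List[int]:
        """Return the actions available in this state."""
        raise NotImplementedError

    def apply(self, action: int) -> "GameState":
        """Return the new state reached by taking `action`."""
        raise NotImplementedError

    def is_terminal(self) -> bool:
        """True when the game is over (win, loss or draw)."""
        raise NotImplementedError

    def reward(self) -> float:
        """Final outcome of a finished game, e.g. +1 win, 0 draw, -1 loss."""
        raise NotImplementedError
```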
Game trees are the most well-known data structure for representing a game. The concept is actually pretty straightforward.
Each node of a game tree represents a particular state in a game. On performing a move, one makes a transition from a node to one of its children. The nomenclature is very similar to decision trees, wherein the terminal nodes are called leaf nodes.
For example, in the above tree, each move is equivalent to putting a cross at a different position. This branches into various other states where a zero is placed at each position to generate new states. This process goes on until a leaf node is reached, where the win-loss result becomes clear.
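As a rough illustration, a game-tree node can be represented with a small class like the one below. The field names are my own choice; the visit count and value fields only become relevant once we get to MCTS later in the article.

```python
# An illustrative game-tree node; the field names are my own choice.
class Node:
    def __init__(self, state, parent=None, action=None):
        self.state = state        # the game state this node represents
        self.parent = parent      # None for the root node
        self.action = action      # the move that led here from the parent
        self.children = []        # states reachable in one move
        self.visits = 0           # how often this node has been sampled (used by MCTS later)
        self.total_value = 0.0    # sum of simulation rewards seen here (used by MCTS later)

    def is_leaf(self):
        return len(self.children) == 0
```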
Our primary objective in designing these algorithms is to find the best path to follow in order to win the game. In other words, we look for a way of traversing the tree that finds the best nodes to achieve victory.
The majority of AI problems can be cast as search problems, which can be solved by finding the best plan, path, model or function.
Tree search algorithms can be seen as building a search tree:
The tree branches out because there are typically several different actions that can be taken in a given state. Tree search algorithms differ depending on which branches are explored and in what order.
Let's discuss a few tree search algorithms.
Uninformed search algorithms, as the name suggests, search a state space without any further information about the goal. These are considered basic computer science algorithms rather than part of AI. Two basic algorithms that fall under this type of search are depth first search (DFS) and breadth first search (BFS). You can read more about them in this blog post.
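As a quick refresher, here is a minimal sketch of both algorithms over the illustrative Node class from earlier; the only difference between them is whether the frontier behaves as a queue (BFS) or a stack (DFS).

```python
from collections import deque


def breadth_first_search(root, is_goal):
    """Explore the tree level by level until a goal node is found."""
    frontier = deque([root])
    while frontier:
        node = frontier.popleft()        # FIFO: expand the oldest node first
        if is_goal(node):
            return node
        frontier.extend(node.children)
    return None


def depth_first_search(root, is_goal):
    """Explore one branch as deep as possible before backtracking."""
    frontier = [root]
    while frontier:
        node = frontier.pop()            # LIFO: expand the newest node first
        if is_goal(node):
            return node
        frontier.extend(node.children)
    return None
```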
The Best First Search method explores a graph by expanding the most promising node, chosen according to a specific rule. Its defining characteristic is that, unlike DFS or BFS (which blindly examine/expand a node without knowing anything about it), best-first search uses an evaluation function (sometimes called a heuristic) to determine which node is the most promising, and then examines this node.
For example, the A* algorithm keeps a list of open nodes that are adjacent to an explored node. Note that these open nodes have not yet been explored. For each open node, an estimate of its distance from the goal is made. New nodes are chosen for exploration on a lowest-cost basis, where the cost is the distance from the origin node plus the estimate of the distance to the goal.
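Here is a minimal sketch of that idea. The `heuristic`, `neighbours` and `step_cost` callables are illustrative assumptions (not from any particular library), nodes are assumed to be hashable, and a heuristic value of 0 is used as the goal test.

```python
import heapq
from itertools import count


def a_star(start, heuristic, neighbours, step_cost):
    """Best-first search: expand the node with the lowest f(n) = g(n) + h(n).

    `heuristic`, `neighbours` and `step_cost` are illustrative callables;
    nodes are assumed to be hashable, and a heuristic of 0 marks the goal.
    """
    tiebreak = count()  # avoids comparing node objects when priorities tie
    open_list = [(heuristic(start), 0, next(tiebreak), start)]
    best_cost = {start: 0}  # cheapest known distance from the origin node
    while open_list:
        _, cost, _, node = heapq.heappop(open_list)
        if heuristic(node) == 0:
            return node, cost
        for nxt in neighbours(node):
            new_cost = cost + step_cost(node, nxt)  # distance from the origin
            if new_cost < best_cost.get(nxt, float("inf")):
                best_cost[nxt] = new_cost
                # priority = distance so far + estimated distance to the goal
                heapq.heappush(open_list, (new_cost + heuristic(nxt), new_cost, next(tiebreak), nxt))
    return None, float("inf")
```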
For single-player games, simple uninformed or informed search algorithms can be used to find a path to the optimal game state. What should we do for two-player adversarial games, where there is another player to account for? The actions of both players depend on each other.
For these games, we rely on adversarial search. This includes the actions of two (or more) adversarial players. The basic adversarial search algorithm is called Minimax.
This algorithm has been used very successfully for playing classic perfect-information two-player board games such as Checkers and Chess. In fact, it was (re)invented specifically for the purpose of building a chess-playing program.
The core loop of the Minimax algorithm alternates between player 1 and player 2, quite like the white and black players in chess. These are called the min player and the max player. All possible moves are explored for each player.
For each resulting state, all possible moves by the other player are also explored. This goes on until all possible move combinations have been tried out to the point where the game ends (with a win, loss or draw). The entire game tree is generated through this process, from the root node down to the leaves:
Each node is explored to find the moves that give us the maximum value or score.
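A minimal sketch of Minimax over the illustrative game-state interface assumed earlier might look like this, with rewards expressed from the max player's point of view:

```python
def minimax(state, maximizing):
    """Exhaustively explore the game tree and return the best achievable value.

    Assumes the GameState-style interface sketched earlier, with rewards
    expressed from the max player's point of view.
    """
    if state.is_terminal():
        return state.reward()
    values = [minimax(state.apply(a), not maximizing) for a in state.legal_actions()]
    return max(values) if maximizing else min(values)


def best_move(state):
    """Pick the move whose resulting state has the highest minimax value (max player to move)."""
    return max(state.legal_actions(), key=lambda a: minimax(state.apply(a), maximizing=False))
```

Because every branch is explored down to a terminal state, this only stays practical for games with small game trees, which is exactly the limitation discussed next.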
Games like tic-tac-toe, checkers and chess can arguably be solved using the minimax algorithm. However, things can get a little tricky when there are a large number of potential actions to be taken at each state. This is because minimax explores all the nodes available. It can become frighteningly difficult to solve a complex game like Go in a finite amount of time.
Go has a branching factor of approximately 250, i.e. from each state there are around 250 possible actions, whereas chess typically has around 35 actions to choose from. Further, the positional nature of Go, which is all about surrounding the adversary, makes it very hard to correctly estimate the value of a given board state. For more information on the rules of Go, please refer to this link.
There are several other games with complex rules that minimax is ill-equipped to solve. These include games with imperfect information, such as Battleship and Poker, and non-deterministic games such as Backgammon and Monopoly. Monte Carlo Tree Search, introduced in 2006, provides a possible solution.
The basic MCTS algorithm is simple: a search tree is built, node by node, according to the outcomes of simulated playouts. The process can be broken down into four steps: selection (tree traversal), expansion, simulation (rollout) and backpropagation.
Before we delve deeper and understand tree traversal and node expansion, let's get familiar with a few terms.
UCB Value
UCB1, the upper confidence bound for a node $S_i$, is given by the following formula:

$$UCB1(S_i) = \bar{V}_i + C \sqrt{\frac{\ln N}{n_i}}$$

where:

- $\bar{V}_i$ is the average value (mean simulation reward) of node $S_i$
- $N$ is the number of times the parent node has been visited
- $n_i$ is the number of times node $S_i$ has been visited
- $C$ is a constant that balances exploration against exploitation (this tutorial's walkthrough uses $C = 2$)

The first term favours nodes that have performed well so far (exploitation), while the second favours nodes that have been visited rarely (exploration).
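Assuming the illustrative Node fields sketched earlier, the UCB1 score of a child node can be computed like this:

```python
import math


def ucb1(node, c=2.0):
    """UCB1 score of a child node, using the illustrative Node fields from earlier.

    c is the exploration constant; this walkthrough uses c = 2.
    """
    if node.visits == 0:
        return float("inf")  # unvisited children are always tried first
    exploitation = node.total_value / node.visits                        # average value V_i
    exploration = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploitation + exploration
```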
What do we mean by a rollout? Starting from the given node, we randomly choose an action at each step and simulate the game until it is over, recording the reward we receive at the end.
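A rollout can be sketched as a simple loop of random moves until the game ends, again assuming the illustrative game-state interface from earlier:

```python
import random


def rollout(state):
    """Play random moves until the game ends and return the final reward."""
    while not state.is_terminal():
        state = state.apply(random.choice(state.legal_actions()))
    return state.reward()
```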
Flowchart for Monte Carlo Tree Search
Tree Traversal & Node Expansion
We start with S0, the initial state. If the current node is not a leaf node, we calculate the UCB1 values of its children and move to the child that maximises the UCB value. We keep doing this until we reach a leaf node.
Next, we ask how many times this leaf node has been sampled. If it has never been sampled before, we simply do a rollout (instead of expanding). However, if it has been sampled before, then we add a new node (state) to the tree for each available action (this is what we are calling expansion here).
Our current node is now one of these newly created nodes, and we do a rollout from there.
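Putting the selection and expansion rules into code, a rough sketch (using the illustrative Node class and ucb1 function from earlier) could look like this:

```python
def select(node):
    """Walk down the tree, always following the child with the highest UCB1 score."""
    while not node.is_leaf():
        node = max(node.children, key=ucb1)
    return node


def expand(node):
    """Add one child per legal action and return one of them to roll out from."""
    for action in node.state.legal_actions():
        node.children.append(Node(node.state.apply(action), parent=node, action=action))
    return node.children[0] if node.children else node
```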
Let's do a complete walkthrough of the algorithm to truly ingrain this concept and understand it clearly.
Iteration 1:
Initial State
Rollout from S1
Post Backpropagation
The way MCTS works is that we run it for a defined number of iterations, or until we run out of time. This tells us the best action to take at each step in order to get the maximum return.
Iteration 2:
Backpropagation from S2
Iteration 3:
Iteration 4:
That is the gist of this algorithm. We can perform as many iterations as required (or as is computationally feasible). The underlying idea is that the estimate of the value at each node becomes more accurate as the number of iterations increases.
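Putting the earlier sketches together, a minimal end-to-end MCTS loop might look like the following. This is a simplified single-perspective version (it does not flip rewards between the two players the way a full two-player implementation would), intended only to mirror the steps described above.

```python
def backpropagate(node, reward):
    """Update visit counts and value sums from the rolled-out node back to the root."""
    while node is not None:
        node.visits += 1
        node.total_value += reward
        node = node.parent


def mcts(root_state, iterations=1000):
    """Run MCTS for a fixed budget and return the most promising action at the root."""
    root = Node(root_state)
    for _ in range(iterations):
        leaf = select(root)                                   # 1. selection via UCB1
        if leaf.visits > 0 and not leaf.state.is_terminal():
            leaf = expand(leaf)                               # 2. expand nodes sampled before
        reward = rollout(leaf.state)                          # 3. simulation / rollout
        backpropagate(leaf, reward)                           # 4. backpropagation
    # The recommended move is the root child visited most often.
    best_child = max(root.children, key=lambda child: child.visits)
    return best_child.action
```

With a concrete game class implementing the interface (say, a hypothetical TicTacToeState), calling mcts(TicTacToeState(), iterations=1000) would return the most-visited move at the root.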
DeepMind's AlphaGo and AlphaGo Zero programs are far more complex, with various other facets that are outside the scope of this article. However, the Monte Carlo Tree Search algorithm remains at their heart. MCTS plays the primary role in making complex games like Go tractable in a finite amount of time. Some open source implementations of MCTS are linked below:
Implementation in Python
Implementation in C++
I expect reinforcement learning to make a lot of headway in 2019. It won't be surprising to see many more complex games being cracked by machines soon. This is a great time to learn reinforcement learning!
I would love to hear your thoughts and suggestions regarding this article and this algorithm in the comments section below. Have you used this algorithm before? If not, which game would you want to try it out on?