How AlphaZero Learns Chess – Chess.com
AlphaZero's learning process is, to some extent, similar to that of humans. A new paper from DeepMind, which includes a contribution from the 14th world chess champion Vladimir Kramnik, provides strong evidence for the existence of human-understandable concepts in AlphaZero's network, even though AlphaZero has never seen a human game of chess.
How does AlphaZero learn chess? Why does it make certain moves? What values does it give to concepts such as king safety or mobility? How does it learn openings, and how is that different from how humans developed opening theory?
Questions like these are being discussed in a fascinating new paper by DeepMind, titled Acquisition of Chess Knowledge in AlphaZero. It was written by Thomas McGrath, Andrei Kapishnikov, Nenad Tomasev, Adam Pearce, Demis Hassabis, Been Kim, and Ulrich Paquet together with Kramnik. It is the second cooperation between DeepMind and Kramnik, after their research from last year when they used AlphaZero to explore the design of different variants of the game of chess, with different sets of rules.
In their latest paper, the researchers tried a method for encoding human conceptual knowledge, to determine the extent to which the AlphaZero network represents human chess concepts. Examples of such concepts are the bishop pair, material (im)balance, mobility, or king safety. These concepts have in common that they are pre-specified functions that encapsulate a particular piece of domain-specific knowledge.
Some of these concepts were taken from Stockfish 8's evaluation function, such as material, imbalance, mobility, king safety, threats, passed pawns, and space. Stockfish 8 uses these as sub-functions that give individual scores leading to a "total" evaluation that is exported as a continuous value, such as "0.25" (a slight advantage to White) or "-1.48" (a big advantage to Black). Note that more recent versions of Stockfish have developed into Alpha-Zero-like neural networks but were not used for this paper.
The third type of concepts encapsulates more specific lower-level features, such as the existence of forks, pins, or contested files, as well as a range of features regarding pawn structure.
Having established this wide array of human concepts, the next step for the researchers was to try and find them within the AlphaZero network, for which they used a sparse linear regression model. After that, they started visualizing the human concept learning with what they call what-when-where plots: what concept is learned when in training time where in the network.
According to the researchers, AlphaZero indeed develops representations that are closely related to a number of human concepts over the course of training, including high-level evaluation of the position, potential moves and consequences, and specific positional features.
One interesting result was about material imbalance.As was demonstrated in Matthew Sadler and Natasha Regan's award-winning book Game Changer: AlphaZeros Groundbreaking Chess Strategies and the Promise of AI (New In Chess, 2019), AlphaZero seems to view material imbalance differently from Stockfish 8. The paper gives empirical evidence that this is the case at the representational level: AlphaZero initially "follows" Stockfish 8's evaluation of material more and more during its training, but at some point, it turns away from it again.
The next step for the researchers was to relate the human concepts to AlphaZero's value function. One of the first concepts they looked at was piece value, something a beginner will first learn when starting to play chess. The classical values are nine for a queen, five for a rook, three for both the bishop and knight, and one for a pawn. The left figure below (taken from the paper) shows the evolution of piece weights during AlphaZero's training, with piece values converging towards commonly-accepted values.
The image on the right shows that during AlphaZero's training, material becomes more and more important in the early stages of learning chess (consistent to human learning) but it reaches a plateau and at some point, the values of more subtle concepts such as mobility and king safety are becoming more important while material actually decreases in importance.
Another part of the paper is dedicated to comparing AlphaZero's training to the progression of human knowledge over history. The researchers point out that there is a marked difference between AlphaZeros progression of move preferences through its history of training steps, and what is known of the progression of human understanding of chess since the 15th century:
AlphaZero starts with a uniform opening book, allowing it to explore all options equally, and largely narrows down plausible options over time. Recorded human games over the last five centuries point to an opposite pattern: an initial overwhelming preference for 1.e4, with an expansion of plausible options over time.
The researchers compare the games AlphaZero is playing against itself with a large sample taken from the ChessBase Mega Database, starting with games from the year 1475 up till the 21st century.
Humans initially played 1.e4 almost exclusively but 1.d4 was slightly more popular in the early 20th century, soon followed by the increasing popularity of more flexible systems like 1.c4 and 1.Nf3. AlphaZero, on the other hand, tries out a wide array of opening moves in the early stage of its training before starting to value the "main" moves higher.
A more specific example provided is about the Berlin variation of the Ruy Lopez (the move 3...Nf6 after 1.e4 e5 2.Nf3 Nc6 3.Bb5), which only became popular at the top level early 21st century, after Kramnik successfully used it in his world championship match with GM Garry Kasparov in 2000. Before that, it was considered to be somewhat passive and slightly better for White with the move 3...a6 being preferable.
The researchers write:
Looking back in time, it took a while for human chess opening theory to fully appreciate the benefits of Berlin defense and to establish effective ways of playing with Black in this position. On the other hand, AlphaZero develops a preference for this line of play quite rapidly, upon mastering the basic concepts of the game. This already highlights a notable difference in opening play evolution between humans and the machine.
Remarkably, when different versions of AlphaZero are trained from scratch, half of them strongly prefer 3 a6, while the other half strongly prefer 3 Nf6! It is interesting as it means that there is no "unique good chess player. The following table shows the preferences of four different AlphaZero neural networks:
The AlphaZero prior network preferences after 1. e4 e5 2. Nf3 Nc6 3. Bb5, for four different training runs of the system (four different versions of AlphaZero). The prior is given after one million training steps. Sometimes AlphaZero converges to become a player that prefers 3 a6, and sometimes AlphaZero converges to become a player that prefers to respond with 3 Nf6.
In a similar vein, AlphaZero develops its own opening "theory" for a much wider array of openings over the course of its training. At some point, 1.d4 and 1.e4 are discovered to be good opening moves and are rapidly adopted. Similarly, AlphaZero's preferred continuation after 1.e4 e5 is determined in another short temporal window. The figure below illustrates how both 2.d4 and 2.Nf3 are quickly learned as reasonable White moves, but 2.d4 is then dropped almost as quickly in favor of 2.Nf3 as a standard reply.
Kramnik's contribution to the paper is a qualitative assessment, as an attempt to identify themes and differences in the style of play of AlphaZero at different stages of its training. The 14th world champion was provided sample games from four different stages to look at.
According to Kramnik, in the early training stage, AlphaZero has "a crude understanding of material value and fails to accurately assess material in complex positions. This leads to potentially undesirable exchange sequences, and ultimately losing games on material." In the second stage, AlphaZero seemed to have "a solid grasp on material value, thereby being able to capitalize on the material assessment weakness" of the early version.
In the third stage, Kramnik feels that AlphaZero has a better understanding of king safety in imbalanced positions. This manifests in the second version "potentially underestimating the attacks and long-term material sacrifices of the third version, as well as the second version overestimating its own attacks, resulting in losing positions."
In its fourth stage of the training, has a "much deeper understanding" of which attacks will succeed and which would fail. Kramnik notices that it sometimes accepts sacrifices played by the "third version," proceeds to defend well, keep the material advantage, and ultimately converts to a win.
Another point Kramnik makes, which feels similar to how humans learn chess, is that tactical skills appear to precede positional skills as AlphaZero learns. By generating self-play games over separate opening sets (e.g. the Berlin or the Queen's Gambit Declined in the "positional" set and the Najdorf and King's Indian in the "tactical" set), the researchers manage to provide circumstantial evidence but note that further work is needed to understand the order in which skills are acquired.
For a long time, it was believed that machine-learning systems learn uninterpretable representations that have little in common with human understanding of the domain they are trained on. In other words, how and what AI teaches itself is mostly gibberish to humans.
With their latest paper, the researchers have provided strong evidence for the existence of human-understandable concepts in an AI system that wasn't exposed to human-generated data. AlphaZero's network shows the use of human concepts, even though AlphaZero has never seen a human game of chess.
This might have implications outside the chess world. The researchers conclude:
The fact that human concepts can be located even in a superhuman system trained by self-play broadens the range of systems in which we should expect to find human-understandable concepts. We believe that the ability to find human-understandable concepts in the AZ network indicates that a closer examination will reveal more.
Co-author Nenad Tomasev commented to Chess.com that for him personally, he was really curious to consider if there is such a thing as a "natural" progression of chess theory:
Even in the human contextif we were to 'restart' history, go back in timewould the theory of chess have developed in the same way? There were a number of prominent schools of thought in terms of the overall understanding of chess principles and middlegame positions: the importance of dynamism vs. structure, material vs. sacrificial attacks, material imbalance, the importance of space vs. the hypermodern school that invites overextension in order to counterattack, etc. This also informed the openings that were played. Looking at this progression, what remains unclear is whether it would have happened the same way again. Maybe some pieces of chess knowledge and some perspectives are simply easier and more natural for the human mind to grasp and formulate? Maybe the process of refining them and expanding them has a linear trajectory, or not? We can't really restart history, so we can only ever guess what the answer might be.
However, when it comes to AlphaZero, we can retrain it many timesand also compare the findings to what we have previously seen in human play. We can therefore use AlphaZero as a Petri dish for this question, as we look at how it acquires knowledge about the game. As it turns out, there are both similarities and dissimilarities in how it builds its understanding of the game compared to human history. Also, while there is some level of stability (results being in agreement across different training runs), it is by no means absolute (sometimes the training progression looks a little bit different, and different opening lines end up being preferred).
Now, this is by no means a definitive answer to what is, to me personally, a fascinating question. There is still plenty to think about here. Yet, we hope that our results provide an interesting perspective and make it possible for us to start thinking a bit deeper about how we learn, grow, improvethe very nature of intelligence and how it goes all the way from a blank slate to what is a deep understanding of a very complex domain like chess.
Kramnik commented to Chess.com:
"There are two major things which we can try to find out with this work. One is: how does AlphaZero learn chess, how does it improve? That is actually quite important. If we manage one day to understand it fully, then maybe we can interpret it into the human learning process.
Secondly, I believe it is quite fascinating to discover that there are certain patterns that AlphaZero finds meaningful, which actually make little sense for humans. That is my impression. That actually is a subject for further research, in fact, I was thinking that it might easily be that we are missing some very important patterns in chess, because after all, AlphaZero is so strong that if it uses those patterns, I suspect they make sense. That is actually also a very interesting and fascinating subject to understand, if maybe our way of learning chess, of improving in chess, is actually quite limited. We can expand it a bit with the help of AlphaZero, of understanding how it sees chess."
Read this article:
How AlphaZero Learns Chess - Chess.com
- Game Changer: AlphaZero's Groundbreaking Chess Strategies and the Promise of AI by Mathew Sadler and Natasha Regan - ChessBase India - December 18th, 2024 [December 18th, 2024]
- Demis Hassabis - when the chess prodigy won the Nobel Prize in Chemistry - Chess.com - October 14th, 2024 [October 14th, 2024]
- AI Could Learn a Thing or Two From Rat Brains - The Daily Beast - November 13th, 2023 [November 13th, 2023]
- Episode What sets great teams apart | Lane Shackleton (CPO of Coda) - Mirchi Plus - October 1st, 2023 [October 1st, 2023]
- The timeless charm of of 'Chaturanga' - Daily Pioneer - October 1st, 2023 [October 1st, 2023]
- Creating New Stories That Don't Suck - Hollywood in Toto - October 1st, 2023 [October 1st, 2023]
- AI Agents: Adapting to the Future of Software Development - ReadWrite - October 1st, 2023 [October 1st, 2023]
- The Race for AGI: Approaches of Big Tech Giants - Fagen wasanni - July 30th, 2023 [July 30th, 2023]
- Book Review: Re-engineering the Chess Classics by GM Matthew ... - Chess.com - June 4th, 2023 [June 4th, 2023]
- The Sparrow Effect: How DeepMind is Rewriting the AI Script - CityLife - June 4th, 2023 [June 4th, 2023]
- Vitalik Buterin Exclusive Interview: Longevity, AI and More - Lifespan.io News - June 4th, 2023 [June 4th, 2023]
- How to play chess against ChatGPT (and why you probably shouldn't) - Android Authority - May 29th, 2023 [May 29th, 2023]
- Weekend Movers - Conflux (CFX) and Klaytn (KLAY) - Securities.io - May 16th, 2023 [May 16th, 2023]
- How technology reinvented chess as a global social network - Financial Times - May 8th, 2023 [May 8th, 2023]
- Our moral panic over AI - The Spectator Australia - April 13th, 2023 [April 13th, 2023]
- Liability Considerations for Superhuman (and - Fenwick & West LLP - April 13th, 2023 [April 13th, 2023]
- Aston by-election minus one day The Poll Bludger - The Poll Bludger - April 2nd, 2023 [April 2nd, 2023]
- No-Castling Masters: Kramnik and Caruana will play in Dortmund - ChessBase - March 26th, 2023 [March 26th, 2023]
- AI is teamwork Bits&Chips - Bits&Chips - March 20th, 2023 [March 20th, 2023]
- Resolve Strategic nuclear subs poll (open thread) The Poll Bludger - The Poll Bludger - March 20th, 2023 [March 20th, 2023]
- AI Topic: AlphaZero, ChatGPT, Bard, Stable Diffusion and more! - February 24th, 2023 [February 24th, 2023]
- AlphaZero Tackles Chess Variants - by Dennis Monokroussos - February 20th, 2023 [February 20th, 2023]
- AlphaZero Vs. Stockfish 8 | AI Is Conquering Computer Chess - February 10th, 2023 [February 10th, 2023]
- Stockfish (chess) - Wikipedia - November 22nd, 2022 [November 22nd, 2022]
- AlphaZero Chess Engine: The Ultimate Guide - October 14th, 2022 [October 14th, 2022]
- Whos going to save us from bad AI? - MIT Technology Review - October 14th, 2022 [October 14th, 2022]
- DeepMinds game-playing AI has beaten a 50-year-old record in computer science - MIT Technology Review - October 8th, 2022 [October 8th, 2022]
- The Download: TikTok moral panics, and DeepMinds record-breaking AI - MIT Technology Review - October 8th, 2022 [October 8th, 2022]
- Top 5 stories of the week: DeepMind and OpenAI advancements, Intels plan for GPUs, Microsofts zero-day flaws - VentureBeat - October 8th, 2022 [October 8th, 2022]
- Taxing times (open thread) The Poll Bludger - The Poll Bludger - October 8th, 2022 [October 8th, 2022]
- AlphaGo Zero Explained In One Diagram | by David Foster - Medium - October 1st, 2022 [October 1st, 2022]
- A chess scandal brings fresh attention to computers role in the game - The Record by Recorded Future - October 1st, 2022 [October 1st, 2022]
- Meta AI Boss: current AI methods will never lead to true intelligence - Gizchina.com - October 1st, 2022 [October 1st, 2022]
- Meta's AI guru LeCun: Most of today's AI approaches will never lead to true intelligence - ZDNet - September 24th, 2022 [September 24th, 2022]
- Stockfish - Chess Engines - Chess.com - September 9th, 2022 [September 9th, 2022]
- DeepMinds AlphaFold could be the future of science and AI - Vox.com - August 7th, 2022 [August 7th, 2022]
- Correspondence chess server, Go (weiqi) games online - FICGS - July 4th, 2022 [July 4th, 2022]
- Chennai Chess Olympiad and AI - Analytics India Magazine - June 28th, 2022 [June 28th, 2022]
- Yann LeCun has a bold new vision for the future of AI - MIT Technology Review - June 28th, 2022 [June 28th, 2022]
- Special Street Fighter 35th anniversary website launched, features impressive timeline of game release dates over the years - EventHubs - June 28th, 2022 [June 28th, 2022]
- The Nightmarish Frontier of AI in Chess - uschess.org - June 19th, 2022 [June 19th, 2022]
- Four Draws in Round Three of 2022 Candidates | US Chess.org - uschess.org - June 19th, 2022 [June 19th, 2022]
- Part 1: A Realistic Framing Of The Progress In Artificial Intelligence - Investing.com UK - June 19th, 2022 [June 19th, 2022]
- Who Will Win The Candidates: The Case For Each Player - Chess.com - June 13th, 2022 [June 13th, 2022]
- A tale of two universities and two engines - Chess News - March 22nd, 2022 [March 22nd, 2022]
- AlphaZero (And Other!) Chess Variants Now Available For Everyone - Chess.com - March 20th, 2022 [March 20th, 2022]
- How AI is impacting the video game industry - ZME Science - December 17th, 2021 [December 17th, 2021]
- Q&A: How Speechmatics is leading the way in tackling AI bias and improving inclusion - Information Age - November 4th, 2021 [November 4th, 2021]
- AlphaGo | DeepMind - October 22nd, 2021 [October 22nd, 2021]
- Leela Zero - Wikipedia - October 22nd, 2021 [October 22nd, 2021]
- Leela Chess Zero - Wikipedia - October 22nd, 2021 [October 22nd, 2021]
- How AI is reinventing what computers are - MIT Technology Review - October 22nd, 2021 [October 22nd, 2021]
- graphneural.network - Spektral - October 12th, 2021 [October 12th, 2021]
- MuZero - Wikipedia - October 12th, 2021 [October 12th, 2021]
- Bin Yu - October 12th, 2021 [October 12th, 2021]
- A general reinforcement learning algorithm that masters ... - August 29th, 2021 [August 29th, 2021]
- What would it be like to be a conscious AI? We might never know. - MIT Technology Review - August 29th, 2021 [August 29th, 2021]
- AlphaZero to analyse no-castling match of the champions - Chessbase News - July 13th, 2021 [July 13th, 2021]
- How This Startup Aims to Disrupt Copywriting Forever - Inc. - June 6th, 2021 [June 6th, 2021]
- Between Games and Apocalyptic Robots: Considering Near-Term Societal Risks of Reinforcement - Medium - April 17th, 2021 [April 17th, 2021]
- Trapping the queen - Chessbase News - April 17th, 2021 [April 17th, 2021]
- AI 101: All the Ways AI Could Improve or End Our World - Interesting Engineering - April 2nd, 2021 [April 2nd, 2021]
- Quick Scripts AlphaZero - February 17th, 2021 [February 17th, 2021]
- How to Kickstart an AI Venture Without Proprietary Data - Medium - February 17th, 2021 [February 17th, 2021]
- Street Fighter V: What to Expect After the Winter Update | CBR - CBR - Comic Book Resources - February 17th, 2021 [February 17th, 2021]
- This AI chess engine aims to help human players rather than defeat them - The Next Web - February 1st, 2021 [February 1st, 2021]
- Open source at Facebook: 700 repositories and 1.3 million followers - ZDNet - February 1st, 2021 [February 1st, 2021]
- Scientists say dropping acid can help with social anxiety and alcoholism - The Next Web - February 1st, 2021 [February 1st, 2021]
- AlphaZero - Chess Engines - Chess.com - November 21st, 2020 [November 21st, 2020]
- AlphaZero: Shedding new light on chess, shogi, and Go ... - November 21st, 2020 [November 21st, 2020]
- The art of chess: a brief history of the World Championship - TheArticle - November 21st, 2020 [November 21st, 2020]
- Podcast: Can you teach a machine to think? - MIT Technology Review - November 15th, 2020 [November 15th, 2020]
- Retired Chess Grandmaster, AlphaZero AI Reinvent Chess - Science Times - September 17th, 2020 [September 17th, 2020]
- DeepMind's AI is helping to re-write the rules of chess - ZDNet - September 17th, 2020 [September 17th, 2020]
- AI messed up mentally stimulating games. Right now it is actually creating the video game wonderful once again - Publicist Recorder - September 17th, 2020 [September 17th, 2020]
- A|I: The AI Times Surveillance mandated - BetaKit - September 17th, 2020 [September 17th, 2020]
- Starting on Friday: Chess 9LX with Carlsen and Kasparov - Chessbase News - September 17th, 2020 [September 17th, 2020]
- AlphaZero Match Will Be Replicated In Computer Chess Champs - Chess.com - August 3rd, 2020 [August 3rd, 2020]
- Facebook's New Algorithm Can Play Poker And Beat Humans At It - Digital Information World - August 3rd, 2020 [August 3rd, 2020]
- Survival of the Fattest: Macheide and Superman - TheArticle - August 3rd, 2020 [August 3rd, 2020]