Archive for the ‘Alphazero’ Category

AlphaGo Zero – Wikipedia

Artificial intelligence that plays Go

AlphaGo Zero is a version of DeepMind's Go software AlphaGo. AlphaGo's team published an article in the journal Nature on 19 October 2017, introducing AlphaGo Zero, a version created without using data from human games, and stronger than any previous version.[1] By playing games against itself, AlphaGo Zero surpassed the strength of AlphaGo Lee in three days by winning 100 games to 0, reached the level of AlphaGo Master in 21 days, and exceeded all the old versions in 40 days.[2]

Training artificial intelligence (AI) without datasets derived from human experts has significant implications for the development of AI with superhuman skills because expert data is "often expensive, unreliable or simply unavailable."[3] Demis Hassabis, the co-founder and CEO of DeepMind, said that AlphaGo Zero was so powerful because it was "no longer constrained by the limits of human knowledge".[4] David Silver, one of the first authors of DeepMind's papers published in Nature on AlphaGo, said that it is possible to have generalised AI algorithms by removing the need to learn from humans.[5]

Google later developed AlphaZero, a generalized version of AlphaGo Zero that could play chess and shogi in addition to Go. In December 2017, AlphaZero beat the 3-day version of AlphaGo Zero by winning 60 games to 40, and with 8 hours of training it outperformed AlphaGo Lee on an Elo scale. AlphaZero also defeated a top chess program (Stockfish) and a top shogi program (Elmo).[6][7]

AlphaGo Zero's neural network was trained using TensorFlow, with 64 GPU workers and 19 CPU parameter servers. Only four TPUs were used for inference. The neural network initially knew nothing about Go beyond the rules. Unlike earlier versions of AlphaGo, Zero only perceived the board's stones, rather than having some rare human-programmed edge cases to help recognize unusual Go board positions. The AI engaged in reinforcement learning, playing against itself until it could anticipate its own moves and how those moves would affect the game's outcome.[8] In the first three days AlphaGo Zero played 4.9 million games against itself in quick succession.[9] It appeared to develop the skills required to beat top humans within just a few days, whereas the earlier AlphaGo took months of training to achieve the same level.[10]

For comparison, the researchers also trained a version of AlphaGo Zero using human games, AlphaGo Master, and found that it learned more quickly, but actually performed more poorly in the long run.[11] DeepMind submitted its initial findings in a paper to Nature in April 2017, which was then published in October 2017.[1]

The hardware cost for a single AlphaGo Zero system in 2017, including the four TPUs, has been quoted as around $25 million.[12]

According to Hassabis, AlphaGo's algorithms are likely to be of the most benefit to domains that require an intelligent search through an enormous space of possibilities, such as protein folding or accurately simulating chemical reactions.[13] AlphaGo's techniques are probably less useful in domains that are difficult to simulate, such as learning how to drive a car.[14] DeepMind stated in October 2017 that it had already started active work on attempting to use AlphaGo Zero technology for protein folding, and stated it would soon publish new findings.[15][16]

AlphaGo Zero was widely regarded as a significant advance, even when compared with its groundbreaking predecessor, AlphaGo. Oren Etzioni of the Allen Institute for Artificial Intelligence called AlphaGo Zero "a very impressive technical result" in "both their ability to do it and their ability to train the system in 40 days, on four TPUs".[8] The Guardian called it a "major breakthrough for artificial intelligence", citing Eleni Vasilaki of Sheffield University and Tom Mitchell of Carnegie Mellon University, who called it "an impressive feat" and "an outstanding engineering accomplishment" respectively.[14] Mark Pesce of the University of Sydney called AlphaGo Zero "a big technological advance" taking us into "undiscovered territory".[17]

Gary Marcus, a psychologist at New York University, has cautioned that for all we know, AlphaGo may contain "implicit knowledge that the programmers have about how to construct machines to play problems like Go" and will need to be tested in other domains before being sure that its base architecture is effective at much more than playing Go. In contrast, DeepMind is "confident that this approach is generalisable to a large number of domains".[9]

In response to the reports, South Korean Go professional Lee Sedol said, "The previous version of AlphaGo wasn't perfect, and I believe that's why AlphaGo Zero was made." On the potential for AlphaGo's development, Lee said he will have to wait and see, but also said it will affect young Go players.

Mok Jin-seok, who directs the South Korean national Go team, said the Go world has already been imitating the playing styles of previous versions of AlphaGo and creating new ideas from them, and he is hopeful that new ideas will come out from AlphaGo Zero. Mok also added that general trends in the Go world are now being influenced by AlphaGo's playing style. "At first, it was hard to understand and I almost felt like I was playing against an alien. However, having had a great amount of experience, I've become used to it," Mok said. "We are now past the point where we debate the gap between the capability of AlphaGo and humans. It's now between computers." Mok has reportedly already begun analyzing the playing style of AlphaGo Zero along with players from the national team. "Though having watched only a few matches, we received the impression that AlphaGo Zero plays more like a human than its predecessors," Mok said.[18]

Chinese Go professional Ke Jie commented on the remarkable accomplishments of the new program: "A pure self-learning AlphaGo is the strongest. Humans seem redundant in front of its self-improvement."[19]


On 5 December 2017, the DeepMind team released a preprint on arXiv introducing AlphaZero, a program that uses a generalized version of AlphaGo Zero's approach. Within 24 hours it achieved a superhuman level of play in chess, shogi, and Go, defeating the world-champion programs Stockfish and Elmo and the 3-day version of AlphaGo Zero in each case.[6]

AlphaZero (AZ) is a more generalized variant of the AlphaGo Zero (AGZ) algorithm, and is able to play shogi and chess as well as Go. Differences between AZ and AGZ include:[6]

AZ has hard-coded rules for setting search hyperparameters, where AGZ's were tuned per game.
AZ's neural network is updated continually, rather than in discrete iterations gated by evaluation matches.
Go (unlike chess) is symmetric under reflection and rotation; AGZ exploited these symmetries to augment its training data, while AZ does not.
Chess and shogi (unlike Go) can end in draws, so AZ estimates the expected outcome of a game rather than just the probability of winning.

An open source program, Leela Zero, based on the ideas from the AlphaGo papers, is available. It uses a GPU instead of the TPUs that recent versions of AlphaGo rely on.


How to build your own AlphaZero AI using Python and Keras

Connect4

The game that our algorithm will learn to play is Connect4 (or Four In A Row). Not quite as complex as Go, but there are still 4,531,985,219,092 game positions in total.

The game rules are straightforward. Players take it in turns to enter a piece of their colour in the top of any available column. The first player to get four of their colour in a row, either vertically, horizontally or diagonally, wins. If the entire grid is filled without a four-in-a-row being created, the game is drawn.
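The rules just described can be sketched in a few lines of self-contained Python. The helper names here (drop_piece, has_won) are illustrative, not the article's game.py API:

```python
# A minimal sketch of the Connect4 rules: a 6x7 grid, pieces fall to the
# lowest empty square in a column, and four in a row in any direction wins.

def drop_piece(board, col, player):
    """Drop `player`'s piece into `col`; board[0] is the top row."""
    for row in range(5, -1, -1):          # scan from the bottom up
        if board[row][col] is None:
            board[row][col] = player
            return row
    raise ValueError("column is full")

def has_won(board, player):
    """Check every vertical, horizontal and diagonal line of four."""
    directions = [(0, 1), (1, 0), (1, 1), (1, -1)]
    for r in range(6):
        for c in range(7):
            for dr, dc in directions:
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < 6 and 0 <= cc < 7 and board[rr][cc] == player
                       for rr, cc in cells):
                    return True
    return False

board = [[None] * 7 for _ in range(6)]
for _ in range(4):                        # four reds stacked in column 3
    drop_piece(board, 3, "red")
print(has_won(board, "red"))              # True: a vertical four-in-a-row
```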

Here's a summary of the key files that make up the codebase:

This file contains the game rules for Connect4.

Each square is allocated a number from 0 to 41, as follows:

The game.py file gives the logic behind moving from one game state to another, given a chosen action. For example, given the empty board and action 38, the takeAction method returns a new game state, with the starting player's piece at the bottom of the centre column.
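The state-transition idea can be sketched as follows. The method name takeAction mirrors the article, but the internals here are a guessed simplification, not the repository's actual code:

```python
# The board is a flat list of 42 squares, numbered 0 (top-left) to 41
# (bottom-right); an action is the square a new piece lands on. Each
# state is an immutable snapshot, so takeAction returns a NEW state.

class GameState:
    def __init__(self, board=None, player_turn=1):
        self.board = board if board is not None else [0] * 42
        self.playerTurn = player_turn      # 1 or -1

    def takeAction(self, action):
        """Return the new state after the current player fills `action`."""
        new_board = self.board[:]          # copy: the old state is untouched
        new_board[action] = self.playerTurn
        return GameState(new_board, -self.playerTurn)

state = GameState()
next_state = state.takeAction(38)          # bottom of the centre column
print(next_state.board[38], next_state.playerTurn)  # 1 -1
```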

You can replace the game.py file with any game file that conforms to the same API, and the algorithm will, in principle, learn strategy through self-play, based on the rules you have given it.

This contains the code that starts the learning process. It loads the game rules and then iterates through the main loop of the algorithm, which consists of three stages:

There are two agents involved in this loop, the best_player and the current_player.

The best_player contains the best performing neural network and is used to generate the self-play memories. The current_player then retrains its neural network on these memories and is pitted against the best_player. If it wins, the neural network inside the best_player is switched for the neural network inside the current_player, and the loop starts again.
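The best_player / current_player loop described above can be written as a short skeleton. Here self_play, retrain and evaluate are hypothetical stand-ins for the codebase's functions, and the 55% promotion threshold is an assumption:

```python
# Skeleton of the three-stage main loop: self-play with the champion,
# retrain the challenger, then pit challenger against champion.

def training_loop(best_player, current_player, iterations,
                  self_play, retrain, evaluate, win_threshold=0.55):
    memories = []
    for _ in range(iterations):
        # 1. the best network generates self-play memories
        memories += self_play(best_player)
        # 2. the challenger retrains on those memories
        retrain(current_player, memories)
        # 3. the challenger plays the champion; promote it if it wins
        score = evaluate(current_player, best_player)
        if score > win_threshold:
            best_player.nn = current_player.nn
    return best_player
```

The promotion step is what keeps the self-play data coming from the strongest network found so far.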

This contains the Agent class (a player in the game). Each player is initialised with its own neural network and Monte Carlo Search Tree.

The simulate method runs the Monte Carlo Tree Search process. Specifically, the agent moves to a leaf node of the tree, evaluates the node with its neural network and then backfills the value of the node up through the tree.

The act method repeats the simulation multiple times to understand which move from the current position is most favourable. It then returns the chosen action to the game, to enact the move.

The replay method retrains the neural network, using memories from previous games.
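Tying the three methods together, the act method is essentially "simulate many times, then pick the most-visited root move". This is a hedged sketch with illustrative helpers, not the Agent class's exact internals:

```python
# act: repeat the MCTS simulation, then choose the action whose root
# edge accumulated the most visits.

def act(root_edges, simulate, n_simulations=50):
    """root_edges maps action -> visit count; simulate bumps one edge."""
    for _ in range(n_simulations):
        simulate(root_edges)               # moveToLeaf + evaluate + backFill
    return max(root_edges, key=root_edges.get)

def fake_simulate(edges):
    # stand-in for moveToLeaf + neural-net evaluation + backFill:
    # in this toy example the search always reinforces column 3
    edges[3] += 1

edges = {a: 0 for a in range(7)}           # seven Connect4 columns
act_choice = act(edges, fake_simulate)
print(act_choice)                          # 3: the most-visited column
```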

This file contains the Residual_CNN class, which defines how to build an instance of the neural network.

It uses a condensed version of the neural network architecture in the AlphaGo Zero paper, i.e. a convolutional layer, followed by many residual layers, then splitting into a value and policy head.

The depth and number of convolutional filters can be specified in the config file.

The Keras library is used to build the network, with a backend of Tensorflow.
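A condensed Keras sketch of that architecture is below. The layer sizes, number of residual blocks and head widths here are illustrative assumptions, not the repository's config values:

```python
# One convolutional layer, a small residual tower, then separate value
# (tanh scalar) and policy (one logit per action) heads.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_residual_cnn(input_shape=(6, 7, 2), n_residual=2,
                       filters=32, n_actions=42):
    inp = keras.Input(shape=input_shape)

    def conv_bn_relu(t, f):
        t = layers.Conv2D(f, 3, padding="same", use_bias=False)(t)
        t = layers.BatchNormalization()(t)
        return layers.ReLU()(t)

    x = conv_bn_relu(inp, filters)
    for _ in range(n_residual):            # residual tower
        skip = x
        x = conv_bn_relu(x, filters)
        x = layers.Conv2D(filters, 3, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(layers.Add()([skip, x]))

    v = layers.Flatten()(layers.Conv2D(1, 1)(x))       # value head
    value = layers.Dense(1, activation="tanh", name="value_head")(
        layers.Dense(20, activation="relu")(v))
    p = layers.Flatten()(layers.Conv2D(2, 1)(x))       # policy head
    policy = layers.Dense(n_actions, name="policy_head")(p)
    return keras.Model(inp, [value, policy])

model = build_residual_cnn()
value, policy = model(np.zeros((1, 6, 7, 2), dtype="float32"))
print(tuple(value.shape), tuple(policy.shape))         # (1, 1) (1, 42)
```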

To view individual convolutional filters and densely connected layers in the neural network, run the following inside the run.ipynb notebook:

This contains the Node, Edge and MCTS classes, that constitute a Monte Carlo Search Tree.

The MCTS class contains the moveToLeaf and backFill methods previously mentioned, and instances of the Edge class store the statistics about each potential move.
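The statistics an Edge stores, and the standard PUCT-style rule AlphaGo Zero uses to select edges on the way to a leaf, can be sketched as below. The field names and the constant c_puct are assumptions, not the repository's exact code:

```python
# Each edge tracks visit count N, total backed-up value W (so Q = W/N),
# and the prior P from the policy head. moveToLeaf repeatedly picks the
# edge maximising Q + U, where U favours high-prior, rarely tried moves.
import math

class Edge:
    def __init__(self, prior):
        self.N = 0        # visit count
        self.W = 0.0      # total value backed up through this edge
        self.P = prior    # prior probability from the policy head

    @property
    def Q(self):          # mean value of this edge
        return self.W / self.N if self.N else 0.0

def select(edges, c_puct=1.0):
    """Pick the action whose edge maximises Q + U."""
    total_n = sum(e.N for e in edges.values())
    def puct(e):
        return e.Q + c_puct * e.P * math.sqrt(total_n) / (1 + e.N)
    return max(edges, key=lambda a: puct(edges[a]))

edges = {0: Edge(0.5), 1: Edge(0.5)}
edges[0].N, edges[0].W = 3, -1.0          # visited, slightly bad for us
print(select(edges))                      # 1: the unexplored alternative
```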

This is where you set the key parameters that influence the algorithm.

Adjusting these variables will affect the running time, neural network accuracy and overall success of the algorithm. The above parameters produce a high quality Connect4 player, but take a long time to do so. To speed the algorithm up, try the following parameters instead.

Contains the playMatches and playMatchesBetweenVersions functions that play matches between two agents.

To play against your creation, run the following code (it's also in the run.ipynb notebook):

When you run the algorithm, all model and memory files are saved in the run folder, in the root directory.

To restart the algorithm from this checkpoint later, transfer the run folder to the run_archive folder, attaching a run number to the folder name. Then, enter the run number, model version number and memory version number into the initialise.py file, corresponding to the location of the relevant files in the run_archive folder. Running the algorithm as usual will then start from this checkpoint.

An instance of the Memory class stores the memories of previous games, which the algorithm uses to retrain the neural network of the current_player.

This file contains a custom loss function, which masks predictions for illegal moves before passing them to the cross-entropy loss function.
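The masking idea can be shown in a few lines of NumPy: push the logits of illegal moves to a large negative number so softmax assigns them essentially zero probability, then take the usual cross-entropy against the MCTS visit distribution. This is a minimal sketch of the technique, not the file's actual TensorFlow implementation:

```python
import numpy as np

def masked_cross_entropy(logits, target_pi, legal_mask):
    """Cross-entropy over legal moves only.

    logits:     raw policy-head outputs, one per action
    target_pi:  MCTS visit distribution (zero on illegal moves)
    legal_mask: boolean array, True where the move is legal
    """
    masked = np.where(legal_mask, logits, -1e9)   # kill illegal logits
    z = masked - masked.max()                     # numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    eps = 1e-12                                   # avoid log(0)
    return -np.sum(target_pi * np.log(probs + eps))
```

Without the mask, the network would be penalised for probability mass it places on moves it is never allowed to play.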

The locations of the run and run_archive folders.

Log files are saved to the log folder inside the run folder.

To turn on logging, set the values of the logger_disabled variables to False inside this file.

Viewing the log files will help you to understand how the algorithm works and see inside its mind. For example, here is a sample from the logger.mcts file.

Equally from the logger.tourney file, you can see the probabilities attached to each move, during the evaluation phase:

Training over a couple of days produces the following chart of loss against mini-batch iteration number:

The top line is the error in the policy head (the cross entropy of the MCTS move probabilities, against the output from the neural network). The bottom line is the error in the value head (the mean squared error between the actual game value and the neural network's prediction of the value). The middle line is an average of the two.

Clearly, the neural network is getting better at predicting the value of each game state and the likely next moves. To show how this results in stronger and stronger play, I ran a league between 17 players, ranging from the 1st iteration of the neural network, up to the 49th. Each pairing played twice, with both players having a chance to play first.

Here are the final standings:

Clearly, the later versions of the neural network are superior to the earlier versions, winning most of their games. It also appears that the learning hasn't yet saturated; with further training time, the players would continue to get stronger, learning more and more intricate strategies.

As an example, one clear strategy that the neural network has favoured over time is grabbing the centre column early. Observe the difference between the first version of the algorithm and, say, the 30th version:

1st neural network version

30th neural network version

This is a good strategy, as many winning lines require the centre column, and claiming it early ensures your opponent cannot take advantage of it. This has been learnt by the neural network without any human input.

There is a game.py file for a game called Metasquares in the games folder. This involves placing X and O markers in a grid to try to form squares of different sizes. Larger squares score more points than smaller squares and the player with the most points when the grid is full wins.

If you switch the Connect4 game.py file for the Metasquares game.py file, the same algorithm will learn how to play Metasquares instead.

Hopefully you find this article useful. Let me know in the comments below if you find any typos or have questions about anything in the codebase or article, and I'll get back to you as soon as possible.


AlphaZero Crushes Stockfish In New 1,000-Game Match …

In news reminiscent of the initial AlphaZero shockwave last December, the artificial intelligence company DeepMind released astounding results from an updated version of the machine-learning chess project today.

The results leave no question, once again, that AlphaZero plays some of the strongest chess in the world.

The updated AlphaZero crushed Stockfish 8 in a new 1,000-game match, scoring +155 -6 =839. (See below for three sample games from this match with analysis by Stockfish 10 and video analysis by GM Robert Hess.)

AlphaZero also bested Stockfish in a series of time-odds matches, soundly beating the traditional engine even at time odds of 10 to one.

In additional matches, the new AlphaZero beat the "latest development version" of Stockfish, with virtually identical results as the match vs Stockfish 8, according to DeepMind. The pre-release copy of the journal article, which is dated Dec. 7, 2018, does not specify the exact development version used.

[Update: Today's release of the full journal article specifies that the match was against the latest development version of Stockfish as of Jan. 13, 2018, which was Stockfish 9.]

The machine-learning engine also won all matches against "a variant of Stockfish that uses a strong opening book," according to DeepMind. Adding the opening book did seem to help Stockfish, which finally won a substantial number of games when AlphaZero was Black, but not enough to win the match.

AlphaZero's results (wins green, losses red) vs the latest Stockfish and vs Stockfish with a strong opening book. Image by DeepMind via Science.

The results will be published in an upcoming article by DeepMind researchers in the journal Science, and were provided to selected chess media by DeepMind, which is based in London and owned by Alphabet, the parent company of Google.

The 1,000-game match was played in early 2018. In the match, both AlphaZero and Stockfish were given three hours each game plus a 15-second increment per move. This time control would seem to make obsolete one of the biggest arguments against the impact of last year's match, namely that the 2017 time control of one minute per move played to Stockfish's disadvantage.

With three hours plus the 15-second increment, no such argument can be made, as that is an enormous amount of playing time for any computer engine. In the time odds games, AlphaZero was dominant up to 10-to-1 odds. Stockfish only began to outscore AlphaZero when the odds reached 30-to-1.

AlphaZero's results (wins green, losses red) vs Stockfish 8 in time odds matches. Image by DeepMind via Science.

AlphaZero's results in the time odds matches suggest it is not only much stronger than any traditional chess engine, but that it also uses a much more efficient search for moves. According to DeepMind, AlphaZero uses a Monte Carlo tree search, and examines about 60,000 positions per second, compared to 60 million for Stockfish.

An illustration of how AlphaZero searches for chess moves. Image by DeepMind via Science.

What can computer chess fans conclude after reading these results? AlphaZero has solidified its status as one of the elite chess players in the world. But the results are even more intriguing if you're following the ability of artificial intelligence to master general gameplay.

According to the journal article, the updated AlphaZero algorithm is identical in three challenging games: chess, shogi, and go. This version of AlphaZero was able to beat the top computer players of all three games after just a few hours of self-training, starting from just the basic rules of the games.

The updated AlphaZero results come exactly one year to the day since DeepMind unveiled the first, historic AlphaZero results in a surprise match vs Stockfish that changed chess forever.

Since then, an open-source project called Lc0 has attempted to replicate the success of AlphaZero, and the project has fascinated chess fans. Lc0 now competes along with the champion Stockfish and the rest of the world's top engines in the ongoing Chess.com Computer Chess Championship.

CCC fans will be pleased to see that some of the new AlphaZero games include "fawn pawns," the CCC-chat nickname for lone advanced pawns that cramp an opponent's position. Perhaps the establishment of these pawns is a critical winning strategy, as it seems AlphaZero and Lc0 have independently learned it.

DeepMind released 20 sample games chosen by GM Matthew Sadler from the 1,000 game match. Chess.com has selected three of these games with deep analysis by Stockfish 10 and video analysis by GM Robert Hess. You can download the 20 sample games at the bottom of this article, analyzed by Stockfish 10, and four sample games analyzed by Lc0.

Update: After this article was published, DeepMind released 210 sample games that you can download here.

Selected game 1 with analysis by Stockfish 10:

Game 1 video analysis by GM Robert Hess:

Selected game 2 with analysis by Stockfish 10:

Game 2 video analysis by GM Robert Hess:

Selected game 3 with analysis by Stockfish 10:

Game 3 video analysis by GM Robert Hess:

IM Anna Rudolf also made a video analysis of one of the sample games, calling it "AlphaZero's brilliancy."

The new version of AlphaZero trained itself to play chess starting just from the rules of the game, using machine-learning techniques to continually update its neural networks. According to DeepMind, 5,000 TPUs (Google's tensor processing unit, an application-specific integrated circuit for artificial intelligence) were used to generate the first set of self-play games, and then 16 TPUs were used to train the neural networks.

The total training time in chess was nine hours from scratch. According to DeepMind, it took the new AlphaZero just four hours of training to surpass Stockfish; by nine hours it was far ahead of the world-champion engine.

For the games themselves, Stockfish used 44 CPU (central processing unit) cores and AlphaZero used a single machine with four TPUs and 44 CPU cores. Stockfish had a hash size of 32GB and used syzygy endgame tablebases.

AlphaZero's results vs. Stockfish in the most popular human openings. In the left bar, AlphaZero plays White; in the right bar, AlphaZero is Black. Image by DeepMind via Science. Click on the image for a larger version.

The sample games released were deemed impressive by chess professionals who were given preview access to them. GM Robert Hess categorized the games as "immensely complicated."

DeepMind itself noted the unique style of its creation in the journal article:

"In several games, AlphaZero sacrificed pieces for long-term strategic advantage, suggesting that it has a more fluid, context-dependent positional evaluation than the rule-based evaluations used by previous chess programs," the DeepMind researchers said.

The AI company also emphasized the importance of using the same AlphaZero version in three different games, touting it as a breakthrough in overall game-playing intelligence:

"These results bring us a step closer to fulfilling a longstanding ambition of artificial intelligence: a general game-playing system that can learn to master any game," the DeepMind researchers said.

You can download the 20 sample games provided by DeepMind and analyzed by Chess.com using Stockfish 10 on a powerful computer. The first set of games contains 10 games with no opening book, and the second set contains games with openings from the 2016 TCEC (Top Chess Engine Championship).

PGN downloads:

20 games with analysis by Stockfish 10:

4 selected games with analysis by Lc0:

Love AlphaZero? You can watch the machine-learning chess project it inspired, Lc0, in the ongoing Computer Chess Championship now.


The future is here AlphaZero learns chess | ChessBase

About three years ago, DeepMind, a company owned by Google that specializes in AI development, turned its attention to the ancient game of Go. Go had been the one game that had eluded all computer efforts to become world class, and even up until the announcement was deemed a goal that would not be attained for another decade! This was how large the difference was. When a public challenge and match was organized against the legendary player Lee Sedol, a South Korean whose track record had him in the ranks of the greatest ever, everyone thought it would be an interesting spectacle, but a certain win by the human. The question wasn't even whether the program AlphaGo would win or lose, but how much closer it was to the Holy Grail goal. The result was a crushing 4-1 victory, and a revolution in the Go world. In spite of a ton of second-guessing by the elite, who could not accept the loss, eventually they came to terms with the reality of AlphaGo, a machine that was among the very best, albeit not unbeatable. It had lost a game after all.

The saga did not end there. A year later a new updated version of AlphaGo was pitted against the world number one of Go, Ke Jie, a young Chinese player whose genius is not without parallels to Magnus Carlsen in chess. At the age of just 16 he won his first world title and by the age of 17 was the clear world number one. That had been in 2015, and now at age 19, he was even stronger. The new match was held in China itself, and even Ke Jie knew he was most likely a serious underdog. There were no illusions anymore. He played superbly but still lost by a perfect 3-0, a testimony to the amazing capabilities of the new AI.

Many chess players and pundits had wondered how it would do in the noble game of chess. There were serious doubts on just how successful it might be. Go is a huge and long game with a 19x19 grid, in which all pieces are the same, and none of them moves. Calculating ahead as in chess is an exercise in futility, so pattern recognition is king. Chess is very different. There is no questioning the value of knowledge and pattern recognition in chess, but the royal game is supremely tactical, and a lot of knowledge can be compensated for by simply outcalculating the opponent. This has been true not only of computer chess, but of humans as well.

However, there were some very startling results in the last few months that need to be understood. DeepMind's interest in Go did not end with that match against the number one. You might ask yourself what more there was to do after that? Beat him 20-0 and not just 3-0? No, of course not. However, the super Go program became an internal litmus test of sorts. Its standard was unquestioned and quantified, so if one wanted to test a new self-learning AI, and how good it was, then throwing it at Go and seeing how it compared to the AlphaGo program would be a way to measure it.

A new AI was created called AlphaZero. It had several strikingly different changes. The first was that it was not shown tens of thousands of master games in Go to learn from; instead it was shown none. Not a single one. It was merely shown the rules, without any other information. The result was a shock. Within just three days its completely self-taught Go program was stronger than the version that had beaten Lee Sedol, a result the previous AI had needed over a year to achieve. Within three weeks it was beating the strongest AlphaGo, the one that had defeated Ke Jie. What is more: while the Lee Sedol version had used 48 highly specialized processors to create the program, this new version used only four!

Graph showing the relative evolution of AlphaZero | Source: DeepMind

Approaching chess might still seem unusual. After all, although DeepMind had already shown near revolutionary breakthroughs thanks to Go, that had been a game that had yet to be solved. Chess already had its Deep Blue 20 years ago, and today even a good smartphone can beat the world number one. What is there to prove exactly?

Garry Kasparov is seen chatting with Demis Hassabis, founder of DeepMind | Photo: Lennart Ootes

It needs to be remembered that Demis Hassabis, the founder of DeepMind, has a profound chess connection of his own. He had been a chess prodigy in his own right, and at age 13 was the second highest rated player under 14 in the world, second only to Judit Polgar. He eventually left the chess track to pursue other things, like founding his own PC video game company at age 17, but the link is there. There was still a burning question on everyone's mind: just how well would AlphaZero do if it was focused on chess? Would it just be very smart, but smashed by the number-crunching engines of today, where a single ply is often the difference between winning or losing? Or would something special come of it?

Professor David Silver explains how AlphaZero was able to progress much quicker when it had to learn everything on its own, as opposed to analyzing large amounts of data. The efficiency of a principled algorithm was the most important factor.

On December 5 the DeepMind group published a new paper at the site of Cornell University called "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm", and the results were nothing short of staggering. AlphaZero had done more than just master the game, it had attained new heights in ways considered inconceivable. The proof of the pudding is in the eating, of course, so before going into some of the fascinating nitty-gritty details, let's cut to the chase. It played a match against the latest and greatest version of Stockfish, and won by an incredible score of 64:36, and not only that, AlphaZero had zero losses (28 wins and 72 draws)!

Stockfish needs no introduction to ChessBase readers, but it's worth noting that the program was on a computer that was running nearly 900 times faster! Indeed, AlphaZero was calculating roughly 80 thousand positions per second, while Stockfish, running on a PC with 64 threads (likely a 32-core machine), was running at 70 million positions per second. To better understand how big a deficit that is, if another version of Stockfish were to run 900 times slower, this would be equivalent to roughly 8 moves less deep. How is this possible?
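The "roughly 8 moves less deep" figure can be sanity-checked with a little arithmetic, assuming an effective branching factor of about 2.3 for a heavily pruned alpha-beta search (that branching factor is my assumption for illustration, not a number from DeepMind):

```python
import math

def depth_deficit(speed_ratio, branching=2.3):
    # Searching `speed_ratio` times fewer nodes costs about
    # log_b(speed_ratio) plies of depth, where b is the effective
    # branching factor after pruning (assumed ~2.3 here).
    return math.log(speed_ratio) / math.log(branching)

print(round(depth_deficit(900)))   # 8
```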

The paper "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm" at Cornell University

The paper explains:

AlphaZero compensates for the lower number of evaluations by using its deep neural network to focus much more selectively on the most promising variations, arguably a more human-like approach to search, as originally proposed by Shannon. Figure 2 shows the scalability of each player with respect to thinking time, measured on an Elo scale, relative to Stockfish or Elmo with 40ms thinking time. AlphaZero's MCTS scaled more effectively with thinking time than either Stockfish or Elmo, calling into question the widely held belief that alpha-beta search is inherently superior in these domains.

This diagram shows that the longer AlphaZero had to think, the more it improved compared to Stockfish

In other words, instead of a hybrid brute-force approach, which has been the core of chess engines today, it went in a completely different direction, opting for an extremely selective search that emulates how humans think. A top player may be able to outcalculate a weaker player in both consistency and depth, but it still remains a joke compared to what even the weakest computer programs are doing. It is humans' sheer knowledge and ability to filter out so many moves that allows them to reach the standard they do. Remember that although Garry Kasparov lost to Deep Blue, it is not clear at all that it was genuinely stronger than him even then, and this was despite reaching speeds of 200 million positions per second. If AlphaZero is really able to use its understanding to not only compensate for 900 times fewer moves, but surpass them, then we are looking at a major paradigm shift.

Since AlphaZero did not benefit from any chess knowledge, which means no games or opening theory, it also means it had to discover opening theory on its own. And do recall that this is the result of only 24 hours of self-learning. The team produced fascinating graphs showing the openings it discovered as well as the ones it gradually rejected as it grew stronger!

Professor David Silver, lead scientist behind AlphaZero, explains how AlphaZero learned openings in Go, and gradually began to discard some in favor of others as it improved. The same is seen in chess.

In the diagram above, we can see that in the early games, AlphaZero was quite enthusiastic about playing the French Defense, but after two hours (this is so humiliating) began to play it less and less.

The Caro-Kann fared a good deal better, and held a prime spot in AlphaZero's opening choices until it also gradually filtered it out. So what openings did AlphaZero actually like or choose by the end of its learning process? The English Opening and the Queen's Gambit!

The paper also came accompanied by ten games to share the results. It needs to be said that these are very different from the usual fare of engine games. If Karpov had been a chess engine, he might have been called AlphaZero. There is a relentless positional boa constrictor approach that is simply unheard of. Modern chess engines are focused on activity, and have special safeguards to avoid blocked positions, as they have no understanding of them and often find themselves in a dead end before they realize it. AlphaZero has no such prejudices or issues, and seems to thrive on snuffing out the opponent's play. It is singularly impressive, and what is astonishing is how it is able to also find tactics that the engines seem blind to.

This position, from Game 5 of the ten published, arose after move 20...Kh8. The completely disjointed array of Black's pieces is striking, and AlphaZero came up with the fantastic 21.Bg5!! After analyzing it and the consequences, there is no question this is the killer move here, and while my laptop cannot produce 70 million positions per second, I gave it to Houdini 6.02 with 9 million positions per second. It analyzed for one full hour and was unable to find 21.Bg5!!

A screenshot of Houdini 6.02 after an hour of analysis

Here is another little gem of a shot, in which AlphaZero had completely stymied Stockfish positionally, and now wraps it up with some nice tactics. Look at this incredible sequence in game nine:

Here AlphaZero played the breathtaking 30. Bxg6!! The point is obviously that 30...fxg6 runs into 31. Qxe6+, but how do you continue after the game's 30...Bxg5 31. Qxg5 fxg6?

Here AlphaZero continued with 32. f5!! and after 32...Rg8 33. Qh6 Qf7 34. f6 obtained a deadly bind, which it converted into a win 20 moves later. Time to get a thesaurus for all the synonyms of 'amazing'.

So where does this leave chess, and what does it mean in general? This is a game-changer, a term so often used and abused, but there is no other way of describing it. Deep Blue was a breakthrough moment, but its result was thanks to highly specialized hardware whose sole purpose was to play chess. If one had tried to make it play Go, for example, it would never have worked. A completely open-ended AI able to learn from the least amount of information, and to take this to levels hitherto never imagined, is not a threat to beat us at any number of activities; it is a promise to analyze problems such as disease and famine in ways that might conceivably lead to genuine solutions.

For chess, this will likely lead to genuinely breakthrough engines following in these footsteps. That is what happened in Go. For years and years, Go programs had been more or less stuck where they were, unable to make any meaningful advances, and then along came AlphaGo. It wasn't because AlphaGo offered some inspiration to 'try harder'; it was because, just as here, a paper was published detailing all the techniques and algorithms developed and used, so that others might follow in their footsteps. And they did. Literally within a couple of months, new versions of top programs such as Crazy Stone began offering updated engines with deep learning, which brought hundreds (plural) of Elo points in improvement. This is no exaggeration.

Within a couple of months, the revolutionary techniques used to create AlphaGo began to appear in top PC programs of Go

The paper on chess offers similar information, allowing anyone to do what they did. Obviously they won't have the benefit of the specialized TPUs, processors designed especially for this kind of deep-learning training, but neither are they required to. It bears remembering that this was also done without the benefit of many of the specialized programming techniques and tricks of chess programming. Who is to say they cannot be combined for even greater results? Even the DeepMind team thinks it bears investigating:

"It is likely that some of these techniques could further improve the performance of AlphaZero; however, we have focused on a pure self-play reinforcement learning approach and leave these extensions for future research."

Replay the ten games between AlphaZero and Stockfish 8 (70 million NPS)

Read more:
The future is here AlphaZero learns chess | ChessBase

Google’s AlphaZero Destroys Stockfish In 100-Game Match …

Chess changed forever today. And maybe the rest of the world did, too.

A little more than a year after AlphaGo sensationally won against the top Go player, the artificial-intelligence program AlphaZero has obliterated the highest-rated chess engine.

Stockfish, which for most top players is their go-to preparation tool, and which won the 2016 TCEC Championship and the 2017 Chess.com Computer Chess Championship, didn't stand a chance. AlphaZero won the closed-door, 100-game match with 28 wins, 72 draws, and zero losses.

Oh, and it took AlphaZero only four hours to "learn" chess. Sorry humans, you had a good run.

That's right -- the programmers of AlphaZero, housed within the DeepMind division of Google, had it use a type of "machine learning," specifically reinforcement learning. Put more plainly, AlphaZero was not "taught" the game in the traditional sense. That means no opening book, no endgame tables, and apparently no complicated algorithms dissecting minute differences between center pawns and side pawns.
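AlphaZero's real method pairs a deep neural network with Monte Carlo tree search at massive scale, but the core idea of self-play reinforcement learning can be illustrated on a toy. The sketch below is purely my illustration, not the paper's method: a tabular agent learns 5-stone Nim (players alternate removing 1 or 2 stones; whoever takes the last stone wins) entirely by playing against itself, with no knowledge beyond the rules.

```python
import random

# Toy sketch of self-play reinforcement learning (nothing like
# AlphaZero's scale): a tabular agent learns 5-stone Nim purely
# from games against itself.

random.seed(0)
Q = {}  # Q[(stones, action)] -> estimated value for the player to move

def best_action(stones, eps=0.0):
    """Greedy move choice, with optional epsilon-greedy exploration."""
    actions = [a for a in (1, 2) if a <= stones]
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q.get((stones, a), 0.0))

def self_play_episode(eps=0.2, alpha=0.1):
    stones, history = 5, []
    while stones > 0:
        a = best_action(stones, eps)
        history.append((stones, a))
        stones -= a
    # The side that took the last stone won. Walk back through the
    # game, crediting the winner's moves +1 and the loser's -1.
    reward = 1.0
    for s, a in reversed(history):
        old = Q.get((s, a), 0.0)
        Q[(s, a)] = old + alpha * (reward - old)
        reward = -reward

for _ in range(20000):
    self_play_episode()

# With no human input, the agent discovers the known winning
# strategy for this game: leave your opponent a multiple of 3.
print(best_action(5), best_action(4))
```

Like AlphaZero's self-play, the agent is its own teacher: early games are random flailing, and the winning policy emerges only from accumulated outcomes.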

Google headquarters in London from inside, with the DeepMind section on the eighth floor. | Photo: Maria Emelianova/Chess.com.

This would be akin to a robot being given access to thousands of metal bits and parts, but no knowledge of a combustion engine, then experimenting numerous times with every combination possible until it builds a Ferrari. That's all in less time than it takes to watch the "Lord of the Rings" trilogy. The program had four hours to play itself many, many times, thereby becoming its own teacher.

For now, the programming team is keeping quiet. They chose not to comment to Chess.com, pointing out that the paper "is currently under review," but you can read the full paper here. Part of the research group is Demis Hassabis, a candidate master from England and co-founder of DeepMind (bought by Google in 2014). Hassabis, who played in the ProBiz event of the London Chess Classic, is currently at the Neural Information Processing Systems conference in California, where he is a co-author of another paper on a different subject.

Demis Hassabis playing with Michael Adams at the ProBiz event at Google Headquarters London just a few days ago. | Photo: Maria Emelianova/Chess.com.

One person who did comment to Chess.com has quite a lot of first-hand experience playing chess computers. GM Garry Kasparov is not surprised that DeepMind branched out from Go to chess.

"It's a remarkable achievement, even if we should have expected it after AlphaGo," he told Chess.com. "It approaches the 'Type B,' human-like approach to machine chess dreamt of by Claude Shannon and Alan Turing instead of brute force."

One of the 10 selected games given in the paper.

Indeed, much like humans, AlphaZero searches far fewer positions than its predecessors. The paper claims that it looks at "only" 80,000 positions per second, compared to Stockfish's 70 million per second.
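A bit of back-of-the-envelope arithmetic (mine, not the paper's) shows how large that gap is over a single minute of thinking time:

```python
# Rough arithmetic from the figures quoted above (positions per second).
alphazero_nps = 80_000
stockfish_nps = 70_000_000

minute = 60  # seconds of thinking time
alphazero_positions = alphazero_nps * minute   # 4.8 million positions
stockfish_positions = stockfish_nps * minute   # 4.2 billion positions

ratio = stockfish_nps // alphazero_nps
print(ratio)  # Stockfish examines 875 times as many positions
```

In other words, AlphaZero won while evaluating fewer than one position for every 800 that Stockfish churned through, relying on judgment rather than raw search volume.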

GM Peter Heine Nielsen, the longtime second of World Champion GM Magnus Carlsen, is now on board with the FIDE president in one way: aliens. As he told Chess.com, "After reading the paper but especially seeing the games I thought, well, I always wondered how it would be if a superior species landed on earth and showed us how they play chess. I feel now I know."

Chess.com's interview with Nielsen on the AlphaZero news.

We also learned, unsurprisingly, that White is indeed the choice, even among the non-sentient. Of AlphaZero's 28 wins, 25 came from the white side (although +3=47-0 as Black against the 3400+ Stockfish isn't too bad either).
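For context, the match score can be converted into an approximate rating gap using the standard logistic Elo model. This calculation is mine, not from the paper, and a 100-game sample carries wide error bars:

```python
import math

# Convert the 100-game result (28 wins, 72 draws, 0 losses) into an
# approximate Elo gap via the standard expected-score formula
# s = 1 / (1 + 10**(-d/400)), solved for the difference d.
wins, draws, losses = 28, 72, 0
score = (wins + 0.5 * draws) / (wins + draws + losses)  # 0.64

elo_gap = -400 * math.log10(1 / score - 1)
print(round(elo_gap))  # roughly a 100-point performance edge
```

So a 64 percent score corresponds to outperforming the 3400+ Stockfish by about 100 Elo points over this particular match.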

The machine also ramped up the frequency of openings it preferred. Sorry, King's Indian practitioners, your baby is not the chosen one. The French also tailed off in the program's enthusiasm over time, while the Queen's Gambit and especially the English Opening were well represented.

Frequency of openings over time employed by AlphaZero in its "learning" phase. Image sourced from AlphaZero research paper.

What do you do if you are a thing that never tires and you just mastered a 1400-year-old game? You conquer another one. After the Stockfish match, AlphaZero then "trained" for only two hours and then beat the best Shogi-playing computer program "Elmo."

The ramifications for such an inventive way of learning are of course not limited to games.

"We have always assumed that chess required too much empirical knowledge for a machine to play so well from scratch, with no human knowledge added at all," Kasparov said. "Of course Ill be fascinated to see what we can learn about chess from AlphaZero, since that is the great promise of machine learning in generalmachines figuring out rules that humans cannot detect. But obviously the implications are wonderful far beyond chess and other games. The ability of a machine to replicate and surpass centuries of human knowledge in complex closed systems is a world-changing tool."

Garry Kasparov and Demis Hassabis together at the ProBiz event in London. | Photo: Maria Emelianova/Chess.com.

Chess.com interviewed eight of the 10 players participating in the London Chess Classic about their thoughts on the match. A video compilation of their thoughts will be posted on the site later.

The player with the most strident objections to the conditions of the match was GM Hikaru Nakamura. While a heated discussion is taking place online about the processing power of the two sides, Nakamura thought that was a secondary issue.

The American called the match "dishonest" and pointed out that Stockfish's methodology requires it to have an opening book for optimal performance. While he doesn't think the ultimate winner would have changed, Nakamura thought the winning margin would have been smaller.

"I am pretty sure God himself could not beat Stockfish 75 percent of the time with White without certain handicaps," he said about the 25 wins and 25 draws AlphaZero scored with the white pieces.

GM Larry Kaufman, lead chess consultant on the Komodo program, hopes to see the new program's performance on home machines without the benefits of Google's own computers. He also echoed Nakamura's objections to Stockfish's lack of its standard opening knowledge.

"It is of course rather incredible, he said. "Although after I heard about the achievements of AlphaGo Zero in Go I was rather expecting something like this, especially since the team has a chess master, Demis Hassabis. What isn't yet clear is whether AlphaZero could play chess on normal PCs and if so how strong it would be. It may well be that the current dominance of minimax chess engines may be at an end, but it's too soon to say so. It should be pointed out that AlphaZero had effectively built its own opening book, so a fairer run would be against a top engine using a good opening book."

Whatever the merits of the match conditions, Nielsen is eager to see what other disciplines will be refined or mastered by this type of learning.

"[This is] actual artificial intelligence," he said. "It goes from having something that's relevant to chess to something that's gonna win Nobel Prizes or even bigger than Nobel Prizes. I think it's basically cool for us that they also decided to do four hours on chess because we get a lot of knowledge. We feel it's a great day for chess but of course it goes so much further."

Follow this link:
Google's AlphaZero Destroys Stockfish In 100-Game Match ...