Facebook develops AI algorithm that learns to play poker on the fly – VentureBeat
Facebook researchers have developed a general AI framework called Recursive Belief-based Learning (ReBeL) that they say achieves better-than-human performance in heads-up, no-limit Texas holdem poker while using less domain knowledge than any prior poker AI. They assert that ReBeL is a step toward developing universal techniques for multi-agent interactions in other words, general algorithms that can be deployed in large-scale, multi-agent settings. Potential applications run the gamut from auctions, negotiations, and cybersecurity to self-driving cars and trucks.
Combining reinforcement learning with search at AI model training and test time has led to a number of advances. Reinforcement learning is where agents learn to achieve goals by maximizing rewards, while search is the process of navigating from a start to a goal state. For example, DeepMinds AlphaZero employed reinforcement learning and search to achieve state-of-the-art performance in the board games chess, shogi, and Go. But the combinatorial approach suffers a performance penalty when applied to imperfect-information games like poker (or even rock-paper-scissors), because it makes a number of assumptions that dont hold in these scenarios. The value of any given action depends on the probability that its chosen, and more generally, on the entire play strategy.
The Facebook researchers propose that ReBeL offers a fix. ReBeL builds on work in which the notion of game state is expanded to include the agents belief about what state they might be in, based on common knowledge and the policies of other agents. ReBeL trains two AI models a value network and a policy network for the states through self-play reinforcement learning. It uses both models for search during self-play. The result is a simple, flexible algorithm the researchers claim is capable of defeating top human players at large-scale, two-player imperfect-information games.
At a high level, ReBeL operates on public belief states rather than world states (i.e., the state of a game). Public belief states (PBSs) generalize the notion of state value to imperfect-information games like poker; a PBS is a common-knowledge probability distribution over a finite sequence of possible actions and states, also called a history. (Probability distributions are specialized functions that give the probabilities of occurrence of different possible outcomes.) In perfect-information games, PBSs can be distilled down to histories, which in two-player zero-sum games effectively distill to world states. A PBS in poker is the array of decisions a player could make and their outcomes given a particular hand, a pot, and chips.
Above: Poker chips.
Image Credit: Flickr: Sean Oliver
ReBeL generates a subgame at the start of each game thats identical to the original game, except its rooted at an initial PBS. The algorithm wins it by running iterations of an equilibrium-finding algorithm and using the trained value network to approximate values on every iteration. Through reinforcement learning, the values are discovered and added as training examples for the value network, and the policies in the subgame are optionally added as examples for the policy network. The process then repeats, with the PBS becoming the new subgame root until accuracy reaches a certain threshold.
In experiments, the researchers benchmarked ReBeL on games of heads-up no-limit Texas holdem poker, Liars Dice, and turn endgame holdem, which is a variant of no-limit holdem in which both players check or call for the first two of four betting rounds. The team used up to 128 PCs with eight graphics cards each to generate simulated game data, and they randomized the bet and stack sizes (from 5,000 to 25,000 chips) during training. ReBeL was trained on the full game and had $20,000 to bet against its opponent in endgame holdem.
The researchers report that against Dong Kim, whos ranked as one of the best heads-up poker players in the world, ReBeL played faster than two seconds per hand across 7,500 hands and never needed more than five seconds for a decision. In aggregate, they said it scored 165 (with a standard deviation of 69) thousandths of a big blind (forced bet) per game against humans it played compared with Facebooks previous poker-playing system, Libratus, which maxed out at 147 thousandths.
For fear of enabling cheating, the Facebook team decided against releasing the ReBeL codebase for poker. Instead, they open-sourced their implementation for Liars Dice, which they say is also easier to understand and can be more easily adjusted. We believe it makes the game more suitable as a domain for research, they wrote in the a preprint paper. While AI algorithms already exist that can achieve superhuman performance in poker, these algorithms generally assume that participants have a certain number of chips or use certain bet sizes. Retraining the algorithms to account for arbitrary chip stacks or unanticipated bet sizes requires more computation than is feasible in real time. However, ReBeL can compute a policy for arbitrary stack sizes and arbitrary bet sizes in seconds.
See the original post:
Facebook develops AI algorithm that learns to play poker on the fly - VentureBeat
- Demis Hassabis - when the chess prodigy won the Nobel Prize in Chemistry - Chess.com - October 14th, 2024 [October 14th, 2024]
- AI Could Learn a Thing or Two From Rat Brains - The Daily Beast - November 13th, 2023 [November 13th, 2023]
- Episode What sets great teams apart | Lane Shackleton (CPO of Coda) - Mirchi Plus - October 1st, 2023 [October 1st, 2023]
- The timeless charm of of 'Chaturanga' - Daily Pioneer - October 1st, 2023 [October 1st, 2023]
- Creating New Stories That Don't Suck - Hollywood in Toto - October 1st, 2023 [October 1st, 2023]
- AI Agents: Adapting to the Future of Software Development - ReadWrite - October 1st, 2023 [October 1st, 2023]
- The Race for AGI: Approaches of Big Tech Giants - Fagen wasanni - July 30th, 2023 [July 30th, 2023]
- Book Review: Re-engineering the Chess Classics by GM Matthew ... - Chess.com - June 4th, 2023 [June 4th, 2023]
- The Sparrow Effect: How DeepMind is Rewriting the AI Script - CityLife - June 4th, 2023 [June 4th, 2023]
- Vitalik Buterin Exclusive Interview: Longevity, AI and More - Lifespan.io News - June 4th, 2023 [June 4th, 2023]
- How to play chess against ChatGPT (and why you probably shouldn't) - Android Authority - May 29th, 2023 [May 29th, 2023]
- Weekend Movers - Conflux (CFX) and Klaytn (KLAY) - Securities.io - May 16th, 2023 [May 16th, 2023]
- How technology reinvented chess as a global social network - Financial Times - May 8th, 2023 [May 8th, 2023]
- Our moral panic over AI - The Spectator Australia - April 13th, 2023 [April 13th, 2023]
- Liability Considerations for Superhuman (and - Fenwick & West LLP - April 13th, 2023 [April 13th, 2023]
- Aston by-election minus one day The Poll Bludger - The Poll Bludger - April 2nd, 2023 [April 2nd, 2023]
- No-Castling Masters: Kramnik and Caruana will play in Dortmund - ChessBase - March 26th, 2023 [March 26th, 2023]
- AI is teamwork Bits&Chips - Bits&Chips - March 20th, 2023 [March 20th, 2023]
- Resolve Strategic nuclear subs poll (open thread) The Poll Bludger - The Poll Bludger - March 20th, 2023 [March 20th, 2023]
- How AlphaZero Learns Chess - Chess.com - February 24th, 2023 [February 24th, 2023]
- AI Topic: AlphaZero, ChatGPT, Bard, Stable Diffusion and more! - February 24th, 2023 [February 24th, 2023]
- AlphaZero Tackles Chess Variants - by Dennis Monokroussos - February 20th, 2023 [February 20th, 2023]
- AlphaZero Vs. Stockfish 8 | AI Is Conquering Computer Chess - February 10th, 2023 [February 10th, 2023]
- Stockfish (chess) - Wikipedia - November 22nd, 2022 [November 22nd, 2022]
- AlphaZero Chess Engine: The Ultimate Guide - October 14th, 2022 [October 14th, 2022]
- Whos going to save us from bad AI? - MIT Technology Review - October 14th, 2022 [October 14th, 2022]
- DeepMinds game-playing AI has beaten a 50-year-old record in computer science - MIT Technology Review - October 8th, 2022 [October 8th, 2022]
- The Download: TikTok moral panics, and DeepMinds record-breaking AI - MIT Technology Review - October 8th, 2022 [October 8th, 2022]
- Top 5 stories of the week: DeepMind and OpenAI advancements, Intels plan for GPUs, Microsofts zero-day flaws - VentureBeat - October 8th, 2022 [October 8th, 2022]
- Taxing times (open thread) The Poll Bludger - The Poll Bludger - October 8th, 2022 [October 8th, 2022]
- AlphaGo Zero Explained In One Diagram | by David Foster - Medium - October 1st, 2022 [October 1st, 2022]
- A chess scandal brings fresh attention to computers role in the game - The Record by Recorded Future - October 1st, 2022 [October 1st, 2022]
- Meta AI Boss: current AI methods will never lead to true intelligence - Gizchina.com - October 1st, 2022 [October 1st, 2022]
- Meta's AI guru LeCun: Most of today's AI approaches will never lead to true intelligence - ZDNet - September 24th, 2022 [September 24th, 2022]
- Stockfish - Chess Engines - Chess.com - September 9th, 2022 [September 9th, 2022]
- DeepMinds AlphaFold could be the future of science and AI - Vox.com - August 7th, 2022 [August 7th, 2022]
- Correspondence chess server, Go (weiqi) games online - FICGS - July 4th, 2022 [July 4th, 2022]
- Chennai Chess Olympiad and AI - Analytics India Magazine - June 28th, 2022 [June 28th, 2022]
- Yann LeCun has a bold new vision for the future of AI - MIT Technology Review - June 28th, 2022 [June 28th, 2022]
- Special Street Fighter 35th anniversary website launched, features impressive timeline of game release dates over the years - EventHubs - June 28th, 2022 [June 28th, 2022]
- The Nightmarish Frontier of AI in Chess - uschess.org - June 19th, 2022 [June 19th, 2022]
- Four Draws in Round Three of 2022 Candidates | US Chess.org - uschess.org - June 19th, 2022 [June 19th, 2022]
- Part 1: A Realistic Framing Of The Progress In Artificial Intelligence - Investing.com UK - June 19th, 2022 [June 19th, 2022]
- Who Will Win The Candidates: The Case For Each Player - Chess.com - June 13th, 2022 [June 13th, 2022]
- A tale of two universities and two engines - Chess News - March 22nd, 2022 [March 22nd, 2022]
- AlphaZero (And Other!) Chess Variants Now Available For Everyone - Chess.com - March 20th, 2022 [March 20th, 2022]
- How AI is impacting the video game industry - ZME Science - December 17th, 2021 [December 17th, 2021]
- Q&A: How Speechmatics is leading the way in tackling AI bias and improving inclusion - Information Age - November 4th, 2021 [November 4th, 2021]
- AlphaGo | DeepMind - October 22nd, 2021 [October 22nd, 2021]
- Leela Zero - Wikipedia - October 22nd, 2021 [October 22nd, 2021]
- Leela Chess Zero - Wikipedia - October 22nd, 2021 [October 22nd, 2021]
- How AI is reinventing what computers are - MIT Technology Review - October 22nd, 2021 [October 22nd, 2021]
- graphneural.network - Spektral - October 12th, 2021 [October 12th, 2021]
- MuZero - Wikipedia - October 12th, 2021 [October 12th, 2021]
- Bin Yu - October 12th, 2021 [October 12th, 2021]
- A general reinforcement learning algorithm that masters ... - August 29th, 2021 [August 29th, 2021]
- What would it be like to be a conscious AI? We might never know. - MIT Technology Review - August 29th, 2021 [August 29th, 2021]
- AlphaZero to analyse no-castling match of the champions - Chessbase News - July 13th, 2021 [July 13th, 2021]
- How This Startup Aims to Disrupt Copywriting Forever - Inc. - June 6th, 2021 [June 6th, 2021]
- Between Games and Apocalyptic Robots: Considering Near-Term Societal Risks of Reinforcement - Medium - April 17th, 2021 [April 17th, 2021]
- Trapping the queen - Chessbase News - April 17th, 2021 [April 17th, 2021]
- AI 101: All the Ways AI Could Improve or End Our World - Interesting Engineering - April 2nd, 2021 [April 2nd, 2021]
- Quick Scripts AlphaZero - February 17th, 2021 [February 17th, 2021]
- How to Kickstart an AI Venture Without Proprietary Data - Medium - February 17th, 2021 [February 17th, 2021]
- Street Fighter V: What to Expect After the Winter Update | CBR - CBR - Comic Book Resources - February 17th, 2021 [February 17th, 2021]
- This AI chess engine aims to help human players rather than defeat them - The Next Web - February 1st, 2021 [February 1st, 2021]
- Open source at Facebook: 700 repositories and 1.3 million followers - ZDNet - February 1st, 2021 [February 1st, 2021]
- Scientists say dropping acid can help with social anxiety and alcoholism - The Next Web - February 1st, 2021 [February 1st, 2021]
- AlphaZero - Chess Engines - Chess.com - November 21st, 2020 [November 21st, 2020]
- AlphaZero: Shedding new light on chess, shogi, and Go ... - November 21st, 2020 [November 21st, 2020]
- The art of chess: a brief history of the World Championship - TheArticle - November 21st, 2020 [November 21st, 2020]
- Podcast: Can you teach a machine to think? - MIT Technology Review - November 15th, 2020 [November 15th, 2020]
- Retired Chess Grandmaster, AlphaZero AI Reinvent Chess - Science Times - September 17th, 2020 [September 17th, 2020]
- DeepMind's AI is helping to re-write the rules of chess - ZDNet - September 17th, 2020 [September 17th, 2020]
- AI messed up mentally stimulating games. Right now it is actually creating the video game wonderful once again - Publicist Recorder - September 17th, 2020 [September 17th, 2020]
- A|I: The AI Times Surveillance mandated - BetaKit - September 17th, 2020 [September 17th, 2020]
- Starting on Friday: Chess 9LX with Carlsen and Kasparov - Chessbase News - September 17th, 2020 [September 17th, 2020]
- AlphaZero Match Will Be Replicated In Computer Chess Champs - Chess.com - August 3rd, 2020 [August 3rd, 2020]
- Facebook's New Algorithm Can Play Poker And Beat Humans At It - Digital Information World - August 3rd, 2020 [August 3rd, 2020]
- Survival of the Fattest: Macheide and Superman - TheArticle - August 3rd, 2020 [August 3rd, 2020]