A gentle introduction to model-free and model-based reinforcement learning – TechTalks
Reinforcement learning is one of the exciting branches of artificial intelligence. It plays an important role in game-playing AI systems, modern robots, chip-design systems, and other applications.
There are many different types of reinforcement learning algorithms, but two main categories are model-based and model-free RL. They are both inspired by our understanding of learning in humans and animals.
Nearly every book on reinforcement learning contains a chapter that explains the differences between model-free and model-based reinforcement learning. But seldom are the biological and evolutionary precedents discussed in books about reinforcement learning algorithms for computers.
I found a very interesting explanation of model-free and model-based RL in The Birth of Intelligence, a book that explores the evolution of intelligence. In a conversation with TechTalks, Daeyeol Lee, neuroscientist and author of The Birth of Intelligence, discussed different modes of reinforcement learning in humans and animals, AI and natural intelligence, and future directions of research.
In the late nineteenth century, psychologist Edward Thorndike proposed the law of effect, which states that actions with positive effects in a particular situation become more likely to occur again in that situation, and responses that produce negative effects become less likely to occur in the future.
Thorndike explored the law of effect with an experiment in which he placed a cat inside a puzzle box and measured the time it took for the cat to escape it. To escape, the cat had to manipulate a series of gadgets such as strings and levers. Thorndike observed that as the cat interacted with the puzzle box, it learned the behavioral responses that could help it escape. Over time, the cat became faster and faster at escaping the box. Thorndike concluded that the cat learned from the rewards and punishments that its actions produced.
The law of effect later paved the way for behaviorism, a branch of psychology that tries to explain human and animal behavior in terms of stimuli and responses.
The law of effect is also the basis for model-free reinforcement learning. In model-free reinforcement learning, an agent perceives the world, takes an action, and measures the reward. The agent usually starts by taking random actions and gradually repeats those that are associated with more rewards.
"You basically look at the state of the world, a snapshot of what the world looks like, and then you take an action. Afterward, you increase or decrease the probability of taking the same action in the given situation depending on its outcome," Lee said. "That's basically what model-free reinforcement learning is. The simplest thing you can imagine."
In model-free reinforcement learning, there's no direct knowledge or model of the world. The RL agent must directly experience every outcome of each action through trial and error.
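The update Lee describes can be sketched in a few lines of Python. This is a minimal illustration, not a standard algorithm from the book: it assumes a toy problem with a single situation and two possible actions, and the preference list, step size, and hidden reward probabilities are all hypothetical values chosen for the example.

```python
import math
import random

def softmax(prefs):
    """Turn action preferences into probabilities that sum to 1."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def train(reward_prob, episodes=2000, step_size=0.1, seed=0):
    """Model-free learning for one situation with several actions.

    reward_prob[a] is the hidden chance that action a pays off
    (a hypothetical environment). The agent never reads it directly;
    it only sees the reward that follows each action.
    """
    rng = random.Random(seed)
    prefs = [0.0] * len(reward_prob)  # start with no preference: random behavior
    for _ in range(episodes):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = 1.0 if rng.random() < reward_prob[action] else -1.0
        # Law of effect: a good outcome makes the action more likely,
        # a bad outcome makes it less likely.
        prefs[action] += step_size * reward
    return softmax(prefs)

probs = train([0.2, 0.8])
print(probs)  # the second action ends up far more probable than the first
```

Note that the agent stores nothing about how the world works. It only keeps a running preference for each action, which is why every outcome has to be experienced directly.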
Thorndike's law of effect was prevalent until the 1930s, when Edward Tolman, another psychologist, arrived at an important insight while exploring how fast rats could learn to navigate mazes. During his experiments, Tolman realized that animals could learn things about their environment without reinforcement.
For example, when a rat is let loose in a maze, it will freely explore the tunnels and gradually learn the structure of the environment. If the same rat is later reintroduced to the same environment and is provided with a reinforcement signal, such as food to find or an exit to reach, it can reach its goal much quicker than animals that did not have the opportunity to explore the maze. Tolman called this latent learning.
Latent learning enables animals and humans to develop a mental representation of their world and simulate hypothetical scenarios in their minds and predict the outcome. This is also the basis of model-based reinforcement learning.
"In model-based reinforcement learning, you develop a model of the world. In terms of computer science, it's a transition probability, how the world goes from one state to another state depending on what kind of action you produce in it," Lee said. "When you're in a given situation where you've already learned the model of the environment previously, you'll do a mental simulation. You'll basically search through the model you've acquired in your brain and try to see what kind of outcome would occur if you take a particular series of actions. And when you find the path of actions that will get you to the goal that you want, you'll start taking those actions physically."
The main benefit of model-based reinforcement learning is that it obviates the need for the agent to undergo trial-and-error in its environment. For example, if you hear about an accident that has blocked the road you usually take to work, model-based RL will allow you to do a mental simulation of alternative routes and change your path. With model-free reinforcement learning, the new information would not be of any use to you. You would proceed as usual until you reached the accident scene, and then you would start updating your value function and start exploring other actions.
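The commute example can be written down as a small planning problem. The sketch below is only an illustration under simplifying assumptions: the road map is a hypothetical, hand-written transition model, transitions are deterministic rather than probabilistic, and the "mental simulation" is a plain breadth-first search.

```python
from collections import deque

def plan(model, start, goal):
    """Search through a transition model for a sequence of actions.

    model[state][action] gives the next state (deterministic for simplicity).
    Returns the shortest list of actions, or None if the goal is unreachable.
    """
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, actions = frontier.popleft()
        if state == goal:
            return actions
        for action, next_state in model.get(state, {}).items():
            if next_state not in visited:
                visited.add(next_state)
                frontier.append((next_state, actions + [action]))
    return None

# Hypothetical road map, assumed to have been learned on earlier commutes.
model = {
    "home": {"take_highway": "highway", "take_side_street": "side_street"},
    "highway": {"exit_downtown": "work"},
    "side_street": {"cross_bridge": "bridge"},
    "bridge": {"turn_left": "work"},
}

usual_route = plan(model, "home", "work")

# News of the accident changes the model, and the plan follows from it.
# No trip to the accident scene is needed.
del model["highway"]["exit_downtown"]
detour = plan(model, "home", "work")

print(usual_route)  # ['take_highway', 'exit_downtown']
print(detour)       # ['take_side_street', 'cross_bridge', 'turn_left']
```

A model-free agent has no equivalent of the `del` line. It could only lower the value of the highway route after driving it and finding the road blocked.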
Model-based reinforcement learning has been especially successful in developing AI systems that can master board games such as chess and Go, where the environment is deterministic.
In some cases, creating a decent model of the environment is either not possible or too difficult. And model-based reinforcement learning can potentially be very time-consuming, which can prove to be dangerous or even fatal in time-sensitive situations.
"Computationally, model-based reinforcement learning is a lot more elaborate. You have to acquire the model, do the mental simulation, and you have to find the trajectory in your neural processes and then take the action," Lee said.
Lee added, however, that model-based reinforcement learning does not necessarily have to be more complicated than model-free RL.
"What determines the complexity of model-free RL is all the possible combinations of stimulus set and action set," he said. "As you have more and more states of the world or sensor representation, the pairs that you're going to have to learn between states and actions are going to increase. Therefore, even though the idea is simple, if there are many states and those states are mapped to different actions, you'll need a lot of memory."
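A rough count shows how quickly those pairs add up. The sketch assumes a tabular agent that stores one value per state-action pair and a state made of several independent discrete features. The function name and the example sizes are hypothetical.

```python
from math import prod

def num_state_action_pairs(feature_sizes, num_actions):
    """Count the entries a lookup table needs: one per (state, action) pair.

    feature_sizes lists how many values each state feature can take,
    so the number of distinct states is their product.
    """
    return prod(feature_sizes) * num_actions

# A 10 x 10 grid position with 4 possible moves.
small = num_state_action_pairs([10, 10], 4)

# The same grid plus one extra sensor that reads 5 different values.
medium = num_state_action_pairs([10, 10, 5], 4)

# Six features with 10 values each.
large = num_state_action_pairs([10] * 6, 4)

print(small, medium, large)  # 400 2000 4000000
```

Each added feature multiplies the table rather than adding to it, which is the memory cost Lee is pointing at.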
In contrast, in model-based reinforcement learning, the complexity will depend on the model you build. If the environment is really complicated but can be modeled with a relatively simple model that can be acquired quickly, then the simulation would be much simpler and more cost-efficient.
"And if the environment tends to change relatively frequently, then rather than trying to relearn the stimulus-action pair associations whenever the world changes, you can have a much more efficient outcome if you're using model-based reinforcement learning," Lee said.
Basically, neither model-based nor model-free reinforcement learning is a perfect solution. And wherever you see a reinforcement learning system tackling a complicated problem, there's a good chance that it is using both model-based and model-free RL, and possibly more forms of learning.
Research in neuroscience shows that humans and animals have multiple forms of learning, and the brain constantly switches between these modes depending on how much confidence it has in each of them at any given moment.
"If the model-free RL is working really well and it is accurately predicting the reward all the time, that means there's less uncertainty with model-free and you're going to use it more," Lee said. "And on the contrary, if you have a really accurate model of the world and you can do the mental simulations of what's going to happen every moment of time, then you're more likely to use model-based RL."
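One simple way to picture this kind of switching in software is an arbiter that tracks how wrong each system has been lately and hands control to the more reliable one. The sketch below is an assumption-laden illustration, not a description of how the brain does it: the `Arbiter` class, the decay rate, and the use of a running average of prediction error as a stand-in for uncertainty are all hypothetical choices.

```python
class Arbiter:
    """Picks between two controllers based on how reliable each has been."""

    def __init__(self, decay=0.9):
        self.decay = decay
        # Running average of absolute prediction error per system.
        self.error = {"model_free": 0.0, "model_based": 0.0}

    def update(self, system, predicted, observed):
        """Blend the newest prediction error into the running average."""
        surprise = abs(observed - predicted)
        self.error[system] = (
            self.decay * self.error[system] + (1 - self.decay) * surprise
        )

    def choose(self):
        """Return the system with the lower running error."""
        return min(self.error, key=self.error.get)

arbiter = Arbiter()

# Phase 1: the model-free predictions are accurate, the world model is off.
for _ in range(20):
    arbiter.update("model_free", predicted=1.0, observed=1.0)
    arbiter.update("model_based", predicted=1.0, observed=0.0)
first_choice = arbiter.choose()

# Phase 2: the world changes, and now the world model predicts it better.
for _ in range(20):
    arbiter.update("model_free", predicted=1.0, observed=0.0)
    arbiter.update("model_based", predicted=0.0, observed=0.0)
second_choice = arbiter.choose()

print(first_choice, second_choice)  # model_free model_based
```

Both error estimates are updated on every step, whichever system is in control at the time, so the arbiter always has a current reading on the one it is not using.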
In recent years, there has been growing interest in creating AI systems that combine multiple modes of reinforcement learning. Recent research by scientists at UC San Diego shows that combining model-free and model-based reinforcement learning achieves superior performance in control tasks.
"If you look at a complicated algorithm like AlphaGo, it has elements of both model-free and model-based RL," Lee said. "It learns the state values based on board configurations, and that is basically model-free RL, because you're trying values depending on where all the stones are. But it also does forward search, which is model-based."
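The combination Lee describes, a value estimate for positions plus a forward search through the rules, can be shown on a game far smaller than Go. The sketch below makes several substitutions that should be read as assumptions: the game is a toy stone-taking game (take one to three stones, whoever takes the last stone wins), the search is a depth-limited negamax rather than the Monte Carlo tree search AlphaGo uses, and `estimated_value` is a hand-written stand-in for values that a real system would learn from experience.

```python
def estimated_value(pile):
    """Stand-in for a learned state value, seen by the player about to move.

    In a real system this number would come from training on many games,
    not from a formula. +1.0 means "looks winning", -1.0 "looks losing".
    """
    if pile == 0:
        return -1.0  # no stones left: the opponent took the last one
    return -1.0 if pile % 4 == 0 else 1.0

def legal_moves(pile):
    """The rules of the game: take 1, 2, or 3 stones."""
    return [take for take in (1, 2, 3) if take <= pile]

def search(pile, depth):
    """Look ahead with the rules, then fall back on the value estimate."""
    if pile == 0 or depth == 0:
        return estimated_value(pile)
    # What is good for the opponent is bad for us, hence the minus sign.
    return max(-search(pile - take, depth - 1) for take in legal_moves(pile))

def best_move(pile, depth=2):
    """Pick the move whose resulting position searches out best for us."""
    return max(
        legal_moves(pile),
        key=lambda take: -search(pile - take, depth - 1),
    )

print(best_move(5))  # 1, leaving the opponent with 4 stones
print(best_move(7))  # 3, leaving the opponent with 4 stones
```

The value function plays the model-free role, since it maps a position straight to a number. The `search` function plays the model-based role, since it uses the rules of the game to simulate what happens next.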
But despite remarkable achievements, progress in reinforcement learning is still slow. As soon as RL models are faced with complex and unpredictable environments, their performance starts to degrade. For example, creating a reinforcement learning system that played Dota 2 at championship level required tens of thousands of hours of training, a feat that is physically impossible for humans. Other tasks such as robotic hand manipulation also require huge amounts of training and trial-and-error.
Part of the reason reinforcement learning still struggles with efficiency is the gap remaining in our knowledge of learning in humans and animals. And we have much more than just model-free and model-based reinforcement learning, Lee believes.
"I think our brain is a pandemonium of learning algorithms that have evolved to handle many different situations," he said.
In addition to constantly switching between these modes of learning, the brain manages to maintain and update them all the time, even when they are not actively involved in decision-making.
"When you have multiple learning algorithms, they become useless if you turn some of them off. Even if you're relying on one algorithm, say model-free RL, the other algorithms must continue to run. I still have to update my world model rather than keep it frozen because if I don't, several hours later, when I realize that I need to switch to the model-based RL, it will be obsolete," Lee said.
Some interesting work in AI research shows how this might work. A recent technique inspired by psychologist Daniel Kahneman's System 1 and System 2 thinking shows that maintaining different learning modules and updating them in parallel helps improve the efficiency and accuracy of AI systems.
Another thing that we still have to figure out is how to apply the right inductive biases in our AI systems to make sure they learn the right things in a cost-efficient way. Billions of years of evolution have provided humans and animals with the inductive biases needed to learn efficiently and with as little data as possible.
"The information that we get from the environment is very sparse. And using that information, we have to generalize. The reason is that the brain has inductive biases and has biases that can generalize from a small set of examples. That is the product of evolution, and a lot of neuroscientists are getting more interested in this," Lee said.
However, while inductive biases might be easy to understand for an object recognition task, they become a lot more complicated for abstract problems such as building social relationships.
"The idea of inductive bias is quite universal and applies not just to perception and object recognition but to all kinds of problems that an intelligent being has to deal with," Lee said. "And I think that is in a way orthogonal to the model-based and model-free distinction because it's about how to build an efficient model of the complex structure based on a few observations. There's a lot more that we need to understand."