A gentle introduction to model-free and model-based reinforcement learning
Image credit: 123RF (with modifications)
Reinforcement learning is one of the most exciting branches of artificial intelligence. It plays an important role in game-playing AI systems, modern robots, chip-design systems, and other applications.
There are many different types of reinforcement learning algorithms, but two main categories are model-based and model-free RL. They are both inspired by our understanding of learning in humans and animals.
Nearly every book on reinforcement learning contains a chapter that explains the differences between model-free and model-based reinforcement learning. Seldom, however, do books about reinforcement learning algorithms for computers discuss the biological and evolutionary precedents of those algorithms.
I found a very interesting explanation of model-free and model-based RL in The Birth of Intelligence, a book that explores the evolution of intelligence. In a conversation with TechTalks, Daeyeol Lee, neuroscientist and author of The Birth of Intelligence, discussed different modes of reinforcement learning in humans and animals, AI and natural intelligence, and future directions of research.
In the late nineteenth century, psychologist Edward Thorndike proposed the law of effect, which states that actions with positive effects in a particular situation become more likely to occur again in that situation, and responses that produce negative effects become less likely to occur in the future.
Thorndike explored the law of effect with an experiment in which he placed a cat inside a puzzle box and measured the time it took the cat to escape. To escape, the cat had to manipulate a series of gadgets such as strings and levers. Thorndike observed that as the cat interacted with the puzzle box, it learned the behavioral responses that helped it escape. Over time, the cat became faster and faster at escaping the box. Thorndike concluded that the cat learned from the rewards and punishments that its actions produced.
The law of effect later paved the way for behaviorism, a branch of psychology that tries to explain human and animal behavior in terms of stimuli and responses.
The law of effect is also the basis for model-free reinforcement learning. In model-free reinforcement learning, an agent perceives the world, takes an action, and measures the reward. The agent usually starts by taking random actions and gradually repeats those that are associated with more rewards.
"You basically look at the state of the world, a snapshot of what the world looks like, and then you take an action. Afterward, you increase or decrease the probability of taking the same action in the given situation depending on its outcome," Lee said. "That's basically what model-free reinforcement learning is. The simplest thing you can imagine."
In model-free reinforcement learning, there's no direct knowledge or model of the world. The RL agent must directly experience every outcome of each action through trial and error.
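To make the idea concrete, here is a minimal sketch of one classic model-free algorithm, tabular Q-learning. The environment interface (reset and step) and the hyperparameters are illustrative assumptions, not something from the article:

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: learn a value for every (state, action) pair
    purely from experienced rewards, with no model of the world.
    Assumes env.reset() -> state and env.step(a) -> (state, reward, done)."""
    q = defaultdict(float)  # value of each (state, action) pair, initially 0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Mostly repeat actions that have paid off; occasionally explore.
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q[(state, a)])
            next_state, reward, done = env.step(action)
            # Law-of-effect-style update: a good outcome raises the value of
            # acting this way in this state, a bad outcome lowers it.
            best_next = max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q
```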
Thorndike's law of effect was prevalent until the 1930s, when Edward Tolman, another psychologist, discovered an important insight while exploring how fast rats could learn to navigate mazes. During his experiments, Tolman realized that animals could learn things about their environment without reinforcement.
For example, when a rat is let loose in a maze, it will freely explore the tunnels and gradually learn the structure of the environment. If the same rat is later reintroduced to the same environment and is provided with a reinforcement signal, such as finding food or searching for the exit, it can reach its goal much more quickly than animals that did not have the opportunity to explore the maze. Tolman called this latent learning.
Latent learning enables animals and humans to develop a mental representation of their world, simulate hypothetical scenarios in their minds, and predict the outcomes. This is also the basis of model-based reinforcement learning.
"In model-based reinforcement learning, you develop a model of the world. In terms of computer science, it's a transition probability, how the world goes from one state to another state depending on what kind of action you produce in it," Lee said. "When you're in a given situation where you've already learned the model of the environment previously, you'll do a mental simulation. You'll basically search through the model you've acquired in your brain and try to see what kind of outcome would occur if you take a particular series of actions. And when you find the path of actions that will get you to the goal that you want, you'll start taking those actions physically."
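As a rough illustration of that description, the sketch below estimates transition probabilities from experience and then searches the learned model, in imagination, for a series of actions that reaches a goal. The deterministic most-likely-outcome search and all of the names are simplifying assumptions:

```python
from collections import defaultdict, deque

class LearnedModel:
    """Transition model estimated from experience: counts of which state
    followed each (state, action) pair."""
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def observe(self, state, action, next_state):
        self.counts[(state, action)][next_state] += 1

    def predict(self, state, action):
        # Most likely next state under the estimated transition probabilities.
        outcomes = self.counts[(state, action)]
        return max(outcomes, key=outcomes.get) if outcomes else None

def plan(model, start, goal, actions):
    """Breadth-first 'mental simulation': try action sequences in imagination
    and return the first one whose predicted outcomes reach the goal."""
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path  # the series of actions to take physically
        for action in actions:
            nxt = model.predict(state, action)
            if nxt is not None and nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [action]))
    return None  # no known route to the goal; fall back to exploration
```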
The main benefit of model-based reinforcement learning is that it reduces the need for the agent to undergo trial and error in its environment. For example, if you hear about an accident that has blocked your usual route to work, model-based RL lets you mentally simulate alternative routes and change your path. With model-free reinforcement learning, the new information would be of no use to you. You would proceed as usual until you reached the accident scene, and only then would you start updating your value function and exploring other actions.
Model-based reinforcement learning has been especially successful in developing AI systems that can master board games such as chess and Go, where the environment is deterministic.
In some cases, creating a decent model of the environment is either not possible or too difficult. And model-based reinforcement learning can potentially be very time-consuming, which can prove to be dangerous or even fatal in time-sensitive situations.
"Computationally, model-based reinforcement learning is a lot more elaborate. You have to acquire the model, do the mental simulation, and you have to find the trajectory in your neural processes and then take the action," Lee said.
Lee added, however, that model-based reinforcement learning does not necessarily have to be more complicated than model-free RL.
"What determines the complexity of model-free RL is all the possible combinations of stimulus set and action set," he said. "As you have more and more states of the world or sensor representation, the pairs that you're going to have to learn between states and actions are going to increase. Therefore, even though the idea is simple, if there are many states and those states are mapped to different actions, you'll need a lot of memory."
In model-based reinforcement learning, by contrast, the complexity depends on the model you build. If the environment is really complicated but can be captured by a relatively simple model that can be acquired quickly, then the simulation will be much simpler and more cost-efficient.
"And if the environment tends to change relatively frequently, then rather than trying to relearn the stimulus-action pair associations whenever the world changes, you can have a much more efficient outcome if you're using model-based reinforcement learning," Lee said.
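A back-of-the-envelope calculation illustrates Lee's memory point: a tabular model-free learner stores one value per state-action pair, so its memory grows with the product of the two set sizes. The numbers here are arbitrary:

```python
# One learned value per (state, action) pair; the counts are made up.
n_states = 10**6      # e.g., a million distinct sensor snapshots
n_actions = 10        # ten possible responses in each state
table_entries = n_states * n_actions
print(f"{table_entries:,} values to store")  # 10,000,000 values
```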
Basically, neither model-based nor model-free reinforcement learning is a perfect solution. And wherever you see a reinforcement learning system tackling a complicated problem, there's a good chance that it uses both model-based and model-free RL, and possibly more forms of learning.
Research in neuroscience shows that humans and animals have multiple forms of learning, and the brain constantly switches between these modes depending on how certain it is about each of them at any given moment.
"If the model-free RL is working really well and it is accurately predicting the reward all the time, that means there's less uncertainty with model-free and you're going to use it more," Lee said. "And on the contrary, if you have a really accurate model of the world and you can do the mental simulations of what's going to happen every moment of time, then you're more likely to use model-based RL."
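One way to caricature this arbitration in code is to track each system's recent reward-prediction error and hand control to whichever is currently more reliable. The exponential-average bookkeeping below is an illustrative simplification, not a claim about the brain's actual mechanism:

```python
class Arbiter:
    """Tracks a running reward-prediction error for each system and
    gives control to whichever has been more reliable lately."""
    def __init__(self, decay=0.9):
        self.error = {"model_free": 1.0, "model_based": 1.0}
        self.decay = decay

    def update(self, system, predicted_reward, actual_reward):
        # A low running error means the system is predicting reward
        # accurately, i.e., less uncertainty, so it earns more control.
        err = abs(predicted_reward - actual_reward)
        self.error[system] = self.decay * self.error[system] + (1 - self.decay) * err

    def choose(self):
        return min(self.error, key=self.error.get)
```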
In recent years, there has been growing interest in creating AI systems that combine multiple modes of reinforcement learning. Recent research by scientists at UC San Diego shows that combining model-free and model-based reinforcement learning achieves superior performance in control tasks.
"If you look at a complicated algorithm like AlphaGo, it has elements of both model-free and model-based RL," Lee said. "It learns the state values based on board configurations, and that is basically model-free RL, because you're learning values depending on where all the stones are. But it also does forward search, which is model-based."
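AlphaGo itself combines deep networks with Monte Carlo tree search, which is far beyond a short sketch, but the toy negamax search below shows the division of labor Lee describes: a learned value function (the model-free element) scoring the leaves of a forward search through the game's known rules (the model-based element). The legal_moves, apply_move, and value hooks are assumed, and value is taken to score a position for the player to move:

```python
def search(state, value, legal_moves, apply_move, depth=2):
    """Depth-limited forward search whose leaves are scored by a
    learned value function instead of being played out to the end."""
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return value(state), None  # leaf: fall back on the learned estimate
    best_score, best_move = float("-inf"), None
    for move in moves:
        # Simulate the move with the known game model, then evaluate the
        # resulting position from the opponent's point of view.
        child = apply_move(state, move)
        score, _ = search(child, value, legal_moves, apply_move, depth - 1)
        score = -score  # negamax: the opponent's gain is our loss
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move
```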
But despite remarkable achievements, progress in reinforcement learning is still slow. As soon as RL models are faced with complex and unpredictable environments, their performance starts to degrade. For example, creating a reinforcement learning system that played Dota 2 at championship level required tens of thousands of hours of training, a feat that is physically impossible for humans. Other tasks such as robotic hand manipulation also require huge amounts of training and trial-and-error.
Part of the reason reinforcement learning still struggles with efficiency is the remaining gap in our knowledge of learning in humans and animals. And, Lee believes, we have much more than just model-free and model-based reinforcement learning.
"I think our brain is a pandemonium of learning algorithms that have evolved to handle many different situations," he said.
In addition to constantly switching between these modes of learning, the brain manages to maintain and update them all the time, even when they are not actively involved in decision-making.
"When you have multiple learning algorithms, they become useless if you turn some of them off. Even if you're relying on one algorithm, say model-free RL, the other algorithms must continue to run. I still have to update my world model rather than keep it frozen, because if I don't, several hours later, when I realize that I need to switch to the model-based RL, it will be obsolete," Lee said.
Some interesting work in AI research shows how this might work. A recent technique inspired by psychologist Daniel Kahneman's System 1 and System 2 thinking shows that maintaining different learning modules and updating them in parallel helps improve the efficiency and accuracy of AI systems.
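In code, the idea might look like the sketch below, which reuses the Arbiter sketch from earlier: whichever system is driving behavior, every transition updates both the model-free values and the world model, so a later switch to planning does not start from a stale model. The q_learner and world_model interfaces are hypothetical wrappers around the earlier sketches:

```python
def run_step(env, state, q_learner, world_model, arbiter):
    """One interaction step in which both learners stay up to date.
    q_learner and world_model are assumed wrappers around the earlier
    sketches; arbiter is the uncertainty-based selector shown above."""
    if arbiter.choose() == "model_free":
        action = q_learner.act(state)            # habitual, value-driven choice
    else:
        action = world_model.plan_action(state)  # mental simulation, then act
    next_state, reward, done = env.step(action)
    # Both systems learn from every transition, regardless of which
    # one actually chose the action.
    q_learner.update(state, action, reward, next_state)
    world_model.observe(state, action, next_state)
    return next_state, done
```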
Another thing that we still have to figure out is how to apply the right inductive biases in our AI systems to make sure they learn the right things in a cost-efficient way. Billions of years of evolution have provided humans and animals with the inductive biases needed to learn efficiently and with as little data as possible.
"The information that we get from the environment is very sparse. And using that information, we have to generalize. The reason is that the brain has inductive biases that can generalize from a small set of examples. That is the product of evolution, and a lot of neuroscientists are getting more interested in this," Lee said.
However, while inductive biases might be easy to understand for an object recognition task, they become a lot more complicated for abstract problems such as building social relationships.
"The idea of inductive bias is quite universal and applies not just to perception and object recognition but to all kinds of problems that an intelligent being has to deal with," Lee said. "And I think that is in a way orthogonal to the model-based and model-free distinction, because it's about how to build an efficient model of the complex structure based on a few observations. There's a lot more that we need to understand."