Why video games and board games aren't a good measure of AI intelligence – The Verge
Measuring the intelligence of AI is one of the trickiest but most important questions in the field of computer science. If you can't understand whether the machine you've built is cleverer today than it was yesterday, how do you know you're making progress?
At first glance, this might seem like a non-issue. "Obviously AI is getting smarter" is one reply. Just look at all the money and talent pouring into the field. Look at the milestones, like beating humans at Go, and the applications that were impossible a decade ago but are commonplace today, like image recognition. How is that not progress?
Another reply is that these achievements aren't really a good gauge of intelligence. Beating humans at chess and Go is impressive, yes, but what does it matter if the smartest computer can be out-strategized in general problem-solving by a toddler or a rat?
This is a criticism put forward by AI researcher François Chollet, a software engineer at Google and a well-known figure in the machine learning community. Chollet is the creator of Keras, a widely used software library for building neural networks, the backbone of contemporary AI. He's also written numerous textbooks on machine learning and maintains a popular Twitter feed where he shares his opinions on the field.
In a recent paper titled "On the Measure of Intelligence," Chollet laid out an argument that the AI world needs to refocus on what intelligence is and isn't. If researchers want to make progress toward general artificial intelligence, says Chollet, they need to look past popular benchmarks like video games and board games, and start thinking about the skills that actually make humans clever, like our ability to generalize and adapt.
In an email interview with The Verge, Chollet explained his thoughts on this subject, talking through why he believes current achievements in AI have been misrepresented, how we might measure intelligence in the future, and why scary stories about superintelligent AI (as told by Elon Musk and others) have an unwarranted hold on the public's imagination.
This interview has been lightly edited for clarity.
In your paper, you describe two different conceptions of intelligence that have shaped the field of AI. One presents intelligence as the ability to excel in a wide range of tasks, while the other prioritizes adaptability and generalization, the ability to respond to novel challenges. Which framework is the bigger influence right now, and what are the consequences of that?
In the first 30 years of the history of the field, the most influential view was the former: intelligence as a set of static programs and explicit knowledge bases. Right now, the pendulum has swung very far in the opposite direction: the dominant way of conceptualizing intelligence in the AI community is the blank slate or, to use a more relevant metaphor, the freshly initialized deep neural network. Unfortunately, it's a framework that's been going largely unchallenged and even largely unexamined. These questions have a long intellectual history, spanning literally decades, and I don't see much awareness of this history in the field today, perhaps because most people doing deep learning today joined the field after 2016.
It's never a good thing to have such intellectual monopolies, especially as an answer to poorly understood scientific questions. It restricts the set of questions that get asked. It restricts the space of ideas that people pursue. I think researchers are now starting to wake up to that fact.
In your paper, you also make the case that AI needs a better definition of intelligence in order to improve. Right now, you argue, researchers focus on benchmarking performance in static tests like beating video games and board games. Why do you find this measure of intelligence lacking?
The thing is, once you pick a measure, you're going to take whatever shortcut is available to game it. For instance, if you set chess-playing as your measure of intelligence (which we did from the 1970s until the 1990s), you're going to end up with a system that plays chess, and that's it. There's no reason to assume it will be good for anything else at all. You end up with tree search and minimax, and that doesn't teach you anything about human intelligence. Today, pursuing skill at video games like Dota or StarCraft as a proxy for general intelligence falls into the exact same intellectual trap.
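To make the point concrete, here is a minimal Python sketch of the kind of depth-limited minimax tree search Chollet is referring to. The game interface and helper names (legal_moves, apply, evaluate, is_terminal) are hypothetical, and a real chess engine would add alpha-beta pruning, move ordering, and a hand-tuned evaluation function.

def minimax(game, state, depth, maximizing):
    """Best achievable score for the player to move, searching `depth` plies ahead."""
    if depth == 0 or game.is_terminal(state):
        return game.evaluate(state)  # static, game-specific heuristic
    scores = [
        minimax(game, game.apply(state, move), depth - 1, not maximizing)
        for move in game.legal_moves(state)
    ]
    return max(scores) if maximizing else min(scores)

Everything task-specific lives in evaluate, legal_moves, and apply; nothing in the search procedure itself transfers to any other problem, which is the sense in which skill at the game says little about general intelligence.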
This is perhaps not obvious because, in humans, skill and intelligence are closely related. The human mind can use its general intelligence to acquire task-specific skills. A human who is really good at chess can be assumed to be pretty intelligent because, implicitly, we know they started from zero and had to use their general intelligence to learn to play chess. They weren't designed to play chess. So we know they could direct this general intelligence to many other tasks and learn to do these tasks similarly efficiently. That's what generality is about.
But a machine has no such constraints. A machine can absolutely be designed to play chess. So the inference we make for humans, "can play chess, therefore must be intelligent," breaks down. Our anthropomorphic assumptions no longer apply. General intelligence can generate task-specific skills, but there is no path in reverse, from task-specific skill to generality. At all. So in machines, skill is entirely orthogonal to intelligence. You can achieve arbitrary skill at arbitrary tasks as long as you can sample infinite data about the task (or spend an infinite amount of engineering resources). And that will still not get you one inch closer to general intelligence.
The key insight is that there is no task where achieving high skill is a sign of intelligence, unless the task is actually a meta-task that involves acquiring new skills across a broad [range] of previously unknown problems. And that's exactly what I propose as a benchmark of intelligence.
If these current benchmarks don't help us develop AI with more generalized, flexible intelligence, why are they so popular?
There's no doubt that the effort to beat human champions at specific well-known video games is primarily driven by the press coverage these projects can generate. If the public wasn't interested in these flashy milestones that are so easy to misrepresent as steps toward superhuman general AI, researchers would be doing something else.
I think it's a bit sad because research should be about answering open scientific questions, not generating PR. If I set out to solve Warcraft III at a superhuman level using deep learning, you can be quite sure that I will get there as long as I have access to sufficient engineering talent and computing power (which is on the order of tens of millions of dollars for a task like this). But once I'd done it, what would I have learned about intelligence or generalization? Well, nothing. At best, I'd have developed engineering knowledge about scaling up deep learning. So I don't really see it as scientific research, because it doesn't teach us anything we didn't already know. It doesn't answer any open question. If the question was, "Can we play X at a superhuman level?", the answer is definitely, "Yes, as long as you can generate a sufficiently dense sample of training situations and feed them into a sufficiently expressive deep learning model." We've known this for some time. (I actually said as much a while before the Dota 2 and StarCraft II AIs reached champion level.)
What do you think the actual achievements of these projects are? To what extent are their results misunderstood or misrepresented?
One stark misrepresentation I'm seeing is the argument that these high-skill game-playing systems represent "real progress toward AI systems that can handle the complexity and uncertainty of the real world" [as OpenAI claimed in a press release about its Dota 2-playing bot, OpenAI Five]. They do not. If they did, it would be an immensely valuable research area, but that is simply not true. Take OpenAI Five, for instance: it wasn't able to handle the complexity of Dota 2 in the first place because it was trained with 16 characters, and it could not generalize to the full game, which has over 100 characters. It was trained on over 45,000 years of gameplay (note, again, how training data requirements grow combinatorially with task complexity), yet the resulting model proved very brittle: non-champion human players were able to find strategies to reliably beat it within days of the AI being made available for the public to play against.
If you want to one day be able to handle the complexity and uncertainty of the real world, you have to start asking questions like: what is generalization? How do we measure and maximize generalization in learning systems? And that's entirely orthogonal to throwing 10x more data and compute at a big neural network so that it improves its skill by some small percentage.
So what would be a better measure of intelligence for the field to focus on?
In short, we need to stop evaluating skill at tasks that are known beforehand, like chess or Dota or StarCraft, and instead start evaluating skill-acquisition ability. This means only using new tasks that are not known to the system beforehand, measuring the prior knowledge about the task that the system starts with, and measuring the sample-efficiency of the system (how much data it needs to learn to do the task). The less information (prior knowledge and experience) you require in order to reach a given level of skill, the more intelligent you are. And today's AI systems are really not very intelligent at all.
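Chollet's paper formalizes this idea using algorithmic information theory; the proportionality below is only a loose, informal paraphrase of that intuition, not the measure actually defined in the paper:

\[
\text{intelligence} \;\propto\; \frac{\text{skill attained across previously unseen tasks}}{\text{prior knowledge} + \text{experience}}
\]

In words: for a fixed level of skill on new tasks, a system that needed fewer built-in priors and less training experience counts as more intelligent.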
In addition, I think our measure of intelligence should make human-likeness more explicit, because there may be different types of intelligence, and human-like intelligence is what we're really talking about, implicitly, when we talk about general intelligence. And that involves trying to understand what prior knowledge humans are born with. Humans learn incredibly efficiently: they only require very little experience to acquire new skills, but they don't do it from scratch. They leverage innate prior knowledge, on top of a lifetime of accumulated skills and knowledge.
[My recent paper] proposes a new benchmark dataset, ARC (the Abstraction and Reasoning Corpus), which looks a lot like an IQ test. ARC is a set of reasoning tasks, where each task is explained via a small sequence of demonstrations, typically three, and you should learn to accomplish the task from these few demonstrations. ARC takes the position that every task your system is evaluated on should be brand-new and should only involve knowledge of a kind that fits within human innate knowledge. For instance, it should not feature language. Currently, ARC is totally solvable by humans, without any verbal explanations or prior training, but it is completely unapproachable by any AI technique we've tried so far. That's a big flashing sign that there's something going on there, that we're in need of new ideas.
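For readers unfamiliar with the benchmark, here is a rough Python sketch of what an ARC-style task looks like; the field names and the specific grids are illustrative assumptions, not the official dataset schema.

# Illustrative ARC-style task: a few input/output grid demonstrations,
# plus a test input whose output grid the solver must produce.
# Cell values are small integers standing for colors.
task = {
    "train": [  # typically about three demonstration pairs
        {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
        {"input": [[2, 0], [0, 2]], "output": [[0, 2], [2, 0]]},
        {"input": [[3, 3], [0, 0]], "output": [[0, 0], [3, 3]]},
    ],
    "test": [
        {"input": [[5, 0], [0, 5]]},  # the solver must infer the output grid
    ],
}

def solve(train_pairs, test_input):
    """Infer the transformation from the demonstrations alone and apply it."""
    raise NotImplementedError("this is exactly the open problem ARC poses")

In this toy example the hidden rule is simply "swap the two rows"; the point is that a solver has to discover a brand-new rule from a handful of examples, with no task-specific training data.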
Do you think the AI world can continue to progress by just throwing more computing power at problems? Some have argued that, historically, this has been the most successful approach to improving performance, while others have suggested that we're soon going to see diminishing returns if we stay on this path.
This is absolutely true if you're working on a specific task. Throwing more training data and compute power at a vertical task will increase performance on that task. But it will gain you about zero incremental understanding of how to achieve generality in artificial intelligence.
If you have a sufficiently large deep learning model, and you train it on a dense sampling of the input-cross-output space for a task, then it will learn to solve the task, whatever that may be: Dota, StarCraft, you name it. It's tremendously valuable. It has almost infinite applications in machine perception problems. The only problem here is that the amount of data you need is a combinatorial function of task complexity, so even slightly complex tasks can become prohibitively expensive.
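As a back-of-the-envelope illustration of that combinatorial growth (the numbers below are made up; only the shape of the curve matters): if a task's situations vary along k roughly independent factors with n possible values each, dense sampling needs on the order of n^k examples.

# Toy illustration with hypothetical numbers: dense sampling of a task whose
# situations vary along k independent factors, each taking n values,
# requires on the order of n ** k examples.
def dense_samples_needed(n_values_per_factor: int, n_factors: int) -> int:
    return n_values_per_factor ** n_factors

for k in (2, 4, 8, 16):
    print(f"{k} factors -> {dense_samples_needed(10, k):,} examples")
# 2 -> 100; 4 -> 10,000; 8 -> 100,000,000; 16 -> 10,000,000,000,000,000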
Take self-driving cars, for instance. Millions upon millions of training situations aren't sufficient for an end-to-end deep learning model to learn to safely drive a car. Which is why, first of all, L5 self-driving isn't quite there yet. And second, the most advanced self-driving systems are primarily symbolic models that use deep learning to interface these manually engineered models with sensor data. If deep learning could generalize, we'd have had L5 self-driving in 2016, and it would have taken the form of a big neural network.
Lastly, given you're talking about constraints for current AI systems, it seems worth asking about the idea of superintelligence: the fear that an extremely powerful AI could cause extreme harm to humanity in the near future. Do you think such fears are legitimate?
No, I don't believe the superintelligence narrative to be well-founded. We have never created an autonomous intelligent system. There is absolutely no sign that we will be able to create one in the foreseeable future. (This isn't where current AI progress is headed.) And we have absolutely no way to speculate what its characteristics may be if we do end up creating one in the far future. To use an analogy, it's a bit like asking in the year 1600: "Ballistics has been progressing pretty fast! So, what if we had a cannon that could wipe out an entire city? How do we make sure it would only kill the bad guys?" It's a rather ill-formed question, and debating it in the absence of any knowledge about the system we're talking about amounts, at best, to a philosophical argument.
One thing about these superintelligence fears is that they mask the fact that AI has the potential to be pretty dangerous today. We don't need superintelligence in order for certain AI applications to represent a danger. I've written about the use of AI to implement algorithmic propaganda systems. Others have written about algorithmic bias, the use of AI in weapons systems, or about AI as a tool of totalitarian control.
There's a story about the siege of Constantinople in 1453. While the city was fighting off the Ottoman army, its scholars and rulers were debating what the sex of angels might be. Well, the more energy and attention we spend discussing the sex of angels or the value alignment of hypothetical superintelligent AIs, the less we have for dealing with the real and pressing issues that AI technology poses today. There's a well-known tech leader who likes to depict superintelligent AI as an existential threat to humanity. Well, while these ideas are grabbing headlines, you're not discussing the ethical questions raised by the deployment of insufficiently accurate self-driving systems on our roads that cause crashes and loss of life.
If one accepts these criticisms, namely that there is currently no technical grounding for these fears, why do you think the superintelligence narrative is popular?
Ultimately, I think it's a good story, and people are attracted to good stories. It's not a coincidence that it resembles eschatological religious stories, because religious stories have evolved and been selected over time to powerfully resonate with people and to spread effectively. For the very same reason, you also find this narrative in science fiction movies and novels. The reason why it's used in fiction, the reason why it resembles religious narratives, and the reason why it has been catching on as a way to understand where AI is headed are all the same: it's a good story. And people need stories to make sense of the world. There's far more demand for such stories than demand for understanding the nature of intelligence or understanding what drives technological progress.