Reinforcement learning for the real world – TechTalks
This article is part of ourreviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.
Labor- and data-efficiency remain two of the key challenges of artificial intelligence. In recent decades, researchers have proven that big data and machine learning algorithms reduce the need for providing AI systems with prior rules and knowledge. But machine learningand more recently deep learninghave presented their own challenges, which require manual labor albeit of different nature.
Creating AI systems that can genuinely learn on their own with minimal human guidance remain a holy grail and a great challenge. According to Sergey Levine, assistant professor at the University of California, Berkeley, a promising direction of research for the AI community is self-supervised offline reinforcement learning.
This is a variation of the RL paradigm that is very close to how humans and animals learn to reuse previously acquired data and skills, and it can be a great boon for applying AI to real-world settings. In a paper titled Understanding the World Through Action and a talk at the NeurIPS 2021 conference, Levine explained how self-supervised learning objectives and offline RL can help create generalized AI systems that can be applied to various tasks.
One common argument in favor of machine learning algorithms is their ability to scale with the availability of data and compute resources. Decades of work on developing symbolic AI systems have produced limited results. These systems require human experts and engineers to manually provide the rules and knowledge that define the behavior of the AI system.
The problem is that in some applications, the rules can be virtually limitless, while in others, they cant be explicitly defined.
In contrast, machine learning models can derive their behavior from data, without the need for explicit rules and prior knowledge. Another advantage of machine learning is that it can glean its own solutions from its training data, which are often more accurate than knowledge engineered by humans.
But machine learning faces its own challenges. Most ML applications are based on supervised learning and require training data to be manually labeled by human annotators. Data annotation poses severe limits to the scaling of ML models.
More recently, researchers have been exploring unsupervised and self-supervised learning, ML paradigms that obviate the need for manual labels. These approaches have helped overcome the limits of machine learning in some applications such as language modeling and medical imaging. But theyre still faced with challenges that prevent their use in more general settings.
Current methods for learning without human labels still require considerable human insight (which is often domain-specific!) to engineer self-supervised learning objectives that allow large models to acquire meaningful knowledge from unlabeled datasets, Levine writes.
Levine writes that the next objective should be to create AI systems that dont require manual labeling or the manual design of self-supervised objectives. These models should be able to distill a deep and meaningful understanding of the world and can perform downstream tasks with robustness generalization, and even a degree of common sense.
Reinforcement learning is inspired by intelligent behavior in animals and humans. Reinforcement learning pioneer Richard Sutton describes RL as the first computational theory of intelligence. An RL agent develops its behavior by interacting with its environment, weighing the punishments and rewards of its actions, and developing policies that maximize rewards.
RL, and more recently deep RL, have proven to be particularly efficient at solving complicated problems such as playing games and training robots. And theres reason to believe reinforcement learning can overcome the limits of current ML systems.
But before it does, RL must overcome its own set of challenges that limit its use in real-world settings.
We could think of modern RL research as consisting of three threads: (1) getting good results in simulated benchmarks (e.g., video games); (2) using simulation+ transfer; (3) running RL in the real world, Levine told TechTalks. I believe that ultimately (3) is the most importantthing, because thats the most promising approach to solve problems that we cant solve today.
Games are simple environments. Board games such as chess and go are closed worlds with deterministic environments. Even games such as StarCraft and Dota, which are played in real-time and have near unlimited states, are much simpler than the real world. Their rules dont change. This is partly why game-playing AI systems have found very few applications in the real world.
On the other hand, physics simulators have seen tremendous advances in recent years. One of the popular methods in fields such as robotics and self-driving cars has been to train reinforcement learning models in simulated environments and then finetune the models with real-world experience. But as Levine explained, this approach is limited too because the domains where we most need learningthe ones where humans far outperform machinesare also the ones that are hardest to simulate.
This approach is only effective at addressing tasks that can be simulated, which is bottlenecked by our ability to create lifelike simulated analogues of the real world and to anticipate all the possible situations that an agent might encounter in reality, Levine said.
One of the biggest challenges we encounter when we try to do real-world RL is generalization, Levine said.
For example, in 2016, Levine was part of a team that constructed an arm farm at Google with 14 robots all learning concurrently from their shared experience. They collected more than half a million grasp attempts, and it was possible to learn effective grasping policies in this way.
But we cant repeat this process for every single task we want robots to learn with RL, he says. Therefore, we need more general-purpose approaches, where a single ever-growing dataset is used as the basis for a general understanding of the world on which more specific skills can be built.
In his paper, Levine points to two key obstacles in reinforcement learning. First, RL systems require manually defined reward functions or goals before they can learn the behaviors that help accomplish those goals. And second, reinforcement learning requires online experience and is not data-driven, which makes it hard to train them on large datasets. Most recent accomplishments in RL have relied on engineers at very wealthy tech companies using massive compute resources to generate immense experiences instead of reusing available data.
Therefore, RL systems need solutions that can learn from past experience and repurpose their learnings in more generalized ways. Moreover, they should be able to handle the continuity of the real world. Unlike simulated environments, you cant reset the real world and start everything from scratch. You need learning systems that can quickly adapt to the constant and unpredictable changes to their environment.
In his NeurIPS talk, Levine compares real-world RL to the story of Robinson Crusoe, the story of a man who is stranded on an island and learns to deal with unknown situations through inventiveness and creativity, using his knowledge of the world and continued exploration in his new habitat.
RL systems in the real world have to deal with a lifelong learning problem, evaluate objectives and performance based entirely on realistic sensing without access to privileged information, and must deal with real-world constraints, including safety, Levine said. These are all things that are typically abstracted away in widely used RL benchmark tasks and video game environments.
However, RL does work in more practical real-world settings, Levine says. For example, in 2018, he and his colleagues an RL-based robotic grasping system attain state-of-the-art results with raw sensory perception. In contrast to static learning behaviors that choose a grasp point and then execute the desired grasp, in their method, the robot continuously updated its grasp strategy based on the most recent observations to optimize long-horizon grasp success.
To my knowledge this is still the best existing system for grasping from monocular RGB images, Levine said. But this sort of thing requires algorithms that are somewhat different from those that perform best in simulated video game settings: it requires algorithms that are adept at utilizing and reusing previously collected data, algorithms that can train large models that generalize, and algorithms that can support large-scale real-world data collection.
Levines reinforcement learning solution includes two key components: unsupervised/self-supervised learning and offline learning.
In his paper, Levine describes self-supervised reinforcement learning as a system that can learn behaviors that control the world in meaningful ways and provides some mechanism to learn to control [the world] in as many ways as possible.
Basically, this means that instead of being optimized for a single goal, the RL agent should be able to achieve many different goals by computing counterfactuals, learning causal models, and obtaining a deep understanding of how actions affect its environment in the long term.
However, creating self-supervised RL models that can solve various goals would still require a massive amount of experience. To address this challenge, Levine proposes offline reinforcement learning, which makes it possible for models to continue learning from previously collected data without the need for continued online experience.
Offline RL can make it possible to apply self-supervised or unsupervised RL methods even in settings where online collection is infeasible, and such methods can serve as one of the most powerful tools for incorporating large and diverse datasets into self-supervised RL, he writes.
The combination of self-supervised and offline RL can help create agents that can create building blocks for learning new tasks and continue learning with little need for new data.
This is very similar to how we learn in the real world. For example, when you want to learn basketball, you use basic skills you learned in the past such as walking, running, jumping, handling objects, etc. You use these capabilities to develop new skills such as dribbling, crossovers, jump shots, free throws, layups, straight and bounce passes, eurosteps, dunks (if youre tall enough), etc. These skills build on each other and help you reach the bigger goal, which is to outscore your opponent. At the same time, you can learn from offline data by reflecting on your past experience and thinking about counterfactuals (e.g., what would have happened if you passed to an open teammate instead of taking a contested shot). You can also learn by processing other data such as videos of yourself and your opponents. In fact, on-court experience is just part of your continuous learning.
Ina paper, Yevgen Chetobar, one of Levines colleagues, shows how self-supervised offline RL can learn policies for fairly general robotic manipulation skills, directly reusing data that they had collected for another project.
This system was able to reach a variety of user-specified goals, and also act as a general-purpose pretraining procedure (a kind of BERT for robotics) for other kinds of tasks specified with conventional reward functions, Levine said.
One of the great benefits of offline and self-supervised RL is learning from real-world data instead of simulated environments.
Basically, it comes down to this question: is it easier to create a brain, or is it easier to create the universe? I think its easier to create a brain, because it is part of the universe, he said.
This is, in fact, one of the great challenges engineers face when creating simulated environments. For example, Levine says, effective simulation for autonomous driving requires simulating other drivers, which requires having an autonomous driving system, which requires simulating other drivers, which requires having an autonomous driving system, etc.
Ultimately, learning from real data will be more effective because it will simply be much easier and more scalable, just as weve seen in supervised learning domains in computer vision and NLP, where no one worries about using simulation, he said. My perspective is that we should figure out how to do RL in a scalable and general-purpose way using real data, and this will spare us from having to expend inordinate amounts of effort building simulators.
See the article here:
Reinforcement learning for the real world - TechTalks
- 3D Shape Tokenization - Apple Machine Learning Research - January 9th, 2025 [January 9th, 2025]
- Machine Learning Used To Create Scalable Solution for Single-Cell Analysis - Technology Networks - January 9th, 2025 [January 9th, 2025]
- Robotics: machine learning paves the way for intuitive robots - Hello Future - January 9th, 2025 [January 9th, 2025]
- Machine learning-based estimation of crude oil-nitrogen interfacial tension - Nature.com - January 9th, 2025 [January 9th, 2025]
- Machine learning Nomogram for Predicting endometrial lesions after tamoxifen therapy in breast Cancer patients - Nature.com - January 9th, 2025 [January 9th, 2025]
- Staying ahead of the automation, AI and machine learning curve - Creamer Media's Engineering News - January 9th, 2025 [January 9th, 2025]
- Machine Learning and Quantum Computing Predict Which Antibiotic To Prescribe for UTIs - Consult QD - January 9th, 2025 [January 9th, 2025]
- Machine Learning, Innovation, And The Future Of AI: A Conversation With Manoj Bhoyar - International Business Times UK - January 9th, 2025 [January 9th, 2025]
- AMD's FSR 4 will use machine learning but requires an RDNA 4 GPU, promises 'a dramatic improvement in terms of performance and quality' - PC Gamer - January 9th, 2025 [January 9th, 2025]
- Explainable artificial intelligence with UNet based segmentation and Bayesian machine learning for classification of brain tumors using MRI images -... - January 9th, 2025 [January 9th, 2025]
- Understanding the Fundamentals of AI and Machine Learning - Nairobi Wire - January 9th, 2025 [January 9th, 2025]
- Machine learning can help blood tests have a separate normal for each patient - The Hindu - January 1st, 2025 [January 1st, 2025]
- Artificial Intelligence and Machine Learning Programs Introduced this Spring - The Flash Today - January 1st, 2025 [January 1st, 2025]
- Virtual reality-assisted prediction of adult ADHD based on eye tracking, EEG, actigraphy and behavioral indices: a machine learning analysis of... - January 1st, 2025 [January 1st, 2025]
- Open source machine learning systems are highly vulnerable to security threats - TechRadar - December 22nd, 2024 [December 22nd, 2024]
- After the PS5 Pro's less dramatic changes, PlayStation architect Mark Cerny says the next-gen will focus more on CPUs, memory, and machine-learning -... - December 22nd, 2024 [December 22nd, 2024]
- Accelerating LLM Inference on NVIDIA GPUs with ReDrafter - Apple Machine Learning Research - December 22nd, 2024 [December 22nd, 2024]
- Machine learning for the prediction of mortality in patients with sepsis-associated acute kidney injury: a systematic review and meta-analysis - BMC... - December 22nd, 2024 [December 22nd, 2024]
- Machine learning uncovers three osteosarcoma subtypes for targeted treatment - Medical Xpress - December 22nd, 2024 [December 22nd, 2024]
- From Miniatures to Machine Learning: Crafting the VFX of Alien: Romulus - Animation World Network - December 22nd, 2024 [December 22nd, 2024]
- Identification of hub genes, diagnostic model, and immune infiltration in preeclampsia by integrated bioinformatics analysis and machine learning -... - December 22nd, 2024 [December 22nd, 2024]
- This AI Paper from Microsoft and Novartis Introduces Chimera: A Machine Learning Framework for Accurate and Scalable Retrosynthesis Prediction -... - December 18th, 2024 [December 18th, 2024]
- Benefits and Challenges of Integrating AI and Machine Learning into EHR Systems - Healthcare IT Today - December 18th, 2024 [December 18th, 2024]
- The History Of AI: How Machine Learning's Evolution Is Reshaping Everything Around Us - SlashGear - December 18th, 2024 [December 18th, 2024]
- AI and Machine Learning to Enhance Pension Plan Governance and the Investor Experience: New CFA Institute Research - Fintech Finance - December 18th, 2024 [December 18th, 2024]
- Address Common Machine Learning Challenges With Managed MLflow - The New Stack - December 18th, 2024 [December 18th, 2024]
- Machine Learning Used To Classify Fossils Of Extinct Pollen - Offworld Astrobiology Applications? - Astrobiology News - December 18th, 2024 [December 18th, 2024]
- Machine learning model predicts CDK4/6 inhibitor effectiveness in metastatic breast cancer - News-Medical.Net - December 18th, 2024 [December 18th, 2024]
- New Lockheed Martin Subsidiary to Offer Machine Learning Tools to Defense Customers - ExecutiveBiz - December 18th, 2024 [December 18th, 2024]
- How Powerful Will AI and Machine Learning Become? - International Policy Digest - December 18th, 2024 [December 18th, 2024]
- ChatGPT-Assisted Machine Learning for Chronic Disease Classification and Prediction: A Developmental and Validation Study - Cureus - December 18th, 2024 [December 18th, 2024]
- Blood Tests Are Far From Perfect But Machine Learning Could Change That - Inverse - December 18th, 2024 [December 18th, 2024]
- Amazons AGI boss: You dont need a PhD in machine learning to build with AI anymore - Fortune - December 18th, 2024 [December 18th, 2024]
- From Novice to Pro: A Roadmap for Your Machine Learning Career - KDnuggets - December 10th, 2024 [December 10th, 2024]
- Dimension nabs $500M second fund for 'still contrary' intersection of bio and machine learning - Endpoints News - December 10th, 2024 [December 10th, 2024]
- Using Machine Learning to Make A Really Big Detailed Simulation - Astrobites - December 10th, 2024 [December 10th, 2024]
- Driving Business Growth with GreenTomatos Data and Machine Learning Strategy on Generative AI - AWS Blog - December 10th, 2024 [December 10th, 2024]
- Unlocking the power of data analytics and machine learning to drive business performance - WTW - December 10th, 2024 [December 10th, 2024]
- AI and the Ethics of Machine Learning | by Abwahabanjum | Dec, 2024 - Medium - December 10th, 2024 [December 10th, 2024]
- Differentiating Cystic Lesions in the Sellar Region of the Brain Using Artificial Intelligence and Machine Learning for Early Diagnosis: A Prospective... - December 10th, 2024 [December 10th, 2024]
- New Amazon SageMaker AI Innovations Reimagine How Customers Build and Scale Generative AI and Machine Learning Models - Amazon Press Release - December 10th, 2024 [December 10th, 2024]
- What is Machine Learning? 18 Crucial Concepts in AI, ML, and LLMs - Netguru - December 5th, 2024 [December 5th, 2024]
- Machine learning-based prediction of antibiotic resistance in Mycobacterium tuberculosis clinical isolates from Uganda - BMC Infectious Diseases - December 5th, 2024 [December 5th, 2024]
- Interdisciplinary Team Needed to Apply Machine Learning in Epilepsy Surgery: Lara Jehi, MD, MHCDS - Neurology Live - December 5th, 2024 [December 5th, 2024]
- A multimodal machine learning model for the stratification of breast cancer risk - Nature.com - December 5th, 2024 [December 5th, 2024]
- Machine learning based intrusion detection framework for detecting security attacks in internet of things - Nature.com - December 5th, 2024 [December 5th, 2024]
- Machine learning evaluation of a hypertension screening program in a university workforce over five years - Nature.com - December 5th, 2024 [December 5th, 2024]
- Vaultree Introduces VENum Stack: Combining the Power of Machine Learning and Encrypted Data Processing for Secure Innovation - PR Newswire - December 5th, 2024 [December 5th, 2024]
- Direct simulation and machine learning structure identification unravel soft martensitic transformation and twinning dynamics - pnas.org - December 5th, 2024 [December 5th, 2024]
- AI and Machine Learning - Maryland to use AI technology to manage traffic flow - SmartCitiesWorld - December 5th, 2024 [December 5th, 2024]
- Researchers make machine learning breakthrough in lithium-ion tech here's how it could make aging batteries safer - Yahoo! Voices - December 5th, 2024 [December 5th, 2024]
- Integrating IoT and machine learning: Benefits and use cases - TechTarget - December 5th, 2024 [December 5th, 2024]
- Landsat asks industry for artificial intelligence (AI) and machine learning for satellite operations - Military & Aerospace Electronics - December 5th, 2024 [December 5th, 2024]
- Machine learning optimized efficient graphene-based ultra-broadband solar absorber for solar thermal applications - Nature.com - December 5th, 2024 [December 5th, 2024]
- Polymathic AI Releases The Well: 15TB of Machine Learning Datasets Containing Numerical Simulations of a Wide Variety of Spatiotemporal Physical... - December 5th, 2024 [December 5th, 2024]
- Prediction of preterm birth using machine learning: a comprehensive analysis based on large-scale preschool children survey data in Shenzhen of China... - December 5th, 2024 [December 5th, 2024]
- Application of machine learning algorithms to identify serological predictors of COVID-19 severity and outcomes - Nature.com - November 30th, 2024 [November 30th, 2024]
- Predicting the time to get back to work using statistical models and machine learning approaches - BMC Medical Research Methodology - November 30th, 2024 [November 30th, 2024]
- AI and Machine Learning - US releases recommendations for use of AI in critical infrastructure - SmartCitiesWorld - November 30th, 2024 [November 30th, 2024]
- Machine learning-based diagnostic model for stroke in non-neurological intensive care unit patients with acute neurological manifestations -... - November 28th, 2024 [November 28th, 2024]
- Analysis of four long non-coding RNAs for hepatocellular carcinoma screening and prognosis by the aid of machine learning techniques - Nature.com - November 28th, 2024 [November 28th, 2024]
- Evaluation and prediction of the physical properties and quality of Jatob-do-Cerrado seeds processed and stored in different conditions using machine... - November 28th, 2024 [November 28th, 2024]
- Researchers use fitness tracker data and machine learning to detect bipolar disorder mood swings - Medical Xpress - November 28th, 2024 [November 28th, 2024]
- Advances in AI and Machine Learning for Nuclear Applications - Frontiers - November 28th, 2024 [November 28th, 2024]
- Researchers make machine learning breakthrough in lithium-ion tech here's how it could make aging batteries safer - The Cool Down - November 28th, 2024 [November 28th, 2024]
- Svitla Systems Publishes Results of the Study on Machine Learning's Role in Credit Scoring - Newsfile - November 28th, 2024 [November 28th, 2024]
- Predicting poor performance on cognitive tests among older adults using wearable device data and machine learning: a feasibility study - Nature.com - November 28th, 2024 [November 28th, 2024]
- Quantum Machine Learning: Bridging the Future of AI and Quantum Computing - TechBullion - November 28th, 2024 [November 28th, 2024]
- AI and machine learning trends in healthcare - Healthcare Leader - November 28th, 2024 [November 28th, 2024]
- Identification of biomarkers for the diagnosis in colorectal polyps and metabolic dysfunction-associated steatohepatitis (MASH) by bioinformatics... - November 28th, 2024 [November 28th, 2024]
- Revolutionizing Business Systems with Machine Learning: Practical Innovations for the Modern Era - TechBullion - November 28th, 2024 [November 28th, 2024]
- Can AI improve plant-based meats? Using mechanical testing and machine learning to mimic the sensory experience - Phys.org - November 16th, 2024 [November 16th, 2024]
- Machine Learning Reveals Impact of Microbial Load on Gut Health and Disease - Genetic Engineering & Biotechnology News - November 16th, 2024 [November 16th, 2024]
- Machine learning for predicting in-hospital mortality in elderly patients with heart failure combined with hypertension: a multicenter retrospective... - November 16th, 2024 [November 16th, 2024]
- Apple Researchers Propose Cut Cross-Entropy (CCE): A Machine Learning Method that Computes the Cross-Entropy Loss without Materializing the Logits for... - November 16th, 2024 [November 16th, 2024]
- Exploring electron-beam induced modifications of materials with machine-learning assisted high temporal resolution electron microscopy - Nature.com - November 16th, 2024 [November 16th, 2024]
- Facilitated the discovery of new / Co-based superalloys by combining first-principles and machine learning - Nature.com - November 16th, 2024 [November 16th, 2024]
- Thwarting Phishing Attacks with Predictive Analytics and Machine Learning in 2024 - Petri.com - November 16th, 2024 [November 16th, 2024]
- Optoelectronic performance prediction of HgCdTe homojunction photodetector in long wave infrared spectral region using traditional simulations and... - November 16th, 2024 [November 16th, 2024]
- A new approach for sex prediction by evaluating mandibular arch and canine dimensions with machine-learning classifiers and intraoral scanners (a... - November 16th, 2024 [November 16th, 2024]