Is fake data the real deal when training algorithms? – The Guardian
Youre at the wheel of your car but youre exhausted. Your shoulders start to sag, your neck begins to droop, your eyelids slide down. As your head pitches forward, you swerve off the road and speed through a field, crashing into a tree.
But what if your cars monitoring system recognised the tell-tale signs of drowsiness and prompted you to pull off the road and park instead? The European Commission has legislated that from this year, new vehicles be fitted with systems to catch distracted and sleepy drivers to help avert accidents. Now a number of startups are training artificial intelligence systems to recognise the giveaways in our facial expressions and body language.
These companies are taking a novel approach for the field of AI. Instead of filming thousands of real-life drivers falling asleep and feeding that information into a deep-learning model to learn the signs of drowsiness, theyre creating millions of fake human avatars to re-enact the sleepy signals.
Big data defines the field of AI for a reason. To train deep learning algorithms accurately, the models need to have a multitude of data points. That creates problems for a task such as recognising a person falling asleep at the wheel, which would be difficult and time-consuming to film happening in thousands of cars. Instead, companies have begun building virtual datasets.
Synthesis AI and Datagen are two companies using full-body 3D scans, including detailed face scans, and motion data captured by sensors placed all over the body, to gather raw data from real people. This data is fed through algorithms that tweak various dimensions many times over to create millions of 3D representations of humans, resembling characters in a video game, engaging in different behaviours across a variety of simulations.
In the case of someone falling asleep at the wheel, they might film a human performer falling asleep and combine it with motion capture, 3D animations and other techniques used to create video games and animated movies, to build the desired simulation. You can map [the target behaviour] across thousands of different body types, different angles, different lighting, and add variability into the movement as well, says Yashar Behzadi, CEO of Synthesis AI.
Using synthetic data cuts out a lot of the messiness of the more traditional way to train deep learning algorithms. Typically, companies would have to amass a vast collection of real-life footage and low-paid workers would painstakingly label each of the clips. These would be fed into the model, which would learn how to recognise the behaviours.
The big sell for the synthetic data approach is that its quicker and cheaper by a wide margin. But these companies also claim it can help tackle the bias that creates a huge headache for AI developers. Its well documented that some AI facial recognition software is poor at recognising and correctly identifying particular demographic groups. This tends to be because these groups are underrepresented in the training data, meaning the software is more likely to misidentify these people.
Niharika Jain, a software engineer and expert in gender and racial bias in generative machine learning, highlights the notorious example of Nikon Coolpixs blink detection feature, which, because the training data included a majority of white faces, disproportionately judged Asian faces to be blinking. A good driver-monitoring system must avoid misidentifying members of a certain demographic as asleep more often than others, she says.
The typical response to this problem is to gather more data from the underrepresented groups in real-life settings. But companies such as Datagen say this is no longer necessary. The company can simply create more faces from the underrepresented groups, meaning theyll make up a bigger proportion of the final dataset. Real 3D face scan data from thousands of people is whipped up into millions of AI composites. Theres no bias baked into the data; you have full control of the age, gender and ethnicity of the people that youre generating, says Gil Elbaz, co-founder of Datagen. The creepy faces that emerge dont look like real people, but the company claims that theyre similar enough to teach AI systems how to respond to real people in similar scenarios.
There is, however, some debate over whether synthetic data can really eliminate bias. Bernease Herman, a data scientist at the University of Washington eScience Institute, says that although synthetic data can improve the robustness of facial recognition models on underrepresented groups, she does not believe that synthetic data alone can close the gap between the performance on those groups and others. Although the companies sometimes publish academic papers showcasing how their algorithms work, the algorithms themselves are proprietary, so researchers cannot independently evaluate them.
In areas such as virtual reality, as well as robotics, where 3D mapping is important, synthetic data companies argue it could actually be preferable to train AI on simulations, especially as 3D modelling, visual effects and gaming technologies improve. Its only a matter of time until you can create these virtual worlds and train your systems completely in a simulation, says Behzadi.
This kind of thinking is gaining ground in the autonomous vehicle industry, where synthetic data is becoming instrumental in teaching self-driving vehicles AI how to navigate the road. The traditional approach filming hours of driving footage and feeding this into a deep learning model was enough to get cars relatively good at navigating roads. But the issue vexing the industry is how to get cars to reliably handle what are known as edge cases events that are rare enough that they dont appear much in millions of hours of training data. For example, a child or dog running into the road, complicated roadworks or even some traffic cones placed in an unexpected position, which was enough to stump a driverless Waymo vehicle in Arizona in 2021.
With synthetic data, companies can create endless variations of scenarios in virtual worlds that rarely happen in the real world. Instead of waiting millions more miles to accumulate more examples, they can artificially generate as many examples as they need of the edge case for training and testing, says Phil Koopman, associate professor in electrical and computer engineering at Carnegie Mellon University.
AV companies such as Waymo, Cruise and Wayve are increasingly relying on real-life data combined with simulated driving in virtual worlds. Waymo has created a simulated world using AI and sensor data collected from its self-driving vehicles, complete with artificial raindrops and solar glare. It uses this to train vehicles on normal driving situations, as well as the trickier edge cases. In 2021, Waymo told the Verge that it had simulated 15bn miles of driving, versus a mere 20m miles of real driving.
An added benefit to testing autonomous vehicles out in virtual worlds first is minimising the chance of very real accidents. A large reason self-driving is at the forefront of a lot of the synthetic data stuff is fault tolerance, says Herman. A self-driving car making a mistake 1% of the time, or even 0.01% of the time, is probably too much.
In 2017, Volvos self-driving technology, which had been taught how to respond to large North American animals such as deer, was baffled when encountering kangaroos for the first time in Australia. If a simulator doesnt know about kangaroos, no amount of simulation will create one until it is seen in testing and designers figure out how to add it, says Koopman. For Aaron Roth, professor of computer and cognitive science at the University of Pennsylvania, the challenge will be to create synthetic data that is indistinguishable from real data. He thinks it is plausible that were at that point for face data, as computers can now generate photorealistic images of faces. But for a lot of other things, which may or may not include kangaroos I dont think that were there yet.
Excerpt from:
Is fake data the real deal when training algorithms? - The Guardian
- How machine learning and AI can be harnessed for mission-based lending - ImpactAlpha - January 27th, 2025 [January 27th, 2025]
- Machine learning meta-analysis identifies individual characteristics moderating cognitive intervention efficacy for anxiety and depression symptoms -... - January 27th, 2025 [January 27th, 2025]
- Using robotics to introduce AI and machine learning concepts into the elementary classroom - George Mason University - January 27th, 2025 [January 27th, 2025]
- Machine learning to identify environmental drivers of phytoplankton blooms in the Southern Baltic Sea - Nature.com - January 27th, 2025 [January 27th, 2025]
- Why Most Machine Learning Projects Fail to Reach Production and How to Beat the Odds - InfoQ.com - January 27th, 2025 [January 27th, 2025]
- Exploring the intersection of AI and climate physics: Machine learning's role in advancing climate science - Phys.org - January 27th, 2025 [January 27th, 2025]
- 5 Questions with Jonah Berger: Using Artificial Intelligence and Machine Learning in Litigation - Cornerstone Research - January 27th, 2025 [January 27th, 2025]
- Modernizing Patient Support: Harnessing Advanced Automation, Artificial Intelligence and Machine Learning to Improve Efficiency and Performance of... - January 27th, 2025 [January 27th, 2025]
- Param Popat Leads the Way in Transforming Machine Learning Systems - Tech Times - January 27th, 2025 [January 27th, 2025]
- Research on noise-induced hearing loss based on functional and structural MRI using machine learning methods - Nature.com - January 27th, 2025 [January 27th, 2025]
- Machine learning is bringing back an infamous pseudoscience used to fuel racism - ZME Science - January 27th, 2025 [January 27th, 2025]
- How AI and Machine Learning are Redefining Customer Experience Management - Customer Think - January 27th, 2025 [January 27th, 2025]
- Machine Learning Data Catalog Software Market Strategic Insights and Key Innovations: Leading Companies and... - WhaTech - January 27th, 2025 [January 27th, 2025]
- How AI and Machine Learning Will Influence Fintech Frontend Development in 2025 - Benzinga - January 27th, 2025 [January 27th, 2025]
- The Nvidia AI interview: Inside DLSS 4 and machine learning with Bryan Catanzaro - Eurogamer - January 22nd, 2025 [January 22nd, 2025]
- The wide use of machine learning VFX techniques on Here - befores & afters - January 22nd, 2025 [January 22nd, 2025]
- .NET Core: Pioneering the Future of AI and Machine Learning - TechBullion - January 22nd, 2025 [January 22nd, 2025]
- Development and validation of a machine learning-based prediction model for hepatorenal syndrome in liver cirrhosis patients using MIMIC-IV and eICU... - January 22nd, 2025 [January 22nd, 2025]
- A comparative study on different machine learning approaches with periodic items for the forecasting of GPS satellites clock bias - Nature.com - January 22nd, 2025 [January 22nd, 2025]
- Machine learning based prediction models for the prognosis of COVID-19 patients with DKA - Nature.com - January 22nd, 2025 [January 22nd, 2025]
- A scoping review of robustness concepts for machine learning in healthcare - Nature.com - January 22nd, 2025 [January 22nd, 2025]
- How AI and machine learning led to mind blowing progress in understanding animal communication - WHYY - January 22nd, 2025 [January 22nd, 2025]
- 3 Predictions For Predictive AI In 2025 - The Machine Learning Times - January 22nd, 2025 [January 22nd, 2025]
- AI and Machine Learning - WEF report offers practical steps for inclusive AI adoption - SmartCitiesWorld - January 22nd, 2025 [January 22nd, 2025]
- Learnings from a Machine Learning Engineer Part 3: The Evaluation | by David Martin | Jan, 2025 - Towards Data Science - January 22nd, 2025 [January 22nd, 2025]
- Google AI Research Introduces Titans: A New Machine Learning Architecture with Attention and a Meta in-Context Memory that Learns How to Memorize at... - January 22nd, 2025 [January 22nd, 2025]
- Improving BrainMachine Interfaces with Machine Learning ... - eeNews Europe - January 22nd, 2025 [January 22nd, 2025]
- Powered by machine learning, a new blood test can enable early detection of multiple cancers - Medical Xpress - January 15th, 2025 [January 15th, 2025]
- Mapping the Edges of Mass Spectral Prediction: Evaluation of Machine Learning EIMS Prediction for Xeno Amino Acids - Astrobiology News - January 15th, 2025 [January 15th, 2025]
- Development of an interpretable machine learning model based on CT radiomics for the prediction of post acute pancreatitis diabetes mellitus -... - January 15th, 2025 [January 15th, 2025]
- Understanding the spread of agriculture in the Western Mediterranean (6th-3rd millennia BC) with Machine Learning tools - Nature.com - January 15th, 2025 [January 15th, 2025]
- "From 'Food Rules' to Food Reality: Machine Learning Unveils the Ultra-Processed Truth in Our Grocery Carts" - American Council on Science... - January 15th, 2025 [January 15th, 2025]
- AI and Machine Learning in Business Market is Predicted to Reach $190.5 Billion at a CAGR of 32% by 2032 - EIN News - January 15th, 2025 [January 15th, 2025]
- QT Imaging Holdings Introduces Machine Learning-Enabled Image Interpolation Algorithm to Substantially Reduce Scan Time - Business Wire - January 15th, 2025 [January 15th, 2025]
- Global Tiny Machine Learning (TinyML) Market to Reach USD 3.4 Billion by 2030 - Key Drivers and Opportunities | Valuates Reports - PR Newswire UK - January 15th, 2025 [January 15th, 2025]
- Machine learning in mental health getting better all the time - Nature.com - January 15th, 2025 [January 15th, 2025]
- Signature-based intrusion detection using machine learning and deep learning approaches empowered with fuzzy clustering - Nature.com - January 15th, 2025 [January 15th, 2025]
- Machine learning and multi-omics in precision medicine for ME/CFS - Journal of Translational Medicine - January 15th, 2025 [January 15th, 2025]
- Exploring the influence of age on the causes of death in advanced nasopharyngeal carcinoma patients undergoing chemoradiotherapy using machine... - January 15th, 2025 [January 15th, 2025]
- 3D Shape Tokenization - Apple Machine Learning Research - January 9th, 2025 [January 9th, 2025]
- Machine Learning Used To Create Scalable Solution for Single-Cell Analysis - Technology Networks - January 9th, 2025 [January 9th, 2025]
- Robotics: machine learning paves the way for intuitive robots - Hello Future - January 9th, 2025 [January 9th, 2025]
- Machine learning-based estimation of crude oil-nitrogen interfacial tension - Nature.com - January 9th, 2025 [January 9th, 2025]
- Machine learning Nomogram for Predicting endometrial lesions after tamoxifen therapy in breast Cancer patients - Nature.com - January 9th, 2025 [January 9th, 2025]
- Staying ahead of the automation, AI and machine learning curve - Creamer Media's Engineering News - January 9th, 2025 [January 9th, 2025]
- Machine Learning and Quantum Computing Predict Which Antibiotic To Prescribe for UTIs - Consult QD - January 9th, 2025 [January 9th, 2025]
- Machine Learning, Innovation, And The Future Of AI: A Conversation With Manoj Bhoyar - International Business Times UK - January 9th, 2025 [January 9th, 2025]
- AMD's FSR 4 will use machine learning but requires an RDNA 4 GPU, promises 'a dramatic improvement in terms of performance and quality' - PC Gamer - January 9th, 2025 [January 9th, 2025]
- Explainable artificial intelligence with UNet based segmentation and Bayesian machine learning for classification of brain tumors using MRI images -... - January 9th, 2025 [January 9th, 2025]
- Understanding the Fundamentals of AI and Machine Learning - Nairobi Wire - January 9th, 2025 [January 9th, 2025]
- Machine learning can help blood tests have a separate normal for each patient - The Hindu - January 1st, 2025 [January 1st, 2025]
- Artificial Intelligence and Machine Learning Programs Introduced this Spring - The Flash Today - January 1st, 2025 [January 1st, 2025]
- Virtual reality-assisted prediction of adult ADHD based on eye tracking, EEG, actigraphy and behavioral indices: a machine learning analysis of... - January 1st, 2025 [January 1st, 2025]
- Open source machine learning systems are highly vulnerable to security threats - TechRadar - December 22nd, 2024 [December 22nd, 2024]
- After the PS5 Pro's less dramatic changes, PlayStation architect Mark Cerny says the next-gen will focus more on CPUs, memory, and machine-learning -... - December 22nd, 2024 [December 22nd, 2024]
- Accelerating LLM Inference on NVIDIA GPUs with ReDrafter - Apple Machine Learning Research - December 22nd, 2024 [December 22nd, 2024]
- Machine learning for the prediction of mortality in patients with sepsis-associated acute kidney injury: a systematic review and meta-analysis - BMC... - December 22nd, 2024 [December 22nd, 2024]
- Machine learning uncovers three osteosarcoma subtypes for targeted treatment - Medical Xpress - December 22nd, 2024 [December 22nd, 2024]
- From Miniatures to Machine Learning: Crafting the VFX of Alien: Romulus - Animation World Network - December 22nd, 2024 [December 22nd, 2024]
- Identification of hub genes, diagnostic model, and immune infiltration in preeclampsia by integrated bioinformatics analysis and machine learning -... - December 22nd, 2024 [December 22nd, 2024]
- This AI Paper from Microsoft and Novartis Introduces Chimera: A Machine Learning Framework for Accurate and Scalable Retrosynthesis Prediction -... - December 18th, 2024 [December 18th, 2024]
- Benefits and Challenges of Integrating AI and Machine Learning into EHR Systems - Healthcare IT Today - December 18th, 2024 [December 18th, 2024]
- The History Of AI: How Machine Learning's Evolution Is Reshaping Everything Around Us - SlashGear - December 18th, 2024 [December 18th, 2024]
- AI and Machine Learning to Enhance Pension Plan Governance and the Investor Experience: New CFA Institute Research - Fintech Finance - December 18th, 2024 [December 18th, 2024]
- Address Common Machine Learning Challenges With Managed MLflow - The New Stack - December 18th, 2024 [December 18th, 2024]
- Machine Learning Used To Classify Fossils Of Extinct Pollen - Offworld Astrobiology Applications? - Astrobiology News - December 18th, 2024 [December 18th, 2024]
- Machine learning model predicts CDK4/6 inhibitor effectiveness in metastatic breast cancer - News-Medical.Net - December 18th, 2024 [December 18th, 2024]
- New Lockheed Martin Subsidiary to Offer Machine Learning Tools to Defense Customers - ExecutiveBiz - December 18th, 2024 [December 18th, 2024]
- How Powerful Will AI and Machine Learning Become? - International Policy Digest - December 18th, 2024 [December 18th, 2024]
- ChatGPT-Assisted Machine Learning for Chronic Disease Classification and Prediction: A Developmental and Validation Study - Cureus - December 18th, 2024 [December 18th, 2024]
- Blood Tests Are Far From Perfect But Machine Learning Could Change That - Inverse - December 18th, 2024 [December 18th, 2024]
- Amazons AGI boss: You dont need a PhD in machine learning to build with AI anymore - Fortune - December 18th, 2024 [December 18th, 2024]
- From Novice to Pro: A Roadmap for Your Machine Learning Career - KDnuggets - December 10th, 2024 [December 10th, 2024]
- Dimension nabs $500M second fund for 'still contrary' intersection of bio and machine learning - Endpoints News - December 10th, 2024 [December 10th, 2024]
- Using Machine Learning to Make A Really Big Detailed Simulation - Astrobites - December 10th, 2024 [December 10th, 2024]
- Driving Business Growth with GreenTomatos Data and Machine Learning Strategy on Generative AI - AWS Blog - December 10th, 2024 [December 10th, 2024]
- Unlocking the power of data analytics and machine learning to drive business performance - WTW - December 10th, 2024 [December 10th, 2024]
- AI and the Ethics of Machine Learning | by Abwahabanjum | Dec, 2024 - Medium - December 10th, 2024 [December 10th, 2024]
- Differentiating Cystic Lesions in the Sellar Region of the Brain Using Artificial Intelligence and Machine Learning for Early Diagnosis: A Prospective... - December 10th, 2024 [December 10th, 2024]
- New Amazon SageMaker AI Innovations Reimagine How Customers Build and Scale Generative AI and Machine Learning Models - Amazon Press Release - December 10th, 2024 [December 10th, 2024]