Machine learning results: pay attention to what you don’t see – STAT
Even as machine learning and artificial intelligence are drawing substantial attention in health care, overzealousness for these technologies has created an environment in which other critical aspects of the research are often overlooked.
There’s no question that the increasing availability of large data sources and off-the-shelf machine learning tools offers tremendous resources to researchers. Yet a lack of understanding about the limitations of both the data and the algorithms can lead to erroneous or unsupported conclusions.
Given that machine learning in the health domain can have a direct impact on people’s lives, broad claims emerging from this kind of research should not be embraced without serious vetting. Whether conducting health care research or reading about it, make sure to consider what you don’t see in the data and analyses.
One key question to ask is: Whose information is in the data and what do these data reflect?
Common forms of electronic health data, such as billing claims and clinical records, contain information only on individuals who have encounters with the health care system. But many individuals who are sick don’t or can’t see a doctor or other health care provider and so are invisible in these databases. This may be true for individuals with lower incomes or those who live in rural communities with rising hospital closures. As University of Toronto machine learning professor Marzyeh Ghassemi said earlier this year:
Even among patients who do visit their doctors, health conditions are not consistently recorded. Health data also reflect structural racism, which has devastating consequences.
Data from randomized trials are not immune to these issues. As a ProPublica report demonstrated, black and Native American patients are drastically underrepresented in cancer clinical trials. This is important to underscore given that randomized trials are frequently highlighted as superior in discussions about machine learning work that leverages nonrandomized electronic health data.
In interpreting results from machine learning research, it’s important to be aware that the patients in a study often do not represent the population we wish to draw conclusions about and that the information collected is far from complete.
It has become commonplace to evaluate machine learning algorithms based on overall measures like accuracy or area under the curve. However, one evaluation metric cannot capture the complexity of performance. Be wary of research that claims to be ready for translation into clinical practice but only presents a leader board of tools that are ranked based on a single metric.
As an extreme illustration, an algorithm designed to predict a rare condition found in only 1% of the population can be extremely accurate by labeling all individuals as not having the condition. This tool is 99% accurate, but completely useless. Yet, it may outperform other algorithms if accuracy is considered in isolation.
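The arithmetic behind this illustration can be checked in a few lines. The following is a minimal sketch, assuming a hypothetical population of 1,000 people in which exactly 1% have the condition; the function names and the choice of sensitivity as the second metric are illustrative, not taken from any particular study.

```python
# Hypothetical population: 1,000 people, 1% of whom have the condition (label 1).
y_true = [1] * 10 + [0] * 990

# A "tool" that labels every individual as not having the condition.
y_pred = [0] * len(y_true)


def accuracy(truth, pred):
    """Share of individuals whose label is predicted correctly."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def sensitivity(truth, pred):
    """Share of true cases that are detected (assumes at least one true case)."""
    true_positives = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    return true_positives / sum(truth)


acc = accuracy(y_true, y_pred)
sens = sensitivity(y_true, y_pred)
print(f"accuracy = {acc:.2f}, sensitivity = {sens:.2f}")
```

Reporting sensitivity alongside accuracy makes the problem visible: the tool is right about 99% of individuals yet identifies none of the people who actually have the condition.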
What’s more, algorithms are frequently not evaluated based on multiple hold-out samples in cross-validation. Using only a single hold-out sample, which is done in many published papers, often leads to higher variance and misleading metric performance.
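A small simulation can show why a single hold-out sample is a noisier basis for evaluation. This is only a sketch under stated assumptions: the data are synthetic (one noisy feature, 100 individuals per class), the "model" is a toy threshold rule, and names such as `fit_threshold`, `single_holdout`, and `k_fold` are hypothetical. It compares how much the estimated accuracy moves around across 50 random reshuffles of the same data.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Synthetic data (an assumption): one noisy feature, 100 individuals per class.
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(100)] + \
       [(rng.gauss(1.5, 1.0), 1) for _ in range(100)]


def fit_threshold(train):
    """Toy model: a cut-off halfway between the two class means."""
    mean0 = statistics.mean([x for x, y in train if y == 0])
    mean1 = statistics.mean([x for x, y in train if y == 1])
    return (mean0 + mean1) / 2


def accuracy(threshold, test):
    """Share of test rows classified correctly by the cut-off."""
    return sum((x > threshold) == (y == 1) for x, y in test) / len(test)


def single_holdout(rows, test_size=40):
    """Train on the remaining rows, evaluate once on one hold-out sample."""
    return accuracy(fit_threshold(rows[test_size:]), rows[:test_size])


def k_fold(rows, k=5):
    """Average accuracy over k hold-out samples (k-fold cross-validation)."""
    fold = len(rows) // k
    scores = []
    for i in range(k):
        test = rows[i * fold:(i + 1) * fold]
        train = rows[:i * fold] + rows[(i + 1) * fold:]
        scores.append(accuracy(fit_threshold(train), test))
    return statistics.mean(scores)


holdout_estimates, cv_estimates = [], []
for _ in range(50):
    rows = data[:]
    rng.shuffle(rows)  # a different random split on each repetition
    holdout_estimates.append(single_holdout(rows))
    cv_estimates.append(k_fold(rows))

holdout_spread = statistics.stdev(holdout_estimates)
cv_spread = statistics.stdev(cv_estimates)
print(f"single hold-out spread: {holdout_spread:.3f}")
print(f"5-fold CV spread:       {cv_spread:.3f}")
```

In this toy setting the single hold-out estimate depends heavily on which 40 individuals happen to land in the test sample, while the cross-validated average uses every individual for testing once and so varies far less between reshuffles.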
Beyond examining multiple overall metrics of performance for machine learning, we should also assess how tools perform in subgroups as a step toward avoiding bias and discrimination. For example, artificial intelligence-based facial recognition software performed poorly when analyzing darker-skinned women. Many measures of algorithmic fairness center on performance in subgroups.
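Checking performance by subgroup can be as simple as splitting the evaluation records before computing a metric. The sketch below uses made-up records for two hypothetical subgroups, "A" and "B"; the group labels, counts, and the `subgroup_accuracy` helper are assumptions for illustration only and do not come from any real study.

```python
from collections import defaultdict

# Hypothetical evaluation records: (subgroup, true label, predicted label).
records = (
    [("A", 1, 1)] * 45 + [("A", 0, 0)] * 45   # group A: 90 correct
    + [("A", 1, 0)] * 5 + [("A", 0, 1)] * 5   # group A: 10 incorrect
    + [("B", 1, 1)] * 3 + [("B", 0, 0)] * 3   # group B: 6 correct
    + [("B", 1, 0)] * 2 + [("B", 0, 1)] * 2   # group B: 4 incorrect
)


def subgroup_accuracy(rows):
    """Return accuracy computed separately for each subgroup."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, truth, pred in rows:
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


overall = sum(t == p for _, t, p in records) / len(records)
by_group = subgroup_accuracy(records)

print(f"overall accuracy: {overall:.2f}")
for group, acc in sorted(by_group.items()):
    print(f"group {group} accuracy: {acc:.2f}")
```

Because group B contributes only 10 of the 110 records, the overall figure sits close to group A's accuracy and hides the much weaker performance for group B. Formal fairness measures typically go further than this, for example by comparing error rates rather than accuracy alone.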
Bias in algorithms has largely not been a focus in health care research. That needs to change. A new study found substantial racial bias against black patients in a commercial algorithm used by many hospitals and other health care systems. Other work developed algorithms to improve fairness for subgroups in health care spending formulas.
Subjective decision-making pervades research. Who decides what the research question will be, which methods will be applied to answering it, and how the techniques will be assessed all matter. Diverse teams are needed not just because they yield better results. As Rediet Abebe, a junior fellow of Harvard’s Society of Fellows, has written, “In both private enterprise and the public sector, research must be reflective of the society we’re serving.”
The influx of so-called digital data that’s available through search engines and social media may be one resource for understanding the health of individuals who do not have encounters with the health care system. There have, however, been notable failures with these data. But there are also promising advances using online search queries at scale where traditional approaches like conducting surveys would be infeasible.
Increasingly granular data are now becoming available thanks to wearable technologies such as Fitbit trackers and Apple Watches. Researchers are actively developing and applying techniques to summarize the information gleaned from these devices for prevention efforts.
Much of the published clinical machine learning research, however, focuses on predicting outcomes or discovering patterns. Although machine learning for causal questions in health and biomedicine is a rapidly growing area, we don’t see a lot of this work yet because it is new. Recent examples of it include the comparative effectiveness of feeding interventions in a pediatric intensive care unit and the effectiveness of different types of drug-eluting coronary artery stents.
Understanding how the data were collected and using appropriate evaluation metrics will also be crucial for studies that incorporate novel data sources and those attempting to establish causality.
In our drive to improve health with (and without) machine learning, we must not forget to look for what is missing: What information do we not have about the underlying health care system? Why might an individual or a code be unobserved? What subgroups have not been prioritized? Who is on the research team?
Giving these questions a place at the table will be the only way to see the whole picture.
Sherri Rose, Ph.D., is associate professor of health care policy at Harvard Medical School and co-author of the first book on machine learning for causal inference, Targeted Learning (Springer, 2011).