Machine learning results: pay attention to what you don’t see – STAT
Even as machine learning and artificial intelligence are drawing substantial attention in health care, overzealousness for these technologies has created an environment in which other critical aspects of the research are often overlooked.
Theres no question that the increasing availability of large data sources and off-the-shelf machine learning tools offer tremendous resources to researchers. Yet a lack of understanding about the limitations of both the data and the algorithms can lead to erroneous or unsupported conclusions.
Given that machine learning in the health domain can have a direct impact on peoples lives, broad claims emerging from this kind of research should not be embraced without serious vetting. Whether conducting health care research or reading about it, make sure to consider what you dont see in the data and analyses.
advertisement
One key question to ask is: Whose information is in the data and what do these data reflect?
Common forms of electronic health data, such as billing claims and clinical records, contain information only on individuals who have encounters with the health care system. But many individuals who are sick dont or cant see a doctor or other health care provider and so are invisible in these databases. This may be true for individuals with lower incomes or those who live in rural communities with rising hospital closures. As University of Toronto machine learning professor Marzyeh Ghassemi said earlier this year:
Even among patients who do visit their doctors, health conditions are not consistently recorded. Health data also reflect structural racism, which has devastating consequences.
Data from randomized trials are not immune to these issues. As a ProPublica report demonstrated, black and Native American patients are drastically underrepresented in cancer clinical trials. This is important to underscore given that randomized trials are frequently highlighted as superior in discussions about machine learning work that leverages nonrandomized electronic health data.
In interpreting results from machine learning research, its important to be aware that the patients in a study often do not depict the population we wish to make conclusions about and that the information collected is far from complete.
It has become commonplace to evaluate machine learning algorithms based on overall measures like accuracy or area under the curve. However, one evaluation metric cannot capture the complexity of performance. Be wary of research that claims to be ready for translation into clinical practice but only presents a leader board of tools that are ranked based on a single metric.
As an extreme illustration, an algorithm designed to predict a rare condition found in only 1% of the population can be extremely accurate by labeling all individuals as not having the condition. This tool is 99% accurate, but completely useless. Yet, it may outperform other algorithms if accuracy is considered in isolation.
Whats more, algorithms are frequently not evaluated based on multiple hold-out samples in cross-validation. Using only a single hold-out sample, which is done in many published papers, often leads to higher variance and misleading metric performance.
Beyond examining multiple overall metrics of performance for machine learning, we should also assess how tools perform in subgroups as a step toward avoiding bias and discrimination. For example, artificial intelligence-based facial recognition software performed poorly when analyzing darker-skinned women. Many measures of algorithmic fairness center on performance in subgroups.
Bias in algorithms has largely not been a focus in health care research. That needs to change. A new study found substantial racial bias against black patients in a commercial algorithm used by many hospitals and other health care systems. Other work developed algorithms to improve fairness for subgroups in health care spending formulas.
Subjective decision-making pervades research. Who decides what the research question will be, which methods will be applied to answering it, and how the techniques will be assessed all matter. Diverse teams are needed not just because they yield better results. As Rediet Abebe, a junior fellow of Harvards Society of Fellows, has written, In both private enterprise and the public sector, research must be reflective of the society were serving.
The influx of so-called digital data thats available through search engines and social media may be one resource for understanding the health of individuals who do not have encounters with the health care system. There have, however, been notable failures with these data. But there are also promising advances using online search queries at scale where traditional approaches like conducting surveys would be infeasible.
Increasingly granular data are now becoming available thanks to wearable technologies such as Fitbit trackers and Apple Watches. Researchers are actively developing and applying techniques to summarize the information gleaned from these devices for prevention efforts.
Much of the published clinical machine learning research, however, focuses on predicting outcomes or discovering patterns. Although machine learning for causal questions in health and biomedicine is a rapidly growing area, we dont see a lot of this work yet because it is new. Recent examples of it include the comparative effectiveness of feeding interventions in a pediatric intensive care unit and the effectiveness of different types of drug-eluting coronary artery stents.
Understanding how the data were collected and using appropriate evaluation metrics will also be crucial for studies that incorporate novel data sources and those attempting to establish causality.
In our drive to improve health with (and without) machine learning, we must not forget to look for what is missing: What information do we not have about the underlying health care system? Why might an individual or a code be unobserved? What subgroups have not been prioritized? Who is on the research team?
Giving these questions a place at the table will be the only way to see the whole picture.
Sherri Rose, Ph.D., is associate professor of health care policy at Harvard Medical School and co-author of the first book on machine learning for causal inference, Targeted Learning (Springer, 2011).
See the article here:
Machine learning results: pay attention to what you don't see - STAT
- Muna Al-Khaifi: Detection of Breast Cancer Using Machine Learning and Explainable AI - Oncodaily - October 13th, 2025 [October 13th, 2025]
- Expedia Group Unveils Innovative AI and Machine Learning Solutions to Transform Partner Travel Experiences - Travel And Tour World - October 13th, 2025 [October 13th, 2025]
- Machine Learning-Guided Prediction of Formulation Performance in Inhalable CiprofloxacinBile Acid Dispersions with Antimicrobial and Toxicity... - October 13th, 2025 [October 13th, 2025]
- Machine Learning and BIG DATA workshop planned Oct. 14-15 - West Virginia University - October 11th, 2025 [October 11th, 2025]
- How Google enables third-party circularity by increasing recycling rates with Machine Learning - The World Business Council for Sustainable... - October 11th, 2025 [October 11th, 2025]
- Integrating Artificial Intelligence and Machine Learning in Hydroclimatic Research - A Promising Step Forward - University of Northern British... - October 11th, 2025 [October 11th, 2025]
- Semi-automatic detection of anteriorly displaced temporomandibular joint discs in magnetic resonance images using machine learning - BMC Oral Health - October 11th, 2025 [October 11th, 2025]
- AI and Machine Learning - Partnership to bring infrastructure intelligence to US public sector - Smart Cities World - October 11th, 2025 [October 11th, 2025]
- Between rain and snow, machine learning finds nine precipitation types - Phys.org - October 9th, 2025 [October 9th, 2025]
- Between rain and snow, machine learning finds 9 precipitation types - Michigan Engineering News - October 9th, 2025 [October 9th, 2025]
- Machine learning optimizes nanoparticle design for drug delivery to the brain - Physics World - October 9th, 2025 [October 9th, 2025]
- Development and validation of a machine learning-based prediction model for prolonged length of stay after laparoscopic gastrointestinal surgery: a... - October 9th, 2025 [October 9th, 2025]
- G Sachs: Stock Mkt Not in Bubble Yet; Machine Learning/ AI Expected to Spawn New Wave of Superstars - AASTOCKS.com - October 9th, 2025 [October 9th, 2025]
- AI and Machine Learning - See.Sense works with City of Sydney to develop AI dashboard - Smart Cities World - October 9th, 2025 [October 9th, 2025]
- Machine Learning Used to Predict Live Birth Outcomes in Fresh Embryo Transfers - geneonline.com - October 9th, 2025 [October 9th, 2025]
- RIT researchers use machine learning to better understand the pathways of disease - Rochester Institute of Technology - October 7th, 2025 [October 7th, 2025]
- Leveraging machine learning to predict mosquito bed net utilization among women of reproductive age in sub-Saharan Africa - Malaria Journal - October 7th, 2025 [October 7th, 2025]
- Machine learning-based radiomics using magnetic resonance images for prediction of clinical complete response to neoadjuvant chemotherapy in patients... - October 7th, 2025 [October 7th, 2025]
- Machine Learning Self Driving Cars: The Technology Driving the Future of Mobility - SpeedwayMedia.com - October 7th, 2025 [October 7th, 2025]
- Investigating the relationship between blood factors and HDL-C levels in the bloodstream using machine learning methods - Journal of Health,... - October 7th, 2025 [October 7th, 2025]
- AI in the fast lane: F1 teams Alpine, Audi use machine learning as force multiplier - The Business Times - October 7th, 2025 [October 7th, 2025]
- Future Scope of Machine Learning in Healthcare Market Set to Witness Significant Growth by 2025-2032 - openPR.com - October 7th, 2025 [October 7th, 2025]
- AI and Machine Learning - AI readiness and adoption toolkit launched - Smart Cities World - October 4th, 2025 [October 4th, 2025]
- Machine Learning Model UmamiPredict Developed to Forecast Savory Taste of Molecules and Peptides - geneonline.com - October 4th, 2025 [October 4th, 2025]
- Machine Learning Boosts Crop Yield Predictions in Senegal - Bioengineer.org - October 4th, 2025 [October 4th, 2025]
- Machine learning-driven stability analysis of eco-friendly superhydrophobic graphene-based coatings on copper substrate - Nature - October 4th, 2025 [October 4th, 2025]
- Integrated machine learning analysis of proteomic and transcriptomic data identifies healing associated targets in diabetic wound repair - Nature - October 4th, 2025 [October 4th, 2025]
- Development and evaluation of a machine learning prediction model for short-term mortality in patients with diabetes or hyperglycemia at emergency... - October 4th, 2025 [October 4th, 2025]
- Fast and robust mixed gas identification and recognition using tree-based machine learning and sensor array response - Nature - October 4th, 2025 [October 4th, 2025]
- Estimation of sexual dimorphism of adult human mandibles of South Indian origin using non-metric parameters and machine learning classification... - October 4th, 2025 [October 4th, 2025]
- Cloud-Based Machine Learning Platforms Technologies Market Growth and Future Prospects - Precedence Research - October 4th, 2025 [October 4th, 2025]
- Machine Learning Framework Developed to Optimize Phosphorus Recovery in Hydrothermal Treatment of Livestock Manure - geneonline.com - October 4th, 2025 [October 4th, 2025]
- Unifying machine learning and interpolation theory via interpolating neural networks - Nature - October 2nd, 2025 [October 2nd, 2025]
- Anna: an open-source platform for real-time integration of machine learning classifiers with veterinary electronic health records - BMC Veterinary... - October 2nd, 2025 [October 2nd, 2025]
- The Future of Liver Health: Can Human Models and Machine Learning Reduce Disease Rates? - Technology Networks - October 2nd, 2025 [October 2nd, 2025]
- Machine Learning Radiomics Predicts Pancreatic Cancer Invasion - Bioengineer.org - October 2nd, 2025 [October 2nd, 2025]
- Next-generation COVID-19 detection using a metasurface biosensor with machine learning-enhanced refractive index sensing - Nature - October 2nd, 2025 [October 2nd, 2025]
- Machine learning-based models for screening of anemia and leukemia using features of complete blood count reports - Nature - October 2nd, 2025 [October 2nd, 2025]
- Estimating the peak age of chess players through statistical and machine learning techniques - Nature - October 2nd, 2025 [October 2nd, 2025]
- Optimizing water quality index using machine learning: a six-year comparative study in riverine and reservoir systems - Nature - October 2nd, 2025 [October 2nd, 2025]
- Physics-informed machine learning-based real-time long-horizon temperature fields prediction in metallic additive manufacturing - Nature - October 2nd, 2025 [October 2nd, 2025]
- The Silicon Revolution: How AI and Machine Learning Are Forging the Future of Semiconductor Manufacturing - FinancialContent - October 2nd, 2025 [October 2nd, 2025]
- Machine learning model for differentiating Pneumocystis jirovecii pneumonia from colonization and analyzing mortality risk in non-HIV patients using... - October 2nd, 2025 [October 2nd, 2025]
- Radiomics and Machine Learning Applied to CECT Scans Show Potential in Predicting Perineural Invasion in Pancreatic Cancer - geneonline.com - October 2nd, 2025 [October 2nd, 2025]
- Machine learning and response surface optimization to enhance diesel engine performance using milk scum biodiesel with alumina nanoparticles - Nature - October 2nd, 2025 [October 2nd, 2025]
- Landmark Patent Appeal Decision Strengthens Protection for AI and Machine Learning Innovations - The National Law Review - October 2nd, 2025 [October 2nd, 2025]
- Machine learning researchers and industry leaders gathering at Santa Clara University - Stories - News & Events - Santa Clara University - September 30th, 2025 [September 30th, 2025]
- Building better batteries with amorphous materials and machine learning - Tech Xplore - September 30th, 2025 [September 30th, 2025]
- Machine Learning-Supported Fragment Hit Expansion in Absence of X-Ray Structures - Evotec - September 30th, 2025 [September 30th, 2025]
- Machine learning model predicts which radiotherapy patients are most vulnerable to adverse side effects - Health Imaging - September 30th, 2025 [September 30th, 2025]
- How AI and Machine Learning Are Revolutionizing Laser Welding - Downbeach - September 30th, 2025 [September 30th, 2025]
- What if A.I. Doesnt Get Much Better Than This? - Machine Learning Week 2025 - September 30th, 2025 [September 30th, 2025]
- Sex estimation from the sternum in Turkish population using various machine learning methods and deep neural networks - SpringerOpen - September 30th, 2025 [September 30th, 2025]
- Predictive AI Must Be Valuated But Rarely Is. Heres How To Do It - Machine Learning Week 2025 - September 30th, 2025 [September 30th, 2025]
- Interpretable machine learning incorporating major lithology for regional landslide warning in northern and eastern Guangdong - Nature - September 28th, 2025 [September 28th, 2025]
- Building Machine Learning Application with Django - KDnuggets - September 28th, 2025 [September 28th, 2025]
- Evaluating the use of body mass index change as a proxy for anorexia nervosa recovery: a machine learning perspective - Journal of Eating Disorders - September 28th, 2025 [September 28th, 2025]
- Prediction of cutting parameters and reduction of output parameters using machine learning in milling of Inconel 718 alloy - Nature - September 28th, 2025 [September 28th, 2025]
- How AI and machine learning are changing both retail and online casino experiences - Retail Technology Innovation Hub - September 28th, 2025 [September 28th, 2025]
- Machine learning and cell imaging combine to predict effectiveness of multiple sclerosis medication - Medical Xpress - September 25th, 2025 [September 25th, 2025]
- IC combines machine learning and analogue inferencing - Electronics Weekly - September 25th, 2025 [September 25th, 2025]
- ODU Awarded $2.3M NIH Grant to Improve Detection of Brain Tumor Recurrence with AI and Machine Learning - Old Dominion University - September 25th, 2025 [September 25th, 2025]
- Development of a machine learning-based depression risk identification tool for older adults with asthma - BMC Psychiatry - September 25th, 2025 [September 25th, 2025]
- AI and Machine Learning Uses in Neuroscience Drug Discovery, Upcoming Webinar Hosted by Xtalks - PR Newswire - September 25th, 2025 [September 25th, 2025]
- Error-controlled non-additive interaction discovery in machine learning models - Nature - September 23rd, 2025 [September 23rd, 2025]
- AI, Machine Learning Will Drive Market Data Consumption - Markets Media - September 23rd, 2025 [September 23rd, 2025]
- Machine Learning Model May Optimize Treatment Selection and Survival in HCC - Targeted Oncology - September 23rd, 2025 [September 23rd, 2025]
- From pixels to pumps: Machine learning and satellite imagery help map irrigation - Phys.org - September 23rd, 2025 [September 23rd, 2025]
- CMU physicist challenges what we know about particle physics with machine learning - The Tartan - September 23rd, 2025 [September 23rd, 2025]
- Hire Python Developers to Leverage the Power of Machine Learning & AI - WebWire - September 23rd, 2025 [September 23rd, 2025]
- AI-Powered Biology Careers in 2025: Opportunities with Machine Learning Skills - BioTecNika - September 23rd, 2025 [September 23rd, 2025]
- Machine learning and predictingstock price movements on NGX - Businessamlive - September 23rd, 2025 [September 23rd, 2025]
- Building a Hybrid Rule-Based and Machine Learning Framework to Detect and Defend Against Jailbreak Prompts in LLM Systems - MarkTechPost - September 21st, 2025 [September 21st, 2025]
- Development of a novel machine learning-based adaptive resampling algorithm for nuclear data processing - Nature - September 19th, 2025 [September 19th, 2025]
- Autobot platform uses machine learning to rapidly find best ways to make advanced materials - Tech Xplore - September 19th, 2025 [September 19th, 2025]
- 5 Key Takeaways | The Law of the Machine (Learning): Solving Complex AI Challenges - JD Supra - September 17th, 2025 [September 17th, 2025]
- Spectral and Machine Learning Approach Enhances Efficiency of Grape Embryo Rescue | Newswise - Newswise - September 17th, 2025 [September 17th, 2025]
- Helpful Reminders for Patent Eligibility of AI, Machine Learning, and Other Software-Related Inventions - JD Supra - September 17th, 2025 [September 17th, 2025]
- Opening the black box of machine learning-controlled plasma treatments - AIP.ORG - September 17th, 2025 [September 17th, 2025]
- Post-compilation Circuit Scaling for Quantum Machine Learning Models Reveals Resource Trends and Topology Impacts - Quantum Zeitgeist - September 17th, 2025 [September 17th, 2025]