HEAL: A framework for health equity assessment of machine learning performance – Google Research
Posted by Mike Schaekermann, Research Scientist, Google Research, and Ivor Horn, Chief Health Equity Officer & Director, Google Core
Health equity is a major societal concern worldwide with disparities having many causes. These sources include limitations in access to healthcare, differences in clinical treatment, and even fundamental differences in the diagnostic technology. In dermatology for example, skin cancer outcomes are worse for populations such as minorities, those with lower socioeconomic status, or individuals with limited healthcare access. While there is great promise in recent advances in machine learning (ML) and artificial intelligence (AI) to help improve healthcare, this transition from research to bedside must be accompanied by a careful understanding of whether and how they impact health equity.
Health equity is defined by public health organizations as fairness of opportunity for everyone to be as healthy as possible. Importantly, equity may be different from equality. For example, people with greater barriers to improving their health may require more or different effort to experience this fair opportunity. Similarly, equity is not fairness as defined in the AI for healthcare literature. Whereas AI fairness often strives for equal performance of the AI technology across different patient populations, this does not center the goal of prioritizing performance with respect to pre-existing health disparities.
In Health Equity Assessment of machine Learning performance (HEAL): a framework and dermatology AI model case study, published in The Lancet eClinicalMedicine, we propose a methodology to quantitatively assess whether ML-based health technologies perform equitably. In other words, does the ML model perform well for those with the worst health outcomes for the condition(s) the model is meant to address? This goal anchors on the principle that health equity should prioritize and measure model performance with respect to disparate health outcomes, which may be due to a number of factors that include structural inequities (e.g., demographic, social, cultural, political, economic, environmental and geographic).
The HEAL framework proposes a 4-step process to estimate the likelihood that an ML-based health technology performs equitably:
The final steps output is termed the HEAL metric, which quantifies how anticorrelated the ML models performance is with health disparities. In other words, does the model perform better with populations that have the worse health outcomes?
This 4-step process is designed to inform improvements for making ML model performance more equitable, and is meant to be iterative and re-evaluated on a regular basis. For example, the availability of health outcomes data in step (2) can inform the choice of demographic factors and brackets in step (1), and the framework can be applied again with new datasets, models and populations.
With this work, we take a step towards encouraging explicit assessment of the health equity considerations of AI technologies, and encourage prioritization of efforts during model development to reduce health inequities for subpopulations exposed to structural inequities that can precipitate disparate outcomes. We should note that the present framework does not model causal relationships and, therefore, cannot quantify the actual impact a new technology will have on reducing health outcome disparities. However, the HEAL metric may help identify opportunities for improvement, where the current performance is not prioritized with respect to pre-existing health disparities.
As an illustrative case study, we applied the framework to a dermatology model, which utilizes a convolutional neural network similar to that described in prior work. This example dermatology model was trained to classify 288 skin conditions using a development dataset of 29k cases. The input to the model consists of three photos of a skin concern along with demographic information and a brief structured medical history. The output consists of a ranked list of possible matching skin conditions.
Using the HEAL framework, we evaluated this model by assessing whether it prioritized performance with respect to pre-existing health outcomes. The model was designed to predict possible dermatologic conditions (from a list of hundreds) based on photos of a skin concern and patient metadata. Evaluation of the model is done using a top-3 agreement metric, which quantifies how often the top 3 output conditions match the most likely condition as suggested by a dermatologist panel. The HEAL metric is computed via the anticorrelation of this top-3 agreement with health outcome rankings.
We used a dataset of 5,420 teledermatology cases, enriched for diversity in age, sex and race/ethnicity, to retrospectively evaluate the models HEAL metric. The dataset consisted of store-and-forward cases from patients of 20 years or older from primary care providers in the USA and skin cancer clinics in Australia. Based on a review of the literature, we decided to explore race/ethnicity, sex and age as potential factors of inequity, and used sampling techniques to ensure that our evaluation dataset had sufficient representation of all race/ethnicity, sex and age groups. To quantify pre-existing health outcomes for each subgroup we relied on measurements from public databases endorsed by the World Health Organization, such as Years of Life Lost (YLLs) and Disability-Adjusted Life Years (DALYs; years of life lost plus years lived with disability).
However, while the model was likely to perform equitably across age groups for cancer conditions specifically, we discovered that it had room for improvement across age groups for non-cancer conditions. For example, those 70+ have the poorest health outcomes related to non-cancer skin conditions, yet the model didn't prioritize performance for this subgroup.
For holistic evaluation, the HEAL metric cannot be employed in isolation. Instead this metric should be contextualized alongside many other factors ranging from computational efficiency and data privacy to ethical values, and aspects that may influence the results (e.g., selection bias or differences in representativeness of the evaluation data across demographic groups).
As an adversarial example, the HEAL metric can be artificially improved by deliberately reducing model performance for the most advantaged subpopulation until performance for that subpopulation is worse than all others. For illustrative purposes, given subpopulations A and B where A has worse health outcomes than B, consider the choice between two models: Model 1 (M1) performs 5% better for subpopulation A than for subpopulation B. Model 2 (M2) performs 5% worse on subpopulation A than B. The HEAL metric would be higher for M1 because it prioritizes performance on a subpopulation with worse outcomes. However, M1 may have absolute performances of just 75% and 70% for subpopulations A and B respectively, while M2 has absolute performances of 75% and 80% for subpopulations A and B respectively. Choosing M1 over M2 would lead to worse overall performance for all subpopulations because some subpopulations are worse-off while no subpopulation is better-off.
Accordingly, the HEAL metric should be used alongside a Pareto condition (discussed further in the paper), which restricts model changes so that outcomes for each subpopulation are either unchanged or improved compared to the status quo, and performance does not worsen for any subpopulation.
The HEAL framework, in its current form, assesses the likelihood that an ML-based model prioritizes performance for subpopulations with respect to pre-existing health disparities for specific subpopulations. This differs from the goal of understanding whether ML will reduce disparities in outcomes across subpopulations in reality. Specifically, modeling improvements in outcomes requires a causal understanding of steps in the care journey that happen both before and after use of any given model. Future research is needed to address this gap.
The HEAL framework enables a quantitative assessment of the likelihood that health AI technologies prioritize performance with respect to health disparities. The case study demonstrates how to apply the framework in the dermatological domain, indicating a high likelihood that model performance is prioritized with respect to health disparities across sex and race/ethnicity, but also revealing the potential for improvements for non-cancer conditions across age. The case study also illustrates limitations in the ability to apply all recommended aspects of the framework (e.g., mapping societal context, availability of data), thus highlighting the complexity of health equity considerations of ML-based tools.
This work is a proposed approach to address a grand challenge for AI and health equity, and may provide a useful evaluation framework not only during model development, but during pre-implementation and real-world monitoring stages, e.g., in the form of health equity dashboards. We hold that the strength of the HEAL framework is in its future application to various AI tools and use cases and its refinement in the process. Finally, we acknowledge that a successful approach towards understanding the impact of AI technologies on health equity needs to be more than a set of metrics. It will require a set of goals agreed upon by a community that represents those who will be most impacted by a model.
The research described here is joint work across many teams at Google. We are grateful to all our co-authors: Terry Spitz, Malcolm Pyles, Heather Cole-Lewis, Ellery Wulczyn, Stephen R. Pfohl, Donald Martin, Jr., Ronnachai Jaroensri, Geoff Keeling, Yuan Liu, Stephanie Farquhar, Qinghan Xue, Jenna Lester, Can Hughes, Patricia Strachan, Fraser Tan, Peggy Bui, Craig H. Mermel, Lily H. Peng, Yossi Matias, Greg S. Corrado, Dale R. Webster, Sunny Virmani, Christopher Semturs, Yun Liu, and Po-Hsuan Cameron Chen. We also thank Lauren Winer, Sami Lachgar, Ting-An Lin, Aaron Loh, Morgan Du, Jenny Rizk, Renee Wong, Ashley Carrick, Preeti Singh, Annisah Um'rani, Jessica Schrouff, Alexander Brown, and Anna Iurchenko for their support of this project.
Go here to see the original:
HEAL: A framework for health equity assessment of machine learning performance - Google Research
- Machine learning-based differentiation of schizophrenia and bipolar disorder using multiscale fuzzy entropy and relative power from resting-state EEG... - April 12th, 2025 [April 12th, 2025]
- Increasing load factor in logistics and evaluating shipment performance with machine learning methods: A case from the automotive industry - Nature - April 12th, 2025 [April 12th, 2025]
- Machine learning-based prediction of the thermal conductivity of filling material incorporating steelmaking slag in a ground heat exchanger system -... - April 12th, 2025 [April 12th, 2025]
- Do LLMs Know Internally When They Follow Instructions? - Apple Machine Learning Research - April 12th, 2025 [April 12th, 2025]
- Leveraging machine learning in precision medicine to unveil organochlorine pesticides as predictive biomarkers for thyroid dysfunction - Nature - April 12th, 2025 [April 12th, 2025]
- Analysis and validation of hub genes for atherosclerosis and AIDS and immune infiltration characteristics based on bioinformatics and machine learning... - April 12th, 2025 [April 12th, 2025]
- AI and Machine Learning - Bentley and Google partner to improve asset analytics - Smart Cities World - April 12th, 2025 [April 12th, 2025]
- Where to find the next Earth: Machine learning accelerates the search for habitable planets - Phys.org - April 10th, 2025 [April 10th, 2025]
- Concurrent spin squeezing and field tracking with machine learning - Nature - April 10th, 2025 [April 10th, 2025]
- This AI Paper Introduces a Machine Learning Framework to Estimate the Inference Budget for Self-Consistency and GenRMs (Generative Reward Models) -... - April 10th, 2025 [April 10th, 2025]
- UCI researchers study use of machine learning to improve stroke diagnosis, access to timely treatment - UCI Health - April 10th, 2025 [April 10th, 2025]
- Assessing dengue forecasting methods: a comparative study of statistical models and machine learning techniques in Rio de Janeiro, Brazil - Tropical... - April 10th, 2025 [April 10th, 2025]
- Machine learning integration of multimodal data identifies key features of circulating NT-proBNP in people without cardiovascular diseases - Nature - April 10th, 2025 [April 10th, 2025]
- How AI, Data Science, And Machine Learning Are Shaping The Future - Forbes - April 10th, 2025 [April 10th, 2025]
- Development and validation of interpretable machine learning models to predict distant metastasis and prognosis of muscle-invasive bladder cancer... - April 10th, 2025 [April 10th, 2025]
- From fax machines to machine learning: The fight for efficiency - HME News - April 10th, 2025 [April 10th, 2025]
- Carbon market and emission reduction: evidence from evolutionary game and machine learning - Nature - April 10th, 2025 [April 10th, 2025]
- Infleqtion Unveils Contextual Machine Learning (CML) at GTC 2025, Powering AI Breakthroughs with NVIDIA CUDA-Q and Quantum-Inspired Algorithms - Yahoo... - March 22nd, 2025 [March 22nd, 2025]
- Karlie Kloss' coding nonprofit offering free AI and machine learning workshop this weekend - KSDK.com - March 22nd, 2025 [March 22nd, 2025]
- Machine learning reveals distinct neuroanatomical signatures of cardiovascular and metabolic diseases in cognitively unimpaired individuals -... - March 22nd, 2025 [March 22nd, 2025]
- Machine learning analysis of cardiovascular risk factors and their associations with hearing loss - Nature.com - March 22nd, 2025 [March 22nd, 2025]
- Weekly Recap: Dual-Cure Inks, AI And Machine Learning Top This Weeks Stories - Ink World Magazine - March 22nd, 2025 [March 22nd, 2025]
- Network-based predictive models for artificial intelligence: an interpretable application of machine learning techniques in the assessment of... - March 22nd, 2025 [March 22nd, 2025]
- Machine learning aids in detection of 'brain tsunamis' - University of Cincinnati - March 22nd, 2025 [March 22nd, 2025]
- AI & Machine Learning in Database Management: Studying Trends and Applications with Nithin Gadicharla - Tech Times - March 22nd, 2025 [March 22nd, 2025]
- MicroRNA Biomarkers and Machine Learning for Hypertension Subtyping - Physician's Weekly - March 22nd, 2025 [March 22nd, 2025]
- Machine Learning Pioneer Ramin Hasani Joins Info-Tech's "Digital Disruption" Podcast to Explore the Future of AI and Liquid Neural Networks... - March 22nd, 2025 [March 22nd, 2025]
- Predicting HIV treatment nonadherence in adolescents with machine learning - News-Medical.Net - March 22nd, 2025 [March 22nd, 2025]
- AI And Machine Learning In Ink And Coatings Formulation - Ink World Magazine - March 22nd, 2025 [March 22nd, 2025]
- Counting whales by eavesdropping on their chatter, with help from machine learning - Mongabay.com - March 22nd, 2025 [March 22nd, 2025]
- Associate Professor - Artificial Intelligence and Machine Learning job with GALGOTIAS UNIVERSITY | 390348 - Times Higher Education - March 22nd, 2025 [March 22nd, 2025]
- Innovative Machine Learning Tool Reveals Secrets Of Marine Microbial Proteins - Evrim Aac - March 22nd, 2025 [March 22nd, 2025]
- Exploring the role of breastfeeding, antibiotics, and indoor environments in preschool children atopic dermatitis through machine learning and hygiene... - March 22nd, 2025 [March 22nd, 2025]
- Applying machine learning algorithms to explore the impact of combined noise and dust on hearing loss in occupationally exposed populations -... - March 22nd, 2025 [March 22nd, 2025]
- 'We want them to be the creators': Karlie Kloss' coding nonprofit offering free AI and machine learning workshop this weekend - KSDK.com - March 22nd, 2025 [March 22nd, 2025]
- New headset reads minds and uses AR, AI and machine learning to help people with locked-in-syndrome communicate with loved ones again - PC Gamer - March 22nd, 2025 [March 22nd, 2025]
- Enhancing cybersecurity through script development using machine and deep learning for advanced threat mitigation - Nature.com - March 11th, 2025 [March 11th, 2025]
- Machine learning-assisted wearable sensing systems for speech recognition and interaction - Nature.com - March 11th, 2025 [March 11th, 2025]
- Machine learning uncovers complexity of immunotherapy variables in bladder cancer - Hospital Healthcare - March 11th, 2025 [March 11th, 2025]
- Machine-learning algorithm analyzes gravitational waves from merging neutron stars in the blink of an eye - The University of Rhode Island - March 11th, 2025 [March 11th, 2025]
- Precision soil sampling strategy for the delineation of management zones in olive cultivation using unsupervised machine learning methods - Nature.com - March 11th, 2025 [March 11th, 2025]
- AI in Esports: How Machine Learning is Transforming Anti-Cheat Systems in Esports - Jumpstart Media - March 11th, 2025 [March 11th, 2025]
- Whats that microplastic? Advances in machine learning are making identifying plastics in the environment more reliable - The Conversation Indonesia - March 11th, 2025 [March 11th, 2025]
- Application of machine learning techniques in GlaucomAI system for glaucoma diagnosis and collaborative research support - Nature.com - March 11th, 2025 [March 11th, 2025]
- Elucidating the role of KCTD10 in coronary atherosclerosis: Harnessing bioinformatics and machine learning to advance understanding - Nature.com - March 11th, 2025 [March 11th, 2025]
- Hugging Face Tutorial: Unleashing the Power of AI and Machine Learning - - March 11th, 2025 [March 11th, 2025]
- Utilizing Machine Learning to Predict Host Stars and the Key Elemental Abundances of Small Planets - Astrobiology News - March 11th, 2025 [March 11th, 2025]
- AI to the rescue: Study shows machine learning predicts long term recovery for anxiety with 72% accuracy - Hindustan Times - March 11th, 2025 [March 11th, 2025]
- New in 2025.3: Reducing false positives with Machine Learning - Emsisoft - March 5th, 2025 [March 5th, 2025]
- Abnormal FX Returns And Liquidity-Based Machine Learning Approaches - Seeking Alpha - March 5th, 2025 [March 5th, 2025]
- Sentiment analysis of emoji fused reviews using machine learning and Bert - Nature.com - March 5th, 2025 [March 5th, 2025]
- Detection of obstetric anal sphincter injuries using machine learning-assisted impedance spectroscopy: a prospective, comparative, multicentre... - March 5th, 2025 [March 5th, 2025]
- JFrog and Hugging Face team to improve machine learning security and transparency for developers - SDxCentral - March 5th, 2025 [March 5th, 2025]
- Opportunistic access control scheme for enhancing IoT-enabled healthcare security using blockchain and machine learning - Nature.com - March 5th, 2025 [March 5th, 2025]
- AI and Machine Learning Operationalization Software Market Hits New High | Major Giants Google, IBM, Microsoft - openPR - March 5th, 2025 [March 5th, 2025]
- FICO secures new patents in AI and machine learning technologies - Investing.com - March 5th, 2025 [March 5th, 2025]
- Study on landslide hazard risk in Wenzhou based on slope units and machine learning approaches - Nature.com - March 5th, 2025 [March 5th, 2025]
- NVIDIA Is Finding Great Success With Vulkan Machine Learning - Competitive With CUDA - Phoronix - March 3rd, 2025 [March 3rd, 2025]
- MRI radiomics based on machine learning in high-grade gliomas as a promising tool for prediction of CD44 expression and overall survival - Nature.com - March 3rd, 2025 [March 3rd, 2025]
- AI and Machine Learning - Identifying meaningful use cases to fulfil the promise of AI in cities - SmartCitiesWorld - March 3rd, 2025 [March 3rd, 2025]
- Prediction of contrast-associated acute kidney injury with machine-learning in patients undergoing contrast-enhanced computed tomography in emergency... - March 3rd, 2025 [March 3rd, 2025]
- Predicting Ag Harvest using ArcGIS and Machine Learning - Esri - March 1st, 2025 [March 1st, 2025]
- Seeing Through The Hype: The Difference Between AI And Machine Learning In Marketing - AdExchanger - March 1st, 2025 [March 1st, 2025]
- Machine Learning Meets War Termination: Using AI to Explore Peace Scenarios in Ukraine - Center for Strategic & International Studies - March 1st, 2025 [March 1st, 2025]
- Statistical and machine learning analysis of diesel engines fueled with Moringa oleifera biodiesel doped with 1-hexanol and Zr2O3 nanoparticles |... - March 1st, 2025 [March 1st, 2025]
- Spatial analysis of air pollutant exposure and its association with metabolic diseases using machine learning - BMC Public Health - March 1st, 2025 [March 1st, 2025]
- The Evolution of AI in Software Testing: From Machine Learning to Agentic AI - CSRwire.com - March 1st, 2025 [March 1st, 2025]
- Wonder Dynamics Helps Boxel Studio Embrace Machine Learning and AI - Animation World Network - March 1st, 2025 [March 1st, 2025]
- Predicting responsiveness to fixed-dose methylene blue in adult patients with septic shock using interpretable machine learning: a retrospective study... - March 1st, 2025 [March 1st, 2025]
- Workplace Predictions: AI, Machine Learning To Transform Operations In 2025 - Facility Executive Magazine - March 1st, 2025 [March 1st, 2025]
- Development and validation of a machine learning approach for screening new leprosy cases based on the leprosy suspicion questionnaire - Nature.com - March 1st, 2025 [March 1st, 2025]
- Machine learning analysis of gene expression profiles of pyroptosis-related differentially expressed genes in ischemic stroke revealed potential... - March 1st, 2025 [March 1st, 2025]
- Utilization of tree-based machine learning models for predicting low birth weight cases - BMC Pregnancy and Childbirth - March 1st, 2025 [March 1st, 2025]
- Machine learning-based pattern recognition of Bender element signals for predicting sand particle-size - Nature.com - March 1st, 2025 [March 1st, 2025]
- Wearable Tech Uses Machine Learning to Predict Mood Swings - IoT World Today - March 1st, 2025 [March 1st, 2025]
- Machine learning can prevent thermal runaway in EV batteries - Automotive World - March 1st, 2025 [March 1st, 2025]
- Integration of multiple machine learning approaches develops a gene mutation-based classifier for accurate immunotherapy outcomes - Nature.com - March 1st, 2025 [March 1st, 2025]
- Data Analytics Market Size to Surpass USD 483.41 Billion by 2032 Owing to Rising Adoption of AI & Machine Learning Technologies - Yahoo Finance - March 1st, 2025 [March 1st, 2025]
- Predictive AI Only Works If Stakeholders Tune This Dial - The Machine Learning Times - March 1st, 2025 [March 1st, 2025]
- Relationship between atherogenic index of plasma and length of stay in critically ill patients with atherosclerotic cardiovascular disease: a... - March 1st, 2025 [March 1st, 2025]