Development and validation of simplified machine learning algorithms to predict prognosis of hospitalized COVID-19 patients: a multi-center,…

J Med Internet Res. 2021 Dec 19. doi: 10.2196/31549. Online ahead of print.

ABSTRACT

BACKGROUND: The current COVID-19 pandemic is unprecedented. In resource-constrained settings, predictive algorithms can help stratify disease severity and alert physicians to high-risk patients; however, few risk scores have been derived from a substantially large EHR dataset using simplified predictors as input.

OBJECTIVE: To develop and validate simplified machine learning algorithms that predict COVID-19 adverse outcomes; to evaluate the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and calibration of the algorithms; and to derive clinically meaningful thresholds.
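As an illustration of the evaluation metrics named in the objective, the following is a minimal sketch in plain Python, not the authors' pipeline. It computes AUC as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties count one half), and sensitivity and specificity at a chosen cut-off. The function names and the toy labels and scores are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positive and negative scores.

    Equivalent to the area under the ROC curve; ties count as 0.5.
    Quadratic in the number of cases, so only suitable for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are flagged."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy data (hypothetical): 1 = adverse outcome, score = predicted risk.
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.20, 0.90]
print(auc(toy_labels, toy_scores))
print(sensitivity_specificity(toy_labels, toy_scores, threshold=0.30))
```

Lowering the threshold raises sensitivity at the cost of specificity, which is why a clinically meaningful cut-off has to be chosen in addition to reporting AUC.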

METHODS: We developed and validated machine learning models in a cohort study using multi-center, patient-level, longitudinal electronic health records (EHR) from the Optum COVID-19 database, which provides anonymized, longitudinal EHR from across the US. The models were developed from clinical characteristics to predict 28-day in-hospital mortality, ICU admission, respiratory failure, and mechanical ventilator use in the inpatient setting. Data from patients admitted from Feb 1, 2020 to Sep 7, 2020 were randomly sampled into development, validation, and test datasets; data collected from Sep 7, 2020 through Nov 15, 2020 were reserved as a post-development prospective test dataset.
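The data partitioning described in the methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: the 60/20/20 split fractions, the treatment of Sep 7 as belonging to the randomly sampled period, and the names `split_cohort` and `admissions` are all hypothetical, since the abstract does not specify them.

```python
import random
from datetime import date
from typing import Dict, List, Tuple

STUDY_START = date(2020, 2, 1)
CUTOFF = date(2020, 9, 7)       # assumption: the cut-off day falls in the random pool
STUDY_END = date(2020, 11, 15)


def split_cohort(
    admissions: Dict[str, date],
    fractions: Tuple[float, float, float] = (0.6, 0.2, 0.2),  # assumed, not from the abstract
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Partition patients by admission date.

    Patients admitted up to the cut-off are shuffled and divided into
    development, validation and test sets. Patients admitted after the
    cut-off are held out as a prospective test set and never shuffled
    into the other three.
    """
    early = sorted(p for p, d in admissions.items() if STUDY_START <= d <= CUTOFF)
    late = sorted(p for p, d in admissions.items() if CUTOFF < d <= STUDY_END)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(early)

    n_dev = int(len(early) * fractions[0])
    n_val = int(len(early) * fractions[1])
    return {
        "development": early[:n_dev],
        "validation": early[n_dev:n_dev + n_val],
        "test": early[n_dev + n_val:],
        "prospective_test": late,
    }
```

Holding out the later admissions by date, rather than by random sampling, is what lets the prospective test set probe performance beyond the period used for model development.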

RESULTS: Of 3.7M patients in the analysis, 585,867 were diagnosed with or tested positive for SARS-CoV-2, and 50,703 adult patients were hospitalized with COVID-19 between Feb 1 and Nov 15, 2020. Among the study cohort (N=50,703), there were 6,204 deaths, 9,564 ICU admissions, 6,478 patients on mechanical ventilation or ECMO, and 25,169 patients who developed ARDS or respiratory failure within 28 days of hospital admission. The algorithms demonstrated high accuracy (AUC = 0.89 (0.89-0.89) on the test dataset (N=10,752)), consistent prediction through the second wave of the pandemic from September to November (AUC = 0.85 (0.85-0.86) on the post-development prospective test dataset (N=14,863)), and great clinical relevance and utility. In addition, a comprehensive set of 386 input covariates from baseline or at admission was included in the analysis; the end-to-end pipeline automates feature selection and model development. The parsimonious model with only 10 input predictors produced comparably accurate predictions; the ten predictors (age, BUN, SpO2, blood pressures, respiration rate, pulse, temperature, albumin, cognitive disorder) are both commonly measured and concordant with recognized risk factors for COVID-19.

CONCLUSIONS: The systematic approach and rigorous validations demonstrate consistent model performance, even beyond the time period of data collection, with satisfactory discriminatory power and great clinical utility. Overall, the study offers an accurate, validated, and reliable prediction model based on only ten clinical features as a prognostic tool for stratifying COVID-19 patients into intermediate, high, and very high-risk groups. This simple predictive tool could be shared with the wider healthcare community to serve as an early warning system that alerts physicians to possible high-risk patients, or as a resource triaging tool to optimize healthcare resources.
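The three-tier stratification mentioned above amounts to applying two cut-offs to a predicted probability. The sketch below shows only that mechanism; the cut-off values 0.2 and 0.5 and the function name `risk_group` are placeholders chosen for illustration, because the abstract does not report the thresholds the authors derived.

```python
def risk_group(
    predicted_risk: float,
    high_cut: float = 0.2,        # hypothetical placeholder, not from the study
    very_high_cut: float = 0.5,   # hypothetical placeholder, not from the study
) -> str:
    """Map a predicted probability of an adverse outcome to a risk tier."""
    if not 0.0 <= predicted_risk <= 1.0:
        raise ValueError("predicted_risk must be a probability in [0, 1]")
    if high_cut >= very_high_cut:
        raise ValueError("high_cut must be below very_high_cut")
    if predicted_risk >= very_high_cut:
        return "very high"
    if predicted_risk >= high_cut:
        return "high"
    return "intermediate"
```

In practice the cut-offs would be chosen on a validation set to reach target sensitivity and specificity, then applied unchanged to the test data.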

CLINICALTRIAL: Not applicable.

PMID:34951865 | DOI:10.2196/31549
