Explanatory predictive model for COVID-19 severity risk employing … – Nature.com
The datasets used and/or analyzed during the current study are available from the corresponding author.
We used a case-control design for our research. All patients were recruited from Rabat's Cheikh Zaid University Center Hospital. Patients hospitalized for COVID-19 between March 6, 2020, and May 20, 2020, were screened using clinical features (fever, cough, dyspnea, fatigue, headache, chest pain, and pharyngeal discomfort) and epidemiological history. Any patient admitted to Cheikh Zaid Hospital with a positive RT-PCR test for SARS-CoV-2 was considered a COVID-19 case.
Cases were divided into two categories according to severity. Severe cases had COVID symptoms and a positive RT-PCR test and required oxygen therapy. Cases not requiring oxygen therapy were those with or without COVID symptoms, a normal lung CT, and a positive RT-PCR test. Controls were selected from Cheikh Zaid Hospital employees (two to three per week) who exhibited no clinical signs of COVID-19 and whose RT-PCR test was negative for the virus.
People with chronic illnesses (high blood pressure, diabetes, cancer, and cardiovascular disease) and those who had used platelet-disrupting medications within the previous two weeks (Aspirin, Prasugrel, Clopidogrel, Ticagrelor, Cangrelor, Cilostazol, Dipyridamole, Abciximab, Eptifibatide, Tirofiban, non-steroidal anti-inflammatory drugs) were excluded from our study (Fig. 2).
Consequently, a total of 87 participants were selected for this study: 57 patients infected with SARS-CoV-2 (30 without severe COVID-19 symptoms and 27 with severe symptoms requiring hospitalization) and 30 healthy controls. Table 1 displays the patients' basic demographic and clinical information.
The cytokines investigated in our study are displayed in Table 2. They are organized into two panels: the first contains 48 cytokines, and the second contains 21.
A data imputation procedure was used to fill in missing values. In our dataset, 29 individuals had a missingness rate of more than 50 percent across their characteristics (cytokines), so missing values would otherwise have a significant impact on our analysis. The most prevalent method for dealing with incomplete information is data imputation prior to classification, which entails estimating and filling in the missing values using known data.
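The per-individual missingness rate mentioned above can be computed as in the following minimal sketch. The function names and the toy values are hypothetical and are not taken from the study; missing entries are assumed to be encoded as `None`.

```python
def missing_rate_per_row(rows):
    """Fraction of missing (None) entries in each row (one row per individual)."""
    return [sum(v is None for v in row) / len(row) for row in rows]


def rows_above_threshold(rows, threshold=0.5):
    """Indices of rows whose missingness rate is strictly above the threshold."""
    rates = missing_rate_per_row(rows)
    return [i for i, rate in enumerate(rates) if rate > threshold]


# Toy example: 3 individuals x 4 cytokines (made-up values).
toy = [
    [1.2, None, None, None],  # 75% missing
    [0.8, 2.1, None, 3.3],    # 25% missing
    [None, None, 1.0, 4.2],   # 50% missing, not strictly above 0.5
]
flagged = rows_above_threshold(toy)
overall_rate = sum(v is None for row in toy for v in row) / sum(len(row) for row in toy)
```

Here `flagged` lists the individuals whose rate exceeds 50 percent, and `overall_rate` is the fraction of missing cells in the whole table.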
There are a variety of imputation approaches, such as mean, k-nearest neighbors, regression, and Bayesian estimation. In this article, we apply the iterative imputation strategy Multiple Imputation using Chained Equations Forest (Mice-Forest) to handle the issue of missing data. We chose this approach because it can handle any sort of input data and makes as few assumptions as possible about the data's structure55. The chained equation process is broken down into four core steps, which are repeated until optimal results are achieved56. The first step involves replacing every missing value with the mean of the observed values for the variable. In the second step, the mean imputations for one variable are reset to missing. In the third step, the observed values of that variable (say x) are regressed on the other variables, with x functioning as the dependent variable and the others as the independent variables. As the variables in this investigation are continuous, predictive mean matching (PMM) was applied.
The fourth step involves replacing the missing data with the regression model's predictions. These imputed values are subsequently used alongside the observed values whenever the variable serves as an independent variable in the regressions for other variables. An iteration is one pass of steps 2 through 4 over every variable with missing values. After one iteration, all missing values have been replaced by regression predictions based on observed data. In the present study, we examined the results of 10 iterations.
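The four steps can be sketched in a few lines of standard-library Python. This is only an illustration of the chained-equation loop: it swaps in ordinary least squares for the random forests and predictive mean matching used by Mice-Forest, and all function names and toy values are hypothetical.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ols(X, y, ridge=1e-8):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve_linear_system(A, b)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def drop_column(row, j):
    return [v for k, v in enumerate(row) if k != j]


def chained_imputation(rows, iterations=10):
    """Fill None entries by cycling regressions over the variables."""
    n_rows, n_cols = len(rows), len(rows[0])
    missing = [[v is None for v in row] for row in rows]
    data = [list(row) for row in rows]
    # Step 1: placeholder imputation with the observed mean of each variable.
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        mean_j = sum(observed) / len(observed)
        for i in range(n_rows):
            if missing[i][j]:
                data[i][j] = mean_j
    for _ in range(iterations):
        for j in range(n_cols):
            target_rows = [i for i in range(n_rows) if missing[i][j]]
            if not target_rows:
                continue
            # Steps 2-3: ignore the placeholders of variable j and regress its
            # observed values on all other variables (current imputations included).
            fit_rows = [i for i in range(n_rows) if not missing[i][j]]
            coef = fit_ols([drop_column(data[i], j) for i in fit_rows],
                           [data[i][j] for i in fit_rows])
            # Step 4: replace the missing entries with the model's predictions.
            for i in target_rows:
                data[i][j] = predict(coef, drop_column(data[i], j))
    return data


# Toy data lying on y = 2x + 1, with one missing y and one missing x.
toy = [[1.0, 3.0], [2.0, 5.0], [3.0, 7.0], [4.0, None], [None, 11.0], [6.0, 13.0]]
completed = chained_imputation(toy, iterations=10)
```

Observed entries are never modified; only the cells that were originally `None` are updated on each pass, which is what makes repeated iterations meaningful.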
Ideally, the regression coefficients converge over successive iterations. The imputed values are updated after each iteration, and the number of iterations may vary. One complete run of these iterations constitutes a single "imputation." Multiple imputations are performed by holding the observed values of all variables constant and changing only the missing values to their respective imputation predictions. This produces as many multiply imputed datasets as there are imputations (30, in this study). The number of imputations depends on the amount of missing data. The fraction of missing data was around 30%, and the selection of 30 imputations was based on the White et al.57 publication. We used version 5.4.0 of the miceforest Python library to impute missing data. The values of the experiment's hyperparameters for the Mice-Forest technique are listed in Table 3, and Fig. 4 illustrates the distribution of each imputation compared with the original data (in red).
The distribution of each imputation compared to the original data (in red).
Machine learning frameworks have demonstrated their ability to deal with complex data structures, producing impressive results in a variety of fields, including health care. However, a large amount of data is required to train these models58. This is particularly challenging in this study because the available dataset is limited (87 records and 48 attributes) owing to acquisition accessibility and costs; such limited data are insufficient for analysis and model development.
To solve this problem, Synthetic Data Generation (SDG) is one of the most promising approaches; it also opens up many opportunities for collaborative research, such as building prediction models and identifying patterns.
Synthetic data is artificial data generated by a model trained or built to imitate the distributions (i.e., shape and variance) and structure (i.e., correlations among the variables) of actual data59,60. It has been studied for several modalities within healthcare, including biological signals61, medical images62, and electronic health records (EHR)63.
In this paper, an approach based on a variational autoencoder (VAE) network is used to generate 500 samples of synthetic cytokine data from real data. The VAE process consists of providing labeled sample data (X) to the Encoder, which captures the distribution of the deep feature (z), and the Decoder, which generates data from the deep feature (z) (Fig. 1).
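The source does not give the network details or the loss that was used, so the following is only a sketch of the two ingredients found in the standard VAE formulation: sampling the deep feature z from the encoder's output distribution, and the loss that combines reconstruction error with a KL term. The function names and the diagonal-Gaussian parameterization (`mu`, `log_var`) are assumptions made for illustration.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), per latent dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(sigma^2)) || N(0, I) ) for a diagonal Gaussian."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def squared_error(x, x_hat):
    """Reconstruction term: squared distance between input and decoder output."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def vae_loss(x, x_hat, mu, log_var):
    """Reconstruction error plus the KL regularizer on the latent distribution."""
    return squared_error(x, x_hat) + kl_to_standard_normal(mu, log_var)


# Hypothetical encoder output for one sample with a 2-dimensional latent space.
rng = random.Random(0)
mu, log_var = [0.5, -1.0], [0.0, -2.0]
z = reparameterize(mu, log_var, rng)
```

Once such a model is trained, synthetic samples are obtained by drawing z from the standard normal prior and passing it through the decoder.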
The VAE architecture preserved each sample's probability and matched the column means to the actual data. Figure 5 depicts this by plotting the mean of the real data column on the X-axis and the mean of the synthetic data column on the Y-axis.
Each point represents a column mean in the real and synthetic data. A perfect match would be indicated by all the points lying on the line y=x.
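The column-mean comparison can be reproduced numerically with a short sketch such as the one below. The helper names and toy values are hypothetical; the largest gap measures how far the worst column lies from the line y = x.

```python
def column_means(rows):
    """Mean of every column in a list of equally long rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def max_abs_mean_gap(real_rows, synthetic_rows):
    """Largest absolute difference between real and synthetic column means."""
    real = column_means(real_rows)
    synth = column_means(synthetic_rows)
    return max(abs(r - s) for r, s in zip(real, synth))


# Toy data: 2 features, made-up values.
real = [[1.0, 10.0], [3.0, 14.0]]       # column means: 2.0 and 12.0
synthetic = [[1.5, 12.0], [2.5, 13.0]]  # column means: 2.0 and 12.5
points = list(zip(column_means(real), column_means(synthetic)))
gap = max_abs_mean_gap(real, synthetic)
```

Each pair in `points` corresponds to one point of the scatter plot, and a gap of zero would mean every point lies exactly on the diagonal.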
The cumulative feature sum is an additional technique for comparing synthetic and real data. The feature sum can be considered as the sum of a patient's diagnosis values. As shown in Fig. 6, a comparison of the global distribution of feature sums reveals a significant similarity between the distributions of synthetic and real data.
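One possible way to quantify this comparison is to compute the feature sum of every sample and then measure the largest gap between the two empirical cumulative distributions. The source only reports a visual comparison, so the two-sample Kolmogorov-Smirnov distance used here is an assumption, and the names and values are hypothetical.

```python
def feature_sums(rows):
    """Sum of all feature values for each sample."""
    return [sum(row) for row in rows]


def ecdf(sample, t):
    """Empirical cumulative distribution of `sample` evaluated at t."""
    return sum(v <= t for v in sample) / len(sample)


def ks_distance(a, b):
    """Largest absolute gap between the two empirical CDFs."""
    grid = sorted(set(a) | set(b))
    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in grid)


# Toy data with made-up values.
real_sums = feature_sums([[1.0, 2.0], [3.0, 4.0], [2.0, 2.0]])
synthetic_sums = feature_sums([[1.0, 2.0], [2.5, 4.5], [2.0, 2.0]])
distance = ks_distance(real_sums, synthetic_sums)
```

A distance close to 0 indicates that the feature-sum distributions nearly coincide, and a distance of 1 indicates that they do not overlap at all.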
Plots of each feature in our actual dataset demonstrate the similarity between the synthesized and actual datasets.
Five distinct models are trained on the synthetic data (Random Forest, XGBoost, Bagging Classifier, Decision Tree, and Gradient Boosting Classifier). Real data are used for testing, and three metrics were applied to quantify performance (precision, recall, and F1 score), together with the confusion matrix.
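For reference, the confusion matrix and the three per-class metrics can be derived from true and predicted labels as in this minimal sketch. The labels follow the three classes of the study, but the predictions are made up and the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def per_class_report(matrix, labels):
    """Precision, recall, and F1 score for each class."""
    report = {}
    for k, label in enumerate(labels):
        tp = matrix[k][k]
        predicted = sum(row[k] for row in matrix)  # column total
        actual = sum(matrix[k])                    # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


labels = ["Healthy", "Non-Severe", "Severe"]
y_true = ["Healthy", "Healthy", "Non-Severe", "Non-Severe", "Severe", "Severe"]
y_pred = ["Healthy", "Healthy", "Non-Severe", "Severe", "Severe", "Severe"]
matrix = confusion_matrix(y_true, y_pred, labels)
report = per_class_report(matrix, labels)
```

In this toy example one Non-Severe patient is predicted as Severe, which lowers the recall of the Non-Severe class and the precision of the Severe class.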
As shown in Figs. 7, 8, 9, 10, and 11, the performance of the Gradient Boosting Classifier proved to be superior to that of the other models, with higher precision, recall, and F1 score for each class, and a single misclassification. Consequently, we expect that SHAP's and LIME's interpretations of the Gradient Boosting model for the testing set will reflect accurate and exhaustive information for the cytokine data set.
Confusion matrix and classification report of Random Forest.
Confusion matrix and classification report of Gradient Boosting.
Confusion matrix and classification report of XGB Classifier.
Confusion matrix and classification report of Bagging Classifier.
Confusion matrix and classification report of Decision Tree.
Explaining a prediction refers to the presentation of written or visual artifacts that enable a qualitative understanding of the relationship between the instance's components and the model's prediction. We suggest that if the explanations are accurate and understandable, explaining predictions is an essential component of convincing humans to trust and use machine learning effectively43. Figure 12 depicts the process of explaining individual predictions using LIME and SHAP, two approaches that approximate the classifier's black box in order to explain individual predictions. When explanations are provided, a doctor is clearly in a much better position to make a decision with the help of a model. In our study, Gradient Boosting predicts whether a patient has an acute case of COVID-19, whereas LIME and SHAP highlight the cytokines that contributed to this prediction.
The flow chart demonstrates how machine learning can be used to make medical decisions. We entered cytokine data from severe, non-severe, and healthy patients, trained predictive models on cytokine data, and then used LIME and SHAP to explain the most important cytokine for each class of patients (Fig. 12).
The SHAP explainer used in this study is the Kernel Explainer, a model-agnostic approach that fits a weighted linear regression based on the data, predictions, and model64. It examines the contribution of a feature by evaluating the model output when the feature is removed from the input, for various (theoretically all) combinations of features. The Kernel Explainer uses a background dataset to define how missing inputs are represented, i.e., how a missing feature is approximated during the toggling process.
SHAP computes the impact of each characteristic on the learned system's predictions. For the Gradient Boosting model, SHAP values are computed for a single prediction (local explanations) and for multiple samples (resulting in global explanations).
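To make the toggling idea concrete, the sketch below computes exact Shapley values by enumerating every feature subset and filling "removed" features with values from a background dataset. The Kernel Explainer approximates these quantities with a weighted linear regression rather than by full enumeration, so this is an illustration of the underlying definition, not of the library's implementation. The toy model, names, and values are hypothetical.

```python
from itertools import combinations
from math import factorial


def coalition_value(model, x, background, present):
    """Average model output when features outside `present` come from background rows."""
    total = 0.0
    for bg in background:
        mixed = [x[i] if i in present else bg[i] for i in range(len(x))]
        total += model(mixed)
    return total / len(background)


def shapley_values(model, x, background):
    """Exact Shapley value of every feature for one instance x."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                gain = (coalition_value(model, x, background, present | {i})
                        - coalition_value(model, x, background, present))
                phi[i] += weight * gain
    return phi


def toy_model(v):
    # Hypothetical linear score over three features.
    return 2.0 * v[0] + 3.0 * v[1] - 1.0 * v[2]


background = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
instance = [3.0, 1.0, 4.0]
phi = shapley_values(toy_model, instance, background)
baseline = sum(toy_model(bg) for bg in background) / len(background)
```

The values in `phi` sum to the difference between the model output for the instance and the average output over the background data, which is the additivity property that makes local explanations add up to the prediction.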
Figure 13 illustrates the top 20 features by SHAP value for each class in the cytokine data prediction model (Healthy, Severe, and Non-Severe classes). The distribution of SHAP values for each feature is illustrated using a violin plot. The displayed features are ordered by their highest SHAP value. The horizontal axis represents the SHAP value: the larger the positive SHAP value, the greater the positive effect of the feature, and vice versa. The color represents the magnitude of the feature value, shifting from red to blue as the feature's value goes from high to low. For example, for MIP-1b in Figure 8, the positive SHAP value increases as the value of the feature increases. This may be interpreted as the probability of a patient developing severe COVID-19 increasing as MIP-1b levels rise.
Examples of SHAP values computed for individual predictions (local explanations) for Healthy, Non-Severe, and Severe patients.
For a healthy patient, TNF, IL-22, and IL-27 are the most influential cytokines, as shown in the first SHAP diagram of Fig. 14 (from left). The second diagram is for a severe patient, and we can observe that the value of the VEGF-A cytokine is given greater weight. This can be viewed as an indication that the patient developed a serious COVID-19 infection due to the increase in this cytokine.
SHAP diagrams of characteristics with varying conditions: Healthy, Severe, and Non-Severe, respectively.
The last SHAP diagram depicts a non-Severe patient, and we can see that the higher the feature value of IL-27, the more positive its contribution. On the other hand, the MDC, PDGF-AB/BB, and VEGF-A cytokines have a deleterious effect. The levels of the MDC and PDGF-AB/BB cytokines suggest that the patient may be recovering; however, the presence of VEGF-A suggests that the patient may develop a severe case of COVID-19, despite being underweight.
LIME is a graphical approach that helps explain specific predictions. It can be applied to any supervised regression or classification model, as its name suggests. LIME rests on the premise that every complex model is linear on a local scale and that it is possible to fit a simple model around a single observation that mimics the behavior of the global model in that locality. In our context, LIME operates by sampling the data surrounding a prediction and training a simple interpretable model to approximate the black-box Gradient Boosting model. The interpretable model is then used to explain the predictions of the black-box model in a local region surrounding the prediction by estimating the contributions of the features to these predictions. As shown in Fig. 15, a bar chart depicts the distribution of LIME values for each feature, indicating the relative importance of each cytokine for predicting Severity in each instance. The features are shown in order of their LIME value.
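The local-surrogate idea can be sketched as follows: perturb the instance, query the black box, weight each perturbed sample by its proximity to the instance, and fit a weighted linear model whose coefficients serve as the explanation. This is a simplified illustration and not the LIME library itself, which also discretizes features and selects a subset of them. The Gaussian sampling, the exponential kernel, and all names and values are assumptions.

```python
import math
import random


def solve_linear_system(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def weighted_linear_fit(X, y, w, ridge=1e-8):
    """Weighted least squares (intercept first) via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    A = [[sum(wi * z[i] * z[j] for wi, z in zip(w, Z)) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(wi * z[i] * yi for wi, z, yi in zip(w, Z, y)) for i in range(p)]
    return solve_linear_system(A, b)


def lime_style_explanation(black_box, instance, n_samples=300,
                           scale=1.0, kernel_width=1.0, seed=0):
    """Coefficients of a proximity-weighted linear surrogate around `instance`."""
    rng = random.Random(seed)
    samples = [[v + rng.gauss(0.0, scale) for v in instance]
               for _ in range(n_samples)]
    outputs = [black_box(s) for s in samples]
    # Samples closer to the instance receive larger weights.
    weights = [math.exp(-sum((a - b) ** 2 for a, b in zip(s, instance))
                        / kernel_width ** 2) for s in samples]
    coef = weighted_linear_fit(samples, outputs, weights)
    return coef[1:]  # drop the intercept, keep one weight per feature


def toy_black_box(v):
    # Hypothetical model standing in for the trained classifier's score.
    return 3.0 * v[0] - 2.0 * v[1] + 1.0


local_weights = lime_style_explanation(toy_black_box, instance=[1.0, 2.0])
```

Because the toy black box is itself linear, the surrogate recovers its slopes; for a nonlinear model the coefficients would describe its behavior only in the neighborhood of the chosen instance.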
In the illustrations explaining various LIME predictions presented in Fig. 16, we note that the model has a high degree of confidence that the condition of these patients is Severe, Non-Severe, or Healthy. In the graph where the predicted value is 2, indicating that the expected scenario for this patient is Severe (which is correct), we can see that a MIP-1b level greater than 41 and a VEGF-A level greater than 62 have the greatest influence on severity, increasing it. However, the MCP-3 and IL-15 cytokines have a negligible effect in the other direction.
Explaining individual predictions of the Gradient Boosting classifier by LIME.
Alternatively, numerous cytokines with significant levels influence non-Severity, for example IL-27 and IL-9, as shown in the middle graph in Fig. 14, whereas IL-12p40 below a certain value may have the opposite effect on the model's decision. RANTES levels below 519, on the other hand, indicate that the patient is healthy, as shown in Fig. 16.
By comparing each individual's SHAP explanation with the same individual's LIME explanation, we may be able to determine how these two explainers differ in explaining the Severity results of the Gradient Boosting model. As a result, we can validate and gain insight into the impact of the most significant factors. To do so, we begin by calculating the frequency of the top ten features among all patients for each explainer. We only consider features that appear in the top three positions, as we believe this signifies the feature's high value, and we only retain the highest-scoring features that appear at least ten times across all SHAP or LIME explanations (Tables 4, 5, and 6).
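The counting procedure can be sketched as below, assuming that each explanation is available as a list of feature names ordered from most to least important. The function name and the toy rankings are hypothetical and do not reproduce the study's tables.

```python
from collections import Counter


def frequent_top_features(explanations, top_k=3, min_count=10):
    """Count how often each feature appears in the top_k positions, keeping frequent ones."""
    counts = Counter()
    for ranked_features in explanations:
        counts.update(ranked_features[:top_k])
    return {name: c for name, c in counts.items() if c >= min_count}


# Toy rankings for 17 patients per explainer (made-up orderings).
shap_rankings = ([["MIP-1b", "VEGF-A", "IL-17A", "M-CSF"]] * 12
                 + [["VEGF-A", "M-CSF", "MIP-1b", "IL-17A"]] * 5)
lime_rankings = ([["VEGF-A", "MIP-1b", "M-CSF", "IL-17A"]] * 11
                 + [["MIP-1b", "IL-17A", "VEGF-A", "M-CSF"]] * 6)

shap_frequent = frequent_top_features(shap_rankings)
lime_frequent = frequent_top_features(lime_rankings)
# Features that both explainers rank highly and frequently.
shared = set(shap_frequent) & set(lime_frequent)
```

Features in `shared` are the ones both explainers agree on, while features present in only one of the two dictionaries point to a disagreement worth inspecting.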
Table 4 demonstrates that MIP-1b, VEGF-A, and IL-17A have unanimous importance according to both SHAP and LIME. In addition, we can remark that M-CSF is important for LIME but is ranked poorly.
In the instance of non-Severity, Table 5 reveals that IL-27 and IL-9 are essential in both explanatory models for understanding non-Severity in patients. We can see that IL-12p40 and MCP-3 are also essential for LIME and are highly ranked; hence, we add these two characteristics to the list of vital features for the non-Severity instance. RANTES, TNF, IL-9, IL-27, and MIP-1b are the most significant elements in the Healthy scenario, according to Table 6.
The elements that explain the severity of the COVID-19 sickness are summarized in Table 7.