Archive for the ‘Machine Learning’ Category

Machine learning based prediction for oncologic outcomes of renal … – Nature.com

Using the original KORCC database9, two recent studies have been reported28,29. First, Byun et al.28 assessed the prognosis of non-metastatic clear cell RCC using a deep learning-based survival prediction model. Harrell's C-indices of DeepSurv for recurrence and cancer-specific survival were 0.802 and 0.834, respectively. More recently, Kim et al.29 developed an ML-based algorithm predicting the probability of recurrence at 5 and 10 years after surgery. The highest area under the receiver operating characteristic curve (AUROC) was obtained from the naïve Bayes (NB) model, with values of 0.836 and 0.784 at 5 and 10 years, respectively.

In the current study, we used the updated KORCC database, which now contains clinical data on more than 10,000 patients. To the best of our knowledge, this is the largest dataset of an Asian population with RCC. With this dataset, we could develop far more accurate models, with high accuracy (range 0.77–0.94) and F1-score (range 0.77–0.97; Table 3). These accuracy values compare favorably with previous models, including the Kattan nomogram, the Leibovich model, and the GRANT score, which were around 0.7 (refs. 5,6,7,8). Among them, the Kattan nomogram was developed using a cohort of 601 patients with clinically localized RCC, and the overall C-index was 74%5. In a subsequent analysis of the same patient group using additional prognostic variables, including tumor necrosis, vascular invasion, and tumor grade, the C-index reached 82%30. Still, these prediction accuracies were not as high as ours.

In addition, we could include short-term (3-year) recurrence and survival data, which should help in developing a more sophisticated surveillance strategy. Another strength of the current study is that most algorithms introduced so far were applied18,19,20,21,22,23,24,25,26, showing relatively consistent performance with high accuracy. Finally, we also performed an external validation using a separate (SNUBH) cohort and achieved well-maintained, high accuracy and F1-scores for both recurrence and survival (Fig. 2). External validation of prediction models is essential, especially when using a multi-institutional dataset, to ensure generalizability and correct for differences between institutions.

AUROC has mostly been used as the standard for evaluating the performance of prediction models5,6,7,8,29. However, AUROC weighs changes in sensitivity and specificity equally, without considering clinically meaningful information6. In addition, AUROC cannot readily compare the performance of different ML models31. Thus, we adopted accuracy and F1-score instead of AUROC as evaluation metrics. The F1-score, in combination with SMOTE17, serves as a better accuracy metric for imbalanced data problems27.
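As a rough illustration of this metric choice, the sketch below pairs SMOTE oversampling with F1 scoring using scikit-learn and imbalanced-learn. The data, model, and split are invented placeholders, not the study's actual pipeline.

```python
# Minimal sketch: oversample the minority class with SMOTE and score with
# accuracy and F1 instead of AUROC. All data here is synthetic.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Imbalanced toy data standing in for recurrence labels (10% positive)
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Oversample only the training fold so no test information leaks
X_res, y_res = SMOTE(random_state=0).fit_resample(X_train, y_train)

clf = RandomForestClassifier(random_state=0).fit(X_res, y_res)
pred = clf.predict(X_test)
print("accuracy:", accuracy_score(y_test, pred))
print("F1-score:", f1_score(y_test, pred))
```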

RCC is not a single disease but multiple histologically defined cancers with different genetic characteristics, clinical courses, and therapeutic responses32. With regard to metastatic RCC, the International Metastatic Renal Cell Carcinoma Database Consortium and the Memorial Sloan Kettering Cancer Center risk models have been extensively validated and widely used to predict survival outcomes of patients receiving systemic therapy33,34. However, both risk models were developed without considering histologic subtypes. Thus, their predictive performance was presumed to have been strongly influenced by clear cell RCC, the predominant histologic subtype. Interestingly, in our previous study using the Korean metastatic RCC registry, we found that both risk models reliably predicted progression and survival even in non-clear cell RCC35. In the current study, after performing subgroup analysis according to histologic type (clear vs. non-clear cell RCC), we again found very high accuracy and F1-scores in all tested metrics (Supplemental Tables 3 and 4). Taken together, these findings suggest that the prognostic difference between clear and non-clear cell RCC is offset in both metastatic and non-metastatic RCC. Further effort is needed to develop and validate a sophisticated prediction model for the individual subtypes of non-clear cell RCC.

The current study had several limitations. First, because long-term follow-up cases at 10 years were scarce, a data imbalance problem could not be avoided; the 10-year recurrence-free rate was reported to be only 45.3%. In the majority of patients with no evidence of disease at five years, further long-term follow-up had not been performed. However, we adopted both SMOTE and the F1-score to address these imbalanced data problems. The retrospective design of this study was also an inherent limitation. Another limitation was that the developed prediction model included only the Korean population; validation of the model using data from other countries and races is needed. With regard to non-clear cell RCC, the study cohort is still relatively small owing to the rarity of the disease, so we could not avoid pooling the subtypes and analyzing them together. Thus, further studies are needed to develop and validate a prediction model for each subtype. In addition, the lack of more robust validation techniques such as cross-validation and bootstrapping is another limitation of the current study. Finally, web-embedded deployment of the model should follow to improve accessibility and transportability.

Originally posted here:
Machine learning based prediction for oncologic outcomes of renal ... - Nature.com

Students Use Machine Learning in Lesson Designed to Reveal … – NC State News

In a new study, North Carolina State University researchers had 28 high school students create their own machine-learning artificial intelligence (AI) models for analyzing data. The goals of the project were to help students explore the challenges, limitations and promise of AI, and to ensure a future workforce is prepared to make use of AI tools.

The study was conducted in conjunction with a high school journalism class in the Northeast. Since then, researchers have expanded the program to high school classrooms in multiple states, including North Carolina. NC State researchers are looking to partner with additional schools to collaborate in bringing the curriculum into classrooms.

"We want students, from a very young age, to open up that black box so they aren't afraid of AI," said the study's lead author Shiyan Jiang, assistant professor of learning design and technology at NC State. "We want students to know the potential and challenges of AI, so that they think about how they, the next generation, can respond to the evolving role of AI in society. We want to prepare students for the future workforce."

For the study, researchers developed a computer program called StoryQ that allows students to build their own machine-learning models. Then, researchers hosted a teacher workshop on the machine learning curriculum and technology in one-and-a-half-hour sessions each week for a month. For teachers who signed up to participate further, researchers recapped the curriculum and worked out logistics.

"We created the StoryQ technology to allow students in high school or undergraduate classrooms to build what we call text classification models," Jiang said. "We wanted to lower the barriers so students can really know what's going on in machine learning, instead of struggling with the coding. So we created StoryQ, a tool that allows students to understand the nuances of building machine-learning and text classification models."

A teacher who decided to participate led a journalism class through a 15-day lesson in which students used StoryQ to evaluate a series of Yelp reviews of ice cream stores. Students developed models to predict whether reviews were positive or negative based on their language.

"The teacher saw the relevance of the program to journalism," Jiang said. "This was a very diverse class with many students who are under-represented in STEM and in computing. Overall, we found students enjoyed the lessons a lot and had great discussions about the use and mechanism of machine learning."

Researchers saw that students made hypotheses about specific words in the Yelp reviews that they thought would predict whether a review was positive or negative. For example, they expected reviews containing the word "like" to be positive. The teacher then guided the students to analyze whether their models correctly classified reviews. One student who used the word "like" to predict reviews found that more than half of the reviews containing the word were actually negative. Students then used trial and error to try to improve the accuracy of their models.
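For readers who want a concrete picture, the sketch below shows the kind of word-based text classifier the students built. StoryQ itself is a no-code tool, so this scikit-learn version is only illustrative, and the reviews are invented.

```python
# Illustrative only: a tiny bag-of-words review classifier, not StoryQ's API.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "I like this place, great ice cream",     # positive
    "Best sundae I have ever had",            # positive
    "Looked like melted soup, not worth it",  # negative
    "Felt like a waste of money",             # negative
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(reviews, labels)
print(model.predict(["I like the chocolate flavor"]))
# Note that "like" appears in both classes, so on its own it is a weak
# predictor -- exactly the kind of discovery the students made.
```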

"Students learned how these models make decisions, the role that humans can play in creating these technologies, and the kinds of perspectives that can be brought in when they create AI technology," Jiang said.

From their discussions, researchers found that students had mixed reactions to AI technologies. Students were deeply concerned, for example, about the potential to use AI to automate processes for selecting students or candidates for opportunities like scholarships or programs.

For future classes, researchers created a shorter, five-hour program. They've launched the program in two high schools in North Carolina, as well as in schools in Georgia, Maryland and Massachusetts. In the next phase of their research, they are looking to study how teachers across disciplines collaborate to launch an AI-focused program and create a community of AI learning.

"We want to expand the implementation in North Carolina," Jiang said. "If there are any schools interested, we are always ready to bring this program to a school. Since we know teachers are super busy, we're offering a shorter professional development course, and we also provide a stipend for teachers. We will go into the classroom to teach if needed, or demonstrate how we would teach the curriculum so teachers can replicate, adapt, and revise it. We will support teachers in all the ways we can."

The study, "High school students' data modeling practices and processes: From modeling unstructured data to evaluating automated decisions," was published online March 13 in the journal Learning, Media and Technology. Co-authors included Hengtao Tang, Cansu Tatar, Carolyn P. Rosé and Jie Chao. The work was supported by the National Science Foundation under grant number 1949110.


Note to Editors: The study abstract follows.

High school students' data modeling practices and processes: From modeling unstructured data to evaluating automated decisions

Authors: Shiyan Jiang, Hengtao Tang, Cansu Tatar, Carolyn P. Rosé and Jie Chao.

Published: March 13, 2023, Learning, Media and Technology

DOI: 10.1080/17439884.2023.2189735

Abstract: It's critical to foster artificial intelligence (AI) literacy for high school students, the first generation to grow up surrounded by AI, so they understand the working mechanisms of data-driven AI technologies and can critically evaluate automated decisions from predictive models. While efforts have been made to engage youth in understanding AI through developing machine learning models, few have provided in-depth insights into the nuanced learning processes. In this study, we examined high school students' data modeling practices and processes. Twenty-eight students developed machine learning models with text data for classifying negative and positive reviews of ice cream stores. We identified nine data modeling practices that describe students' processes of model exploration, development, and testing, and two themes about evaluating automated decisions from data technologies. The results provide implications for designing accessible data modeling experiences that help students understand data justice as well as the role and responsibility of data modelers in creating AI technologies.

Read more here:
Students Use Machine Learning in Lesson Designed to Reveal ... - NC State News

Exploring the Possibilities of IoT-Enabled Quantum Machine Learning – CIOReview

With quantum machine learning, the internet of things can become even more powerful, enabling people to create more efficient and safer systems.

FREMONT, CA: The Internet of Things (IoT) is altering how people interact with their surrounding environment. From intelligent homes to autonomous vehicles, the possibilities are limitless. Researchers are investigating the possibility of merging IoT with quantum machine learning (QML) to create even more powerful and efficient systems.

QML is a form of artificial intelligence (AI) that processes data using quantum computing. It offers the potential for quicker and more precise decision-making than conventional AI. By merging it with the IoT, researchers hope to create a potent new tool for data analysis and prediction.

QML and IoT could be combined to create smarter, more efficient systems for various applications. For instance, the combination could optimize city traffic flow by forecasting traffic patterns and adjusting traffic light timing accordingly. It could also be used to optimize building energy consumption and to monitor and predict the spread of disease.

The potential of IoT-enabled QML is huge. It could transform how people interact with the environment around them and create new opportunities for data analysis and forecasting. As researchers continue to investigate the possibilities, it is evident that this technology could alter the way we live.

Using the IoT to Advance QML

The IoT is altering how people interact with their surrounding environment. IoT technology's potential applications appear limitless, from intelligent homes to self-driving vehicles. Now, scientists are investigating how IoT can transform QML.

QML is a fast-developing research topic that blends quantum computing capabilities with machine learning methods. QML can enable robots to learn more effectively and precisely than ever before by harnessing the potential of quantum computing.

The IoT is ideally suited to supporting QML applications. IoT devices can collect and communicate vast quantities of data, which can be utilized to train and optimize machine learning algorithms. In addition, IoT devices can be used to monitor and control the environment in which QML algorithms are deployed, ensuring that they operate under optimal conditions.

Also, researchers are investigating how IoT devices might be leveraged to enhance the security of QML applications. IoT devices can identify and prevent harmful attacks on QML systems by harnessing the power of distributed networks. IoT devices can also be used to monitor the performance of QML algorithms, enabling the immediate identification and resolution of any problems.

The potential uses of the IoT for QML are vast, and researchers are just beginning to investigate them. By leveraging the power of the IoT, researchers are paving the way for a new era of QML that might transform how people interact with the world.

Visit link:
Exploring the Possibilities of IoT-Enabled Quantum Machine Learning - CIOReview

New study shows the potential of machine learning in the early … – Swansea University

A study by Swansea University has revealed how machine learning can help with the early detection of Ankylosing Spondylitis (AS), a form of inflammatory arthritis, and revolutionise how patients are identified and diagnosed by their GPs.

Published in the open-access journal PLOS ONE, the study, funded by UCB Pharma and Health and Care Research Wales, has been carried out by data analysts and researchers from the National Centre for Population Health & Wellbeing Research (NCPHWR).

The team used machine learning methods to develop a profile of the characteristics of people likely to be diagnosed with AS, the second most common cause of inflammatory arthritis.

Machine learning, a type of artificial intelligence, is a method of data analysis that automates model building to improve performance and accuracy. Its algorithms build a model based on sample data to make predictions or decisions without being explicitly programmed to do so.

Using the Secure Anonymised Information Linkage (SAIL) Databank based at Swansea University Medical School, a national data repository allowing anonymised person-based data linkage across datasets, patients with AS were identified and matched with those with no record of the condition.

The data was analysed separately for men and women, with a model developed using feature/variable selection and principal component analysis to build decision trees.
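A hedged sketch of that reported pipeline follows: feature selection, then principal component analysis, then a decision tree, fitted per cohort. The synthetic data, feature count, and hyper-parameters are placeholders, not the study's; the real work used GP and hospital records from SAIL.

```python
# Sketch of a feature-selection -> PCA -> decision-tree pipeline of the kind
# described above, on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for one cohort (e.g., male patients vs. matched controls)
X, y = make_classification(n_samples=500, n_features=40, random_state=0)

pipeline = make_pipeline(
    SelectKBest(f_classif, k=20),          # feature/variable selection
    PCA(n_components=10),                  # principal component analysis
    DecisionTreeClassifier(max_depth=5, random_state=0),  # decision tree
)
print("mean CV accuracy:", cross_val_score(pipeline, X, y, cv=5).mean())
```

In practice the same pipeline would be fitted separately for men and women, as the article notes the data was analysed per sex.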

The findings revealed:

Dr Jonathan Kennedy, Data Lab Manager at NCPHWR and study lead, said: "Our study indicates the enormous potential machine learning has to help identify people with AS and better understand their diagnostic journeys through the health system.

"Early detection and diagnosis are crucial to secure the best outcomes for patients. Machine learning can help with this. In addition, it can empower GPs helping them detect and refer patients more effectively and efficiently.

"However, machine learning is in the early stages of implementation. To develop this, we need more detailed data to improve prediction and clinical utility."

Professor Ernest Choy, Researcher at NCPHWR and Head of Rheumatology and Translational Research at Cardiff University, added: "On average, it takes eight years for patients with AS to go from first experiencing symptoms to receiving a diagnosis and treatment. Machine learning may provide a useful tool to reduce this delay."

Professor Kieran Walshe, Director of Health and Care Research Wales, added: "It's fantastic to see the cutting-edge role that machine learning can play in the early identification of patients with health conditions such as AS, and the work being undertaken at the National Centre for Population Health and Wellbeing Research.

"Though it is in its early stages, machine learning clearly has the potential to transform the way that researchers and clinicians approach the diagnostic journey, bringing benefits to patients and their future health outcomes."

Read the full publication in the PLOS ONE journal.

View original post here:
New study shows the potential of machine learning in the early ... - Swansea University

Explanatory predictive model for COVID-19 severity risk employing … – Nature.com

The datasets used and/or analyzed during the current study are available from the corresponding author.

We used a case–control study design. All patients were recruited from Rabat's Cheikh Zaid University Center Hospital. COVID-19 hospitalizations occurred between March 6, 2020, and May 20, 2020, and were screened using clinical features (fever, cough, dyspnea, fatigue, headache, chest pain, and pharyngeal discomfort) and epidemiological history. Any patient admitted to Cheikh Zaid Hospital with a positive RT-PCR test for SARS-CoV-2 was considered a COVID-19 case. The cases were divided into two categories according to severity: severe cases, with COVID symptoms and a positive RT-PCR test, requiring oxygen therapy; and non-severe cases, with or without COVID symptoms, a normal lung CT, and a positive RT-PCR test, not requiring oxygen therapy. The controls were selected from Cheikh Zaid Hospital employees (two to three per week) who exhibited no clinical signs of COVID-19 and whose RT-PCR test was negative for the virus. People with chronic illnesses (high blood pressure, diabetes, cancer, and cardiovascular disease) and those who had used platelet-disrupting medications within the previous two weeks (aspirin, prasugrel, clopidogrel, ticagrelor, cangrelor, cilostazol, dipyridamole, abciximab, eptifibatide, tirofiban, non-steroidal anti-inflammatory drugs) were excluded from our study (Fig. 2).

Consequently, a total of 87 participants were selected for this study: 57 patients infected with SARS-CoV-2 (thirty without severe COVID-19 symptoms and twenty-seven with severe symptoms requiring hospitalization) and thirty healthy controls. Table 1 displays the patients' basic demographic and clinical information.

The cytokines investigated in our study are displayed in Table 2. They were measured in two panels: the first contains 48 cytokines, while the second contains only 21.

A data imputation procedure was used to fill in missing values. In fact, 29 individuals in our dataset had a missingness rate of more than 50 percent across their characteristics (cytokines), so missing values would significantly affect our analysis. The most prevalent method for dealing with incomplete information is data imputation prior to classification, which entails estimating and filling in the missing values from the known data.

There are a variety of imputation approaches, such as mean imputation, k-nearest neighbors, regression, and Bayesian estimation. In this article, we apply the iterative imputation strategy Multiple Imputation by Chained Equations with random forests (MICE-Forest) to handle the missing data. This choice reflects the need for an imputation approach that can handle any type of input data while making as few assumptions as possible about its structure55. The chained-equation process is broken down into four core steps, which are repeated until optimal results are achieved56. The first step replaces every missing value with the mean of the observed values for that variable. In the second step, the mean imputations are set back to missing. In the third step, the observed values of a variable (say, x) are regressed on the other variables, with x functioning as the dependent variable and the others as independent variables. As the variables in this investigation are continuous, predictive mean matching (PMM) was applied.

The fourth step replaces the missing data with the regression model's predictions. Each imputed value is subsequently included alongside the observed values when other variables are regressed in turn. An iteration is the recurrence of steps 2 through 4 for each variable with missing values; after one iteration, all missing values have been replaced by regression predictions based on the observed data.

Ideally, the regression coefficients converge over numerous iterations. After each iteration, the imputed values are replaced, and the number of iterations may vary; in the present study, we examined the outcomes of 10 iterations. This constitutes a single "imputation." Multiple imputations are performed by holding the observed values of all variables constant and modifying only the missing values to their respective imputation predictions, which yields multiply imputed datasets (30 in this study). The number of imputations depends on the fraction of missing values; the choice of 30 imputations was based on the publication by White et al.57, as the fraction of missing data was around 30%. We used version 5.4.0 of the miceforest Python library to impute missing data. The values of the experiment's hyper-parameters for the MICE-Forest technique are listed in Table 3, and Fig. 4 illustrates the distribution of each imputation compared to the original data (in red).

The distribution of each imputation compared to the original data (in red).
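For concreteness, a minimal sketch of this setup with the miceforest library follows. The DataFrame and missingness pattern are invented stand-ins for the 87-patient cytokine table, and the `datasets` argument name follows the 5.x API mentioned above (later versions renamed it `num_datasets`).

```python
# Sketch of the described MICE-Forest setup using miceforest (v5.x API).
import miceforest as mf
import numpy as np
import pandas as pd

# Invented stand-in for the cytokine table (87 patients x 48 cytokines)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(87, 48)),
                  columns=[f"cytokine_{i}" for i in range(48)])
df[df > 1.5] = np.nan  # inject missingness for the sketch

# 30 imputed datasets, as in the study, with 10 MICE iterations each
kernel = mf.ImputationKernel(df, datasets=30, random_state=0)
kernel.mice(10)

completed = kernel.complete_data(dataset=0)  # retrieve one imputed dataset
```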

Machine learning frameworks have demonstrated their ability to deal with complex data structures, producing impressive results in a variety of fields, including health care. However, a large amount of data is required to train these models58. This is particularly challenging in this study because the available dataset is limited (87 records and 48 attributes) owing to acquisition accessibility and costs; such limited data cannot be used to analyze and develop models on its own.

To solve this problem, Synthetic Data Generation (SDG) is one of the most promising approaches, and it opens up many opportunities for collaborative research, such as building prediction models and identifying patterns.

Synthetic data is artificial data generated by a model trained or built to imitate the distributions (i.e., shape and variance) and structure (i.e., correlations among the variables) of real data59,60. It has been studied for several modalities within healthcare, including biological signals61, medical images62, and electronic health records (EHR)63.

In this paper, a VAE network-based approach is used to generate 500 samples of synthetic cytokine data from the real data. The VAE process consists of providing labeled sample data (X) to the encoder, which captures the distribution of the deep features (z), and the decoder, which generates data from the deep features (z) (Fig. 1).
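A minimal VAE sketch for tabular data of this shape is given below in PyTorch. The layer sizes, latent dimension, and loss weighting are illustrative assumptions, not the paper's architecture.

```python
# Minimal VAE sketch for tabular cytokine data (PyTorch); sizes illustrative.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, n_features=48, latent_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU())
        self.mu = nn.Linear(32, latent_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(32, latent_dim)   # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, n_features))

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * epsilon
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction error plus KL divergence to the unit Gaussian prior
    recon_loss = nn.functional.mse_loss(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl

# After training, decoding z ~ N(0, I) yields synthetic rows:
model = VAE()
with torch.no_grad():
    synthetic = model.decoder(torch.randn(500, 8))  # 500 samples, as in the study
```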

The VAE architecture preserved each sample's probability and matched the column means of the actual data. Figure 5 depicts this by plotting the means of the real data columns on the X-axis and the means of the synthetic data columns on the Y-axis.

Each point represents a column mean in the real and synthetic data. A perfect match would be indicated by all the points lying on the line y=x.

The cumulative feature sum is an additional technique for comparing synthetic and real data. The feature sum can be considered the sum of a patient's diagnostic values. As shown in Fig. 6, a comparison of the global distribution of feature sums reveals a strong similarity between the distributions of the synthetic and real data.

Plots of each feature in our actual dataset demonstrate the similarity between the synthesized and actual datasets.

Five distinct models were trained on the synthetic data (Random Forest, XGBoost, Bagging Classifier, Decision Tree, and Gradient Boosting Classifier). The real data was used for testing, and performance was quantified with precision, recall, the F1 score, and the confusion matrix.
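A hedged sketch of this train-on-synthetic, test-on-real loop is shown below; the placeholder arrays stand in for the VAE output and the 87-patient cohort, and the default hyper-parameters are assumptions.

```python
# Train five classifiers on synthetic data, evaluate on the real cohort.
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# Placeholders for the 500 VAE samples and the 87 real patients (3 classes:
# healthy, non-severe, severe)
X_syn, y_syn = make_classification(n_samples=500, n_features=48, n_classes=3,
                                   n_informative=10, random_state=0)
X_real, y_real = make_classification(n_samples=87, n_features=48, n_classes=3,
                                     n_informative=10, random_state=1)

models = {
    "Random Forest": RandomForestClassifier(random_state=0),
    "XGBoost": XGBClassifier(),
    "Bagging Classifier": BaggingClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_syn, y_syn)        # train on synthetic data
    pred = model.predict(X_real)   # test on the real cohort
    print(name)
    print(confusion_matrix(y_real, pred))
    print(classification_report(y_real, pred))  # precision, recall, F1
```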

As shown in Figs. 7, 8, 9, 10 and 11, the Gradient Boosting Classifier proved superior to the other models, with higher precision, recall, and F1 score for each class, and a single misclassification. Consequently, we expect that SHAP and LIME's interpretations of the Gradient Boosting model on the testing set will reflect accurate and exhaustive information for the cytokine dataset.

Confusion matrix and classification report of Random Forest.

Confusion matrix and classification report of Gradient Boosting.

Confusion matrix and classification report of the XGB Classifier.

Confusion matrix and classification report of the Bagging Classifier.

Confusion matrix and classification report of Decision Tree.

Explaining a prediction refers to presenting written or visual artifacts that enable qualitative understanding of the relationship between an instance's components and the model's prediction. We suggest that if explanations are accurate and understandable, explaining predictions is an essential component of convincing humans to trust and use machine learning effectively43. Figure 12 depicts the process of explaining individual predictions using LIME and SHAP, approaches that approximate the classifier's black box to explain individual predictions. When explanations are provided, a doctor is clearly in a much better position to make a decision using a model. In our study, Gradient Boosting predicts whether a patient has an acute case of COVID-19, whereas LIME and SHAP highlight the cytokines that contributed to this prediction.

The flow chart demonstrates how machine learning can be used to make medical decisions. We entered cytokine data from severe, non-severe, and healthy patients, trained predictive models on the cytokine data, and then used LIME and SHAP to explain the most important cytokines for each class of patients (Fig. 12).

The SHAP explainer used in this study is the Kernel Explainer, a model-agnostic approach that produces a weighted linear regression depending on the data, the predictions, and the model64. It examines the contribution of a feature by evaluating the model output when the feature is removed from the input, for various (theoretically all) combinations of features. The Kernel Explainer makes use of a background dataset to define how missing inputs are handled, i.e., how a missing feature is approximated during the toggling process.

SHAP computes the impact of each characteristic on the learned system's predictions. SHAP values are produced both for single predictions (local explanations) and across multiple samples (yielding global explanations).
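A sketch of this model-agnostic workflow with the shap package follows. The model, data, and background-sample size are invented placeholders, and the shape of the returned attributions varies between shap versions, so the plotting call is left as a comment.

```python
# Sketch: KernelExplainer over a fitted Gradient Boosting model.
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=10, n_classes=3,
                           n_informative=6, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

# The background dataset defines how "missing" features are approximated
background = shap.sample(X, 50)
explainer = shap.KernelExplainer(model.predict_proba, background)

# Local explanations for a few instances; per-class attributions
# (a list of arrays in older shap versions, a 3-D array in newer ones)
shap_values = explainer.shap_values(X[:10])

# With the list-style return, a violin summary for one class (as in Fig. 13)
# would be: shap.summary_plot(shap_values[2], X[:10])
```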

Figure 13 illustrates the top 20 features by SHAP value for each class in the cytokine data prediction model (Healthy, Severe, and Non-Severe). The distribution of SHAP values for each feature is illustrated with a violin diagram, and the displayed features are ordered by their highest SHAP value. The horizontal axis represents the SHAP value: the larger the positive SHAP value, the greater the positive effect of the feature, and vice versa. The color represents the magnitude of a feature's value, shifting from blue to red as the value increases. For MIP-1b, for example, the positive SHAP value increases as the value of the feature increases. This may be interpreted as the probability of a patient developing severe COVID-19 increasing as MIP-1b levels rise.

Examples of SHAP values computed for individual predictions (local explanations) for Healthy, Non-Severe, and Severe patients.

For a healthy patient, TNF, IL-22, and IL-27 are the most influential cytokines, as shown in the first (leftmost) SHAP diagram in Fig. 14. The second diagram is for a patient with severe disease, where the VEGF-A cytokine's value is given greater weight. This can be viewed as an indication that the patient developed a serious COVID-19 infection as this cytokine increased.

SHAP diagrams of characteristics under different conditions: Healthy, Severe, and Non-Severe, respectively.

The last SHAP diagram depicts a non-severe patient; here, the higher the feature value, the more positive the direction of IL-27. On the other hand, the MDC, PDGF-AB/BB, and VEGF-A cytokines have a deleterious effect. The levels of the MDC and PDGF-AB/BB cytokines suggest that the patient may be recovering; however, the presence of VEGF-A suggests that the patient may develop a severe case of COVID-19, despite being underweight.

LIME is a graphical approach that helps explain specific predictions. As its name suggests, it can be applied to any supervised regression or classification model. The premise behind LIME is that every complex model is linear on a local scale and that it is possible to fit a simple model around a single observation that mimics the behavior of the global model at that locality. In our context, LIME operates by sampling the data surrounding a prediction and training a simple interpretable model to approximate the black box of the Gradient Boosting model. The interpretable model is used to explain the predictions of the black-box model in a local region surrounding the prediction, generating explanations of the features' contributions to those predictions. As shown in Fig. 15, a bar chart depicts the distribution of LIME values for each feature, indicating the relative importance of each cytokine for predicting severity in each instance. The features shown are ordered by their LIME value.
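For illustration, the sketch below produces a LIME tabular explanation for a single prediction using the lime package; the model, data, and feature names are placeholders for the cytokine columns, not the study's code.

```python
# Sketch: local surrogate explanation of one prediction with LIME.
from lime.lime_tabular import LimeTabularExplainer
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, n_features=10, n_classes=3,
                           n_informative=6, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X, y)

explainer = LimeTabularExplainer(
    X,
    feature_names=[f"cytokine_{i}" for i in range(10)],  # placeholder names
    class_names=["Healthy", "Non-Severe", "Severe"],
    mode="classification",
)
# Fit an interpretable model around one patient and list the contributions
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=10)
print(exp.as_list())  # (feature condition, weight) pairs, as in Fig. 15
```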

In the illustrations of various LIME predictions presented in Fig. 16, we note that the model has a high degree of confidence that the condition of these patients is Severe, Non-Severe, or Healthy. In the graph where the predicted value is 2, indicating that the expected scenario for this patient is Severe (which is correct), we can see that an MIP-1b level greater than 41 and a VEGF-A level greater than 62 have the greatest influence on severity, increasing it. However, the MCP-3 and IL-15 cytokines have a negligible effect in the other direction.

Explaining individual predictions of the Gradient Boosting classifier with LIME.

Alternatively, there are numerous cytokines whose significant levels influence non-severity, for example IL-27 and IL-9, as shown in the middle graph of Fig. 14, while IL-12p40 below a certain value may have the opposite effect on the model's decision-making. RANTES levels less than 519, on the other hand, indicate that the patient is healthy, as shown in Fig. 16.

By comparing the individual SHAP explanations to the individual LIME explanations for the same patients, we may be able to determine how these two methods differ in explaining the severity results of the Gradient Boosting model. As a result, we can validate and gain insight into the impact of the most significant factors. To do so, we begin by calculating the frequency of the top ten features among all patients for each explainer. We only consider features that appear in the top three positions, as we believe this signifies a feature's high value, and we only consider the highest-scoring features that appear at least ten times across all SHAP or LIME explanations (Tables 4, 5, and 6).

Table 4 demonstrates that MIP-1b, VEGF-A, and IL-17A are unanimously important according to both the SHAP values and LIME. In addition, we can note that M-CSF is important for LIME but otherwise ranks poorly.

For the non-severe case, Table 5 reveals that IL-27 and IL-9 are essential in both explanatory models for understanding non-severity in patients. IL-12p40 and MCP-3 are also essential for LIME and are highly ranked; hence, we add these two characteristics to the list of vital features for the non-severe case. According to Table 6, RANTES, TNF, IL-9, IL-27, and MIP-1b are the most significant elements in the healthy scenario.

The elements that explain the severity of COVID-19 disease are summarized in Table 7.

See the rest here:
Explanatory predictive model for COVID-19 severity risk employing ... - Nature.com