Archive for the ‘Machine Learning’ Category

Slack has been using data from your chats to train its machine learning models – Engadget

Slack trains machine-learning models on user messages, files and other content without explicit permission. The training is opt-out, meaning your private data will be leeched by default. Making matters worse, you'll have to ask your organization's Slack admin (human resources, IT, etc.) to email the company to ask it to stop. (You can't do it yourself.) Welcome to the dark side of the new AI training data gold rush.

Corey Quinn, an executive at DuckBill Group, spotted the policy in a blurb in Slack's Privacy Principles and posted about it on X (via PCMag). The section reads (emphasis ours), "To develop AI/ML models, our systems analyze Customer Data (e.g. messages, content, and files) submitted to Slack as well as Other Information (including usage information) as defined in our Privacy Policy and in your customer agreement."

In response to concerns over the practice, Slack published a blog post on Friday evening to clarify how its customers' data is used. According to the company, customer data is not used to train any of Slack's generative AI products (for which it relies on third-party LLMs) but is fed to its machine learning models for products like channel and emoji recommendations and search results. For those applications, the post says, "Slack's traditional ML models use de-identified, aggregate data and do not access message content in DMs, private channels, or public channels." That data may include things like message timestamps and the number of interactions between users.

A Salesforce spokesperson reiterated this in a statement to Engadget, also saying that "we do not build or train these models in such a way that they could learn, memorize, or be able to reproduce customer data."

I'm sorry Slack, you're doing fucking WHAT with user DMs, messages, files, etc? I'm positive I'm not reading this correctly. pic.twitter.com/6ORZNS2RxC

Corey Quinn (@QuinnyPig) May 16, 2024

The opt-out process requires you to do all the work to protect your data. According to the privacy notice, "To opt out, please have your Org or Workspace Owners or Primary Owner contact our Customer Experience team at feedback@slack.com with your Workspace/Org URL and the subject line 'Slack Global model opt-out request.' We will process your request and respond once the opt out has been completed."

The company replied to Quinn's message on X: "To clarify, Slack has platform-level machine-learning models for things like channel and emoji recommendations and search results. And yes, customers can exclude their data from helping train those (non-generative) ML models."

How long ago the Salesforce-owned company snuck the tidbit into its terms is unclear. It's misleading, at best, to say customers can opt out when "customers" doesn't include employees working within an organization. They have to ask whoever handles Slack access at their business to do that, and I hope they will oblige.

Inconsistencies in Slack's privacy policies add to the confusion. One section states, "When developing AI/ML models or otherwise analyzing Customer Data, Slack can't access the underlying content. We have various technical measures preventing this from occurring." However, the machine-learning model training policy seemingly contradicts this statement, leaving plenty of room for confusion.

In addition, Slack's webpage marketing its premium generative AI tools reads, "Work without worry. Your data is your data. We don't use it to train Slack AI. Everything runs on Slack's secure infrastructure, meeting the same compliance standards as Slack itself."

In this case, the company is speaking of its premium generative AI tools, separate from the machine learning models it's training on without explicit permission. However, as PCMag notes, implying that all of your data is safe from AI training is, at best, a highly misleading statement when the company apparently gets to pick and choose which AI models that statement covers.

Update, May 18 2024, 3:24 PM ET: This story has been updated to include new information from Slack, which published a blog post explaining its practices in response to the community's concerns.

Update, May 19 2024, 12:41 PM ET: This story and headline have been updated to reflect additional context provided by Slack about how it uses customer data.

Original post:
Slack has been using data from your chats to train its machine learning models - Engadget

Machine learning-based integration develops an immunogenic cell death-derived lncRNA signature for predicting … – Nature.com

Genetic characteristics and transcriptional changes in ICD-related genes in LUAD

A total of 34 ICD-related genes were compiled from a large-scale meta-analysis (ref. 11). First, the expression of the 34 ICD genes in LUAD samples and normal samples was analyzed (Figure S1A); most of the ICD genes were expressed at significantly different levels, except for ATG5, IL10, CD8A, and CD8B. Second, the location of ICD-related genes in the human genome was analyzed (Figure S1B). Third, the variation of ICD-related genes in LUAD patients in the TCGA cohort was assessed. The results showed that approximately 69.63% (188/270) of LUAD patients had mutations in ICD-related genes; the top 20 mutated ICD-related genes are displayed in the study, with the highest mutation frequencies in TLR4 and NLRP3 (Figure S1C and Figure S1D).

The study also performed GO enrichment analysis of ICD-related genes (Figure S1E), which showed that, in terms of biological processes, the main enrichment was in various receptor activities. In terms of cellular components, the main enrichment was in the cytolytic granule and inflammasome complex. In terms of molecular functions, the main enrichment was in the biological processes of interleukin. In addition, KEGG enrichment analysis showed that ICD-related genes were enriched in the NOD-like receptor signaling pathway, the Toll-like receptor signaling pathway, and necroptosis (Figure S1F).

A total of 1367 characteristic lncRNAs were selected for in-depth analysis by matching the training dataset with the validation datasets. We employed consensus cluster analysis to partition the TCGA-LUAD dataset into two groups based on high and low expression of ICD-related genes. Subsequently, 473 lncRNAs were identified by differential expression analysis (Fig. 2A and B). These lncRNAs were then compared with the 300 lncRNAs obtained by Pearson correlation analysis (Fig. 2C) to identify 176 ICD-related lncRNAs (Fig. 2D). Finally, 24 ICD-related lncRNAs were identified by univariate Cox regression analysis (Supplementary Table 2).

(A) Heatmap displaying 34 ICD gene expression profiles among normal and LUAD samples in the TCGA cohort. (B) The location of ICD-related genes in the human genome. (C) Single Nucleotide Polymorphism analysis of ICD-related genes in the TCGA cohort. (E) Bar plot displaying Gene Ontology analysis based on 34 ICD genes. (F) Bar plot displaying KEGG analysis based on 34 ICD genes.

A total of 24 ICD-related lncRNAs were input into a comprehensive machine-learning framework, which encompassed the 10 aforementioned methodologies for creating prognostic signatures. Figure 3A illustrates the resulting 101 prognostic models. The predictive signature created by the combination of RSF + Ridge had the greatest mean C-index, 0.674, across the training and test cohorts. This signature was designated the ICDI signature (Fig. 3A and B). The obtained equation is as follows (see Supplementary Table 3 for detail):

$$\text{ICDIscore} = \min \Vert \beta x - y \Vert_{2}^{2} + \lambda \Vert \beta \Vert_{2}^{2}$$

(A) A total of 101 combinations of machine learning algorithms for the ICDI signature via a tenfold cross-validation framework based on the TCGA-LUAD cohort. The C-index of each signature was calculated across validation datasets, including the GSE29013, GSE30219, GSE31210, GSE3141, and GSE50081 cohorts. (B) Importance ranking of the 24 ICD-related lncRNAs in the RSF algorithm, and the coefficients of the 19 lncRNAs finally enrolled in the ICDI signature by the Ridge algorithm. (C) Kaplan-Meier survival curves of OS for patients with high and low ICDI signature scores in the TCGA-LUAD, GSE29013, GSE30219, GSE31210, GSE3141, and GSE50081 cohorts. (D) Receiver operating characteristic (ROC) analysis for the ICDI signature in the TCGA-LUAD, GSE29013, GSE30219, GSE31210, GSE3141, and GSE50081 cohorts.

As the elastic net mixing parameter, \(\alpha\) was limited to \(0 \le \alpha \le 1\). The \(\lambda\) is defined as \(\lambda = \frac{1-\alpha}{2}\Vert \beta \Vert_{2}^{2} + \alpha \Vert \beta \Vert_{1}\).
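As a rough illustration of the penalty term defined above, the following minimal sketch evaluates it for a coefficient vector. The function name and the example coefficients are hypothetical and are not taken from the study. Setting the mixing parameter to 0 leaves only the squared L2 (ridge-type) term, and setting it to 1 leaves only the L1 (lasso-type) term.

```python
def elastic_net_penalty(beta, alpha):
    """Evaluate (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    l2_squared = sum(b * b for b in beta)   # squared Euclidean norm
    l1 = sum(abs(b) for b in beta)          # sum of absolute values
    return (1.0 - alpha) / 2.0 * l2_squared + alpha * l1


# Hypothetical coefficients, for illustration only.
beta = [3.0, -4.0]
ridge_like = elastic_net_penalty(beta, alpha=0.0)  # only the L2 term remains
lasso_like = elastic_net_penalty(beta, alpha=1.0)  # only the L1 term remains
mixed = elastic_net_penalty(beta, alpha=0.5)       # equal mix of both terms
print(ridge_like, lasso_like, mixed)
```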

LUAD patients were categorized into two groups based on their ICDI score: a high-score group and a low-score group. The median value was used as the cut-off point. Consistent with expectations, LUAD patients with low ICDI scores exhibited higher overall survival rates in the TCGA-LUAD, GSE29013, GSE30219, GSE31210, GSE3141, and GSE50081 datasets (Fig. 3C).
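The median split described above can be sketched in a few lines. This is only an illustration: the patient identifiers and scores are made up, and assigning scores equal to the median to the low-score group is an assumption, since the text does not say how ties are handled.

```python
from statistics import median


def split_by_median(scores):
    """Split patients into high/low groups using the median score as cut-off."""
    cutoff = median(scores.values())
    groups = {"high": [], "low": []}
    for patient, score in scores.items():
        # Assumption: scores equal to the cut-off fall into the low group.
        groups["high" if score > cutoff else "low"].append(patient)
    return cutoff, groups


# Hypothetical ICDI scores, for illustration only.
icdi_scores = {"P1": 0.2, "P2": 0.8, "P3": 0.5, "P4": 0.9}
cutoff, groups = split_by_median(icdi_scores)
print(cutoff, groups)
```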

The AUC values at 1, 2, 3, 4, and 5 years for the ICDI signature in the TCGA-LUAD cohort were estimated as 0.709, 0.678, 0.697, 0.716, and 0.660, respectively (Fig. 3D), demonstrating that the ICDI signature has promising predictive value for LUAD patients. It was validated in the GSE30219 cohort (0.891, 0.758, 0.744, 0.700, and 0.716), GSE31210 cohort (0.750, 0.691, 0.653, 0.677 and 0.718), GSE3141 cohort (0.690, 0.716, 0.819, 0.801 and 0.729), GSE50081 cohort (0.685, 0.694, 0.712, 0.638, and 0.639), and GSE3141 cohort (0.639, 0.697, 0.794, 0.670, and 0.521) (Fig. 3D). Owing to insufficient survival data, AUC values for the GSE29013 cohort could be computed only for the 2-, 3-, and 4-year periods. Still, the signature showed strong predictive capability in that cohort (Fig. 3D).

In addition, we compared the predictive value of the ICDI signature with that of other clinical variables (Fig. 4A). The C-index of the ICDI signature was significantly higher than that of the other clinical variables, including stage, age, and gender.
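For readers unfamiliar with the C-index used in these comparisons, here is a minimal sketch of Harrell's concordance index for right-censored survival data. The excerpt does not state which estimator or software the authors used, so this is an assumption, and the toy data are invented. A pair of patients is comparable when the one with the shorter follow-up time had an event; the pair is concordant when that patient also has the higher risk score.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier event, higher risk: correct
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up times, event flags (1 = death), and risk scores.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.4, 0.1]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why a mean C-index above 0.5 is read as evidence of predictive value.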

(A) The C-index of the ICDI signature and other clinical characteristics in the TCGA-LUAD, GSE29013, GSE30219, GSE31210, GSE3141 and GSE50081 cohorts. (B) The C-index of the ICDI signature and other signatures developed in the TCGA-LUAD, GSE29013, GSE30219, GSE31210, GSE3141 and GSE50081 cohorts.

Gene expression analysis based on machine learning can be leveraged to predict disease outcomes, which in turn can facilitate early screening of diseases as well as research into new therapeutic modalities. A substantial number of predictive signatures have emerged in recent years. To compare the ICDI signature with published signatures, we searched for articles on LUAD-related prediction models. After excluding articles with unclear prediction model formulas or missing corresponding gene expression data in the training and validation groups, 102 LUAD-related predictive signatures were finally enrolled (Supplementary Table 4). These signatures covered various biological processes, such as cuproptosis, ferroptosis, autophagy, epithelial-mesenchymal transition, acetylation, amino acid metabolism, anoikis, DNA repair, fatty acid metabolism, hypoxia, inflammation, N6-methyladenosine, mitochondrial homeostasis, and mTOR. Each signature was established in the TCGA-LUAD, GSE29013, GSE30219, GSE31210, GSE3141, and GSE50081 cohorts and compared with the ICDI signature by C-index; the ICDI signature outperformed the majority of signatures in each cohort (Fig. 4B).

To investigate the contribution of the ICDI signature to the LUAD tumor immune microenvironment (TIME), we evaluated its correlation with immune infiltrating cells and immune-related processes. Based on the TIMER, CIBERSORT, quantiseq, MCPcounter, xCell, and EPIC algorithms, the ICDI signature was correlated with most immune infiltrating cells, with a few exceptions (such as activated NK cells and CD8+ naive T cells) (Fig. 5A). Based on the ssGSEA algorithm, the ICDI signature was significantly correlated with most immune-related processes (Fig. 5B). Based on the ESTIMATE algorithm, the ICDI signature was negatively correlated with StromalScore, ImmuneScore, and ESTIMATEScore, and positively correlated with TumorPurity (Fig. 5C), as expected.

(A) Heatmap displaying the correlation between the ICDI signature and 13 immune-related processes. (B) Heatmap displaying the correlation between the ICDI signature and immune infiltrating cells. (C) Box plot displaying the correlation between the ICDI signature and The ESTIMATE Immune Score, ImmuneScore, StromalScore, and TumorPurity. (D) Box plot displaying the correlation between the ICDI signature and immune modulators.

In addition, the study evaluated the relationship between the ICDI signature and known immune modulators (CYT, TLS, Davoli_IS, Roh_IS, Ayers_expIS, TIS, RIR, and TIDE) (Fig. 5D). The values of most of the immune modulators (CYT, TLS, Davoli_IS, Roh_IS, Ayers_expIS, and TIS) were significantly higher in the group with low ICDI signature scores. The RIR values and TIDE scores were significantly higher in the group with high ICDI signature scores, which suggested a higher potential for immunological escape (Fig. 5D). Together, these results indicate that the ICDI signature is a potential immunotherapeutic biomarker.

To further investigate the potential of the ICDI signature as an immunotherapeutic biomarker, the study calculated ICDI scores for each immunotherapy cohort to appraise its predictive value. The findings indicated that patients with a low ICDI score were more likely to benefit from immunotherapy (Fig. 6A). The receiver operating characteristic (ROC) analysis showed that the ICDI signature exhibited a consistent ability to predict the efficacy of immunotherapy-based treatment. This finding was supported across the immunotherapy datasets, including the Melanoma-GSE78220, STAD-PRJEB25780, and GBM-PRJNA482620 cohorts, which yielded AUC values of 0.771, 0.671, and 0.723, respectively (Fig. 6B).
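The ROC figures quoted above are areas under the curve for a binary outcome (response versus no response). A minimal sketch of that calculation, using the pairwise-ranking definition of AUC, is shown below. The labels and scores are invented, and negating the ICDI score so that a lower score predicts response is an assumption drawn from the finding that low-score patients benefit more.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical data: 1 = responder, 0 = non-responder.
labels = [1, 1, 0, 0]
icdi = [0.1, 0.6, 0.5, 0.9]
# Assumption: lower ICDI predicts response, so rank by the negated score.
predictor = [-s for s in icdi]
print(roc_auc(labels, predictor))
```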

(A) Box plot displaying the correlation between the ICDI signature and immunotherapy response in the immunotherapy dataset (Melanoma-GSE78220, STAD-PRJEB25780, and GBM-PRJNA482620). (B) ROC curves of ICDI signature to predict the benefits of immunotherapy in the immunotherapy dataset (Melanoma-GSE78220, STAD-PRJEB25780, and GBM-PRJNA482620). (C) Box plot displaying the correlation between the ICDI signature and chemotherapy drugs.

Chemotherapy resistance is a significant barrier to the effectiveness of chemotherapy and targeted therapy in treating advanced lung cancer. We performed an analysis to determine the drug sensitivities of various chemotherapeutics in living organisms and then compared those sensitivities across ICDI signature groups. Individuals with low ICDI scores exhibited notably higher sensitivity to erlotinib, gefitinib, docetaxel, and paclitaxel. However, there was no significant variation in sensitivity to cisplatin and 5-fluorouracil (Fig. 6C). The study offers guidance on the administration of chemotherapeutic medications in individuals with LUAD.

See original here:
Machine learning-based integration develops an immunogenic cell death-derived lncRNA signature for predicting ... - Nature.com

Transforming manufacturing with AI and machine learning: Real-world applications and data management integration – The Manufacturer

The manufacturing industry is on the cusp of a revolution driven by Artificial Intelligence (AI) and Machine Learning (ML). These technologies are poised to transform operations, enhance efficiency, and reduce costs.

Introducing AI and ML into manufacturing organizations involves practical applications that highlight their potential. Additionally, understanding the critical role of data management is essential for ensuring the success of these technologies.

AI and ML are no longer futuristic concepts; they are essential tools for modern manufacturing. The imperative for adopting these technologies stems from the need to remain competitive in a rapidly evolving market. Manufacturers face increasing pressure to improve productivity, reduce waste, and enhance quality. AI and ML offer solutions by providing insights and automating processes that were previously labour-intensive and error-prone.

In the manufacturing industry, Machine Learning (ML), a critical subset of Artificial Intelligence (AI), involves the use of sophisticated algorithms to learn from and make predictions based on data. These technologies can analyse vast amounts of production data to identify patterns, optimize workflows, and predict equipment failures. For example, ML algorithms can continuously monitor machinery performance, detecting subtle anomalies that may indicate future breakdowns, thus enabling predictive maintenance. Additionally, ML can be used to refine production schedules in real-time based on demand forecasts and resource availability, ensuring maximum efficiency and minimal downtime. By integrating AI and ML, manufacturers can enhance quality control, streamline supply chains, and drive overall operational excellence.

Managing industry standards is a complex task, but AI and ML can simplify it by automating the classification and tagging of data. These technologies can transform standards into digital formats and continuously learn from new data to provide up-to-date compliance guidelines. For instance, AI algorithms can parse through large datasets, identify relevant industry standards, and ensure that manufacturing processes adhere to the latest regulations, reducing compliance costs and enhancing operational efficiency.

AI and ML can enrich business partner information, offering deep profiling that can be leveraged across the value chain. By analysing data from various sources, AI can provide insights into a partner's financial stability, market performance, and strategic alignment. This deep profiling enables manufacturers to make informed decisions about partnerships, negotiate better terms, and predict potential risks. Integrating these insights helps streamline operations and optimize inventory management, leading to cost savings and improved supply chain efficiency.

Predictive maintenance is one of the most impactful applications of AI and ML in manufacturing. These technologies analyse data from sensors and machinery to predict equipment failures before they occur. For example, ML algorithms can monitor the vibration and temperature of a machine to forecast potential issues. By scheduling maintenance activities based on these predictions, manufacturers can prevent unexpected downtime, extend equipment lifespan, and reduce maintenance costs. This proactive approach ensures continuous production and enhances safety.
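As a toy illustration of the sensor monitoring described above, the sketch below flags readings that deviate sharply from the recent baseline using a rolling z-score. This is a deliberately simple stand-in for the ML models the article has in mind; the function name, window size, threshold, and vibration values are all hypothetical.

```python
from statistics import mean, stdev


def flag_anomalies(readings, window=5, threshold=3.0):
    """Return indices whose reading deviates strongly from the prior window."""
    flagged = []
    for i in range(window, len(readings)):
        history = readings[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # flat baseline: z-score undefined, skipped in this sketch
        if abs(readings[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical vibration amplitudes from one machine; index 7 is a spike.
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 1.0, 0.95, 3.2, 1.0, 1.1]
print(flag_anomalies(vibration))
```

In practice a flagged index would feed a maintenance ticket or a richer failure-prediction model rather than being acted on directly.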

AI and ML can optimize production scheduling by analysing production data, demand forecasts, and resource availability to create efficient schedules. These systems can dynamically adjust production plans in real-time based on changing conditions, such as delays in raw material supply or shifts in demand. For instance, AI can identify bottlenecks in the production process and suggest adjustments to mitigate delays, ensuring that production targets are met consistently. This flexibility maximizes resource utilization and minimizes idle time.

For AI and ML to function effectively, accurate and consistent data is essential. This is where Master Data Management (MDM) plays a critical role. MDM involves creating a single, authoritative source of truth for critical business data, ensuring that all systems and processes across the organization work with the same accurate information. MDM enhances AI and ML efficiency by providing clean, consistent, and reliable data, which is vital for generating meaningful insights and predictions. For example, in predictive maintenance, the reliability of sensor data is crucial for accurate failure predictions.

The integration of AI and ML into manufacturing processes offers significant benefits, including simplified management of industry standards, enriched business partner profiling, predictive maintenance, and optimized production scheduling. These applications demonstrate how AI and ML can save time and money while enhancing operational efficiency. However, the success of these technologies hinges on the quality of data, underscoring the importance of robust data management practices. By ensuring data accuracy and consistency, MDM enables AI and ML systems to perform at their best, delivering reliable insights and driving informed decision-making. As manufacturers continue to embrace AI and ML, robust MDM practices will be essential to unlocking the full potential of these technologies and achieving sustained operational excellence.

His passion for addressing industry challenges led him to solution provision, working with organisations like Autodesk and Microsoft.

Now, with Stibo Systems, he leverages master data management to help manufacturers thrive in volatile markets.

Follow this link:
Transforming manufacturing with AI and machine learning: Real-world applications and data management integration - The Manufacturer

How the State Department used AI and machine learning to revolutionize records management – FedScoop

In the digital age, government agencies are grappling with unprecedented volumes of data, presenting challenges in effectively managing, accessing and declassifying information.

The State Department is no exception. According to Eric Stein, deputy assistant secretary for the Office of Global Information Services, the department's eRecords archive system currently contains more than 4 billion artifacts, which includes emails and cable traffic. "The latter is how we communicate to and from our embassies overseas," Stein said.

Over time, however, department officials need to declare what can be released to the public and what stays classified, a time-consuming and labor-intensive process.

The State Department has turned to cutting-edge technologies like artificial intelligence (AI) and machine learning (ML) to find a more efficient solution. Through three pilot projects, the department has successfully streamlined the document review process for declassification and improved the customer experience when it comes to FOIA (Freedom of Information Act) requests.

An ML-driven declassification effort

At the root of the challenge is Executive Order 13526, which requires that classified records of permanent historical value be automatically declassified after 25 years unless a review determines an exemption. For the State Department, cables are among the most historically significant records produced by the agency. However, current processes and resource levels will not work for reviewing electronic records, including classified emails, created in the early 2000s and beyond, jeopardizing declassification reviews starting in 2025.

Recognizing the need for a more efficient process, the department embarked on a declassification review pilot using ML in October 2022. Stein came up with the pilot idea after participating in an AI Federal Leadership Program supported by major cloud providers, including Microsoft.

For the pilot, the department used cables from 1997 and created a review model based on human decisions from 2020 and 2021 concerning cables marked as confidential and secret in 1995 and 1996. The model uses discriminative AI to score and sort cables into three categories: those it was confident should be declassified, those it was confident shouldn't be declassified, and those that needed manual review.
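The three-way sort described above can be pictured as two confidence thresholds applied to a model score. The sketch below is purely illustrative: the score scale, the threshold values, the category labels, and the cable identifiers are assumptions, not details of the department's system.

```python
def triage(score, release_at=0.9, withhold_at=0.1):
    """Map a model score in [0, 1] to one of three review queues."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score >= release_at:
        return "declassify"        # model is confident the cable can be released
    if score <= withhold_at:
        return "keep_classified"   # model is confident it should stay classified
    return "manual_review"         # uncertain cases go to a human reviewer


# Hypothetical cable IDs and scores, for illustration only.
cables = {"cable-001": 0.97, "cable-002": 0.03, "cable-003": 0.55}
queues = {cable_id: triage(score) for cable_id, score in cables.items()}
print(queues)
```

Widening the gap between the two thresholds sends more cables to human reviewers, which trades staff hours for caution.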

According to Stein, for the 1997 pilot group of more than 78,000 cables, the model performed the same as human reviewers 97% to 99% of the time and reduced staff hours by at least 60%.

"We project [this technology] will lead to millions of dollars in cost avoidance over the next several years because instead of asking for more money for human resources or different tools to help with this, we can use this technology," Stein explained. "And then we can focus our human resources on the higher-level and analytical thinking and some of the tougher decisions, as opposed to what was a very manual process."

Turning attention to FOIA

Building on the success of the declassification initiative, the State Department embarked on two other pilots to enhance the Freedom of Information Act (FOIA) processes from June 2023 to February 2024.

Like cable declassification efforts, handling a FOIA request is a highly manual process. According to Stein, sometimes those requests are a single sentence; others are multiple pages. But no matter the length, a staff member must acknowledge the request, advise whether the department will proceed with it, and then manually search for terms in those requests in different databases to locate the relevant information.

Using the lessons learned from the declassification pilot, Stein said State Department staff realized there was an opportunity to streamline certain parts of the FOIA process by simultaneously searching what was already in the department's public reading room and in the record holdings.

"If that information is already publicly available, we can let the requester know right away," Stein said. "And if not, if there are similar searches and reviews that have already been conducted by the agency, we can leverage those existing searches, which would result in a significant savings of staff hours and response time."

Beyond internal operations, the State Department also sought to improve the customer experience for FOIA requesters by modernizing its public-facing website and search functionalities. Using AI-driven search algorithms and automated request processing, the department aims to find and direct a customer to existing released documents and automate customer engagement early in the request process.

Lessons learned

Since launching the first pilot in 2022, team members have learned several things. The first is to start small and provide the space and time to become familiar with the technology. "There are always demands and more work to be done, but to have the time to focus and learn is important," Stein said.

Another lesson is the importance of collaboration. "It's been helpful to talk across different communities to not only understand how this technology is beneficial but also what concerns are popping up, and discussing those sooner than later," he said. "The sooner that anyone can start spending some time thinking about AI and machine learning critically, the better."

Another lesson is to recognize the need to continuously train a model because "you can't just do this once and then let it go. You have to constantly be reviewing how we're training the model (in light of) world events and different things," he said.

These pilots have also shown how this technology will allow State Department staff to better respond to other needs, including FOIA requests. For example, someone may ask for something in a certain way, but that's not how it's talked about internally.

"This technology allows us to say, 'Well, they asked for this, but they may have also meant that,'" Stein said. "So, it allows us to make those connections, which may have been missing in the past."

The State Department's strategic adoption of AI and ML technologies in records management and transparency initiatives underscores the transformative potential of these tools. By starting small, fostering collaboration and prioritizing user-centric design, the department has paved the way for broader applications of AI and ML to support more efficient and transparent government operations.

The report was produced by Scoop News Group for FedScoop, as part of a series on innovation in government, underwritten by Microsoft Federal.

See original here:
How the State Department used AI and machine learning to revolutionize records management - FedScoop

Characterization of PANoptosis-related genes in Crohn’s disease by integrated bioinformatics, machine learning and … – Nature.com

GEO dataset integration and immune landscape of CD

We constructed a combined dataset covering 279 CD samples and 224 control samples from mucosa after the removal of batch effects (Fig. 2A,B). A broadly uncoordinated immune response is an indispensable hallmark of CD. With the aim of revealing the immune landscape, we scored the immune cell infiltration of CD patients and controls via the ssGSEA method. As illustrated in Fig. 2C, the infiltration of 20 immune cells in the CD group and control group was significantly different, among which only the scores of T helper 17 (Th17) cells were lower in CD tissues than in control tissues. We then performed a correlation analysis of distinct immune cells, as shown in Fig. 2D. Interestingly, Th17 cells, CD56bright natural killer (NK) cells, CD56dim NK cells and monocytes showed inverse correlations with almost all other immune cells, whereas the other immune cells were generally positively correlated with one another, which deserves special attention.

GEO dataset combination and immune landscape of CD. (A) PCA between datasets before removal of batch effects. (B) PCA between integrated datasets after removal of batch effects. (C) Infiltration levels of 28 immune cell subtypes in CD samples and controls. The blue bars represent controls, and the red bars represent CD samples. *p<0.05; **p<0.01; ***p<0.001; ****p<0.0001. (D) Pearson correlation analysis of distinct immune cells. The purple squares represent positive correlations, and the orange squares represent inverse correlations. GEO Gene Expression Omnibus, CD Crohn's disease, PCA principal component analysis.

A total of 1265 DEGs, consisting of 592 upregulated and 673 downregulated genes, were identified through differential expression analysis (Fig. 3A). A list of possible PRGs was produced from previous research (Supplementary file 1: Table S1). Subsequently, we intersected the 1265 DEGs with 930 PRGs via a Venn diagram; thus, 130 DE-PRGs were identified (Fig. 3B), which were further grouped in a heatmap (Fig. 3C). The overall expression of these DE-PRGs in the CD group and control group is shown in Supplementary file 3: Fig. S1. We could conclude that the vast majority of DE-PRGs were expressed at higher levels in CD tissues than in control tissues.

Identification of DE-PRGs. (A) Volcano map of the DEGs with the cutoff threshold set at |log2 (fold change)|>1 and adj. p<0.05. The blue dots represent downregulated DEGs, the red dots represent upregulated DEGs, and the gray dots represent genes with no significant difference. (B) Venn diagram of DEGs and PRGs. Pink circle represents DEGs, blue circle represents PRGs, and their overlapping area represents DE-PRGs. (C) Clustered heatmap of the top 40 DE-PRGs. Each row represents one of the top 40 DE-PRGs, and each column represents one sample, either CD or normal. DE-PRGs differentially expressed PANoptosis-related genes, DEGs differentially expressed genes, PRGs PANoptosis-related genes, CD Crohn's disease.
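The selection of DE-PRGs described above amounts to a threshold filter followed by a set intersection. The sketch below applies the cut-offs stated in the caption (|log2 fold change| > 1 and adjusted p < 0.05) to a toy table and intersects the result with a reference gene list. The gene names and statistics are placeholders invented for illustration, and the function name is hypothetical.

```python
def select_degs(stats, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return genes passing |log2 fold change| > cutoff and adjusted p < cutoff."""
    return {
        gene
        for gene, (log2_fc, adj_p) in stats.items()
        if abs(log2_fc) > lfc_cutoff and adj_p < padj_cutoff
    }


# Hypothetical per-gene statistics: gene -> (log2 fold change, adjusted p-value).
stats = {
    "GENE_A": (2.3, 0.001),   # upregulated and significant
    "GENE_B": (-1.8, 0.020),  # downregulated and significant
    "GENE_C": (0.4, 0.001),   # effect size too small
    "GENE_D": (1.5, 0.200),   # not significant after adjustment
}
# Hypothetical PANoptosis-related gene list.
prgs = {"GENE_A", "GENE_C", "GENE_E"}

degs = select_degs(stats)
de_prgs = degs & prgs  # the overlap region of the Venn diagram
print(sorted(degs), sorted(de_prgs))
```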

We then examined the latent functions and signaling pathways of the DE-PRGs. GO analysis revealed that these DE-PRGs were predominantly involved in regulation of apoptotic signaling pathway, leukocyte cell-cell adhesion, regulation of inflammatory response (biological process); membrane raft, membrane microdomain, focal adhesion (cellular component); ubiquitin-like protein ligase binding, ubiquitin protein ligase binding, and phosphatase binding (molecular function) (Supplementary file 4: Fig. S2A). Additionally, DE-PRGs were notably enriched in apoptosis, proteoglycans in cancer, NOD-like receptor signaling pathway, among others, according to the KEGG results (Supplementary file 4: Fig. S2B). Moreover, a PPI network analysis of the DE-PRGs was performed and a complex network of the DE-PRGs was constructed (Supplementary file 5: Fig. S3).

To screen the hub DE-PRGs, we first applied three algorithms, LASSO, SVM and RF, which yielded 20, 34 and 33 potential hub DE-PRGs, respectively (Fig.4A–E). Afterward, 10 hub DE-PRGs were identified through the intersection of the machine learning results, namely CD44, cell death inducing DFFA like effector c (CIDEC), N-myc downstream regulated 1 (NDRG1), nuclear mitotic apparatus protein 1 (NUMA1), proliferation and apoptosis adaptor protein 15 (PEA15), recombination activating 1 (RAG1), S100 calcium binding protein A8 (S100A8), S100 calcium binding protein A9 (S100A9), TIMP metallopeptidase inhibitor 1 (TIMP1) and X-box binding protein 1 (XBP1) (Fig.4F). Next, we probed their interactions, as shown in Fig.4G. Most hub DE-PRGs, such as CD44, PEA15, S100A8, S100A9, TIMP1 and XBP1, were closely interrelated. Moreover, NDRG1, NUMA1 and RAG1 generally presented antagonistic effects on the other hub DE-PRGs. Finally, the diagnostic value of each hub DE-PRG in predicting CD was calculated based on our combined dataset (Fig.4H). All 10 hub DE-PRGs showed predictive value, with area under the curve (AUC) values greater than 0.740. Notably, the AUC reached 0.871 when the 10 hub DE-PRGs were combined (Fig.4H). In addition, we conducted external validation on the GSE102133 and GSE207022 datasets. The results were satisfactory, with high AUC values (Supplementary file 6: Fig. S4).

Identification of the hub DE-PRGs. (A) Cross-validations of adjusted parameter selection in the LASSO model. Each curve corresponds to one gene. (B) LASSO coefficient analysis. Vertical dashed lines are plotted at the best lambda. (C) SVM algorithm for hub gene selection. (D) Relationship between the number of random forest trees and error rates. (E) Ranking of the relative importance of genes. (F) Venn diagram showing the 10 hub DE-PRGs identified by LASSO, SVM and RF. The pink circle represents potential hub DE-PRGs identified by RF, the blue circle represents potential hub DE-PRGs identified by SVM, the green circle represents potential hub DE-PRGs identified by LASSO, and their overlapping area represents the final hub DE-PRGs. (G) Chord diagram showing the correlations between the hub DE-PRGs. Red represents positive correlations between different genes and green represents negative correlations between different genes. (H) ROC curves of the hub DE-PRGs in CD diagnosis. DE-PRGs differentially expressed PANoptosis-related genes, LASSO least absolute shrinkage and selection operator, RF random forest, SVM support vector machine, ROC receiver operating characteristic, AUC area under the curve, CD Crohn's disease.
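
To make the AUC values quoted above concrete, the following minimal sketch computes the area under the ROC curve for a single gene as the probability that a randomly chosen CD sample has a higher expression value than a randomly chosen control (the rank-based Mann-Whitney formulation). The expression values and labels are invented, and `roc_auc` is a hypothetical helper, not the routine used in the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties between a case and a control count as half a correct pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    correct = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                correct += 1.0
            elif c == n:
                correct += 0.5
    return correct / (len(cases) * len(controls))


# Invented expression values for one gene; 1 = CD sample, 0 = control.
expression = [5.1, 4.8, 3.9, 4.2, 3.0, 2.7]
is_cd = [1, 1, 1, 0, 0, 0]

auc = roc_auc(expression, is_cd)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The pairwise loop is quadratic in the number of samples, which is acceptable for cohorts of a few hundred samples but not for very large datasets.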

Spearman correlation analysis was carried out to determine the interactions between the hub DE-PRGs and immune cells (Fig.5). CD44, PEA15, S100A8, S100A9, TIMP1 and XBP1 showed notable positive correlations with the infiltration of most immune cell types, with exceptions such as monocytes and CD56bright NK cells. In contrast, NDRG1, NUMA1, and RAG1 were negatively associated with most types of immune cells, excluding a few such as monocytes. CIDEC fell somewhere between these two extremes.

Spearman correlation analysis of hub DE-PRGs with immune cells. The correlations between CD44 (A), CIDEC (B), NDRG1 (C), NUMA1 (D), PEA15 (E), RAG1 (F), S100A8 (G), S100A9 (H), TIMP1 (I) and XBP1 (J) gene expressions with immune cells, respectively. The size of the dots represents the strength of gene correlation with immune cells; the larger the dot, the stronger the correlation. The color of the dots represents the p-value; the greener the color, the lower the p-value. p<0.05 was considered statistically significant. DE-PRGs differentially expressed PANoptosis-related genes.
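
As a reminder of what the Spearman coefficient measures, the sketch below ranks two variables (averaging the ranks of tied values) and then computes the Pearson correlation of the ranks. The gene expression values and immune scores are invented, and the function names are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented values: expression of one gene and one immune-cell enrichment score.
gene_expr = [1.2, 2.5, 3.1, 4.8, 5.0]
immune_score = [0.10, 0.30, 0.25, 0.70, 0.90]

rho = spearman(gene_expr, immune_score)
print(f"Spearman rho = {rho:.2f}")
```

Because only ranks enter the calculation, the coefficient is insensitive to monotonic transformations such as log scaling. This sketch does not compute the p-values shown in the figure.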

The top 30 crucial genes related to CD were extracted from the GeneCards database, and their expression levels were compared between CD samples and normal samples (Fig.6A). A majority of the CD-related genes (26 out of 30) were differentially expressed, especially COL1A1, CTLA4, IL10 and NOD2. Pearson correlation analysis was subsequently conducted to examine the relationships between these CD-related genes and the hub DE-PRGs (Fig.6B). Notably, CTLA4, one of the most differentially expressed CD-related genes, was significantly associated with each hub DE-PRG. COL1A1, IL10 and NOD2 also presented varying levels of correlation with the hub DE-PRGs. Nevertheless, there were no significant correlations between the hub DE-PRGs and some CD-related genes, including CYBB, IL10RA, RET and VCP.

Expression levels of the top 30 CD-related genes and relationships between them and hub DE-PRGs. (A) Boxplot of the top 30 crucial genes in relation to CD. The blue bars represent controls, and the red bars represent CD samples. (B) Pearson correlation analysis between the top 30 CD-related genes and the 10 hub DE-PRGs. *p<0.05; **p<0.01; ***p<0.001. CD Crohn's disease, DE-PRGs differentially expressed PANoptosis-related genes.

Subsequently, a gene–miRNA interaction network of the 10 hub DE-PRGs consisting of 226 nodes and 338 edges was constructed (Supplementary file 7: Fig. S5 and Supplementary file 8: Table S3). miR-124-3p, miR-34a-5p and miR-27a-3p were most strongly associated with the hub DE-PRGs in CD. After that, we generated a gene–TF regulatory network of the 10 hub DE-PRGs (Supplementary file 9: Fig. S6). The 10 hub DE-PRGs were regulated by a total of 35 TFs. Among them, FOXC1 was found to regulate as many as 7 hub DE-PRGs and S100A8 was regulated by 13 miRNAs (Supplementary file 10: Table S4). In addition, we looked for available drugs that act on the hub DE-PRGs, and a host of drugs were involved (Supplementary file 11: Fig. S7 and Supplementary file 12: Table S5). Specifically, a total of 19 drugs interacted with XBP1, 8 of which inhibited it.
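
Statements such as "a network of N nodes and M edges" or "one regulator targets k hub genes" reduce to simple counts over an edge list. The sketch below shows one way to obtain them. The regulator and gene names and the edges themselves are hypothetical placeholders and do not reproduce the networks in the supplementary files.

```python
from collections import Counter

# Hypothetical (regulator, target) edges; a regulator could be a TF or a miRNA.
edges = [
    ("TF_1", "GENE_A"),
    ("TF_1", "GENE_B"),
    ("TF_1", "GENE_C"),
    ("TF_2", "GENE_A"),
    ("TF_3", "GENE_A"),
    ("TF_3", "GENE_C"),
]

unique_edges = set(edges)  # guard against duplicated rows in an exported table
nodes = {name for edge in unique_edges for name in edge}

# Out-degree of each regulator and in-degree of each target gene.
targets_per_regulator = Counter(regulator for regulator, _ in unique_edges)
regulators_per_target = Counter(target for _, target in unique_edges)

top_regulator, n_targets = targets_per_regulator.most_common(1)[0]

print(f"{len(nodes)} nodes, {len(unique_edges)} edges")
print(f"{top_regulator} regulates {n_targets} genes")
print(f"GENE_A is regulated by {regulators_per_target['GENE_A']} regulators")
```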

To distinguish different PANoptosis patterns in CD patients, we adopted the NMF method for unsupervised clustering on the basis of the 10 hub DE-PRGs. At k=2, the most stable and optimal PANclusters were identified (Fig.7A). There were 101 and 178 CD samples in PANcluster A and PANcluster B, respectively. The geometrical distance between the two clusters is shown in Fig.7B, validating their gene expression heterogeneity. Thereafter, a boxplot and a heatmap were generated to compare the expression levels of the hub DE-PRGs between PANcluster A and PANcluster B (Fig.7C,D). Specifically, PANcluster A was distinguished by the considerably high expression levels of CIDEC, NDRG1, NUMA1 and RAG1, while the other hub DE-PRGs, that is, CD44, PEA15, S100A8, S100A9, TIMP1 and XBP1, were expressed at higher levels in PANcluster B.

Recognition of PANclusters in CD. (A) Unsupervised clustering matrix generated using the NMF method when k=2. (B) PCA plot showing the distribution of PANcluster A and PANcluster B. The red dots represent PANcluster A and the blue dots represent PANcluster B. (C) Boxplot of the expression levels of the hub DE-PRGs in PANcluster A and PANcluster B. The red bars represent PANcluster A, and the blue bars represent PANcluster B. (D) Heatmap of the expression levels of the hub DE-PRGs in PANcluster A and PANcluster B. Each row represents one hub DE-PRG, and each column represents one CD sample. PANclusters PANoptosis patterns, CD Crohn's disease, NMF nonnegative matrix factorization, PCA principal component analysis, DE-PRGs differentially expressed PANoptosis-related genes.
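
For readers unfamiliar with NMF-based clustering, the sketch below factorizes a small non-negative genes-by-samples matrix V into W (genes by k) and H (k by samples) using the classic multiplicative update rules, then assigns each sample to the component with the largest coefficient in H. It is a generic illustration, not the study's implementation: the text does not specify the software or settings used, and the matrix, the seed, the iteration count and all function names here are assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Factorize V ~ WH with multiplicative updates (squared-error objective)."""
    rng = random.Random(seed)
    n_genes, n_samples = len(v), len(v[0])
    # Strictly positive random initialization (an arbitrary choice).
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_genes)]
    h = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(k)]
    start_error = squared_error(v, w, h)
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(n_samples)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n_genes)]
    return w, h, start_error, squared_error(v, w, h)


# Invented 4-gene x 6-sample matrix with two visibly different sample groups.
v = [
    [5, 6, 5, 1, 0, 1],
    [6, 5, 6, 0, 1, 0],
    [0, 1, 0, 5, 6, 5],
    [1, 0, 1, 6, 5, 6],
]

w, h, start_error, end_error = nmf(v, k=2)

# Each sample (column) is assigned to the component with the largest weight.
labels = [max(range(2), key=lambda a: h[a][j]) for j in range(len(v[0]))]
print("cluster labels:", labels)
print(f"squared error: {start_error:.2f} -> {end_error:.4f}")
```

In practice the number of clusters k is usually chosen by repeating the factorization for several values of k and many random starts and comparing stability measures; that selection step is not shown here.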

GSVA was performed with the aim of shedding light on the functional diversity patterns of the recognized PANclusters. With regard to Hallmark pathways, increased activity of p53 pathway, androgen response and hypoxia were detected in PANcluster A, whereas mTORC1 signaling, inflammatory response, TNF-α signaling via NF-κB, IL-6/JAK/STAT3 signaling and epithelial mesenchymal transition were increased in PANcluster B (Supplementary file 13: Fig. S8A). In addition, results from the KEGG analysis suggested that PANcluster A had hypoactive ECM–receptor interaction and endocytosis but expressed high levels of genes associated with cytokine–cytokine receptor interaction and numerous signaling pathways, including toll-like receptor signaling pathway and NOD-like receptor signaling pathway (Supplementary file 13: Fig. S8B). Concerning the Reactome-based pathways, PANcluster A showed an increase in the cell cycle pathway, while most pathways, such as cytokine signaling in immune system and extracellular matrix-related pathways, were significantly enriched in PANcluster B (Supplementary file 13: Fig. S8C).

To clarify the disparities in the immune system among the PANclusters, we compared their immune microenvironments, as shown in Fig.8A. Remarkably, the enrichment scores of 26 immune cells were much greater in PANcluster B than in PANcluster A. CD56bright NK cells and monocytes were the only two exceptions, with higher infiltration in PANcluster A; the reasons for this require further investigation. In addition, differential gene analysis revealed 533 DEGs, including 171 upregulated and 362 downregulated genes (Fig.8B). To learn more about the biological functions and processes linked to these DEGs, GO and KEGG analyses were performed. The 533 DEGs were markedly enriched in the following terms: positive regulation of cell adhesion, leukocyte cell–cell adhesion, and extracellular matrix organization (biological process); collagen-containing extracellular matrix, secretory granule membrane, and basement membrane (cellular component); and extracellular matrix structural constituent, glycosaminoglycan binding, and integrin binding (molecular function) (Fig.8C,D). Moreover, the 533 DEGs were principally involved in many pathways, such as cell adhesion molecules, ECM–receptor interaction and PI3K-Akt signaling pathway (Fig.8E).

Characterization of different PANclusters. (A) Infiltration levels of 28 immune cell subtypes in PANclusters A and B. The red bars represent PANcluster A, and the blue bars represent PANcluster B. (B) Volcano map of DEGs between PANclusters A and B. The blue dots represent downregulated DEGs, the red dots represent upregulated DEGs, and the gray dots represent genes with no significant difference. (C,D) Enriched items in GO analysis based on the DEGs between PANclusters A and B. (E) Enriched items in KEGG analysis based on the DEGs between PANclusters A and B. Node color indicates gene expression level; quadrilateral color indicates z-score. PANclusters PANoptosis patterns, DEGs differentially expressed genes, BP biological process, CC cellular component, MF molecular function, GO Gene Ontology, KEGG Kyoto Encyclopedia of Genes and Genomes.

CD and control samples were acquired from 10 patients who were diagnosed with CD, and their demographic and clinical information is presented in Table 1. qRT-PCR was subsequently conducted to determine the relative expression levels of the 10 hub DE-PRGs (Fig.9A). As expected, the levels of CD44, PEA15, S100A8, S100A9, TIMP1 and XBP1 increased in CD samples compared with those in control samples, while the opposite trend was observed for NDRG1. Moreover, there was no significant difference in the mRNA expression levels of CIDEC, NUMA1 or RAG1. Furthermore, we established classic TNBS and DSS mouse models of CD and collected colon tissues to analyze the expression levels of the hub DE-PRGs in murine colon tissues from the TNBS, DSS and control groups (Fig.9B,C). Generally, the results of the TNBS model were in line with expectations. Specifically, in TNBS-induced colitis, Cd44, Numa1, S100a8, S100a9, Timp1 and Xbp1 were more highly expressed, while Cidec and Rag1 were less expressed. In addition, the levels of Ndrg1 and Pea15a did not significantly differ between the TNBS group and the control group. Consistent with previous work, in the DSS mouse model, the expression levels of Cd44, S100a8, S100a9 and Timp1 were greater in the mice with colitis, while the expression level of Ndrg1 was lower in the mice with colitis. In addition, no significant difference in the expression levels of Cidec, Pea15a or Xbp1 was detected. Unexpectedly, the expression levels of Numa1 and Rag1 in the DSS group were different from those in the CD and TNBS colitis groups.

qRT-PCR validation of the hub DE-PRGs in CD patients (A), TNBS-induced colitis model (B) and DSS-induced colitis model (C). The blue dots represent the normal/control tissues, and the red dots represent the diseased tissues. qRT-PCR quantitative real-time PCR, DE-PRGs differentially expressed PANoptosis-related genes, CD Crohn's disease, TNBS 2,4,6-trinitrobenzene sulfonic acid, DSS dextran sodium sulfate, GAPDH glyceraldehyde-3-phosphate dehydrogenase.

Continued here:
Characterization of PANoptosis-related genes in Crohn's disease by integrated bioinformatics, machine learning and ... - Nature.com