Machine learning-based observation-constrained projections reveal elevated global socioeconomic risks from wildfire – Nature.com

Applying traditional EC for global fire carbon emissions

The recently developed emergent constraint (EC) approach has demonstrated a robust capability to reduce the uncertainty in characterizing or projecting Earth system variables simulated by a multimodel ensemble [25,26]. The basic concept of EC is that, despite distinct model structures and parameters, various across-model relationships (emergent constraints) exist between pairs of quantities when outputs from multiple models are analyzed [27]. The EC concept is therefore especially useful for deriving the relationship between a variable that is difficult or impossible to measure (e.g., future wildfires) and a second, measurable variable (e.g., historical wildfires) across multiple Earth system models (ESMs). We start with global total values and find a significant linear relationship between historical and future global total fire carbon emissions across 38 ensemble members of 13 ESMs (Supplementary Fig. 2a). Because we are particularly interested in the spatial distribution of future wildfires, which is critical for quantifying future socioeconomic risks from wildfires, we further apply the EC concept to every grid cell of the globe, using either a single constraint variable (historical fire carbon emissions) or multiple constraint variables (the atmospheric and terrestrial variables in Supplementary Table 2), with the latter shown in Supplementary Fig. 2b. We find insignificant linear relationships between these historical fire-relevant variables and future wildfires in the historically fire-prone regions across the analyzed 38 members of 13 ESMs. The failure of the traditional EC concept to constrain fire carbon emissions at local scales could be attributed to the highly nonlinear interactions between fire and its cross-section drivers, which are likely inadequately captured by the linear relationship assumed under EC. Therefore, we further develop a constraint based on machine learning techniques (MLT) to deal with the complex response of wildfires to environmental and socioeconomic drivers.
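
As a minimal sketch of the traditional EC calculation described above, the following Python fits an ordinary least-squares line across ensemble members between a historical quantity and a future quantity, and then evaluates that line at an observed historical value. The function name, the member values, and the observed value are illustrative assumptions and are not data from the study.

```python
def emergent_constraint(hist, future, observed_hist):
    """Fit future = intercept + slope * hist across ensemble members
    (ordinary least squares) and evaluate the fit at the observed value."""
    n = len(hist)
    mean_x = sum(hist) / n
    mean_y = sum(future) / n
    sxx = sum((x - mean_x) ** 2 for x in hist)
    syy = sum((y - mean_y) ** 2 for y in future)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hist, future))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return {
        "slope": slope,
        "intercept": intercept,
        "r2": (sxy * sxy) / (sxx * syy),  # squared across-member correlation
        "constrained": intercept + slope * observed_hist,
    }


# Hypothetical ensemble members in arbitrary units (not values from the study).
hist_members = [1.0, 2.0, 3.0, 4.0]
future_members = [2.1, 3.9, 6.2, 7.8]
result = emergent_constraint(hist_members, future_members, observed_hist=2.5)
print(result["constrained"], result["r2"])
```

A low across-member R² at a given grid cell would correspond to the insignificant linear relationships reported for fire-prone regions, which is the situation that motivates the nonlinear MLT-based constraint.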

MLT provide powerful tools for capturing the nonlinear and interactive roles among regulators of an Earth system feature, thereby facilitating an effective, multivariate constraint on wildfire activity, which represents an integrated function of climate, terrestrial ecosystem, and socioeconomic conditions. MLT have been widely applied for identifying empirical regulators [32] and building prediction systems for global and regional fire activity [35]. To constrain the projected fire carbon emissions simulated by 13 ESMs using observational data, the current study establishes an MLT-based emergent relationship between future fire carbon emissions and historical fire carbon emissions, climate, terrestrial ecosystem, and socioeconomic drivers.

Here, we use MLT to examine the empirical relationships between historical, observed influencing factors of wildfires and future fire carbon emissions from ESMs and then feed observational data into the trained machine learning models (Supplementary Fig. 3). To train the MLT to use historical states for the prediction of future fire carbon emissions, the historical and future simulations from the SSP (Shared Socioeconomic Pathway) 5-85 [36], a high-emission scenario, are analyzed for the currently available 13 ESMs in CMIP6 (Supplementary Table 1). A subset of these ESMs (i.e., nine ESMs that provide simulations in a lower-emission scenario, SSP2-45) is also analyzed to examine the dependence of fire regimes on the socioeconomic pathway. The training is conducted using the spatial sample of decadal mean predictors and target variable, both individually from each ESM and from their aggregation, with the latter referred to as the multimodel mean and subsequently analyzed for projecting fire carbon emissions and their socioeconomic risks. Corresponding to the spatial resolution of the observational products of fire carbon emission, all model outputs are bilinearly interpolated to a 0.25° × 0.25° grid, resulting in a spatial sample of 11,325 points per model for the training. To perform the observational constraint, the historical observed predictors are then fed into the trained machine learning models. The historical predictors are listed in Supplementary Table 2 with their observational data sources, temporal coverages, and spatial resolutions. For the atmospheric and terrestrial variables, the annual mean value and the climatology in each of the 12 calendar months are included as predictors. This training and observational constraining is performed for target decades (2011–2020, 2021–2030, 2091–2100), and the historical period is always 2001–2010.
Future changes in fire carbon emission are quantified and expressed as the relative trend (% decade⁻¹) (i.e., the ratio between the absolute trend and the mean value during the 2010s), for both the default and observation-constrained ensembles.
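
The relative trend defined above can be sketched as follows. The text does not state how the absolute trend is estimated, so this example assumes an ordinary least-squares slope across consecutive decadal means; the function name and the input values are hypothetical.

```python
def relative_trend_percent_per_decade(decadal_means):
    """Relative trend in % per decade.

    decadal_means is ordered in time and its first entry is the 2010s mean.
    Assumption: the absolute trend is the least-squares slope of the decadal
    means against time measured in decades.
    """
    n = len(decadal_means)
    times = list(range(n))  # 0, 1, 2, ... decades after the 2010s
    mean_t = sum(times) / n
    mean_y = sum(decadal_means) / n
    stt = sum((t - mean_t) ** 2 for t in times)
    sty = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, decadal_means))
    absolute_trend = sty / stt  # units of the variable per decade
    return 100.0 * absolute_trend / decadal_means[0]


# Hypothetical decadal means for one grid cell (arbitrary units).
print(relative_trend_percent_per_decade([10.0, 11.0, 12.0, 13.0]))
```

Dividing by the 2010s mean expresses the change relative to the current state, so the same calculation can be applied to the default and the observation-constrained ensembles even when their absolute magnitudes differ.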

The current spatial sample training approach establishes a history-future relationship for each pixel using the entire global sample. To minimize local prediction errors for a given pixel, MLT search all pixels, regardless of their geographical location, to optimize the prediction model of future fires at the target pixel. In this way, a physically robust history-future relationship is established based on the global sample of locations, whereas the influences of localized features, such as socioeconomic development, on wildfire trends are naturally damped in our approach (Supplementary Figs. 10 and 11). The reliability of MLT degrades when the actual observational data space is insufficiently covered by the training (historical CMIP6 simulation) data space, a problem referred to as extrapolation uncertainty. Here, we further evaluate the data space of both the observations and the historical simulations of the climate and fire variables (Supplementary Fig. 14) and find that the two largely overlap for all assessed variables, indicating minimal extrapolation error in the current MLT application.
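
One simple way to illustrate the extrapolation check described above is to ask what fraction of observed values falls inside the range spanned by the training (simulated) values for a given variable. This range-based measure is an assumption made for illustration; the text does not specify the overlap metric used in the supplementary analysis, and the sample values are hypothetical.

```python
def fraction_within_training_range(training_values, observed_values):
    """Fraction of observed values lying within [min, max] of the training
    values; values outside that range would require extrapolation."""
    lower = min(training_values)
    upper = max(training_values)
    inside = sum(1 for value in observed_values if lower <= value <= upper)
    return inside / len(observed_values)


# Hypothetical values of a single predictor (arbitrary units).
simulated = [0.0, 5.0, 10.0]
observed = [1.0, 2.0, 11.0, -1.0]
print(fraction_within_training_range(simulated, observed))  # 0.5
```

A fraction close to 1 for every predictor would be consistent with the statement that the observational and simulated data spaces largely overlap.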

To minimize the projection uncertainty associated with the selected machine learning algorithms, this study examines three MLT: random forest (rf), support vector machine with radial basis function kernel (svmRadialCost), and gradient boosting machine (gbm). These three algorithms differ substantially in how they function. The average among these algorithms is thus believed to better capture the complex interrelation between the historical predictors and future fire carbon emissions than any single algorithm. The MLT analysis is performed using the caret, dplyr, randomForest, kernlab, and gbm packages in the R statistical software. The prediction model is fitted for each MLT using the training data set that targets each future decade, with parameters optimized for the minimum root-mean-square error (RMSE) via 10-fold cross-validation; in other words, a randomly chosen nine-tenths of the entire spatial sample (n = 10,193) is used for model fitting and the remaining one-tenth (n = 1,132) for validation, and the process is repeated 10 times. For svmRadialCost, the optimal pair of cost parameter (C) and kernel parameter sigma (σ) is searched from 30 (tuneLength=30) C candidates and their individually associated optimal σ. For gbm, we set the complexity of trees (interaction.depth) to 3 and the learning rate (shrinkage) to 0.2, and let the train function search for the optimal number of trees from 10 to 200 with an increment of 5 (10, 15, 20, ..., 200). For rf, the number of variables available for splitting at each tree node (mtry) is searched between 5 and 50 with an increment of 1 (5, 6, 7, ..., 50); the number of trees is determined by the algorithm provided by the randomForest package and the train function of the caret package. The cross-validation R² values exceed 0.8 (n = 1,132) for all optimized MLT and all future periods.
The currently examined ESMs, MLT, and hundreds of observational data set combinations constitute a multimodel, multidata set ensemble of projected fire carbon emissions for the twenty-first century. This multimodel, multidata set ensemble allows natural quantification of uncertainty in the future projection derived from observational sources and MLT, compared with a previous single-MLT, single-observation approach [67].
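
The 10-fold split and the tuning grids described above can be sketched in a few lines. This is an illustration in Python of the sample sizes and candidate values only; it does not reproduce how the caret train function builds its folds internally, and the function name and seed are assumptions.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Randomly partition sample indices into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Every k-th shuffled index goes to the same fold.
    return [indices[i::k] for i in range(k)]


folds = kfold_indices(11325, k=10)
for fold in folds:
    n_validation = len(fold)          # 1,132 or 1,133 points
    n_fitting = 11325 - n_validation  # 10,193 or 10,192 points

# Candidate values stated in the text.
gbm_n_trees = list(range(10, 201, 5))  # 10, 15, 20, ..., 200
rf_mtry = list(range(5, 51))           # 5, 6, 7, ..., 50
print(len(gbm_n_trees), len(rf_mtry))  # 39 46
```

Because 11,325 is not divisible by 10, the validation folds hold either 1,132 or 1,133 points, which is consistent with the fitting and validation sample sizes quoted above.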

This MLT-based observational constraining approach is validated for a historical period using the emergent relationship between fire, climate, ecosystem, and socioeconomic conditions during 1997–2006 and fire carbon emission during 2007–2016. The spatial correlation and RMSE with the observed decadal mean fire carbon emission (n = 11,325) are evaluated and compared for the constrained and unconstrained ensembles, as reported in the main text (Figs. 1 and 2). The RMSE and R² produced by the traditional EC approach, which constrains fire carbon emissions during 2007–2016 with fire carbon emissions during 1997–2006, are reported along with those of the MLT-based observational constraint in Fig. 1e, f. The MLT-based observational constraining approach is also applied to six ESMs that report burned area fraction, and the corresponding validation is reported in Supplementary Fig. 6.
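
The two validation metrics named above, RMSE and spatial correlation, can be computed over a set of grid cells as in the following sketch. The Pearson coefficient is assumed here for the spatial correlation, and the input values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error over all grid cells."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def spatial_correlation(observed, predicted):
    """Pearson correlation across grid cells."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Hypothetical decadal mean emissions at four grid cells (arbitrary units).
obs = [1.0, 2.0, 3.0, 4.0]
constrained = [1.5, 2.5, 3.5, 4.5]
print(rmse(obs, constrained), spatial_correlation(obs, constrained))
```

In this toy case the prediction has a perfect spatial pattern but a constant offset, so the correlation is 1 while the RMSE is 0.5, which shows why the two metrics are reported together.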

Because the MLT are trained using the global spatial sample, we expect the performance of MLT to be sensitive to the spatial resolution of the training data set. This assumption is tested by varying the interpolation grids (1°, 2.5°, 5°, and 10° latitude by longitude) of the ESMs and fitting MLT using the training data at each resolution for the validation period (Supplementary Fig. 7). Observational data sets at 0.25° resolution are subsequently fed into the fitted MLT models, regardless of the input model data resolution. This sensitivity test sheds light on the importance of spatial resolution to our observational constraining and thereby implies that the accuracy of our MLT-based observational constraint could improve with the development of higher-resolution ESMs.

Here, we define the socioeconomic exposure to wildfires as the product of decadal mean fire carbon emission and, respectively, the number of people, amount of GDP, and agricultural area exposed to the burning in each grid cell, following a previous definition for extreme heat [68]. These exposure metrics measure the amount of population, GDP, and agricultural area affected by wildfires, whose severity is represented by the amount of fire carbon emission. The projected population at 1/8° × 1/8° resolution under SSP5-85 is obtained from the National Center for Atmospheric Research's Integrated Assessment Modeling Group and the City University of New York Institute for Demographic Research [69]. The projected GDP at 1 km resolution under SSP5 is disaggregated from national GDP projections using nighttime light and population [70]. The agricultural area projection at 0.05° × 0.05° resolution under SSP5-85 is obtained from the Global Change Analysis Model and a geospatial downscaling model (Demeter) [71]. All the projected socioeconomic variables are resampled to 0.25° × 0.25° resolution before the calculation of exposure to fire carbon emission fraction. Future changes in socioeconomic exposure to wildfires are quantified as the relative trend (% decade⁻¹) (i.e., the ratio between the absolute trend and the mean value during the 2010s) for the default and observation-constrained ensembles. These relative changes directly indicate how the future would compare with the current state, regardless of the potential biases simulated by the default ESMs.
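
The exposure definition above amounts to a cell-by-cell product on a common grid. The sketch below assumes both inputs have already been resampled to the same 0.25° × 0.25° cells and are stored as flat lists; the function name and the numbers are hypothetical.

```python
def exposure_per_cell(fire_emission, exposed_quantity):
    """Exposure in each grid cell: decadal mean fire carbon emission times
    the exposed quantity (population, GDP, or agricultural area)."""
    if len(fire_emission) != len(exposed_quantity):
        raise ValueError("inputs must be on the same grid")
    return [e * q for e, q in zip(fire_emission, exposed_quantity)]


# Hypothetical values for three grid cells (arbitrary units).
emission = [0.0, 2.0, 0.5]
population = [100.0, 10.0, 40.0]
cells = exposure_per_cell(emission, population)
print(cells, sum(cells))  # [0.0, 20.0, 20.0] 40.0
```

A densely populated cell with no burning contributes nothing, whereas a sparsely populated cell with intense burning can contribute as much as a moderately populated cell with weak burning, which is the intended weighting by fire severity.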

The mechanisms underlying the projected evolution in fire carbon emissions are explored in two tasks, addressing the importance of drivers from the historical and dynamical perspectives. The first task assesses the relative contribution of each environmental and socioeconomic driver's historical distribution to the projected future wildfire distribution, to directly understand how the current observational constraint works (Supplementary Fig. 8). The second task examines the relative contribution of each driver's projected trend to the projected wildfire trends in a specific region, to disentangle the dynamical mechanisms underlying the future evolution of regional wildfires (Supplementary Fig. 9). These tasks benefit from the importance score as an output of MLT. Although the calculation of importance scores varies substantially by MLT, all the importance scores qualitatively reflect the relative importance of each predictor when making a prediction. For each tree in both rf and gbm, the prediction accuracy on the out-of-bag portion of the data is recorded. Then, the same is done after permuting each predictor variable. For rf, the differences are averaged over all trees and normalized by the standard error. For gbm, the importance order is first calculated for each tree and then summed up over each boosting iteration. For svm, we estimate the contribution of a single variable by training the model on all variables except that specific variable. The difference in performance between that model and the one with all variables is then considered the marginal contribution of that particular variable; the marginal contribution of each variable is standardized to derive the variable's relative importance. Because we apply multiple MLT in this study, the average importance scores from these MLT are reported in the corresponding figures for robustness.
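
The permutation idea behind the tree-based importance scores can be illustrated generically: record the prediction error, shuffle one predictor, and record how much the error grows. The sketch below applies this to an arbitrary prediction function on a single sample; it is not the out-of-bag, per-tree procedure of the randomForest or gbm packages, and the function names, the toy model, and the data are assumptions.

```python
import random


def mean_squared_error(targets, predictions):
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


def permutation_importance(predict, rows, targets, seed=0):
    """Increase in mean squared error after shuffling each predictor column.

    predict maps one row (a list of predictor values) to a prediction.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(row) for row in rows])
    scores = []
    for j in range(len(rows[0])):
        shuffled = [row[j] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild every row with column j replaced by its shuffled value.
        permuted_rows = [row[:j] + [value] + row[j + 1:]
                         for row, value in zip(rows, shuffled)]
        permuted_error = mean_squared_error(
            targets, [predict(row) for row in permuted_rows])
        scores.append(permuted_error - baseline)
    return scores


# Toy model that depends only on the first of two predictors.
rows = [[float(i), float(i % 3)] for i in range(20)]
targets = [3.0 * row[0] for row in rows]
scores = permutation_importance(lambda row: 3.0 * row[0], rows, targets)
print(scores)
```

Shuffling the predictor that the toy model ignores leaves the error unchanged, so its score is zero, while shuffling the informative predictor raises the error. Averaging such scores across several algorithms mirrors the robustness step described above.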

In the first task, the importance of each historical driver to future global wildfire distributions is examined in three MLT models (random forest, support vector machine, and gradient boosting machine) that are trained for projecting future fire carbon emissions (Supplementary Fig. 8). For the atmospheric and terrestrial variables that include the annual mean and monthly climatology as predictors, the importance of each variable is represented by the highest importance score among its 13 predictors (annual mean, January, February, ..., December). This choice accounts for the overall importance of a variable while allowing for the overlapping information contained in the monthly and annual mean values. The importance score of each historical driver reflects the relative weight of each historical, environmental driver in determining the spatial pattern of fire carbon emissions in each future decade.
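
The aggregation rule described above, taking the highest score among a variable's annual and monthly predictors, can be sketched as a simple grouping step. The variable names and scores below are hypothetical and serve only to show the rule.

```python
def variable_importance(predictor_scores):
    """Collapse predictor-level scores to one score per variable by taking
    the maximum over that variable's annual and monthly predictors.

    predictor_scores maps (variable, period) to an importance score.
    """
    best = {}
    for (variable, _period), score in predictor_scores.items():
        if variable not in best or score > best[variable]:
            best[variable] = score
    return best


# Hypothetical scores for two variables and a few of their 13 predictors.
scores_by_predictor = {
    ("precipitation", "annual"): 0.2,
    ("precipitation", "July"): 0.7,
    ("temperature", "annual"): 0.4,
    ("temperature", "January"): 0.1,
}
print(variable_importance(scores_by_predictor))
```

Using the maximum rather than the sum avoids inflating the score of a variable simply because its 13 correlated predictors share overlapping information.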

In the second task, the dynamical importance of each environmental driver's future evolution is assessed for targeted tropical regions (i.e., Amazon and Congo) and major land cover types (tropical forests, other forests, shrublands, savannas, grasslands, and croplands) in both the default and constrained ensembles through the importance of each driver's trend to the projected wildfire trend. For the default ensemble, the three MLT models (random forest, support vector machine, and gradient boosting machine) are used to predict the spatial distribution of simulated trends in fire carbon emission using the simulated trends in the socioeconomic, atmospheric, and terrestrial variables that are considered in our observational constraint for wildfires, for each ESM and their multimodel mean. This analysis excludes flash rate, another predictor in constraining future wildfires, because it is not dynamically simulated by most ESMs. For the observation-constrained ensemble, we first constrain the projected atmospheric and terrestrial variables in each future decade, using an approach similar to the one used to constrain future fire carbon emissions, for each individual ESM and their multimodel aggregation. In this constraint for environmental drivers, all the variables in Supplementary Table 2 are considered as predictors, thereby achieving self-consistency in the constrained future evolution of all these fire-relevant variables. Because the socioeconomic trends are determined by the SSPs, future socioeconomic developments are not constrained in the current approach. Then, the same three MLT models are used to predict the spatial distribution of constrained trends in fire carbon emissions using the constrained trends in those environmental and socioeconomic drivers. For computational efficiency, only the annual mean trends in the environmental drivers are constrained and analyzed in this task.
The importance scores of projected trends in socioeconomic and environmental drivers reflect their dynamic role in future evolution of wildfires in the target tropical regions. Here, the Amazon and Congo regions are shown as examples of how this analysis is applied to understand regional wildfire evolutions, though the mechanism underlying the future evolution of wildfires in other regions could be similarly explored.
