Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Spatial prediction of basal area and volume in Eucalyptus stands using Landsat TM data: an assessment of prediction methods

Spatial prediction of basal area and volume in Eucalyptus stands using Landsat TM data: an... Background: In fast-growing forests such as Eucalyptus plantations, the correct determination of stand productivity is essential to aid decision making processes and ensure the efficiency of the wood supply chain. In the past decade, advances in remote sensing and computational methods have yielded new tools, techniques, and technologies that have led to improvements in forest management and forest productivity assessments. Our aim was to estimate and map the basal area and volume of Eucalyptus stands through the integration of forest inventory, remote sensing, parametric, and nonparametric methods of spatial prediction. Methods: This study was conducted in 20 5-year-old clonal stands (362 ha) of Eucalyptus urophylla S.T.Blake x Eucalyptus camaldulensis Dehnh. The stands are located in the northwest region of Minas Gerais state, Brazil. Basal area and volume data were obtained from forest inventory operations carried out in the field. Spectral data were collected from a Landsat 5 TM satellite image, composed of spectral bands and vegetation indices. Multiple linear regression (MLR), random forest (RF), support vector machine (SVM), and artificial neural network (ANN) methods were used for basal area and volume estimation. Using ordinary kriging, we spatialised the residuals generated by the spatial prediction methods for the correction of trends in the estimates and more detailing of the spatial behaviour of basal area and volume. Results: The ND54 index was the spectral variable that had the best correlation values with basal area (r = − 0.91) and volume (r = − 0.52) and was also the variable that most contributed to basal area and volume estimates by the MLR and RF methods. The RF algorithm presented smaller basal area and volume errors when compared to other machine learning algorithms and MLR. The addition of residual kriging in spatial prediction methods did not necessarily result in relative improvements in the estimations of these methods. Conclusions: Random forest was the best method of spatial prediction and mapping of basal area and volume in the study area. The combination of spatial prediction methods with residual kriging did not result in relative improvement of spatial prediction accuracy of basal area and volume in all methods assessed in this study, and there is not always a spatial dependency structure in the residuals of a spatial prediction method. The approaches used in this study provide a framework for integrating field and multispectral data, highlighting methods that greatly improve spatial prediction of basal area and volume estimation in Eucalyptus stands. This has potential to support fast growth plantation monitoring, offering options for a robust analysis of high-dimensional data. Keywords: Forest inventory, Machine learning algorithms, Multiple linear regression, Random forest, Support vector machine, Artificial neural networks * Correspondence: alinyreis@hotmail.com Department of Forest Science, Federal University of Lavras – UFLA, PO Box 3037, Lavras, Minas Gerais 37200-000, Brazil Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 2 of 17 Background region of Brazil from Landsat 5 TM images. Canavesi et The Brazilian forestry sector represents an important al. (2010) used hyperspectral data from the Hyperion EO- share of the products, taxes, jobs, and income generation 1 sensor for the volume estimation of Eucalyptus planta- of the country and accounts for 3.5% of the national tions under different relief conditions. The results found GDP (IBÁ 2015). This is in large part due to the success- by these authors corroborate the potential use of data col- ful establishment of fast-grown plantations of Eucalyptus lected by remote sensing to estimate the productivity of species, which currently occupy around 5.6 million Eucalyptus plantations. hectares (71.9% of the total planted forest area in Brazil) In parallel to the advances in remote sensing, compu- and represent 17% of the harvested wood in the world tational techniques, such as machine learning algorithms (IBÁ 2014, 2015). (MLA), have been increasingly used to model spectral The Eucalyptus genus has more than 500 species, and a and biological data. These techniques overcome the subset of these are used in fast-growing plantations (Barrios difficulties of classical statistical methods such as spatial et al. 2015), commonly located in tropical and sub-tropical correlation, non-linearity of data, and overfitting (Were regions, and more recently in temperate regions. Spain et al. 2015). In addition, these algorithms allow the use (González-García et al. 2015), Portugal (Lopes et al. 2009), of categorical data, with statistical noise and incomplete Uruguay (Barrios et al. 2015), Chile (Watt et al. 2014), data, and therefore are able to address needs under South Africa (Dye et al. 2004), Australia (Verma et al. different dataset scenarios (Breiman 2001). 2014), and the USA (Wear et al. 2015) are some examples Several studies have shown the superiority of machine of productive Eucalyptus plantations in temperate regions learning algorithms in relation to classical statistics in that have cutting cycles ranging from 8 to 12 years. In trop- several areas, such as in forest management. For ical regions such as Brazil, the cutting cycles of Eucalyptus instance, Ahmed et al. (2015) modelled a Landsat time- plantations range from 5 to 7 years (Guedes et al. 2015, series data structure in conjunction with LiDAR data Scolforo et al. 2016). and found that the random forest algorithm achieved Timber production is the main ecosystem service of better results than multiple regression for all forest planted forests and the main management objective for classes. In another study, García-Gutiérrez et al. (2015) these plantations (Gao et al. 2016). In the case of fast- found that machine learning algorithms (mainly support growing plantations, the correct determination of stand vector machine) were superior for modelling a range of productivity is essential to support forest management forest variables (viz., aboveground biomass, basal area, planning strategies (González-García et al. 2015, Retslaff dominant height, mean height, and volume) compared et al. 2015). Traditionally, productivity assessments of a with multiple linear regression. Machine learning algo- plantation are carried out based on field measurements rithms have also been shown to provide an economical of the diameter at breast height (DBH) and tree height and accurate way to estimate aboveground biomass in via forest inventory. However, in fast-growing planta- forests from Landsat satellite images (Wu et al. 2016). tions, field-based inventory programmes may not be These studies highlight the benefits of applying more sufficient to capture productivity differences across the robust techniques in solving problems previously entire area, such as those arising from losses due to pest and resolved by traditional statistical modelling. disease attacks (Coops et al. 2006), or from climatic anomal- In this context, the aims of this study were: (i) to esti- ies (González-García et al. 2015, Scolforo et al. 2016). mate and map basal area and volume of a Eucalyptus In the past decade, advances in geographical informa- plantation through the integration of forest inventory, tion systems (GIS), global positioning systems (GPS), remote sensing, and parametric and nonparametric and remote sensing have provided new tools, techniques, methods of spatial prediction; (ii) to compare the per- and technologies to support forest management. Thus, formance of machine learning algorithms (random low-cost and accurate forest productivity assessment can forest, support vector machine, and artificial neural net- be made, as well as allowing the collection of information works) with the linear regression model; and (iii) to in areas not sampled by forest inventory (Morgenroth and assess the improvement in basal area and volume esti- Visser 2013). The analysis of remote sensing information mation with the addition of residual kriging in spatial combined with field data has been used by several authors prediction methods. to fill the information gap left by data collected only in the field (Watt et al. 2016, Boisvenue et al. 2016, Moreno et al. Methods 2016, Fayad et al. 2016, Vicharnakorn et al. 2014). Ponzoni Study area et al. (2015) used data collected from Landsat 5 thematic The study area is located in Minas Gerais state, the fourth mapper (TM) images for spectral-temporal characterisa- largest state in Brazil, with an area of 586,521 km . Minas tion of Eucalyptus canopies. Berra et al. (2012) estimated Gerais state has the largest area occupied by plantations of the volume of a Eucalyptus plantation in the southern the Eucalyptus genus in the country (1,400,232 ha), dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 3 of 17 2 −1 3 −1 corresponding to 25.2% of Brazilian Eucalyptus planta- (m ha ), and total stem volume (m ha ) were obtained tions. The wood from these plantations is mainly used for from the information collected in the plots. the production of charcoal, as well as pulp, lumber, and panels (IBÁ 2015). Remote sensing data and processing The Eucalyptus clonal stands under study are located Spectral data were obtained from a Landsat 5 TM satellite in Lagoa Grande municipality, in the northwest of Minas image, with spatial resolution of 30 m, on the date of June Gerais state (lat. 17° 43′ 00″ S–17° 44′ 00″ S, long. 46° 25, 2009, corresponding with field data collection, in orbit 32′ 00″ W–46° 33′ 00″ W, elevation 560 m a.s.l.) 220, point 072, in bands TM1 (0.45–0.52 μm), TM2 (Fig. 1). According to the Köppen climatic classification (0.52–0.60 μm), TM3 (0.63–0.69 μm), TM4 (0.76–0.90 μm), system, the climate in this region is Aw, classified as a TM5 (1.55–1.75 μm), and TM7 (2.18–2.35 μm). The tropical savanna climate, with drier months during the Landsat 5 TM Surface Reflectance Climate Data Record winter, high annual precipitation in the summer and (CDR) was used, which is a Landsat Level-2A product gener- average temperature of all months greater than 18 °C ated by the Landsat Ecosystem Disturbance Adaptive (Alvares et al. 2013). The average annual rainfall and the Processing System (LEDAPS) (Masek et al. 2006) obtained average monthly rainfall of the dry and wet seasons are from the USGS (United States Geological Survey) database 1430, 8, and 257 mm, respectively. (USGS 2017). These images already contain radiometric calibration, and geometric and atmospheric corrections. Field data description and sampling In addition, vegetation indices using the red, near infra- This study was undertaken in a set of 20 clonal stands of red and short wave infrared spectral bands of Landsat 5 Eucalyptus urophylla S.T.Blake x Eucalyptus camaldulensis TM (Table 2) were calculated, as described by Lu et al. Dehnh, totalling an area of 362.2 ha. These stands were (2004) and Ponzoni et al. (2012). The normalised differ- planted in April and May 2004, with initial spacing of either ence vegetation index (NDVI) is the most widely used 3 × 2 m or 3 × 3 m. The forest inventory was carried out in vegetation index for retrieval of forest biophysical parame- June andJuly2009ona setof35georeferenced square ters (Rouse et al. 1973, Lu et al. 2004). The soil-adjusted plots of 400 m . The plots were georeferenced in the field vegetation index (SAVI) and modified soil-adjusted vege- with GPS (Garmin 60CSx, Garmin Ltd., Olathe, Kansas, tation index (MSAVI) are soil adjusted vegetation indices USA). The sampling procedure adopted was systematic, used to reduce the effect of soil background reflectance allocating approximately one plot per 10 ha of forest. In (Qi et al. 1994). The enhanced vegetation index (EVI) was each plot, the diameter at breast height (DBH) of all stems developed to optimise the vegetation signal, correcting was measured, as well as the total height of the first 15 trees reflected light distortions caused by particulate matter sus- with normal stems (without bifurcation or any other defect) pended in the air, as well as by influence of background and height of dominant trees (the 100 largest diameter trees data under the vegetation canopy (Justice et al. 1998). The per hectare). Descriptive statistics of the variables collected global environment monitoring index (GEMI) minimises in the field are shown in Table 1. Estimates of basal area atmospheric effects, similar to the EVI and minimises Fig. 1 Geographic location of the Eucalyptus stands and sampling grid dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 4 of 17 Table 1 Descriptive statistics of the variables collected in the field divided into two datasets: prediction or fitting set (70% of the database) and validation set (30% of the database). Statistic DBH H Hd Therefore, 25 plots were used for basal area and volume Minimum 11.98 16.98 19.40 predictions, and 10 plots were used for validation of the Maximum 15.45 24.63 26.38 different approaches to estimate basal area and volume Mean 14.02 21.18 22.98 in the Eucalyptus stands under study. Standard deviation 0.85 2.33 1.92 Pearson correlation analysis was carried out among DBH diameter at breast height (cm), H total height (m), Hd dominant basal area, volume, values of spectral bands, and vegeta- height (m) tion indices. From these correlations, the relationship between the dendrometric characteristics of Eucalyptus observational angular effects in the observed vegetation stands and its spectral response in Landsat images was index signal (Pinty and Verstraete 1992). explored. Dataset integration Multiple linear regression (MLR) analysis The choice of an appropriate pixel size is one of the issues Basal area and volume estimation were accomplished to be considered when using remote sensing data to esti- through MLR analysis. A stepwise variable elimination mate dendrometric characteristics. Due to easy accessibil- method was used in conjunction with the Akaike infor- ity and affordability, a number of studies have employed mation criterion (AIC) to select only those spectral vari- Landsat images and found statistically significant correla- ables that “best” explained basal area and volume tions between remotely sensed data and dendrometric variation. The residuals from regression models were characteristics using ground plots ranging from 315 to analysed to assess the existence of trends in the errors. 2500 m (Dube and Mutanga 2015, López-Sánchez et al. The variance inflation factor (VIF) was used to detect 2014, Zhang et al. 2014, López-Serrano et al. 2016). possible correlations between explanatory variables Although the size of a single plot (20 × 20 m) in this (multicollinearity). The adopted VIF cutoff value was 10. study does not cover a Landsat pixel, we considered that a plot represents an area larger than its size. As the Random forest (RF) sampling design was one plot per hectare, we ensured The RF algorithm, initially proposed by Breiman (2001), that each plot matched with the reference pixel in order is an ensemble method that generates a set of individu- to extract reliable data. ally trained decision trees and combines their results. The greatest advantage of these decision trees as regres- Spatial modelling and prediction methods sion methods is that they are able to accurately describe Exploratory data analysis complex relationships among multiple variables, and by Spectral response was extracted from the Landsat TM aggregating these decision trees, more accurate solutions bands and vegetation indices from the geographical are generated (Gleason and Im 2012). In addition to coordinates of the forest inventory plots. Thus, the plot these characteristics, RF is an easy parameterisation 2 −1 database was composed of basal area (m ha ), volume method (Immitzer et al. 2012). This method has shown 3 −1 (m ha ), spectral band values, and vegetation index great potential in regression studies with integration of values. The total database (35 plots) was systematically spectral data, in some cases generating better results Table 2 Vegetation indices used in the spectral characterisation of the Eucalyptus stands Vegetation indices Formulation Reference NDVI (TM4 − TM3)/(TM4 + TM3) Rouse et al. (1973) ND53 (TM5 − TM3)/(TM5 + TM3) Huete et al. (2002) ND54 (TM5 − TM4)/(TM5 + TM4) Huete et al. (2002) ND57 (TM5 − TM7)/(TM5 + TM7) Huete et al. (2002) SAVI [(TM4 − TM3)/(TM4 + TM3 + 0.5)].(1.5) Huete (1988) qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi MSAVI ðÞ 2TM4 þ 1 −ðÞ 2TM4 þ 1 −8ðÞ TM4−2TM3 =2 Qi et al. (1994) EVI 2.5 × [(TM4 − TM3)/(TM4 + 6TM3 − 7.5TM1 + 1)] Justice et al. (1998) GEMI n(1 − 0.25n). [(TM3 − 0.125)/(1 − TM3)] Pinty and Verstraete (1992) 2 2 2 TM4 −TM3 þ1:5TM4þ0:5TM3 ðÞ n ¼ TM4þTM3þ0:5 TM thematic mapper, ND normalised difference, NDVI normalised difference vegetation index, SAVI soil-adjusted vegetation index, MSAVI modified soil-adjusted vegetation index, EVI enhanced vegetation index, GEMI global environment monitoring index dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 5 of 17 than conventional techniques (Stojanova et al. 2010, activation function was employed in all neurons. Deter- Dube et al. 2014, García-Gutiérrez et al. 2015, Görgens mined by previous tests, ANNs were structured with 14 et al. 2015, Wu et al. 2016). The RF algorithm fitted in neurons in the input layer (number of variables), 1 this study is implemented in the open-source software neuron in the hidden layer, and 1 neuron in the output WEKA 3.8 (Frank et al. 2016). Tests were carried out layer, corresponding to estimated basal area or volume. with the exchange of tree numbers and attribute num- The learning rate, the momentum term, and iteration bers to be drawn. Then, 20 trees with 10 attributes to be numbers were fixed at 0.3, 0.5, and 500 for basal area, drawn by node for basal area and 80 trees and 11 attri- and 0.2, 0.7, and 500 for volume, respectively. butes for volume were fixed. Relative importance evaluation Support vector machine (SVM) The variable importance was assessed for each model SVMs operate by assuming that each set of inputs will with a removal-based approach in order to avoid the have a unique relation to the response variable and that limited interpretability of the MLA and to verify how the grouping and the relation of these predictors to one each independent variable contribute to the performance another is sufficient to identify rules that can be used to of machine learning algorithms (RF, SVM, and ANN). predict the response variable from new input sets. For All algorithms were adjusted n times, with n being the this, SVMs project the input space data into a feature number of available variables. At each time, one variable space with a much larger dimension, enabling linearly was removed from the training set and then the root non-separable data to become separable in the feature mean square error (RMSE) of the algorithm was quanti- space. This method has been successfully used in fied. At the end, the obtained errors were normalised by forestry classification problems (Huang et al. 2008, Shao the ratio of the largest RMSE so that they were between and Lunetta 2012) and more recently in regression prob- 0 and 1 and multiplied by 100 (Were et al. 2015). The lems with the use of spectral data (García-Gutiérrez et variable that results in the highest RMSE when removed al. 2015, Wu et al. 2016). The Kernel function used in from the database is the variable with the highest rela- the present study was the Gaussian or radial basis func- tive importance within the model. This methodology tion (RBF). The algorithm used is implemented in was chosen because it can be consistently applied to all WEKA 3.8 software under the sequential minimal algorithms, allowing comparisons of variable contribu- optimization (SMO) function. Values of parameters C tion between the methods. and σ (bandwidth or influence range of each training point in the RBF) were tested within the interval (10 ) Geostatistical modelling of prediction methods errors i= , where the least squared mean error Spatial prediction methods capture the average behaviour − 3, − 2, − 1, 0, 1, 2, 3 configuration was chosen for application. For basal area of the main variable, allowing the identification of its and volume, selected C and σ values were 10 and 0.1, general spatial behaviour, without detailing more specific and 100 and 0.01, respectively. areas or regions. For details of specific regions, estimates obtained exclusively from the auxiliary variables need to Artificial neural networks (ANNs) be corrected. Thus, residuals generated by spatial predic- ANNs are a parallel-distributed information processing tion methods (MLR, RF, SVM, and ANN) were used for system that simulates the working of neurons in the hu- the correction of trends in the estimates and for detailing man brain, being able to learn from examples. Artificial the spatial behaviour of the main variables (basal area and neural networks are widely used to model complex and volume) using ordinary kriging. The interpolated values of non-linear relations between inputs and outputs or to the residuals were then added to the estimates of the determine patterns in data (Diamantopoulou 2012). The spatial prediction methods (MLR, RF, SVM, and ANN). use of this technique in conjunction with remote sensing Thus, we obtained the basal area and volume estimates data is consolidated in several studies (Cluter et al. 2012, corrected by the ordinary kriging of the residuals for each García-Gutiérrez et al. 2015, Rodriguez-Galiano et al. spatial prediction method. 2015, Were et al. 2015). We used the ANN obtained by For the application of ordinary kriging to the spatial running the Multilayer Perceptron function (of the prediction method residuals, we considered the station- multilayer perceptron type) provided by WEKA 3.8 soft- arity presupposition of the intrinsic hypothesis (Journel ware. The training of neural networks occurred through and Huijbregts 1978), through fitting of theoretical func- the back-propagation algorithm, which fit the weights of tions to experimental semivariogram models. Spherical, all the layers of the network from the backpropagation exponential, and Gaussian models were fitted to the of the error, obtained in the output layer. The weights semivariogram of the residuals from each spatial predic- updating was carried out according to the error, learning tion method using weighted least squares. The semivar- 2 2 rate, and momentum terms (Delta rule). The sigmoidal iogram parameters (nugget (τ ), sill (σ ), and range (ϕ)) dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 6 of 17 were calculated from the best fitted models, which pro- Data analysis for this study was performed using vided information about the spatial structure as well as the following software: R (R Core Team 2016) with input parameters for the kriging interpolation. The nug- the geoR package (Ribeiro Júnior and Diggle 2001), get represents the minimum semivariance among differ- WEKA 3.8 (Frank et al. 2016), and ArcGis version ent sampling intervals. Nugget values greater than zero 10.1 (Esri 2010) with Geostatistical Analyst extension represent a combination of experimental error and of (Esri 2010). unresolved spatial variability occurring at scales smaller than inter-sampling lag distance. Sill is the plateau Results reached by the values of semivariance and indicates the Descriptive statistic of the measured basal area and amount of variation than can be explained by the spatial volume structure of the data. Range is the distance at which the 2 −1 Basal area ranged from 10.07 to 21.63 m ha , with semivariogram reaches the plateau, indicating the dis- 2 −1 average of 16.86 m ha and standard deviation of tance which values are spatially correlated. The evalu- 2 −1 2.4 m ha (Table 3). The average volume was ation of the performance of each semivariogram model 3 −1 3 169.34 m ha with a standard deviation of 29.66 m ha and the selection of the best models were based on −1 3 −1 and range from 95.80 up to 213.85 m ha . Basal area cross-validation, which estimates the reduced average had a lower coefficient of variation (CV = 14.26%) com- error (RAE) and the standard deviation of the reduced pared to volume (CV = 17.51%), demonstrating a consid- average error (SRE) (Yamamoto and Landim 2013). erable homogeneity of this dendrometric characteristic in the evaluated Eucalyptus stands. Validation and assessment of the prediction methods The different approaches to basal area and volume esti- mation of Eucalyptus stands were evaluated by compar- Correlation among basal area, volume, spectral bands, ing the basic statistics of the predicted maps (mean and and vegetation indices standard deviation) with the estimates obtained from the The correlation between plot basal area and the different forest inventory, and through the discrepancies between spectral bands and their ratios (Table 4) ranged from − 0.91 observed and predicted values in the fitting and valid- (ND54) to 0.15 (TM2). The SAVI, MSAVI, GEMI, and EVI ation datasets. These discrepancies were evaluated using were also highly correlated with basal area (r > 0.85). The the mean error (ME), the mean absolute error (MAE), correlation between plot volume and the spectral bands and the root mean square error (RMSE), as described in and ratios ranged from − 0.52 (ND54) to − 0.02 (TM2). Eqs. 1–3. The NDVI (r =0.49) and SAVI (r = 0.47) also had high 1 N correlations with volume, but these were lower in ME ¼ X −X ð1Þ i i i¼1 N magnitude when compared with those for basal area. Many of the spectral bands and ratios were also 1 N ^ highly correlated with each other (r > 0.90), which can MAE ¼ X −X ð2Þ i i i¼1 be considered a drawback due possible to multicolli- rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi nearity problems in linear regression models. 1 N RMSE ¼ X −X ð3Þ i i i¼1 Table 3 Descriptive statistics obtained from forest inventory where N is the number of values in the dataset; X is the processing using the estimators of simple random sampling estimated value of the main variable; X is the observed (SRS) value in the prediction and validation sets. Estimators Basal area Volume The relative improvement (RI) achieved by residual 2 a 2 −1 3 a 3 −1 (m ) (m ha )(m ) (m ha ) kriging for a particular spatial prediction method was calculated by comparing the change in RMSE when the Minimum 0.91 10.07 8.62 95.80 residual kriging was applied using Eq. 4. Maximum 1.95 21.63 19.25 213.85 Mean 1.52 16.86 15.24 169.34 RMSE −RMSE spm spm‐RK RI ¼  100% ð4Þ Standard deviation 0.22 2.4 2.67 29.66 RMSE spm Coefficient of variation (%) 14.26 17.51 where RMSE is the root mean square error of a Sampling error (%) 4.89 6.00 spm spatial prediction method, RMSE is the root mean spm ‐ RK Total confidence interval 5807.9–6405.0 57,652.7–65,018.7 square error of the spatial prediction method when a 2 Estimates obtained for an area of 900 m (corresponding to the area of each residual kriging is added to this method. pixel of the Landsat images) dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 7 of 17 Table 4 Pearson’s correlation coefficient (r) among basal area, volume, and spectral data for the Eucalyptus stands Variables 1 2 3 4 5 6789 10 11 12 13 14 15 16 1. G 1.00 2. V 0.70* 1.00 ns ns 3. TM1 − 0.24 0.10 1.00 ns ns 4. TM2 0.15 − 0.02 0.59* 1.00 ns ns 5. TM3 − 0.20 − 0.10 0.80* 0.72* 1.00 ns ns 6. TM4 0.82* 0.41* − 0.05 0.43* 0.12 1.00 ns ns 7. TM5 − 0.66* − 0.36 0.53* 0.31 0.68* − 0.40* 1.00 ns 8. TM7 − 0.68* − 0.40* 0.56* 0.32 0.66* − 0.42* 0.90* 1.00 ns 9. NDVI 0.83* 0.49* − 0.53* − 0.13 − 0.55* 0.75* − 0.78* − 0.82* 1.00 ns ns ns ns ns 10. ND53 − 0.60* − 0.32 − 0.29 − 0.50* − 0.37 − 0.66* 0.43* 0.31 − 0.31 1.00 ns ns ns 11. ND54 − 0.91* − 0.52* 0.31 − 0.09 0.30 − 0.86* 0.80* 0.78* − 0.93* 0.65* 1.00 ns ns ns ns 12. ND57 0.45* 0.27 − 0.49* − 0.28 − 0.49* 0.27 − 0.50* − 0.82* 0.60* 0.00 − 0.48* 1.00 ns ns ns 13. SAVI 0.88* 0.47* − 0.23 0.25 − 0.12 0.97* − 0.57* − 0.60* 0.89* − 0.57* − 0.94* 0.41* 1.00 ns ns ns 14. MSAVI 0.88* 0.45* − 0.36 0.13 − 0.27 0.92 − 0.65* − 0.67* 0.94* − 0.50* − 0.95* 0.46* 0.99* 1.00 ns ns ns ns 15. GEMI 0.86* 0.45* − 0.14 0.34 0.00 0.99* − 0.49* − 0.52* 0.83* − 0.62* − 0.91* 0.35 0.99* 0.96* 1.00 ns ns 16. EVI 0.87* 0.42* − 0.41* 0.12 − 0.28 0.92* − 0.64* − 0.67* 0.94* − 0.48* − 0.94* 0.47* 0.98* 1.00* 0.96* 1.00 3 −1 2 −1 V volume (m ha ), G basal area (m ha ), TM thematic mapper, ND normalised difference, NDVI normalised difference vegetation index, SAVI soil-adjusted vege- tation index, MSAVI modified soil-adjusted vegetation index, GEMI global environment monitoring index, EVI enhanced vegetation index ns Not significant at 5%; *significant at 5% Spatial prediction of basal area and volume by MLR, RF, ANN and SVM models of both basal area and volume. SVM, and ANN The TM1 band, selected by the MLR for volume estima- The spectral data examined had several significant corre- tion, also had high importance in the ANN and SVM lations with the basal area and volume data (Table 4). models of volume. However, they contributed in a reduced form to the Comparisons of measured values and estimated values regression models due to multicollinearity problems, of basal area (Fig. 3) showed that basal area was under- which resulted in final regression models with few estimated by the ANN model (Fig. 3d). The model fitted significant explanatory variables (Table 5). The basal area using the RF algorithm produced values of basal area model only included the ND54 vegetation index that were in closer agreement with measured values (Table 5), while the volume model included the TM1 (Fig. 3b). Similar results were seen for the volume band and NDVI. The coefficient of determination was models, but with a slight overestimation for the plots high for the basal area model (R = 0.81), but was much with small volumes and an underestimation of the plots lower for the volume model (R = 0.37). with high volumes. The model fitted using ANN In the case of basal area and volume predictions using algorithm did not produce estimates of volume that machine learning algorithms, the increases in RMSEs were consistent with measured values (Fig. 3h). The when the predictors were excluded one by one from the models fitted using the MLR and SVM (Fig. 3e, g) algo- SVM, ANN, and RF models are shown in Fig. 2. The rithms produced predicted values that were more closely variable ranking by relative importance differed for each related to the measured values than those from the algorithm. The ND54 index, chosen for basal area model ANN algorithm. by the MLR, also had the greatest effect on the accuracy Prediction and validation sets of basal area and volume of the RF model, both for basal area and volume. The were compared by means of Student’s t test, in order to TM2 band had the highest relative importance for the check if they provided unbiased subsets of the original Table 5 Regression model fitted for basal area and volume estimation for the Eucalyptus stands Model β β β R S S (%) 0 1 2 aj xy xy G= β + β ND54 0.78*** − 1.80*** – 0.81 0.09 5.76 0 1 V= β + β NDVI + β TM1 − 24.11* 42.69*** 241.61* 0.37 2.01 13.08 0 1 2 2 3 2 G basal area (m ), V volume (m ), β , β , and β coefficients, R adjusted coefficient of determination, S residual standard error, TM thematic mapper, ND 0 1 2 aj xy normalised difference, NDVI normalised difference vegetation index ***Significant at 1%; *significant at 10% dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 8 of 17 Fig. 2 Relative importance of the variables within each machine learning algorithm: RF, SVM, and ANN for basal area and volume 2 −1 data (Viana et al. 2012). Average basal area (17.03 m ha ) SVM had the best performance and MLR the poorest 3 −1 and volume (171.10 m ha ) obtained from the prediction performance. set did not statistically differ from average basal area 2 −1 3 −1 (16.45 m ha ) and volume (164.92 m ha ) obtained Geostatistical modelling of prediction method errors from the validation set, considering two-tailed Student’s t The semivariogram models were selected based on RAE ns test (Basal area: t = 0.629 , df = 33, p value = 0.533; volume: and SRE values close to 0 and 1, respectively (Yamamoto ns t = 0.550 ,df=33, p value = 0.585). and Landim 2013). The experimental semivariograms The evaluation of spatial prediction methods, based on constructed from the residuals of the basal area and prediction and validation sets, was done by comparing volume prediction methods had a spatial dependence the statistics presented in Eqs. 1 through 4 (Table 6). structure defined in six of the eight analysed situations The mean error (ME) should ideally be close to zero if (Fig. 4 and Table 7). The volume residuals from MLR the prediction method is unbiased, and the values of this and ANN methods had a pure nugget effect, i.e. no parameter suggested that all predictions generated im- spatial dependence structure. This result indicated a partial estimates when evaluated from both prediction random spatial distribution of the residuals in these two and validation sets. Both the MAE and RMSE showed situations. that basal area estimates were more accurate than The residuals of the spatial prediction methods that volume estimates for all spatial prediction methods. The had defined spatial dependence structures (Fig. 4) were MAE and RMSE results obtained from the validation set interpolated using ordinary kriging, and their estimates demonstrated that there were no significant differences were added to basal area and volume estimates of the among the MLR, RF, SVM, and ANN for basal area respective spatial prediction methods. The relative estimates. For the volume estimates, the models fitted by improvement (RI) of the addition of basal area residual dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 9 of 17 Table 6 Prediction methods evaluation using the prediction and validation sets for the Eucalyptus stands 2 3 Method Statistic Basal area error (m ) Volume error (m ) Prediction Validation Prediction Validation set set set set MLR ME 0.00 − 0.05 0.00 − 0.74 MAE 0.07 0.09 1.56 2.08 RMSE 0.08 0.14 1.89 2.48 RMSE 5.50 9.42 12.27 16.72 (%) RF ME 0.01 − 0.03 0.08 − 0.90 MAE 0.03 0.09 0.62 1.63 RMSE 0.04 0.14 0.73 2.21 RMSE 2.48 9.54 4.77 14.91 (%) SVM ME − 0.01 − 0.05 0.00 − 0.66 MAE 0.04 0.09 1.19 1.59 RMSE 0.06 0.14 1.60 2.02 RMSE 4.14 9.39 10.41 13.58 (%) ANN ME 0.09 0.03 0.94 0.45 MAE 0.10 0.09 1.70 1.68 RMSE 0.14 0.13 1.98 2.05 RMSE 8.87 8.52 12.88 13.82 (%) MLR multiple linear regression, RF random forest, SVM support vector machine, ANN artificial neural networks, ME mean error, MAE mean absolute error, RMSE root mean square error and volume were not within the confidence interval generated by the forest inventory. Maps showing the spatial distribution of basal area Fig. 3 Scatter plots of measured values versus estimated values by: MLR (a) and (e); RF (b) and (f); SVM (c) and (g); and ANN (d) and (h) and volume identified the same areas with high and low for basal area and volume, respectively. A 1:1 line (black, dashed) is productivity, regardless of the spatial prediction method provided for reference (Figs. 5 and 6). The maps obtained by ANN had a smaller difference between maximum and minimum kriging by the ANN method was 25%, i.e. there was a estimated values for basal area and volume, while the reduction from 8.52 to 6.37% in the RMSE (Table 8). For mapping obtained from the SVM models had a greater the RF method, the RMSE increased from 9.54 to difference between these values. MLR and RF methods 10.08%, which corresponds to a 5.7% increase in the provided similar estimates in the basal area and volume error of the basal area estimates by kriging of the resid- mapping. uals. For the volume, the addition of residual kriging The addition of residual kriging in the basal area and improved the precision of SVM estimates and reduced volume mapping (Fig. 7) resulted in a greater difference the precision of the RF estimates. between maximum and minimum estimated values in all spatial prediction methods. For ANN, residual kriging resulted in estimates that were more in agreement with Mapping of basal area and volume for Eucalyptus stands the field observations, correcting the basal area under- Basal area and volume estimates obtained by different estimation behaviour for the Eucalyptus stands under spatial prediction methods (Table 9) had average values study. However, the addition of residual kriging to the very close to each other, and were in agreement with the models fitted by RF and SVM methods did not result in forest inventory estimates (Table 3). Only the ANN significant differences in basal area and volume map- method generated underestimated values for both basal ping, and also led to increases in estimation errors in area and volume, so that the total values of basal area non-sampled areas in the field (Table 8). dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 10 of 17 Discussion Remote detection of forest canopies is complex due to the size, shape, and dielectric properties of its scatter elements (leaves, branches, and stems) (Galeana-Pizaña et al. 2014). The spatial diversity of forest canopies makes the relationship between forest parameters and remote sensing data a major challenge, although several studies have already demonstrated correlation between spectral data and forest characteristics of interest (Stojanova et al. 2010, Viana et al. 2012, Castillo-Santiago et al. 2013, Fayad et al. 2016, Gao et al. 2016). For instance, plantations comprised of different Eucalyptus species may have very similar values of basal area and volume, but have different spectral characteristics due to differences in spectral behav- iour of the species that form the canopies. Also, according to Ponzoni et al. (2015), the canopy reflectance of older Eucalyptus plantations (between 4 and 6 years) tend to contain a greater contribution from green leaves and a lower contribution from shadows, the background, and from dry branches inside the canopies than the canopy reflectance of young Eucalyptus plantations (< 4 years). Thus, the canopy reflectance of older Eucalyptus planta- tions generated highest correlations with bands of the in- frared region of the electromagnetic spectrum and, therefore, with vegetation indices that include these bands in their compositions (Ponzoni et al. 2015). These results are consistent with the best correlations found in this study among the infrared bands, vegetation indices derived from these bands, basal area, and volume. This same behaviour was observed in the studies of Gebreslasie et al. (2008), Canavesi et al. (2010), Berra et al. (2012), and Pacheco et al. (2012). Basal area was more strongly correlated with the spec- tral data because this variable is derived from only the diameter of the trees, which is directly related to size of the tree canopies, and determines the canopy reflectance Fig. 4 Experimental semivariograms of residuals from: MLR (a)and (e); (Ponzoni et al. 2012). On the other hand, volume is RF (b)and (f); SVM (c) and (g); and ANN (d) and (h)for basal area and derived from the diameter, form factor, and height of the volume, respectively trees. Height estimates are obtained from empirical equations that add errors during the volume estimation 2 2 Table 7 Nugget (τ ), sill (σ ), and range (ϕ) parameters for the selected semivariance function models for each of the variables in study 2 2 Variables Residual Selected model τ σ ϕ (m) RAE SRE Basal area MLR Exponential 0.0016 0.0067 1350 − 0.0092 1.0818 RF Spherical 0.0004 0.0009 737 − 0.0079 1.0586 SVM Gaussian 0.0017 0.0037 1577 0.0089 0.9610 ANN Exponential 0.0000 0.0119 1430 − 0.0303 1.1393 Volume MLR Exponential PNE PNE PNE PNE PNE RF Spherical 0.3316 0.2505 773 − 0.0051 1.0258 SVM Exponential 0.0000 2.5582 858 − 0.0039 0.9958 ANN Exponential PNE PNE PNE PNE PNE MLR multiple linear regression, RF random forest, SVM support vector machine, ANN artificial neural networks, RAE reduced average error, SRE standard deviation of the reduced average error, PNE pure nugget effect dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 11 of 17 Table 8 Prediction methods with addition of the residual estimation by ordinary kriging using the prediction and validation sets for the Eucalyptus stands 2 3 Method Statistic Basal area error (m ) Volume error (m ) Prediction set Validation set Prediction set Validation set MLR-RK ME 0.00 − 0.03 –– MAE 0.03 0.09 –– RMSE 0.04 0.14 –– RMSE (%) 2.80 9.30 –– RI 49.09 1.27 –– RF-RK ME 0.01 − 0.05 0.00 − 1.03 MAE 0.04 0.10 0.63 1.70 RMSE 0.05 0.15 0.77 2.26 RMSE (%) 3.08 10.08 5.02 15.25 RI − 24.19 − 5.66 − 5.24 − 2.28 SVM-RK ME 0.01 − 0.03 − 0.32 − 0.57 MAE 0.05 0.10 0.80 1.22 RMSE 0.06 0.15 1.11 1.74 RMSE (%) 4.09 9.83 7.19 11.72 RI 1.21 − 4.69 30.93 13.70 ANN-RK ME 0.02 − 0.06 –– MAE 0.04 0.06 –– RMSE 0.09 0.09 –– RMSE (%) 5.79 6.37 –– RI 34.72 25.23 –– MLR multiple linear regression, RF random forest, SVM support vector machine, ANN artificial neural networks, RK residual estimation by ordinary kriging, ME mean error, MAE mean absolute error, RMSE root mean square error, RI relative improvement process. This acts to reduce the strength of relationships regression assumptions is that no linear relationship may between volume and variables obtained from remotely exist between any independent variables or linear combina- sensed images. The ND54 index was the spectral variable tions of these (Montgomery et al. 2006). that had the strongest correlation with basal area (r = − 0.91) For the MLR method, the best volume estimation and volume (r = − 0.52). However, it was also significantly model was obtained from the TM1 band and the NDVI correlated with the other spectral variables. During multiple (Table 5), yet was only able to explain approximately linear regression analysis, the fact that two or more explana- 37% of the variation in this stand attribute. Conversely, tory variables are highly correlated may generate multicolli- the best model for basal area estimation used the ND54 nearity problems in the fitted models, since one of the index as the predictor variable and was able to explain Table 9 Statistics of basal area and volume maps estimated by spatial predictions methods MLR, RF, SVM, and ANN 2 3 Method Basal area (m ) Volume (m ) Min Max Mean Standard deviation Total estimate Min Max Mean Standard deviation Total estimate MLR 0.62 1.83 1.52 0.20 6151.9 4.51 19.99 15.30 2.30 61,739.5 MLR-RK 0.65 1.93 1.52 0.21 6141.0 –– – – – RF 0.96 1,89 1.51 0.17 6101.5 9.26 18.08 15.27 1.81 61,600.1 RF-RK 0.93 1.93 1.53 0.17 6166.6 9.02 18.37 15.36 1.91 61,965.7 SVM 0.88 2.12 1.57 0.18 6326.2 1.36 19.64 15.31 2.57 61,760.7 SVM-RK 0.76 2.10 1.56 0.19 6284.2 1.10 21.78 15.29 2.92 61,683.8 ANN 0.97 1.65 1.42 0.22 5715.3 8.32 15.68 13.93 2.70 56,223.8 ANN-RK 0.90 1.94 1.50 0.23 6070.3 –– – – – Min minimum value, Max maximum value, MLR multiple linear regression, RF random forest, SVM support vector machine, ANN artificial neural networks, RK residual estimation by ordinary kriging dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 12 of 17 Fig. 5 Spatial distribution of the basal area in Eucalyptus stands, estimated by: MLR (a), RF (b), SVM (c), and ANN (d) more than 80% of the variation in this attribute, con- species in southern KwaZulu Natal, South Africa. These firming the explanatory power of spectral data for basal authors applied a MLR using MSAVI and band 3 as pre- area estimation in Eucalyptus stands. Gebreslasie et al. dictor variables and were able to explain slightly more of the 2 2 (2010) assessed the suitability of both visible and short- variation in basal area (R = 0.67) than volume (R = 0.65). wave infrared ASTER data and vegetation indices for Although the MLR model for volume does not have a high estimating forest structural attributes of Eucalyptus coefficient of determination, the spectral data can efficiently Fig. 6 Spatial distribution of the volume in Eucalyptus stands, estimated by: MLR (a); RF (b); SVM (c); and ANN (d) dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 13 of 17 Fig. 7 Spatial distribution of the basal area in Eucalyptus stands estimated by: MLR (a): RF (b); SVM (c); and ANN (d) with addition of the residual estimation by ordinary kriging; and for volume estimated by RF (e) and SVM (f) with addition of the residual estimation by ordinary kriging explain the volumetric variations in non-sampled areas in allows them to work with all available explanatory vari- the field. In a similar study for Eucalyptus stands located in ables, without loss of information in the process of vari- the southern region of Brazil, Berra et al. (2012) concluded able selection and reduction (Görgens et al. 2015). For the that spectral data obtained from Landsat images were effi- models fitted using ANN and SVM algorithms, the TM2 cient in mapping the volume in the study area, even when band was the most important predictor variable for basal the regression models did not present high coefficients of area and volume. The linear correlation between this determination (R < 0.70). variable and basal area and volume is low to non-existent Divergence among variables that were deemed import- (r =0.15 and − 0.02, respectively). However, this band is ant between the different methods was observed with the usually applied in vegetation vigour assessment (Meng et machine learning algorithms. For basal area modelling, al. 2009), a characteristic that is indirectly related to the ND54 index and NDVI had a higher importance value volume and basal area, and which may explain the greater for RF. Statistically, these indices had high correlation contribution of the TM2 band in the ANN and SVM algo- values with the variable of interest (r = − 0.91 and 0.83, rithms, since trees that are more vigorous tend to have respectively) and high multicollinearity (r = − 0.93). The higher values of basal area and volume. ND54 index also was the variable that most contributed The models of basal area and volume developed by to the volume estimate by the RF method. The fact that the RF algorithm had smaller errors compared with the explanatory variables are correlated does not affect the those developed by other machine learning algorithms performance of these algorithms. These methods do not and MLR. The performance of this algorithm has been rely on underlying assumptions about the data, which proven in many modelling and remote sensing studies dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 14 of 17 (Lafiti et al. 2010, Rodriguez-Galiano et al. 2015, Wu et analysis and machine learning algorithms). However, in al. 2016). In the study by Shataee et al. (2012), volume some situations, hybrid methods provide less-accurate prediction models developed by RF performed better estimates in regions where the data collected in the field than those developed using k-nearest neighbour (k-NN) are sparse (Palmer et al. 2009). and SVM. Employing ASTER satellite data, the relative The high growth rate of Eucalyptus stands in Brazil RMSE obtained for all three volume models was higher reinforces the importance of robust methods that con- than for the models developed in our study: 28.54% for sider auxiliary information in the process of estimating k-NN, 25.86% for SVM, and 26.86% for RF, and only the variables of interest, such as basal area and volume. The RF algorithm produced unbiased volume estimations. methodologies presented here are powerful tools for For basal area, RF produced models with lower RMSE estimating basal area and volume from spectral data ob- (18.39%) when compared with SVM (RMSE = 19.35%) tained from Landsat 5 TM or from other multispectral and k-NN (RMSE = 20.20%); however, only k-NN was optical sensors. According to Görgens et al. (2015), ma- able to generate unbiased estimation compared with the chine learning algorithms can continuously learn from other two algorithms used. new data and keep all the accumulated knowledge of One of the positive features of RF is that it achieves previous datasets. This fact allows the implementation of satisfactory performance even with a limited number of these algorithms in other situations where only limited samples and with many independent variables (attri- amounts of data are available. The use of all auxiliary butes), as in the case of this current study. It is an variables in the estimation process is another advantage ensemble method, which combines several regression over traditional regression methods, since machine trees to generate an average estimate, in which different learning algorithms are not restricted by correlation attributes are used in each tree, making the results take between input variables, thus avoiding the loss of into account the information of all available attributes. important information in the estimation process of the Stojanova et al. (2010) also concluded that ensemble variable of interest. Nevertheless, these methods have as methods (RF) were significantly better in height and disadvantage the transparency of the resulting models, canopy cover modelling using remote sensing data than so an alternative to overcome this obstacle is the evalu- single- and multi-target regression trees. The ANN and ation of the relative importance of the explanatory vari- SVM algorithms also have proven good performance ables. Furthermore, the causal relation between inputs and robustness in several studies (e.g. Shao and Lunetta and outputs of the estimation process is not clear, which 2012, Were et al. 2015). However, the parameterisation implies a limited biological interpretation (Aertsen et al. of these methods is laborious, and they are very sensitive 2010, Özçelik et al. 2013). to the variation of input parameters, with ANN being The results from the current study do need to be more sensitive than other methods (Rodriguez-Galiano interpreted cautiously, as they are limited to a et al. 2015). This same behaviour was observed in this homogenous and relatively small study area. While this study, where the use of a restricted dataset by ANN work uses a small number of plots, it represents the resulted in estimates that were not compatible with the sampling intensity adopted by most Brazilian forestry forest inventory estimates (Tables 3 and 9). companies, i.e. one plot (usually 200–500 m in size) for The addition of residual kriging in spatial prediction each 10 ha of Eucalyptus plantation (Raimundo et al. methods did not necessarily result in relative improve- 2017, Scolforo et al. 2016) and the results from this ments in the estimation of these methods. In the case of research showcase the importance of using remotely MLR and ANN methods, residual kriging contributed to sensed data and robust prediction methods for basal better accuracy of the basal area estimates. These results area and volume estimation. The data used here were are consistent with the results of Dai et al. (2014), who also from a relatively old sensor, Landsat 5 TM, and a reported that the combination of the residual kriging study by Fassnacht et al. (2014) concluded that predictor with artificial neural networks provides an improvement data (sensor) type is the most important factor for the in the estimate accuracy of the variables of interest. The accuracy of biomass estimates and that the prediction combination of MLR with residual kriging also provided method had a substantial effect on accuracy and was improvements in estimates in the studies of Viana et al. generally more important than the sample size. (2012), Castillo-Santiago et al. (2013), and Galeana- Fassnacht et al. (2014) also suggested that choosing the Pizaña et al. (2014). For basal area and volume estima- appropriate statistical method may be more effective tion, the addition of residual kriging in the RF and SVM than obtaining additional field data for obtaining good methods resulted in a lower precision of the estimates. biomass estimates. Hybrid methods are advantageous in the ability to use Considering the cost of improving accuracy of timber spatial information (ordinary kriging of residuals) and production estimates by field measurements in Eucalyptus non-spatial information (multiple linear regression stands, it seems sensible to invest in further studies that dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 15 of 17 focus on more test sites and a wider range of sensor sys- machine; S : Residual standard error; TM: Thematic mapper; USGS: United xy States Geological Survey; V: Volume; VIF: Variance inflation factor tems (particularly RADAR and LIDAR). This would further increase our understanding of the role of the statistical Acknowledgements model set-up in remote sensing-based estimates of forest We thank CAPES - Coordenadoria de Aperfeiçoamento do Pessoal do Ensino Superior (Brazilian Federal Agency for Support and Evaluation of Graduate variables in Eucalyptus stands. Further studies could also Education) for the scholarships provided to AAR and MCC. investigate whether other prediction methods, such as non- linear regression or partial least squares regression (PLSR) Funding Not applicable approaches, alter our findings. The integration of additional predictors (e.g. topographic information or climate vari- Availability of data and materials ables) would be a further possible extension of our work. Not applicable Authors’ contributions Conclusions All authors contributed substantially to the work reported here. AAR, MCC, LRG, and ACFF analysed and interpreted the data. ARR and MCC wrote the Machine learning algorithms, particularly the random for- manuscript. JMM, ACFF, LRG, and FWAJ reviewed and edited the manuscript. est (RF) and support vector machine (SVM) algorithms, All authors read and approved the final manuscript. were able to develop models that estimate basal area and Competing interests volume in Eucalyptus stands using spectral data collected The authors declare that they have no competing interests. from Landsat 5 TM images. The artificial neural network (ANN) method did not perform well in this context, due Publisher’sNote in part to the limited data availability. Springer Nature remains neutral with regard to jurisdictional claims in Random forest was the best method of spatial prediction published maps and institutional affiliations. and mapping of basal area and volume in Eucalyptus Author details stands in Minas Gerais state. However, due to the close 1 Department of Forest Science, Federal University of Lavras – UFLA, PO Box performance to the support vector machine and multiple 3037, Lavras, Minas Gerais 37200-000, Brazil. CPCE, Federal University of Piauí, BR 135 - km 3, Bom Jesus, Piauí 64900-000, Brazil. linear regression methods, we propose that both methods should be tested and then the best result applied for Received: 3 February 2017 Accepted: 27 December 2017 spatial prediction of basal area and volume in other regions with Eucalyptus stands. The approaches used in References this study provide a framework for integrating field and Aertsen, W, Kint, V, Van Orshoven, J, Özkan, KA, Muys, B. (2010). Comparison and multispectral data, highlighting methods that greatly ranking of different modelling techniques for prediction of site index in Mediterranean mountain forests. Ecological Modelling, 221, 1119–1130. improve spatial prediction of basal area and volume esti- https://doi.org/10.1016/j.ecolmodel.2010.01.007. mation in Eucalyptus stands. Although the sensor TM of Ahmed, OS, Franklin, SE, Wulder, MA, White, JC. (2015). Characterizing stand-level Landsat satellites is no longer operational, the concepts forest canopy cover and height using Landsat time series, samples of airborne LiDAR, and the random forest algorithm. ISPRS Journal of presented in this study are expected to be consistent Photogrammetry and Remote Sensing, 101,89–101. https://doi.org/10.1016/j. regardless of the sensor. Thus, the approach used in this isprsjprs.2014.11.007. study can be more broadly applied to basal area and Alvares, CA, Stape, JL, Sentelhas, PC, Gonçalves, JLM, Sparovek, G. (2013). Köppen’s climate classification map for Brazil. Meteorologische Zeitschrift, 6, volume estimation in Eucalyptus stands using the new 711–728. https://doi.org/10.1127/0941-2948/2013/0507. optical sensors such as Landsat 8 OLI and Sentinel-2. Barrios, PG, Bidegain, MP, Gutiérrez, L. (2015). Effects of tillage intensities on The combination of spatial prediction methods with spatial soil variability and site-specific management in early growth of Eucalyptus grandis. Forest Ecology and Management, 346,41–50. https://doi. residual kriging should be used with caution, since the org/10.1016/j.foreco.2015.02.031. relative improvement of spatial prediction accuracy of Berra, EF, Brandelero, C, Pereira, RS, Sebem, E, Goergen, LCG, Benedetti, ACP, basal area and volume did not occur in all methods, and Lippert, DB. (2012). Estimativa do volume total de madeira em espécies de eucalipto a partir de imagens de satélite Landsat. Ciência Florestal, 22(4), 853– there is not always a spatial dependency structure in the 864. https://doi.org/10.5902/198050987566. residuals of a spatial prediction method. Boisvenue, C, Smiley, BP, White, JC, Kurz, WA, Wulder, MA. (2016). Integration of Landsat time series and field plots for forest productivity estimates in Abbreviations decision support models. Forest Ecology and Management, 376, 284–297. AIC: Akaike information criterion; ANN: Artificial neural networks; https://doi.org/10.1016/j.foreco.2016.06.022. EVI: Enhanced vegetation index; G: Basal area; GDP: Gross domestic product; Breiman, L. (2001). Random forests. Machine Learning, 45,5–32. https://doi.org/10. GEMI: Global environment monitoring index; GIS: Geographical information 1023/A:1010933404324. systems; GPS: Global positioning systems; MAE: Mean absolute error; Canavesi, V, Ponzoni, FJ, Valeriano, MM. (2010). Estimativa de volume de madeira ME: Mean error; MLA: Machine learning algorithms; MLR: Multiple linear em plantios de Eucalyptus spp. utilizando dados hiperespectrais e dados regression; MSAVI: Modified soil-adjusted vegetation index; ND: Normalised topográficos. Revista Árvore, 4(3), 539–549. https://doi.org/10.1590/S0100- difference; NDVI: Normalised difference vegetation index; PNE: Pure nugget 67622010000300018. effect; R : Adjusted coefficient of determination; RAE: Reduced average error; Castillo-Santiago, MA, Ghilardi, A, Oyama, K, Hernández-Stefanoni, JL, Torres, I, aj RBF: Radial basis function; RF: Random forest; RI: Relative improvement; Flamenco-Sandoval, A, Fernández, A, Mas, JF. (2013). Estimating the spatial RK: Residual estimation by ordinary kriging; RMSE: Root mean square error; distribution of woody biomass suitable for charcoal making from remote SAVI: Soil-adjusted vegetation index; SMO: Sequential minimal optimization; sensing and geostatistics in central Mexico. Energy for Sustainable SRE: Standard deviation of the reduced average error; SVM: Support vector Development, 17, 177–188. https://doi.org/10.1016/j.esd.2012.10.007. dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 16 of 17 Cluter, MEJ, Boyd, DS, Foody, GM, Vetrivel, A. (2012). Estimating tropical forest nitens (Deane & Maiden) maiden short rotation woody crops in Northwest biomass with a combination of SAR image texture and Landsat TM data: An Spain. New Forests, 46, 387–407. https://doi.org/10.1007/s11056-015-9467-7. assessment of predictions between regions. ISPRS Journal of Photogrammetry Görgens, EB, Montaghi, A, Rodriguez, LCE. (2015). A performance comparison of and Remote Sensing, 70,66–77. https://doi.org/10.1016/j.isprsjprs.2012.03.011. machine learning methods to estimate the fast-growing forest plantation Coops, NC, Johnson, M, Wulder, MA, White, JC. (2006). Assessment of QuickBird yield based on laser scanning metrics. Computers and Electronics in high spatial resolution imagery to detect red attack damage due to Agriculture, 116, 221–227. https://doi.org/10.1016/j.compag.2015.07.004. mountain pine beetle infestation. Remote Sensing of Environment, 103(1), 67– Guedes, ICL, Mello, JM, Silveira, EMO, Mello, CR, Reis, AA, Gomide, LR. (2015). 80. https://doi.org/10.1016/j.rse.2006.03.012. Spatial continuity of dendrometric characteristics in clonal cultivated Dai, F, Zhou, Q, Lv, Z, Wang, X, Liu, G. (2014). Spatial prediction of soil organic Eucalyptus sp. throughout the time. Cerne, 21(4), 527–534. https://doi.org/10. matter content integrating artificial neural network and ordinary kriging in 1590/01047760201521041824. Tibetan plateau. Ecological Indicators, 45, 184–194. https://doi.org/10.1016/j. Huang, C, Song, K, Kim, S, Townshend, JRG, Davis, P, Masek, JG, Goward, SN. ecolind.2014.04.003. (2008). Use of a dark object concept and support vector machines to Diamantopoulou, MJ. (2012). Assessing a reliable modeling approach of features automate forest cover change analysis. Remote Sensing of Environment, 112, of trees through neural network models for sustainable forests. Sustainable 970–985. https://doi.org/10.1016/j.rse.2007.07.023. Computing: Informatics and Systems, 2, 190–197. https://doi.org/10.1016/j. Huete, A, Didan, K, Miura, T, Rodriguez, EP, Gao, X, Ferreira, LG. (2002). Overview suscom.2012.10.002. of the radiometric and biophysical performance of the MODIS vegetation Dube, T, & Mutanga, O. (2015). Investigating the robustness of the new Landsat-8 indices. Remote Sensing of Environment, 83, 195–213. https://doi.org/10.1016/ operational land imager derived texture metrics in estimating plantation S0034-4257(02)00096-2. forest aboveground biomass in resource constrained areas. ISPRS Journal of Huete, AR. (1988). A soil-adjusted vegetation index (SAVI). Remote Sensing of Photogrammetry and Remote Sensing, 108,12–32. https://doi.org/10.1016/j. Environment, 25, 295–309. https://doi.org/10.1016/0034-4257(88)90106-X. isprsjprs.2015.06.002. Immitzer, M, Atzberger, C, Koukal, T. (2012). Tree species classification with Dube, T, Mutanga, O, Adam, E, Ismail, R. (2014). Intra-and-inter species biomass random forest using very high spatial resolution 8-band WorldView-2 satellite prediction in a plantation forest: Testing the utility of high spatial resolution data. Remote Sensing, 4, 2661–2693. https://doi.org/10.3390/rs4092661. Spaceborne multispectral RapidEye sensor and advanced machine learning Indústria Brasileira de Árvores (2014). Anuário estatístico da indústria brasileira de algorithms. Sensors, 14, 15348–15370. https://doi.org/10.3390/s140815348. árvores: ano base 2014. Brasília: IBA. Dye, PJ, Jacobs, S, Drew, D. (2004). Verification of 3-PG growth and water-use Indústria Brasileira de Árvores (2015). Anuário estatístico da indústria brasileira de predictions in twelve Eucalyptus plantation stands in Zululand, South Africa. árvores: ano base 2015. Brasília: IBA. Forest Ecology and Management, 193, 197–218. https://doi.org/10.1016/j. Journel, AG, & Huijbregts, CJ (1978). Mining geostatistics. London: Academic. foreco.2004.01.030. Justice, CO, Vermote, E, Townshend, JRG, Defries, R, Roy, DO, Hall, DK, Environmental Systems Research Institute (2010). ArcGIS desktop: Release 10.1. Salomonson, VV, Privette, JL, Riggs, G, Strahler, A, Lucht, W, Myneni, RB, Redlands: ESRI. Knyazikhin, Y, Running, SW, Nemani, RR, Wan, Z, Huete, AR, Leeuwen, WV, Fassnacht, FE, Hartig, F, Latifi, H, Berger, C, Hernández, J, Corvalán, P, Koch, Wolfe, RE, Giglio, L, Muller, J, Lewis, P, Barnsley, MJ. (1998). The moderate B. (2014). Importance of sample size, data type and prediction method resolution imaging spectroradiometer (MODIS): Land remote sensing for for remote sensing-based estimations of aboveground forest biomass. global change research. IEEE Transactions on Geoscience and Remote Sensing, Remote Sensing of Environment, 154, 102–114. https://doi.org/10.1016/j.rse. 36(4), 1228–1249. https://doi.org/10.1109/36.701075. 2014.07.028. Lafiti, H, Nothdurft, A, Koch, B. (2010). Non-parametric prediction and mapping of Fayad, I, Baghdadi, N, Guitet, S, Bailly, JS, Hérault, B, Gond, V, Hajj, ME, Minh, DHT. standing timber volume and biomass in a temperate forest: Application of (2016). Aboveground biomass mapping in French Guiana by combining multiple optical/LiDAR-derived predictors. Forestry, 83(4), 395–407. https://doi. remote sensing, forest inventories and environmental data. International org/10.1093/forestry/cpq022. Journal of Applied Earth Observation and Geoinformation, 52, 502–514. https:// Lopes, DM, Aranha, JT, Walford, N, O’Brien, J, Lucas, N. (2009). Accuracy of remote doi.org/10.1016/j.jag.2016.07.015. sensing data versus other sources of information for estimating net primary Frank, E, Hall, MA, Witten, I (2016). The WEKA workbench [online appendix]. In I production in Eucalyptus globulus Labill. and Pinus pinaster Ait. ecosystems in Witten, E Frank, M Hall, C Pal (Eds.), Data mining: Practical machine learning Portugal. Canadian Journal of Remote Sensing, 35(1), 37–53. https://doi.org/10. tools and techniques, (4th ed., ). Burlington: Morgan Kaufmann. 5589/m08-078. Galeana-Pizaña, JM, López-Caloca, A, López-Quiroza, P, Silván-Cárdenasa, JL, López-Sánchez, CA, García-Ramírez, P, Resl, R, José, C, Hernández-Díaz, JC, López- Couturier, S. (2014). Modeling the spatial distribution of above-ground Serrano, PM, Wehenkel, C. (2014). Modelling dasometric attributes of mixed and carbon in Mexican coniferous forests using remote sensing and a uneven-aged forests using Landsat-8 OLI spectral data in the Sierra Madre geostatistical approach. International Journal of Applied Earth Observation and Occidental, Mexico. iForest, 10,288–295. https://doi.org/10.3832/ifor1891-009. Geoinformation, 30, 179–189. https://doi.org/10.1016/j.jag.2014.02.005. López-Serrano, PM, Corral-Rivas, JJ, Díaz-Varela, RA. (2016). Evaluation of Gao, T, Zhu, J, Deng, S, Zheng, X, Zhang, J, Shang, G, Huang, L. (2016). Timber radiometric and atmospheric correction algorithms for aboveground forest production assessment of a plantation forest: An integrated framework with biomass estimation using Landsat 5 TM data. Remote Sensing, 8(5), 369. field-based inventory, multi-source remote sensing data and forest https://doi.org/10.3390/rs8050369. management history. International Journal of Applied Earth Observation and Lu, D, Mausel, P, Brondízio, E, Moran, E. (2004). Relationships between forest Geoinformation, 52, 155–165. https://doi.org/10.1016/j.jag.2016.06.004. stand parameters and Landsat TM spectral responses in the Brazilian Amazon García-Gutiérrez, J, Martínez-Álvarez, F, Troncoso, A, Riquelme, JC. (2015). A Basin. Forest Ecology and Management, 198, 149–167. https://doi.org/10.1016/ comparison of machine learning regression techniques for LiDAR-derived j.foreco.2004.03.048. estimation of forest variables. Neurocomputing, 167,24–31. https://doi.org/10. Masek, JG, Vermote, EF, Saleous, NE, Wolfe, R, Hall, FG, Huemmrich, KF, Gao, F, 1016/j.neucom.2014.09.091. Kutler, J, Lim, TK. (2006). A Landsat surface reflectance dataset for North Gebreslasie, MT, Ahmed, FB, Aardt, JAN. (2008). Estimating plot-level forest America, 1990 – 2000. IEEE Geoscience and Remote Sensing Letters, 3(1), 68–72. structural attributes using high spectral resolution ASTER satellite data in https://doi.org/10.1109/LGRS.2005.857030. even-aged Eucalyptus plantations in southern KwaZulu-Natal, South Africa. Meng, Q, Cieszewski, C, Madden, M. (2009). Large area forest inventory using Southern Forests, 70(3), 227–236. https://doi.org/10.2989/SF.2008.70.3.6.667. Landsat ETM+: A geostatistical approach. ISPRS Journal of Photogrammetry Gebreslasie, MT, Ahmed, FB, Aardt, JAN. (2010). Predicting forest structural and Remote Sensing, 64,27–36. https://doi.org/10.1016/j.isprsjprs.2008.06.006. attributes using ancillary data and ASTER satellite data. International Journal Montgomery, DC, Peck, EA, Vining, GG (2006). Introduction to linear regression of Applied Earth Observation and Geoinformation, 12S, S23–S26. https://doi. analysis. New York: Wiley. org/10.1016/j.jag.2009.11.006. Moreno, A, Neumann, M, Hasenauer, H. (2016). Optimal resolution for linking Gleason, CJ, & Im, J. (2012). Forest biomass estimation from airborne LiDAR data remotely sensed and forest inventory data in Europe. Remote Sensing of using machine learning approaches. Remote Sensing of Environment, 125,80–91. Environment, 183, 109–119. https://doi.org/10.1016/j.rse.2016.05.021. https://doi.org/10.1016/j.rse.2012.07.006. Morgenroth, J, & Visser, R. (2013). Uptake and barriers to the use of geospatial González-García, M, Hevia, A, Majada, J, Anta, RC, Barrio-Anta, M. (2015). Dynamic technologies in forest management. New Zealand Journal of Forestry Science, growth and yield model including environmental factors for Eucalyptus 43(16), 1–9. https://doi.org/10.1186/1179-5395-43-16. dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 17 of 17 Özçelik, R, Diamantopoulou, MJ, Crecente-Campo, F, Eler, U. (2013). Estimating Savannakhet, Lao PDR. Remote Sensing, 6, 5452–5479. https://doi.org/10.3390/ Crimean juniper tree height using nonlinear regression and artificial neural rs6065452. network models. Forest Ecology and Management, 306,52–60. https://doi.org/ Watt, MS, Dash, JP, Watt, P, Bhandari, S. (2016). Multi-sensor modelling of a forest 10.1016/j.foreco.2013.06.009. productivity index for radiata pine plantations. New Zealand Journal of Pacheco, LRF, Ponzoni, FJ, Santos, SB, Andrades Filho, CO, Mello, MP, Campos, RC. Forestry Science, 46, 9. https://doi.org/10.1186/s40490-016-0065-z. (2012). Structural characterization of canopies of Eucalyptus spp. using Watt, MS, Rubilar, R, Kimberley, MO, Kriticos, DJ, Emhart, V, Mardones, O, Acevedo, radiometric data from TM/Landsat 5. Cerne, 18(1), 105–116. https://doi.org/10. M, Pincheira, M, Stape, J, Fox, T. (2014). Using seasonal measurements to 1590/S0104-77602012000100013. inform ecophysiology: Extracting cardinal growth temperatures for process- based growth models of five Eucalyptus species/crosses from simple field Palmer, DJ, Höck, BK, Kimberley, MO, Watt, MS, Lowe, DJ, Payn, TW. (2009). trials. New Zealand Journal of Forestry Science, 44, 9. https://doi.org/10.1186/ Comparison of spatial prediction techniques for developing Pinus radiata s40490-014-0009-4. productivity surfaces across New Zealand. Forest Ecology and Management, Wear, DN, Dixon IV, E, Abt, RC, Singh, N. (2015). Projecting potential adoption of 258(9), 2046–2055. https://doi.org/10.1016/j.foreco.2009.07.057. genetically engineered freeze-tolerant Eucalyptus in the United States. Forest Pinty, B, & Verstraete, MM. (1992). GEMI: A non-linear index to monitor global Science, 61(3), 466–480. https://doi.org/10.5849/forsci.14-089. vegetation from satellites. Vegetatio, 101(1), 15–20. https://doi.org/10.1007/ Were, K, Bui, DT, Dick, OB, Singh, BR. (2015). A comparative assessment of support BF00031911. vector regression, artificial neural networks, and random forests for Ponzoni, FJ, Pacheco, LRF, Santos, SB, Andrades Filho, CO. (2015). Caracterização predicting and mapping soil organic carbon stocks across an Afromontane espectro-temporal de dosséis de Eucalyptus spp. mediante dados landscape. Ecological Indicators, 52, 394–403. https://doi.org/10.1016/j.ecolind. radiométricos TM/Landsat 5. Cerne, 21(2), 267–275. https://doi.org/10.1590/ 2014.12.028. Wu, C, Shen, H, Shen, A, Deng, J, Gan, M, Zhu, J, Xu, H, Wang, K. (2016). Ponzoni, FJ, Shimabukuro, YE, Kuplich, TM (2012). Sensoriamento Remoto da Comparison of machine-learning methods for above-ground biomass Vegetação, (2nd ed., ). São Paulo: Oficina de Textos. estimation based on Landsat imagery. Journal of Applied Remote Sensing, 10, Qi, J, Chehbouni, A, Huete, AR, Kerr, YH, Sorooshian, S. (1994). A modified soil 3. https://doi.org/10.1117/1.JRS.10.035010. adjusted vegetation index. Remote Sensing of Environment, 48, 119–126. Yamamoto, JK, & Landim, PMB (2013). Geoestatística: conceitos e aplicações. São https://doi.org/10.1016/0034-4257(94)90134-1. Paulo: Oficina de Textos. R Core Team (2016). R: A language and environment for statistical computing. Zhang, J, Huang, S, Hogg, EH, Lieffers, V, Qin, Y, He, F. (2014). Estimating spatial Vienna: R Foundation for Statistical Computing. variation in Alberta forest biomass from a combination of forest inventory Raimundo, MR, Scolforo, HF, Mello, JM, Scolforo, JRS, McTague, JP, Reis, AA. and remote sensing data. Biogeosciences, 11, 2793–2808. https://doi.org/10. (2017). Geostatistics applied to growth estimates in continuous forest 5194/bg-11-2793-2014. inventories. Forest Science, 63(1), 29–38. https://doi.org/10.5849/FS.2016-056. Retslaff, FAS, Figueiredo Filho, A, Dias, AN, Bernett, LG, Figura, MA. (2015). Curvas de sítio e relações hipsométricas para Eucalyptus grandis na Região dos Campos Gerais, Paraná. Cerne, 2(2), 199–207. https://doi.org/10.1590/ Ribeiro Júnior, PJ, & Diggle, PJ. (2001). GeoR: A package for geostatistical analysis. R-NEWS, 1(2), 15–18. Rodriguez-Galiano, V, Castillo, MS, Chica-Olmo, M, Chica-Rivas, M. (2015). Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geology Reviews, 71, 804–818. https://doi.org/10.1016/j.oregeorev.2015.01.001. Rouse, J, Haas, R, Schell, J, Deering, D, Harlan, J (1973). Monitoring the vernal advancements and retrogradation (greenwave effect) of nature vegetation. NASA/GSFC final report. Greenbelt: NASA. Scolforo, HF, Castro Neto, F, Scolforo, JRS, Burkhart, H, McTague, JP, Raimundo, MR, Loos, RA, Fonseca, S, Sartório, RC. (2016). Modeling dominant height growth of Eucalyptus plantations with parameters conditioned to climatic variations. Forest Ecology and Management, 380, 182–195. https://doi.org/10. 1016/j.foreco.2016.09.001. Shao, Y, & Lunetta, RS. (2012). Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points. ISPRS Journal of Photogrammetry and Remote Sensing, 70, 78–87. https://doi.org/10.1016/j.isprsjprs.2012.04.001. Shataee, S, Kalbi, S, Fallah, A, Pelz, D. (2012). Forest attribute imputation using machine-learning methods and ASTER data: Comparison of k-NN, SVR and random forest regression algorithms. International Journal of Remote Sensing, 33, 6254–6280. https://doi.org/10.1080/01431161.2012.682661. Stojanova, D, Panov, P, Gjorgjioski, V, Kobler, A, Džeroski, S. (2010). Estimating vegetation height and canopy cover from remotely sensed data with machine learning. Ecological Informatics, 5, 256–266. https://doi.org/10.1016/j. ecoinf.2010.03.004. United States Geological Survey (2017). Landsat imagery. Available online at: https://earthexplorer.usgs.gov. Accessed Jan 2017. Verma, NK, Lamb, DW, Reid, N, Wilson, B. (2014). An allometric model for estimating DBH of isolated and clustered Eucalyptus trees from measurements of crown projection area. Forest Ecology and Management, 326, 125–132. https://doi.org/10.1016/j.foreco.2014.04.003. Viana, H, Aranha, J, Lopes, D, Cohen, WB. (2012). Estimation of crown biomass of Pinus pinaster stands and shrubland above-ground biomass using forest inventory data, remotely sensed imagery and spatial prediction models. Ecological Modelling, 226,22–35. https://doi.org/10.1016/j.ecolmodel.2011.11.027. Vicharnakorn, P, Shrestha, RP, Nagai, M, Salam, AP, Kiratiprayoon, S. (2014). Carbon stock assessment using remote sensing and forest inventory data in http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png New Zealand Journal of Forestry Science Springer Journals

Spatial prediction of basal area and volume in Eucalyptus stands using Landsat TM data: an assessment of prediction methods

Loading next page...
 
/lp/springer-journals/spatial-prediction-of-basal-area-and-volume-in-eucalyptus-stands-using-hhip7D30ay
Publisher
Springer Journals
Copyright
2018 The Author(s).
eISSN
1179-5395
DOI
10.1186/s40490-017-0108-0
Publisher site
See Article on Publisher Site

Abstract

Background: In fast-growing forests such as Eucalyptus plantations, the correct determination of stand productivity is essential to aid decision making processes and ensure the efficiency of the wood supply chain. In the past decade, advances in remote sensing and computational methods have yielded new tools, techniques, and technologies that have led to improvements in forest management and forest productivity assessments. Our aim was to estimate and map the basal area and volume of Eucalyptus stands through the integration of forest inventory, remote sensing, parametric, and nonparametric methods of spatial prediction. Methods: This study was conducted in 20 5-year-old clonal stands (362 ha) of Eucalyptus urophylla S.T.Blake x Eucalyptus camaldulensis Dehnh. The stands are located in the northwest region of Minas Gerais state, Brazil. Basal area and volume data were obtained from forest inventory operations carried out in the field. Spectral data were collected from a Landsat 5 TM satellite image, composed of spectral bands and vegetation indices. Multiple linear regression (MLR), random forest (RF), support vector machine (SVM), and artificial neural network (ANN) methods were used for basal area and volume estimation. Using ordinary kriging, we spatialised the residuals generated by the spatial prediction methods for the correction of trends in the estimates and more detailing of the spatial behaviour of basal area and volume. Results: The ND54 index was the spectral variable that had the best correlation values with basal area (r = − 0.91) and volume (r = − 0.52) and was also the variable that most contributed to basal area and volume estimates by the MLR and RF methods. The RF algorithm presented smaller basal area and volume errors when compared to other machine learning algorithms and MLR. The addition of residual kriging in spatial prediction methods did not necessarily result in relative improvements in the estimations of these methods. Conclusions: Random forest was the best method of spatial prediction and mapping of basal area and volume in the study area. The combination of spatial prediction methods with residual kriging did not result in relative improvement of spatial prediction accuracy of basal area and volume in all methods assessed in this study, and there is not always a spatial dependency structure in the residuals of a spatial prediction method. The approaches used in this study provide a framework for integrating field and multispectral data, highlighting methods that greatly improve spatial prediction of basal area and volume estimation in Eucalyptus stands. This has potential to support fast growth plantation monitoring, offering options for a robust analysis of high-dimensional data. Keywords: Forest inventory, Machine learning algorithms, Multiple linear regression, Random forest, Support vector machine, Artificial neural networks * Correspondence: alinyreis@hotmail.com Department of Forest Science, Federal University of Lavras – UFLA, PO Box 3037, Lavras, Minas Gerais 37200-000, Brazil Full list of author information is available at the end of the article © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 2 of 17 Background region of Brazil from Landsat 5 TM images. Canavesi et The Brazilian forestry sector represents an important al. (2010) used hyperspectral data from the Hyperion EO- share of the products, taxes, jobs, and income generation 1 sensor for the volume estimation of Eucalyptus planta- of the country and accounts for 3.5% of the national tions under different relief conditions. The results found GDP (IBÁ 2015). This is in large part due to the success- by these authors corroborate the potential use of data col- ful establishment of fast-grown plantations of Eucalyptus lected by remote sensing to estimate the productivity of species, which currently occupy around 5.6 million Eucalyptus plantations. hectares (71.9% of the total planted forest area in Brazil) In parallel to the advances in remote sensing, compu- and represent 17% of the harvested wood in the world tational techniques, such as machine learning algorithms (IBÁ 2014, 2015). (MLA), have been increasingly used to model spectral The Eucalyptus genus has more than 500 species, and a and biological data. These techniques overcome the subset of these are used in fast-growing plantations (Barrios difficulties of classical statistical methods such as spatial et al. 2015), commonly located in tropical and sub-tropical correlation, non-linearity of data, and overfitting (Were regions, and more recently in temperate regions. Spain et al. 2015). In addition, these algorithms allow the use (González-García et al. 2015), Portugal (Lopes et al. 2009), of categorical data, with statistical noise and incomplete Uruguay (Barrios et al. 2015), Chile (Watt et al. 2014), data, and therefore are able to address needs under South Africa (Dye et al. 2004), Australia (Verma et al. different dataset scenarios (Breiman 2001). 2014), and the USA (Wear et al. 2015) are some examples Several studies have shown the superiority of machine of productive Eucalyptus plantations in temperate regions learning algorithms in relation to classical statistics in that have cutting cycles ranging from 8 to 12 years. In trop- several areas, such as in forest management. For ical regions such as Brazil, the cutting cycles of Eucalyptus instance, Ahmed et al. (2015) modelled a Landsat time- plantations range from 5 to 7 years (Guedes et al. 2015, series data structure in conjunction with LiDAR data Scolforo et al. 2016). and found that the random forest algorithm achieved Timber production is the main ecosystem service of better results than multiple regression for all forest planted forests and the main management objective for classes. In another study, García-Gutiérrez et al. (2015) these plantations (Gao et al. 2016). In the case of fast- found that machine learning algorithms (mainly support growing plantations, the correct determination of stand vector machine) were superior for modelling a range of productivity is essential to support forest management forest variables (viz., aboveground biomass, basal area, planning strategies (González-García et al. 2015, Retslaff dominant height, mean height, and volume) compared et al. 2015). Traditionally, productivity assessments of a with multiple linear regression. Machine learning algo- plantation are carried out based on field measurements rithms have also been shown to provide an economical of the diameter at breast height (DBH) and tree height and accurate way to estimate aboveground biomass in via forest inventory. However, in fast-growing planta- forests from Landsat satellite images (Wu et al. 2016). tions, field-based inventory programmes may not be These studies highlight the benefits of applying more sufficient to capture productivity differences across the robust techniques in solving problems previously entire area, such as those arising from losses due to pest and resolved by traditional statistical modelling. disease attacks (Coops et al. 2006), or from climatic anomal- In this context, the aims of this study were: (i) to esti- ies (González-García et al. 2015, Scolforo et al. 2016). mate and map basal area and volume of a Eucalyptus In the past decade, advances in geographical informa- plantation through the integration of forest inventory, tion systems (GIS), global positioning systems (GPS), remote sensing, and parametric and nonparametric and remote sensing have provided new tools, techniques, methods of spatial prediction; (ii) to compare the per- and technologies to support forest management. Thus, formance of machine learning algorithms (random low-cost and accurate forest productivity assessment can forest, support vector machine, and artificial neural net- be made, as well as allowing the collection of information works) with the linear regression model; and (iii) to in areas not sampled by forest inventory (Morgenroth and assess the improvement in basal area and volume esti- Visser 2013). The analysis of remote sensing information mation with the addition of residual kriging in spatial combined with field data has been used by several authors prediction methods. to fill the information gap left by data collected only in the field (Watt et al. 2016, Boisvenue et al. 2016, Moreno et al. Methods 2016, Fayad et al. 2016, Vicharnakorn et al. 2014). Ponzoni Study area et al. (2015) used data collected from Landsat 5 thematic The study area is located in Minas Gerais state, the fourth mapper (TM) images for spectral-temporal characterisa- largest state in Brazil, with an area of 586,521 km . Minas tion of Eucalyptus canopies. Berra et al. (2012) estimated Gerais state has the largest area occupied by plantations of the volume of a Eucalyptus plantation in the southern the Eucalyptus genus in the country (1,400,232 ha), dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 3 of 17 2 −1 3 −1 corresponding to 25.2% of Brazilian Eucalyptus planta- (m ha ), and total stem volume (m ha ) were obtained tions. The wood from these plantations is mainly used for from the information collected in the plots. the production of charcoal, as well as pulp, lumber, and panels (IBÁ 2015). Remote sensing data and processing The Eucalyptus clonal stands under study are located Spectral data were obtained from a Landsat 5 TM satellite in Lagoa Grande municipality, in the northwest of Minas image, with spatial resolution of 30 m, on the date of June Gerais state (lat. 17° 43′ 00″ S–17° 44′ 00″ S, long. 46° 25, 2009, corresponding with field data collection, in orbit 32′ 00″ W–46° 33′ 00″ W, elevation 560 m a.s.l.) 220, point 072, in bands TM1 (0.45–0.52 μm), TM2 (Fig. 1). According to the Köppen climatic classification (0.52–0.60 μm), TM3 (0.63–0.69 μm), TM4 (0.76–0.90 μm), system, the climate in this region is Aw, classified as a TM5 (1.55–1.75 μm), and TM7 (2.18–2.35 μm). The tropical savanna climate, with drier months during the Landsat 5 TM Surface Reflectance Climate Data Record winter, high annual precipitation in the summer and (CDR) was used, which is a Landsat Level-2A product gener- average temperature of all months greater than 18 °C ated by the Landsat Ecosystem Disturbance Adaptive (Alvares et al. 2013). The average annual rainfall and the Processing System (LEDAPS) (Masek et al. 2006) obtained average monthly rainfall of the dry and wet seasons are from the USGS (United States Geological Survey) database 1430, 8, and 257 mm, respectively. (USGS 2017). These images already contain radiometric calibration, and geometric and atmospheric corrections. Field data description and sampling In addition, vegetation indices using the red, near infra- This study was undertaken in a set of 20 clonal stands of red and short wave infrared spectral bands of Landsat 5 Eucalyptus urophylla S.T.Blake x Eucalyptus camaldulensis TM (Table 2) were calculated, as described by Lu et al. Dehnh, totalling an area of 362.2 ha. These stands were (2004) and Ponzoni et al. (2012). The normalised differ- planted in April and May 2004, with initial spacing of either ence vegetation index (NDVI) is the most widely used 3 × 2 m or 3 × 3 m. The forest inventory was carried out in vegetation index for retrieval of forest biophysical parame- June andJuly2009ona setof35georeferenced square ters (Rouse et al. 1973, Lu et al. 2004). The soil-adjusted plots of 400 m . The plots were georeferenced in the field vegetation index (SAVI) and modified soil-adjusted vege- with GPS (Garmin 60CSx, Garmin Ltd., Olathe, Kansas, tation index (MSAVI) are soil adjusted vegetation indices USA). The sampling procedure adopted was systematic, used to reduce the effect of soil background reflectance allocating approximately one plot per 10 ha of forest. In (Qi et al. 1994). The enhanced vegetation index (EVI) was each plot, the diameter at breast height (DBH) of all stems developed to optimise the vegetation signal, correcting was measured, as well as the total height of the first 15 trees reflected light distortions caused by particulate matter sus- with normal stems (without bifurcation or any other defect) pended in the air, as well as by influence of background and height of dominant trees (the 100 largest diameter trees data under the vegetation canopy (Justice et al. 1998). The per hectare). Descriptive statistics of the variables collected global environment monitoring index (GEMI) minimises in the field are shown in Table 1. Estimates of basal area atmospheric effects, similar to the EVI and minimises Fig. 1 Geographic location of the Eucalyptus stands and sampling grid dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 4 of 17 Table 1 Descriptive statistics of the variables collected in the field divided into two datasets: prediction or fitting set (70% of the database) and validation set (30% of the database). Statistic DBH H Hd Therefore, 25 plots were used for basal area and volume Minimum 11.98 16.98 19.40 predictions, and 10 plots were used for validation of the Maximum 15.45 24.63 26.38 different approaches to estimate basal area and volume Mean 14.02 21.18 22.98 in the Eucalyptus stands under study. Standard deviation 0.85 2.33 1.92 Pearson correlation analysis was carried out among DBH diameter at breast height (cm), H total height (m), Hd dominant basal area, volume, values of spectral bands, and vegeta- height (m) tion indices. From these correlations, the relationship between the dendrometric characteristics of Eucalyptus observational angular effects in the observed vegetation stands and its spectral response in Landsat images was index signal (Pinty and Verstraete 1992). explored. Dataset integration Multiple linear regression (MLR) analysis The choice of an appropriate pixel size is one of the issues Basal area and volume estimation were accomplished to be considered when using remote sensing data to esti- through MLR analysis. A stepwise variable elimination mate dendrometric characteristics. Due to easy accessibil- method was used in conjunction with the Akaike infor- ity and affordability, a number of studies have employed mation criterion (AIC) to select only those spectral vari- Landsat images and found statistically significant correla- ables that “best” explained basal area and volume tions between remotely sensed data and dendrometric variation. The residuals from regression models were characteristics using ground plots ranging from 315 to analysed to assess the existence of trends in the errors. 2500 m (Dube and Mutanga 2015, López-Sánchez et al. The variance inflation factor (VIF) was used to detect 2014, Zhang et al. 2014, López-Serrano et al. 2016). possible correlations between explanatory variables Although the size of a single plot (20 × 20 m) in this (multicollinearity). The adopted VIF cutoff value was 10. study does not cover a Landsat pixel, we considered that a plot represents an area larger than its size. As the Random forest (RF) sampling design was one plot per hectare, we ensured The RF algorithm, initially proposed by Breiman (2001), that each plot matched with the reference pixel in order is an ensemble method that generates a set of individu- to extract reliable data. ally trained decision trees and combines their results. The greatest advantage of these decision trees as regres- Spatial modelling and prediction methods sion methods is that they are able to accurately describe Exploratory data analysis complex relationships among multiple variables, and by Spectral response was extracted from the Landsat TM aggregating these decision trees, more accurate solutions bands and vegetation indices from the geographical are generated (Gleason and Im 2012). In addition to coordinates of the forest inventory plots. Thus, the plot these characteristics, RF is an easy parameterisation 2 −1 database was composed of basal area (m ha ), volume method (Immitzer et al. 2012). This method has shown 3 −1 (m ha ), spectral band values, and vegetation index great potential in regression studies with integration of values. The total database (35 plots) was systematically spectral data, in some cases generating better results Table 2 Vegetation indices used in the spectral characterisation of the Eucalyptus stands Vegetation indices Formulation Reference NDVI (TM4 − TM3)/(TM4 + TM3) Rouse et al. (1973) ND53 (TM5 − TM3)/(TM5 + TM3) Huete et al. (2002) ND54 (TM5 − TM4)/(TM5 + TM4) Huete et al. (2002) ND57 (TM5 − TM7)/(TM5 + TM7) Huete et al. (2002) SAVI [(TM4 − TM3)/(TM4 + TM3 + 0.5)].(1.5) Huete (1988) qffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi MSAVI ðÞ 2TM4 þ 1 −ðÞ 2TM4 þ 1 −8ðÞ TM4−2TM3 =2 Qi et al. (1994) EVI 2.5 × [(TM4 − TM3)/(TM4 + 6TM3 − 7.5TM1 + 1)] Justice et al. (1998) GEMI n(1 − 0.25n). [(TM3 − 0.125)/(1 − TM3)] Pinty and Verstraete (1992) 2 2 2 TM4 −TM3 þ1:5TM4þ0:5TM3 ðÞ n ¼ TM4þTM3þ0:5 TM thematic mapper, ND normalised difference, NDVI normalised difference vegetation index, SAVI soil-adjusted vegetation index, MSAVI modified soil-adjusted vegetation index, EVI enhanced vegetation index, GEMI global environment monitoring index dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 5 of 17 than conventional techniques (Stojanova et al. 2010, activation function was employed in all neurons. Deter- Dube et al. 2014, García-Gutiérrez et al. 2015, Görgens mined by previous tests, ANNs were structured with 14 et al. 2015, Wu et al. 2016). The RF algorithm fitted in neurons in the input layer (number of variables), 1 this study is implemented in the open-source software neuron in the hidden layer, and 1 neuron in the output WEKA 3.8 (Frank et al. 2016). Tests were carried out layer, corresponding to estimated basal area or volume. with the exchange of tree numbers and attribute num- The learning rate, the momentum term, and iteration bers to be drawn. Then, 20 trees with 10 attributes to be numbers were fixed at 0.3, 0.5, and 500 for basal area, drawn by node for basal area and 80 trees and 11 attri- and 0.2, 0.7, and 500 for volume, respectively. butes for volume were fixed. Relative importance evaluation Support vector machine (SVM) The variable importance was assessed for each model SVMs operate by assuming that each set of inputs will with a removal-based approach in order to avoid the have a unique relation to the response variable and that limited interpretability of the MLA and to verify how the grouping and the relation of these predictors to one each independent variable contribute to the performance another is sufficient to identify rules that can be used to of machine learning algorithms (RF, SVM, and ANN). predict the response variable from new input sets. For All algorithms were adjusted n times, with n being the this, SVMs project the input space data into a feature number of available variables. At each time, one variable space with a much larger dimension, enabling linearly was removed from the training set and then the root non-separable data to become separable in the feature mean square error (RMSE) of the algorithm was quanti- space. This method has been successfully used in fied. At the end, the obtained errors were normalised by forestry classification problems (Huang et al. 2008, Shao the ratio of the largest RMSE so that they were between and Lunetta 2012) and more recently in regression prob- 0 and 1 and multiplied by 100 (Were et al. 2015). The lems with the use of spectral data (García-Gutiérrez et variable that results in the highest RMSE when removed al. 2015, Wu et al. 2016). The Kernel function used in from the database is the variable with the highest rela- the present study was the Gaussian or radial basis func- tive importance within the model. This methodology tion (RBF). The algorithm used is implemented in was chosen because it can be consistently applied to all WEKA 3.8 software under the sequential minimal algorithms, allowing comparisons of variable contribu- optimization (SMO) function. Values of parameters C tion between the methods. and σ (bandwidth or influence range of each training point in the RBF) were tested within the interval (10 ) Geostatistical modelling of prediction methods errors i= , where the least squared mean error Spatial prediction methods capture the average behaviour − 3, − 2, − 1, 0, 1, 2, 3 configuration was chosen for application. For basal area of the main variable, allowing the identification of its and volume, selected C and σ values were 10 and 0.1, general spatial behaviour, without detailing more specific and 100 and 0.01, respectively. areas or regions. For details of specific regions, estimates obtained exclusively from the auxiliary variables need to Artificial neural networks (ANNs) be corrected. Thus, residuals generated by spatial predic- ANNs are a parallel-distributed information processing tion methods (MLR, RF, SVM, and ANN) were used for system that simulates the working of neurons in the hu- the correction of trends in the estimates and for detailing man brain, being able to learn from examples. Artificial the spatial behaviour of the main variables (basal area and neural networks are widely used to model complex and volume) using ordinary kriging. The interpolated values of non-linear relations between inputs and outputs or to the residuals were then added to the estimates of the determine patterns in data (Diamantopoulou 2012). The spatial prediction methods (MLR, RF, SVM, and ANN). use of this technique in conjunction with remote sensing Thus, we obtained the basal area and volume estimates data is consolidated in several studies (Cluter et al. 2012, corrected by the ordinary kriging of the residuals for each García-Gutiérrez et al. 2015, Rodriguez-Galiano et al. spatial prediction method. 2015, Were et al. 2015). We used the ANN obtained by For the application of ordinary kriging to the spatial running the Multilayer Perceptron function (of the prediction method residuals, we considered the station- multilayer perceptron type) provided by WEKA 3.8 soft- arity presupposition of the intrinsic hypothesis (Journel ware. The training of neural networks occurred through and Huijbregts 1978), through fitting of theoretical func- the back-propagation algorithm, which fit the weights of tions to experimental semivariogram models. Spherical, all the layers of the network from the backpropagation exponential, and Gaussian models were fitted to the of the error, obtained in the output layer. The weights semivariogram of the residuals from each spatial predic- updating was carried out according to the error, learning tion method using weighted least squares. The semivar- 2 2 rate, and momentum terms (Delta rule). The sigmoidal iogram parameters (nugget (τ ), sill (σ ), and range (ϕ)) dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 6 of 17 were calculated from the best fitted models, which pro- Data analysis for this study was performed using vided information about the spatial structure as well as the following software: R (R Core Team 2016) with input parameters for the kriging interpolation. The nug- the geoR package (Ribeiro Júnior and Diggle 2001), get represents the minimum semivariance among differ- WEKA 3.8 (Frank et al. 2016), and ArcGis version ent sampling intervals. Nugget values greater than zero 10.1 (Esri 2010) with Geostatistical Analyst extension represent a combination of experimental error and of (Esri 2010). unresolved spatial variability occurring at scales smaller than inter-sampling lag distance. Sill is the plateau Results reached by the values of semivariance and indicates the Descriptive statistic of the measured basal area and amount of variation than can be explained by the spatial volume structure of the data. Range is the distance at which the 2 −1 Basal area ranged from 10.07 to 21.63 m ha , with semivariogram reaches the plateau, indicating the dis- 2 −1 average of 16.86 m ha and standard deviation of tance which values are spatially correlated. The evalu- 2 −1 2.4 m ha (Table 3). The average volume was ation of the performance of each semivariogram model 3 −1 3 169.34 m ha with a standard deviation of 29.66 m ha and the selection of the best models were based on −1 3 −1 and range from 95.80 up to 213.85 m ha . Basal area cross-validation, which estimates the reduced average had a lower coefficient of variation (CV = 14.26%) com- error (RAE) and the standard deviation of the reduced pared to volume (CV = 17.51%), demonstrating a consid- average error (SRE) (Yamamoto and Landim 2013). erable homogeneity of this dendrometric characteristic in the evaluated Eucalyptus stands. Validation and assessment of the prediction methods The different approaches to basal area and volume esti- mation of Eucalyptus stands were evaluated by compar- Correlation among basal area, volume, spectral bands, ing the basic statistics of the predicted maps (mean and and vegetation indices standard deviation) with the estimates obtained from the The correlation between plot basal area and the different forest inventory, and through the discrepancies between spectral bands and their ratios (Table 4) ranged from − 0.91 observed and predicted values in the fitting and valid- (ND54) to 0.15 (TM2). The SAVI, MSAVI, GEMI, and EVI ation datasets. These discrepancies were evaluated using were also highly correlated with basal area (r > 0.85). The the mean error (ME), the mean absolute error (MAE), correlation between plot volume and the spectral bands and the root mean square error (RMSE), as described in and ratios ranged from − 0.52 (ND54) to − 0.02 (TM2). Eqs. 1–3. The NDVI (r =0.49) and SAVI (r = 0.47) also had high 1 N correlations with volume, but these were lower in ME ¼ X −X ð1Þ i i i¼1 N magnitude when compared with those for basal area. Many of the spectral bands and ratios were also 1 N ^ highly correlated with each other (r > 0.90), which can MAE ¼ X −X ð2Þ i i i¼1 be considered a drawback due possible to multicolli- rffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi nearity problems in linear regression models. 1 N RMSE ¼ X −X ð3Þ i i i¼1 Table 3 Descriptive statistics obtained from forest inventory where N is the number of values in the dataset; X is the processing using the estimators of simple random sampling estimated value of the main variable; X is the observed (SRS) value in the prediction and validation sets. Estimators Basal area Volume The relative improvement (RI) achieved by residual 2 a 2 −1 3 a 3 −1 (m ) (m ha )(m ) (m ha ) kriging for a particular spatial prediction method was calculated by comparing the change in RMSE when the Minimum 0.91 10.07 8.62 95.80 residual kriging was applied using Eq. 4. Maximum 1.95 21.63 19.25 213.85 Mean 1.52 16.86 15.24 169.34 RMSE −RMSE spm spm‐RK RI ¼  100% ð4Þ Standard deviation 0.22 2.4 2.67 29.66 RMSE spm Coefficient of variation (%) 14.26 17.51 where RMSE is the root mean square error of a Sampling error (%) 4.89 6.00 spm spatial prediction method, RMSE is the root mean spm ‐ RK Total confidence interval 5807.9–6405.0 57,652.7–65,018.7 square error of the spatial prediction method when a 2 Estimates obtained for an area of 900 m (corresponding to the area of each residual kriging is added to this method. pixel of the Landsat images) dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 7 of 17 Table 4 Pearson’s correlation coefficient (r) among basal area, volume, and spectral data for the Eucalyptus stands Variables 1 2 3 4 5 6789 10 11 12 13 14 15 16 1. G 1.00 2. V 0.70* 1.00 ns ns 3. TM1 − 0.24 0.10 1.00 ns ns 4. TM2 0.15 − 0.02 0.59* 1.00 ns ns 5. TM3 − 0.20 − 0.10 0.80* 0.72* 1.00 ns ns 6. TM4 0.82* 0.41* − 0.05 0.43* 0.12 1.00 ns ns 7. TM5 − 0.66* − 0.36 0.53* 0.31 0.68* − 0.40* 1.00 ns 8. TM7 − 0.68* − 0.40* 0.56* 0.32 0.66* − 0.42* 0.90* 1.00 ns 9. NDVI 0.83* 0.49* − 0.53* − 0.13 − 0.55* 0.75* − 0.78* − 0.82* 1.00 ns ns ns ns ns 10. ND53 − 0.60* − 0.32 − 0.29 − 0.50* − 0.37 − 0.66* 0.43* 0.31 − 0.31 1.00 ns ns ns 11. ND54 − 0.91* − 0.52* 0.31 − 0.09 0.30 − 0.86* 0.80* 0.78* − 0.93* 0.65* 1.00 ns ns ns ns 12. ND57 0.45* 0.27 − 0.49* − 0.28 − 0.49* 0.27 − 0.50* − 0.82* 0.60* 0.00 − 0.48* 1.00 ns ns ns 13. SAVI 0.88* 0.47* − 0.23 0.25 − 0.12 0.97* − 0.57* − 0.60* 0.89* − 0.57* − 0.94* 0.41* 1.00 ns ns ns 14. MSAVI 0.88* 0.45* − 0.36 0.13 − 0.27 0.92 − 0.65* − 0.67* 0.94* − 0.50* − 0.95* 0.46* 0.99* 1.00 ns ns ns ns 15. GEMI 0.86* 0.45* − 0.14 0.34 0.00 0.99* − 0.49* − 0.52* 0.83* − 0.62* − 0.91* 0.35 0.99* 0.96* 1.00 ns ns 16. EVI 0.87* 0.42* − 0.41* 0.12 − 0.28 0.92* − 0.64* − 0.67* 0.94* − 0.48* − 0.94* 0.47* 0.98* 1.00* 0.96* 1.00 3 −1 2 −1 V volume (m ha ), G basal area (m ha ), TM thematic mapper, ND normalised difference, NDVI normalised difference vegetation index, SAVI soil-adjusted vege- tation index, MSAVI modified soil-adjusted vegetation index, GEMI global environment monitoring index, EVI enhanced vegetation index ns Not significant at 5%; *significant at 5% Spatial prediction of basal area and volume by MLR, RF, ANN and SVM models of both basal area and volume. SVM, and ANN The TM1 band, selected by the MLR for volume estima- The spectral data examined had several significant corre- tion, also had high importance in the ANN and SVM lations with the basal area and volume data (Table 4). models of volume. However, they contributed in a reduced form to the Comparisons of measured values and estimated values regression models due to multicollinearity problems, of basal area (Fig. 3) showed that basal area was under- which resulted in final regression models with few estimated by the ANN model (Fig. 3d). The model fitted significant explanatory variables (Table 5). The basal area using the RF algorithm produced values of basal area model only included the ND54 vegetation index that were in closer agreement with measured values (Table 5), while the volume model included the TM1 (Fig. 3b). Similar results were seen for the volume band and NDVI. The coefficient of determination was models, but with a slight overestimation for the plots high for the basal area model (R = 0.81), but was much with small volumes and an underestimation of the plots lower for the volume model (R = 0.37). with high volumes. The model fitted using ANN In the case of basal area and volume predictions using algorithm did not produce estimates of volume that machine learning algorithms, the increases in RMSEs were consistent with measured values (Fig. 3h). The when the predictors were excluded one by one from the models fitted using the MLR and SVM (Fig. 3e, g) algo- SVM, ANN, and RF models are shown in Fig. 2. The rithms produced predicted values that were more closely variable ranking by relative importance differed for each related to the measured values than those from the algorithm. The ND54 index, chosen for basal area model ANN algorithm. by the MLR, also had the greatest effect on the accuracy Prediction and validation sets of basal area and volume of the RF model, both for basal area and volume. The were compared by means of Student’s t test, in order to TM2 band had the highest relative importance for the check if they provided unbiased subsets of the original Table 5 Regression model fitted for basal area and volume estimation for the Eucalyptus stands Model β β β R S S (%) 0 1 2 aj xy xy G= β + β ND54 0.78*** − 1.80*** – 0.81 0.09 5.76 0 1 V= β + β NDVI + β TM1 − 24.11* 42.69*** 241.61* 0.37 2.01 13.08 0 1 2 2 3 2 G basal area (m ), V volume (m ), β , β , and β coefficients, R adjusted coefficient of determination, S residual standard error, TM thematic mapper, ND 0 1 2 aj xy normalised difference, NDVI normalised difference vegetation index ***Significant at 1%; *significant at 10% dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 8 of 17 Fig. 2 Relative importance of the variables within each machine learning algorithm: RF, SVM, and ANN for basal area and volume 2 −1 data (Viana et al. 2012). Average basal area (17.03 m ha ) SVM had the best performance and MLR the poorest 3 −1 and volume (171.10 m ha ) obtained from the prediction performance. set did not statistically differ from average basal area 2 −1 3 −1 (16.45 m ha ) and volume (164.92 m ha ) obtained Geostatistical modelling of prediction method errors from the validation set, considering two-tailed Student’s t The semivariogram models were selected based on RAE ns test (Basal area: t = 0.629 , df = 33, p value = 0.533; volume: and SRE values close to 0 and 1, respectively (Yamamoto ns t = 0.550 ,df=33, p value = 0.585). and Landim 2013). The experimental semivariograms The evaluation of spatial prediction methods, based on constructed from the residuals of the basal area and prediction and validation sets, was done by comparing volume prediction methods had a spatial dependence the statistics presented in Eqs. 1 through 4 (Table 6). structure defined in six of the eight analysed situations The mean error (ME) should ideally be close to zero if (Fig. 4 and Table 7). The volume residuals from MLR the prediction method is unbiased, and the values of this and ANN methods had a pure nugget effect, i.e. no parameter suggested that all predictions generated im- spatial dependence structure. This result indicated a partial estimates when evaluated from both prediction random spatial distribution of the residuals in these two and validation sets. Both the MAE and RMSE showed situations. that basal area estimates were more accurate than The residuals of the spatial prediction methods that volume estimates for all spatial prediction methods. The had defined spatial dependence structures (Fig. 4) were MAE and RMSE results obtained from the validation set interpolated using ordinary kriging, and their estimates demonstrated that there were no significant differences were added to basal area and volume estimates of the among the MLR, RF, SVM, and ANN for basal area respective spatial prediction methods. The relative estimates. For the volume estimates, the models fitted by improvement (RI) of the addition of basal area residual dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 9 of 17 Table 6 Prediction methods evaluation using the prediction and validation sets for the Eucalyptus stands 2 3 Method Statistic Basal area error (m ) Volume error (m ) Prediction Validation Prediction Validation set set set set MLR ME 0.00 − 0.05 0.00 − 0.74 MAE 0.07 0.09 1.56 2.08 RMSE 0.08 0.14 1.89 2.48 RMSE 5.50 9.42 12.27 16.72 (%) RF ME 0.01 − 0.03 0.08 − 0.90 MAE 0.03 0.09 0.62 1.63 RMSE 0.04 0.14 0.73 2.21 RMSE 2.48 9.54 4.77 14.91 (%) SVM ME − 0.01 − 0.05 0.00 − 0.66 MAE 0.04 0.09 1.19 1.59 RMSE 0.06 0.14 1.60 2.02 RMSE 4.14 9.39 10.41 13.58 (%) ANN ME 0.09 0.03 0.94 0.45 MAE 0.10 0.09 1.70 1.68 RMSE 0.14 0.13 1.98 2.05 RMSE 8.87 8.52 12.88 13.82 (%) MLR multiple linear regression, RF random forest, SVM support vector machine, ANN artificial neural networks, ME mean error, MAE mean absolute error, RMSE root mean square error and volume were not within the confidence interval generated by the forest inventory. Maps showing the spatial distribution of basal area Fig. 3 Scatter plots of measured values versus estimated values by: MLR (a) and (e); RF (b) and (f); SVM (c) and (g); and ANN (d) and (h) and volume identified the same areas with high and low for basal area and volume, respectively. A 1:1 line (black, dashed) is productivity, regardless of the spatial prediction method provided for reference (Figs. 5 and 6). The maps obtained by ANN had a smaller difference between maximum and minimum kriging by the ANN method was 25%, i.e. there was a estimated values for basal area and volume, while the reduction from 8.52 to 6.37% in the RMSE (Table 8). For mapping obtained from the SVM models had a greater the RF method, the RMSE increased from 9.54 to difference between these values. MLR and RF methods 10.08%, which corresponds to a 5.7% increase in the provided similar estimates in the basal area and volume error of the basal area estimates by kriging of the resid- mapping. uals. For the volume, the addition of residual kriging The addition of residual kriging in the basal area and improved the precision of SVM estimates and reduced volume mapping (Fig. 7) resulted in a greater difference the precision of the RF estimates. between maximum and minimum estimated values in all spatial prediction methods. For ANN, residual kriging resulted in estimates that were more in agreement with Mapping of basal area and volume for Eucalyptus stands the field observations, correcting the basal area under- Basal area and volume estimates obtained by different estimation behaviour for the Eucalyptus stands under spatial prediction methods (Table 9) had average values study. However, the addition of residual kriging to the very close to each other, and were in agreement with the models fitted by RF and SVM methods did not result in forest inventory estimates (Table 3). Only the ANN significant differences in basal area and volume map- method generated underestimated values for both basal ping, and also led to increases in estimation errors in area and volume, so that the total values of basal area non-sampled areas in the field (Table 8). dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 10 of 17 Discussion Remote detection of forest canopies is complex due to the size, shape, and dielectric properties of its scatter elements (leaves, branches, and stems) (Galeana-Pizaña et al. 2014). The spatial diversity of forest canopies makes the relationship between forest parameters and remote sensing data a major challenge, although several studies have already demonstrated correlation between spectral data and forest characteristics of interest (Stojanova et al. 2010, Viana et al. 2012, Castillo-Santiago et al. 2013, Fayad et al. 2016, Gao et al. 2016). For instance, plantations comprised of different Eucalyptus species may have very similar values of basal area and volume, but have different spectral characteristics due to differences in spectral behav- iour of the species that form the canopies. Also, according to Ponzoni et al. (2015), the canopy reflectance of older Eucalyptus plantations (between 4 and 6 years) tend to contain a greater contribution from green leaves and a lower contribution from shadows, the background, and from dry branches inside the canopies than the canopy reflectance of young Eucalyptus plantations (< 4 years). Thus, the canopy reflectance of older Eucalyptus planta- tions generated highest correlations with bands of the in- frared region of the electromagnetic spectrum and, therefore, with vegetation indices that include these bands in their compositions (Ponzoni et al. 2015). These results are consistent with the best correlations found in this study among the infrared bands, vegetation indices derived from these bands, basal area, and volume. This same behaviour was observed in the studies of Gebreslasie et al. (2008), Canavesi et al. (2010), Berra et al. (2012), and Pacheco et al. (2012). Basal area was more strongly correlated with the spec- tral data because this variable is derived from only the diameter of the trees, which is directly related to size of the tree canopies, and determines the canopy reflectance Fig. 4 Experimental semivariograms of residuals from: MLR (a)and (e); (Ponzoni et al. 2012). On the other hand, volume is RF (b)and (f); SVM (c) and (g); and ANN (d) and (h)for basal area and derived from the diameter, form factor, and height of the volume, respectively trees. Height estimates are obtained from empirical equations that add errors during the volume estimation 2 2 Table 7 Nugget (τ ), sill (σ ), and range (ϕ) parameters for the selected semivariance function models for each of the variables in study 2 2 Variables Residual Selected model τ σ ϕ (m) RAE SRE Basal area MLR Exponential 0.0016 0.0067 1350 − 0.0092 1.0818 RF Spherical 0.0004 0.0009 737 − 0.0079 1.0586 SVM Gaussian 0.0017 0.0037 1577 0.0089 0.9610 ANN Exponential 0.0000 0.0119 1430 − 0.0303 1.1393 Volume MLR Exponential PNE PNE PNE PNE PNE RF Spherical 0.3316 0.2505 773 − 0.0051 1.0258 SVM Exponential 0.0000 2.5582 858 − 0.0039 0.9958 ANN Exponential PNE PNE PNE PNE PNE MLR multiple linear regression, RF random forest, SVM support vector machine, ANN artificial neural networks, RAE reduced average error, SRE standard deviation of the reduced average error, PNE pure nugget effect dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 11 of 17 Table 8 Prediction methods with addition of the residual estimation by ordinary kriging using the prediction and validation sets for the Eucalyptus stands 2 3 Method Statistic Basal area error (m ) Volume error (m ) Prediction set Validation set Prediction set Validation set MLR-RK ME 0.00 − 0.03 –– MAE 0.03 0.09 –– RMSE 0.04 0.14 –– RMSE (%) 2.80 9.30 –– RI 49.09 1.27 –– RF-RK ME 0.01 − 0.05 0.00 − 1.03 MAE 0.04 0.10 0.63 1.70 RMSE 0.05 0.15 0.77 2.26 RMSE (%) 3.08 10.08 5.02 15.25 RI − 24.19 − 5.66 − 5.24 − 2.28 SVM-RK ME 0.01 − 0.03 − 0.32 − 0.57 MAE 0.05 0.10 0.80 1.22 RMSE 0.06 0.15 1.11 1.74 RMSE (%) 4.09 9.83 7.19 11.72 RI 1.21 − 4.69 30.93 13.70 ANN-RK ME 0.02 − 0.06 –– MAE 0.04 0.06 –– RMSE 0.09 0.09 –– RMSE (%) 5.79 6.37 –– RI 34.72 25.23 –– MLR multiple linear regression, RF random forest, SVM support vector machine, ANN artificial neural networks, RK residual estimation by ordinary kriging, ME mean error, MAE mean absolute error, RMSE root mean square error, RI relative improvement process. This acts to reduce the strength of relationships regression assumptions is that no linear relationship may between volume and variables obtained from remotely exist between any independent variables or linear combina- sensed images. The ND54 index was the spectral variable tions of these (Montgomery et al. 2006). that had the strongest correlation with basal area (r = − 0.91) For the MLR method, the best volume estimation and volume (r = − 0.52). However, it was also significantly model was obtained from the TM1 band and the NDVI correlated with the other spectral variables. During multiple (Table 5), yet was only able to explain approximately linear regression analysis, the fact that two or more explana- 37% of the variation in this stand attribute. Conversely, tory variables are highly correlated may generate multicolli- the best model for basal area estimation used the ND54 nearity problems in the fitted models, since one of the index as the predictor variable and was able to explain Table 9 Statistics of basal area and volume maps estimated by spatial predictions methods MLR, RF, SVM, and ANN 2 3 Method Basal area (m ) Volume (m ) Min Max Mean Standard deviation Total estimate Min Max Mean Standard deviation Total estimate MLR 0.62 1.83 1.52 0.20 6151.9 4.51 19.99 15.30 2.30 61,739.5 MLR-RK 0.65 1.93 1.52 0.21 6141.0 –– – – – RF 0.96 1,89 1.51 0.17 6101.5 9.26 18.08 15.27 1.81 61,600.1 RF-RK 0.93 1.93 1.53 0.17 6166.6 9.02 18.37 15.36 1.91 61,965.7 SVM 0.88 2.12 1.57 0.18 6326.2 1.36 19.64 15.31 2.57 61,760.7 SVM-RK 0.76 2.10 1.56 0.19 6284.2 1.10 21.78 15.29 2.92 61,683.8 ANN 0.97 1.65 1.42 0.22 5715.3 8.32 15.68 13.93 2.70 56,223.8 ANN-RK 0.90 1.94 1.50 0.23 6070.3 –– – – – Min minimum value, Max maximum value, MLR multiple linear regression, RF random forest, SVM support vector machine, ANN artificial neural networks, RK residual estimation by ordinary kriging dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 12 of 17 Fig. 5 Spatial distribution of the basal area in Eucalyptus stands, estimated by: MLR (a), RF (b), SVM (c), and ANN (d) more than 80% of the variation in this attribute, con- species in southern KwaZulu Natal, South Africa. These firming the explanatory power of spectral data for basal authors applied a MLR using MSAVI and band 3 as pre- area estimation in Eucalyptus stands. Gebreslasie et al. dictor variables and were able to explain slightly more of the 2 2 (2010) assessed the suitability of both visible and short- variation in basal area (R = 0.67) than volume (R = 0.65). wave infrared ASTER data and vegetation indices for Although the MLR model for volume does not have a high estimating forest structural attributes of Eucalyptus coefficient of determination, the spectral data can efficiently Fig. 6 Spatial distribution of the volume in Eucalyptus stands, estimated by: MLR (a); RF (b); SVM (c); and ANN (d) dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 13 of 17 Fig. 7 Spatial distribution of the basal area in Eucalyptus stands estimated by: MLR (a): RF (b); SVM (c); and ANN (d) with addition of the residual estimation by ordinary kriging; and for volume estimated by RF (e) and SVM (f) with addition of the residual estimation by ordinary kriging explain the volumetric variations in non-sampled areas in allows them to work with all available explanatory vari- the field. In a similar study for Eucalyptus stands located in ables, without loss of information in the process of vari- the southern region of Brazil, Berra et al. (2012) concluded able selection and reduction (Görgens et al. 2015). For the that spectral data obtained from Landsat images were effi- models fitted using ANN and SVM algorithms, the TM2 cient in mapping the volume in the study area, even when band was the most important predictor variable for basal the regression models did not present high coefficients of area and volume. The linear correlation between this determination (R < 0.70). variable and basal area and volume is low to non-existent Divergence among variables that were deemed import- (r =0.15 and − 0.02, respectively). However, this band is ant between the different methods was observed with the usually applied in vegetation vigour assessment (Meng et machine learning algorithms. For basal area modelling, al. 2009), a characteristic that is indirectly related to the ND54 index and NDVI had a higher importance value volume and basal area, and which may explain the greater for RF. Statistically, these indices had high correlation contribution of the TM2 band in the ANN and SVM algo- values with the variable of interest (r = − 0.91 and 0.83, rithms, since trees that are more vigorous tend to have respectively) and high multicollinearity (r = − 0.93). The higher values of basal area and volume. ND54 index also was the variable that most contributed The models of basal area and volume developed by to the volume estimate by the RF method. The fact that the RF algorithm had smaller errors compared with the explanatory variables are correlated does not affect the those developed by other machine learning algorithms performance of these algorithms. These methods do not and MLR. The performance of this algorithm has been rely on underlying assumptions about the data, which proven in many modelling and remote sensing studies dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 14 of 17 (Lafiti et al. 2010, Rodriguez-Galiano et al. 2015, Wu et analysis and machine learning algorithms). However, in al. 2016). In the study by Shataee et al. (2012), volume some situations, hybrid methods provide less-accurate prediction models developed by RF performed better estimates in regions where the data collected in the field than those developed using k-nearest neighbour (k-NN) are sparse (Palmer et al. 2009). and SVM. Employing ASTER satellite data, the relative The high growth rate of Eucalyptus stands in Brazil RMSE obtained for all three volume models was higher reinforces the importance of robust methods that con- than for the models developed in our study: 28.54% for sider auxiliary information in the process of estimating k-NN, 25.86% for SVM, and 26.86% for RF, and only the variables of interest, such as basal area and volume. The RF algorithm produced unbiased volume estimations. methodologies presented here are powerful tools for For basal area, RF produced models with lower RMSE estimating basal area and volume from spectral data ob- (18.39%) when compared with SVM (RMSE = 19.35%) tained from Landsat 5 TM or from other multispectral and k-NN (RMSE = 20.20%); however, only k-NN was optical sensors. According to Görgens et al. (2015), ma- able to generate unbiased estimation compared with the chine learning algorithms can continuously learn from other two algorithms used. new data and keep all the accumulated knowledge of One of the positive features of RF is that it achieves previous datasets. This fact allows the implementation of satisfactory performance even with a limited number of these algorithms in other situations where only limited samples and with many independent variables (attri- amounts of data are available. The use of all auxiliary butes), as in the case of this current study. It is an variables in the estimation process is another advantage ensemble method, which combines several regression over traditional regression methods, since machine trees to generate an average estimate, in which different learning algorithms are not restricted by correlation attributes are used in each tree, making the results take between input variables, thus avoiding the loss of into account the information of all available attributes. important information in the estimation process of the Stojanova et al. (2010) also concluded that ensemble variable of interest. Nevertheless, these methods have as methods (RF) were significantly better in height and disadvantage the transparency of the resulting models, canopy cover modelling using remote sensing data than so an alternative to overcome this obstacle is the evalu- single- and multi-target regression trees. The ANN and ation of the relative importance of the explanatory vari- SVM algorithms also have proven good performance ables. Furthermore, the causal relation between inputs and robustness in several studies (e.g. Shao and Lunetta and outputs of the estimation process is not clear, which 2012, Were et al. 2015). However, the parameterisation implies a limited biological interpretation (Aertsen et al. of these methods is laborious, and they are very sensitive 2010, Özçelik et al. 2013). to the variation of input parameters, with ANN being The results from the current study do need to be more sensitive than other methods (Rodriguez-Galiano interpreted cautiously, as they are limited to a et al. 2015). This same behaviour was observed in this homogenous and relatively small study area. While this study, where the use of a restricted dataset by ANN work uses a small number of plots, it represents the resulted in estimates that were not compatible with the sampling intensity adopted by most Brazilian forestry forest inventory estimates (Tables 3 and 9). companies, i.e. one plot (usually 200–500 m in size) for The addition of residual kriging in spatial prediction each 10 ha of Eucalyptus plantation (Raimundo et al. methods did not necessarily result in relative improve- 2017, Scolforo et al. 2016) and the results from this ments in the estimation of these methods. In the case of research showcase the importance of using remotely MLR and ANN methods, residual kriging contributed to sensed data and robust prediction methods for basal better accuracy of the basal area estimates. These results area and volume estimation. The data used here were are consistent with the results of Dai et al. (2014), who also from a relatively old sensor, Landsat 5 TM, and a reported that the combination of the residual kriging study by Fassnacht et al. (2014) concluded that predictor with artificial neural networks provides an improvement data (sensor) type is the most important factor for the in the estimate accuracy of the variables of interest. The accuracy of biomass estimates and that the prediction combination of MLR with residual kriging also provided method had a substantial effect on accuracy and was improvements in estimates in the studies of Viana et al. generally more important than the sample size. (2012), Castillo-Santiago et al. (2013), and Galeana- Fassnacht et al. (2014) also suggested that choosing the Pizaña et al. (2014). For basal area and volume estima- appropriate statistical method may be more effective tion, the addition of residual kriging in the RF and SVM than obtaining additional field data for obtaining good methods resulted in a lower precision of the estimates. biomass estimates. Hybrid methods are advantageous in the ability to use Considering the cost of improving accuracy of timber spatial information (ordinary kriging of residuals) and production estimates by field measurements in Eucalyptus non-spatial information (multiple linear regression stands, it seems sensible to invest in further studies that dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 15 of 17 focus on more test sites and a wider range of sensor sys- machine; S : Residual standard error; TM: Thematic mapper; USGS: United xy States Geological Survey; V: Volume; VIF: Variance inflation factor tems (particularly RADAR and LIDAR). This would further increase our understanding of the role of the statistical Acknowledgements model set-up in remote sensing-based estimates of forest We thank CAPES - Coordenadoria de Aperfeiçoamento do Pessoal do Ensino Superior (Brazilian Federal Agency for Support and Evaluation of Graduate variables in Eucalyptus stands. Further studies could also Education) for the scholarships provided to AAR and MCC. investigate whether other prediction methods, such as non- linear regression or partial least squares regression (PLSR) Funding Not applicable approaches, alter our findings. The integration of additional predictors (e.g. topographic information or climate vari- Availability of data and materials ables) would be a further possible extension of our work. Not applicable Authors’ contributions Conclusions All authors contributed substantially to the work reported here. AAR, MCC, LRG, and ACFF analysed and interpreted the data. ARR and MCC wrote the Machine learning algorithms, particularly the random for- manuscript. JMM, ACFF, LRG, and FWAJ reviewed and edited the manuscript. est (RF) and support vector machine (SVM) algorithms, All authors read and approved the final manuscript. were able to develop models that estimate basal area and Competing interests volume in Eucalyptus stands using spectral data collected The authors declare that they have no competing interests. from Landsat 5 TM images. The artificial neural network (ANN) method did not perform well in this context, due Publisher’sNote in part to the limited data availability. Springer Nature remains neutral with regard to jurisdictional claims in Random forest was the best method of spatial prediction published maps and institutional affiliations. and mapping of basal area and volume in Eucalyptus Author details stands in Minas Gerais state. However, due to the close 1 Department of Forest Science, Federal University of Lavras – UFLA, PO Box performance to the support vector machine and multiple 3037, Lavras, Minas Gerais 37200-000, Brazil. CPCE, Federal University of Piauí, BR 135 - km 3, Bom Jesus, Piauí 64900-000, Brazil. linear regression methods, we propose that both methods should be tested and then the best result applied for Received: 3 February 2017 Accepted: 27 December 2017 spatial prediction of basal area and volume in other regions with Eucalyptus stands. The approaches used in References this study provide a framework for integrating field and Aertsen, W, Kint, V, Van Orshoven, J, Özkan, KA, Muys, B. (2010). Comparison and multispectral data, highlighting methods that greatly ranking of different modelling techniques for prediction of site index in Mediterranean mountain forests. Ecological Modelling, 221, 1119–1130. improve spatial prediction of basal area and volume esti- https://doi.org/10.1016/j.ecolmodel.2010.01.007. mation in Eucalyptus stands. Although the sensor TM of Ahmed, OS, Franklin, SE, Wulder, MA, White, JC. (2015). Characterizing stand-level Landsat satellites is no longer operational, the concepts forest canopy cover and height using Landsat time series, samples of airborne LiDAR, and the random forest algorithm. ISPRS Journal of presented in this study are expected to be consistent Photogrammetry and Remote Sensing, 101,89–101. https://doi.org/10.1016/j. regardless of the sensor. Thus, the approach used in this isprsjprs.2014.11.007. study can be more broadly applied to basal area and Alvares, CA, Stape, JL, Sentelhas, PC, Gonçalves, JLM, Sparovek, G. (2013). Köppen’s climate classification map for Brazil. Meteorologische Zeitschrift, 6, volume estimation in Eucalyptus stands using the new 711–728. https://doi.org/10.1127/0941-2948/2013/0507. optical sensors such as Landsat 8 OLI and Sentinel-2. Barrios, PG, Bidegain, MP, Gutiérrez, L. (2015). Effects of tillage intensities on The combination of spatial prediction methods with spatial soil variability and site-specific management in early growth of Eucalyptus grandis. Forest Ecology and Management, 346,41–50. https://doi. residual kriging should be used with caution, since the org/10.1016/j.foreco.2015.02.031. relative improvement of spatial prediction accuracy of Berra, EF, Brandelero, C, Pereira, RS, Sebem, E, Goergen, LCG, Benedetti, ACP, basal area and volume did not occur in all methods, and Lippert, DB. (2012). Estimativa do volume total de madeira em espécies de eucalipto a partir de imagens de satélite Landsat. Ciência Florestal, 22(4), 853– there is not always a spatial dependency structure in the 864. https://doi.org/10.5902/198050987566. residuals of a spatial prediction method. Boisvenue, C, Smiley, BP, White, JC, Kurz, WA, Wulder, MA. (2016). Integration of Landsat time series and field plots for forest productivity estimates in Abbreviations decision support models. Forest Ecology and Management, 376, 284–297. AIC: Akaike information criterion; ANN: Artificial neural networks; https://doi.org/10.1016/j.foreco.2016.06.022. EVI: Enhanced vegetation index; G: Basal area; GDP: Gross domestic product; Breiman, L. (2001). Random forests. Machine Learning, 45,5–32. https://doi.org/10. GEMI: Global environment monitoring index; GIS: Geographical information 1023/A:1010933404324. systems; GPS: Global positioning systems; MAE: Mean absolute error; Canavesi, V, Ponzoni, FJ, Valeriano, MM. (2010). Estimativa de volume de madeira ME: Mean error; MLA: Machine learning algorithms; MLR: Multiple linear em plantios de Eucalyptus spp. utilizando dados hiperespectrais e dados regression; MSAVI: Modified soil-adjusted vegetation index; ND: Normalised topográficos. Revista Árvore, 4(3), 539–549. https://doi.org/10.1590/S0100- difference; NDVI: Normalised difference vegetation index; PNE: Pure nugget 67622010000300018. effect; R : Adjusted coefficient of determination; RAE: Reduced average error; Castillo-Santiago, MA, Ghilardi, A, Oyama, K, Hernández-Stefanoni, JL, Torres, I, aj RBF: Radial basis function; RF: Random forest; RI: Relative improvement; Flamenco-Sandoval, A, Fernández, A, Mas, JF. (2013). Estimating the spatial RK: Residual estimation by ordinary kriging; RMSE: Root mean square error; distribution of woody biomass suitable for charcoal making from remote SAVI: Soil-adjusted vegetation index; SMO: Sequential minimal optimization; sensing and geostatistics in central Mexico. Energy for Sustainable SRE: Standard deviation of the reduced average error; SVM: Support vector Development, 17, 177–188. https://doi.org/10.1016/j.esd.2012.10.007. dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 16 of 17 Cluter, MEJ, Boyd, DS, Foody, GM, Vetrivel, A. (2012). Estimating tropical forest nitens (Deane & Maiden) maiden short rotation woody crops in Northwest biomass with a combination of SAR image texture and Landsat TM data: An Spain. New Forests, 46, 387–407. https://doi.org/10.1007/s11056-015-9467-7. assessment of predictions between regions. ISPRS Journal of Photogrammetry Görgens, EB, Montaghi, A, Rodriguez, LCE. (2015). A performance comparison of and Remote Sensing, 70,66–77. https://doi.org/10.1016/j.isprsjprs.2012.03.011. machine learning methods to estimate the fast-growing forest plantation Coops, NC, Johnson, M, Wulder, MA, White, JC. (2006). Assessment of QuickBird yield based on laser scanning metrics. Computers and Electronics in high spatial resolution imagery to detect red attack damage due to Agriculture, 116, 221–227. https://doi.org/10.1016/j.compag.2015.07.004. mountain pine beetle infestation. Remote Sensing of Environment, 103(1), 67– Guedes, ICL, Mello, JM, Silveira, EMO, Mello, CR, Reis, AA, Gomide, LR. (2015). 80. https://doi.org/10.1016/j.rse.2006.03.012. Spatial continuity of dendrometric characteristics in clonal cultivated Dai, F, Zhou, Q, Lv, Z, Wang, X, Liu, G. (2014). Spatial prediction of soil organic Eucalyptus sp. throughout the time. Cerne, 21(4), 527–534. https://doi.org/10. matter content integrating artificial neural network and ordinary kriging in 1590/01047760201521041824. Tibetan plateau. Ecological Indicators, 45, 184–194. https://doi.org/10.1016/j. Huang, C, Song, K, Kim, S, Townshend, JRG, Davis, P, Masek, JG, Goward, SN. ecolind.2014.04.003. (2008). Use of a dark object concept and support vector machines to Diamantopoulou, MJ. (2012). Assessing a reliable modeling approach of features automate forest cover change analysis. Remote Sensing of Environment, 112, of trees through neural network models for sustainable forests. Sustainable 970–985. https://doi.org/10.1016/j.rse.2007.07.023. Computing: Informatics and Systems, 2, 190–197. https://doi.org/10.1016/j. Huete, A, Didan, K, Miura, T, Rodriguez, EP, Gao, X, Ferreira, LG. (2002). Overview suscom.2012.10.002. of the radiometric and biophysical performance of the MODIS vegetation Dube, T, & Mutanga, O. (2015). Investigating the robustness of the new Landsat-8 indices. Remote Sensing of Environment, 83, 195–213. https://doi.org/10.1016/ operational land imager derived texture metrics in estimating plantation S0034-4257(02)00096-2. forest aboveground biomass in resource constrained areas. ISPRS Journal of Huete, AR. (1988). A soil-adjusted vegetation index (SAVI). Remote Sensing of Photogrammetry and Remote Sensing, 108,12–32. https://doi.org/10.1016/j. Environment, 25, 295–309. https://doi.org/10.1016/0034-4257(88)90106-X. isprsjprs.2015.06.002. Immitzer, M, Atzberger, C, Koukal, T. (2012). Tree species classification with Dube, T, Mutanga, O, Adam, E, Ismail, R. (2014). Intra-and-inter species biomass random forest using very high spatial resolution 8-band WorldView-2 satellite prediction in a plantation forest: Testing the utility of high spatial resolution data. Remote Sensing, 4, 2661–2693. https://doi.org/10.3390/rs4092661. Spaceborne multispectral RapidEye sensor and advanced machine learning Indústria Brasileira de Árvores (2014). Anuário estatístico da indústria brasileira de algorithms. Sensors, 14, 15348–15370. https://doi.org/10.3390/s140815348. árvores: ano base 2014. Brasília: IBA. Dye, PJ, Jacobs, S, Drew, D. (2004). Verification of 3-PG growth and water-use Indústria Brasileira de Árvores (2015). Anuário estatístico da indústria brasileira de predictions in twelve Eucalyptus plantation stands in Zululand, South Africa. árvores: ano base 2015. Brasília: IBA. Forest Ecology and Management, 193, 197–218. https://doi.org/10.1016/j. Journel, AG, & Huijbregts, CJ (1978). Mining geostatistics. London: Academic. foreco.2004.01.030. Justice, CO, Vermote, E, Townshend, JRG, Defries, R, Roy, DO, Hall, DK, Environmental Systems Research Institute (2010). ArcGIS desktop: Release 10.1. Salomonson, VV, Privette, JL, Riggs, G, Strahler, A, Lucht, W, Myneni, RB, Redlands: ESRI. Knyazikhin, Y, Running, SW, Nemani, RR, Wan, Z, Huete, AR, Leeuwen, WV, Fassnacht, FE, Hartig, F, Latifi, H, Berger, C, Hernández, J, Corvalán, P, Koch, Wolfe, RE, Giglio, L, Muller, J, Lewis, P, Barnsley, MJ. (1998). The moderate B. (2014). Importance of sample size, data type and prediction method resolution imaging spectroradiometer (MODIS): Land remote sensing for for remote sensing-based estimations of aboveground forest biomass. global change research. IEEE Transactions on Geoscience and Remote Sensing, Remote Sensing of Environment, 154, 102–114. https://doi.org/10.1016/j.rse. 36(4), 1228–1249. https://doi.org/10.1109/36.701075. 2014.07.028. Lafiti, H, Nothdurft, A, Koch, B. (2010). Non-parametric prediction and mapping of Fayad, I, Baghdadi, N, Guitet, S, Bailly, JS, Hérault, B, Gond, V, Hajj, ME, Minh, DHT. standing timber volume and biomass in a temperate forest: Application of (2016). Aboveground biomass mapping in French Guiana by combining multiple optical/LiDAR-derived predictors. Forestry, 83(4), 395–407. https://doi. remote sensing, forest inventories and environmental data. International org/10.1093/forestry/cpq022. Journal of Applied Earth Observation and Geoinformation, 52, 502–514. https:// Lopes, DM, Aranha, JT, Walford, N, O’Brien, J, Lucas, N. (2009). Accuracy of remote doi.org/10.1016/j.jag.2016.07.015. sensing data versus other sources of information for estimating net primary Frank, E, Hall, MA, Witten, I (2016). The WEKA workbench [online appendix]. In I production in Eucalyptus globulus Labill. and Pinus pinaster Ait. ecosystems in Witten, E Frank, M Hall, C Pal (Eds.), Data mining: Practical machine learning Portugal. Canadian Journal of Remote Sensing, 35(1), 37–53. https://doi.org/10. tools and techniques, (4th ed., ). Burlington: Morgan Kaufmann. 5589/m08-078. Galeana-Pizaña, JM, López-Caloca, A, López-Quiroza, P, Silván-Cárdenasa, JL, López-Sánchez, CA, García-Ramírez, P, Resl, R, José, C, Hernández-Díaz, JC, López- Couturier, S. (2014). Modeling the spatial distribution of above-ground Serrano, PM, Wehenkel, C. (2014). Modelling dasometric attributes of mixed and carbon in Mexican coniferous forests using remote sensing and a uneven-aged forests using Landsat-8 OLI spectral data in the Sierra Madre geostatistical approach. International Journal of Applied Earth Observation and Occidental, Mexico. iForest, 10,288–295. https://doi.org/10.3832/ifor1891-009. Geoinformation, 30, 179–189. https://doi.org/10.1016/j.jag.2014.02.005. López-Serrano, PM, Corral-Rivas, JJ, Díaz-Varela, RA. (2016). Evaluation of Gao, T, Zhu, J, Deng, S, Zheng, X, Zhang, J, Shang, G, Huang, L. (2016). Timber radiometric and atmospheric correction algorithms for aboveground forest production assessment of a plantation forest: An integrated framework with biomass estimation using Landsat 5 TM data. Remote Sensing, 8(5), 369. field-based inventory, multi-source remote sensing data and forest https://doi.org/10.3390/rs8050369. management history. International Journal of Applied Earth Observation and Lu, D, Mausel, P, Brondízio, E, Moran, E. (2004). Relationships between forest Geoinformation, 52, 155–165. https://doi.org/10.1016/j.jag.2016.06.004. stand parameters and Landsat TM spectral responses in the Brazilian Amazon García-Gutiérrez, J, Martínez-Álvarez, F, Troncoso, A, Riquelme, JC. (2015). A Basin. Forest Ecology and Management, 198, 149–167. https://doi.org/10.1016/ comparison of machine learning regression techniques for LiDAR-derived j.foreco.2004.03.048. estimation of forest variables. Neurocomputing, 167,24–31. https://doi.org/10. Masek, JG, Vermote, EF, Saleous, NE, Wolfe, R, Hall, FG, Huemmrich, KF, Gao, F, 1016/j.neucom.2014.09.091. Kutler, J, Lim, TK. (2006). A Landsat surface reflectance dataset for North Gebreslasie, MT, Ahmed, FB, Aardt, JAN. (2008). Estimating plot-level forest America, 1990 – 2000. IEEE Geoscience and Remote Sensing Letters, 3(1), 68–72. structural attributes using high spectral resolution ASTER satellite data in https://doi.org/10.1109/LGRS.2005.857030. even-aged Eucalyptus plantations in southern KwaZulu-Natal, South Africa. Meng, Q, Cieszewski, C, Madden, M. (2009). Large area forest inventory using Southern Forests, 70(3), 227–236. https://doi.org/10.2989/SF.2008.70.3.6.667. Landsat ETM+: A geostatistical approach. ISPRS Journal of Photogrammetry Gebreslasie, MT, Ahmed, FB, Aardt, JAN. (2010). Predicting forest structural and Remote Sensing, 64,27–36. https://doi.org/10.1016/j.isprsjprs.2008.06.006. attributes using ancillary data and ASTER satellite data. International Journal Montgomery, DC, Peck, EA, Vining, GG (2006). Introduction to linear regression of Applied Earth Observation and Geoinformation, 12S, S23–S26. https://doi. analysis. New York: Wiley. org/10.1016/j.jag.2009.11.006. Moreno, A, Neumann, M, Hasenauer, H. (2016). Optimal resolution for linking Gleason, CJ, & Im, J. (2012). Forest biomass estimation from airborne LiDAR data remotely sensed and forest inventory data in Europe. Remote Sensing of using machine learning approaches. Remote Sensing of Environment, 125,80–91. Environment, 183, 109–119. https://doi.org/10.1016/j.rse.2016.05.021. https://doi.org/10.1016/j.rse.2012.07.006. Morgenroth, J, & Visser, R. (2013). Uptake and barriers to the use of geospatial González-García, M, Hevia, A, Majada, J, Anta, RC, Barrio-Anta, M. (2015). Dynamic technologies in forest management. New Zealand Journal of Forestry Science, growth and yield model including environmental factors for Eucalyptus 43(16), 1–9. https://doi.org/10.1186/1179-5395-43-16. dos Reis et al. New Zealand Journal of Forestry Science (2018) 48:1 Page 17 of 17 Özçelik, R, Diamantopoulou, MJ, Crecente-Campo, F, Eler, U. (2013). Estimating Savannakhet, Lao PDR. Remote Sensing, 6, 5452–5479. https://doi.org/10.3390/ Crimean juniper tree height using nonlinear regression and artificial neural rs6065452. network models. Forest Ecology and Management, 306,52–60. https://doi.org/ Watt, MS, Dash, JP, Watt, P, Bhandari, S. (2016). Multi-sensor modelling of a forest 10.1016/j.foreco.2013.06.009. productivity index for radiata pine plantations. New Zealand Journal of Pacheco, LRF, Ponzoni, FJ, Santos, SB, Andrades Filho, CO, Mello, MP, Campos, RC. Forestry Science, 46, 9. https://doi.org/10.1186/s40490-016-0065-z. (2012). Structural characterization of canopies of Eucalyptus spp. using Watt, MS, Rubilar, R, Kimberley, MO, Kriticos, DJ, Emhart, V, Mardones, O, Acevedo, radiometric data from TM/Landsat 5. Cerne, 18(1), 105–116. https://doi.org/10. M, Pincheira, M, Stape, J, Fox, T. (2014). Using seasonal measurements to 1590/S0104-77602012000100013. inform ecophysiology: Extracting cardinal growth temperatures for process- based growth models of five Eucalyptus species/crosses from simple field Palmer, DJ, Höck, BK, Kimberley, MO, Watt, MS, Lowe, DJ, Payn, TW. (2009). trials. New Zealand Journal of Forestry Science, 44, 9. https://doi.org/10.1186/ Comparison of spatial prediction techniques for developing Pinus radiata s40490-014-0009-4. productivity surfaces across New Zealand. Forest Ecology and Management, Wear, DN, Dixon IV, E, Abt, RC, Singh, N. (2015). Projecting potential adoption of 258(9), 2046–2055. https://doi.org/10.1016/j.foreco.2009.07.057. genetically engineered freeze-tolerant Eucalyptus in the United States. Forest Pinty, B, & Verstraete, MM. (1992). GEMI: A non-linear index to monitor global Science, 61(3), 466–480. https://doi.org/10.5849/forsci.14-089. vegetation from satellites. Vegetatio, 101(1), 15–20. https://doi.org/10.1007/ Were, K, Bui, DT, Dick, OB, Singh, BR. (2015). A comparative assessment of support BF00031911. vector regression, artificial neural networks, and random forests for Ponzoni, FJ, Pacheco, LRF, Santos, SB, Andrades Filho, CO. (2015). Caracterização predicting and mapping soil organic carbon stocks across an Afromontane espectro-temporal de dosséis de Eucalyptus spp. mediante dados landscape. Ecological Indicators, 52, 394–403. https://doi.org/10.1016/j.ecolind. radiométricos TM/Landsat 5. Cerne, 21(2), 267–275. https://doi.org/10.1590/ 2014.12.028. Wu, C, Shen, H, Shen, A, Deng, J, Gan, M, Zhu, J, Xu, H, Wang, K. (2016). Ponzoni, FJ, Shimabukuro, YE, Kuplich, TM (2012). Sensoriamento Remoto da Comparison of machine-learning methods for above-ground biomass Vegetação, (2nd ed., ). São Paulo: Oficina de Textos. estimation based on Landsat imagery. Journal of Applied Remote Sensing, 10, Qi, J, Chehbouni, A, Huete, AR, Kerr, YH, Sorooshian, S. (1994). A modified soil 3. https://doi.org/10.1117/1.JRS.10.035010. adjusted vegetation index. Remote Sensing of Environment, 48, 119–126. Yamamoto, JK, & Landim, PMB (2013). Geoestatística: conceitos e aplicações. São https://doi.org/10.1016/0034-4257(94)90134-1. Paulo: Oficina de Textos. R Core Team (2016). R: A language and environment for statistical computing. Zhang, J, Huang, S, Hogg, EH, Lieffers, V, Qin, Y, He, F. (2014). Estimating spatial Vienna: R Foundation for Statistical Computing. variation in Alberta forest biomass from a combination of forest inventory Raimundo, MR, Scolforo, HF, Mello, JM, Scolforo, JRS, McTague, JP, Reis, AA. and remote sensing data. Biogeosciences, 11, 2793–2808. https://doi.org/10. (2017). Geostatistics applied to growth estimates in continuous forest 5194/bg-11-2793-2014. inventories. Forest Science, 63(1), 29–38. https://doi.org/10.5849/FS.2016-056. Retslaff, FAS, Figueiredo Filho, A, Dias, AN, Bernett, LG, Figura, MA. (2015). Curvas de sítio e relações hipsométricas para Eucalyptus grandis na Região dos Campos Gerais, Paraná. Cerne, 2(2), 199–207. https://doi.org/10.1590/ Ribeiro Júnior, PJ, & Diggle, PJ. (2001). GeoR: A package for geostatistical analysis. R-NEWS, 1(2), 15–18. Rodriguez-Galiano, V, Castillo, MS, Chica-Olmo, M, Chica-Rivas, M. (2015). Machine learning predictive models for mineral prospectivity: An evaluation of neural networks, random forest, regression trees and support vector machines. Ore Geology Reviews, 71, 804–818. https://doi.org/10.1016/j.oregeorev.2015.01.001. Rouse, J, Haas, R, Schell, J, Deering, D, Harlan, J (1973). Monitoring the vernal advancements and retrogradation (greenwave effect) of nature vegetation. NASA/GSFC final report. Greenbelt: NASA. Scolforo, HF, Castro Neto, F, Scolforo, JRS, Burkhart, H, McTague, JP, Raimundo, MR, Loos, RA, Fonseca, S, Sartório, RC. (2016). Modeling dominant height growth of Eucalyptus plantations with parameters conditioned to climatic variations. Forest Ecology and Management, 380, 182–195. https://doi.org/10. 1016/j.foreco.2016.09.001. Shao, Y, & Lunetta, RS. (2012). Comparison of support vector machine, neural network, and CART algorithms for the land-cover classification using limited training data points. ISPRS Journal of Photogrammetry and Remote Sensing, 70, 78–87. https://doi.org/10.1016/j.isprsjprs.2012.04.001. Shataee, S, Kalbi, S, Fallah, A, Pelz, D. (2012). Forest attribute imputation using machine-learning methods and ASTER data: Comparison of k-NN, SVR and random forest regression algorithms. International Journal of Remote Sensing, 33, 6254–6280. https://doi.org/10.1080/01431161.2012.682661. Stojanova, D, Panov, P, Gjorgjioski, V, Kobler, A, Džeroski, S. (2010). Estimating vegetation height and canopy cover from remotely sensed data with machine learning. Ecological Informatics, 5, 256–266. https://doi.org/10.1016/j. ecoinf.2010.03.004. United States Geological Survey (2017). Landsat imagery. Available online at: https://earthexplorer.usgs.gov. Accessed Jan 2017. Verma, NK, Lamb, DW, Reid, N, Wilson, B. (2014). An allometric model for estimating DBH of isolated and clustered Eucalyptus trees from measurements of crown projection area. Forest Ecology and Management, 326, 125–132. https://doi.org/10.1016/j.foreco.2014.04.003. Viana, H, Aranha, J, Lopes, D, Cohen, WB. (2012). Estimation of crown biomass of Pinus pinaster stands and shrubland above-ground biomass using forest inventory data, remotely sensed imagery and spatial prediction models. Ecological Modelling, 226,22–35. https://doi.org/10.1016/j.ecolmodel.2011.11.027. Vicharnakorn, P, Shrestha, RP, Nagai, M, Salam, AP, Kiratiprayoon, S. (2014). Carbon stock assessment using remote sensing and forest inventory data in

Journal

New Zealand Journal of Forestry ScienceSpringer Journals

Published: Dec 1, 2018

References