Evaluation of Seven Gap-Filling Techniques for Daily Station-Based Rainfall Datasets in South Ethiopia
Chinasho, Alefu; Bedadi, Bobe; Lemma, Tesfaye; Tana, Tamado; Hordofa, Tilahun; Elias, Bisrat
Hindawi, Advances in Meteorology, Volume 2021, Article ID 9657460, 15 pages
https://doi.org/10.1155/2021/9657460

Research Article

Alefu Chinasho, Bobe Bedadi, Tesfaye Lemma, Tamado Tana, Tilahun Hordofa, and Bisrat Elias

Africa Center of Excellence for Climate-SABC, Haramaya University, P.O. Box 138, Haramaya, Ethiopia
Department of Environmental Science, Wolaita Sodo University, P.O. Box 138, Wolaita Sodo, Ethiopia
Department of Crop Production, Faculty of Agriculture, University of Eswatini, P.O. M205, Luyengo, Eswatini
Ethiopia Institute of Agricultural Research, Melkasa Research Center, P.O. Box 436, Adama, Ethiopia
Faculty of Meteorology and Hydrology, Arba Minch University, P.O. Box 21, Arba Minch, Ethiopia

Correspondence should be addressed to Alefu Chinasho; email@example.com

Received 5 June 2021; Revised 21 July 2021; Accepted 13 August 2021; Published 19 August 2021

Academic Editor: Stefano Federico

Copyright © 2021 Alefu Chinasho et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Meteorological stations, mainly those located in developing countries, have large numbers of missing values in their climate datasets (rainfall and temperature). Ignoring the missing values in analyses has been used as a way to manage them. However, this leads to partial and biased results in data analyses. Instead, filling the data gaps using reference datasets is a better and widely used approach. Thus, this study was initiated to evaluate seven gap-filling techniques for daily rainfall datasets at five meteorological stations in Wolaita Zone and the surroundings in South Ethiopia.
The gap-filling techniques considered in this study were simple arithmetic mean (SAM), normal ratio method (NRM), correlation coefficient weighing (CCW), inverse distance weighting (IDW), multiple linear regression (MLR), empirical quantile mapping (EQM), and empirical quantile mapping plus (EQM+). These techniques were chosen for their computational simplicity and appreciable accuracy. Their performance was evaluated using mean absolute error (MAE), root mean square error (RMSE), skill score (SS), and Pearson's correlation coefficient (R). The results indicated that MLR outperformed the other techniques at all five meteorological stations. It showed the lowest RMSE and the highest SS and R at all stations. Four techniques (SAM, NRM, CCW, and IDW) performed similarly and ranked second at all stations, with a few exceptions across time series. EQM improved the performance of the gap-filling techniques at some stations, although not substantially. In general, MLR is suggested for filling in the missing values of daily rainfall time series. However, the second-ranked techniques could also be used, depending on the required time series (period) of each station. The techniques perform better at stations located at higher altitudes. The authors expect this paper to contribute substantially to the achievement of sustainable development goal thirteen (climate action) by providing gap-filling techniques with better accuracy.

1. Introduction

Rainfall (precipitation) is one of the key inputs in many disciplines, such as climatology (climate variability and change), meteorology (weather conditions), irrigation engineering (irrigation scheduling), hydrology (water cycle), and environmental hazard assessment (floods). Despite its importance, the rainfall datasets of meteorological stations have large numbers of missing values, mainly in developing countries [1–3]. Data gaps in rainfall time series are predominantly caused by the temporary absence of observers, equipment malfunction, data archiving, and irregular calibration of devices [4, 5]. Ignoring the missing values in analyses has been used as a way to manage them [6–8]. However, this leads to partial (coarse resolution) and biased results in data analyses [9–11]. Instead, filling the data gaps using reference datasets, such as reanalysis products or estimates from the surrounding stations, is a better and widely used approach [12–15].

Many gap-filling techniques have been evaluated and suggested in the literature for filling in missing daily rainfall time series in different parts of the world. The majority of gap-filling techniques are spatial interpolation methods. Among the spatial interpolation techniques, simple arithmetic mean (SAM) was singled out for its best performance and computational simplicity in some studies [16–18]. However, other studies [19, 20] preferred inverse distance weighting (IDW) over other spatial interpolation techniques. Superior performance was also reported for the normal ratio method (NRM) [16, 21, 22], correlation coefficient weighing (CCW), and multiple linear regression (MLR). Nevertheless, Longman et al. found no statistical differences (similar performance) between five spatial interpolation techniques (normal ratio method, linear regression, inverse distance weighting, quantile mapping, and single best estimator) for large gaps. Machine learning processes such as the artificial neural network (ANN), kernel approaches, and kriging are also suggested in some studies for filling rainfall data gaps. The best performance among these was reported for ANN [26, 27], ordinary kriging [28, 29], and kernel approaches. Besides, Grillakis et al. reported acceptable performance of empirical quantile mapping in filling discontinuous daily rainfall data on the Mediterranean island of Crete.

Combining or modifying existing gap-filling techniques is also reported to perform better than using the techniques separately. Teegavarapu et al. indicated that the linear weight optimization method (LWOM) with a single best estimator (SBE) performed better than SBE alone in Florida. Similarly, Kim and Pachepsky concluded that the regression tree (RT) combined with ANN performed better than RT or ANN alone in the Chesapeake Bay watershed of the USA. Furthermore, Khosravi et al. reported better performance of the modified geographical coordinate (GC) method than of the previously available methods at 24 station gauges in Iran. Martínez et al. also showed that the generalization of the modified normal ratio with inverse distance weighting and the generalization of the modified correlation coefficient with inverse distance weighting outperformed NRM, NRM weighted with correlation, NRM modified with IDW, CCW, modified CCW, IDW, modified correlation coefficient with IDW, IDW weighing of NRM with correlation, and IDW-modified height. Similarly, Rahman et al. indicated that the generalized linear model with gamma and Fourier series outperformed SAM, NRM, CCW, and IDW in estimating missing daily rainfall series. The Gaussian mixture model-based KNN imputation performed better than KNN alone.

Filling rainfall data gaps using reanalysis products is another widely used approach. For example, Cordeiro and Blanco indicated that the Climate Hazards Group InfraRed Precipitation with Stations (CHIRPS) product outperformed the tropical rainfall measuring mission (TRMM) and Morphing Technique (CMORPH-CPC) products in estimating daily rainfall time series in the Amazon region. Further, Tang et al. [14, 15] filled the data gaps in daily rainfall of North America (serially complete NA) and the globe (Serially Complete Earth) dataset using the global historical climatology network daily (GHCND), a global surface summary of the day (GSOD), and Environment and Climate Change Canada (ECCC). In addition, Noh and Ahn developed a new gridded rainfall dataset (K-Hydra) over the Korean peninsula to fill rainfall data gaps, which has comparable performance with the global precipitation climatology project (GPCP), climate prediction center (CPC), tropical rainfall measuring mission (TRMM), and Asian precipitation highly resolved observational data integration towards evaluation (APHRODITE).

In Ethiopia, some studies evaluated and suggested different gap-filling techniques for daily rainfall time series. For instance, Boke evaluated five spatial gap-filling techniques at ten meteorological stations in Ethiopia and suggested the nearest neighbor, inverse distance weighting average, and modified inverse distance weighting average for the country. Woldesenbet et al. also tested four gap-filling techniques at 38 stations in the upper Blue Nile basin of Ethiopia, where CCW performed better than NRM, modified NRM, and IDW. Similarly, Armanuos et al. assessed twenty-one (21) gap-filling methods at 15 stations and suggested NRM, MLR, IDW, CCW, and SAM for filling in missing rainfall data in Ethiopia. The reviewed literature indicates that the performance of gap-filling techniques varies with the stations, the evaluation criteria considered, the statistical properties of the data, and the density and geometrical organization of the station network. Yet, to the authors' best knowledge, none of the reviewed literature and no related study covered the meteorological stations located in Wolaita Zone and the surroundings. Moreover, the applicability of gap-filling methods is limited by many factors, including the required computational skill and the percentage of gaps in the data. In addition, Ethiopia is a large country covering about 1,104,300 square kilometers, so directly applying any of the suggested techniques to the entire country is not representative and can lead to biased results. Testing the gap-filling techniques at local levels is therefore very important.

Thus, this study was initiated to evaluate the performance of seven gap-filling techniques for filling in the missing values of daily rainfall data at the meteorological stations of Wolaita Zone and the surroundings in South Ethiopia. The seven selected techniques were simple arithmetic mean (SAM), normal ratio method (NRM), inverse distance weighting (IDW), correlation coefficient weighing (CCW), multiple linear regression (MLR), empirical quantile mapping (EQM), and empirical quantile mapping plus (EQM+). These techniques were chosen because of their computational simplicity, wide application, and comparable performance with other techniques. The techniques were tested against four evaluation criteria: mean absolute error (MAE), root mean square error (RMSE), skill score (SS), and Pearson's correlation coefficient. The consistency of their performance was also evaluated on different time scales.

The authors expect this paper to be of considerable use to environmentalists, engineers, climatologists, agriculturalists, and natural resource management experts facing rainfall data gaps. The techniques included in this study can be applied at any other location, and their performance can be compared with the findings of this work. Besides, the filled rainfall datasets of the five meteorological stations (Areka, Bele, Boditi, Hosana, and Shone) are freely available on request. Moreover, our findings contribute substantially to sustainable development goal (SDG) thirteen (climate action) by providing the filled and summarized rainfall data freely, so that the policymakers of the country can use it to understand climate variability and change in the study area with a reduced error level. It thus provides essential information for taking action on climate change adaptation and mitigation measures.

The rest of this paper is organized into four sections. Section 2 describes the materials and methods: the study area and data description, and the methodology for the gap-filling techniques and evaluation criteria. Section 3 presents the results of the gap-filling techniques for the missing daily precipitation data at the five meteorological stations. Section 4 discusses and interprets the results. Finally, Section 5 concludes the findings of this study.

2. Materials and Methods

2.1. Study Area and Data Description

Five meteorological stations located in two zones (Wolaita and Hadiya) of the Southern Nations, Nationalities, and Peoples' Regional State (SNNPRS) of Ethiopia were included in this study (see Figure 1). Of the five stations, two (Hosana and Shone) are located in Hadiya Zone and three (Areka, Bele, and Boditi) are located in Wolaita Zone. The five meteorological stations considered in this study are sufficient and comparable with the four stations [46, 47] and six stations of similar studies. The stations are located from 6.92 to 7.57 (latitude) and from 37.5 to 37.95 (longitude), at altitudes of 1240–2397 meters above sea level (see Table 1). The observed daily rainfall and maximum and minimum temperature data of the five stations for the period 1987–2017 were obtained from the National Meteorological Agency (NMA) of Ethiopia.

The stations have large proportions of missing values: up to 30.2% in daily rainfall, 29.4% in maximum temperature, and 19.4% in minimum temperature (see Table 1). Besides, Bele and Shone stations did not have datasets for maximum and minimum temperature in the study period, so these two stations were not considered in the analyses of maximum and minimum temperature (see Table 1). The rainfall datasets of the five stations have a bimodal pattern (two peaks in the year), even though the months in which the peaks occur vary slightly from station to station (see Figure 2, presented in bar charts). The two rainfall peaks were observed in April and August at Areka and Hosana stations, April and July at Shone, May and August at Boditi, and May and July at Bele. The study area received an annual rainfall between 1,212 mm (in Hosana) and 1,561 mm (in Shone). The maximum temperature also has a bimodal distribution pattern (see Figure 2, presented in lines). The mean monthly maximum temperature varies between 19.34 °C (in Hosana) and 29.5 °C (in Areka) (see Table 1). The minimum temperature of the area has a continuously decreasing trend from February and March to December. It ranges between 8.65 °C (in Hosana) and 15.4 °C (in Areka) (see Table 1).

Figure 1: Location map of five meteorological stations in Wolaita Zone and the surroundings.

2.2. Methodology

The methodology of this work followed these processing steps. First, the data matrices of the five stations with complete data (excluding the years with missing data) were prepared. For the gap-filling techniques other than quantile mapping and quantile mapping plus, the datasets of all five stations were considered. For empirical quantile mapping (EQM and EQM+), the datasets of three stations (one target and the two with the highest correlation coefficients) were used. The correlation between a target station and the surrounding stations is more important than the proximity (physical distance) of the stations. Then, the seven gap-filling techniques were cross-validated using four evaluation criteria, and the missing values were estimated using the best-performing technique. When there was no data from neighboring stations, the method used by Ismail and Ibrahim, which takes the mean of the same day and month in different years, was used to estimate the missing value on that particular date. The detailed methodology is described in the following paragraphs.

2.2.1. Simple Arithmetic Mean (SAM)

It estimates the missing values at the target station by simply taking the average of the surrounding stations' data for the same period as the missing value. This is the simplest technique for estimating missing values and is used when less than 10% of the data is missing. It is expressed as

$$V_o = \frac{\sum_{i=1}^{n} V_i}{n}, \tag{1}$$

where $V_o$ is the estimated value of the missing data, $V_i$ is the value of the same variable at the $i$th nearest station, and $n$ is the number of nearest weather stations considered for averaging.

2.2.2. Normal Ratio Method (NRM)

It considers the correlation coefficients between the target station and the surrounding stations. It is recommended for filling in missing values if more than 10% of the data is missing. It gives weight to the data of the surrounding stations based on their correlation with the target station. It is expressed as follows:

$$V_o = \frac{\sum_{i=1}^{n} W_i V_i}{\sum_{i=1}^{n} W_i}, \tag{2}$$

where $V_o$ is the estimated value, $W_i$ is the weight of the $i$th surrounding weather station, and $V_i$ is the value of the same variable at the $i$th station. The weight of the surrounding station is calculated using the following equation:

$$W_i = r_i^2 \, \frac{n_i - 2}{1 - r_i^2}, \tag{3}$$

where $W_i$ is the weight of the $i$th station, $r_i$ is the correlation coefficient between the target station and the $i$th surrounding station, and $n_i$ is the number of points used to calculate the correlation coefficient.
In Figure 1, SNNPRS is the Southern Nations, Nationalities, and Peoples' Regional State of Ethiopia.

Table 1: Description of five meteorological stations in Wolaita Zone and the surroundings.

| Parameter | Areka | Bele | Boditi | Hosana | Shone |
|---|---|---|---|---|---|
| Geographic information | | | | | |
| Latitude (degree) | 7.07 | 6.92 | 6.95 | 7.57 | 7.13 |
| Longitude (degree) | 37.7 | 37.53 | 37.96 | 37.85 | 37.95 |
| Altitude (meter) | 1804 | 1240 | 2043 | 2307 | 1959 |
| Rainfall (mm) | | | | | |
| Data coverage | 1988–2017 | 1987–2018 | 1987–2018 | 1987–2018 | 1987–2018 |
| Missing data (%) | 30.2 | 21.5 | 4.5 | 2.7 | 8.4 |
| Monthly total value | 1536.84 | 1250.16 | 1264.66 | 1212.34 | 1560.58 |
| Minimum value | 29.54 | 28.74 | 34.72 | 11.92 | 37.06 |
| Maximum value | 209.1 | 174.44 | 182.08 | 165.66 | 235.94 |
| Mean value | 128.07 | 104.18 | 105.4 | 101.03 | 130.05 |
| Standard deviation | 68.7 | 61.13 | 53.94 | 55.1 | 68.63 |
| Maximum temperature (°C) | | | | | |
| Data coverage | 1992–2017 | NA | 1987–2018 | 1987–2018 | NA |
| Missing data (%) | 29.4 | NA | 4.6 | 14.8 | NA |
| Minimum value | 22.2 | NA | 21 | 19.34 | NA |
| Maximum value | 29.5 | NA | 28.4 | 25.6 | NA |
| Mean value | 25.7 | NA | 25 | 22.7 | NA |
| Standard deviation | 2.5 | NA | 2.5 | 2.14 | NA |
| Minimum temperature (°C) | | | | | |
| Data coverage | 1992–2017 | NA | 1987–2018 | 1987–2018 | NA |
| Missing data (%) | 19.36 | NA | 4.7 | 12.5 | NA |
| Minimum value | 12.95 | NA | 11.99 | 8.65 | NA |
| Maximum value | 15.44 | NA | 14.29 | 12.03 | NA |
| Mean value | 14.14 | NA | 13.28 | 10.81 | NA |
| Standard deviation | 0.72 | NA | 0.75 | 0.95 | NA |

NA: data not available.

Figure 2: Mean monthly annual rainfall and temperature (maximum and minimum) patterns in Wolaita Zone and the surroundings. RF is the total rainfall (mm), Tmax is the mean maximum temperature (°C), and Tmin is the mean minimum temperature (°C).

2.2.3. Inverse Distance Weighting (IDW)
It is the most commonly used technique for estimating missing values of daily rainfall. It assumes that the closer the surrounding stations are to the target station, the better the estimate of the missing values, that is, the lower the estimation error. It is calculated using the following equation:

$$V_o = \frac{\sum_{i=1}^{n} V_i / d_i}{\sum_{i=1}^{n} 1 / d_i}, \tag{4}$$

where $V_o$ is the estimate obtained for the missing value, $V_i$ is the observed value at the $i$th station, $d_i$ is the distance to the $i$th surrounding station, and $n$ is the number of stations used. The distance between the target station and the surrounding stations is calculated using the Pythagorean formula:

$$d_i = \sqrt{(X_a - X_i)^2 + (Y_a - Y_i)^2}, \tag{5}$$

where $d_i$ is the distance between the target station and the $i$th surrounding station, $X_a$ and $X_i$ are the longitudes, and $Y_a$ and $Y_i$ are the latitudes of the target and the $i$th surrounding stations, respectively. Then, the values in degrees are multiplied by 111 to convert them to kilometers.

2.2.4. Correlation Coefficient Weighing (CCW)

In this approach, distance is replaced by Pearson's correlation coefficients. It assumes that the datasets of surrounding stations with a stronger positive correlation with the target station give better estimates of the missing values at the target station than those of less correlated stations. Thus, Pearson's correlation coefficients between the rainfall data of the five meteorological stations were analyzed, and the missing values at the target stations were determined using the following equation:

$$V_o = \frac{\sum_{i=1}^{n} r_i V_i}{\sum_{i=1}^{n} r_i}, \tag{6}$$

where $V_o$ is the missing value of the target station, $r_i$ is the correlation coefficient of the $i$th surrounding station, and $V_i$ is the value of the same variable at the $i$th surrounding station.

2.2.5. Multiple Linear Regression (MLR)

It was carried out by considering the significant linear relationship between the observed values of the target station and the surrounding stations. The dataset of the target station was treated as the dependent variable, and the surrounding stations' datasets were treated as independent variables. Accordingly, multiple linear regressions were carried out for the five stations. Then, the missing values at the target station were filled using the intercept and the coefficients of the variables, expressed as follows:

$V_o = a + \dots$

where $Q_m(t)$ is the $t$th estimated daily data at the target station, $F_o^{-1}$ is the inverse cumulative distribution function (CDF) of the available data at the target station, $Q_s(t)$ is the $t$th daily data at the neighboring station, and $F_s$ is the CDF of the daily data at the neighboring station.

2.2.7. Empirical Quantile Mapping Plus (EQM+)

In this study, we used the name "empirical quantile mapping plus (EQM+)" to refer to empirical quantile mapping applied to the outputs (values estimated by all six techniques). An earlier study obtained a better result (reduced mean absolute error) after applying quantile mapping to the outputs generated by other techniques. Thus, we applied it to evaluate its performance on the outputs obtained by the other techniques. First, a data matrix was built from the observed data, an average of the observed data, and the outputs of SAM, NRM, CCW, MLR, IDW, and EQM. Then, the data matrix was fed back into the R software for the empirical quantile mapping process. Finally, the output was subjected to cross-validation analysis against the preset criteria.

2.3. Performance Evaluation of Gap-Filling Techniques

Cross-validation was carried out by comparing the observed data with the data estimated by the different gap-filling techniques. It was used to evaluate the quality (performance) of the different gap-filling techniques based on four commonly used statistical validation (evaluation) criteria. The evaluation criteria considered are mean absolute error (MAE), root mean square error (RMSE), skill score (SS), and Pearson's correlation coefficient (R). Similar evaluation criteria were used in comparable studies [60–62]. The best-performing technique was selected based