Histogram-based weighted median filtering used for noise reduction of digital elevation model data

A new histogram-based robust filter developed for noise reduction of digital elevation model data is presented. When a large percentage of the data points in the data matrices is contaminated with outlier noise, noise reduction can give better results than traditional median filtering if elements with a potentially higher chance of being noise are eliminated from the input dataset by weighting before the median value is calculated. However, on the same matrices there are likely to be subsets of data where the unfiltered input is more reasonable for the calculation. The new method, which implements weighting between these two cases, is presented below with its initial tuning and a comparison with both standard median filtering and the Most Frequent Value (MFV) method, the latter being much more efficient than the usual methods. Following the description of the procedures, their effectiveness is compared for noise reduction in digital elevation model data systems at various noise levels. The comparison is done mainly by three measures, with most of the focus on the L1 norm data distance results. Finally, a modified version of the method, which includes Steiner's MFV filter as a core part, is also introduced and examined in a similar way. The method has been shown to be superior to conventional median filtering for most noise rates, and in many cases also to Steiner's MFV, for handling non-zero mean noise. The modified version of the method, with the help of Steiner's MFV, has also achieved this in handling zero mean noise, in the field of application described in the paper.

Keywords Noise reduction · Median filter · Digital elevation model · Weighted median

1 Introduction

Due to its robustness, the Most Frequent Value (MFV) method (Steiner 1991) can be well applied in the processing of noisy datasets in geophysical (Dobróka et al.
1991; Szabó and Balogh 2016; Szabó and Balogh 2018), hydrogeological research (Szűcs and Zákányi 2007) and various other fields (Zhang 2017). Such data systems, especially those with extreme noise, are also present in the field of spatial informatics, including digital elevation modelling and satellite transmission. A similarly widely used, but less robust and sophisticated technique is the median filtering method (Stone 1995; Huang et al. 1979).

* Roland Kilik, gfkr@uni-miskolc.hu, Department of Geophysics, University of Miskolc, Miskolc-Egyetemváros 3515, Hungary. Acta Geodaetica et Geophysica (2021) 56:743–764

In this paper, a new median filtering method, improved by histogram operations and weighted averaging, is presented and compared to the original median filter and the MFV-based procedure. The modified version of the method, also presented here, aims to eliminate zero mean noise and contains Steiner's MFV filter as a core part.

The aim of both versions of the proposed method is to eliminate scattered noise from digital elevation models in a moving-window manner (i.e. the procedure corrects the central element of the current window). The study was conducted for noise exposure at four different percentages of data points.

2 Input dataset

The analysed data consisted of three digital elevation models of different areas with 25 m spatial resolution, created using Topo to Raster interpolation in ArcGIS software by digitizing the contour lines, elevation points and water network of 1:10,000 scale EOTR map sheets. The histograms of the three datasets can be seen in Fig. 1. The mean of the first dataset is 211.99 and its standard deviation is 8.440. The second dataset has a mean of 191.31 and a standard deviation of 8.752, while the third has a mean of 96.39 and a standard deviation of 0.457.
For each of the resulting digital elevation model data systems, normally distributed noise was first added to the data matrices, with a standard deviation chosen so that the average noise amplitude was around 1% of the mean of the data matrix. After this, additional noise was added randomly to 10, 15, 20 and 25 percent of the points as outlier (non-zero mean impulse) noise. To achieve this, a normally distributed noise vector was generated for every row, with its mean equal to the mean of the given data row and its standard deviation chosen so that the average noise amplitude was around 100% of the mean of the data row.

Then the elements of the noise vector generated for a given row of data were randomly scattered across the row with a multiplier between 0.1 and 0.7, giving an additional ~10–70% noise to the data in the different test cases (referred to as noise amplitude 0.1–0.7 below).

3 Introduction of the weighted median (WM) method

The method produces the corrected value of the central element at each position of the moving window going through the image matrices, using a weighted mean. The following version of the presented method is mainly for eliminating non-zero mean noise (due to, for example, a measurement device problem or long-distance data transfer).

The weighted mean is calculated for each data point with two weights ($w_1$, $w_2$) defined below. To achieve this, two independent window-narrowing processes take place as an initial step, before the weights are calculated. These narrowed windows are created from the current 5 × 5 data window at every window position. As a test, window sizes of 9 (3 × 3) and 49 (7 × 7) elements were also applied, but these did not prove to be optimal for solving the problem.

The process of the first window narrowing is the following.
The range of the values of the elements in the moving window is divided into two and into three bins with equal range widths, and then two ratios are generated:

• $\lambda_1$: the element count ratio between the larger and the smaller element count domain out of the two,
• $\lambda_2$: the ratio of the largest and second largest domains out of the three (regarding the element count again).

If $\lambda_1 > \lambda_2$, the new set ($D$) is defined as the domain with the most elements of the 2 domains, otherwise as the domain with the most elements of the 3 domains. Then the value of $m_s$ is defined as $m_s = \mathrm{median}(D)$. Thus, a higher $\lambda$ value is used as an indication of a sharper cut. The numbers 2 and 3 were chosen for the ranges because, when splitting into 4 (or more), a bin might not contain a sufficient number of elements from the initial 5 × 5 window for the further steps described.

Fig. 1 Histograms of input data matrices

Then another (independent) window-narrowing process takes place, for determining the value of parameter $m_{e1}$. To achieve this, the elements of the original moving window are sorted by value, then divided into two and into three equal-width ranges (based on the set of values). For example, in the case of splitting into three, if the ordered vector is $v$, $\max(v)$ is its highest valued element and $\min(v)$ is the lowest, then the third with the lowest values contains the values lower than $\min(v) + (\max(v) - \min(v))/3$.

Here we calculate the ratio $\lambda_3$ from the case of splitting into two. The value of $\lambda_3$ is $1/n$, where $n$ is the sum of the element counts of the two bins without the highest valued bin (i.e. the element count of the bin with the lower values). Then we calculate the ratio $\lambda_4$ from the case of splitting into three bins. The value of this ratio is $1/m$, where $m$ is the sum of the element counts of the three bins without the highest valued bin.
If $\lambda_3 > \lambda_4$, we take the highest valued bin from the two (the bin with the highest values); otherwise we take the highest valued bin from the three as the chosen set ($E$). Finally, $m_{e1}$ is the average of the elements of the chosen set.

A similar value, $m_{e2}$, is determined with the same method, but by splitting the original window into 3 and 5 parts (instead of 2 and 3). With this step, a higher division number is reflected in the result if the current window's value set allows it (i.e. if the element count of the new intervals is not zero).

As $m_{e1}$ and $m_{e2}$ are both calculated on chosen subsets of the largest values from the value set of the window, both are related to maximums. An important difference between the $m_e$ values and $m_s$ is that for $m_{e1}$ and $m_{e2}$ the mean is calculated, while for $m_s$ the median is calculated, with different bin numbers. In addition, for $m_{e1}$ and $m_{e2}$ the narrowed set has the largest valued elements, while for $m_s$ the set contains the largest number of elements (which may not necessarily include the largest values).

Now we can calculate the first weight ($w_1$) of the current point's weight vector as

$$w_1 = (m_s / m_{e1}) \cdot \alpha, \tag{1}$$

where $\alpha$ is a scaling factor that ensures that the value obtained for $w_1$ falls within the same range of values as $w_2$, described below. Its value ($\alpha = 1/3$) was determined experimentally to fulfil this purpose.

We can also define $m_w$ as the median of the values in the original moving window. Using the values described above, three sub-weights are produced as follows (all will have a role in determining the value of the $w_2$ weight):

$$w_a = \frac{1}{\max_1}\,\lvert m_{e1} - \max_1 \rvert + 1, \quad \text{where } \max_1 = \max(m_w, m_{e1}), \tag{2}$$

$$m_{as} = m_{e1} - m_{e2}, \tag{3}$$

$$w_p = \frac{\beta}{\max_2}\,(m_{as} - \max_2) + 1, \quad \text{where } \max_2 = \max(m_w, m_{e1}, m_{e2}), \tag{4}$$

$$w_{p2} = \frac{\gamma}{\max_3}\,(m_{as} - \max_3) + 1, \quad \text{where } \max_3 = \max(m_w, \mathrm{mean}(\mathrm{window})). \tag{5}$$
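As an illustration of the two narrowing steps and Eq. 1, the following Python sketch computes $m_s$, $m_{e1}$ and $w_1$ for one flattened 5 × 5 window. It is a minimal sketch and not the author's implementation: the function names, the handling of ties and of a completely flat window, and the sample window itself are assumptions made for the example.

```python
from statistics import mean, median

ALPHA = 1 / 3  # scaling factor of Eq. 1


def split_into_bins(values, n_bins):
    """Distribute values into n_bins equal-width ranges spanning [min, max]."""
    lo, hi = min(values), max(values)
    bins = [[] for _ in range(n_bins)]
    if hi == lo:  # flat window: keep everything in the top bin (assumption)
        bins[-1] = list(values)
        return bins
    width = (hi - lo) / n_bins
    for v in values:
        index = min(int((v - lo) / width), n_bins - 1)  # the maximum goes to the last bin
        bins[index].append(v)
    return bins


def count_ratio(bins):
    """Element-count ratio of the largest and the second largest bin."""
    counts = sorted((len(b) for b in bins), reverse=True)
    return float("inf") if counts[1] == 0 else counts[0] / counts[1]


def narrowed_median(window):
    """m_s: median of the most populated bin of the 2-bin or 3-bin split."""
    two, three = split_into_bins(window, 2), split_into_bins(window, 3)
    chosen = two if count_ratio(two) > count_ratio(three) else three
    return median(max(chosen, key=len))


def top_bin_mean(window, n_small=2, n_large=3):
    """m_e1 for (2, 3) bins, m_e2 for (3, 5) bins: mean of the chosen top bin."""
    small = split_into_bins(window, n_small)
    large = split_into_bins(window, n_large)
    n = sum(len(b) for b in small[:-1])  # elements outside the highest valued bin
    m = sum(len(b) for b in large[:-1])
    lam3 = 1 / n if n else float("inf")
    lam4 = 1 / m if m else float("inf")
    return mean(small[-1] if lam3 > lam4 else large[-1])


# Hypothetical window: 22 elevations near 100-104 and three outliers
window = [100, 101, 102, 103, 104] * 4 + [100, 102, 200, 205, 210]
m_s = narrowed_median(window)
m_e1 = top_bin_mean(window)
w_1 = (m_s / m_e1) * ALPHA
print(m_s, m_e1, round(w_1, 4))
```

In this example the outliers end up in the top bin, so $m_{e1}$ is large and $w_1$ becomes small, which lowers the share of the narrowed median in the final result.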
The calculation of the max value applied in a given sub-weight ($w_a$, $w_p$, $w_{p2}$) shall in all cases include the median of the window, as well as the mean ($m_{e1}$, $m_{e2}$) or median ($m_s$) value of the narrowed window.

In Eqs. 4 and 5, fewer elements are omitted from the original window (because $m_{as}$ contains $m_{e2}$), so the sub-weights $w_p$ and $w_{p2}$ are both calculated with a smaller multiplier than in Eq. 2. The values of the constants $\beta$ and $\gamma$ are chosen as 0.5 (the results of the adjustment procedure can be seen in Sect. 6).

Both the $w_p$ and $w_{p2}$ weights have a corrective role. Their value can be high if there is a large difference between the averages of the subsets obtained by splitting the window into 3 and 5 ($m_{e1}$ and $m_{e2}$). As can be seen in Fig. 2, a large difference between the values of $m_{e1}$ and $m_{e2}$ results in a large L1 norm error. Thus, a large difference indicates that the histogram operations at the current window position may distort the result, so an increasing difference increases the weight $w_2$ (i.e. the weight of the conventional median without histogram operations).

Fig. 2 L1 norm and $m_{as}$ value relation

The sub-weight $w_a$ can take a high value other than 1 if the median value of the original, unconstrained window ($m_w$) is greater than the average of the elements of the constrained window ($m_{e1}$). Because the narrowed window contains the largest values of the subsets, an outstanding difference between the median of the original window and the mean of the elements of this narrowed window indicates that the histogram operations at the current window position are distorting the result. As in the previous case, a large positive difference between $m_{e1}$ and $\max_1$ results in a large L1 norm error.

Therefore, this should be reflected in the final weight vector in the form of either a reduction in the value of $w_1$ or an increase in the value of $w_2$ (i.e.
an increase in the weight of the result of the traditional median method). The latter is achieved by using $w_a$ in the weight $w_2$. Since $w_a$ is the most important of the three correction factors ($w_a$, $w_b$, $w_c$), its square is included in the formula of $w_2$. The higher effect was achieved by squaring the weight because the maximum value of the weight before adding one to its value is 1, so squaring the weight will not result in an extreme weight value, even at its maximum.

The +1 in the formulas for the partial weights $w_p$ and $w_{p2}$ is included because they both include a maximum value subtraction, which in most cases results in a negative value, so the constant provides a shift into the positive range. In the formula of $w_a$, the role of adding 1 is to shift its minimum to one or above (in order to be able to increase the $w_2$ weight through its squared value).

Finally, the following two weights ($w_b$ and $w_c$) are produced using $w_p$ and $w_{p2}$ respectively:

$$w_b = 1 + [\ldots], \tag{6}$$

where $m_w$ is the median of the values in the original moving window. Since the maximum value of the partial weight $w_{p2}$ is not a function of the different narrowed windows, but of the median or average value of the original window, this partial weight is taken with a smaller constant:

$$w_c = 0.5 - [\ldots]. \tag{7}$$

With the components defined above, the weight $w_2$ takes the following form:

$$w_2 = w_a^2 \cdot w_b \cdot w_c. \tag{8}$$

At this point, we know the weight vector of the current data point:

$$\mathbf{w} = [\,w_1 \;\; w_2\,]. \tag{9}$$

In the weight vector $\mathbf{w}$ (Eq. 9), the weights $w_1$ and $w_2$ act on the median of the reduced set of the current data window ($m_s$) and on the median of the full window ($m_w$) respectively, as follows (for example, on the k-th element of the data matrix):

$$res_{WM,k} = (w_1 \cdot m_s + w_2 \cdot m_w) / (w_1 + w_2). \tag{10}$$
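The final combination step can be sketched in a few lines of Python. The sketch assumes that $w_2$ is the product of the squared $w_a$ with $w_b$ and $w_c$, as the text above describes, and it takes the sub-weights as given inputs; the function name and the numbers are hypothetical and only serve to show the arithmetic of Eq. 10.

```python
def weighted_median_result(m_s, m_w, w_1, w_a, w_b, w_c):
    """Combine the narrowed-window median and the plain median (Eqs. 8 and 10)."""
    w_2 = w_a ** 2 * w_b * w_c  # Eq. 8: squaring w_a raises its influence
    return (w_1 * m_s + w_2 * m_w) / (w_1 + w_2)  # Eq. 10: weighted mean of the two medians


# Hypothetical values for one window position
result = weighted_median_result(m_s=102.0, m_w=110.0, w_1=0.25, w_a=1.0, w_b=1.0, w_c=0.75)
print(result)
```

With these inputs $w_2 = 0.75$, so the result lies three quarters of the way from $m_s$ towards $m_w$.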
As described above, the median of a narrowed window ($m_s$) and the median of the original window ($m_w$) are weighted at every window position to give the final result for the current point. The weight of $m_s$ is $(m_s/m_{e1}) \cdot \alpha$ (i.e. the median is divided by the average of the maximal values of the narrowed window). If this ratio is low for the given moving window position due to noise (a high $m_{e1}$ average of maximums), then the weight of $m_s$ should be proportionally low, because otherwise the high value of the outlier maximums would have a negative effect on the final result. In such cases the weight of $m_w$ will be proportionally high, not only because of the low weight of $m_s$, but also because $m_w$ is weighted by $w_a$, $w_b$ and $w_c$, all of which contain the $m_{e1}$ or $m_{e2}$ values.

4 Most Frequent Value Method

A much more reliable statistical characteristic than the arithmetic mean, the weighted mean, is obtained by assigning a small weight ($w_k$) to points ($X_k$) far from the majority of the data and a larger weight to points in the highest data density location (Eq. 11):

$$M = \left(\sum_{k=1}^{N} X_k w_k\right) \left(\sum_{k=1}^{N} w_k\right)^{-1} \quad (k = 1, 2, \ldots, N). \tag{11}$$

The k-th weight is chosen by Steiner (Steiner 1991) as follows:

$$w_k = \frac{\varepsilon^2}{\varepsilon^2 + (X_k - M)^2}. \tag{12}$$

In the above, $N$ is the number of data and $\varepsilon$ is the dihesion, a scalar parameter. If $\varepsilon$ is large, then all data are given nearly equal weights and outliers will spoil the value estimate, and if $\varepsilon$ is too small, care must be taken to avoid ignoring some data.

The weighted mean, called the most frequent value ($M$) and defined by Eq. (11), should be known in advance in order to assign weights with maximum values at its location and smaller and smaller weights away from it. Therefore, this procedure requires an iterative algorithm in which $M$ and $\varepsilon$ are determined jointly. In the first iteration step, the dihesion can be estimated from the sample space using the following formula:

$$\varepsilon_1 = \max X_k - \min X_k, \tag{13}$$

while for $M$ the initial value is preferably chosen as the sample mean or median. In this study, the median value was used. In subsequent iteration steps, $M$ and $\varepsilon$ can be derived from each other according to the following procedure:

$$\varepsilon_{j+1}^2 = 3\,\frac{\sum_{k=1}^{N} \dfrac{(X_k - M_j)^2}{\left(\varepsilon_j^2 + (X_k - M_j)^2\right)^2}}{\sum_{k=1}^{N} \dfrac{1}{\left(\varepsilon_j^2 + (X_k - M_j)^2\right)^2}} \;\leftrightarrow\; M_{j+1} = \frac{\sum_{k=1}^{N} \dfrac{\varepsilon_{j+1}^2}{\varepsilon_{j+1}^2 + (X_k - M_j)^2}\,X_k}{\sum_{k=1}^{N} \dfrac{\varepsilon_{j+1}^2}{\varepsilon_{j+1}^2 + (X_k - M_j)^2}}. \tag{14}$$

Table 1 Example of adjusting the constant value used for calculating $w_p$

| Constant value | 2 | 1.5 | 1 | 0.5 | 0.25 | 0.125 | 0.0625 | 0.03125 |
|---|---|---|---|---|---|---|---|---|
| L1 norm distance | 843.12 | 822.32 | 801.63 | 762.82 | 794.51 | 956.47 | 1720.47 | 2393.27 |

5 Description of the resulting numerical values

The following metrics are used in the paper to compare the results of the different filtering methods.

RMSE (Root Mean Square Error) values for the MFV method, median filtering and the weighted median method (where $inp$ is the noise-free data matrix):

$$RMSE_{St} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (res_{St,i} - inp_i)^2}, \tag{15}$$

where $res_{St}$ is the matrix corrected by Steiner's MFV,

$$RMSE_{Med} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (res_{Med,i} - inp_i)^2}, \tag{16}$$

where $res_{Med}$ is the matrix corrected by the median method,

$$RMSE_{WM} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (res_{WM,i} - inp_i)^2}, \tag{17}$$

where $res_{WM}$ is the matrix corrected by the weighted median.

Standard deviation for the three procedures:

$$Std_{St} = \mathrm{std}(res_{St} - inp), \tag{18}$$

$$Std_{Med} = \mathrm{std}(res_{Med} - inp), \tag{19}$$

$$Std_{WM} = \mathrm{std}(res_{WM} - inp), \tag{20}$$

where $res_{St}$, $res_{Med}$, $res_{WM}$ and $inp$ are as above.

L1 norm:

$$L_{1St} = \lVert res_{St} - inp \rVert_1, \tag{21}$$

$$L_{1Med} = \lVert res_{Med} - inp \rVert_1, \tag{22}$$

$$L_{1WM} = \lVert res_{WM} - inp \rVert_1, \tag{23}$$

where $res_{St}$, $res_{Med}$, $res_{WM}$ and $inp$ are as above.
6 Adjusting constants

Regarding the 0.5 constant value of $w_p$, it was also tested in some randomly chosen test cases how the L1 norm distance between the noise-free matrix and the weighted median-corrected matrix changes with different values of the constant. In these cases the norm decreased as the constant was lowered towards 0.5 and increased again below it, as the example in Table 1 shows (10% noise rate, 0.3 noise amplitude).

Similarly, the 0.5 constant of $w_{p2}$ was also tested, and it produced similar results, as can be seen in the example of Table 2 (for the same parameters as in the previous case).

All of the constants were examined with a few chosen values; however, their global optimisation is not part of this paper.

7 Comparative results

In all cases, a 5 × 5 window ran through the matrices, and both the Steiner MFV method (used as a filter, Dobróka 2021) and the weighted median method described here always corrected the value of the window's central element, with all elements of the window as input. In all cases, the number of iterations for the Steiner filter was 20 and the initial value was the median of the window elements.

In all cases the results were also compared with those obtained using the classical median filter in MATLAB software. The median filter ran on the same noisy data matrices with the same window size as the Steiner method and the weighted median method.

Table 3 shows an example of the values of the result metrics on the first data series, with noise on 15% of the data points and a noise amplitude multiplier of 0.3.

Another example can be seen in Fig. 3, again with noise on 15% of the data points, now with a noise amplitude multiplier of 0.5. In Fig. 3, (a) shows the original data, (b) the noisy data, (c) the result of the Steiner method, (d) the result of the weighted median method and (e) the result of the classical median method.
Table 4 shows the L1 norm values, i.e. the distances from the noise-free input data matrix, and their ratios for the weighted median method ($L_{1WM}$) and the Steiner method ($L_{1St}$), with 25% of the points contaminated with noise, as a function of the noise amplitude (0.1, …, 0.7) on the first data set. The values show that in two cases the Steiner method gives better results, by about 6%, and in the other cases the weighted median method proved to be better. The latter gives better results by 6.3% on average (since the average of the $L_{1WM}/L_{1St}$ ratios is 0.937).

Table 2 Example of adjusting the constant used for calculating $w_{p2}$

| Constant value | 2 | 1.5 | 1 | 0.5 | 0.25 | 0.125 | 0.0625 | 0.03125 |
|---|---|---|---|---|---|---|---|---|
| L1 norm distance | 735.13 | 721.78 | 698.67 | 654.31 | 668.807 | 668.94 | 676.11 | 685.71 |

Table 3 Example of comparative results (10% additional outlier noise on 15% of the data points)

| Metric | Value |
|---|---|
| $L_{1WM}$ | 538.98 |
| $L_{1St}$ | 630.41 |
| $L_{1Med}$ | 711.19 |
| $RMSE_{WM}$ | 1.42 |
| $RMSE_{St}$ | 2.02 |
| $RMSE_{Med}$ | 2.13 |
| $Std_{WM}$ | 1.19 |
| $Std_{St}$ | 1.75 |
| $Std_{Med}$ | 1.58 |

Table 5 shows the results of the weighted median procedure and the standard median filtering at the same noise level. In this case, the weighted median procedure proved to be better on the data set by 26.4% on average.

Tables 6 and 7 show the same comparison as before, regarding the L1 norm for noise affecting 20% of the points. In this case, there are noise amplitude values where the Steiner method gives a smaller distance to the noise-free matrix than the weighted median procedure, but the standard median procedure could not achieve this in any of the cases. The weighted method is on average 23% better than the latter. Tables 8 and 9 show the case with 15% noisy points.

The results of Table 8, comparing the Steiner method with the weighted median procedure, show that for a noise amplitude multiplier of 0.1 the Steiner method is superior.
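The average ratio quoted for the 25% noise case can be re-derived from the L1 norm values reported in Table 4. The short Python sketch below does this arithmetic; the variable names are chosen for the example only, and the numbers are copied from the table.

```python
# L1 norm values from Table 4 (25% noisy points, amplitude multipliers 0.1 ... 0.7)
l1_wm = [540.22, 425.44, 420.39, 471.26, 615.92, 787.36, 515.82]
l1_st = [597.90, 457.46, 536.20, 541.64, 580.61, 738.81, 545.17]

ratios = [wm / st for wm, st in zip(l1_wm, l1_st)]
avg_ratio = sum(ratios) / len(ratios)

# A ratio above 1 means the Steiner (MFV) result is closer to the noise-free matrix
steiner_better = sum(1 for r in ratios if r > 1)

print(round(avg_ratio, 3), steiner_better)
```

The unrounded ratios average to about 0.937, which corresponds to the 6.3% average advantage of the weighted median method, and two of the seven amplitudes favour the Steiner method.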
The weighted median procedure outperforms the unweighted median procedure by an average of 21.2% on the first data set for 15% noisy points (Table 9). Tables 10 and 11 show the data distances of the three procedures, according to the L1 norm, for the case where 10% of the data is contaminated by noise.

In this case (with noise on 10% of the points), the Steiner method outperforms the weighted median method at most of the noise amplitudes: in 4 cases, by an average of 5.6%. In the remaining three cases the WM method performs better on the data set, by 13%.

Table 12 shows the RMSE values obtained by the three procedures and their ratios, for a noise exposure of 25% and 20%, for different noise amplitude multipliers (0.1, …, 0.7) in both cases.

Table 13 shows the RMSE values as before, now for noise at 15% and 10% of the data points. It can be seen that the weighted median method performs worse for the highest noise amplitude multiplier (0.7), but this is also true for the other methods, so the ratio to the other methods does not deteriorate. The RMSE obtained by the MFV method is closest to that obtained by the weighted median method for the smallest noise amplitude multiplier (0.1). The RMSE value for the weighted median method is on average 79.5% of that obtained by the conventional median method.

As the weighted median procedure was found to be the least efficient in the above tests when 10% of the data points were contaminated with noise, the standard deviations were also examined in this case. An example of this, compared to the standard median method, is shown in Table 14.

Table 15 shows the average of the results for the first and second data sets. The minimum, maximum and average of the data distance ratios for the L1 norm as a function of the different noise levels can be seen for the Steiner method and the weighted median method. The mean of the minima (i.e. the mean of the cases where the largest difference is in favour of the weighted median method) is 0.82, i.e. in these cases the method is 18% better. The average of the maxima is 1.032, i.e. in the opposite case the Steiner method is on average 3.2% better for the two data systems combined.

Fig. 3 Visual example of results on the first dataset

Table 4 L1 norm values at 25% noise ratio

| Noise amp. mult | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
|---|---|---|---|---|---|---|---|
| $L_{1WM}$ | 540.22 | 425.44 | 420.39 | 471.26 | 615.92 | 787.36 | 515.82 |
| $L_{1St}$ | 597.90 | 457.46 | 536.20 | 541.64 | 580.61 | 738.81 | 545.17 |
| $L_{1WM}/L_{1St}$ | 0.90 | 0.92 | 0.784 | 0.87 | 1.06 | 1.06 | 0.94 |

Table 5 L1 norm values at 25% noise ratio

| Noise amp. mult | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
|---|---|---|---|---|---|---|---|
| $L_{1WM}$ | 540.22 | 425.44 | 420.39 | 471.26 | 615.92 | 787.36 | 515.82 |
| $L_{1Med}$ | 604.39 | 582.33 | 725.92 | 692.10 | 827.83 | 970.98 | 723.63 |
| $L_{1WM}/L_{1Med}$ | 0.89 | 0.73 | 0.57 | 0.68 | 0.74 | 0.81 | 0.71 |

Table 6 L1 norm values at 20% noise ratio

| Noise amp. mult | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
|---|---|---|---|---|---|---|---|
| $L_{1WM}$ | 566.65 | 437.97 | 440.14 | 485.61 | 708.77 | 646.18 | 759.37 |
| $L_{1St}$ | 545.34 | 468.96 | 469.76 | 608.38 | 1002.99 | 866.07 | 841.60 |
| $L_{1WM}/L_{1St}$ | 1.03 | 0.93 | 0.93 | 0.79 | 0.70 | 0.74 | 0.90 |

Table 7 L1 norm values at 20% noise ratio

| Noise amp. mult | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
|---|---|---|---|---|---|---|---|
| $L_{1WM}$ | 566.65 | 437.97 | 440.14 | 485.61 | 708.77 | 646.18 | 759.37 |
| $L_{1Med}$ | 612.87 | 627.87 | 612.76 | 741.06 | 901.39 | 792.62 | 979.72 |
| $L_{1WM}/L_{1Med}$ | 0.92 | 0.69 | 0.71 | 0.65 | 0.78 | 0.81 | 0.77 |

Table 8 L1 norm values at 15% noise ratio

| Noise amp. mult | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
|---|---|---|---|---|---|---|---|
| $L_{1WM}$ | 542.97 | 433.71 | 538.98 | 464.25 | 493.81 | 675.11 | 624.08 |
| $L_{1St}$ | 519.85 | 473.9043 | 630.41 | 490.20 | 561.66 | 732.78 | 644.35 |
| $L_{1WM}/L_{1St}$ | 1.04 | 0.91 | 0.85 | 0.94 | 0.87 | 0.92 | 0.96 |
Table 9 L1 norm values at 15% noise ratio

| Noise amp. mult | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
|---|---|---|---|---|---|---|---|
| $L_{1WM}$ | 542.97 | 433.71 | 538.99 | 464.25 | 493.81 | 675.11 | 624.08 |
| $L_{1Med}$ | 604.28 | 614.34 | 711.192 | 639.07 | 672.32 | 846.64 | 695.19 |
| $L_{1WM}/L_{1Med}$ | 0.89 | 0.7 | 0.75 | 0.72 | 0.73 | 0.79 | 0.89 |

Table 10 L1 norm values at 10% noise ratio

| Noise amp. mult | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
|---|---|---|---|---|---|---|---|
| $L_{1WM}$ | 521.53 | 483.07 | 543.86 | 484.36 | 779.51 | 727.48 | 724.2 |
| $L_{1St}$ | 507.13 | 482.52 | 542.96 | 527.25 | 652.87 | 939.2 | 790.58 |
| $L_{1WM}/L_{1St}$ | 1.02 | 1 | 1 | 0.91 | 1.19 | 0.77 | 0.91 |

Table 11 L1 norm values at 10% noise ratio

| Noise amp. mult | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
|---|---|---|---|---|---|---|---|
| $L_{1WM}$ | 521.53 | 483.07 | 543.86 | 484.36 | 779.51 | 727.48 | 724.2 |
| $L_{1Med}$ | 605.38 | 637.25 | 746.44 | 645.17 | 871.46 | 1038.26 | 865.18 |
| $L_{1WM}/L_{1Med}$ | 0.86 | 0.75 | 0.72 | 0.75 | 0.89 | 0.7 | 0.83 |

Table 16 shows the L1 norm results grouped in the same way, in this case for the two median procedures. The average of the minima is 0.692, so in the cases where the weighted median procedure is at its best, this method is better by more than 30%. The average of the maxima is 0.86, so even in the worst cases the weighted median procedure is on average 14% better than the standard median filtering for the two data sets combined.

A less detailed comparison was carried out on the third set of data (examining only the L1 norm ratios). This study showed similar characteristics to the previous ones; however, the Steiner method was found to be the best of the three in more cases than before. For 10% and 15% noisy data points (both with 7 different noise amplitudes, as before), the Steiner method was found to be superior to the weighted median method in 11 out of 14 cases, by an average of 14.6% (regarding L1 norm ratios). In the 20% and 25% noisy point cases, the weighted median method gave a better result in 14 out of 14 cases, by an average of 19.04%.
Comparing the two median methods on the data set (again with L1 norm ratios), the weighted median method proved to be better in 27 out of 28 cases, by an average of 14.32%.

Table 12 RMSE values at 25% and 20% noise ratio

25% noise ratio

| Noise amp. mult | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
|---|---|---|---|---|---|---|---|
| $RMSE_{WM}$ | 1.54 | 1.47 | 1.55 | 2.04 | 2.50 | 2.73 | 2.25 |
| $RMSE_{Med}$ | 1.79 | 1.95 | 2.16 | 2.50 | 2.84 | 3.07 | 2.64 |
| $RMSE_{St}$ | 1.60 | 1.74 | 2.07 | 2.73 | 3.29 | 3.79 | 3.76 |
| $RMSE_{WM}/RMSE_{Med}$ | 0.86 | 0.75 | 0.72 | 0.82 | 0.88 | 0.89 | 0.85 |
| $RMSE_{WM}/RMSE_{St}$ | 0.96 | 0.84 | 0.75 | 0.75 | 0.76 | 0.72 | 0.60 |

20% noise ratio

| Noise amp. mult | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
|---|---|---|---|---|---|---|---|
| $RMSE_{WM}$ | 1.58 | 1.51 | 1.71 | 1.63 | 2.14 | 2.52 | 2.74 |
| $RMSE_{Med}$ | 1.84 | 2.03 | 2.30 | 2.17 | 2.58 | 2.95 | 3.10 |
| $RMSE_{St}$ | 1.63 | 1.82 | 2.21 | 2.50 | 3.09 | 3.61 | 4.24 |
| $RMSE_{WM}/RMSE_{Med}$ | 0.86 | 0.75 | 0.74 | 0.75 | 0.83 | 0.85 | 0.88 |
| $RMSE_{WM}/RMSE_{St}$ | 0.97 | 0.83 | 0.77 | 0.65 | 0.69 | 0.70 | 0.65 |

Table 13 RMSE values at 15% and 10% noise ratio

15% noise ratio

| Noise amp. mult | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
|---|---|---|---|---|---|---|---|
| $RMSE_{WM}$ | 1.59 | 1.55 | 1.42 | 1.70 | 1.83 | 2.34 | 2.55 |
| $RMSE_{Med}$ | 1.83 | 2.04 | 2.13 | 2.22 | 2.36 | 2.74 | 2.92 |
| $RMSE_{St}$ | 1.62 | 1.83 | 2.02 | 2.55 | 2.97 | 3.72 | 3.81 |
| $RMSE_{WM}/RMSE_{Med}$ | 0.87 | 0.76 | 0.67 | 0.77 | 0.77 | 0.85 | 0.87 |
| $RMSE_{WM}/RMSE_{St}$ | 0.98 | 0.85 | 0.70 | 0.67 | 0.61 | 0.63 | 0.67 |

10% noise ratio

| Noise amp. mult | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
|---|---|---|---|---|---|---|---|
| $RMSE_{WM}$ | 1.60 | 1.46 | 1.75 | 1.50 | 2.12 | 2.02 | 2.72 |
| $RMSE_{Med}$ | 1.86 | 2.00 | 2.29 | 2.10 | 2.56 | 2.60 | 3.08 |
| $RMSE_{St}$ | 1.66 | 1.84 | 2.13 | 2.36 | 3.01 | 3.32 | 4.23 |
| $RMSE_{WM}/RMSE_{Med}$ | 0.86 | 0.73 | 0.77 | 0.72 | 0.83 | 0.78 | 0.88 |
| $RMSE_{WM}/RMSE_{St}$ | 0.96 | 0.80 | 0.82 | 0.64 | 0.71 | 0.61 | 0.64 |

Table 14 Standard deviation values at 10% noise regarding the two median methods

| Noise amp. mult | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 |
|---|---|---|---|---|---|---|---|
| $Std_{WM}$ | 1.41 | 1.25 | 1.37 | 1.20 | 1.43 | 1.37 | 1.62 |
| $Std_{Med}$ | 1.42 | 1.51 | 1.70 | 1.56 | 1.76 | 1.74 | 1.96 |
| $Std_{WM}/Std_{Med}$ | 0.99 | 0.82 | 0.80 | 0.77 | 0.81 | 0.79 | 0.83 |

Table 15 L1 norm ratios for two data sets combined

| Noise exposure percentage | min $L_{1WM}/L_{1St}$ | max $L_{1WM}/L_{1St}$ | avg $L_{1WM}/L_{1St}$ |
|---|---|---|---|
| 25% | 0.81 | 1.07 | 0.93 |
| 20% | 0.81 | 0.99 | 0.91 |
| 15% | 0.83 | 0.98 | 0.90 |
| 10% | 0.80 | 1.07 | 0.93 |

Table 16 L1 norm ratios for two data sets combined

| Noise exposure percentage | min $L_{1WM}/L_{1Med}$ | max $L_{1WM}/L_{1Med}$ | avg $L_{1WM}/L_{1Med}$ |
|---|---|---|---|
| 25% | 0.63 | 0.86 | 0.74 |
| 20% | 0.67 | 0.86 | 0.77 |
| 15% | 0.73 | 0.84 | 0.78 |
| 10% | 0.72 | 0.85 | 0.77 |

8 Handling zero mean noises

The previously presented version of the proposed method was developed mainly for non-zero mean noise. In the following, a second, modified version of the method is introduced, whose main purpose is handling zero-mean, normally distributed noise. This version of the method calls upon and uses Steiner's MFV values to correct the central element of the given data window.

8.1 Noise generation

In the first step of the noise generation process, a general zero mean noise was added to the data matrix. To achieve this, normally distributed noise with zero mean and a standard deviation of 1 was generated for every row of data.

In order to add outlier noise to a given percentage of points for the examination, additional zero mean, normally distributed noise was added randomly to 20, 15, 10 and 5 percent of the points, with a 0.1–0.7 amplitude multiplier in all such cases (as in the previously introduced version of the method). The standard deviation of this noise was always the mean of the current data row.

8.2 Modified version of the weighted median method

As with the previously introduced version of the method, the first steps are histogram filterings. First a median and then a mean value is generated from filtered windows, here with a two-step filtering for both.

For producing the median value $m_s$, the histogram-based filtering is the following.
Based on the set of values, the elements of the current data window are divided into two and into three ranges (bins) with equal range widths, and then two ratios are generated:

• $\lambda_1$: the ratio of the largest and second largest domains out of 2,
• $\lambda_2$: the ratio of the largest and second largest domains out of 3 (in both cases regarding the element count).

If $\lambda_1 > \lambda_2$, the new set ($D$) is the domain with the most elements of the 2 domains, otherwise it is the domain with the most elements of the 3 domains. Finally, the value of $m_s$ is $\mathrm{median}(D)$.

For determining the value of $m_{e3}$, a second narrowing process is done. In this process, the window elements are first sorted by value and then divided into three equal-width ranges based on the set of values. Here we calculate the ratio $\lambda_3$ as the ratio between the element count of the window and the sum of the element counts of the thirds without the highest valued third. Then we calculate $\lambda_4$, whose value is the ratio between the element counts of the most populated and the second most populated bins. Thus, $\lambda_3$ and $\lambda_4$ are calculated differently than in the previous version of the method.

If $\lambda_3 > \lambda_4$, we take the highest third (the bin with the highest values); otherwise we take the bin with the most elements as the chosen set ($E$). Thus, a higher $\lambda$ value is used as an indication of a sharper cut.

We take this truncated set $E$ and distribute its values into bins. The bin width is determined with Scott's rule (Scott 1979, 1992):

$$3.5 \cdot \mathrm{std}(E) / \mathrm{numel}(E)^{1/3}. \tag{24}$$

We must also determine the number of bins in order to be able to distribute all the values into them (a trivial step, given the bin width and the data values). We take the bin with the largest element count, and $m_{e3}$ is the average of the elements of the chosen bin.
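The last step, binning the truncated set $E$ with the width of Eq. 24 and averaging the most populated bin, can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the standard deviation uses the N − 1 normalisation, bins start at the minimum of $E$, and ties between equally populated bins are resolved in favour of the lower bin.

```python
from math import ceil
from statistics import mean, stdev


def scott_bin_width(values):
    """Bin width of Eq. 24: 3.5 * std(E) / numel(E)^(1/3)."""
    return 3.5 * stdev(values) / len(values) ** (1 / 3)


def densest_bin_mean(values):
    """m_e3: mean of the most populated Scott-rule bin of the set E."""
    if len(values) < 2:
        return mean(values)
    width = scott_bin_width(values)
    if width == 0:  # all values identical
        return mean(values)
    lo = min(values)
    n_bins = max(1, ceil((max(values) - lo) / width))
    bins = [[] for _ in range(n_bins)]
    for v in values:
        index = min(int((v - lo) / width), n_bins - 1)  # the maximum goes to the last bin
        bins[index].append(v)
    return mean(max(bins, key=len))


# Hypothetical truncated set E: five nearby values and one outlier
E = [200, 201, 202, 203, 204, 230]
m_e3 = densest_bin_mean(E)
print(round(scott_bin_width(E), 2), m_e3)
```

For this sample the rule yields two bins, the outlier falls alone into the upper one, and $m_{e3}$ is the mean of the five clustered values.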
Since we have the values of $m_s$ and $m_{e3}$, we can replace the actual window's middle element with $m_s$ (forming $w_{r\_ms}$), and similarly with $m_{e3}$ (constructing $w_{r\_me}$). Finally, let the window with the MFV method's result at its centre be $w_{r\_St}$.

In the next step, we concatenate $w_{r\_ms}$, $w_{r\_me}$ and $w_{r\_St}$ one by one with the original (noisy) actual window, forming $w_{u1}$, $w_{u2}$, $w_{u3}$.

Now we can calculate three gradient measures in the following way:

$$G(x, y) = \frac{\partial w_u}{\partial x} \ast \frac{\partial w_u}{\partial x} + \frac{\partial w_u}{\partial y} \ast \frac{\partial w_u}{\partial y}, \quad (25)$$

$$g = \frac{1}{X \ast Y} \sum_{x \in X} \sum_{y \in Y} G(x, y). \quad (26)$$

In the formula, $g_1$ is the result of $g$ when using $w_{repl\_St}$, $g_2$ is the value of $g$ in case of $w_{repl\_ms}$, and $g_3$ is the value of $g$ when using $w_{repl\_me}$.

If min(g1, g2, g3) is $g_1$, then in the actual weight vector $[w_1\ w_2]$ the value of $w_1$ is 0, and the value of $w_2$ is 1. Thus, in this case only the result of the MFV method counts in the given data window's correction. If min(g1, g2, g3) is $g_2$, then $w_1$ is 0.15 and $w_2$ is 0.85. If min(g1, g2, g3) is $g_3$, then $w_1$ is 0.4 and $w_2$ is 0.6. Thus, in this case, the weight of the MFV method's result is 0.6 for the given window.

As we can see, in all of these cases we are weighting the Steiner MFV method's results, and increasing or decreasing its weight in the correction of the actual data window's central element.

Similar to the previous version of the method, we get poor results if the value of $m_{as}$, i.e. $m_{e1} - m_{e2}$, is large. In order to handle this, here we have to calculate both $m_{e1}$ and $m_{e2}$ (the same way as in the previous version), and if the difference is greater than 2% of the mean of the raw data, then $w_1$ should be 0, and the value of $w_2$ should be 1.

Table 17 L1 norm ratios on first data set
Noise exposure  Ratio        Values
25%             L1WM/L1St    0.991  0.987  0.996  1.004  0.991  0.997  0.993
25%             L1WM/L1Med   0.881  0.847  0.781  0.798  0.777  0.704  0.760
20%             L1WM/L1St    0.978  1.005  0.998  0.975  0.990  0.994  0.997
20%             L1WM/L1Med   0.903  0.846  0.822  0.795  0.818  0.802  0.807
15%             L1WM/L1St    0.987  0.982  0.993  0.978  0.991  0.988  0.978
15%             L1WM/L1Med   0.917  0.879  0.816  0.835  0.861  0.844  0.840
10%             L1WM/L1St    0.979  0.975  0.979  0.972  0.983  0.982  0.970
10%             L1WM/L1Med   0.922  0.903  0.813  0.833  0.870  0.852  0.878

Table 18 L1 norm ratios on second data set
Noise exposure  Ratio        Values
25%             L1WM/L1St    0.987  0.979  0.994  0.989  0.968  0.991  0.987
25%             L1WM/L1Med   0.862  0.840  0.897  0.755  0.786  0.775  0.706
20%             L1WM/L1St    0.979  0.994  0.990  0.982  1.000  0.987  1.001
20%             L1WM/L1Med   0.889  0.843  0.858  0.768  0.793  0.815  0.748
15%             L1WM/L1St    0.995  0.993  0.983  0.987  0.993  0.987  1.000
15%             L1WM/L1Med   0.919  0.891  0.855  0.809  0.799  0.868  0.857
10%             L1WM/L1St    1.005  0.995  0.983  0.979  0.990  0.989  0.994
10%             L1WM/L1Med   0.913  0.902  0.884  0.871  0.890  0.829  0.860

Table 19 L1 norm ratios on third data set
Noise exposure  Ratio        Values
25%             L1WM/L1St    1.000  0.994  1.003  0.989  1.001  0.999  0.990
25%             L1WM/L1Med   0.904  0.852  0.798  0.939  0.876  0.941  0.882
20%             L1WM/L1St    0.995  0.997  0.995  0.995  0.994  0.982  0.991
20%             L1WM/L1Med   0.900  0.915  0.815  0.836  0.815  0.856  0.858
15%             L1WM/L1St    0.995  0.991  0.986  0.984  0.996  0.990  0.998
15%             L1WM/L1Med   0.943  0.925  0.873  0.924  0.852  0.888  0.901
10%             L1WM/L1St    1.001  0.984  0.991  0.986  1.002  0.994  0.976
10%             L1WM/L1Med   0.941  0.940  0.957  0.925  0.936  0.931  0.887

8.3 Results of the modified version of WM filtering procedure

In Table 17, the L1 norm ratios can be seen for the first data set, comparing the WM method's results with both the MFV method's results and those of the original median method. In the former case, the WM method performed better in 26 of the 28 test cases. In those cases, the average of the L1 norm ratio was 0.985, thus usage of the method resulted in 1.4% lower L1 norm values on average. In the remaining two cases, the mean of the ratio was 1.005, thus the WM method performed 0.5% worse in those two cases.
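The counts and averages quoted above for Table 17 can be re-derived from the tabulated ratios. The following Python sketch is only a cross-check, not part of the published method. The dictionary `ratios_wm_st` is transcribed from the L1WM/L1St values of Table 17, grouped by noise exposure under the assumption that the first seven values of each printed row belong to the left-hand exposure level, and all variable names are illustrative. Because the tabulated values are rounded to three decimals, the recomputed averages should agree with the quoted ones only to within rounding.

```python
# L1WM/L1St ratios transcribed from Table 17 (first data set),
# seven values per noise exposure level. Names are illustrative only.
ratios_wm_st = {
    "25%": [0.991, 0.987, 0.996, 1.004, 0.991, 0.997, 0.993],
    "20%": [0.978, 1.005, 0.998, 0.975, 0.990, 0.994, 0.997],
    "15%": [0.987, 0.982, 0.993, 0.978, 0.991, 0.988, 0.978],
    "10%": [0.979, 0.975, 0.979, 0.972, 0.983, 0.982, 0.970],
}

# Flatten the 4 x 7 = 28 test cases into one list.
all_ratios = [r for row in ratios_wm_st.values() for r in row]

# A ratio below 1 means the WM result is closer to the noise-free data.
better = [r for r in all_ratios if r < 1.0]
worse = [r for r in all_ratios if r >= 1.0]

mean_better = sum(better) / len(better)
mean_worse = sum(worse) / len(worse)
best = min(all_ratios)

print("WM better in", len(better), "of", len(all_ratios), "cases")
print("mean ratio where WM is better:", round(mean_better, 4))
print("mean ratio where MFV is better:", round(mean_worse, 4))
print("best ratio:", best)
```

A ratio r below 1 corresponds to a (1 − r) · 100% lower L1 norm, which is how the percentage figures in this section are read.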
The best L1 norm ratio value was 0.97, thus the WM method gave a 3% lower L1 norm value in that particular noise reduction. Regarding the comparison with the original median method, the WM method performed better in all of the 28 cases, by 16.4% on average (0.836 average L1 norm ratio). Here the best result was a 29.5% improvement (0.705 L1 norm ratio).

Table 18 shows results in the same structure as the previous one, here on the second data set. Compared with Steiner's MFV, the WM method gave better results according to the L1 norm in 23 cases (by 1.25% on average), and in the remaining 5 cases it performed worse by 0.21% on average. At its best, the WM method gave a 3.12% lower L1 norm value. In comparison with the conventional median method, WM performed better in all of the cases (16% avg., 29.4% max.).

In Table 19 the third data set's L1 norm ratios can be seen. Comparing WM's results with the MFV method's, the former performed better in 23 of the 28 cases (by 1% avg., 2.4% max.), and the MFV method was superior in 5 cases (1.4% avg.). Regarding the comparison with the conventional median method, WM performed better in all of the cases, by 10.7% avg., 20.2% max.

9 Conclusions

The effectiveness of the histogram-based weighted median procedure described above has been demonstrated for noise elimination in digital elevation model data. The method's main purpose is eliminating outlier noise in data matrices, especially if a high percentage of the matrix points are contaminated with outlier noise.

Averaged over the different noise amplitudes and noise exposure percentages investigated, the WM method outperformed the standard median filtering procedure on the different data sets by 14–23% regarding data distance calculated with the L1 norm for eliminating non-zero mean noises. The version of the method for filtering zero mean noises performed better by 14.3% on average against the conventional median filter.
Beyond general refinement and optimisation of the method, there is room for improvement, particularly in more effective handling of the low noise exposure cases.

Funding Open access funding provided by University of Miskolc. None

Data availability Because of the data confidentiality, the experimental data is not published.

Code availability Because of the data confidentiality, the code is not published.

Declarations

Conflict of interest The authors have no conflicts of interest to declare that are relevant to the content of this article.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Dobróka M, Gyulai Á, Ormos T, Csókás J, Dresen L (1991) Joint inversion of seismic and geoelectric data recorded in an underground coal mine. Geophysical Prospecting 39(5):643–665. ISSN 0016-8025. https://doi.org/10.1111/j.1365-2478.1991.tb00334.x
Dobróka T (2021) An MFV-based image processing filter and its application in seismic tomographic images. Acta Geodaetica et Geophysica. https://doi.org/10.1007/s40328-021-00351-7
Huang TS, Yang GJ, Tang GY (1979) A fast two-dimensional median filtering algorithm. IEEE Trans Acoust Speech Signal Process 27(1):13–18. https://doi.org/10.1109/TASSP.1979.1163188
Scott DW (1979) On optimal and data-based histograms. Biometrika 66:605–610
Scott DW (1992) Multivariate density estimation: theory, practice, and visualization. John Wiley & Sons, New York
Steiner F (1991) The Most Frequent Value. Introduction to Modern Conception Statistics, Budapest. ISBN 978-9630556873
Stone DC (1995) Application of median filtering to noisy data. Can J Chem 73(10):1573–1581. https://doi.org/10.1139/v95-195
Szabó NP, Balogh GP, Stickel J (2018) Most frequent value-based factor analysis of direct-push logging data. Geophys Prospect. https://doi.org/10.1111/1365-2478.12573
Szabó N, Balogh GP (2016) Most frequent value based factor analysis of engineering geophysical sounding logs. 78th EAGE Conference and Exhibition 2016. Houten, Holland: European Association of Geoscientists and Engineers (EAGE), Paper: Tu SBT4 12, 5 p. https://doi.org/10.3997/2214-4609.201600796
Szűcs P, Zákányi B (2007) Applying most frequent value (MFV) in hydrogeological modelling. Mérnökgeológia-Kőzetmechanika 161–174
Zhang J (2017) Most frequent value statistics and distribution of Li abundance observations. Mon Not R Astron Soc 468(4):5014–5019. https://doi.org/10.1093/mnras/stx627

"Acta Geodaetica et Geophysica" Springer Journals

Histogram-based weighted median filtering used for noise reduction of digital elevation model data

"Acta Geodaetica et Geophysica" , Volume 56 (4) – Dec 1, 2021

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2021
ISSN
2213-5812
eISSN
2213-5820
DOI
10.1007/s40328-021-00356-2

Abstract

A new histogram-based robust filter developed for noise reduction of digital elevation model data is presented. When a large percentage of data points in data matrices are contaminated with outlier noise, the noise reduction process can give better results than traditional median filtering if elements with a potentially higher chance of being noise are eliminated by weighting from the input dataset before the median value is calculated. However, on the same matrices, there are likely to be subsets of data where unfiltered input is more reasonable for the calculation. The new method implementing weighting between these two cases is presented below, with its initial tuning and a comparison with both standard median filtering and the Most Frequent Value (MFV) method, the latter being much more efficient than the usual methods. Following the description of the procedures, their effectiveness is compared for noise reduction in digital elevation model data systems, at various noise levels. The comparison is done mainly by three measures, with most of the focus on the L1 norm data distance results. Finally, a modified version of the method, which includes Steiner's MFV filter as a core part, is also introduced, with similar examination. The presented method has been shown to be superior to conventional median filtering for most noise rates, and in many cases also to Steiner's MFV, for handling non-zero mean noises. The modified version of the method, with the help of Steiner's MFV, has also achieved this in handling zero mean noise, in the field of application described in the paper.

Keywords Noise reduction · Median filter · Digital elevation model · Weighted median

1 Introduction

Due to its robustness, the Most Frequent Value (MFV) method (Steiner 1991) can be well applied in the processing of noisy datasets in geophysical (Dobróka et al.
1991; Szabó and Balogh 2016; Szabó et al. 2018), hydrogeological research (Szűcs and Zákányi 2007) and various other fields (Zhang 2017). Such data systems, especially those with extreme noise, are also present in the field of spatial informatics, including digital elevation modelling and satellite transmission. A similarly widely used, but less robust and sophisticated technique is the median filtering method (Stone 1995; Huang et al. 1979).

In this paper, a new median filtering method, improved by histogram operations and weighted averaging, is presented and compared to the original median filter and the MFV-based procedure. A modified version of the method, also presented here, aims to eliminate zero mean noises and contains Steiner's MFV filter as a core part.

The aim of both versions of the proposed method is to eliminate scattered noise from digital elevation models in a moving-window manner (i.e. the procedure corrects the central element of the actual window). The study was conducted for noise exposure at four different percentages of data points.

* Roland Kilik gfkr@uni-miskolc.hu
Department of Geophysics, University of Miskolc, Miskolc-Egyetemváros 3515, Hungary
Acta Geodaetica et Geophysica (2021) 56:743–764

2 Input dataset

The analysed data consisted of three 25 m spatial resolution digital elevation models of different areas, created using Topo to Raster interpolation in ArcGIS software by digitizing the contour lines, elevation points and water network of 1:10,000 scale EOTR map sheets. The histograms of the three datasets can be seen in Fig. 1. The mean of the data in the first dataset is 211.99, while the standard deviation is 8.440. For the second dataset the mean is 191.31 and the standard deviation is 8.752, while for the third dataset the mean is 96.39 and the standard deviation is 0.457.
For each of the resulting digital elevation model data systems, normally distributed noise was first added to the data matrices, with a standard deviation chosen so that the average noise amplitude is around 1% of the mean of the data matrix. After this, additional noise was added randomly to 10, 15, 20, and 25 percent of the points as outlier non-zero mean impulse noise. To achieve this, a normally distributed noise vector was generated for every row, with a mean equal to the mean of the given data row and a standard deviation chosen so that the average noise amplitude is around 100% of the mean of the data row.

Then, the elements of the noise vector generated for a given row of data were randomly scattered across the row with a multiplier between 0.1 and 0.7, giving an additional ~10–70% noise to the data in the different test cases (referred to as 0.1–0.7 noise amplitude below).

3 Introduction of the weighted median (WM) method

The method is used to produce the corrected value of the central element at each position of the moving window going through the image matrices, using a weighted mean. The following version of the presented method is mainly for eliminating non-zero mean noises (due to, for example, a measurement device problem or long distance data transfer).

The weighted mean is calculated for each data point, with two weights ($w_1$, $w_2$) defined below. To achieve this, two independent window narrowing processes take place as an initial step, before the weights are calculated. These narrowed windows are created from the actual (5 × 5) data window, at every window position. As a test, window sizes of 9 (3 × 3) and 49 (7 × 7) elements were also applied, but these did not prove to be optimal for solving the problem.

The process of the first window narrowing is the following.
The range of the values of the elements in the moving window is divided into two and three bins with equal range widths, and then two ratios are generated:

• $\lambda_1$: the element count ratio between the larger and the smaller element count domain out of the two,
• $\lambda_2$: the ratio of the largest and second largest domains out of 3 (regarding the element count again).

If $\lambda_1 > \lambda_2$, the new set ($D$) is defined as the domain with the highest element count of the 2 domains, otherwise as the domain with the highest element count of the 3 domains. Then the value of $m_s$ is defined as $m_s = \mathrm{median}(D)$.

Thus, we used a higher $\lambda$ value as an indication of a sharper cut. The numbers 2 and 3 were chosen as the number of ranges because, in the case of splitting into 4 (or more), a bin would possibly not contain a sufficient number of elements from the initial 5 × 5 window to carry out the further steps described.

Fig. 1 Histograms of input data matrices

Then another (independent) second window narrowing process takes place, for determining the value of parameter $m_{e1}$. To achieve this, the original moving window's elements are sorted by value, then divided into two and three equal width ranges (based on the set of values). For example, in the case of splitting into three, if the ordered vector is $v$, $\max(v)$ is its highest value element and $\min(v)$ is the lowest, then the third with the lowest values contains the values lower than $\min(v) + (\max(v) - \min(v))/3$.

Here we calculate the ratio $\lambda_3$ from the case of splitting into two. The value of $\lambda_3$ is $1/n$, where $n$ is the sum of the element count of the two bins without the highest valued bin (i.e. the bin with the lower values). Then we calculate the ratio $\lambda_4$ from the case of splitting into three bins. The value of this ratio is $1/m$, where $m$ is the sum of the element count of the three bins without the highest valued bin.
If  > : we take the highest valued bin from the two (the bin with the highest val- 3 4 ues), otherwise we take the highest valued bin from the three as the chosen set (E). Finally, m will be the average of the elements of the chosen new set. e1 A similar value, m is determined with the same method, however by splitting the e2 original window into 3 and 5 parts (instead of 2 and 3). With this step, a higher division number is reflected in the result, if the current window’s value set allows it (i.e. if the new intervals’ element count is not zero). As m and m are both calculated on chosen subsets of the largest values from the e1 e2 value set of the window, both m and m are related to maximums. Important differ - e1 e2 ence between m and m that in m , m the mean is calculated while in m the median is e s e1 e2 s calculated, with different bin numbers. In addition, in m and m , the narrowest set has e1 e2 the largest valued elements, and in m , the set contains the largest number of elements (which may not necessarily contain the largest values). Now we can calculate the first weight ( w ) of the current point’s weight vector as w =(m ∕m )∗ , (1) 1 s e1 where  is a scaling factor that ensures that the value obtained by w falls within the same range of values as w , described below. Its value (  = 1∕3 ) is determined on experimental basis to fulfill this purpose. We can also define m as the median of the values in the original moving window. Using the values described above, three sub-weights are produced as follows (all will have a role in determining the value of w weight). 1 3 Acta Geodaetica et Geophysica (2021) 56:743–764 747 w = ∗ m − max + 1, a e1 1 (2) max where max : max m , m , 1 w e1 m = m − m , (3) as e1 e2 w = ∗ m − max + 1, (4) p as 2 max where max ∶ max m , m , m , 2 w e1 e2 w = ∗ m − max + 1, p2 as 3 (5) max where max ∶ max m , mean(window) . 
3 w The calculation of the max value applied in a given sub-weight ( w ,w , w ), shall in a p p2 all cases include the median of the window, as well as the mean ( m , m ) or median ( m ) e1 e2 s value of the narrowed window. In Eqs. 4 and 5, fewer elements are omitted from the original window (because of con- taining m via m ) , so that the sub-weights w , and w both calculated with a smaller e2 as p p2 multiplier than in Eq. 2. The values of these  and  constants are chosen as 0.5 (the adjust- ment procedure’s results can be in seen Sect. 6). Both w and w weights have a corrective role. Their value can be high if there is a p p2 large difference between the averages of the subsets obtained by splitting the weights into 3 and 5 ( m and m ). As can be seen in Fig.  2, a large difference between the values of e1 e2 m and m results in a large L norm error. Thus, a large difference indicates that the histo- e1 e2 1 gram operations in the current window position may distort the result, so increasing value Fig. 2 L1 norm and m value relation as 1 3 748 Acta Geodaetica et Geophysica (2021) 56:743–764 of the difference increases the weight of w (i.e. the conventional median without histo- gram operations). The subweight w can take a high value other than 1, if the median value of the original, unconstrained window m is greater than the average of the elements of the constrained window ( m ). Because the narrowed window contains the largest values of the subsets, if e1 the difference between the median of the original window and the mean of the elements of this narrowed window is outstanding, it indicates that histogram operations at the cur- rent window position distorting the result. As in the previous, a large positive difference between m and max results in a large L norm error. e1 1 1 Therefore, this should be reflected in the final weight vector  in the form of either a reduction in the value of w or an increase in the value of w (i.e. 
an increase in the weight 1 2 of the result of the traditional median method). The latter is achieved with the usage of w in weight w . Since w is the most important of the three correction factors ( w , w , w ), 2 a a b c its square is included in the formula w . The higher effect was achieved by squaring the weight, because the maximum value of the weight before adding one to its value is 1, so squaring the weight will not result in an extreme weight value, even at its maximum. The + 1 in the formulas for the partial weights w and w is included because they all p p2 include a maximum value subtraction, which in most cases results in a negative value, so the constant provides a shift into the positive range. In the formula of w , the role of adding 1 is shifting its minimum to greater than one (in order to be able to increase w weight in its squared value). Finally, the following two weights ( w , and w ) are produced using w and w b c p p2 respectively. w = 1 + , (6) where m is the median of the values in the original moving window. Since the maximum value of the partial weight w is not a function of the different narrowed windows, but of the median or average value of the original window, this partial weight is taken with a smaller constant: p2 w = 0.5 − . (7) With the components defined above, the weight w takes the following form w = w ∗ w ∗ w . (8) 2 b c At this point, we know the  weight vector of the current data point = w w . (9) 1 2 In weight vector  (Eq. 9), weights w and w have an effect on the median of the cur - 1 2 rent data window ( m ) on the one hand, and on the median of the reduced set of the same window ( m ) on the other hand, w weighting the latter, and w weighting the former as fol- s 1 2 lows (for example, on the k-th element of the data matrix): res =(w ∗ m + w ∗ m )∕(w + w ). 
(10)

That is, $res_{WMk} = (w_1 \ast m_s + w_2 \ast m_w)/(w_1 + w_2)$. As described above, the median of a narrowed window ($m_s$) and the original window's median ($m_w$) are weighted at every window position for the final result of the actual point. The weight of $m_s$ is $(m_s/m_{e1}) \ast \alpha$ (i.e., the median is divided by the average of the maximal values of the narrowed window). If this ratio is, for example, low for the given moving window position due to the noise (high $m_{e1}$ average of maximums), then the weight of $m_s$ should be proportionally low, because otherwise the high value of the outlier maximums would have a negative effect on the final result. In such cases the weight of $m_w$ will be proportionally high, not only because of the low weight of $m_s$, but also because $m_w$ is weighted by $w_a$, $w_b$, $w_c$, all containing $m_{e1}$ or $m_{e2}$ values.

4 Most Frequent Value Method

A much more reliable statistical characteristic than the arithmetic mean, the weighted mean, is obtained by assigning a small weight ($w_k$) to points far from the majority of the data ($X_k$) and a larger weight ($w_k$) to points in the highest data density location (Eq. 11).

$$M = \left(\sum_{k=1}^{N} X_k w_k\right)\left(\sum_{k=1}^{N} w_k\right)^{-1} \quad (k = 1, 2, \ldots, N). \quad (11)$$

The k-th weight is chosen by Steiner (1991) as follows:

$$w_k = \frac{\varepsilon^2}{\varepsilon^2 + (X_k - M)^2}. \quad (12)$$

In the above, $N$ is the number of data and $\varepsilon$ is the dihesion, a scalar parameter. If $\varepsilon$ is large, then all data are given nearly equal weights and outliers will spoil the value estimate, and if $\varepsilon$ is too small, care must be taken to avoid ignoring some data. The weighted mean, called the most frequent value ($M$) defined by Eq. (11), should be known in advance in order to assign weights with maximum values at its location and smaller and smaller weights away from it. Therefore, this procedure requires an iterative algorithm in which $M$ and $\varepsilon$ are determined jointly. In the first iteration step, the dihesion can be estimated from the sample space using the following formula:

$$\varepsilon_1 = \max X_k - \min X_k, \quad (13)$$

while for $M$ the initial value is preferably chosen as the sample mean or median. In this study, the median value was used. In subsequent iteration steps, $M$ and $\varepsilon$ can be derived from each other according to the following procedure:

$$\varepsilon_{j+1}^2 = 3\,\frac{\sum_{k=1}^{N} \frac{(X_k - M_j)^2}{\left(\varepsilon_j^2 + (X_k - M_j)^2\right)^2}}{\sum_{k=1}^{N} \frac{1}{\left(\varepsilon_j^2 + (X_k - M_j)^2\right)^2}} \;\leftrightarrow\; M_{j+1} = \frac{\sum_{k=1}^{N} \frac{\varepsilon_{j+1}^2}{\varepsilon_{j+1}^2 + (X_k - M_j)^2}\, X_k}{\sum_{k=1}^{N} \frac{\varepsilon_{j+1}^2}{\varepsilon_{j+1}^2 + (X_k - M_j)^2}}. \quad (14)$$

5 Description of the resulting numerical values

For comparing the results of the different filtering methods, the following metrics are used in the paper. Calculation of the RMSE (Root Mean Square Error) value for the MFV method, median filtering and the weighted median (where $inp$ is the noise-free data matrix):

$$RMSE_{St} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (res_{Sti} - inp_i)^2}, \quad (15)$$

where $res_{St}$: matrix corrected by Steiner's MFV,

$$RMSE_{Med} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (res_{Medi} - inp_i)^2}, \quad (16)$$

where $res_{Med}$: matrix corrected by median method,

$$RMSE_{WM} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (res_{WMi} - inp_i)^2}, \quad (17)$$

where $res_{WM}$: matrix corrected by weighted median.

Deviation regarding the three procedures:

$$Std_{St} = \mathrm{std}(res_{St} - inp), \quad (18)$$

$$Std_{Med} = \mathrm{std}(res_{Med} - inp), \quad (19)$$

$$Std_{WM} = \mathrm{std}(res_{WM} - inp), \quad (20)$$

where $res_{St}$, $res_{Med}$, $res_{WM}$, $inp$: as above.

L1 norm:

$$L_{1St} = \lVert res_{St} - inp \rVert_1, \quad (21)$$

$$L_{1Med} = \lVert res_{Med} - inp \rVert_1, \quad (22)$$

$$L_{1WM} = \lVert res_{WM} - inp \rVert_1, \quad (23)$$

where $res_{St}$, $res_{Med}$, $res_{WM}$, $inp$: as above.

6 Adjusting constants

Regarding the 0.5 constant value of $w_p$, it was also tested in some randomly chosen test cases how the L1 norm distance between the noise-free matrix and the weighted median-corrected matrix changes with different values of the constant. In every such case the norm decreased monotonically as the constant approached 0.5, as can be seen in the example of Table 1 (10% noise rate, 0.3 noise amplitude).

Table 1 Example of adjusting constant value used for calculating $w_p$
Constant           2       1.5     1       0.5     0.25    0.125   0.0625   0.03125
L1 norm distance   843.12  822.32  801.63  762.82  794.51  956.47  1720.47  2393.27

Similarly, the 0.5 constant of $w_{p2}$ was also tested, and it produced similar results, as can be seen in the example of Table 2 (for the same parameters as in the previous case).

All of the constants were examined with a few chosen values; however, their global optimisation is not part of this paper.

7 Comparative results

In all cases, a 5 × 5 window ran through the matrices, and both the Steiner MFV method (used as a filter, Dobróka 2021) and the weighted median method described here always corrected the value of the window's central element, with all elements of the window as input. In all cases, the number of iterations for the Steiner filter was 20 and the initial value of $M$ was the median of the window elements.

In all cases the results were also compared with those obtained using the classical median filter in MATLAB software. The median filter ran on the same noisy data matrices with the same window size as the Steiner method and the weighted median method.

The example shown in Table 3 gives the values of the result metrics on the first data series, with noise on 15% of the data points and a noise amplitude multiplier of 0.3. Another example can be seen in Fig. 3, again with noise on 15% of the data points, now with a noise amplitude multiplier of 0.5. Figure 3 (a) shows the original data, (b) the noisy data, (c) the result of the Steiner method, (d) the result of the weighted median method and (e) the result of the classical median method.
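The iteration of Eqs. 11–14, with the settings stated above (5 × 5 window, 20 iterations, median as the initial M), can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code, which is not published: the function names `mfv` and `mfv_filter`, the edge-replicating border handling and the early-stop tolerance `rel_tol` are hypothetical choices that the paper does not specify.

```python
import numpy as np

def mfv(values, n_iter=20, rel_tol=1e-12):
    """Most frequent value of a sample, iterating Eq. 14 from Eq. 13."""
    x = np.asarray(values, dtype=float).ravel()
    m = float(np.median(x))                 # initial M: the sample median
    eps2 = float(x.max() - x.min()) ** 2    # Eq. 13, stored as epsilon squared
    if eps2 == 0.0:                         # constant sample: nothing to iterate
        return m
    floor = rel_tol * eps2                  # assumption: stop if epsilon collapses
    for _ in range(n_iter):
        d2 = (x - m) ** 2                   # (X_k - M_j)^2
        denom = (eps2 + d2) ** 2
        # Left-hand part of Eq. 14: new squared dihesion.
        eps2_new = 3.0 * np.sum(d2 / denom) / np.sum(1.0 / denom)
        if eps2_new < floor:
            break
        # Right-hand part of Eq. 14: weights of Eq. 12 with the new dihesion.
        w = eps2_new / (eps2_new + d2)
        m = float(np.sum(w * x) / np.sum(w))
        eps2 = eps2_new
    return m

def mfv_filter(image, size=5, n_iter=20):
    """Replace every element by the MFV of its size x size neighbourhood."""
    img = np.asarray(image, dtype=float)
    half = size // 2
    # Assumption: borders are handled by replicating edge values.
    padded = np.pad(img, half, mode="edge")
    out = np.empty_like(img)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = mfv(padded[r:r + size, c:c + size], n_iter)
    return out

# Usage: one outlier barely moves the MFV, unlike the arithmetic mean.
sample = [9.7, 9.8, 9.9, 10.0, 10.1, 10.2, 10.3, 50.0]
print("MFV:", mfv(sample), "mean:", float(np.mean(sample)))

# A smooth 7 x 7 surface with a single spike at the centre.
surface = 100.0 + 0.1 * np.arange(49, dtype=float).reshape(7, 7)
noisy = surface.copy()
noisy[3, 3] += 80.0
filtered = mfv_filter(noisy)
print("centre before:", noisy[3, 3], "after:", filtered[3, 3])
```

The explicit double loop keeps the correspondence with the moving-window description visible; for full-size elevation matrices a vectorised window view would be considerably faster.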
Table 4 shows the values of the L1 norms, i.e. the distance from the noise-free input data matrix, and their ratios, using the weighted median method ($L_{1WM}$) and the Steiner method ($L_{1St}$), with 25% of the points contaminated with noise, as a function of different noise amplitudes (0.1,…,0.7) on the first data set. The values show that in two cases the Steiner method gives better results, by about 6%, and in the other cases the weighted median method proved to be better. The latter gives better results by 6.3% on average (since the average of the $L_{1WM}/L_{1St}$ ratios is 0.937).

Table 2 Example of adjusting constant used for calculating $w_{p2}$
Constant           2       1.5     1       0.5     0.25     0.125   0.0625  0.03125
L1 norm distance   735.13  721.78  698.67  654.31  668.807  668.94  676.11  685.71

Table 3 Example of comparative results (10% additional outlier noise on 15% of the data points)
L1WM      538.98
L1St      630.41
L1Med     711.19
RMSE_WM   1.42
RMSE_St   2.02
RMSE_Med  2.13
Std_WM    1.19
Std_St    1.75
Std_Med   1.58

Table 5 shows the results of the weighted median procedure and the standard median filtering at the same noise level. In this case, the weighted median procedure proved to be better on the data set by 26.4% on average.

Tables 6 and 7 show the same comparison as before, regarding the L1 norm for noise affecting 20% of the points. In this case, there are noise amplitude values where the Steiner method gives a smaller distance to the noise-free matrix than the weighted median procedure, but in none of the cases could the standard median procedure achieve this. The weighted method is on average 23% better than the latter. Tables 8 and 9 show the case with 15% noisy points.

The results of Table 8 comparing the Steiner method with the weighted median procedure show that for a noise amplitude multiplier of 0.1, the Steiner method is superior.
The weighted median procedure outperforms the unweighted median procedure by an average of 21.2% on the first data set for 15% noisy points (Table 9). Tables 10 and 11 show the data distances of the three procedures according to the L1 norm for the case where 10% of the data is contaminated by noise.

In this case (with noise on 10% of the points), the Steiner method outperforms the weighted median method at most of the noise amplitudes: in 4 cases, by 5.6% on average, while in the remaining three cases the WM method performs better on the data set, by 13%.

Table 12 shows the RMSE values obtained by the three procedures and their ratios, for a noise exposure of 25% on the left and 20% on the right, for different noise amplitude multipliers (0.1,…,0.7) in both cases. Table 13 shows the RMSE values as before, now for noise at 15% and 10% of the data points. It can be seen that the weighted median method performs worse for the highest noise amplitude multipliers (0.7), but this is also true for the other methods, so the ratio to the other methods does not deteriorate. The RMSE obtained by the MFV method is closest to that obtained by the weighted median method for the smallest noise amplitude multiplier (0.1). The RMSE value for the weighted median method is on average 79.5% of that obtained by the conventional median method.

As the weighted median procedure was found to be the least efficient when 10% of the data points were contaminated with noise in the above tests, the standard deviations were also examined in this case. An example of this compared to the standard median method is shown in Table 14.

Table 15 shows the average of the results for the first and second data sets. The minimum, maximum and average of the data distance ratios for the L1 norm as a function of the different noise levels can be seen for the Steiner method and the weighted median method. The mean of the minima (i.e. the mean of the cases where the largest difference is in favour of the weighted median method) is 0.82, i.e. in these cases the method is 18% better. The average of the maxima is 1.032, i.e. in the opposite case the Steiner method is on average 3.2% better for the two data systems combined.

Fig. 3 Visual example of results on the first dataset

Table 4 L1 norm values at 25% noise ratio
Noise amp. mult  0.1     0.2     0.3     0.4     0.5     0.6     0.7
L1WM             540.22  425.44  420.39  471.26  615.92  787.36  515.82
L1St             597.90  457.46  536.20  541.64  580.61  738.81  545.17
L1WM/L1St        0.90    0.92    0.784   0.87    1.06    1.06    0.94

Table 5 L1 norm values at 25% noise ratio
Noise amp. mult  0.1     0.2     0.3     0.4     0.5     0.6     0.7
L1WM             540.22  425.44  420.39  471.26  615.92  787.36  515.82
L1Med            604.39  582.33  725.92  692.10  827.83  970.98  723.63
L1WM/L1Med       0.89    0.73    0.57    0.68    0.74    0.81    0.71

Table 6 L1 norm values at 20% noise ratio
Noise amp. mult  0.1     0.2     0.3     0.4     0.5      0.6     0.7
L1WM             566.65  437.97  440.14  485.61  708.77   646.18  759.37
L1St             545.34  468.96  469.76  608.38  1002.99  866.07  841.60
L1WM/L1St        1.03    0.93    0.93    0.79    0.70     0.74    0.90

Table 7 L1 norm values at 20% noise ratio
Noise amp. mult  0.1     0.2     0.3     0.4     0.5     0.6     0.7
L1WM             566.65  437.97  440.14  485.61  708.77  646.18  759.37
L1Med            612.87  627.87  612.76  741.06  901.39  792.62  979.72
L1WM/L1Med       0.92    0.69    0.71    0.65    0.78    0.81    0.77

Table 8 L1 norm values at 15% noise ratio
Noise amp. mult  0.1     0.2       0.3     0.4     0.5     0.6     0.7
L1WM             542.97  433.71    538.98  464.25  493.81  675.11  624.08
L1St             519.85  473.9043  630.41  490.20  561.66  732.78  644.35
L1WM/L1St        1.04    0.91      0.85    0.94    0.87    0.92    0.96
Table 9 L1 norm values at 15% noise ratio
Noise amp. mult     0.1       0.2       0.3       0.4       0.5       0.6       0.7
L1WM                542.97    433.71    538.99    464.25    493.81    675.11    624.08
L1Med               604.28    614.34    711.192   639.07    672.32    846.64    695.19
L1WM/L1Med          0.89      0.7       0.75      0.72      0.73      0.79      0.89

Table 10 L1 norm values at 10% noise ratio
Noise amp. mult     0.1       0.2       0.3       0.4       0.5       0.6       0.7
L1WM                521.53    483.07    543.86    484.36    779.51    727.48    724.2
L1St                507.13    482.52    542.96    527.25    652.87    939.2     790.58
L1WM/L1St           1.02      1         1         0.91      1.19      0.77      0.91

Table 11 L1 norm values at 10% noise ratio
Noise amp. mult     0.1       0.2       0.3       0.4       0.5       0.6       0.7
L1WM                521.53    483.07    543.86    484.36    779.51    727.48    724.2
L1Med               605.38    637.25    746.44    645.17    871.46    1038.26   865.18
L1WM/L1Med          0.86      0.75      0.72      0.75      0.89      0.7       0.83

Table 16 shows the L1 norm results grouped in the same way, in this case for the two median procedures. The average of the minima is 0.692, so in the cases where the weighted median procedure performs best, it is better by more than 30%. The average of the maxima is 0.86, so that even in the worst cases the weighted median procedure is on average 14% better than standard median filtering for the two data sets combined.

A less detailed comparison was carried out on the third data set (examining only the L1 norm ratios). This study showed similar characteristics to the previous ones; however, the Steiner method was found to be the best of the three in more cases than before. For 10% and 15% noisy data points (both with 7 different noise amplitudes, as before), the Steiner method was superior to the weighted median method in 11 out of 14 cases, by an average of 14.6% (in terms of L1 norm ratios). In the 20% and 25% noisy point cases, the weighted median method gave the better result in 14 out of 14 cases, by an average of 19.04%.
Comparing the two median methods on this data set (again with L1 norm ratios), the weighted median method proved to be better in 27 out of 28 cases, by an average of 14.32%.

Table 12 RMSE values at 25% and 20% noise ratio
Noise ratio 25%
Noise amp. mult     0.1     0.2     0.3     0.4     0.5     0.6     0.7
RMSE_WM             1.54    1.47    1.55    2.04    2.50    2.73    2.25
RMSE_Med            1.79    1.95    2.16    2.50    2.84    3.07    2.64
RMSE_St             1.60    1.74    2.07    2.73    3.29    3.79    3.76
RMSE_WM/RMSE_Med    0.86    0.75    0.72    0.82    0.88    0.89    0.85
RMSE_WM/RMSE_St     0.96    0.84    0.75    0.75    0.76    0.72    0.60
Noise ratio 20%
Noise amp. mult     0.1     0.2     0.3     0.4     0.5     0.6     0.7
RMSE_WM             1.58    1.51    1.71    1.63    2.14    2.52    2.74
RMSE_Med            1.84    2.03    2.30    2.17    2.58    2.95    3.10
RMSE_St             1.63    1.82    2.21    2.50    3.09    3.61    4.24
RMSE_WM/RMSE_Med    0.86    0.75    0.74    0.75    0.83    0.85    0.88
RMSE_WM/RMSE_St     0.97    0.83    0.77    0.65    0.69    0.70    0.65

Table 13 RMSE values at 15% and 10% noise ratio
Noise ratio 15%
Noise amp. mult     0.1     0.2     0.3     0.4     0.5     0.6     0.7
RMSE_WM             1.59    1.55    1.42    1.70    1.83    2.34    2.55
RMSE_Med            1.83    2.04    2.13    2.22    2.36    2.74    2.92
RMSE_St             1.62    1.83    2.02    2.55    2.97    3.72    3.81
RMSE_WM/RMSE_Med    0.87    0.76    0.67    0.77    0.77    0.85    0.87
RMSE_WM/RMSE_St     0.98    0.85    0.70    0.67    0.61    0.63    0.67
Noise ratio 10%
Noise amp. mult     0.1     0.2     0.3     0.4     0.5     0.6     0.7
RMSE_WM             1.60    1.46    1.75    1.50    2.12    2.02    2.72
RMSE_Med            1.86    2.00    2.29    2.10    2.56    2.60    3.08
RMSE_St             1.66    1.84    2.13    2.36    3.01    3.32    4.23
RMSE_WM/RMSE_Med    0.86    0.73    0.77    0.72    0.83    0.78    0.88
RMSE_WM/RMSE_St     0.96    0.80    0.82    0.64    0.71    0.61    0.64

Table 14 Standard deviation values at 10% noise regarding the two median methods
Noise amp. mult     0.1     0.2     0.3     0.4     0.5     0.6     0.7
Std_WM              1.41    1.25    1.37    1.20    1.43    1.37    1.62
Std_Med             1.42    1.51    1.70    1.56    1.76    1.74    1.96
Std_WM/Std_Med      0.99    0.82    0.80    0.77    0.81    0.79    0.83

Table 15 L1 norm ratios for two data sets combined
Noise exposure percentage    min L1WM/L1St    max L1WM/L1St    avg L1WM/L1St
25%                          0.81             1.07             0.93
20%                          0.81             0.99             0.91
15%                          0.83             0.98             0.90
10%                          0.80             1.07             0.93

Table 16 L1 norm ratios for two data sets combined
Noise exposure percentage    min L1WM/L1Med   max L1WM/L1Med   avg L1WM/L1Med
25%                          0.63             0.86             0.74
20%                          0.67             0.86             0.77
15%                          0.73             0.84             0.78
10%                          0.72             0.85             0.77

8 Handling zero mean noises

The previously presented version of the proposed method was developed mainly for non-zero mean noises. In the following, a second, modified version of the method is introduced, whose main purpose is handling zero mean, normally distributed noise. This version of the method calls upon and uses Steiner's MFV values to correct the actual central element of the given data window.

8.1 Noise generation

Regarding the noise generation process, in the first step a general zero mean noise was added to the data matrix. To achieve this, normally distributed noise was generated for every row of data, with zero mean and a standard deviation of 1. In order to add outlier noise to a given percentage of points for the examination, additional zero mean, normally distributed noise was added randomly to 20, 15, 10 and 5 percent of the points, with amplitude multipliers of 0.1–0.7 in all such cases (as in the previously introduced version of the method). The standard deviation of this noise was always the mean of the current data row.

8.2 Modified version of the weighted median method

As with the previously introduced version of the method, the first steps are histogram filterings. First a median and then a mean value is generated from filtered windows, here however with a two-step filtering for both.

To produce the median value $m_s$, the histogram-based filtering is as follows.
Based on the set of values, the elements of the current data window are divided into two and into three ranges (bins) of equal width, and then two ratios are generated:
• the first ratio: the ratio of the largest and second largest domains out of 2,
• the second ratio: the ratio of the largest and second largest domains out of 3
(in both cases regarding the element count).

If the first ratio is greater than the second, the new set (D) is the domain with the highest element count of the 2 domains; otherwise it is the domain with the highest element count of the 3 domains. Finally, the value of $m_s$ is median(D).

To determine the value of $m_{e3}$, a second narrowing process is carried out. In this process, the window elements are first sorted by value and then divided into three ranges of equal width, based on the set of values. Here a third ratio is calculated as the ratio between the element count of the window and the summed element count of the thirds excluding the highest-valued third. Then a fourth ratio is calculated as the ratio between the element counts of the most populated and the second most populated bins. Thus, the third and fourth ratios are calculated differently than in the previous version of the method.

If the third ratio is greater than the fourth, we take the highest third (the bin with the highest values); otherwise we take the bin with the highest element count as the chosen set (E). Thus, a higher ratio value is used as an indication of a sharper cut.

We take this truncated set E and distribute its values into bins. The bin width is determined with Scott's rule (Scott 1979, 1992):

$$\frac{3.5 \cdot \mathrm{std}(E)}{\mathrm{numel}(E)^{1/3}}. \tag{24}$$

We must also determine the number of bins in order to be able to distribute all the values into them (a trivial step, given the bin width and the data values). We take the bin with the largest element count, and $m_{e3}$ is the average of the elements of the chosen bin.
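The two narrowing steps described above can be sketched as follows. This is an illustrative reading of the text, not the authors' implementation (the code is not published): the function names are hypothetical, an empty runner-up bin is assumed to give an infinite ratio, ties fall through to the "otherwise" branch, the standard deviation is taken as the sample standard deviation, and the number of bins is assumed to be the value range divided by Scott's width, rounded up, so the realised bin width is at most the width of Eq. (24).

```python
import numpy as np

def _top_two_ratio(counts):
    # Largest bin count over the second largest one.
    # Assumption: an empty runner-up bin counts as an infinitely sharp peak.
    top, second = np.sort(counts)[::-1][:2]
    return np.inf if second == 0 else top / second

def _members(values, edges, k):
    # Elements of bin k; like np.histogram, only the last bin is closed on the right.
    last = k == len(edges) - 2
    upper = values <= edges[k + 1] if last else values < edges[k + 1]
    return values[(values >= edges[k]) & upper]

def median_ms(window):
    """m_s: median of the most populated bin of the 2-bin or the 3-bin split."""
    v = np.asarray(window, float).ravel()
    c2, e2 = np.histogram(v, bins=2)   # equal-width bins over [min, max]
    c3, e3 = np.histogram(v, bins=3)
    if _top_two_ratio(c2) > _top_two_ratio(c3):
        d = _members(v, e2, int(np.argmax(c2)))
    else:
        d = _members(v, e3, int(np.argmax(c3)))
    return float(np.median(d))

def mean_me3(window):
    """m_e3: mean of the fullest Scott-rule bin of the chosen set E."""
    v = np.sort(np.asarray(window, float).ravel())
    c3, e3 = np.histogram(v, bins=3)
    ratio3 = v.size / (c3[0] + c3[1])   # window size over the two lower thirds
    ratio4 = _top_two_ratio(c3)
    k = 2 if ratio3 > ratio4 else int(np.argmax(c3))
    e = _members(v, e3, k)
    spread = np.std(e, ddof=1) if e.size > 1 else 0.0
    if spread == 0.0:                   # degenerate set: nothing left to bin
        return float(np.mean(e))
    width = 3.5 * spread / e.size ** (1.0 / 3.0)   # Scott's rule, Eq. (24)
    n_bins = max(1, int(np.ceil((e.max() - e.min()) / width)))
    counts, edges = np.histogram(e, bins=n_bins)
    return float(np.mean(_members(e, edges, int(np.argmax(counts)))))

# Hypothetical 3x3 window in which 30 and 31 play the role of outliers.
window = [10, 10, 11, 11, 12, 12, 13, 30, 31]
m_s, m_e3 = median_ms(window), mean_me3(window)
```

For this toy window, both values are determined by the seven low elements only: the sketch yields 11 for the median value and 10.5 for the mean value, so the two outliers do not influence either estimate.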
Since we have the values of $m_s$ and $m_{e3}$, we can replace the actual window's middle element with $m_s$ (forming $w_{r\_ms}$), and similarly with $m_{e3}$ (constructing $w_{r\_me}$). Finally, let $w_{r\_St}$ be the window with the MFV method's result at its centre.

In the next step, we concatenate $w_{r\_ms}$, $w_{r\_me}$ and $w_{r\_St}$ one by one with the original (noisy) actual window, forming $w_{u1}$, $w_{u2}$, $w_{u3}$.

Now we can calculate three gradient measures in the following way:

$$G(x, y) = \frac{\partial w_u}{\partial x} \cdot \frac{\partial w_u}{\partial x} + \frac{\partial w_u}{\partial y} \cdot \frac{\partial w_u}{\partial y}, \tag{25}$$

$$g = \frac{1}{X \cdot Y} \sum_{x \in X} \sum_{y \in Y} G(x, y). \tag{26}$$

In the formula, $g_1$ is the result of $g$ when using $w_{repl\_St}$, $g_2$ is the value of $g$ in the case of $w_{repl\_ms}$, and $g_3$ is the value of $g$ when using $w_{repl\_me}$.

If $\min(g_1, g_2, g_3)$ is $g_1$, then in the actual weight vector $[w_1\ w_2]$ the value of $w_1$ is 0 and the value of $w_2$ is 1. Thus, in this case only the result of the MFV method counts in the correction of the given data window. If $\min(g_1, g_2, g_3)$ is $g_2$, then $w_1$ is 0.15 and $w_2$ is 0.85. If $\min(g_1, g_2, g_3)$ is $g_3$, then $w_1$ is 0.4 and $w_2$ is 0.6. Thus, in this case, the weight of the MFV method's result is 0.6 for the given window.

As we can see, in all of these cases we are weighting the Steiner MFV method's results, increasing or decreasing their weight in the correction of the actual data window's central element.

Similar to the previous version of the method, we get poor results if the difference $m_{e1} - m_{e2}$ is large. In order to be able to handle this, here we have to calculate both $m_{e1}$ and $m_{e2}$ (in the same way as in the previous version), and if the difference is greater than 2% of the mean of the raw data, then $w_1$ should be 0 and $w_2$ should be 1.

8.3 Results of the modified version of WM filtering procedure

Table 17 L1 norm ratios on first data set
25% noise exposure
L1WM/L1St     0.991   0.987   0.996   1.004   0.991   0.997   0.993
L1WM/L1Med    0.881   0.847   0.781   0.798   0.777   0.704   0.760
20% noise exposure
L1WM/L1St     0.978   1.005   0.998   0.975   0.990   0.994   0.997
L1WM/L1Med    0.903   0.846   0.822   0.795   0.818   0.802   0.807
15% noise exposure
L1WM/L1St     0.987   0.982   0.993   0.978   0.991   0.988   0.978
L1WM/L1Med    0.917   0.879   0.816   0.835   0.861   0.844   0.840
10% noise exposure
L1WM/L1St     0.979   0.975   0.979   0.972   0.983   0.982   0.970
L1WM/L1Med    0.922   0.903   0.813   0.833   0.870   0.852   0.878

Table 18 L1 norm ratios on second data set
25% noise exposure
L1WM/L1St     0.987   0.979   0.994   0.989   0.968   0.991   0.987
L1WM/L1Med    0.862   0.840   0.897   0.755   0.786   0.775   0.706
20% noise exposure
L1WM/L1St     0.979   0.994   0.990   0.982   1.000   0.987   1.001
L1WM/L1Med    0.889   0.843   0.858   0.768   0.793   0.815   0.748
15% noise exposure
L1WM/L1St     0.995   0.993   0.983   0.987   0.993   0.987   1.000
L1WM/L1Med    0.919   0.891   0.855   0.809   0.799   0.868   0.857
10% noise exposure
L1WM/L1St     1.005   0.995   0.983   0.979   0.990   0.989   0.994
L1WM/L1Med    0.913   0.902   0.884   0.871   0.890   0.829   0.860

Table 19 L1 norm ratios on third data set
25% noise exposure
L1WM/L1St     1.000   0.994   1.003   0.989   1.001   0.999   0.990
L1WM/L1Med    0.904   0.852   0.798   0.939   0.876   0.941   0.882
20% noise exposure
L1WM/L1St     0.995   0.997   0.995   0.995   0.994   0.982   0.991
L1WM/L1Med    0.900   0.915   0.815   0.836   0.815   0.856   0.858
15% noise exposure
L1WM/L1St     0.995   0.991   0.986   0.984   0.996   0.990   0.998
L1WM/L1Med    0.943   0.925   0.873   0.924   0.852   0.888   0.901
10% noise exposure
L1WM/L1St     1.001   0.984   0.991   0.986   1.002   0.994   0.976
L1WM/L1Med    0.941   0.940   0.957   0.925   0.936   0.931   0.887

In Table 17, L1 norm ratios can be seen for the first data set, comparing the WM method's results with both the MFV method's results and those of the original median method. In the former case, the WM method performed better in 26 of the 28 test cases. In those cases, the average of the L1 norm ratio was 0.985, thus use of the method resulted in 1.4% lower L1 norm values on average. In the remaining two cases, the mean of the ratio was 1.005, thus the WM method performed 0.5% worse in those two cases.
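The summary figures quoted for the comparison with the MFV method can be re-derived directly from the L1WM/L1St ratios of Table 17. The short check below is only an illustrative sketch: it re-uses the tabulated values and treats a ratio below 1 as a case where the WM method is better; the variable names are hypothetical.

```python
# L1WM/L1St ratios of Table 17 (first data set), seven values per noise exposure.
ratios_wm_st = {
    "25%": [0.991, 0.987, 0.996, 1.004, 0.991, 0.997, 0.993],
    "20%": [0.978, 1.005, 0.998, 0.975, 0.990, 0.994, 0.997],
    "15%": [0.987, 0.982, 0.993, 0.978, 0.991, 0.988, 0.978],
    "10%": [0.979, 0.975, 0.979, 0.972, 0.983, 0.982, 0.970],
}

all_ratios = [r for row in ratios_wm_st.values() for r in row]
better = [r for r in all_ratios if r < 1.0]    # WM gives the lower L1 norm
worse = [r for r in all_ratios if r >= 1.0]    # MFV gives the lower (or equal) L1 norm

mean_better = sum(better) / len(better)
mean_worse = sum(worse) / len(worse)
best = min(all_ratios)

print(len(better), len(worse))
print(round(mean_better, 4), round(mean_worse, 4), best)
```

The counts come out as 26 and 2, the mean of the 26 favourable ratios is about 0.9855 and the mean of the other two is 1.0045, in line with the rounded values of 0.985 and 1.005 given in the text.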
The best L1 norm ratio value was 0.97, thus the WM method gave a 3% lower L1 norm value in that particular noise reduction. Regarding the comparison with the original median method, the WM method performed better in all of the 28 cases, by 16.4% on average (0.836 average L1 norm ratio). Here the best result was a 29.5% improvement (0.705 L1 norm ratio).

Table 18 shows results in the same structure as the previous one, here for the second data set. Compared with Steiner's MFV, the WM method gave better results according to the L1 norm in 23 cases (by 1.25% on average), and in the remaining 5 cases it performed worse by 0.21% on average. At its best, the WM method gave a 3.12% lower L1 norm value. In comparison with the other median method, WM performed better in all of the cases (16% on average, 29.4% at most).

In Table 19 the L1 norm ratios of the third data set can be seen. Comparing WM's results with those of the MFV method, the former performed better in 23 of the 28 cases (by 1% on average, 2.4% at most), and the MFV method was superior in 5 cases (by 1.4% on average). Regarding the comparison with the conventional median method, WM performed better in all of the cases, by 10.7% on average and 20.2% at most.

9 Conclusions

The effectiveness of the histogram-based weighted median procedure described above has been demonstrated for noise elimination in digital elevation model data. The main purpose of the method is eliminating outlier noise in data matrices, especially if a high percentage of the matrix points are contaminated with outlier noise.

Averaged over the different noise amplitudes and noise exposure percentages investigated, the WM method outperformed the standard median filtering procedure on the different data sets by 14–23% regarding the data distance calculated with the L1 norm for eliminating non-zero mean noises. The version of the method for filtering zero mean noises performed better than the conventional median filter by 14.3% on average.
Beyond general refinement and optimisation of the method, there is room for improvement, particularly in handling the low noise exposure cases more effectively.

Funding Open access funding provided by University of Miskolc. None

Data availability Because of data confidentiality, the experimental data are not published.

Code availability Because of data confidentiality, the code is not published.

Declarations

Conflict of interest The authors have no conflicts of interest to declare that are relevant to the content of this article.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Dobróka M, Gyulai Á, Ormos T, Csókás J, Dresen L (1991) Joint inversion of seismic and geoelectric data recorded in an underground coal mine. Geophysical Prospecting 39(5):643–665. ISSN 0016-8025. https://doi.org/10.1111/j.1365-2478.1991.tb00334.x

Dobróka T (2021) An MFV-based image processing filter and its application in seismic tomographic images. Acta Geodaetica et Geophysica. https://doi.org/10.1007/s40328-021-00351-7

Huang TS, Yang GJ, Tang GY (1979) A fast two-dimensional median filtering algorithm. IEEE Trans Acoust Speech Signal Process 27(1):13–18. https://doi.org/10.1109/TASSP.1979.1163188

Scott DW (1979) On optimal and data-based histograms. Biometrika 66:605–610

Scott DW (1992) Multivariate density estimation: theory, practice, and visualization. John Wiley & Sons, New York

Steiner F (1991) The Most Frequent Value. Introduction to Modern Conception Statistics, Budapest. ISBN 978-9630556873

Stone DC (1995) Application of median filtering to noisy data. Can J Chem 73(10):1573–1581. https://doi.org/10.1139/v95-195

Szabó NP, Balogh GP, Stickel J (2018) Most frequent value-based factor analysis of direct-push logging data. Geophys Prospect. https://doi.org/10.1111/1365-2478.12573

Szabó N, Balogh GP (2016) Most frequent value based factor analysis of engineering geophysical sounding logs. 78th EAGE Conference and Exhibition 2016. Houten, Holland: European Association of Geoscientists and Engineers (EAGE), Paper: Tu SBT4 12, 5 p. https://doi.org/10.3997/2214-4609.201600796

Szűcs P, Zákányi B (2007) Applying most frequent value (MFV) in hydrogeological modelling. Mérnökgeológia-Kőzetmechanika 161–174

Zhang J (2017) Most frequent value statistics and distribution of Li abundance observations. Mon Not R Astron Soc 468(4):5014–5019. https://doi.org/10.1093/mnras/stx627

Journal

"Acta Geodaetica et Geophysica"Springer Journals

Published: Dec 1, 2021

Keywords: Noise reduction; Median filter; Digital elevation model; Weighted median

References