Dynamic Fuzzy Rule-based Source Selection in Distributed Decision Fusion Systems

FUZZY INFORMATION AND ENGINEERING 2018, VOL. 10, NO. 1, 107–127
https://doi.org/10.1080/16168658.2018.1509524

F. Fatemipour and M. R. Akbarzadeh-T
Department of Computer Engineering, Center of Excellence on Soft Computing and Intelligent Information Processing, Ferdowsi University of Mashhad, Mashhad, Iran

ARTICLE HISTORY: Received 16 January 2018; Revised 15 March 2018; Accepted 19 April 2018

KEYWORDS: Fuzzy linguistic rule base; distributed decision-making; dynamic classifier selection; classifier selection; decision fusion

ABSTRACT
A key challenge in decision fusion systems is to determine the best performing combination of local decision makers. This selection process can be performed statically at the training phase or dynamically at the execution phase, taking into consideration various features of the data being processed. Dynamic algorithms for the selection of competent sources are generally more accurate, but they are also computationally more intensive and require more memory. In this research, we propose a fuzzy rule-based approach for dynamic source selection (FDSS) that compresses the knowledge from local sources using a divide-and-conquer strategy along with the basic concepts of coverage and truth value criteria, leading to less memory requirement and faster processing. A top-down approach to FDSS is then used to reach a parameter-free algorithm, i.e. one that avoids the restrictive parameters/threshold settings of FDSS. The rule bases in both approaches are created recursively and use the conditional probabilities of each class’s correctness as the rule’s weight. The proposed approaches are compared against several competing dynamic classifier selection methods based on local accuracy.
Results indicate that the proposed fuzzy rule structures are generally faster and require less memory, while they also lead to more accurate decisions from the uncertain decisions of multiple sources.

CONTACT F. Fatemipour farnoosh.fatemipour@stu-mail.um.ac.ir

© 2018 The Author(s). Published by Taylor & Francis Group on behalf of the Fuzzy Information and Engineering Branch of the Operations Research Society of China & Operations Research Society of Guangdong Province. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Nowadays, data are produced at staggering rates due to the widespread communication and sensor technologies around the world. While the availability of data is an enormous opportunity for making better decisions, storing and processing it presents a great challenge. A reasonable approach here could be to keep only the most relevant data and to process only the most appropriate. If so, data could be gainfully transformed to information and subsequently processed to knowledge and ultimately to wisdom, with wisdom here being defined as the essence of what is most relevant, widely appropriate and most lasting. However, it is non-trivial to define the concepts ‘relevance’, ‘appropriateness’, and ‘lasting’. What complicates this process further is when these data are not in one place, which is a common occurrence due to the distributed nature of the available data today. These data are often provided from different sources and contain different aspects of information with concerns for their privacy.

Figure 1. A simple visualization of a decision-making center that is connected to four information sources where each information source is connected to a different data source.
Besides, it is virtually impossible to aggregate all data in one place for processing and mining, due to databases’ high volume, variety and velocity. Accordingly, handling vast data sets and developing distributed data mining algorithms that analyze and summarize distributed data into usable knowledge is highly challenging. In this process, decision fusion is a key methodology for making decisions with vast amounts of data located at different sources. The main concerns in developing such a decision-making system are handling the inconsistency and uncertainty of decisions, differing types of local sources and competency levels, and the variety of decision fusion strategies.

Decision fusion is the process of combining the decisions made by multiple information sources. These sources can be human experts, classifiers, regressors, or any other kind of decision maker. A simple visualization of a decision fusion center is depicted in Figure 1. Decision fusion approaches let each source make its own decision with its local information. The main purpose of such approaches is to make effective use of the disparate local sites without direct access, in order to preserve their privacy. Three main phases are typically included in fusion-based decision-making systems: generating local sources, pruning them and fusing their decisions [1]. Decision fusion has many advantages for data mining, including the possibility of using a set of low-cost computers to train a set of algorithms with subsets of the whole training data that fit in their main memory. In this way, it would also be possible to apply algorithms on very large data sets in a feasible time.

There are a number of challenges in distributed decision fusion. First of all, the local systems are individually trained by the separately collected data obtained from each site.
Based on the specifics of gathering information such as data location and time, these local data may not cover the entire feature space. These differences also lead to different views about the data. Moreover, inconsistencies commonly exist in a group of separately made decisions from different views and localities. As a result, some sources may be inefficient and reduce the performance of the system [2]. Too many of these poor sources can suppress correct predictions of good sources [3]. For this reason, some local sources are eliminated before fusion and the final decision is made by using a selected subset of sources [4]. In this paper, we focus on effective selection of sources.

The current approaches for selecting local sources in the literature generally fall into two categories: static and dynamic. Static approaches assign a region of competence to each source in the feature space at the training phase, while dynamic approaches determine the competence regions during the execution phase and select the best sources using the training data, considering the data being analyzed. In other words, the accuracy of each source is estimated in a local area surrounding the data, named ‘region of competence’ or ‘local accuracy’ [5]. In general, dynamic approaches are more accurate than static approaches [6]. The most common strategy for dynamic approaches in the literature so far uses the neighborhood of the data [1], which requires keeping all training data in memory and searching them for each test data in order to find its neighbors. Determining a specific number of neighbors for each data is very time consuming [7] and the final decision depends on the size and shape of the neighborhood [1]. Therefore, dynamic approaches require larger memory and are more computationally intensive compared to static methods [7,8].
This is particularly so in applications with larger scale and higher computational complexity [1,8]; as a result, their applicability is often criticized [1] and their benefits have remained out of reach.

The objective of the present work is to use fuzzy logic to compress the knowledge about the expertise of sources into a single rule base that fits well in memory while also providing fast and accurate results. Specifically, we propose two algorithms which extract useful knowledge about the competence regions and performances of local sources and store this knowledge in the form of a fuzzy rule base. Upon arrival of new data, both algorithms gradually partition the input space and define one or more rules for each partition. Each rule in the final rule base assigns a competence region to a local source. Since the firing degree of each rule depends on the location of the data, the contribution of each source in the final decision is determined based on the data at hand. Hence, the above methods are categorized as dynamic selection methods.

The first proposed algorithm, Fuzzy Distributed Source Selection (FDSS), dynamically weights the local sources for each new data using a rule-based system which represents the previously stored knowledge about the competence regions of sources. This rule-based system is constructed using the previously available data in the fusion center, called the validation set. A divide-and-conquer strategy is used to recursively partition the input space and determine the competence regions of the local sources to construct the rule base. Inspired by fuzzy rule measures and linguistic summarization concepts, truth value and coverage [9], we propose two measures for defining the conditions of recursion termination. We prevent the generation of redundant rules by defining two appropriate thresholds for these two measures.
In order to control the contribution of each source in the final decision, we also provide the rule base with an extra parameter that controls the firing degrees of rules. We use the conditional probability of correctness of each rule for each class, estimated over the validation set, and create the rule base in a probabilistic form. The second proposed algorithm, top-down FDSS, differs from FDSS by omitting the pre-defined threshold values while constructing the rule base in a top-down manner. This methodology does not require the pre-defined parameters and conditions of the first method and hence is no longer affected by the threshold settings.

Table 1. List of symbols.

| Symbol | Description |
|---|---|
| $x$ | Vector of input data, $x = [x_1, \ldots, x_d]$ |
| $L_x$ | Label of $x$ |
| $O_{S_i,x_k}$ | Output of source $S_i$ for data $x_k$: $O_{S_i,x_k} = [O_{S_i,x_k,1}, \ldots, O_{S_i,x_k,j}, \ldots, O_{S_i,x_k,m}]$, where $O_{S_i,x_k,j} = 1$ if $S_i$ labels $x_k$ to class $j$ and $0$ otherwise, $j = 1, \ldots, m$ |
| $\alpha_{S_i,x_k}$ | $1$ if $S_i$ labels $x_k$ correctly, $0$ otherwise |
| $\beta_{ij}$ | $\{x_k \mid \alpha_{S_i,x_k} = 1 \;\&\; L_{x_k} = C_j\}$ |
| $\bar{\beta}_{ij}$ | $\{x_k \mid \alpha_{S_i,x_k} = 0 \;\&\; L_{x_k} = C_j\}$ |
| $S_i$ | Source $i$, $i = 1, \ldots, n$ |
| $T_i$ | Truth value of $S_i$ |
| $V_i$ | Coverage of $S_i$ |
| $C_j$ | Class $j$, $j = 1, \ldots, m$ |
| $\lvert * \rvert$ | Cardinality of set $*$ |
| $P_{S_i}$ | Vector of conditional probabilities, $P_{S_i} = [P_1, \ldots, P_j, \ldots, P_m]$, $P_j = P(C_j \mid S_i)$, $j = 1, \ldots, m$ |
| $R$ | Number of rules in fuzzy rule-based system |
| $A_i$ | Average of $B_i$. Used as center of the corresponding rule. |

Training and storing a fuzzy rule-based system for weighting and selecting sources carries several benefits. Using fuzzy logic, we combine the merits of dynamic selection methods with those of static methods. While less memory is typically required for storing the rule-based system than for storing the complete data set, we also omit the computationally intensive process of searching the whole training data set to find the nearest neighbors of each test data.
Because no single crisp region of competence is defined for each source, and each source contributes to the estimated accuracies according to its adjacency to the data, the problems arising from crisp boundaries are avoided [10]. Using fuzzy logic also makes it easier to cope with the uncertainties that often exist in the outputs of local sources. The proposed approaches also benefit from the interpretability and representation ability of fuzzy rule-based systems, which make them suitable for real-life applications.

The rest of this paper is organized as follows. For simplification, symbols used in the paper are shown in Table 1. In Section 2, we review the current algorithms in the literature for selecting sources in fusion systems. In Section 3, the details of the proposed approaches are explained. Experimental results are included in Section 4. Finally, Section 5 concludes the paper.

2. Literature review

Static and dynamic methods are the two main categories of source selection strategies in decision fusion systems. The first category, static selection, selects the best decision maker or decision makers at the training phase. The second category, dynamic selection, applies the selection process at the testing phase, involving the input data. Yin et al. [11] classify decision combination methods into two main groups. The first group aims to train and combine an ensemble of classifiers during the learning process, e.g. Boosting [12] and Bagging [13]. The selection phase of this group commonly appears in the literature under different names, such as ensemble pruning, ensemble selection or ensemble thinning [14]. The second group combines the results of multiple available decision makers to solve the targeted problem. This group often trains a meta-learner to combine the component decision makers intelligently [11].
Although this paper focuses on the second group, we explore current literature in both groups from the viewpoint of source selection strategies. The rest of this section is dedicated to a brief literature review on each category, emphasizing dynamic methods, which this paper focuses on.

2.1. Static selection

The selection phase in static selection methods aims to find the subset of classifiers with optimal accuracy [8]. Different methods have been proposed in the literature to select one subset of classifiers. Based on the categorization of ensemble selection methods [14], one of the simplest methods for static selection is the ranking-based approach, which chooses the N best performing classifiers or weighted N best performing classifiers over the training data set [6,15]. Greedy approaches [16], which iteratively add classifiers to or remove them from the ensemble with the aim of increasing the overall accuracy, are also used in the literature. Genetic algorithms are also popular for selecting and/or weighting local sources [17–19]. Clustering the input space into disjoint regions and assigning one dominant local classifier to each region is also used in the literature, e.g. [7,20]. The authors have used fuzzy logic earlier in [21] for weighting local sources in a multi-source decision fusion problem. That work trains a fuzzy rule base which approximates the reliabilities of sources over the input space. The estimated reliabilities are used as weights of local sources. In ensemble-based multiple classifier systems, different criteria such as incorrect predictions of local classifiers [22], diversity [15,23], independency [24] or a combination of measures such as diversity together with sparsity [11,25] are used to prune the ensemble and select a subset of classifiers. Some of these criteria are discussed in [26].

2.2. Dynamic selection

Using the neighborhood of the pattern to estimate the accuracy of multiple classifiers, under the notion of ‘local accuracy’ or ‘region of competence’, was first introduced by Woods et al.
[5] as the DCS-LA algorithm. This algorithm estimates the accuracy of each source in the vicinity of the test pattern using its K nearest neighbors. This algorithm was further extended in the literature. In [27], multiple classifier behavior (MCB) is used to determine the K nearest neighbors of the test pattern. In [28], linear programming is used to weight sources in the neighborhood of the test pattern. In [4], the competence of each source in the neighborhood is determined by comparing it to a random classifier. In [29], KNNE is used instead of KNN, which selects the K nearest neighbors of each class separately. Cevikalp and Polikar [30] use quadratic programming to weight local sources in the neighborhood of the test pattern based on their accuracy.

In [1], it is emphasized that using K nearest neighbors for defining the local region of competence has several disadvantages. First of all, using the concept of neighborhood, it is assumed that the local accuracy of each source is constant in the region. Also, the reliability of the results is deeply dependent on the number of points in the neighborhood. In other words, the result depends on the shape (distance measure) and size (number of points) of the neighborhood. Some attempts have been made to reduce these disadvantages. Considering that with KNN the final decision depends on the value of K, Zou et al. [31] proposed to add another phase to the algorithm for selecting a suitable value of K. This selection is performed using the margin error. Ko et al. [32] also proposed to reduce the number of neighbors considered in estimating local accuracy until at least one source is found that correctly classifies all the neighbors. Although these attempts have improved the performance of neighborhood-based methods, adding an extra phase for deciding the number of neighbors is time consuming and increases the complexity of the system.
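The neighborhood-based estimate described above can be illustrated with a short Python sketch. It computes, for one test pattern, the fraction of its K nearest stored points that each source labels correctly, and then picks the best source. This is a simplified reading of the local-accuracy idea, not the exact procedure of [5]; the function name `local_accuracy`, the Euclidean distance and the toy data are assumptions made for illustration.

```python
import numpy as np

def local_accuracy(x, X_val, correct, k=3):
    """Estimate each source's accuracy on the k stored points nearest to x.

    X_val   : (N, d) patterns that must be kept in memory
    correct : (N, n) 0/1 matrix, correct[i, s] = 1 if source s labels X_val[i] correctly
    """
    dists = np.linalg.norm(X_val - x, axis=1)  # Euclidean distance (an assumption)
    nearest = np.argsort(dists)[:k]            # requires a scan of all stored data
    return correct[nearest].mean(axis=0)       # fraction correct, per source

# Hypothetical toy data: source 0 is reliable near 0, source 1 near 1.
X_val = np.array([[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]])
correct = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])

acc = local_accuracy(np.array([0.05]), X_val, correct, k=3)
best_source = int(np.argmax(acc))
```

The sketch makes the two costs discussed in the text visible: `X_val` has to stay in memory, and every query sorts the distances to all stored points.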
Ensemble-based methods also apply dynamic selection using different criteria for effective selection of local sources. Calculating confidence measures rather than performance, Dos Santos et al. [8] proposed to select an ensemble with less ambiguity from a pool of ensembles, which increases the degree of certainty of the final decision about the test pattern. Lysiak et al. [33] propose dynamic weighting of sources in which sources are first eliminated using a diversity measure and then the remaining sources are weighted using their accuracy over the entire data set. Considering both local accuracy and diversity, Giacinto and Roli [26] propose to select the most accurate classifiers in the vicinity of the test data together with the most diverse set of classifiers between them. Li et al. [34] consider error diversity measures to select from the initial pool of classifiers and then use local accuracy for selecting the final classifier. Swiderski et al. [35] propose to use the area under the curve (AUC) of the receiver operating characteristic of each classifier as a measure to select a proper subset of classifiers. Nazemi et al. [36] use fuzzy logic for dynamically weighting ensemble members in a loss given default modeling problem. They create a fuzzy rule base using clustering methods and use it to weight trained regression sources dynamically. Several combination formulas are tested and compared in that paper. Ykhlef and Bouchaffra [37] consider game theory algorithms and solve the selection problem as a coalitional game.

When dealing with a vast amount of data, it is hard to find one local source which has expertise over the entire input space. Applying a selection phase before combination, in order to use locally accurate sources in the vicinity of the input pattern, leads to more precise results. Dynamic methods involve the input pattern in the process of selecting sources by defining a local region of competence in the neighborhood of the input pattern.
Since selecting a weak source can significantly reduce the performance of the system, determining this region directly affects the performance of the system [28]. Until now, the most used method for dynamically defining the local accuracies in the literature is the K nearest neighbor method [1]. Dynamic methods so far are memory consuming and computationally intensive. Even though the dynamic selection methods presented so far have shown remarkable results, there is a notable lack of methodologies that offer the high accuracy of dynamic algorithms while avoiding the time-consuming process of finding neighbors and the high memory consumption. In this paper, we aim to expand the limits of dynamic selection methods to better use their advantages in effective selection of local sources.

3. The proposed fuzzy algorithm for dynamically selecting local sources

3.1. The proposed FDSS algorithm

Let $\{S_1, S_2, \ldots, S_n\}$ be a set of $n$ local decision makers, each trained on a separate data set. The feature vector $x = [x_1, x_2, \ldots, x_d]$ is presented to be labeled into one of $m$ classes $[c_1, c_2, \ldots, c_m]$. After all sources are trained, one separate data set called the validation set is labeled by all the local sources. The labeled data set is then used as the training set at the fusion center. Figure 2 shows this process. The proposed approach tries to generate a fuzzy rule base that contains useful information about the competence regions of local sources and their performances in order to dynamically assign a proper weight to each source for each given data. To this end, the algorithm searches for the local regions in the search space in which there is at least one decision maker with high performance. After the rule base is constructed, each rule defines a competence region and specifies an efficient decision maker for that region.
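As a small illustration of the labeled validation set that the fusion center works with, the following Python sketch encodes the sources' predictions as the one-hot outputs $O$ and the correctness indicators $\alpha$ listed in Table 1. The function name `encode_source_outputs`, the integer class coding (0 to m-1) and the toy predictions are assumptions for illustration only.

```python
import numpy as np

def encode_source_outputs(pred, y_val, m):
    """Encode source predictions on the validation set.

    pred  : (n, N) integer labels in 0..m-1 given by n sources to N validation points
    y_val : (N,) true labels of the validation points
    Returns O with shape (n, N, m) (one-hot outputs) and alpha with shape (n, N)
    (1 where the source labels the point correctly, 0 otherwise).
    """
    O = np.eye(m, dtype=int)[pred]          # one-hot row per (source, point)
    alpha = (pred == y_val).astype(int)     # broadcast comparison against true labels
    return O, alpha

# Hypothetical toy case: two sources, three validation points, two classes.
pred = np.array([[0, 1, 1],
                 [0, 0, 1]])
y_val = np.array([0, 1, 1])
O, alpha = encode_source_outputs(pred, y_val, m=2)
```

Only `alpha`, the validation patterns and their labels are needed to evaluate candidate rules; `O` is used later when the rule outputs are combined with the sources' decisions.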
The proposed algorithm constructs the rule base iteratively and no pre-defined number of regions is necessary. The pseudo code of the algorithm, under the name FDSS, is shown in Figure 3. FDSS denotes the proposed fuzzy rule-based dynamic source selection approach. In the following, the training and testing phases of the algorithm are explained in detail. Since the proposed algorithm constructs the rule base iteratively, its convergence is discussed at the end of this section.

Figure 2. Training the local sources and the decision-making center with separate data sets. The decision-making center is also provided with outputs of local sources for the validation set.

Figure 3. First FDSS proposed algorithm uses coverage and truth value as the terminating conditions in a recursive manner.

3.1.1. Training phase in the proposed FDSS algorithm

Inspired by [38], each iteration of the algorithm at the training phase divides the feature space into two regions, aiming at finding local regions with powerful decision makers and assigning a fuzzy rule to each such region. The added rule is in the following form:

IF $x$ is $A$ THEN SELECT $S$ with $P_S^A$.

This rule specifies that, if $x$ is in the region $A$, then the result of decision maker $S$ would be correct with probability $P_S^A$. $P_S^A$ is the conditional probability of correctness of $S$ given each class.

Selection algorithms are prone to overfitting [39,40]. Overfitting happens when the algorithm that selects the local sources fits the training data set so well that, while being accurate for the training data set, its selections fail to make proper decisions during the execution. For the generated rules to present high reliability and to prevent overfitting, each region can be turned into a rule only if the following conditions are met:
(1) There exists at least one source with high correctness probability for the data in that region.
(2) There exists a sufficient number of data in that region (for the sake of generalization ability of the final rule base).

To evaluate the fulfillment of these conditions, we propose two measures, truth value and coverage, based on quality measures introduced in [13]. The proposed measures are described in the following.

3.1.2. Degree of truth value (T)

T can be viewed as the rate of the data satisfying the consequent among those which satisfy the antecedent [9]. Since each rule in our problem specifies one of the local decision makers that is suitable for making a decision about the incoming data, the truth value of the rule depends on the performance of the specified decision maker. In other words, the degree of truth of each rule is related to the correctness degree of the specified decision maker. T increases as more data satisfying the antecedent part (located in the determined competence region) also satisfy the consequent part (are labeled correctly by the specified decision maker). Hence, we formulate T as follows,

$$T = \frac{\sum_{k} \min\left(\mu_A(x_k), \alpha_{S,x_k}\right)}{\sum_{k} \mu_A(x_k)}. \tag{1}$$

In this formula, the numerator sums the memberships of the data which the corresponding decision maker labels correctly, and the denominator normalizes this sum by the total membership. As the above formula shows, we have considered the correctness of the decisions in the training data set.

3.1.3. Degree of coverage (V)

The coverage value specifies the generalization ability of the rule. The degree of sufficient coverage, V, describes whether the rule is supported by enough data [9]. V increases as more data satisfy both antecedent and consequent parts. As V increases, the generality of the rule increases.
Since each rule in our problem indicates a competence region and a decision maker, we consider its coverage value as the number of training data that the source labels correctly in the specified region, as shown in (2),

$$V = \sum_{k} \alpha_{S_i,x_k}. \tag{2}$$

Two threshold values are initialized at the beginning of the algorithm for the truth and coverage values above. At each iteration, each divided region turns into a rule if there exists at least one source whose truth value for the corresponding data exceeds the desired threshold. If no source satisfies this condition, the coverage measure is checked. When coverage is below the threshold for all of the sources, further division of the space would lead to rules that are not supported by enough training data and hence are not efficient for the rule base. Therefore, the division is stopped. To increase the diversity of the sources included in the rule base, all sources with a truth value higher than the average of truth values in that region are added to the rule base. Then the algorithm breaks the chain of recursion. Otherwise, the process repeats by further dividing the area.

A conditional probability vector in the following form is assigned to each rule in the final rule base:

$$P_S = [P_1, \ldots, P_j, \ldots, P_m], \quad j = 1, \ldots, m, \qquad P_j = P(C_j \mid S_i) = \frac{\lvert\beta_{ij}\rvert}{\lvert\beta_{ij}\rvert + \lvert\bar{\beta}_{ij}\rvert}. \tag{3}$$

This vector assigns a probability to source $S_i$ per class. Whenever a rule uses a source to label data to class $C_j$, $P_j$ indicates the probability that this decision is correct. In this way, for each new data, the correctness probabilities of the decisions of sources are considered in the final decision in addition to its membership degree to local competence regions.

At the end, the final constructed rule base includes $R$ rules in the following form:

Rule$^1$: IF $x_k$ is $A^1$ THEN SELECT $s^1$ with $P^1$,
···
Rule$^R$: IF $x_k$ is $A^R$ THEN SELECT $s^R$ with $P^R$,

such that $s^r \in \{S_1, \ldots, S_n\}$, $r = 1, \ldots, R$, is one of the base decision makers.
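The three quantities that drive rule construction can be sketched in a few lines of Python. The sketch follows Equations (1) to (3) as read from the text; it assumes that the membership degrees $\mu_A(x_k)$ have already been computed for a candidate region, that coverage is counted over the points assigned to that region, and that classes are coded 0 to m-1. The function names and toy values are hypothetical.

```python
import numpy as np

def truth_value(mu, alpha):
    # Eq. (1): sum_k min(mu_A(x_k), alpha_k) / sum_k mu_A(x_k)
    return np.minimum(mu, alpha).sum() / mu.sum()

def coverage(alpha, in_region):
    # Eq. (2) as read here: number of points in the region the source labels correctly
    return int(alpha[in_region].sum())

def class_probabilities(alpha, y, m):
    # Eq. (3): P_j = |beta_ij| / (|beta_ij| + |beta_bar_ij|), computed per true class j
    P = np.zeros(m)
    for j in range(m):
        n_j = np.sum(y == j)              # points whose true label is class j
        if n_j > 0:
            P[j] = np.sum((alpha == 1) & (y == j)) / n_j
    return P

# Hypothetical toy region with three validation points and one source.
mu = np.array([1.0, 0.5, 0.5])            # memberships mu_A(x_k), assumed given
alpha = np.array([1, 0, 1])               # 1 where the source is correct
y = np.array([0, 0, 1])                   # true labels
in_region = np.array([True, True, False])

T = truth_value(mu, alpha)
V = coverage(alpha, in_region)
P = class_probabilities(alpha, y, m=2)
```

In a recursive construction along the lines described above, `T` would be compared with the truth threshold first and `V` with the coverage threshold only when no source passes the truth test.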
$A^r$ is a vector showing the center of $\beta$ in the competence region that the $r$th rule specifies.

3.1.4. Decision-making process in the proposed method

After the training phase, an unseen pattern is received for processing. The output of each rule suggests one source with a vector $\omega$, which is the result of multiplying the firing value of the rule by its conditional probability vector. The result of the rule base is a matrix $W_{Rules}$ of size $R \times m$, as below,

$$W_{Rules} = \begin{bmatrix} \omega_{s^1,c_1} & \cdots & \omega_{s^1,c_m} \\ \omega_{s^2,c_1} & \cdots & \omega_{s^2,c_m} \\ \vdots & & \vdots \\ \omega_{s^R,c_1} & \cdots & \omega_{s^R,c_m} \end{bmatrix}. \tag{4}$$

$W_{Rules}$ is then set to the element-by-element product of itself and the matrix of the results of the sources,

$$W_{Rules} = \begin{bmatrix} \omega_{s^1,c_1} & \cdots & \omega_{s^1,c_m} \\ \omega_{s^2,c_1} & \cdots & \omega_{s^2,c_m} \\ \vdots & & \vdots \\ \omega_{s^R,c_1} & \cdots & \omega_{s^R,c_m} \end{bmatrix} * \begin{bmatrix} O_{s^1,x_k,1} & \cdots & O_{s^1,x_k,m} \\ O_{s^2,x_k,1} & \cdots & O_{s^2,x_k,m} \\ \vdots & & \vdots \\ O_{s^R,x_k,1} & \cdots & O_{s^R,x_k,m} \end{bmatrix}. \tag{5}$$

In (5), $W_{Rules}$ is multiplied by the result of each source. $O_{s^r,x_k,j}$ equals one if the corresponding source labels $x_k$ to class $j$ and zero otherwise. Therefore, $W_{s_i,c_j}$ turns into zero if $s_i$ labels the data to a class other than $c_j$. Then we compute the weight for each source per class as below,

$$W = \begin{bmatrix} W_{s_1,c_1} & \cdots & W_{s_1,c_m} \\ W_{s_2,c_1} & \cdots & W_{s_2,c_m} \\ \vdots & & \vdots \\ W_{s_n,c_1} & \cdots & W_{s_n,c_m} \end{bmatrix}. \tag{6}$$

$W_{s_i,c_j}$ specifies the weight of source $S_i$ for class $C_j$, which is computed as below,

$$W_{s_i,c_j} = \max_{r \,\mid\, s^r = s_i} \omega_{s^r,c_j}. \tag{7}$$

As Equation (7) shows, the determined weight of $S_i$ for class $C_j$ is the maximum output value of the rules for that specific source and class.

Figure 4. The inference process at the decision-making center for new data.

At the end, we compute the weight of each class as the maximum weight among all sources. If there exists one class which strongly dominates other classes, that class is selected as the final decision, as in [41]. Otherwise, we compute the weight of each class
as the average of weights in the rule base as shown in (8). Then, the class with maximum weight is selected as the final decision, as below,

$$W_{c_j} = \frac{\sum_{r=1}^{R} \left( W_{s^r,c_j} > \operatorname{mean}_r W_{s^r,c_j} \right)}{\left\lvert W_{s^r,c_j} > \operatorname{mean}_r W_{s^r,c_j} \right\rvert}. \tag{8}$$

The process of making a decision about the new data is shown in Figure 4.

3.1.5. Remarks on convergence of the algorithm

Since the proposed algorithms run iteratively and recursively, we discuss their convergence here. The proposed algorithm continues dividing a region only as long as both of the following conditions hold:
(1) The local regions cover an appropriate number of data.
(2) No source is accurate enough for the current region’s data.

If there does not exist any source with sufficiently high accuracy for any region, the coverage criterion breaks the chain of recursion and prevents an unlimited number of divisions. Therefore, the algorithm always converges and does not fall into an infinite loop.

3.2. The proposed top-down FDSS algorithm

One of the main disadvantages of dynamic approaches is their dependence on pre-defined parameters. Although FDSS has many advantages including less memory consumption, its performance depends on two parameter settings, the minimum truth and minimum coverage threshold values. One idea to remove the parameters of the proposed FDSS algorithm is to construct the rule base in a top-down manner. The top-down algorithm starts with one rule over the entire data set. After this, the algorithm starts dividing the area recursively just like FDSS. The difference is that the maximum conditional probability achieved so far for each class, by the rules added to the areas containing the current area, is passed to each step. In each step, only those rules are added to the rule base that outperform the previous rules in terms of conditional probabilities for at least one class.
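The acceptance test of the top-down variant can be sketched as follows. The Python function below keeps a candidate rule only if its conditional probability vector beats the best values inherited from the enclosing areas on at least one class, and returns the class-wise maximum to be passed down to the sub-regions. The name `accept_rule` and the example probabilities are hypothetical; the all-zero starting vector follows the description of the first iteration in the caption of Figure 5.

```python
import numpy as np

def accept_rule(p_new, p_best):
    """Top-down test: accept the rule if it improves on p_best for at least one class.

    p_new  : conditional probability vector of the candidate rule (length m)
    p_best : class-wise best probabilities achieved by rules of the enclosing areas
    Returns (accepted, p_next), where p_next is passed on to the sub-regions.
    """
    p_new = np.asarray(p_new, dtype=float)
    p_best = np.asarray(p_best, dtype=float)
    accepted = bool(np.any(p_new > p_best))
    p_next = np.maximum(p_new, p_best)   # unchanged when the rule is rejected
    return accepted, p_next

p_best = np.zeros(2)                               # first iteration: all zeros
ok1, p_best = accept_rule([0.7, 0.6], p_best)      # rule over the whole data set
ok2, p_best = accept_rule([0.6, 0.9], p_best)      # better on the second class
ok3, p_best = accept_rule([0.5, 0.8], p_best)      # no class improves
```

No truth or coverage threshold appears in this test, which is the point of the top-down construction: a rule is judged only relative to the rules already added for the areas that contain it.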
In this way, the algorithm gives up searching for suitable rules, which would require suitability parameters to be defined in advance, in favor of adding better rules in each step. We also omit the minimum coverage threshold. FDSS divides the search space to find at least one suitable rule. In some cases, there might not exist any suitable source for the local regions. Therefore, FDSS defines the coverage threshold to stop the algorithm from adding dysfunctional rules which are not supported by enough data. Two parameters are thus provided in the FDSS algorithm to prevent overfitting. The methodology of creating the rule base in top-down FDSS is immune to overfitting. In the top-down manner, we add a rule in each step only if it is better than the previously added ones. If in any situation one rule with very low coverage is added to the rule base, it does not lead to overfitting because of its very low variance and the fact that the algorithm has added efficient rules for this area in the previous steps. This lets us remove the coverage parameter. Newly added rules in each step focus on local regions where sources perform better in comparison to the previous, more global areas. We should note that this algorithm provides a rule base with overlapping local regions of competence. This means that, unlike in the previous algorithm, each region might fit more than one rule. Figure 5 shows the pseudo code of the proposed top-down FDSS algorithm.

4. Experimental results

To evaluate the performance of our proposed method, we consider two sets of experiments. First we evaluate the accuracy of classification using homogeneous local sources on 14 benchmark data sets from the UCI [42] and Keel [43] machine learning database repositories. Table 2 shows the main features of the selected data sets. The number of features in the selected data sets varies from 2 to 24, and the number of instances varies from 569 to 19,020.
After describing the experimental setup, classification results for the homogeneous benchmarks are presented. Then we evaluate the proposed algorithm using heterogeneous local sources. Performance results and comparisons in this part are included in Section 4.2. Finally, we present the memory consumption of the proposed method in Section 4.3.

Figure 5. Second proposed algorithm is performed in top-down manner. The coverage and truth value thresholds in FDSS are removed in this algorithm. Vector of conditional probabilities is set to zero in the first iteration.

Table 2. Main specifications of the considered benchmark datasets.

Dataset | Source | #Features | #Instances | #Classes
Blood Transfusion | Keel | 4 | 748 | 2
WDBC | Keel | 30 | 569 | 2
Balance | UCI | 4 | 625 | 3
Australian | UCI | 14 | 690 | 2
Breast cancer | UCI | 9 | 699 | 2
Pima | UCI | 8 | 768 | 2
Mammographic | UCI | 5 | 830 | 2
Vehicle | UCI | 3 | 846 | 4
Wine | UCI | 11 | 962 | 6
German | UCI | 24 | 1000 | 2
Segmentation | UCI | 19 | 2310 | 7
Banana | UCI | 2 | 5300 | 2
Page Blocks | UCI | 10 | 5473 | 5
MAGIC | UCI | 10 | 19,020 | 2

The classification results are compared against five other selection approaches:

(1) DCS-LA [5]: This algorithm defines the competence of each local source classifier as its local accuracy. The local accuracy is estimated using the k nearest neighbors of the test data. We choose k = 10 since this value of k gives the best performance according to [5].
(2) DCS-P and DCS-KL [4]: These two algorithms are dynamic approaches that use comparison to a random classifier as a measure of competence. These algorithms use the k nearest neighbor method and we set k = 10 according to the paper's results.
(3) Single Best (SB): This approach selects the best performing source on the training data set. This one source is used to make decisions about all the test data.
(4) FRBMCS [44]: This static approach trains a fuzzy rule base at the training phase that statically weights local sources.
This method needs the outputs of local sources in the form of a vector of probabilities assigned to each class.

4.1. Experimental setup

In order to simulate distributed local sources where each source is trained on its own separate data set, we divide the training data randomly between sources with no overlap. Data for the training and testing sets are extracted using fivefold cross validation. At each iteration, three folds are used for training the sources, one fold as the validation data set and one fold as the testing data set. The presented results are the average of 20 runs of the algorithm. Naïve Bayes is used as the local classifier and FCM clustering is used for dividing the space. Based on experiments, the threshold for the truth value is set to 0.96 and the threshold for the coverage value is set to 10/(#training data).

In both homogeneous and heterogeneous tests, we apply the following process. At each iteration, we first train the local sources, each using its own dedicated set of training data. Then we compute the outputs of the local sources for the validation data set and train the combination algorithm using the output values together with the validation set. The algorithm is then tested on the testing data set.

4.1.1. Test with homogeneous sources

In this test, we use the same classifier and the same set of features for all the local sources. In order to evaluate the performance of the algorithm in dealing with unreliable local sources, we increase the number of local sources among which the training data are divided. Since the data are divided with no overlap, the available training data for each source decreases as we increase the number of local sources. For 2, 4, 6, 8 and 10 local sources, each source receives approximately 50%, 25%, 16%, 12% and 10% of the whole available training data, respectively. This leads to less reliable local sources.
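The non-overlapping division of training data can be sketched as below. This is an illustrative sketch only: the helper `split_among_sources`, the seed and the sample count of 600 are assumptions made for the example, and the original experiments were implemented in MATLAB.

```python
import random
from typing import List


def split_among_sources(n_samples: int, n_sources: int,
                        seed: int = 0) -> List[List[int]]:
    """Randomly assign sample indices to sources with no overlap."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding over the shuffled list gives disjoint, near-equal parts.
    return [indices[i::n_sources] for i in range(n_sources)]


# Share of the training data seen by one source (hypothetical 600 samples).
shares = {}
for n_sources in (2, 4, 6, 8, 10):
    parts = split_among_sources(600, n_sources)
    shares[n_sources] = 100.0 * len(parts[0]) / 600
    print(n_sources, "sources:", round(shares[n_sources], 1), "% each")
```

The shrinking share per source is the mechanism by which adding sources makes each individual source less reliable in this test.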
The accuracy results averaged over the different numbers of sources are shown in Table 3. Figure 6 compares the accuracies of the different algorithms for different numbers of sources. As this figure and Table 3 show, the proposed approaches perform better than other approaches in most cases. Increasing the number of local sources presents interesting results. Single best approach produces an average of 0.234 decrease in accuracy. This is because decreasing the amount of available training data leads to classifiers with lower performance. FRBMCS, DCS-P, SB, FDSS, DCS-LA and DCS-KL show average changes in accuracy of −4.69, −3.09, 0.578, 0.054, −0.245 and 0.035, respectively. As the results show, increasing the number of local sources might lead to better results. This is because when the number of local sources increases, the fusion center can select between more complementary classifiers. While the overall performance of the proposed method is better than that of the other methods according to Table 3, its change in accuracy is also smaller than that of every method except DCS-KL. On the other hand, DCS-KL is less accurate in its predictions. In other words, the proposed method produces a smoother and more accurate performance as the number of local sources increases. This is the result of the main feature of the proposed algorithm, which leverages the power of fuzzy logic in handling uncertainty.

The Kolmogorov–Smirnov test is used for a deeper comparison of the two proposed algorithms. The Kolmogorov–Smirnov test is a nonparametric test that does not assume that data are sampled from any specific distribution. The significance level for rejecting the null hypothesis is set to 5%. The p-values of the test are also shown in Table 3. As the test shows, the two proposed algorithms perform almost the same. Which algorithm is preferable therefore depends on the differences between the two algorithms.
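For readers unfamiliar with the test, the two-sample Kolmogorov–Smirnov statistic can be sketched as below. This is a hedged illustration rather than the procedure used for Tables 3 and 4: the function names and the accuracy lists are hypothetical, and the decision uses the standard large-sample critical value instead of an exact p-value.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest gap between the empirical CDFs of two samples."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # Both ECDFs are step functions, so the gap peaks at a data point.
    for v in sa + sb:
        fa = bisect_right(sa, v) / len(sa)
        fb = bisect_right(sb, v) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def ks_critical_value(n: int, m: int, alpha: float = 0.05) -> float:
    """Asymptotic rejection threshold for the two-sample statistic."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))


# Hypothetical per-run accuracies of two algorithms (not from the paper).
runs_a = [76.0, 75.5, 77.1, 76.4, 75.9, 76.8]
runs_b = [76.2, 75.8, 76.9, 76.5, 75.7, 77.0]
d = ks_statistic(runs_a, runs_b)
reject = d > ks_critical_value(len(runs_a), len(runs_b))
print("D =", round(d, 3), "reject at 5%:", reject)
```

A small statistic relative to the threshold means the two accuracy distributions cannot be told apart at the chosen level, which is the situation the large p-values in Table 3 describe.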
FDSS has two parameters that control the number of iterations of the algorithm. The larger the minimum coverage threshold or the smaller the minimum truth value threshold, the faster the algorithm converges and the smaller the rule base becomes. On the other hand, top-down FDSS is parameter free. Although this is a significant advantage, it makes the algorithm continue searching to the deepest possible level, which leads to a more time-consuming training phase. The memory consumption of the two algorithms is compared in Subsection 4.3.

Figure 6. Average results of fivefold cross validation on different data sets when train data is divided between 2, 4, 6, 8 and 10 homogeneous local sources separately for each approach.

4.2. Test with heterogeneous sources

In this test, we use a different set of features for each local classifier, which makes the sources heterogeneous. For this purpose, we use the Random Subspace [45] method to create the local data. The Random Subspace method uses different sets of features for each local classifier. Here we use ten local sources, each trained with a randomly selected 50% of the features. The rule base is trained with the whole set of features of the validation set. Table 4 shows the average of the fivefold cross validation results. For heterogeneous sources, we observe that the proposed approach works better than the others for six data sets. The average difference between FDSS and the best of the other approaches over all data sets is −0.7, and for top-down FDSS it is −1.14. FDSS also works better than top-down FDSS in eight data sets. The p-values of the test are also shown in Table 4. As the table shows, the two proposed algorithms maintain similar performance.

Table 3. Classification accuracies obtained by each algorithm using homogeneous sources averaged over different number of local sources. Best results for each data set are bold.
Last column shows the Kolmogorov–Smirnov test for the comparison of FDSS and top-down FDSS. FDSS and Top-down FDSS are the proposed methods.

Dataset | DCS-KL | DCS-P | DCS-LA | SB | FRBMCS | FDSS | Top-down FDSS | p-value
Transfusion | 74.80 ± 3.07 | 70.1 ± 5.73 | 75.648 ± 3.46 | 74.967 ± 3.26 | 75.977 ± 3.34 | 76.139 ± 3.01 | 76.36 ± 2.99 | 0.556
Banana | 72.007 ± 1.6 | 65.53 ± 4.36 | 77.246 ± 1.98 | 72.246 ± 1.63 | 57.345 ± 3.62 | 79.084 ± 1.71 | 76.398 ± 2.03 | 2.95E-11
Page blocks | 59.63 ± 1.58 | 92.905 ± 0.882 | 94.04 ± 0.668 | 93.515 ± 0.782 | 90.33 ± 1.23 | 94.172 ± 0.678 | 93.89 ± 0.678 | 0.139
Pima | 69.921 ± 3.46 | 70.329 ± 3.71 | 73.316 ± 3.2 | 72.904 ± 3.27 | 66.319 ± 4.99 | 73.86 ± 3.43 | 74.226 ± 2.7 | 0.139
WDBC | 74.753 ± 4.18 | 71.51 ± 9.4 | 93.78 ± 2.3 | 94.141 ± 2.48 | 65.938 ± 8.02 | 92.09 ± 6.7 | 93.968 ± 6.4 | 0.556
Vehicle | 53.954 ± 3.8 | 50.548 ± 4.5 | 60.386 ± 3.8 | 56.704 ± 3.44 | 43.950 ± 9.49 | 61.56 ± 5.3 | 61.454 ± 5.5 | 0.556
Mammographic | 79.887 ± 3.87 | 70.7117 ± 8.18 | 79.05 ± 3.4 | 80.254 ± 3.6 | 62.185 ± 11.32 | 81.15 ± 4.02 | 80.267 ± 3.97 | 0.193
Breast cancer | 94.43 ± 2.311 | 75.099 ± 6.46 | 94.459 ± 2.33 | 93.989 ± 2.66 | 66.318 ± 19.71 | 94.74 ± 7.08 | 94.71 ± 6.8 | 0.961
Wine | 56.952 ± 2.99 | 56.56 ± 3.2 | 54.83 ± 2.97 | 54.72 ± 3.16 | 47.108 ± 6.13 | 55.82 ± 3.17 | 56.189 ± 2.97 | 0.443
Australian | 50.259 ± 4.54 | 61.518 ± 4.98 | 66.73 ± 4.63 | 67.186 ± 4.72 | 55.57 ± 5.99 | 70.105 ± 5.29 | 67.109 ± 5.03 | 0.961
Balance | 87.96 ± 3.08 | 76.384 ± 6.62 | 86.848 ± 3.23 | 85.288 ± 3.70 | 62.32 ± 17.55 | 87.27 ± 3.15 | 87.928 ± 2.71 | 0.556
German | 70.279 ± 3.59 | 69.39 ± 3.9 | 70.805 ± 3.51 | 70.526 ± 3.28 | 70.184 ± 3.53 | 70.37 ± 3.51 | 70.730 ± 3.41 | 0.961
Magic | 71.136 ± 0.78 | 72.236 ± 0.74 | 76.582 ± 0.88 | 76.612 ± 0.85 | 66.466 ± 2.5 | 76.27 ± 0.67 | 76.259 ± 0.73 | 0.99
Segmentation | 80.2909 ± 1.82 | 50.734 ± 5.54 | 88.357 ± 1.74 | 86.1098 ± 1.92 | 70.512 ± 8.88 | 87.77 ± 1.49 | 87.5623 ± 1.72 | 0.556

Table 4. Classification accuracy obtained by each algorithm using heterogeneous sources. Best results for each data set are bold. Last column shows the Kolmogorov–Smirnov test for the comparison of FDSS and top-down FDSS.
Dataset | DCS-KL | DCS-P | DCS-LA | SB | FRBMCS | FDSS | Top-down FDSS | p-value
Transfusion | 74.53 ± 2.93 | 74.55 ± 2.63 | 75.76 ± 2.48 | 75.29 ± 3.41 | 76.83 ± 2.72 | 76.49 ± 2.58 | 75.9 ± 2.61 | 0.77
Banana | 677.69 ± 4.24 | 66.05 ± 3.3 | 80.57 ± 1.29 | 70.22 ± 1.31 | 59.82 ± 5.07 | 82.08 ± 1.25 | 79.31 ± 1.54 | 2.4894e-07
Page blocks | 60.92 ± 1.85 | 94.08 ± 1.09 | 95.02 ± 0.64 | 95.07 ± 0.77 | 91.75 ± 1.45 | 94.65 ± 0.81 | 94.64 ± 0.63 | 0.49734
Pima | 68.18 ± 4.5 | 71.60 ± 4.12 | 72.42 ± 3.98 | 75.00 ± 3.82 | 67.80 ± 4.58 | 73.33 ± 4.13 | 74.50 ± 4.26 | 0.49734
WDBC | 75.00 ± 4.98 | 70.45 ± 11.89 | 94.03 ± 1.96 | 94.15 ± 2.32 | 76.40 ± 15.74 | 94.42 ± 2.08 | 94.35 ± 1.83 | 0.96548
Vehicle | 53.87 ± 4.56 | 50.54 ± 5.34 | 61.78 ± 3.62 | 61.36 ± 3.18 | 51.91 ± 9.32 | 62.64 ± 3.79 | 62.353 ± 3.73 | 0.96548
Mammographic | 73.86 ± 8.51 | 70.44 ± 7.43 | 77.32 ± 3.15 | 78.22 ± 2.99 | 65.77 ± 13.35 | 80.63 ± 3.04 | 78.73 ± 4.2 | 0.49734
Breast cancer | 93.98 ± 2.08 | 71.29 ± 3.61 | 93.86 ± 3.42 | 94.08 ± 2.18 | 72.80 ± 22.48 | 94.64 ± 2.51 | 94.87 ± 2.11 | 0.96548
Wine | 56.70 ± 2.89 | 56.20 ± 3.14 | 55.82 ± 2.63 | 56.56 ± 2.60 | 52.06 ± 5.75 | 55.59 ± 3.31 | 56.62 ± 2.89 | 0.77095
Australian | 51.52 ± 4.90 | 69.95 ± 4.49 | 79.15 ± 3.17 | 82.20 ± 3.84 | 63.16 ± 8.75 | 73.39 ± 6.43 | 72.74 ± 5.90 | 0.96548
Balance | 76.28 ± 4.52 | 70.4 ± 5.28 | 83.08 ± 3.64 | 68.6 ± 3.59 | 67.04 ± 9.46 | 83.72 ± 3.96 | 81.76 ± 4.08 | 0.059142
German | 70.14 ± 3.03 | 68.51 ± 3.06 | 70.53 ± 3.56 | 70.51 ± 3.40 | 70.36 ± 3.39 | 70.16 ± 3.36 | 70.31 ± 3.31 | 0.96548
Magic | 70.14 ± 9.64 | 68.51 ± 3.88 | 70.53 ± 10.36 | 70.51 ± 10.32 | 70.36 ± 7.01 | 70.16 ± 3.21 | 70.32 ± 2.38 | 0.98
Segmentation | 78.72 ± 2.97 | 53.39 ± 9.4 | 90.54 ± 1.3 | 89.88 ± 3.04 | 81.09 ± 6.10 | 88.74 ± 6.02 | 89.31 ± 6.15 | 0.49734

Table 5. Average number of generated rules over different number of local sources in the proposed approaches.

Data set | FDSS | Top-down FDSS
Transfusion | 17.77 | 41.88
Banana | 153.44 | 92.99
Page Blocks | 76.68 | 58.43
Pima | 40.65 | 22.39
WDBC | 18.21 | 9.88
Vehicle | 39.88 | 18.96
Mammographic | 45.04 | 21.66
Breast Cancer | 12.84 | 9.04
Wine | 41.72 | 33.16
Australian | 43.15 | 19.46
Balance | 31.63 | 5.91
German | 45.63 | 34.75
Magic | 619.37 | 751.45
Segmentation | 86.1 | 27.6

Table 6. Average runtime of the different dynamic algorithms for deciding about a single data in milliseconds. Best result for each dataset is bold. FDSS and Top-down FDSS are the proposed methods.

Dataset | DCS-KL | DCS-P | DCS-LA | FDSS | Top-down FDSS
Transfusion | 0.110 | 0.446 | 0.356 | 0.032 | 0.041
Banana | 0.836 | 5.46 | 5.064 | 0.3 | 0.19
Page blocks | 0.935 | 6.455 | 5.717 | 0.390 | 0.221
Pima | 0.110 | 0.434 | 0.345 | 0.0398 | 0.012
WDBC | 0.101 | 0.296 | 0.295 | 0.029 | 0.15
Vehicle | 0.150 | 0.474 | 0.366 | 0.068 | 0.038
Mammographic | 0.124 | 0.520 | 0.426 | 0.041 | 0.029
Breast cancer | 0.070 | 0.25 | 0.209 | 0.020 | 0.002
Wine | 0.144 | 0.484 | 0.360 | 0.087 | 0.054
Australian | 0.110 | 0.371 | 0.290 | 0.051 | 0.012
Balance | 0.087 | 0.355 | 0.298 | 0.031 | 0.12
German | 0.109 | 0.541 | 0.405 | 0.071 | 0.43
Magic | 4.92 | 62.43 | 59.45 | 4.40 | 4.9
Segmentation | 0.439 | 1.301 | 0.941 | 0.222 | 0.078

4.3. Memory consumption and runtime

The mean number of generated rules in the proposed methods is shown in Table 5 in order to evaluate them in terms of memory consumption. The number of generated rules relates to the number of leaves in the tree that divides the input space. In most cases, top-down FDSS inherently produces fewer rules than FDSS. Since the top-down manner prefers more global rules to local rules and each iteration only adds better rules, the final rule base is smaller than that of FDSS. Compared with storing all the training data in memory, FDSS presents a 96.12% and top-down FDSS a 97.5% improvement in memory. We see from the results that the memory required for storing the produced rule base is much less than the memory required for keeping the complete data. At the same time, Tables 3–5 demonstrate that the proposed approach is able to achieve the same or even higher classification performance with lower memory consumption.

The average runtime for deciding about a single data sample is compared in Table 6 for the dynamic algorithms. The algorithms are implemented in MATLAB and runtimes are measured on an Intel(R) Core(TM) i3 with 4 GB RAM running Windows 7.
As the table shows, the runtime of the proposed algorithms is significantly less than that of the other dynamic algorithms. The reason is that the proposed algorithms do not need to search for a specific number of neighbors of the input data, which is very time consuming. The runtime for top-down FDSS is generally less than that for FDSS, since the average number of rules generated by top-down FDSS is generally smaller, as shown in Table 5.

5. Conclusion

In this paper, we propose two algorithms based on fuzzy logic for dynamic source selection in decision fusion systems. The first algorithm works in a recursive manner, using specifications of fuzzy rules including truth and coverage values to construct the rule base. The second algorithm works independently of these two parameters. We find that by compressing the knowledge extracted from the training data set into a single fuzzy rule base, we can achieve similar or better performance while storing much less data in memory. The proposed approach can then be regarded as an alternative to neighborhood-based approaches, especially when the data size is large, memory is limited and processing speed is important.

There are several directions for extending the proposed approach. First, the proposed approach divides data into two clusters at each iteration, whereas it may be useful to choose an appropriate number of clusters at each iteration based on the available data. Second, it would be desirable to cluster data not only by distance measures but also by the accuracies of different sources, in order to find the optimal division that leads to the best set of possible rules. Finally, a post-processing phase can be considered that removes or merges rules in order to obtain a more efficient rule base.

Disclosure statement

No potential conflict of interest was reported by the authors.

ORCID

M. R. Akbarzadeh-T http://orcid.org/0000-0001-5626-5559

References

[1] Britto AS Jr, Sabourin R, Oliveira LE.
Dynamic selection of classifiers – a comprehensive review. Pattern Recognit. 2014;47(11):3665–3680.
[2] Prodromidis A, Chan P, Stolfo S. Meta-learning in distributed data mining systems: issues and approaches. Adv Distrib Parallel Knowl Discov. 2000;3:81–113.
[3] Zhang L, Zhou W-D. Sparse ensembles using weighted combination methods based on linear programming. Pattern Recognit. 2011;44(1):97–106.
[4] Woloszynski T, Kurzynski M, Podsiadlo P, et al. A measure of competence based on random classification for dynamic ensemble selection. Inf Fusion. 2012;13(3):207–213.
[5] Woods K, Kegelmeyer WP, Bowyer K. Combination of multiple classifiers using local accuracy estimates. IEEE Trans Pattern Anal Mach Intell. 1997;19(4):405–410.
[6] Jurek A, Bi Y, Wu S, et al. A survey of commonly used ensemble-based classification techniques. Knowl Eng Rev. 2014;29:551–581.
[7] Liu R, Yuan B. Multiple classifiers combination by clustering and selection. Inf Fusion. 2001;2(3):163–168.
[8] Dos Santos EM, Sabourin R, Maupin P. A dynamic overproduce-and-choose strategy for the selection of classifier ensembles. Pattern Recognit. 2008;41(10):2993–3009.
[9] Wu D, Mendel JM. Linguistic summarization using IF–THEN rules and interval type-2 fuzzy sets. IEEE Trans Fuzzy Syst. 2011;19(1):136–151.
[10] Didaci L, Giacinto G. Dynamic classifier selection by adaptive k-nearest-neighbourhood rule. In: Roli F, Kittler J, Windeatt T, editors. Multiple classifier systems. MCS 2004. Berlin: Springer; 2004. (Lecture notes in computer science; vol. 3077).
[11] Yin XC, Huang K, Hao HW, et al. A novel classifier ensemble method with sparsity and diversity. Neurocomputing. 2014;134:214–221.
[12] Schapire RE. The strength of weak learnability. Mach Learn. 1990;5(2):197–227.
[13] Breiman L. Bagging predictors. Mach Learn. 1996;24(2):123–140.
[14] Tsoumakas G, Partalas I, Vlahavas I. An ensemble pruning primer. In: Okun O, Valentini G, editors.
Applications of supervised and unsupervised ensemble methods. Berlin, Heidelberg: Springer; 2009. p. 1–13.
[15] Ruta D, Gabrys B. Classifier selection for majority voting. Inf Fusion. 2005;6(1):63–81.
[16] Abdelazeem S. A greedy approach for building classification cascades. Machine Learning and Applications, 2008. ICMLA'08. Seventh International Conference; Dec 11; San Diego, CA, USA. Washington (DC): IEEE; 2008. p. 115–120.
[17] Kuncheva LI, Jain LC. Designing classifier fusion systems by genetic algorithms. IEEE Trans Evol Comput. 2000;4(4):327–336.
[18] Lam L, Suen CY. Optimal combinations of pattern classifiers. Pattern Recognit Lett. 1995;16(9):945–954.
[19] Sirlantzis K, Fairhurst MC, Hoque MS. Genetic algorithms for multi-classifier system configuration: a case study in character recognition. International Workshop on Multiple Classifier Systems. Berlin: Springer; 2001.
[20] Parvin H, MirnabiBaboli M, Alinejad-Rokny H. Proposing a classifier ensemble framework based on classifier selection and decision tree. Eng Appl Artif Intell. 2015;37:34–42.
[21] Fatemipour F, Akbarzadeh-T MR, Ghasempour R. A new fuzzy approach for multi-source decision fusion. Fuzzy Systems (FUZZ-IEEE), 2014 IEEE International Conference; Jul 6; Beijing, China. IEEE; 2014. p. 2238–2243.
[22] Burduk R, Walkowiak K. Static classifier selection with interval weights of base classifiers. Asian Conference on Intelligent Information and Database Systems; Mar 23. Cham: Springer; 2015. p. 494–502.
[23] Brown G, Wyatt J, Harris R, et al. Diversity creation methods: a survey and categorisation. Inf Fusion. 2005;6(1):5–20.
[24] Kuncheva LI, Whitaker CJ, Shipp CA, et al. Is independence good for combining classifiers? Pattern Recognition, 2000. Proceedings 15th International Conference; Vol. 2; Barcelona, Spain. IEEE; 2000. p. 168–171.
[25] Yin X-C, Huang K, Yang C, et al. Convex ensemble learning with sparsity and diversity. Inf Fusion. 2014;20:49–59.
[26] Aksela M.
Comparison of classifier selection methods for improving committee performance. International Workshop on Multiple Classifier Systems; Jun 11. Berlin: Springer; 2003. p. 84–93.
[27] Giacinto G, Roli F. Dynamic classifier selection based on multiple classifier behaviour. Pattern Recognit. 2001;34(9):1879–1881.
[28] Didaci L, Giacinto G, Roli F, et al. A study on the performances of dynamic classifier selection based on local accuracy estimation. Pattern Recognit. 2005;38(11):2188–2191.
[29] Mendialdua I, Martínez-Otzeta JM, Rodriguez-Rodriguez I, et al. Dynamic selection of the best base classifier in one versus one. Knowl Based Syst. 2015;85:298–306.
[30] Cevikalp H, Polikar R. Local classifier weighting by quadratic programming. IEEE Trans Neural Netw. 2008;19(10):1832–1838.
[31] Li L, Zou B, Hu Q, et al. Dynamic classifier ensemble using classification confidence. Neurocomputing. 2013;99:581–591.
[32] Ko AH, Sabourin R, Britto Jr AS. From dynamic classifier selection to dynamic ensemble selection. Pattern Recognit. 2008;41(5):1718–1731.
[33] Lysiak R, Kurzynski M, Woloszynski T. Probabilistic approach to the dynamic ensemble selection using measures of competence and diversity of base classifiers. In: Corchado E, editor. Hybrid artificial intelligent systems. Berlin: Springer; 2011. p. 229–236.
[34] Li S, Zheng Z, Wang Y, et al. A new hyperspectral band selection and classification framework based on combining multiple classifiers. Pattern Recognit Lett. 2016;83:152–159.
[35] Swiderski B, Osowski S, Kruk M, et al. Aggregation of classifiers ensemble using local discriminatory power and quantiles. Expert Syst Appl. 2016;46:316–323.
[36] Nazemi A, Fatemi Pour F, Heidenreich K, Fabozzi F. Fuzzy decision fusion approach for loss-given-default modeling. Eur J Oper Res. 2017;262(2):780–791.
[37] Ykhlef H, Bouchaffra D. An efficient ensemble pruning approach based on simple coalitional games. Inf Fusion. 2017;34:28–42.
[38] Lee HE, Park KH, Bien ZZ. Iterative fuzzy clustering algorithm with supervision to construct probabilistic fuzzy rule base from numerical data. IEEE Trans Fuzzy Syst. 2008;16(1):263–277.
[39] Dos Santos EM, Sabourin R, Maupin P. Overfitting cautious selection of classifier ensembles with genetic algorithms. Inf Fusion. 2009;10(2):150–162.
[40] Tsymbal A, Pechenizkiy M, Cunningham P. Sequential genetic search for ensemble feature selection. Proceedings of the 19th International Joint Conference on Artificial intelligence; Edinburgh, Scotland; 2005. p. 877–882.
[41] Kuncheva LI. Switching between selection and fusion in combining classifiers: an experiment. IEEE Trans Syst Man Cybern Part B Cybern. 2002;32(2):146–156.
[42] Blake C, Merz CJ. UCI Repository of machine learning databases. Irvine (CA): Department of Information and Computer Science, University of California; 1998.
[43] Alcalá J, Fernández A, Luengo J. Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult Valued Log Soft Comput. 2010;17:255–287.
[44] Trawinski K, Cordon O, Sanchez L, et al. A genetic fuzzy linguistic combination method for fuzzy rule-based multiclassifiers. IEEE Trans Fuzzy Syst. 2013;21(5):950–965.
[45] Schapire RE, Freund Y, Bartlett P, et al. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Stat. 1998;26(5):1651–1686.

Dynamic Fuzzy Rule-based Source Selection in Distributed Decision Fusion Systems

Publisher
Taylor & Francis
Copyright
© 2018 The Author(s). Published by Taylor & Francis Group on behalf of the Fuzzy Information and Engineering Branch of the Operations Research Society of China & Operations Research Society of Guangdong Province.
ISSN
1616-8666
eISSN
1616-8658
DOI
10.1080/16168658.2018.1509524

Abstract

FUZZY INFORMATION AND ENGINEERING 2018, VOL. 10, NO. 1, 107–127
https://doi.org/10.1080/16168658.2018.1509524

Dynamic Fuzzy Rule-based Source Selection in Distributed Decision Fusion Systems

F. Fatemipour and M. R. Akbarzadeh-T
Department of Computer Engineering, Center of Excellence on Soft Computing and Intelligent Information Processing, Ferdowsi University of Mashhad, Mashhad, Iran

ARTICLE HISTORY
Received 16 January 2018; Revised 15 March 2018; Accepted 19 April 2018

KEYWORDS
Fuzzy linguistic rule base; distributed decision-making; dynamic classifier selection; classifier selection; decision fusion

ABSTRACT
A key challenge in decision fusion systems is to determine the best performing combination of local decision makers. This selection process can be performed statically at the training phase or dynamically at the execution phase, taking into consideration various features of the data being processed. Dynamic algorithms for the selection of competent sources are generally more accurate, but they are also computationally more intensive and require more memory. In this research, we propose a fuzzy rule-based approach for dynamic source selection (FDSS) that compresses the knowledge from local sources using a divide-and-conquer strategy along with the basic concepts of coverage and truth value criteria, leading to less memory requirement and faster processing. A top-down approach to FDSS is then used to reach a parameter-free algorithm, i.e. one that avoids the restrictive parameters/threshold settings of FDSS. The rule bases in both approaches are created recursively and use the conditional probabilities of each class's correctness as the rule's weight. The proposed approaches are compared against several competing dynamic classifier selection methods based on local accuracy.
Results indicate that the proposed fuzzy rule structures are generally faster and require less memory, while they also lead to more accurate decisions from the uncertain decisions of multiple sources.

CONTACT F. Fatemipour farnoosh.fatemipour@stu-mail.um.ac.ir

© 2018 The Author(s). Published by Taylor & Francis Group on behalf of the Fuzzy Information and Engineering Branch of the Operations Research Society of China & Operations Research Society of Guangdong Province. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Nowadays, data are produced at staggering rates due to the widespread communication and sensor technologies around the world. While the availability of data is an enormous opportunity for making better decisions, storing and processing it presents a great challenge. A reasonable approach here could be to keep only the most relevant and to process only the most appropriate. If so, data could be gainfully transformed to information and subsequently processed to knowledge and ultimately to wisdom; with wisdom here being defined as the essence of what is most relevant, widely appropriate and most lasting. However, it is non-trivial to define the concepts 'relevance', 'appropriateness', and 'lasting'. What complicates this process further is when these data are not in one place, which is a common occurrence due to the distributed nature of the available data today. These data are often provided from different sources and contain different aspects of information with concerns for their privacy.

Figure 1. A simple visualization of a decision-making center that is connected to four information sources where each information source is connected to a different data source.
Besides, it is virtually impossible to aggregate all data in one place for processing and mining, due to the databases' high volume, variety and velocity. Accordingly, handling vast data sets and developing distributed data mining algorithms that analyze and summarize distributed data into usable knowledge is highly challenging. In this process, decision fusion is a key methodology for making decisions with vast amounts of data located at different sources. The main concerns in developing such a decision-making system are handling the inconsistency and uncertainty of decisions, the differing types and competency levels of local sources, and the variety of decision fusion strategies.

Decision fusion is the process of combining the decisions made by multiple information sources. These sources can be human experts, classifiers, regressors, or any other kind of decision makers. A simple visualization of a decision fusion center is depicted in Figure 1. Decision fusion approaches let each source make its own decision with its local information. The main purpose of such approaches is to make effective use of the disparate local sites without direct access, in order to preserve their privacy. The three main phases typically included in fusion-based decision-making systems are generating local sources, pruning them and fusing their decisions [1]. Decision fusion has many advantages for data mining, including the possibility of using a set of low-cost computers to train a group of algorithms on subsets of the whole training data that fit in their main memory. In this way, it also becomes possible to apply algorithms to very large data sets in a feasible time.

There are a number of challenges in distributed decision fusion. First of all, the local systems are individually trained on the separately collected data obtained from each site. Based on the specifics of gathering information such as data location and time, these local
Based on the specifics of gathering information such as data location and time, these local FUZZY INFORMATION AND ENGINEERING 109 data may not cover the entire feature space. These differences also lead to different views about the data. Moreover, inconsistencies commonly exist in a group of separately made decisions from different views and localities. As a result, some sources may be inefficient and reduce the performance of the system [2]. Too many of these poor sources can suppress correct predictions of good sources [3]. For this reason, some local sources are eliminated before fusion and the final decision is made by using a selected subset of sources [4]. In this paper, we focus on effective selection of sources. The current approaches for selecting local sources in the literature generally falls into two categories: static and dynamic. Static approaches apply a region of competence for each source in the feature space at the training phase, while dynamic approaches apply the com- petence regions during the execution phase and determine the best sources using training data considering the data being analyzed. In other words, the accuracy of each source is estimated in a local area surrounding the data named ‘region of competence’ or ‘local accu- racy’ [5]. In general, dynamic approaches are more accurate than static approaches [6]. The most common strategy for dynamic approaches in the literature so far uses neighborhood of the data [1], which requires keeping all training data in memory and searching them for each test data in order to find its neighbors. Determining a specific number of neigh- bors for each data is very time consuming [7] and the final decision depends on the size and shape of the neighborhood [1]. Therefore, dynamic approaches require larger memory and are more computationally intensive compared to static methods [7,8]. 
This is particularly so in applications with larger scale and higher computational complexity [1,8]; as a result, their applicability is often criticized [1] and their benefits have remained out of reach.

The objective of the present work is to use fuzzy logic to compress the knowledge about the expertise of sources into a single rule base that fits well in memory while also providing fast and accurate results. Specifically, we propose two algorithms which extract useful knowledge about the competence regions and performances of local sources and store this knowledge in the form of a fuzzy rule base. Upon arrival of new data, both algorithms gradually partition the input space and define one or more rules for each partition. Each rule in the final rule base assigns a competence region to a local source. Since the firing degree of each rule depends on the location of the data, the contribution of each source to the final decision is determined based on the data at hand. Hence, the above methods are categorized as dynamic selection methods.

The first proposed algorithm, Fuzzy Distributed Source Selection (FDSS), dynamically weights the local sources for each new data using a rule-based system which represents the previously stored knowledge about the competence regions of sources. This rule-based system is constructed using the data previously available in the fusion center, called the validation set. A divide-and-conquer strategy is used to recursively partition the input space and determine the competence regions of the local sources in order to construct the rule base. Inspired by two fuzzy rule measures from linguistic summarization, truth value and coverage [9], we propose two measures for defining the conditions of recursion termination. We prevent the generation of redundant rules by defining two appropriate thresholds for these two measures.
In order to control the contribution of each source to the final decision, we also provide the rule base with an extra parameter that controls the firing degrees of rules. We use the conditional probability of correctness of each rule for each class, estimated over the validation set, and create the rule base in a probabilistic form. The second proposed algorithm, top-down FDSS, differs from FDSS by omitting the pre-defined threshold values while constructing the rule base in a top-down manner. This methodology does not require the pre-defined parameters and conditions of the first method and hence is no longer affected by the threshold settings.

Table 1. List of symbols.

| Symbol | Description |
|---|---|
| $x$ | Vector of input data, $x = [x_1, \ldots, x_d]$ |
| $L_x$ | Label of $x$ |
| $O_{S_i,x_k}$ | Output of source $S_i$ for data $x_k$; $O_{S_i,x_k} = [O_{S_i,x_k,1}, \ldots, O_{S_i,x_k,j}, \ldots, O_{S_i,x_k,m}]$, where $O_{S_i,x_k,j} = 1$ if $S_i$ labels $x_k$ to class $j$ and $0$ otherwise, $j = 1, \ldots, m$ |
| $\alpha_{S_i,x_k}$ | $1$ if $S_i$ labels $x_k$ correctly, $0$ otherwise |
| $\beta_{ij}$ | $\{x_k \mid \alpha_{S_i,x_k} = 1 \;\&\; L_{x_k} = C_j\}$ |
| $\bar{\beta}_{ij}$ | $\{x_k \mid \alpha_{S_i,x_k} = 0 \;\&\; L_{x_k} = C_j\}$ (the distinguishing mark of this symbol is lost in the source; a bar is used here) |
| $S_i$ | Source $i$, $i = 1, \ldots, n$ |
| $T_i$ | Truth value of $S_i$ |
| $V_i$ | Coverage of $S_i$ |
| $C_j$ | Class $j$, $j = 1, \ldots, m$ |
| $\lvert \ast \rvert$ | Cardinality of set $\ast$ |
| $\bar{P}_{S_i}$ | Vector of conditional probabilities, $\bar{P}_{S_i} = [P_1, \ldots, P_j, \ldots, P_m]$, $P_j = P(C_j \mid S_i)$, $j = 1, \ldots, m$ |
| $R$ | Number of rules in the fuzzy rule-based system |
| $A_i$ | Average of $B_i$; used as the center of the corresponding rule |

Training and storing a fuzzy rule-based system for weighting and selecting sources carries several benefits. Using fuzzy logic, we combine the merits of dynamic selection methods with those of static methods. While less memory is typically required for storing the rule-based system than for storing the complete data set, we also omit the computationally intensive process of searching the whole training data set to find the nearest neighbors of each test data.
Avoiding the risk of defining a single region of competence for each source, and letting each source contribute to the estimation of the accuracies according to its adjacency to the data, avoids the problems arising from crisp boundaries [10]. Using fuzzy logic also makes it easier to cope with the uncertainties that often exist in the outputs of local sources. The proposed approaches also benefit from the interpretability and representation ability of fuzzy rule-based systems, which make them suitable for real-life applications.

The rest of this paper is organized as follows. For simplification, symbols used in the paper are shown in Table 1. In Section 2, we review the current algorithms in the literature for selecting sources in fusion systems. In Section 3, the details of the proposed approaches are explained. Experimental results are included in Section 4. Finally, Section 5 concludes the paper.

2. Literature review

Static and dynamic methods are the two main categories of source selection strategies in decision fusion systems. The first category, static selection, selects the best decision maker or decision makers at the training phase. The second category, dynamic selection, applies the selection process at the testing phase, involving the input data. Yin et al. [11] classify decision combination methods into two main groups. The first group aims to train and combine an ensemble of classifiers during the learning process, e.g. Boosting [12] and Bagging [13]. The selection phase of this group commonly appears in the literature under different names, such as ensemble pruning, ensemble selection or ensemble thinning [14]. The second group combines the results of multiple available decision makers to solve the targeted problem. This group often trains a meta-learner to combine the component decision makers intelligently [11].
Although this paper focuses on the second group, we explore the current literature in both groups from the viewpoint of source selection strategies. The rest of this section is dedicated to a brief literature review on each category, emphasizing dynamic methods, which this paper focuses on.

2.1. Static selection

The selection phase in static selection methods aims to find the subset of classifiers with optimal accuracy [8]. Different methods have been proposed in the literature to select one subset of classifiers. Based on the categorization of ensemble selection methods [14], one of the simplest methods for static selection is the ranking-based approach, which chooses the N best performing classifiers or weighted N best performing classifiers over the training data set [6,15]. Greedy approaches [16] are also used in the literature; they iteratively add classifiers to or remove them from the ensemble, aiming to increase the overall accuracy. Genetic algorithms are also popular for selecting and/or weighting local sources [17–19]. Clustering the input space into disjoint regions and letting one local classifier dominate each region is also used in the literature, e.g. in [7,20]. The authors have used fuzzy logic earlier in [21] for weighting local sources in a multi-source decision fusion problem. They train a fuzzy rule base which approximates the reliabilities of sources over the input space. The estimated reliabilities are used as weights of local sources. In ensemble-based multiple classifier systems, different criteria such as incorrect predictions of local classifiers [22], diversity [15,23], independence [24] or a combination of measures such as diversity together with sparsity [11,25] are used to prune the ensemble and select a subset of classifiers. Some of these criteria are discussed in [26].

2.2. Dynamic selection

Using the neighborhood of the pattern to estimate the accuracy of multiple classifiers, under the notion of ‘local accuracy’ or ‘region of competence’, was first introduced by Woods et al.
[5] as the DCS-LA algorithm. This algorithm estimates the accuracy of each source in the vicinity of the test pattern using its K nearest neighbors. This algorithm was further extended in the literature. In [27], multiple classifier behavior (MCB) is used to determine the K nearest neighbors of the test pattern. In [28], linear programming is used to weight sources in the neighborhood of the test pattern. In [4], the competence of each source in the neighborhood is determined by comparing it to a random classifier. In [29], KNNE is used instead of KNN; it selects the K nearest neighbors of each class separately. Cevikalp and Polikar [30] use quadratic programming to weight local sources in the neighborhood of the test pattern based on their accuracy.

In [1], it is emphasized that using K nearest neighbors for defining the local region of competence has several disadvantages. First of all, the concept of neighborhood assumes that the local accuracy of each source is constant in the region. Also, the reliability of the results depends strongly on the number of points in the neighborhood. In other words, the result depends on the shape (distance measure) and size (number of points) of the neighborhood. Some attempts have been made to reduce these disadvantages. Noting that with KNN the final decision depends on the value of K, Zou et al. [31] proposed adding another phase to the algorithm for selecting a suitable value of K. This selection is performed using the margin error. Ko et al. [32] also proposed reducing the number of neighbors considered in estimating local accuracy until at least one source is found that correctly classifies all the neighbors. Although these attempts have improved the performance of neighborhood-based methods, adding an extra phase for deciding on the number of neighbors is time consuming and increases the complexity of the system.
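The neighborhood-based estimate that DCS-LA and its extensions build on can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [5]: the function names `k_nearest`, `local_accuracy` and `select_source`, the Euclidean distance, and the toy data are all hypothetical choices made here.

```python
import math

def k_nearest(x, points, k):
    """Indices of the k stored points closest to x (Euclidean distance is an assumption)."""
    order = sorted(range(len(points)), key=lambda i: math.dist(x, points[i]))
    return order[:k]

def local_accuracy(x, points, correct, k):
    """Fraction of the k nearest stored points that one source labelled correctly.

    correct[i] is 1 if the source was right on points[i], else 0.
    """
    neighbours = k_nearest(x, points, k)
    return sum(correct[i] for i in neighbours) / len(neighbours)

# Hypothetical toy validation set: four points on a line and two sources.
points = [(0.0,), (1.0,), (2.0,), (3.0,)]
correct_by_source = {
    "S1": [1, 1, 0, 0],  # S1 is right on the left half
    "S2": [0, 0, 1, 1],  # S2 is right on the right half
}

def select_source(x, k=2):
    """Pick the source with the highest local accuracy around x."""
    return max(
        correct_by_source,
        key=lambda s: local_accuracy(x, points, correct_by_source[s], k),
    )
```

Note that every query scans and sorts the whole stored set, which is the memory and runtime cost that the criticism in [1] refers to.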
Ensemble-based methods also apply dynamic selection using different criteria for effective selection of local sources. Calculating confidence measures rather than performance, Dos Santos et al. [8] proposed to select an ensemble with less ambiguity from a pool of ensembles, which increases the degree of certainty of the final decision about the test pattern. Lysiak et al. [33] propose dynamic weighting of sources, in which sources are first eliminated using a diversity measure and the remaining sources are then weighted using their accuracy over the entire data set. Considering both local accuracy and diversity, Giacinto and Roli [26] propose to select the most accurate classifiers in the vicinity of the test data together with the most diverse set of classifiers among them. Li et al. [34] consider error diversity measures to select from the initial pool of classifiers and then use local accuracy for selecting the final classifier. Swiderski et al. [35] propose to use the area under the curve (AUC) of the receiver operating characteristic of each classifier as a measure to select a proper subset of classifiers. Nazemi et al. [36] use fuzzy logic for dynamically weighting ensemble members in the loss given default modeling problem. They create a fuzzy rule base using clustering methods and use it to weight trained regression sources dynamically. Several combination formulas are tested and compared in that paper. Ykhlef and Bouchaffra [37] consider game theory algorithms and solve the selection problem as a coalitional game.

When dealing with vast amounts of data, it is hard to find one local source which has expertise over the entire input space. Applying a selection phase before combination, in order to use locally accurate sources in the vicinity of the input pattern, leads to more precise results. Dynamic methods involve the input pattern in the process of selecting sources by defining a local region of competence in the neighborhood of the input pattern.
Since selecting a weak source can significantly reduce the performance of the system, determining this region directly affects the performance of the system [28]. Until now, the most used method for dynamically defining the local accuracies in the literature is the K nearest neighbor method [1]. Dynamic methods so far are memory consuming and computationally intensive. Even though the dynamic selection methods presented so far have shown remarkable results, there is a notable lack of methodologies that offer the high accuracy of dynamic algorithms while avoiding the time-consuming process of finding neighbors and the high memory consumption. In this paper, we aim to expand the limits of dynamic selection methods to better use their advantages in effective selection of local sources.

3. The proposed fuzzy algorithm for dynamically selecting local sources

3.1. The proposed FDSS algorithm

Let $\{S_1, S_2, \ldots, S_n\}$ be a set of local decision makers, each learnt with one of $n$ separate data sets. The feature vector $x = [x_1, x_2, \ldots, x_d]$ is presented to be labeled into one of $m$ classes $[c_1, c_2, \ldots, c_m]$. After all sources are trained, one separate data set called the validation set is labeled by all the local sources. The labeled data set is then used as the training set at the fusion center. Figure 2 shows this process. The proposed approach tries to generate a fuzzy rule base that contains useful information about the competence regions of local sources and their performances in order to dynamically assign a proper weight to each source for each given data. To this end, the algorithm searches for the local regions in the search space in which there is at least one decision maker with high performance. After the rule base is constructed, each rule defines a competence region and specifies an efficient decision maker for that region.
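The information available at the fusion center can be illustrated with a small sketch: given the labels that each source assigns to the validation set, one correctness indicator per source and data point (the $\alpha_{S_i,x_k}$ of Table 1) is derived. The function name `correctness_indicators` and the toy labels are hypothetical and serve only as an illustration.

```python
def correctness_indicators(predictions, labels):
    """alpha[i][k] = 1 if source i labels validation point k correctly, else 0."""
    return [
        [1 if predicted == true else 0 for predicted, true in zip(source_preds, labels)]
        for source_preds in predictions
    ]

# Hypothetical validation labels and the outputs of two local sources.
labels = ["a", "b", "a", "b"]
predictions = [
    ["a", "b", "b", "b"],  # outputs of source S1
    ["b", "b", "a", "a"],  # outputs of source S2
]
alpha = correctness_indicators(predictions, labels)
```

These indicators, together with the validation features, are all that the rule construction described next needs from the local sources.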
The proposed algorithm constructs the rule base iteratively and no pre-defined number of regions is necessary. The pseudo code of the algorithm under the name FDSS is shown in Figure 3. FDSS stands for Dynamic Selection Approach-Fuzzy Rule-Based System. In the following, the training and testing phases of the algorithm are explained in detail. Since the proposed algorithm constructs the rule base iteratively, its convergence is discussed at the end of this section.

Figure 2. Training the local sources and the decision-making center with separate data sets. The decision-making center is also provided with outputs of local sources for the validation set.

Figure 3. First FDSS proposed algorithm uses coverage and truth value as the terminating conditions in a recursive manner.

3.1.1. Training phase in the proposed FDSS algorithm

Inspired by [38], each iteration of the algorithm at the training phase divides the feature space into two regions, aiming at finding local regions with powerful decision makers and assigning a fuzzy rule to each such region. The added rule has the following form:

IF $x$ is $A$ THEN SELECT $S$ with $P_S$.

This rule specifies that, if $x$ is in the region $A$, then the result of decision maker $S$ would be correct with probability $P_S$. $P_S$ is the conditional probability of correctness of $S$ given each class. Selection algorithms are prone to overfitting [39,40]. Overfitting happens when the algorithm that selects the local sources fits the training data set so well that, while being accurate for the training data set, its selections fail to lead to proper decisions during execution. For the generated rules to present high reliability and to prevent overfitting, a region can be turned into a rule only if the following conditions are met:

(1) There exists at least one source with high correctness probability for the data in that region.
(2) There exists a sufficient number of data in that region (for the sake of generalization ability of the final rule base).

To evaluate the fulfillment of these conditions, we propose two measures, truth value and coverage, based on quality measures introduced in [13]. The proposed measures are described in the following.

3.1.2. Degree of truth value (T)

T can be viewed as the rate of the data satisfying the consequent among those which satisfy the antecedent [9]. Since each rule in our problem specifies one of the local decision makers that is suitable for making a decision about the incoming data, the truth value of the rule depends on the performance of the specified decision maker. In other words, the degree of truth of each rule is related to the correctness degree of the specified decision maker. T increases as more data satisfying the antecedent part (located in the determined competence region) also satisfy the consequent part (are labeled correctly by the specified decision maker). Hence, we formulate T as follows,

$$T = \frac{\sum_{k=1}^{N} \min\left(\mu_A(x_k), \alpha_{S,x_k}\right)}{\sum_{k=1}^{N} \mu_A(x_k)}, \tag{1}$$

where $N$ is the number of data under consideration. In this formula, the numerator sums the memberships of the data which the corresponding decision maker labels correctly, and the denominator normalizes this sum. As the above formula shows, we have considered the correctness probability of the decisions in the training data set.

3.1.3. Degree of coverage (V)

The coverage value specifies the generalization ability of the rule. The degree of sufficient coverage, V, describes whether the rule is supported by enough data [9]. V increases as more data satisfy both the antecedent and consequent parts. As V increases, the generality of the rule increases.
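Returning briefly to the truth value of Equation (1), a small numeric sketch may help. It assumes that the membership degrees $\mu_A(x_k)$ have already been computed (the membership function itself is not fixed here) and that the correctness indicators are 0 or 1; the function name `truth_value`, the zero-denominator convention, and the sample numbers are hypothetical.

```python
def truth_value(memberships, alpha):
    """Equation (1): sum_k min(mu_A(x_k), alpha_k) / sum_k mu_A(x_k)."""
    denominator = sum(memberships)
    if denominator == 0:
        return 0.0  # no data supports the region; a convention assumed for this sketch
    numerator = sum(min(mu, a) for mu, a in zip(memberships, alpha))
    return numerator / denominator

# Hypothetical memberships of four validation points in region A and the
# correctness indicators of one source on the same points.
mu = [0.9, 0.8, 0.5, 0.1]
alpha = [1, 1, 0, 1]
t = truth_value(mu, alpha)  # (0.9 + 0.8 + 0.0 + 0.1) / 2.3
```

The third point lies fairly deep in the region but is mislabeled, so it adds 0.5 to the denominator and nothing to the numerator, which pulls the truth value down.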
Since each rule in our problem indicates a competence region and a decision maker, we consider its coverage value as the number of training data that the source labels correctly in the specified region, as shown in (2),

$$V = \sum_{k=1}^{N} \alpha_{S_i,x_k}, \tag{2}$$

where $N$ is the number of data under consideration in the specified region.

Two threshold values are set at the beginning of the algorithm for the truth and coverage values above. At each iteration, each divided region turns into a rule if there exists at least one source whose truth value for the corresponding data exceeds the desired threshold. If no source satisfies this condition, the coverage measure is checked. When the coverage is below the threshold for all of the sources, further division of the space would lead to rules that are not supported by enough training data and hence are not efficient for the rule base. Therefore, the division is stopped. To increase the diversity of the sources included in the rule base, all sources with a truth value higher than the average of the truth values in that region are added to the rule base. Then the algorithm breaks the chain of recursion. Otherwise, the process repeats by further dividing the area.

A conditional probability vector of the following form is assigned to each rule in the final rule base:

$$P_{S_i} = [P_1, \ldots, P_j, \ldots, P_m], \quad j = 1, \ldots, m,$$

$$P_j = P(C_j \mid S_i) = \frac{\lvert \beta_{ij} \rvert}{\lvert \beta_{ij} \rvert + \lvert \bar{\beta}_{ij} \rvert}, \tag{3}$$

where $\beta_{ij}$ and $\bar{\beta}_{ij}$ are the sets of data of class $C_j$ that $S_i$ labels correctly and incorrectly, respectively (Table 1). This vector assigns a probability to source $S_i$ per class. Whenever a rule uses a source to label data to class $C_j$, $P_j$ indicates the probability that this decision is correct. In this way, for each new data, the correctness probabilities of the decisions of sources are considered in the final decision in addition to its membership degree in the local competence regions.

At the end, the final constructed rule base includes $R$ rules of the following form:

Rule$^1$: IF $x$ is $A^1$ THEN SELECT $s^1$ with $P^1$,
···
Rule$^R$: IF $x$ is $A^R$ THEN SELECT $s^R$ with $P^R$,

such that $s^r \in \{S_1, \ldots, S_n\}$, $r = 1, \ldots, R$, is one of the base decision makers.
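Before moving on, the coverage of Equation (2) and the probability vector of Equation (3) can be illustrated on a toy region. The sketch reads coverage as a plain count of correctly labeled data in the region, which is an assumption about the intended normalization; the function names, the zero value for classes absent from the region, and the sample data are hypothetical.

```python
def coverage(alpha_in_region):
    """Equation (2) as read here: number of region data the source labels correctly."""
    return sum(alpha_in_region)

def conditional_probabilities(alpha, labels, classes):
    """Equation (3): P_j = |beta_ij| / (|beta_ij| + |beta_bar_ij|) for each class C_j."""
    probabilities = []
    for c in classes:
        right = sum(1 for a, y in zip(alpha, labels) if y == c and a == 1)
        wrong = sum(1 for a, y in zip(alpha, labels) if y == c and a == 0)
        total = right + wrong
        # A class with no data in the region gets probability 0 (sketch convention).
        probabilities.append(right / total if total else 0.0)
    return probabilities

# Hypothetical region with five validation points and one source.
labels = ["a", "a", "a", "b", "b"]
alpha = [1, 1, 0, 0, 1]
v = coverage(alpha)
p = conditional_probabilities(alpha, labels, ["a", "b"])
```

Here the source is right on two of the three class-a points and on one of the two class-b points, so the rule would carry the vector [2/3, 1/2] and a coverage of 3.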
$A^r$ is a vector showing the center of $\beta$ in the competence region that the $r$th rule specifies.

3.1.4. Decision-making process in the proposed method

After the training phase, an unseen pattern is received for processing. The output of each rule suggests one source with a vector $\omega$, which is the result of multiplying the firing value of the rule by its conditional probability vector. The result of the rule base is a matrix $W_{\mathrm{Rules}}$ of size $R \times m$ as below,

$$W_{\mathrm{Rules}} = \begin{bmatrix} \omega_{s^1,c_1} & \cdots & \omega_{s^1,c_m} \\ \omega_{s^2,c_1} & \cdots & \omega_{s^2,c_m} \\ \vdots & & \vdots \\ \omega_{s^R,c_1} & \cdots & \omega_{s^R,c_m} \end{bmatrix}. \tag{4}$$

$W_{\mathrm{Rules}}$ is then set to the element-by-element product of itself and the matrix of the results of the sources,

$$W_{\mathrm{Rules}} = \begin{bmatrix} \omega_{s^1,c_1} & \cdots & \omega_{s^1,c_m} \\ \omega_{s^2,c_1} & \cdots & \omega_{s^2,c_m} \\ \vdots & & \vdots \\ \omega_{s^R,c_1} & \cdots & \omega_{s^R,c_m} \end{bmatrix} \circ \begin{bmatrix} O_{s^1,x_k,1} & \cdots & O_{s^1,x_k,m} \\ O_{s^2,x_k,1} & \cdots & O_{s^2,x_k,m} \\ \vdots & & \vdots \\ O_{s^R,x_k,1} & \cdots & O_{s^R,x_k,m} \end{bmatrix}. \tag{5}$$

In (5), $W_{\mathrm{Rules}}$ is thus multiplied by the result of each source. $O_{s^r,x_k,j}$ equals one if the corresponding source labels $x_k$ to class $j$ and zero otherwise. Therefore, the entry for source $s_i$ and class $c_j$ turns into zero if $s_i$ labels the data to a class other than $c_j$. Then we compute the weight of each source per class as below,

$$W = \begin{bmatrix} W_{s_1,c_1} & \cdots & W_{s_1,c_m} \\ W_{s_2,c_1} & \cdots & W_{s_2,c_m} \\ \vdots & & \vdots \\ W_{s_n,c_1} & \cdots & W_{s_n,c_m} \end{bmatrix}. \tag{6}$$

$W_{s_i,c_j}$ specifies the weight of source $S_i$ for class $C_j$, which is computed as below,

$$W_{s_i,c_j} = \max_{r \,\mid\, s^r = s_i} \omega_{s^r,c_j}. \tag{7}$$

As Equation (7) shows, the determined weight of $S_i$ for class $C_j$ is the maximum output value of the rules for that specific source and class.

Figure 4. The inference process at the decision-making center for new data.

At the end, we compute the weight of each class as the maximum weight among all sources. If there exists one class which strongly dominates the other classes, that class is selected as the final decision, as in [41]. Otherwise, we compute the weight of each class
as the average of weights in the rule base as shown in (8). Then, the class with maximum weight is selected as the final decision, as below,

$$W_{c_j} = \frac{\sum_{r=1}^{R} \left( W_{s^r,c_j} > \operatorname{mean}_r W_{s^r,c_j} \right)}{\left\lvert\, W_{s^r,c_j} > \operatorname{mean}_r W_{s^r,c_j} \,\right\rvert}, \tag{8}$$

i.e. the average of those rule weights for class $c_j$ that exceed their mean over the rules. The process of making a decision about the new data is shown in Figure 4.

3.1.5. Remarks on convergence of the algorithm

Since the proposed algorithms run iteratively and recursively, we discuss their convergence here. The proposed algorithm continues dividing a region only as long as the following two conditions both hold:

(1) The local regions cover an appropriate number of data.
(2) No source is accurate enough for the current region’s data.

If there does not exist any source with sufficiently high accuracy for any region, the coverage criterion breaks the chain of recursion and prevents an unlimited number of divisions. Therefore, the algorithm always converges and does not fall into an infinite loop.

3.2. The proposed top-down FDSS algorithm

One of the main disadvantages of dynamic approaches is their dependence on pre-defined parameters. Although FDSS has many advantages, including less memory consumption, its performance depends on two parameter settings, the minimum truth and minimum coverage threshold values. One idea for removing the parameters of the proposed FDSS algorithm is to construct the rule base in a top-down manner. The top-down algorithm starts with one rule over the entire data set. After this, the algorithm starts dividing the area recursively just like FDSS. The difference is that the maximum conditional probability achieved so far for each class, by the rules added for the areas containing the current area, is passed to each step. In each step, only those rules are added to the rule base that outperform the previous rules in terms of conditional probabilities for at least one class.
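The acceptance test just described can be sketched as below. This is an illustrative reading rather than the pseudo code of Figure 5: the function name `accept_rule` and the sample probability vectors are hypothetical, and the sketch assumes that the vector passed down is the element-wise maximum over the accepted rules, starting from zeros.

```python
def accept_rule(candidate_probs, best_so_far):
    """Accept a candidate rule only if it beats the inherited best for at least one class.

    Returns (accepted, vector of per-class best probabilities to pass to sub-regions).
    """
    improves = any(p > b for p, b in zip(candidate_probs, best_so_far))
    if not improves:
        return False, list(best_so_far)
    return True, [max(p, b) for p, b in zip(candidate_probs, best_so_far)]

# The root starts from an all-zero vector, so its rule is always accepted
# as long as some class probability is positive.
best = [0.0, 0.0]
ok_root, best = accept_rule([0.7, 0.6], best)

# Two hypothetical sub-regions of the root.
ok_a, best_a = accept_rule([0.65, 0.6], best)  # improves no class: rejected
ok_b, best_b = accept_rule([0.6, 0.9], best)   # improves the second class: accepted
```

Because a sub-region inherits the best values of every enclosing region, a rule for a small region survives only if it adds something that the more global rules do not already provide.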
In this way, the algorithm gives up searching for suitable rules, which requires suitability parameters to be defined in advance, in favor of adding better rules in each step. We also omit the minimum coverage threshold. FDSS divides the search space to find at least one suitable rule. In some cases, there might not exist any suitable source for the local regions. Therefore, FDSS defines the coverage threshold to stop the algorithm from adding dysfunctional rules which are not supported by enough data. Two parameters are thus provided in the FDSS algorithm to prevent overfitting. The methodology of creating the rule base in top-down FDSS is immune to overfitting. In the top-down manner, we add a rule in each step only if it is better than the previously added ones. If in any situation a rule with very low coverage is added to the rule base, it does not lead to overfitting, because of its very low variance and the fact that the algorithm has already added efficient rules for this area in the previous steps. This lets us remove the coverage parameter. Newly added rules in each step focus on local regions where sources perform better in comparison to the previous, more global areas. We should note that this algorithm provides a rule base with overlapping local regions of competence. This means that, unlike in the previous algorithm, each region might fit more than one rule. Figure 5 shows the pseudo code of the proposed top-down FDSS algorithm.

4. Experimental results

To evaluate the performance of our proposed method, we consider two sets of experiments. First we evaluate the accuracy of classification using homogeneous local sources on 14 benchmark data sets from the UCI [42] and Keel [43] machine learning database repositories. Table 2 shows the main features of the selected data sets. The number of features in the selected data sets varies from 2 to 24, and the number of instances varies from 569 to 19,020.
After describing the experimental setup, classification results for the homogeneous benchmarks are presented. Then we evaluate the proposed algorithm using heterogeneous local sources. Performance results and comparisons in this part are included in Section 4.2. Finally, we present the memory consumption of the proposed method in Section 4.3.

Figure 5. Second proposed algorithm is performed in top-down manner. The coverage and truth value thresholds in FDSS are removed in this algorithm. Vector of conditional probabilities is set to zero in the first iteration.

Table 2. Main specifications of the considered benchmark datasets.

| Dataset | Source | #Features | #Instances | #Classes |
|---|---|---|---|---|
| Blood Transfusion | Keel | 4 | 748 | 2 |
| WDBC | Keel | 30 | 569 | 2 |
| Balance | UCI | 4 | 625 | 3 |
| Australian | UCI | 14 | 690 | 2 |
| Breast cancer | UCI | 9 | 699 | 2 |
| Pima | UCI | 8 | 768 | 2 |
| Mammographic | UCI | 5 | 830 | 2 |
| Vehicle | UCI | 3 | 846 | 4 |
| Wine | UCI | 11 | 962 | 6 |
| German | UCI | 24 | 1000 | 2 |
| Segmentation | UCI | 19 | 2310 | 7 |
| Banana | UCI | 2 | 5300 | 2 |
| Page Blocks | UCI | 10 | 5473 | 5 |
| MAGIC | UCI | 10 | 19,020 | 2 |

The classification results are compared against five other selection approaches:

(1) DCS-LA [5]: This algorithm defines the competence of each local source classifier as its local accuracy. The local accuracy is estimated using the k nearest neighbors of the test data. We choose k = 10 since this value of k gives the best performance according to [5].
(2) DCS-P and DCS-KL [4]: These two algorithms are dynamic approaches that use comparison to a random classifier as a measure of competence. These algorithms use the k nearest neighbor method and we set k = 10 according to the paper’s results.
(3) Single Best (SB): This approach selects the best performing source on the training data set. This one source is used to make decisions about all the test data.
(4) FRBMCS [44]: This static approach trains a fuzzy rule base at the training phase that statically weights local sources.
This method needs the outputs of local sources in the form of a vector of probabilities assigned to each class.

4.1. Experimental setup

In order to simulate distributed local sources where each source is trained on its own separate data set, we divide the training data randomly between sources with no overlap. Data for the training and testing sets are extracted using fivefold cross validation. At each iteration, three folds are used for training sources, one fold as the validation data set and one fold as the testing data set. Therefore, the presented results are the average of 20 runs of the algorithm. Naïve Bayes is used as the local classifier and FCM clustering is used for dividing the space. Based on experiments, the threshold for the truth value is set to 0.96 and the threshold for the coverage value is set to 10/(#training data).

In both homogeneous and heterogeneous tests, we apply the following process. At each iteration, we first train the local sources, each using its own dedicated set of training data. Then we compute the outputs of local sources for the validation data set and train the combination algorithm using the output values together with the validation set. The algorithm is then tested on the testing data set.

4.1.1. Test with homogeneous sources

In this test, we use the same classifier and the same set of features for all the local sources. In order to evaluate the performance of the algorithm in dealing with unreliability of local sources, we increase the number of local sources among which the training data are divided. Since the data are divided with no overlap, the available training data for each source decreases as we increase the number of local sources. For 2, 4, 6, 8 and 10 local sources, each source receives approximately 50%, 25%, 16%, 12% and 10% of the whole available training data, respectively. This leads to less reliability for local sources.
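The non-overlapping division of the training data described above can be sketched as follows. The function name `split_among_sources`, the fixed seed, and the round-robin slicing of a shuffled index list are assumptions made for illustration; the source does not specify how the random division is implemented.

```python
import random

def split_among_sources(indices, n_sources, seed=0):
    """Randomly divide training indices into n_sources disjoint, near-equal parts."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    # Round-robin slicing: part i takes every n_sources-th shuffled index.
    return [shuffled[i::n_sources] for i in range(n_sources)]

# Share of a hypothetical 600-sample training set that each source receives.
n_train = 600
shares = {
    n: len(split_among_sources(range(n_train), n)[0]) / n_train
    for n in (2, 4, 6, 8, 10)
}
```

With 600 samples the shares are 1/2, 1/4, 1/6, 1/8 and 1/10, which matches the approximate percentages quoted in the text.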
The accuracy results averaged over the different numbers of sources are shown in Table 3. Figure 6 compares the accuracies of different algorithms for different numbers of sources. As this figure and Table 3 show, the proposed approaches perform better than other approaches in most cases. Increasing the number of local sources presents interesting results. The single best approach produces an average decrease of 0.234 in accuracy. This is because decreasing the amount of available training data leads to classifiers with lower performance. FRBMCS, DCS-P, SB, FDSS, DCS-LA and DCS-KL produce average increases in accuracy of −4.69, −3.09, 0.578, 0.054, −0.245 and 0.035, respectively (negative values denote a decrease). As the results show, increasing the number of local sources might lead to better results. This is because, when the number of local sources increases, the fusion center can select between more complementary classifiers. While the overall performance of the proposed method is better than that of other methods according to Table 3, the change in accuracy is also smaller for the proposed method than for all other methods except DCS-KL. On the other hand, DCS-KL shows less accuracy in its predictions. In other words, the proposed method produces a smoother and more accurate performance as the number of local sources increases. This results from the main feature of the proposed algorithm, which exploits the power of fuzzy logic in handling uncertainty.

The Kolmogorov–Smirnov test is used for a deeper comparison of the two proposed algorithms. The Kolmogorov–Smirnov test is a nonparametric test that does not assume that data are sampled from any specific distribution. The significance level for rejecting the null hypothesis is set to 5%. The p-values of the test are also shown in Table 3. As the test shows, the two proposed algorithms perform almost the same. Which algorithm is preferable depends on the differences between the two algorithms.
FDSS has two parameters that control the number of iterations of the algorithm. The larger the minimum coverage threshold or the smaller the minimum truth value threshold, the faster the algorithm converges and the smaller the rule base. On the other hand, top-down FDSS is parameter free. Although this is a significant advantage, it makes the algorithm continue searching to the deepest possible level and leads to a more time-consuming training phase. The memory consumption of the two algorithms is compared in Subsection 4.3.

Figure 6. Average results of fivefold cross validation on different data sets when train data is divided between 2, 4, 6, 8 and 10 homogeneous local sources separately for each approach.

4.2. Test with heterogeneous sources

In this test, we use a different set of features for each local classifier, hence the sources are heterogeneous. For this purpose, we use the Random Subspace [45] method to create the local data. The Random Subspace method uses different sets of features for each local classifier. Here we use ten local sources, each trained with a randomly selected 50% of the features. The rule base is trained with the whole set of features of the validation set. Table 4 shows the average of the fivefold cross validation results. For heterogeneous sources, we observe that the proposed approach works better than the others for six datasets. The average difference between FDSS and the best of the other approaches over all data sets is −0.7, and for top-down FDSS it is −1.14. FDSS also works better than top-down FDSS in eight data sets. The p-values of the test are also shown in Table 4. As the table shows, the two proposed algorithms maintain similar performance.

Table 3. Classification accuracies obtained by each algorithm using homogeneous sources averaged over different number of local sources. Best results for each data set are bold.
Last column shows the Kolmogorov–Smirnov test for the comparison of FDSS and top-down FDSS.

| Data set | DCS-KL | DCS-P | DCS-LA | SB | FRBMCS | FDSS (proposed) | Top-down FDSS (proposed) | p-value |
|---|---|---|---|---|---|---|---|---|
| Transfusion | 74.80 ± 3.07 | 70.1 ± 5.73 | 75.648 ± 3.46 | 74.967 ± 3.26 | 75.977 ± 3.34 | 76.139 ± 3.01 | 76.36 ± 2.99 | 0.556 |
| Banana | 72.007 ± 1.6 | 65.53 ± 4.36 | 77.246 ± 1.98 | 72.246 ± 1.63 | 57.345 ± 3.62 | **79.084 ± 1.71** | 76.398 ± 2.03 | 2.95E-11 |
| Page blocks | 59.63 ± 1.58 | 92.905 ± 0.882 | 94.04 ± 0.668 | 93.515 ± 0.782 | 90.33 ± 1.23 | **94.172 ± 0.678** | 93.89 ± 0.678 | 0.139 |
| Pima | 69.921 ± 3.46 | 70.329 ± 3.71 | 73.316 ± 3.2 | 72.904 ± 3.27 | 66.319 ± 4.99 | 73.86 ± 3.43 | 74.226 ± 2.7 | 0.139 |
| WDBC | 74.753 ± 4.18 | 71.51 ± 9.4 | 93.78 ± 2.3 | **94.141 ± 2.48** | 65.938 ± 8.02 | 92.09 ± 6.7 | 93.968 ± 6.4 | 0.556 |
| Vehicle | 53.954 ± 3.8 | 50.548 ± 4.5 | 60.386 ± 3.8 | 56.704 ± 3.44 | 43.950 ± 9.49 | **61.56 ± 5.3** | 61.454 ± 5.5 | 0.556 |
| Mammographic | 79.887 ± 3.87 | 70.7117 ± 8.18 | 79.05 ± 3.4 | 80.254 ± 3.6 | 62.185 ± 11.32 | **81.15 ± 4.02** | 80.267 ± 3.97 | 0.193 |
| Breast cancer | 94.43 ± 2.311 | 75.099 ± 6.46 | 94.459 ± 2.33 | 93.989 ± 2.66 | 66.318 ± 19.71 | **94.74 ± 7.08** | 94.71 ± 6.8 | 0.961 |
| Wine | **56.952 ± 2.99** | 56.56 ± 3.2 | 54.83 ± 2.97 | 54.72 ± 3.16 | 47.108 ± 6.13 | 55.82 ± 3.17 | 56.189 ± 2.97 | 0.443 |
| Australian | 50.259 ± 4.54 | 61.518 ± 4.98 | 66.73 ± 4.63 | 67.186 ± 4.72 | 55.57 ± 5.99 | **70.105 ± 5.29** | 67.109 ± 5.03 | 0.961 |
| Balance | **87.96 ± 3.08** | 76.384 ± 6.62 | 86.848 ± 3.23 | 85.288 ± 3.70 | 62.32 ± 17.55 | 87.27 ± 3.15 | 87.928 ± 2.71 | 0.556 |
| German | 70.279 ± 3.59 | 69.39 ± 3.9 | **70.805 ± 3.51** | 70.526 ± 3.28 | 70.184 ± 3.53 | 70.37 ± 3.51 | 70.730 ± 3.41 | 0.961 |
| Magic | 71.136 ± 0.78 | 72.236 ± 0.74 | 76.582 ± 0.88 | **76.612 ± 0.85** | 66.466 ± 2.5 | 76.27 ± 0.67 | 76.259 ± 0.73 | 0.99 |
| Segmentation | 80.2909 ± 1.82 | 50.734 ± 5.54 | **88.357 ± 1.74** | 86.1098 ± 1.92 | 70.512 ± 8.88 | 87.77 ± 1.49 | 87.5623 ± 1.72 | 0.556 |

Table 4. Classification accuracy obtained by each algorithm using heterogeneous sources. Best results for each data set are bold. Last column shows the Kolmogorov–Smirnov test for the comparison of FDSS and top-down FDSS.
| Data set | DCS-KL | DCS-P | DCS-LA | SB | FRBMCS | FDSS (proposed) | Top-down FDSS (proposed) | p-value |
|---|---|---|---|---|---|---|---|---|
| Transfusion | 74.53 ∓2.93 | 74.55 ∓2.63 | 75.76 ∓2.48 | 75.29 ∓3.41 | 76.83 ∓2.72 | 76.49 ∓2.58 | 75.9 ∓2.61 | 0.77 |
| Banana | 677.69 ∓4.24 | 66.05 ∓3.3 | 80.57 ∓1.29 | 70.22 ∓1.31 | 59.82 ∓5.07 | 82.08 ∓1.25 | 79.31 ∓1.54 | 2.4894e-07 |
| Page blocks | 60.92 ∓1.85 | 94.08 ∓1.09 | 95.02 ∓0.64 | 95.07 ∓0.77 | 91.75 ∓1.45 | 94.65 ∓0.81 | 94.64 ∓0.63 | 0.49734 |
| Pima | 68.18 ∓4.5 | 71.60 ∓4.12 | 72.42 ∓3.98 | 75.00 ∓3.82 | 67.80 ∓4.58 | 73.33 ∓4.13 | 74.50 ∓4.26 | 0.49734 |
| WDBC | 75.00 ∓4.98 | 70.45 ∓11.89 | 94.03 ∓1.96 | 94.15 ∓2.32 | 76.40 ∓15.74 | 94.42 ∓2.08 | 94.35 ∓1.83 | 0.96548 |
| Vehicle | 53.87 ∓4.56 | 50.54 ∓5.34 | 61.78 ∓3.62 | 61.36 ∓3.18 | 51.91 ∓9.32 | 62.64 ∓3.79 | 62.353 ∓3.73 | 0.96548 |
| Mammographic | 73.86 ∓8.51 | 70.44 ∓7.43 | 77.32 ∓3.15 | 78.22 ∓2.99 | 65.77 ∓13.35 | 80.63 ∓3.04 | 78.73 ∓4.2 | 0.49734 |
| Breast cancer | 93.98 ∓2.08 | 71.29 ∓3.61 | 93.86 ∓3.42 | 94.08 ∓2.18 | 72.80 ∓22.48 | 94.64 ∓2.51 | 94.87 ∓2.11 | 0.96548 |
| Wine | 56.70 ∓2.89 | 56.20 ∓3.14 | 55.82 ∓2.63 | 56.56 ∓2.60 | 52.06 ∓5.75 | 55.59 ∓3.31 | 56.62 ∓2.89 | 0.77095 |
| Australian | 51.52 ∓4.90 | 69.95 ∓4.49 | 79.15 ∓3.17 | 82.20 ∓3.84 | 63.16 ∓8.75 | 73.39 ∓6.43 | 72.74 ∓5.90 | 0.96548 |
| Balance | 76.28 ∓4.52 | 70.4 ∓5.28 | 83.08 ∓3.64 | 68.6 ∓3.59 | 67.04 ∓9.46 | 83.72 ∓3.96 | 81.76 ∓4.08 | 0.059142 |
| German | 70.14 ∓3.03 | 68.51 ∓3.06 | 70.53 ∓3.56 | 70.51 ∓3.40 | 70.36 ∓3.39 | 70.16 ∓3.36 | 70.31 ∓3.31 | 0.96548 |
| Magic | 70.14 ∓9.64 | 68.51 ∓3.88 | 70.53 ∓10.36 | 70.51 ∓10.32 | 70.36 ∓7.01 | 70.16 ∓3.21 | 70.32 ∓2.38 | 0.98 |
| Segmentation | 78.72 ∓2.97 | 53.39 ∓9.4 | 90.54 ∓1.3 | 89.88 ∓3.04 | 81.09 ∓6.10 | 88.74 ∓6.02 | 89.31 ∓6.15 | 0.49734 |

Table 5. Average number of generated rules over different number of local sources in the proposed approaches.

| Data set | FDSS | Top-down FDSS |
|---|---|---|
| Transfusion | 17.77 | 41.88 |
| Banana | 153.44 | 92.99 |
| Page Blocks | 76.68 | 58.43 |
| Pima | 40.65 | 22.39 |
| WDBC | 18.21 | 9.88 |
| Vehicle | 39.88 | 18.96 |
| Mammographic | 45.04 | 21.66 |
| Breast Cancer | 12.84 | 9.04 |
| Wine | 41.72 | 33.16 |
| Australian | 43.15 | 19.46 |
| Balance | 31.63 | 5.91 |
| German | 45.63 | 34.75 |
| Magic | 619.37 | 751.45 |
| Segmentation | 86.1 | 27.6 |

Table 6.
Average runtime of the different dynamic algorithms for deciding about a single data sample, in milliseconds. Best result for each data set is bold.

| Data set | DCS-KL | DCS-P | DCS-LA | FDSS (proposed) | Top-down FDSS (proposed) |
|---|---|---|---|---|---|
| Transfusion | 0.110 | 0.446 | 0.356 | 0.032 | 0.041 |
| Banana | 0.836 | 5.46 | 5.064 | 0.3 | 0.19 |
| Page blocks | 0.935 | 6.455 | 5.717 | 0.390 | 0.221 |
| Pima | 0.110 | 0.434 | 0.345 | 0.0398 | 0.012 |
| WDBC | 0.101 | 0.296 | 0.295 | 0.029 | 0.15 |
| Vehicle | 0.150 | 0.474 | 0.366 | 0.068 | 0.038 |
| Mammographic | 0.124 | 0.520 | 0.426 | 0.041 | 0.029 |
| Breast cancer | 0.070 | 0.25 | 0.209 | 0.020 | 0.002 |
| Wine | 0.144 | 0.484 | 0.360 | 0.087 | 0.054 |
| Australian | 0.110 | 0.371 | 0.290 | 0.051 | 0.012 |
| Balance | 0.087 | 0.355 | 0.298 | 0.031 | 0.12 |
| German | 0.109 | 0.541 | 0.405 | 0.071 | 0.43 |
| Magic | 4.92 | 62.43 | 59.45 | 4.40 | 4.9 |
| Segmentation | 0.439 | 1.301 | 0.941 | 0.222 | 0.078 |

4.3. Memory consumption and runtime

Table 5 shows the mean number of rules generated by the proposed methods, as a measure of their memory consumption. The number of generated rules corresponds to the number of leaves in the tree that divides the input space. In most cases, top-down FDSS produces fewer rules than FDSS: the top-down manner prefers more global rules to local ones and each iteration only adds better rules, so the final rule base is smaller. Compared with storing all the training data in memory, FDSS presents a 96.12% and top-down FDSS a 97.5% improvement in memory. The memory required to store the produced rule base is therefore much less than that required to keep the complete data, while Tables 3–5 demonstrate that the proposed methods achieve the same or even higher classification performance.

The average runtime for deciding about a single data sample is compared in Table 6 for the dynamic algorithms. The algorithms are implemented in MATLAB and runtimes are calculated using Intel(R) Core(TM) i3, with 4 GB RAM using Windows 7.
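As a rough illustration of how a per-decision runtime like those in Table 6 could be measured, the following Python sketch times a decision function over a batch of inputs and reports the mean in milliseconds. It is not the authors' MATLAB code: `mean_time_ms`, `make_nearest_decider` and the toy data are hypothetical names introduced here. The same nearest-center lookup stands in for both a neighborhood-based selector (which scans all stored samples) and a rule-base selector (which scans only a few rule centers).

```python
import random
import time


def mean_time_ms(decide, samples, repeats=3):
    """Mean wall-clock time of one decide(x) call, in milliseconds."""
    if not samples or repeats < 1:
        raise ValueError("need at least one sample and one repeat")
    start = time.perf_counter()
    for _ in range(repeats):
        for x in samples:
            decide(x)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / (repeats * len(samples))


def make_nearest_decider(centers, best_source):
    """Return decide(x) that picks the source attached to the nearest center.

    Hypothetical stand-in: the cost of one call grows with len(centers).
    """
    def decide(x):
        nearest = min(
            range(len(centers)),
            key=lambda i: sum((c - v) ** 2 for c, v in zip(centers[i], x)),
        )
        return best_source[nearest]
    return decide


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data (assumption): 2000 stored samples versus a 20-entry rule base.
    stored = [(rng.random(), rng.random()) for _ in range(2000)]
    sources = [i % 10 for i in range(len(stored))]  # ten local sources
    rules, rule_sources = stored[:20], sources[:20]
    queries = [(rng.random(), rng.random()) for _ in range(200)]

    neighbor_style = make_nearest_decider(stored, sources)
    rule_style = make_nearest_decider(rules, rule_sources)
    print("neighbor-style lookup: %.4f ms" % mean_time_ms(neighbor_style, queries))
    print("rule-style lookup:     %.4f ms" % mean_time_ms(rule_style, queries))
```

The absolute numbers depend on the machine and are not comparable to Table 6; the sketch only shows that the per-decision cost in this toy setting scales with the number of items scanned.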
As Table 6 shows, the runtime of the proposed algorithms is significantly less than that of the other dynamic algorithms. The reason is that the proposed algorithms do not need to search for a specific number of neighbors of the input data, which is very time consuming. The runtime of top-down FDSS is generally less than that of FDSS, since top-down FDSS generally generates fewer rules on average, as shown in Table 5.

5. Conclusion

In this paper, we propose two algorithms based on fuzzy logic for dynamic source selection in decision fusion systems. The first algorithm works in a recursive manner, using specifications of fuzzy rules including truth and coverage values to construct the rule base. The second algorithm works independently of these two parameters. We find that by compressing the knowledge extracted from the training data set into a single fuzzy rule base, we can achieve similar or better performance while storing much less data in memory. The proposed approach can then be regarded as an alternative to neighborhood-based approaches, especially when data size is large, memory usage is limited and processing speed is important.

There are several directions for extending the proposed approach. First, the proposed approach divides data into two clusters at each iteration, whereas it may be useful to choose an appropriate number of clusters at each iteration based on the available data. Second, it would be desirable to cluster data not only by distance measures but also by the accuracies of different sources, in order to find the optimal division that leads to the best set of possible rules. Finally, a post-processing phase can be considered that removes or merges rules in order to have a more efficient rule base.

Disclosure statement

No potential conflict of interest was reported by the authors.

ORCID

M. R. Akbarzadeh-T http://orcid.org/0000-0001-5626-5559

References

[1] Britto AS Jr, Sabourin R, Oliveira LE.
Dynamic selection of classifiers – a comprehensive review. Pattern Recognit. 2014;47(11):3665–3680.
[2] Prodromidis A, Chan P, Stolfo S. Meta-learning in distributed data mining systems: issues and approaches. Adv Distrib Parallel Knowl Discov. 2000;3:81–113.
[3] Zhang L, Zhou W-D. Sparse ensembles using weighted combination methods based on linear programming. Pattern Recognit. 2011;44(1):97–106.
[4] Woloszynski T, Kurzynski M, Podsiadlo P, et al. A measure of competence based on random classification for dynamic ensemble selection. Inf Fusion. 2012;13(3):207–213.
[5] Woods K, Kegelmeyer WP, Bowyer K. Combination of multiple classifiers using local accuracy estimates. IEEE Trans Pattern Anal Mach Intell. 1997;19(4):405–410.
[6] Jurek A, Bi Y, Wu S, et al. A survey of commonly used ensemble-based classification techniques. Knowl Eng Rev. 2014;29:551–581.
[7] Liu R, Yuan B. Multiple classifiers combination by clustering and selection. Inf Fusion. 2001;2(3):163–168.
[8] Dos Santos EM, Sabourin R, Maupin P. A dynamic overproduce-and-choose strategy for the selection of classifier ensembles. Pattern Recognit. 2008;41(10):2993–3009.
[9] Wu D, Mendel JM. Linguistic summarization using IF–THEN rules and interval type-2 fuzzy sets. IEEE Trans Fuzzy Syst. 2011;19(1):136–151.
[10] Didaci L, Giacinto G. Dynamic classifier selection by adaptive k-nearest-neighbourhood rule. In: Roli F, Kittler J, Windeatt T, editors. Multiple classifier systems. MCS 2004. Berlin: Springer; 2004. (Lecture notes in computer science; vol. 3077).
[11] Yin XC, Huang K, Hao HW, et al. A novel classifier ensemble method with sparsity and diversity. Neurocomputing. 2014;134:214–221.
[12] Schapire RE. The strength of weak learnability. Mach Learn. 1990;5(2):197–227.
[13] Breiman L. Bagging predictors. Mach Learn. 1996;24(2):123–140.
[14] Tsoumakas G, Partalas I, Vlahavas I. An ensemble pruning primer. In: Okun O, Valentini G, editors.
Applications of supervised and unsupervised ensemble methods. Berlin, Heidelberg: Springer; 2009. p. 1–13.
[15] Ruta D, Gabrys B. Classifier selection for majority voting. Inf Fusion. 2005;6(1):63–81.
[16] Abdelazeem S. A greedy approach for building classification cascades. Machine Learning and Applications, 2008. ICMLA'08. Seventh International Conference; Dec 11; San Diego, CA, USA. Washington (DC): IEEE; 2008. p. 115–120.
[17] Kuncheva LI, Jain LC. Designing classifier fusion systems by genetic algorithms. IEEE Trans Evol Comput. 2000;4(4):327–336.
[18] Lam L, Suen CY. Optimal combinations of pattern classifiers. Pattern Recognit Lett. 1995;16(9):945–954.
[19] Sirlantzis K, Fairhurst MC, Hoque MS. Genetic algorithms for multi-classifier system configuration: a case study in character recognition. International Workshop on Multiple Classifier Systems. Berlin: Springer; 2001.
[20] Parvin H, MirnabiBaboli M, Alinejad-Rokny H. Proposing a classifier ensemble framework based on classifier selection and decision tree. Eng Appl Artif Intell. 2015;37:34–42.
[21] Fatemipour F, Akbarzadeh-T MR, Ghasempour R. A new fuzzy approach for multi-source decision fusion. Fuzzy Systems (FUZZ-IEEE), 2014 IEEE International Conference; Jul 6; Beijing, China. IEEE; 2014. p. 2238–2243.
[22] Burduk R, Walkowiak K. Static classifier selection with interval weights of base classifiers. Asian Conference on Intelligent Information and Database Systems; Mar 23. Cham: Springer; 2015. p. 494–502.
[23] Brown G, Wyatt J, Harris R, et al. Diversity creation methods: a survey and categorisation. Inf Fusion. 2005;6(1):5–20.
[24] Kuncheva LI, Whitaker CJ, Shipp CA, et al. Is independence good for combining classifiers? Pattern Recognition, 2000. Proceedings 15th International Conference; Vol. 2; Barcelona, Spain. IEEE; 2000. p. 168–171.
[25] Yin X-C, Huang K, Yang C, et al. Convex ensemble learning with sparsity and diversity. Inf Fusion. 2014;20:49–59.
[26] Aksela M.
Comparison of classifier selection methods for improving committee performance. International Workshop on Multiple Classifier Systems; Jun 11. Berlin: Springer; 2003. p. 84–93.
[27] Giacinto G, Roli F. Dynamic classifier selection based on multiple classifier behaviour. Pattern Recognit. 2001;34(9):1879–1881.
[28] Didaci L, Giacinto G, Roli F, et al. A study on the performances of dynamic classifier selection based on local accuracy estimation. Pattern Recognit. 2005;38(11):2188–2191.
[29] Mendialdua I, Martínez-Otzeta JM, Rodriguez-Rodriguez I, et al. Dynamic selection of the best base classifier in one versus one. Knowl Based Syst. 2015;85:298–306.
[30] Cevikalp H, Polikar R. Local classifier weighting by quadratic programming. IEEE Trans Neural Netw. 2008;19(10):1832–1838.
[31] Li L, Zou B, Hu Q, et al. Dynamic classifier ensemble using classification confidence. Neurocomputing. 2013;99:581–591.
[32] Ko AH, Sabourin R, Britto Jr AS. From dynamic classifier selection to dynamic ensemble selection. Pattern Recognit. 2008;41(5):1718–1731.
[33] Lysiak R, Kurzynski M, Woloszynski T. Probabilistic approach to the dynamic ensemble selection using measures of competence and diversity of base classifiers. In: Corchado E, editor. Hybrid artificial intelligent systems. Berlin: Springer; 2011. p. 229–236.
[34] Li S, Zheng Z, Wang Y, et al. A new hyperspectral band selection and classification framework based on combining multiple classifiers. Pattern Recognit Lett. 2016;83:152–159.
[35] Swiderski B, Osowski S, Kruk M, et al. Aggregation of classifiers ensemble using local discriminatory power and quantiles. Expert Syst Appl. 2016;46:316–323.
[36] Nazemi A, Fatemi Pour F, Heidenreich K, Fabozzi F. Fuzzy decision fusion approach for loss-given-default modeling. Eur J Oper Res. 2017;262(2):780–791.
[37] Ykhlef H, Bouchaffra D. An efficient ensemble pruning approach based on simple coalitional games. Inf Fusion. 2017;34:28–42.
[38] Lee HE, Park KH, Bien ZZ. Iterative fuzzy clustering algorithm with supervision to construct probabilistic fuzzy rule base from numerical data. IEEE Trans Fuzzy Syst. 2008;16(1):263–277.
[39] Dos Santos EM, Sabourin R, Maupin P. Overfitting cautious selection of classifier ensembles with genetic algorithms. Inf Fusion. 2009;10(2):150–162.
[40] Tsymbal A, Pechenizkiy M, Cunningham P. Sequential genetic search for ensemble feature selection. Proceedings of the 19th International Joint Conference on Artificial intelligence; Edinburgh, Scotland; 2005. p. 877–882.
[41] Kuncheva LI. Switching between selection and fusion in combining classifiers: an experiment. IEEE Trans Syst Man Cybern Part B Cybern. 2002;32(2):146–156.
[42] Blake C, Merz CJ. UCI Repository of machine learning databases. Irvine (CA): Department of Information and Computer Science, University of California; 1998.
[43] Alcalá J, Fernández A, Luengo J. Keel data-mining software tool: data set repository, integration of algorithms and experimental analysis framework. J Mult Valued Log Soft Comput. 2010;17:255–287.
[44] Trawinski K, Cordon O, Sanchez L, et al. A genetic fuzzy linguistic combination method for fuzzy rule-based multiclassifiers. IEEE Trans Fuzzy Syst. 2013;21(5):950–965.
[45] Schapire RE, Freund Y, Bartlett P, et al. Boosting the margin: a new explanation for the effectiveness of voting methods. Ann Stat. 1998;26(5):1651–1686.

Journal

Fuzzy Information and Engineering, Taylor & Francis

Published: Jan 2, 2018

Keywords: Fuzzy linguistic rule base; distributed decision-making; dynamic classifier selection; classifier selection; decision fusion

References