Extracting Interpretable Fuzzy Models for Nonlinear Systems Using Gradient-based Continuous Ant Colony Optimization

Fuzzy Inf. Eng. (2013) 3: 255-277
DOI 10.1007/s12543-013-0144-2
ORIGINAL ARTICLE

M. Eftekhari · M. Zeinalkhani

Received: 15 October 2012 / Revised: 15 March 2013 / Accepted: 10 April 2013
© Springer-Verlag Berlin Heidelberg and Fuzzy Information and Engineering Branch of the Operations Research Society of China

Abstract  This paper exploits the ability of a novel ant colony optimization algorithm, called gradient-based continuous ant colony optimization, an evolutionary methodology, to extract interpretable first-order Sugeno fuzzy models for nonlinear system identification. The proposed method considers all objectives of the system identification task, namely accuracy, interpretability, compactness and validity conditions. First, an initial model structure is obtained by means of subtractive clustering. Then, an iterative two-step algorithm is employed to produce a fuzzy model that is simplified in terms of the number of fuzzy sets and rules. In the first step, the parameters of the model are adjusted by gradient-based continuous ant colony optimization. In the second step, similar membership functions of the obtained model are merged. The results obtained on three case studies illustrate the applicability of the proposed method to extracting accurate and interpretable fuzzy models for nonlinear system identification.

Keywords  Continuous ant colony · Interpretable · Fuzzy modeling

1. Introduction

Fuzzy models become useful when a system cannot be defined in precise mathematical terms [1]. Non-fuzzy, traditional representations require a well-structured model and well-defined model parameters [1]. However, in practice, there may be uncertainties, unpredicted dynamics and other unknown phenomena that cannot be modeled mathematically.
The main contribution of fuzzy modeling theory is its ability to handle many practical problems that cannot be adequately represented by conventional methods. Fuzzy modeling of nonlinear systems has been the focus of much scientific research. Takagi and Sugeno (TS) [2] proposed a search algorithm for a fuzzy controller and generalized their techniques to fuzzy identification. Jang [2] proposed an architecture and a learning procedure that combine fuzzy logic with neural networks for inference. The adaptive-network-based fuzzy inference system (FIS) is capable of constructing input-output mappings accurately based on both human knowledge and stipulated input-output data pairs. However, once a fuzzy model is developed, in most cases it needs to undergo an optimization process. There are two main goals in such optimization tasks: the first is model structure optimization (i.e., the number of rules), and the second is the fine tuning of the model's parameters (i.e., the parameters of the input and output membership functions). Several investigations into the interpretability of fuzzy classification problems have recently been performed [3-5]. Many studies regarding transparency have also been reported in the fields of decision making, prediction of process behavior, data mining and others [6-8]. An automatic, data-driven method for constructing initial fuzzy models is Chiu's subtractive clustering [9]. This method has the advantage of avoiding the explosion of the rule base, a problem known as the "curse of dimensionality". Therefore, clustering-based methods are preferred to grid partitioning techniques. In this contribution, the subtractive clustering technique is used for creating an initial FIS model.

M. Eftekhari (✉) · M. Zeinalkhani
Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, Iran
email: m.eftekhari@mail.uk.ac.ir
Recently, some studies have been performed to develop TS models for multi-input multi-output (MIMO) nonlinear system identification [10-12]. The main goal of these studies is to achieve the objectives of system identification while securing the interpretability of the fuzzy model. In a recent study, a twofold taxonomy of interpretability, based on previous studies in the field, has been presented in the literature [13]. According to this classification, low-level and high-level interpretability are distinguished. Low-level interpretability imposes constraints that secure interpretability at the fuzzy set level, while high-level interpretability is defined at the fuzzy rule level. As mentioned in [13], distinguishability is the most important semantic constraint for achieving low-level interpretability. According to Occam's razor, the parsimony principle of machine learning, of all the models that can describe a process accurately the simplest one is the best; so parsimony is a basic criterion for achieving high-level interpretability. Therefore, distinguishability and parsimony are the first concerns in achieving low- and high-level interpretability, respectively. Obtaining suitable interpretability is a complex and computationally prohibitive problem, especially when the goal is to design approximate fuzzy models [10-12]. Evolutionary algorithms (EAs) have been widely used for eliciting fuzzy models owing to their ability to search for optimal solutions in irregular and high-dimensional solution spaces [10-12]. Differential evolution (DE), one of the best-known EAs, has been utilized to develop transparent fuzzy models [10, 12]. EAs and other stochastic search techniques seem to be a promising alternative to traditional techniques. A particularly promising EA for continuous search spaces, recently developed by the authors, is called gradient-based continuous
ant colony optimization (GCACO) [14, 15]. Two versions of this algorithm have been introduced; the second, named GCACO II, is more sophisticated than the first [15]. It is able to solve most nonlinear optimization problems while handling various imposed constraints. This study investigates the ability of GCACO II to elicit transparent fuzzy models for nonlinear processes. The proposed procedure is twofold: first, subtractive clustering is utilized to generate an initial TS fuzzy model, and GCACO II is employed for tuning the parameters of the obtained fuzzy model; second, the fuzzy model tuned in the previous step is simplified by eliminating redundant membership functions (MFs). In the simplification step, a similarity measure is used for merging similar pairs of membership functions resulting from subtractive clustering. Parameter tuning by GCACO II and membership function simplification are performed iteratively until no more pairs of membership functions satisfy the similarity criterion. The remainder of this paper is organized as follows. In Section 2, fuzzy modeling of nonlinear dynamical systems is described. Section 3 deals with GCACO. Our proposed procedure for extracting interpretable fuzzy models is discussed in Section 4. Section 5 presents case studies and computational results. Finally, Section 6 concludes the paper.

2. Fuzzy Modeling

Fuzzy modeling and identification from input-output process data has proved effective for the approximation of uncertain nonlinear dynamic systems [10, 11, 16, 17]. The most frequently applied TS-FIS model decomposes the input space of the nonlinear model into fuzzy subspaces and then approximates the system in each subspace by a simple linear regression model. Hence, TS models are often used to represent nonlinear dynamic systems by interpolating between local linear time-invariant (LTI) auto-regressive with exogenous input (ARX) models.
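As an illustrative sketch of the operating-regime idea, the snippet below blends two hypothetical first-order local linear ARX models with smooth Gaussian weights. All parameter values (local model coefficients, regime centers and width) are invented for illustration and are not taken from the paper:

```python
import numpy as np

# Sketch only: blending two hypothetical local ARX models with smooth
# Gaussian operating-regime weights. All parameters are assumptions.

def local_model(theta, y_past, u_past):
    # One first-order local ARX model: a*y(k-1) + b*u(k-1) + c.
    a, b, c = theta
    return a * y_past + b * u_past + c

def ts_output(x, y_past, u_past):
    # Gaussian regime weights, normalized to sum to one, then a
    # weighted average of the local model outputs.
    thetas = [(0.9, 0.1, 0.0), (0.2, 0.8, 0.5)]   # assumed local parameters
    centers, sigma = [-1.0, 1.0], 1.0             # assumed regime centers/width
    w = np.array([np.exp(-0.5 * ((x - c) / sigma) ** 2) for c in centers])
    w /= w.sum()
    y_locals = np.array([local_model(t, y_past, u_past) for t in thetas])
    return float(w @ y_locals)
```

Near the first regime center the first local model dominates, near the second center the second one dominates, and in between the output transitions smoothly, which is exactly the soft switching between regimes described above.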
So far, most attention has been devoted to single-input single-output (SISO) or multiple-input single-output (MISO) systems. Recently, methods for MIMO systems have been investigated [10, 11] and applied in model-based predictive control. The aim of constructing an FIS is to obtain a set of fuzzy rules that describe the system behavior as accurately as possible, given a set of operating data and, if available, an initial set of linguistic rules collected from experts. In this paper, a general fuzzy NARX structure is assumed to describe the system. In such a structure, the system is represented by the following nonlinear vector function:

$$Y(k) = g\big(Y(k-1), \dots, Y(k-n_p),\; U(k-n_d), \dots, U(k-n_d-n_q+1)\big), \tag{1}$$

where $g$ represents the nonlinear model, $Y = [y_1, y_2, \dots, y_{n_y}]$ is an $n_y$-dimensional output vector, $U = [u_1, u_2, \dots, u_{n_u}]$ is an $n_u$-dimensional input vector, $n_p$ and $n_q$ are the maximum lags considered for the outputs and inputs, respectively, and $n_d$ is the minimum discrete dead time. While it may not be possible to find a model that is universally applicable to describe the unknown system $g(\cdot)$, it is certainly worthwhile to build local linear models for specific operating points of the process. The modeling framework based on combining a number of local models, in which each local model has a predefined valid operating region, is called an operating-regime-based model [11, 12, 17, 18]. This model is formulated as:

$$Y(k) = \sum_{i=1}^{N_r} \psi_i(X(k-1)) \Big( \sum_{j=1}^{n_p} A_j^i Y(k-j) + \sum_{j=1}^{n_q} B_j^i U(k-j-n_d+1) + C^i \Big), \tag{2}$$

where the function $\psi_i(X(k))$ describes the operating regime of the $i$th ($i = 1, \dots, N_r$) local linear ARX model, and $X = [x_1, x_2, \dots, x_n]$ is a scheduling vector, which is usually a subset of the previous process inputs and outputs:

$$X(k-1) = \big[u_1(k-n_d), \dots, u_{n_u}(k-n_d-n_q+1),\; y_1(k-1), \dots, y_{n_y}(k-n_p)\big]. \tag{3}$$

The local models are defined by the parameter set $\theta^i = \{A_j^i, B_j^i, C^i\}$.
As $n_q$ and $n_p$ denote the maximum lags considered for the previous inputs and outputs, and $n_d$ is the minimum discrete dead time, the lags considered for the separate input-output channels can be handled by zeroing the appropriate elements of the $A_j^i$ and $B_j^i$ matrices. The main advantage of this framework is its transparency. Furthermore, the operating regimes of the local models can also be represented by fuzzy sets [10, 11]. This representation is appealing, since many systems change behavior smoothly as a function of the operating point, and the soft transition between regimes introduced by the fuzzy set representation captures this feature in an elegant fashion. Hence, the entire global model can be conveniently represented by TS fuzzy rules [18]. This MIMO nonlinear auto-regressive with exogenous (NARX) input TS fuzzy model is formulated by rules of the form

$$R_i:\ \text{If } x_1 \text{ is } D_{i,1} \text{ and } \cdots \text{ and } x_n \text{ is } D_{i,n} \text{ then } Y_i(k) = \sum_{j=1}^{n_p} A_j^i Y(k-j) + \sum_{j=1}^{n_q} B_j^i U(k-j-n_d+1) + C^i, \tag{4}$$

where $D_{i,j}(x_j)$ is the $i$th antecedent fuzzy set for the $j$th input. The one-step-ahead prediction of the MIMO fuzzy model, $Y(k)$, is inferred by computing the weighted average of the outputs of the consequent multivariable models:

$$Y(k) = \sum_{i=1}^{N_r} \psi_i(X(k-1))\, Y_i(k), \tag{5}$$

where $N_r$ is the number of rules and $\psi_i(X(k-1))$ is the normalized weight of the $i$th rule:

$$\psi_i(X(k-1)) = \frac{\prod_{j=1}^{n} D_{i,j}(x_j)}{\sum_{i=1}^{N_r} \prod_{j=1}^{n} D_{i,j}(x_j)}. \tag{6}$$

In order to obtain a set of $N_r$ rules while avoiding the problems inherent in grid partitioning (i.e., rule base explosion), clustering techniques are applied [9-11, 17, 18]. These techniques are employed since they allow scatter partitioning of the input-output space. In the following, one of the well-known clustering methods for fuzzy modeling is introduced.

− Subtractive clustering

Subtractive clustering is, essentially, a modified form of the mountain method [9, 10].
Thus, let $Z$ be the set of $N$ data points obtained by concatenation of $X(k-1)$ and $Y(k)$. In the algorithm, each point is seen as a potential cluster center, to which a measure of potential is assigned according to Equation (7):

$$p_i = \sum_{j=1}^{N} e^{-\alpha \|z_i - z_j\|^2}, \tag{7}$$

where $\alpha = 4/r_a^2$ and $r_a > 0$ defines the neighborhood radius for each cluster center. Thus, the potential associated with each point depends on its distance to all the other points, leading to clusters with high potential where the neighborhood is dense. After calculating the potential of each point, the one with the highest potential is selected as the first cluster center. Let $z_1^*$ be the center of the first group and $p_1^*$ its potential. Then the potential of each point $z_i$ is reduced according to Equation (8), especially for the points closer to the center of the cluster:

$$p_i = p_i - p_1^* e^{-\beta \|z_i - z_1^*\|^2}. \tag{8}$$

Here $\beta = 4/r_b^2$ and $r_b > 0$ represents the radius of the neighborhood in which a significant potential reduction will occur. The radius for potential reduction should be somewhat larger than the neighborhood radius in order to avoid closely spaced clusters; typically, $r_b = 1.25 r_a$. Since the points closer to the cluster center have their potential strongly reduced, the probability of those points being chosen as the next cluster center is lower. This procedure (selecting centers and reducing potential) is carried out iteratively until a stopping criterion is satisfied. Additionally, two threshold levels are defined: one above which a point is selected as a cluster center, and another below which a point is rejected. By the end of clustering, a set of fuzzy rules is obtained. Each cluster represents a rule. However, since the clustering is carried out in a multidimensional space, the related one-dimensional fuzzy sets must be obtained. As each axis refers to a variable, the centers of the membership functions are obtained by projecting the center of each cluster onto the corresponding axis. The widths are obtained on the basis of the radius $r_a$.
Generally speaking, developing fuzzy models is a complex process and requires powerful optimization techniques [10-12, 19]. On the other hand, various EAs are popular as global, derivative-free optimization methods [10-12, 19]. Therefore, in several recent studies, EAs have been utilized to develop fuzzy systems for modeling, identification and classification tasks [10-12]. These systems are generally named evolutionary fuzzy systems [19].

3. Continuous Ant Colony Optimization

Ant colony optimization (ACO) algorithms belong to the class of meta-heuristic search methods, which are inspired by the social behavior of real ant colonies. The initial versions of the ACO meta-heuristic were developed a few years ago [14, 15]. Meta-heuristic algorithms can be considered a general algorithmic framework for solving different combinatorial optimization problems [14, 15]. The ACO meta-heuristic has become one of the most promising methods for attacking hard combinatorial optimization problems such as the traveling salesman problem (TSP) [14, 15]. A good comparison between evolutionary algorithms (EAs) and ACO, both bio-inspired, was presented in [20], where the similarities and differences between population-based incremental learning (PBIL), as an EA, and ACO were also discussed. Both methods use memorized information to guide the search process, but unlike ACO, PBIL does not use any heuristic information in its solution construction process. EAs can be used for solving complex problems in both discrete and continuous domains [21]. In recent years, researchers have been motivated to extend the ACO meta-heuristic to continuous domains. In contrast to the conventional use of ACO for problems in discrete domains, relatively few works extending ACO algorithms to continuous spaces have been reported.
The first continuous ACO (CACO) algorithm was introduced by Bilchev [21], and other variants and modifications have subsequently been reported by other authors [22, 23]. Bilchev's approach and most of the recent methods comprise two stages: global and local. The set of ants is divided into two classes: one type is used to search globally for promising regions of the search space, and the other ants perform a local search inside the most promising regions. The creation of new regions for global search is handled by a genetic-algorithm-like process, while the local ants provide the metaphoric link to ant colonies. Bilchev and Wodrich reported some weaknesses of this approach [22]. The expensive maintenance of a history of regions was one of the disadvantages they pointed out; also, the first CACO did not handle constrained optimization problems [22]. Bilchev and Wodrich showed that a few modifications were needed to adapt CACO for constraint handling [22]. Subsequently, other authors attempted to resolve the constraint handling problem in their new versions of CACO [23]. The authors of this paper have developed a novel algorithm for solving unconstrained continuous numerical optimization problems, called GCACO [14]. In GCACO I, the global search stage is not connected with the previously developed CACO; therefore, there are no regions and no GA-like notions for creating them. In this approach, our intention is to maintain the basic framework of the ACO meta-heuristic. GCACO II, the extended version of GCACO I, was developed by the authors [15] in order to handle constrained problems with linear/nonlinear and equality/inequality constraints. Details of GCACO II are given in [15].

− GCACO

The main design decisions and modifications proposed to adapt the algorithm to continuous spaces are described below; Table 1 gives pseudo-code for the proposed algorithm.
The first modification of the general ACO toward the continuous case is the assignment of the nests of the colonies. The first nest is initialized randomly, and each subsequent nest is placed at the best point obtained by the current colony. This is illustrated at line 10 of Table 1. The search by each colony is implemented by the function ants_generation_and_activity() as shown in Table 1. The search starts by randomly selecting a feasible point in the search space as the starting nest for the first colony. A number of ants (ant_no) are assigned to each colony. Each ant begins the search from the nest, exploring and exploiting new feasible points by walking in the search space using pheromone-trail-based or random directions and a variable step size. In order to achieve a balance between exploration and exploitation, the direction selection method is based on an ε-greedy-like technique: the direction of movement is selected as the most probably best existing direction in the memory, but occasionally, with a small probability q_0, the random method is chosen.

In this work, the probabilistic selection mechanism is similar to the movement of a real ant, which is based on the amount of pheromone trail and heuristics. The mixed information about the pheromone trail and heuristic value of each direction is stored in the ant routing table (ART). The probabilities for selecting directions are calculated using this table. Therefore, for each path, the values $\eta_i$ and $\tau_i$ are used to calculate the decision variable $a_i$ according to Equation (9), and the selection probability $P_i$ for each direction (path) is calculated according to Equation (10):

$$a_i = \frac{\tau_i \times \eta_i^{\beta}}{\sum_{j=1}^{Directions} \tau_j \times \eta_j^{\beta}} \quad \forall i \in \{Directions\}, \text{ where } \beta = 0.4, \tag{9}$$

$$P_i = a_i \Big/ \sum_{j=1}^{Directions} a_j. \tag{10}$$

Now, based on the calculated selection probabilities $P_i$, a roulette-wheel or greedy-like technique may be used for the actual selection of the movement direction. The memory consists of a predefined number of direction vectors, which are initially null vectors and, as the algorithm progresses, are replaced by a set of normalized gradient vectors calculated from the previous motions that resulted in a better evaluation function value. The normalized gradient vectors (NGVs) are employed to guide the search process. The GV is calculated numerically and is the discrete interpretation of the gradient vector, in which the difference between the new and old function values is divided by the corresponding difference of the variable values; the NGV is the normalized GV. Each ant moves from its previous position in one of two ways: 1) along one of the most promising NGVs, which led to better points in previous motions; or 2) along a uniform random vector. The incremental movement of each ant in the search space is based on both a variable step size and memorized or random directions. The following update rule tunes and controls the step size for the movement of an ant (Lines 28 and 30 of Table 1):

$$X_{new} = X_{old} + R_a \cdot \psi(\cdot), \tag{11}$$

where $X_{new}$ and $X_{old}$ are the vectors of new and old object variables, $R_a$ is the radius of activity of an ant, and $\psi(\cdot)$ is the direction of movement.

Table 1: Pseudo-code for GCACO II.

     1  procedure GCACO_Meta_heuristic()
     2    Initialize_the_parameters_of_GCACO();
     3    while (t < T_max)
     4      [Best_solution] = ants_generation_and_activity();
     5      if (not Better(Best_solution, Global_solution))
     6        Evaporate_Pheromone();
     7      endif
     8      Daemon_action();
     9      Reduce_the_Radius_of_colony_Movement();
    10      Set_Nest_for_colony(Best_solution);
    11      t = t + 1;
    12    endwhile
    13  endprocedure
    14  procedure ants_generation_and_activity()
    15    Initialize the Radius of colony by R_c;
    16    while (Ant_count < Total_Ants)
    17      [best_Ant_solution] = new_active_ant();
    18      if (Better(best_Ant_solution, global_Ant_solution))
    19        global_Ant_solution = best_Ant_solution;
    20      Ant_count = Ant_count + 1;
    21    endwhile
    22  endprocedure
    23  procedure new_active_ant()
    24    Initialize R_a = R_c, Nest = Nest_of_colony, tau_0 = 2, phi = 0.05;
    25    while (ant is alive)
    26      q = generate random value in [0, 1];
    27      if (q <= q_0)
    28        cur_location = pre_location + random generated vector proportional to R_a;
    29      else
    30        cur_location = pre_location + the best NGV proportional to R_a;
    31      endif-else
    32      sum_of_violations = Check_Constraint_for_violation(cur_location);
    33      penalty = Dynamic_constraint_handling(sum_of_violations, Colony_number);
    34      cur_eval_fun = function(cur_location) + penalty;
    35      if (memory is used)                    // the use of a memorized NGV
    36        tau_best_index = (1 - phi)*tau_best_index + phi*tau_0;
    37        eta_best_index = Calculate_cost_of_solution(cur_eval_fun);
    38      else
    39        NGV = calculate the gradient and normalize it;
    40        Bad_index = compute_worst_direction();
    41        Memory[Bad_index] = NGV;
    42        tau_Bad_index = (1 - phi)*tau_Bad_index + phi*tau_0;
    43        eta_Bad_index = Calculate_cost_of_solution(cur_eval_fun);
    44      endif-else
    45      Update_ant_routing_Table();
    46      pre_eval_fun = cur_eval_fun;
    47      pre_location = cur_location;
    48      if (R_a < Ant_die_radius)
    49        the ant dies;
    50      endif
    51      R_a = R_a * DF;                        // decrease the radius by Decrease Factor (DF)
    52    endwhile
    53    if (online delayed pheromone update)
    54      [Best_H_index, Best_H_value] = find direction with max heuristic value;
    55      NGV_Best_H = Memory[Best_H_index];
    56      SNGV = calculate the size of NGV;
    57      Cost_of_the_corresponding_direction = Best_H_value + SNGV;
    58      tau_Bh = (1 - phi)*tau_Bh + phi*(Cost_of_the_best_direction);
    59      Update_ant_routing_Table();
    60    endif
    61    return the global best ant solution;
    62  endprocedure

The best direction of movement is either the most probably best direction in the memory, as described above (Line 30 of the pseudo-code), or generated randomly as a vector with elements drawn from a uniform distribution over [-1, 1] (Line 28 of the pseudo-code). A radius is considered for each colony, and it is reduced as the algorithm proceeds (Line 9 of the pseudo-code); therefore, later colonies have a smaller search radius. The radius is also diminished by the ants of a colony (Line 51 of the pseudo-code). All ants in a colony start their search with the radius defined for that colony and finish when the radius reaches zero (this reduction is performed by each ant in the colony). Consequently, radius reduction occurs at two levels: first at the colony level, and second for the ants within a given colony.

After the movement of an ant to a new point, the point's feasibility is inspected in order to avoid violation of the variable bounds. If, for the move to the new feasible point, a direction in the memory has been used, then the amount of pheromone and the heuristic value of the corresponding direction vector are updated according to Equations (12) and (13):

$$\tau_i^{new} = (1 - \varphi)\tau_i^{old} + \varphi \tau_0, \quad i \in \text{direction vector indices}, \tag{12}$$

$$\eta_i = \begin{cases} 1/f_{current}, & \text{if } f_{current} > 0, \quad \text{(a)} \\ |f_{current}|, & \text{if } f_{current} \le 0, \quad \text{(b)} \end{cases} \quad i \in \text{direction vector indices}. \tag{13}$$

Equation (13) is used for minimization, while interchanging (a) and (b) is used for maximization.
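The ART mechanics above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: Equations (9)-(10) give the selection probabilities and Equations (12)-(13) the updates; the constants β = 0.4, φ = 0.05 and τ₀ = 2 follow the values quoted in the text, while roulette-wheel sampling is one of the two selection options mentioned:

```python
import numpy as np

# Sketch of ant routing table (ART) direction selection and update.
# BETA, PHI, TAU_0 follow the values quoted in the text.

BETA, PHI, TAU_0 = 0.4, 0.05, 2.0

def selection_probabilities(tau, eta):
    a = tau * eta**BETA
    a = a / a.sum()          # decision variables a_i, Eq. (9)
    return a / a.sum()       # selection probabilities P_i, Eq. (10)

def roulette_select(P, rng):
    # Roulette-wheel selection, one of the two options mentioned in the text.
    return int(rng.choice(len(P), p=P))

def update_direction(tau, eta, idx, f_current):
    # Pheromone and heuristic update for the direction that was used.
    tau[idx] = (1.0 - PHI) * tau[idx] + PHI * TAU_0                   # Eq. (12)
    eta[idx] = 1.0 / f_current if f_current > 0 else abs(f_current)   # Eq. (13), minimization
```

With equal pheromone and heuristic values the directions are selected uniformly; as one direction accumulates better heuristic values, its selection probability grows.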
The absolute value of the current evaluation function is considered as the heuristic for evaluating the decision variable for that particular direction. For a random motion, however, a new random vector is generated as the trial movement of the ant. If the evaluation function of the new position shows an improvement, the new vector is viewed as a good direction, leading to a better situation. The NGV is calculated based on the coordinates of the previous and new points and their corresponding evaluation function values. The calculated NGV replaces the worst vector in memory, which is the one with the lowest probability of selection. As in the case of memory-based movement, the ART is then updated.

Each ant returns the best visited point as its solution. The global solution of a colony is taken as the best solution among the solutions returned by all its ants; the overall solution returned by a colony is thus the outcome of the foraging behavior of all the colony's members. Within each colony, the ants are restricted to move inside a circle with initial radius $R_a = R_c$ ($R_c$ is the radius of activities of the current colony). The radius of activities of an ant is decreased at a fixed rate after each movement until it falls below a predefined threshold, at which point the ant's life cycle ends. The next ant starts from the nest with a new radius of activity initialized to the radius of activity of the current colony. The whole procedure is repeated until the ants of a colony are exhausted.

Another essential difference between the proposed algorithm and general ACO is in the structure and use of memory. The memory in this algorithm is not private, as it is in general ACO. For the purpose of increasing cooperation between individuals, in keeping with the concept of stigmergy, the memory is shared among all ants of a colony, because the routing table passes all the information of one colony to the next colony.
Therefore, each ant is capable of sensing the modified environment by utilizing the memory, which contains better search directions in the space. The procedure new_active_ant of Table 1 is executed for each ant, and each active ant is assigned a certain number of tasks during its life cycle. The global solution returned by a colony is assigned as the nest for the next colony, and the above-described procedure of ants leaving the nest for food is repeated. In order to balance exploitation and exploration, a radius of activities is defined for each colony, with $R_c = 1$ for the first colony. This radius is decreased exponentially as a function of the current nest generation (or current colony number) at the beginning of each new nest. Consequently, later colonies have a smaller search space. The reduction of the radius is based on the following:

$$R_c = R_c \times e^{-t/(C \times T_{max})}, \tag{14}$$

where $t$ is the current nest generation (i.e., iteration or colony number), $T_{max}$ is the maximum number of iterations, and $C$ is the learning rate (a constant). Global pheromone evaporation takes place when the solution of the current colony presents no improvement over the previous one (the algorithm is in a stagnation situation and should forget the previous pheromone trail). This is implemented based on the following:

$$\tau^{new} = \tau^{old} \times evap\_factor, \tag{15}$$

where $evap\_factor$ is a constant in the range [0, 1] causing the reduction in the amount of pheromone deposition, and $\tau^{old}$ and $\tau^{new}$ are the old and new values of the amount of pheromone deposition, respectively. The procedure is repeated until either some defined criteria are satisfied or the maximum number of colonies is exhausted.

There are several approaches to coping with constrained optimization problems, and most of them involve penalty functions, which produce an unconstrained problem from the constrained one by means of a modified evaluation function [24]:
$$evaluation(X) = \begin{cases} f(X), & \text{if } X \in F, \\ f(X) \pm Penalty(X), & \text{otherwise}, \end{cases} \tag{16}$$

where $F$ is the feasible region and $X$ stands for the solution vector; the plus sign is used when the goal is minimization, and the minus sign for maximization. The penalty function is a function of a violation measure that quantifies how far an infeasible solution is from the feasible region $F$; some form of distance measure is usually used. In this paper, a dynamic method of penalization introduced in [25] is employed. According to that method, the penalty function is defined as follows:

$$penalty(X) = (C_{no})^{\theta} \sum_{j=1}^{k} V_j^{\lambda}(X), \tag{17}$$

where $C_{no}$ is the colony number (i.e., $t$ in the algorithm) and $\theta$ is a constant for adjusting the weight of penalization as the search process progresses. The notation $V_j$ ($1 \le j \le k$) denotes the $j$th constraint violation, where $k$ is the total number of constraints. $\lambda$ is a constant, usually set to 2; instead of this constant, the absolute value of $V_j$ can be used.

In this work, the potential of GCACO II for extracting interpretable fuzzy models for identification is investigated. The algorithm is utilized for the parameter tuning of a TS-FIS model produced by subtractive clustering. The mean square error (MSE) between the model output and the actual output is considered as the evaluation function.

4. Generating Interpretable Fuzzy Model

Similar fuzzy sets represent almost the same region in the universe of discourse of a fuzzy variable, i.e., they describe the same concept. Therefore, the first step in attaining model interpretability consists of finding and merging similar MFs. Structure learning by means of the subtractive clustering technique leads to a set of initial MFs, some of which may be highly similar and result in unnecessary rules. Thus, the model will lack transparency, and it is useful to merge similar MFs.
In the following, an interpretability measure that has been utilized in the literature, and by which similar MFs can be merged, is first introduced. Then, based on this measure, our proposed procedure for generating interpretable fuzzy models is described in detail. The proposed procedure is an evolutionary fuzzy system that employs GCACO II for producing approximate and interpretable fuzzy models.

− Proposed procedure

Let $A$ and $B$ be two fuzzy sets. The measure of similarity of these sets may vary from 0, indicating completely distinct sets, to 1, indicating completely similar sets. The most common similarity measure for fuzzy sets given in the literature [10, 25, 26] is based on the intersection and union operations and is defined as follows:

$$S(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}, \tag{18}$$

where $S$ is the similarity measure, $|\cdot|$ indicates the size of a set, and the intersection and union operators are denoted by $\cap$ and $\cup$, respectively [25]. The implementation of this measure in a continuous universe of discourse is computationally intensive, particularly for the Gaussian MFs produced by subtractive clustering. Therefore, some simplification methods for calculating the similarity of fuzzy sets have been suggested; for example, a triangular function with center $c$ and width $\sigma\sqrt{\pi}$ is utilized in [26] for similarity evaluation. This method is adopted in this work for the calculation of the similarity between fuzzy sets. If the similarity measure for two fuzzy sets is greater than a predefined threshold, both sets are replaced by a new fuzzy set with parameters given by the following equations:

$$c_{new} = \frac{\sigma_1 c_1 + \sigma_2 c_2}{\sigma_1 + \sigma_2}, \tag{19}$$

$$\sigma_{new} = \frac{\sigma_1 + \sigma_2}{2}. \tag{20}$$

Assume that the following three rules have been generated by employing subtractive clustering for a system with $n$ inputs and one output:
R_1: If x_1 is D_{1,1} and ··· and x_n is D_{1,n} then g_1(x_1, x_2, ··· , x_n)
R_2: If x_1 is D_{2,1} and ··· and x_n is D_{2,n} then g_2(x_1, x_2, ··· , x_n)    (21)
R_3: If x_1 is D_{3,1} and ··· and x_n is D_{3,n} then g_3(x_1, x_2, ··· , x_n)

$$g_i(x_1, x_2, \cdots, x_n) = a_{1,i}\, x_1 + a_{2,i}\, x_2 + \cdots + a_{n,i}\, x_n + a_{n+1,i}, \qquad (22)$$

where D_{i,j} indicates the i-th MF on the j-th input. The antecedent of the i-th rule is constituted from the conjunction (and) of the i-th MFs on each of the n inputs. The coefficients a_{1,i}, a_{2,i}, ··· , a_{n,i}, a_{n+1,i} are the output parameters of the TS-FIS. If the third MF of each input variable (D_{3,j}) is merged into the other MFs on that input, then the third rule, constructed from the conjunction of the third MFs, will be omitted. In this manner, redundant rules may be removed from the rule base during the MF merging process. The procedure of membership function merging, one pair per iteration, continues until no more pairs satisfy the merging threshold. The function created after merging is itself a merging candidate in the following iterations. The flowchart of the algorithm for generating a comprehensible fuzzy system is shown in Fig.1.

Fig. 1 The algorithm for generating interpretable FIS

The subtractive clustering method is used for creating an initial FIS model. While the FIS is refined and redundant rules are removed by the procedure described above, GCACO II is utilized for parameter tuning. After the above procedure terminates, the simplified FIS is tuned by GCACO II in order to obtain an appropriate degree of accuracy (i.e., MSE). Accuracy can also be considered as another criterion for stopping the procedure: a threshold imposed on the required degree of accuracy can terminate the simplifying procedure and consequently the whole algorithm. The procedure proposed in this section is similar to the one proposed by the authors in [10], in which differential evolution (DE) was employed in the tuning step.
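The similarity-driven merging described above (Equations (18)-(20), applied one pair per iteration) can be sketched as follows. For simplicity, this sketch evaluates Equation (18) numerically on a discretized universe rather than with the triangular approximation of [26]; all function names are ours.

```python
import numpy as np

def gaussmf(x, c, sigma):
    """Gaussian membership function with centre c and width sigma."""
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def similarity(p1, p2, universe):
    """Equation (18) on a discretized universe: |A ∩ B| / |A ∪ B|,
    with min/max as intersection/union."""
    a, b = gaussmf(universe, *p1), gaussmf(universe, *p2)
    return np.minimum(a, b).sum() / np.maximum(a, b).sum()

def merge(p1, p2):
    """Merged fuzzy set of Equations (19)-(20)."""
    (c1, s1), (c2, s2) = p1, p2
    return ((s1 * c1 + s2 * c2) / (s1 + s2), (s1 + s2) / 2.0)

def merge_similar(mfs, universe, threshold=0.8):
    """Merge the most similar pair, one pair per iteration, until no
    pair of MFs exceeds the similarity threshold."""
    mfs = list(mfs)
    while len(mfs) > 1:
        s, i, j = max((similarity(mfs[i], mfs[j], universe), i, j)
                      for i in range(len(mfs)) for j in range(i + 1, len(mfs)))
        if s < threshold:
            break
        merged = merge(mfs[i], mfs[j])          # merged set joins later rounds
        mfs = [m for k, m in enumerate(mfs) if k not in (i, j)] + [merged]
    return mfs

u = np.linspace(-5.0, 5.0, 1001)
# Two nearly identical MFs and one distinct MF: the first two should merge.
print(merge_similar([(0.0, 1.0), (0.1, 1.0), (3.0, 0.5)], u))
```

With the paper's threshold of 0.8, the two overlapping sets collapse into one set centred at 0.05 with width 1.0, while the distinct set survives, mirroring how redundant rules disappear during merging.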
The added advantage of GCACO II over DE is its ability to cope with constrained optimization problems. Owing to the constraint-handling abilities of GCACO II, in the tuning step of the above algorithm, good models in terms of validity and performance are identified. Validation criteria are imposed as constraints on the performance objective, which causes valid models to be produced. In the case of NARX model validation, the noise model is not explicitly estimated and, consequently, the residuals may be colored. Specific tests are therefore required, and the estimated nonlinear model will be unbiased if and only if [12, 27]

$$XC_{u^2\varepsilon^2}(\tau) = 0 \quad \forall \tau, \qquad (23)$$

$$XC_{u^2\varepsilon}(\tau) = 0 \quad \forall \tau, \qquad (24)$$

$$XC_{u\varepsilon}(\tau) = 0 \quad \forall \tau, \qquad (25)$$

where XC denotes the cross correlation, ε is the residual vector (containing all residuals), and u is the input vector (containing all inputs). The target value for these constraints is set to the 95% confidence limit. The correlation-based validation objectives are contained in a (2τ + 1)-element vector. In order to express these functions as scalars, the infinity norm is utilized as follows:

$$N^{inf}_{u^2\varepsilon^2} = \|XC_{u^2\varepsilon^2}\|_\infty, \qquad (26)$$

$$N^{inf}_{u^2\varepsilon} = \|XC_{u^2\varepsilon}\|_\infty, \qquad (27)$$

$$N^{inf}_{u\varepsilon} = \|XC_{u\varepsilon}\|_\infty. \qquad (28)$$

The identification process then evolves through regions of the search space where valid fuzzy models are located (i.e., where the above scalars are less than or equal to predefined values).

5. Computational Results

The proposed method is applied to the Lorenz system, the Mackey-Glass time series, and a flexible robot arm problem, and the obtained results are compared with those of similar methods.

5.1. Simulation Conditions

The radius of the subtractive clustering algorithm is set to 0.8. The other parameters of this algorithm take their default values. The similarity threshold is selected as 0.8 in all case studies.
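A minimal sketch of how the validity tests of Equations (23)-(28) can be evaluated is given below. The normalization, the lag range, and the helper names are our choices, not the paper's; the 95% confidence band for a whiteness test is the usual 1.96/√N.

```python
import numpy as np

def xcorr(a, b, max_lag=20):
    """Normalized cross-correlation XC_ab(tau) for tau = -max_lag..max_lag."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    n = len(a)
    return np.array([np.mean(a[max(0, -t):n - max(0, t)] *
                             b[max(0, t):n - max(0, -t)])
                     for t in range(-max_lag, max_lag + 1)])

def validity_norms(u, eps, max_lag=20):
    """Infinity norms of Equations (26)-(28) over the (2*max_lag + 1)-element
    correlation vectors of Equations (23)-(25)."""
    return (np.abs(xcorr(u ** 2, eps ** 2, max_lag)).max(),
            np.abs(xcorr(u ** 2, eps, max_lag)).max(),
            np.abs(xcorr(u, eps, max_lag)).max())

# Synthetic check: with residuals independent of the input, all three
# scalars should stay well inside the confidence band.
rng = np.random.default_rng(0)
u, eps = rng.standard_normal(1000), rng.standard_normal(1000)
limit = 1.96 / np.sqrt(1000)
print([float(n) < 3 * limit for n in validity_norms(u, eps)])
```

In the identification loop, these three scalars would be the constraint values handed to the penalty scheme of Equations (16)-(17), with the confidence limit as the target.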
2000 and 1000 data samples are generated for the first and second case studies respectively by solving their differential equations. 1000 data samples for the third example are taken randomly from the public-domain database for system identification [28]. The lags for the outputs and inputs and the dead time are set to n_p = 4, n_q = 0 and n_d = 1 respectively for the last case study. The parameters of GCACO II are given in Table 2 below.

Table 2: Parameter values of GCACO II used in this work.

  No. of ants                                  50
  No. of colonies                              50
  Initial radius                               1
  Radius decrease factor                       0.7
  C (constant for decreasing colony radius)    3
  q prob.                                      0.35
  Evap. rate                                   0.95
  ϕ                                            0.03
  ρ                                            0.05
  Mem. size                                    10

5.2. Case Studies

Case 1 Lorenz System

The Lorenz system is described by the following differential equations [9]:

$$\dot{x} = -y^2 - z^2 - a(x - F_{Lorenz}), \qquad (29)$$
$$\dot{y} = xy - bxz - y + G_{Lorenz}, \qquad (30)$$
$$\dot{z} = bxy + xz - z. \qquad (31)$$

The parameters of the Lorenz system are a = 0.25, b = 4.0, F_Lorenz = 8.0, and G_Lorenz = 1.0. In the simulation, x(t) is predicted from x(t−1), y(t−1) and z(t−1). Two thousand data points are obtained from Equations (29)-(31) using the fourth-order Runge-Kutta method with a step length of 0.05, where 1000 pairs of data are used for training and the other 1000 for testing. The sampled data pairs are shown in Fig.2. The best fuzzy model and its output are shown in Figs.3-6. The antecedent fuzzy MFs are depicted in Fig.4, and the only fuzzy rule is given in Fig.3. The solid curves in Fig.5 and Fig.6 depict the actual outputs for the first 400 samples of train and test data respectively; the dotted curves represent the output behavior of the obtained model on the same samples. The mean square errors (MSE) for train and test data are MSE_train = 1.2345E-4 and MSE_test = 2.8456E-4.

Fig. 2 Inputs x(t−1), y(t−1) and z(t−1) and output x(t) of the Lorenz system

R_1: If (x(t−1) is Middle) and (y(t−1) is Middle) and (z(t−1) is Middle)
     then x(t) = 0.9952 x(t−1) − 0.0092 y(t−1) − 0.0281 z(t−1) + 0.0033

Fig. 3 Rule representation of the fuzzy system for the Lorenz system
Fig. 4 Distribution of MFs for the antecedent of the above rule
Fig. 5 The actual and model output for the first 400 samples of train data
Fig. 6 The actual and model output for the first 400 samples of test data

Case 2 Mackey-Glass Time Series

The Mackey-Glass time series is described as follows [8]:

$$\dot{x} = \frac{a\, x(t-r)}{1 + x^{b}(t-r)} - c\, x(t). \qquad (32)$$

The parameters of the Mackey-Glass time series are a = 0.2, b = 10, c = 0.1 and r = 30. The goal is to predict x(t) from x(t−1), x(t−2) and x(t−3). 1000 data points are obtained from Equation (32) using the fourth-order Runge-Kutta method with a step length of 1 and the initial condition x(0) = 1.2, where 500 pairs of data are used for training and the other 500 for testing. The sampled data pairs are shown in Fig.7.

Fig. 7 Inputs x(t−1), x(t−2) and x(t−3) and output x(t) of the Mackey-Glass system

The best fuzzy model and its output are shown in Figs.8-11. The antecedent fuzzy MFs are depicted in Fig.9, and the only fuzzy rule is given in Fig.8.

R_1: If (x(t−3) is Big) and (x(t−2) is Big) and (x(t−1) is Small)
     then x(t) = 0.0184 x(t−3) − 1.0149 x(t−2) + 1.9841 x(t−1) + 0.0110

Fig. 8 Rule representation of the fuzzy system for the Mackey-Glass system
Fig. 9 MFs of the antecedent of the above rule

The solid curves in Fig.10 and Fig.11 depict the actual outputs for the first 400 samples of train and test data respectively; the dotted curves represent the output behavior of the obtained model on the same samples. The MSEs for train and test data are MSE_train = 3.2672E-5 and MSE_test = 3.6383E-5.

Fig. 10 The actual and model output for the first 400 samples of train data
Fig. 11 The actual and model output for the first 400 samples of test data

Case 3 Flexible Robot Arm

The arm is installed on an electrical motor. The transfer function represents the relation between the measured reaction torque of the structure on the ground and the acceleration of the flexible arm. Therefore, the input is the reaction torque of the structure and the output is the acceleration of the flexible arm. The applied input is a periodic sine sweep. As mentioned above, 1000 data samples for this example are taken randomly from the public-domain database for system identification [28]. Seventy percent of the data is selected randomly for training and the rest is used for testing. The best fuzzy model and its output are shown in Figs.12-15. The antecedent fuzzy MFs are depicted in Fig.13, and the only fuzzy rule is given in Fig.12.

R_1: If u(t−1) is Small and y(t−1) is Middle and y(t−2) is Middle and y(t−3) is Big and y(t−4) is Small
     then y(t) = 0.5432 u(t−1) + 0.24341 y(t−1) + 0.3550 y(t−2) − 0.6754 y(t−3) + 0.29995 y(t−4) + 0.0723

Fig. 12 Rule representation of the fuzzy system for the flexible robot arm plant
Fig. 13 Antecedent MFs of the only rule

Fig.14 and Fig.15 depict the model and actual outputs for 200 samples of the train and test data respectively. The calculated MSEs of the train and test data for the best models are 6.6129E-5 and 7.5843E-5, respectively.

Fig. 14 Model and actual output for train data
Fig. 15 Model and actual output for test data

5.3. Comparisons

As mentioned in previous sections, the authors have recently developed a similar algorithm in which DE was utilized instead of GCACO. In this section, Table 3 gives a comparison between the results obtained in this work and those of past research. As can be seen from the table, the results of this study are well comparable to the authors' recent studies in terms of accuracy and compactness of
rule base (Table 3). Although the MSE values are not much better than those of past research, they are close to them. Furthermore, the models obtained in this study have been checked for validity via the added advantage of GCACO II, namely its constraint handling. Consequently, this study considers all objectives of the system identification task, namely accuracy, interpretability, compactness and validity conditions.

Table 3: Comparison of the best recently reported fuzzy models in the literature with that of this paper.

  Method       Case study   No. of rules   MSE train    MSE test
  [10]         No.1         1              2.223E-4     3.123E-4
               No.2         1              3.3116E-5    3.6563E-5
               No.3         1              6.8935E-5    7.8843E-5
  [11]         No.1         1              2.4566E-4    3.4453E-4
               No.2         1              3.7846E-5    3.8736E-5
               No.3         1              4.6129E-5    4.5843E-5
  [12]         No.1         1              1.2344E-4    2.7654E-4
               No.2         1              3.0101E-5    3.1121E-5
               No.3         1              3.9888E-5    3.8276E-5
  This paper   No.1         1              1.2355E-4    2.8456E-4
               No.2         1              3.2672E-5    3.6383E-5
               No.3         1              6.6129E-5    7.5843E-5

Recently, in [12] the authors proposed a hybrid evolutionary algorithm for fuzzy identification of MIMO nonlinear systems. In that study, all objectives of fuzzy system identification were taken into account, and the proposed procedure employed the abilities of a multi-objective genetic algorithm as well as DE. Moreover, the MFs were taken to be Gaussian combinational MFs (abbreviated as Gauss2mf). Therefore, as is apparent from Table 3, the best results in terms of accuracy and compactness are those of the method described in [12]. Considering the abilities of the method introduced in [12] and the results obtained in this paper, it is concluded that the proposed method based on GCACO II is well comparable to the others in terms of accuracy and interpretability of the resulting fuzzy models. Also, the procedure is easy to implement and computationally efficient.

6. Conclusion

The ability of GCACO for tuning an FIS of the TS type is investigated in this paper.
Chiu's subtractive clustering, as an automatic data-driven method, is utilized for constructing the primary FIS. This method, like other clustering-based ones, has the advantage of avoiding the explosion of the rule base. Then an iterative two-step algorithm is employed to produce a simplified FIS in terms of the number of fuzzy sets and rules. In the first step, the parameters of the model are adjusted by utilizing GCACO. In the second step, the similar membership functions of the previously obtained model are merged. These two steps are performed until no more pairs of membership functions satisfy the merging criterion. Moreover, two well-known chaotic time series, namely Lorenz and Mackey-Glass, and an industrial plant (a flexible robot arm) are used as case studies. The results illustrate the success of the proposed simplification procedure along with GCACO for obtaining accurate and transparent fuzzy models. The models obtained in this study have been checked for validity via the added advantage of GCACO II, namely its constraint handling. Consequently, in comparison with similar past research, this study considers all objectives of the system identification task, namely accuracy, interpretability, compactness and validity conditions.

Acknowledgements

The authors would like to thank Dr. Malihe Maghfoori Farsangi for the time she has devoted to proofreading the paper.

References

1. Eksin I, Erol O K (2000) A fuzzy identification method for nonlinear systems. Turkish Journal of Electrical Engineering & Computer Sciences 8(2): 125-135
2. Jang J S R (1993) ANFIS: Adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man, and Cybernetics 23(3): 665-685
3. Sanchez L, Couso I, Corrales J A (2001) Combining GP operators with SA search to evolve fuzzy rule based classifiers. Information Sciences 136: 175-191
4. Casillas J, Cordon O, Herrera F, Magdalena L (2003) Accuracy improvements in linguistic fuzzy modeling. Berlin: Springer
5. Paiva R P, Dourado A (2004) Interpretability and learning in neuro-fuzzy systems. Fuzzy Sets and Systems 147(1): 17-38
6. Jin Y, Sendhoff B (2003) Extracting interpretable fuzzy rules from RBF networks. Neural Processing Letters 17(2): 149-164
7. Wang H, Kwong S, Jin Y, Wei W, Man K F (2005) Agent-based evolutionary approach for interpretable rule-based knowledge extraction. IEEE Transactions on Systems, Man, and Cybernetics-Part C 35(2): 143-155
8. Wang H, Kwong S, Jin Y, Wei W, Man K F (2005) Multi-objective hierarchical genetic algorithm for interpretable fuzzy rule-based knowledge extraction. Fuzzy Sets and Systems 149(1): 149-186
9. Chiu S L (1994) Fuzzy model identification based on cluster estimation. Journal of Intelligent and Fuzzy Systems 2: 267-278
10. Eftekhari M, Katebi S D, Karimi M, Jahanmiri A H (2008) Eliciting transparent fuzzy model using differential evolution. Applied Soft Computing 8(1): 466-476
11. Eftekhari M, Katebi S D (2008) Extracting compact fuzzy rules for nonlinear system modeling using subtractive clustering, GA and unscented filter. Applied Mathematical Modeling 32(12): 2634-2651
12. Eftekhari M, Majidi M, Nezamabadi P H (2012) Securing interpretability of fuzzy models for modeling nonlinear MIMO systems using a hybrid of evolutionary algorithms. Iranian Journal of Fuzzy Systems 9(1): 61-77
13. Zhou S M, Gan J Q (2008) Low-level interpretability and high-level interpretability: A unified view of data-driven interpretable fuzzy system modeling. Fuzzy Sets and Systems 159(23): 3091-3131
14. Eftekhari M, Daei B, Katebi S D (2006) Gradient-based ant colony optimization for continuous spaces. Esteghlal Journal of Eng. 25(1): 33-45
15. Eftekhari M, Moosavi M R, Katebi S D (2006) Solving constrained continuous optimization problems with GCACO II. 11th Annual Conference of Computer Society of Iran: 180-188
16. Abonyi J (2003) Fuzzy model identification for control. Boston: Birkhauser
17. Abonyi J, Babuska R, Szeifert F (2002) Modified Gath-Geva fuzzy clustering for identification of Takagi-Sugeno fuzzy models. IEEE Transactions on Systems, Man, and Cybernetics-Part B 32(5): 612-621
18. Feil B, Abonyi J, Madar J, Nemeth S, Arva P (2004) Identification and analysis of MIMO systems based on clustering algorithm. Acta Agraria Kaposvariensis 8(3): 191-203
19. Herrera F (2008) Genetic fuzzy systems: taxonomy, current research trends and prospects. Evolutionary Intelligence 1(1): 27-46
20. Alonso S, Cordon O, Fernandez de Viana I, Herrera F (2004) Integrating evolutionary computation components in ant colony optimization evolutionary algorithms: an experimental study. In: L. Nunes de Castro, F. J. Von Zuben (Eds.), Recent Developments in Biologically Inspired Computing, Idea Group Publishing
21. Bilchev G, Parmee I C (1995) The ant colony metaphor for searching continuous design spaces. Lecture Notes in Computer Science 993: 25-39
22. Wodrich M, Bilchev G (1997) Cooperative distributed search: The ants way. Control and Cybernetics 26(3): 413-445
23. Socha K, Dorigo M (2008) Ant colony optimization for continuous domains. European Journal of Operational Research 185: 1155-1173
24. Michalewicz Z, Fogel D B (2005) How to solve it: Modern heuristics. Berlin: Springer Verlag
25. Dubois D J, Prade H M (1980) Fuzzy sets and systems: Theory and applications. New York: Academic Press
26. Chao C T, Chen Y J, Teng C C (1996) Simplification of fuzzy-neural systems using similarity analysis. IEEE Transactions on Systems, Man, and Cybernetics-Part B 26(2): 344-354
27. Rodriguez-Vazquez K (1999) Multiobjective evolutionary algorithms in non-linear system identification. Ph.D Thesis, Department of Automatic Control and Systems Engineering, The University of Sheffield
28. Moor B D (2010) DaISy: Database for the identification of systems.
Department of Electrical Engineering, ESAT/SISTA, K.U.Leuven, Belgium. http://www.esat.kuleuven.ac.be/sista/daisy


Publisher: Taylor & Francis
Copyright: © 2013 Taylor and Francis Group, LLC
ISSN: 1616-8666
eISSN: 1616-8658
DOI: 10.1007/s12543-013-0144-2

The main contribution of fuzzy modeling theory is its ability to handle many practical problems that cannot be adequately represented by conventional methods. Fuzzy modeling of nonlinear systems has been the focus of much scientific research. Takagi and Sugeno (TS) [2] have proposed a search algorithm for a fuzzy controller and generalized their techniques to fuzzy identification. Jang [2] proposed an architecture and a learning procedure which combine fuzzy logic with neural networks for inference. The adaptive network based fuzzy inference system (FIS) is capable of constructing an input-output mapping accurately, based on both human knowledge and stipulated input-output data pairs. However, once a fuzzy model is developed, in most cases it needs to undergo an optimization process. There are two main goals in such optimization tasks: the first is model structure optimization (i.e., the number of rules) and the second is the fine tuning of the model's parameters (i.e., the parameters of the input and output membership functions). Several investigations into the interpretability of fuzzy classification problems have recently been performed [3-5]. Also, many studies regarding transparency have been reported in the fields of decision making, prediction of process behavior, data mining and others [6-8]. An automatic data-driven method for constructing initial fuzzy models is Chiu's subtractive clustering [9]. This method has the advantage of avoiding the explosion of the rule base, a problem known as the "curse of dimensionality". Therefore, clustering-based methods would be preferred to grid partitioning techniques. In this contribution, the subtractive clustering technique is used for creating an initial FIS model.

(Corresponding author: M. Eftekhari · M. Zeinalkhani, Department of Computer Engineering, Shahid Bahonar University of Kerman, Kerman, Iran; email: m.eftekhari@mail.uk.ac.ir)
Recently, some studies have been performed in order to develop TS models for multi-input multi-output (MIMO) nonlinear system identification [10-12]. The main goal of these studies is to achieve the objectives of system identification as well as securing the interpretability of the fuzzy model. In a recent work, a twofold taxonomy of interpretability has been presented in the literature based on previous studies in this field [13]. According to this classification, low-level and high-level interpretability are introduced. The low-level interpretability defines some constraints for the sake of securing interpretability at the fuzzy set level, while the high-level interpretability is defined at the fuzzy rule level. As mentioned in [13], distinguishability is the most important semantic constraint for achieving low-level interpretability. According to Occam's razor parsimony principle in machine learning, of all the models that can describe a process accurately, the simplest one is the best; so parsimony is a basic criterion for achieving high-level interpretability. Therefore, distinguishability and parsimony are the first concerns toward achieving low- and high-level interpretability respectively. Obtaining suitable interpretability is a complex and computationally prohibitive problem, especially when the goal is to design approximate fuzzy models [10-12].

Evolutionary algorithms (EAs) have been widely used for eliciting fuzzy models owing to their ability to search for optimal solutions in irregular and high-dimensional solution spaces [10-12]. Differential evolution (DE), one of the most well-known EAs, has been utilized in order to develop transparent fuzzy models [10, 12]. EAs and other stochastic search techniques seem to be a promising alternative to traditional techniques. A particularly promising EA for continuous search spaces, which has been recently developed by the authors, is called gradient-based continuous
ant colony optimization (GCACO) [14, 15]. Two versions of this algorithm have been introduced, the second of which, named GCACO II, is more sophisticated than the first [15]. It has the ability to solve most nonlinear optimization problems while handling various imposed constraints. This study investigates the ability of GCACO II to elicit transparent fuzzy models for nonlinear processes. The proposed procedure is twofold: firstly, subtractive clustering is utilized in order to generate an initial TS fuzzy model, and GCACO II is employed for tuning the parameters of the obtained fuzzy model; secondly, the fuzzy model tuned in the previous step is simplified by eliminating redundant membership functions (MFs). In the simplification step, a similarity measure is used for merging similar pairs of membership functions resulting from the subtractive clustering. Parameter tuning by GCACO II and membership function simplification are iteratively performed until no more pairs of membership functions satisfy the similarity criterion.

The remainder of this paper is organized as follows. In Section 2, fuzzy modeling of nonlinear dynamical systems is described. Section 3 deals with GCACO. Our proposed procedure for extracting interpretable fuzzy models is discussed in Section 4. Section 5 presents case studies and computational results. Finally, Section 6 concludes the paper.

2. Fuzzy Modeling

Fuzzy modeling and identification from input-output process data has proved to be effective for the approximation of uncertain nonlinear dynamic systems [10, 11, 16, 17]. The most frequently applied TS-FIS model tries to decompose the input space of the nonlinear model into fuzzy subspaces and then approximate the system in each subspace by a simple linear regression model. Hence, TS models are often used to represent nonlinear dynamic systems by interpolating between local linear time-invariant (LTI) auto-regressive with exogenous input (ARX) models.
So far, most of the attention has been devoted to single-input single-output (SISO) or multiple-input single-output (MISO) systems. Recently, methods for MIMO systems have been investigated [10, 11] and applied in model-based predictive control. The aim of constructing an FIS is to obtain a set of fuzzy rules that describe the system behavior as accurately as possible, given a set of operating data and, if available, an initial set of linguistic rules collected from experts. In this paper, a general fuzzy ARX structure is assumed to describe the system. In such a structure, the system is represented by the following nonlinear vector function:

$$Y(k) = g\big(Y(k-1), \cdots, Y(k-n_p),\; U(k-n_d), \cdots, U(k-n_d-n_q+1)\big), \qquad (1)$$

where g represents the nonlinear model, Y = [y_1, y_2, ··· , y_{n_y}] is an n_y-dimensional output vector, U = [u_1, u_2, ··· , u_{n_u}] is an n_u-dimensional input vector, n_p and n_q are the maximum lags considered for the outputs and inputs respectively, and n_d is the minimum discrete dead time.

While it may not be possible to find a model that is universally applicable for describing the unknown system g(·), it is certainly worthwhile to build local linear models for specific operating points of the process. The modeling framework that is based on combining a number of local models, in which each local model has a predefined valid operating region, is called the operating regime based model [11, 12, 17, 18]. This model is formulated as

$$Y(k) = \sum_{i=1}^{N_r} \psi_i(X(k-1)) \Big( \sum_{j=1}^{n_p} A_j^i\, Y(k-j) + \sum_{j=1}^{n_q} B_j^i\, U(k-j-n_d+1) + C^i \Big), \qquad (2)$$

where the function ψ_i(X(k)) describes the operating regime of the i-th (i = 1, ··· , N_r) local linear ARX model, and X = [x_1, x_2, ··· , x_n] is a scheduling vector, which is usually a subset of the previous process inputs and outputs:

$$X(k-1) = \big[\, u_1(k-n_d), \cdots, u_{n_u}(k-n_q-n_d+1),\; y_1(k-1), \cdots, y_{n_y}(k-n_p) \,\big]. \qquad (3)$$

The local models are defined by the parameter set θ_i = {A_j^i, B_j^i, C^i}.
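As an illustration of Equation (2), the following sketch computes a one-step prediction for a hypothetical SISO operating-regime model with two local ARX models blended by Gaussian validity functions. All names and numbers here are invented for illustration; they are not taken from the paper's case studies.

```python
import numpy as np

def gauss(x, c, sigma):
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def ts_predict(y_hist, u_hist, regimes, local_models, nd=1):
    """One step of the operating-regime model of Equation (2) for a SISO
    system: local ARX outputs are blended by normalized Gaussian validity
    functions of the scheduling variable x = y(k-1)."""
    x = y_hist[-1]
    w = np.array([gauss(x, c, s) for c, s in regimes])
    psi = w / w.sum()                                  # normalized regime weights
    y_next = 0.0
    for p, (a, b, c) in zip(psi, local_models):
        local = sum(aj * y_hist[-j] for j, aj in enumerate(a, start=1))            # A_j y(k-j)
        local += sum(bj * u_hist[-(j + nd - 1)] for j, bj in enumerate(b, start=1))  # B_j u(k-j-nd+1)
        y_next += p * (local + c)
    return float(y_next)

# Two invented local models: (a-coefficients, b-coefficients, offset C).
regimes = [(0.0, 1.0), (2.0, 1.0)]
models = [([0.5], [0.1], 0.0),
          ([0.9], [0.3], 0.1)]
print(ts_predict([0.0], [1.0], regimes, models))
```

Because the weights ψ_i are normalized, the prediction is a convex combination of the two local ARX outputs, which is exactly the soft transition between operating regimes described above.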
As n_q and n_p denote the maximum lags considered for the previous inputs and outputs, and n_d is the minimum discrete dead time, the lags considered for the separate input-output channels can be handled by zeroing the appropriate elements of the A_j^i and B_j^i matrices.

The main advantage of this framework is its transparency. Furthermore, the operating regimes of the local models can also be represented by fuzzy sets [10, 11]. This representation is appealing, since many systems change behavior smoothly as a function of the operating point, and the soft transitions between regimes introduced by the fuzzy set representation capture this feature in an elegant fashion. Hence, the entire global model can be conveniently represented by TS fuzzy rules [18]. This MIMO nonlinear auto-regressive with exogenous input (NARX) TS fuzzy model is formulated by rules of the form

$$R_i: \text{If } x_1 \text{ is } D_{i,1} \text{ and } \cdots \text{ and } x_n \text{ is } D_{i,n} \;\text{ then }\; Y_i(k) = \sum_{j=1}^{n_p} A_j^i\, Y(k-j) + \sum_{j=1}^{n_q} B_j^i\, U(k-j-n_d+1) + C^i, \qquad (4)$$

where D_{i,j}(x_j) is the i-th antecedent fuzzy set for the j-th input. The one-step-ahead prediction of the MIMO fuzzy model, Y(k), is inferred by computing the weighted average of the outputs of the consequent multivariable models:

$$Y(k) = \sum_{i=1}^{N_r} \psi_i(X(k-1))\, Y_i(k), \qquad (5)$$

where N_r is the number of rules and ψ_i(X(k−1)) is the weight of the i-th rule:

$$\psi_i(X(k-1)) = \Big( \prod_{j=1}^{n} D_{i,j}(x_j) \Big) \Big/ \Big( \sum_{i=1}^{N_r} \prod_{j=1}^{n} D_{i,j}(x_j) \Big). \qquad (6)$$

In order to obtain a set of N_r rules and avoid the problems inherent in grid partitioning (i.e., rule base explosion), clustering techniques are applied [9-11, 17, 18]. These techniques are employed since they allow scatter partitioning of the input-output space. In the following, one of the well-known clustering methods for fuzzy modeling is introduced.

− Subtractive clustering

Subtractive clustering is, essentially, a modified form of the mountain method [9, 10].
Thus, let Z be the set of N data points obtained by concatenation of X(k−1) and Y(k). In the algorithm, each point is seen as a potential cluster center, to which a measure of potential is assigned according to Equation (7):

$$p_i = \sum_{j=1}^{N} e^{-\alpha \|z_i - z_j\|^2}, \qquad (7)$$

where α = 4/r_a² and r_a > 0 defines the neighborhood radius for each cluster center. Thus, the potential associated with each point depends on its distance to all the other points, leading to clusters with high potential where the neighborhood is dense. After calculating the potential of each point, the one with the highest potential is selected as the first cluster center. Let Z_1* be the center of the first cluster and P_1* its potential. Then the potential of each point z_i is reduced according to Equation (8), most strongly for the points closer to the center of the cluster:

$$p_i = p_i - P_1^*\, e^{-\beta \|z_i - Z_1^*\|^2}. \qquad (8)$$

Here β = 4/r_b² and r_b > 0 represents the radius of the neighborhood within which a significant potential reduction occurs. The radius for the reduction of potential should be somewhat larger than the neighborhood radius in order to avoid closely spaced clusters; typically, r_b = 1.25 r_a. Since the points closer to the cluster center have their potential strongly reduced, the probability of those points being chosen as the next cluster center is lower. This procedure (selecting centers and reducing potentials) is carried out iteratively until the stopping criterion is satisfied. Additionally, two threshold levels are defined: one above which a point is selected as a cluster center, and another below which a point is rejected.

By the end of clustering, a set of fuzzy rules is obtained; each cluster represents a rule. However, since the clustering is carried out in a multidimensional space, the related fuzzy sets must still be obtained. As each axis refers to a variable, the centers of the membership functions are obtained by projecting the center of each cluster onto the corresponding axis. The widths are obtained on the basis of the radius r_a.
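The potential computation and center selection of Equations (7)-(8) can be sketched as follows. Note one simplification that is ours, not Chiu's: a single acceptance threshold replaces the two-threshold acceptance/rejection scheme described above.

```python
import numpy as np

def subtractive_centers(Z, ra=0.8, accept=0.5, max_centers=10):
    """Sketch of subtractive clustering, Equations (7)-(8). Every point is a
    candidate center; potential rewards dense neighborhoods and is reduced
    around each selected center. Selection stops once the remaining maximum
    potential falls below `accept` times the first potential."""
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.25 * ra) ** 2                 # r_b = 1.25 * r_a
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1)
    p = np.exp(-alpha * d2).sum(axis=1)           # Equation (7)
    p1, centers = p.max(), []
    for _ in range(max_centers):
        i = int(p.argmax())
        if p[i] < accept * p1:
            break
        centers.append(Z[i])
        p = p - p[i] * np.exp(-beta * d2[i])      # Equation (8)
    return np.array(centers)

rng = np.random.default_rng(1)
Z = np.vstack([rng.normal(0.0, 0.1, (50, 2)),    # dense group near (0, 0)
               rng.normal(3.0, 0.1, (50, 2))])   # dense group near (3, 3)
centers = subtractive_centers(Z)
print(len(centers))
```

On this synthetic data the two dense groups yield two centers, one near each group; in the FIS construction, each such center becomes a rule and its projections become the Gaussian MF centers.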
Generally speaking, developing fuzzy models is a complex process and needs to employ some powerful optimization techniques [10-12, 19]. On the other hand, various EAs are popular as global, derivative-free optimization methods [10-12, 19]. Therefore, in several recent studies, EAs have been utilized to develop FSs for modeling, identification and classification tasks [10-12]. These systems are generally named evolutionary fuzzy systems [19].

3. Continuous Ant Colony Optimization

Ant colony optimization (ACO) algorithms belong to the class of meta-heuristic search methods, which are inspired by the social behavior of real ants. The initial versions of ACO meta-heuristic algorithms were developed a few years ago [14, 15]. Meta-heuristic algorithms can be considered as a general algorithmic framework for solving different combinatorial optimization problems [14, 15]. The ACO meta-heuristic has become one of the most promising methods for attacking hard combinatorial optimization problems such as the traveling salesman problem (TSP) [14, 15].

A good comparison between evolutionary algorithms (EAs) and ACO algorithms, both as bio-inspired algorithms, was presented in [20]. The similarities and differences between population-based incremental learning (PBIL), as an EA, and ACO were also discussed in [20]. Both of these methods use memorized information to guide the search process; however, unlike ACO, PBIL does not use any heuristic information in its solution construction process. EAs can be used for solving complex problems in both discrete and continuous domains [21]. In recent years, researchers have been motivated to extend the ACO meta-heuristic into continuous domains. In contrast to the conventional use of ACO for solving problems in discrete domains, relatively few works extending ACO algorithms to continuous spaces have been reported.
The first continuous ACO (CACO) algorithm was introduced by Bilchev [21], and other variants and modifications have subsequently been reported by other authors [22, 23]. Bilchev's approach and most of the recent methods comprise two stages: global and local. The set of ants is divided into two classes: one type is used to search globally for promising regions in the search space, and the other ants are used to perform a local search inside the most promising regions. The creation of new regions for global searching is handled by a genetic-algorithm-like process, while the local ants provide the metaphoric link to real ant colonies. Bilchev and Wodrich reported some weaknesses of this approach [22]. The expensive maintenance of a history of regions was one of the disadvantages they pointed out. Also, the first CACO did not handle constrained optimization problems [22]. Bilchev and Wodrich showed that a few modifications were needed to adapt the CACO for constraint handling [22]. After that, other authors attempted to resolve the constraint handling problem in their new versions of CACO [23]. The authors of this paper have developed a novel algorithm for solving unconstrained continuous numerical optimization problems called GCACO [14]. In GCACO I, the global search stage is not connected with the previously developed CACO; therefore, there are no regions and no GA-like notions for creating them. In this approach, our intention is to maintain the basic framework of the ACO meta-heuristic. GCACO II, the extended version of GCACO I, was developed by the authors [15] in order to handle constrained problems with linear/nonlinear and equality/inequality constraints. Details of GCACO II are given in [15] for further studies.

− GCACO

The main design and the modifications proposed to adapt the algorithm for continuous spaces are described below; Table 1 gives a pseudo-code for the proposed algorithm.
The first modification of the general ACO toward adaptation to the continuous case is the assignment of the nests of the colonies. The first nest is initialized randomly, and the subsequent nests are placed at the best point obtained by the current colony. This is illustrated at line 10 of Table 1.

The search by each colony is implemented by the function ants generation and activity() as shown in Table 1. The search starts by randomly selecting a feasible point in the search space as the starting nest for the first colony. A number of ants (ant no) are assigned to each colony. Each ant begins the search from the nest, exploring and exploiting new feasible points by walking in the search space using pheromone-trail-based or random directions and a variable step size. In order to achieve a balance between exploration and exploitation, the direction selection method is based on an ε-greedy-like technique: the direction of movement is selected as the most probable best direction existing in the memory, but occasionally, with a small probability q_0, the random method is chosen.

In this work, the probabilistic selection mechanism is similar to the movement of a real ant, which is based on the amount of pheromone trail and heuristics. The mixed information about the pheromone trail and the heuristic value of each direction is stored as variables in the ant routing table (ART). The probabilities for selecting directions are calculated using this table. Therefore, for each path, the values η_i and τ_i are used to calculate the decision variable a_i according to Equation (9). The selection probability P_i for each direction (path) is calculated based on Equation (10):

a_i = (τ_i × η_i^β) / Σ_{j=1}^{Directions} (τ_j × η_j^β),  ∀i ∈ {Directions}, where β = 0.4,  (9)

P_i = a_i / Σ_{j=1}^{Directions} a_j.  (10)

Now, based on the calculated selection probabilities P_i, a roulette-wheel or greedy-like technique may be used for the actual selection of the movement direction. The memory consists of a predefined number of direction vectors which are initially null vectors and, as the algorithm progresses, are replaced by a set of normalized gradient vectors calculated from the previous motions that resulted in a better evaluation function value. The normalized gradient vectors (NGVs) are employed to guide the searching process. The GV is calculated numerically and is the discrete interpretation of the gradient vector, in which the difference between the new and old function values is divided by the corresponding difference of the variable values. The NGV is the normalized GV. Hence, each ant moves from its previous position in one of two ways: 1) along the most promising NGVs, leading to better points in previous motions; 2) along a unified random vector. The incremental movement of each ant in the search space is based on both a variable step size and memorized or random directions. The following update rule is designed to tune and control the step size for the movement of an ant (Lines 28 and 30 of Table 1):

X_new = X_old + R_a · ψ(·),  (11)

where X_new and X_old are vectors of new and old object variables, R_a is the radius of activity of an ant, and ψ(·) is the direction of movement.

Table 1: Pseudo-code for GCACO II.
1  Procedure GCACO Meta heuristic()
2    Initialize the parameters of GCACO();
3    While (t < T_max)
4      [Best solution] = ants generation and activity();
5      if (not Better(Best solution, Global solution))
6        Evaporate Pheromone();
7      endif
8      Daemon action();
9      Reduce the Radius of colony Movement();
10     Set Nest for colony(Best solution);
11     t = t + 1;
12   endwhile
13 endprocedure
14 Procedure ants generation and activity()
15   Initialize the Radius of colony by R_0;
16   While (Ant count < Total Ants)
17     [best Ant solution] = new active ant();
18     if (Better(best Ant solution, global Ant solution))
19       global Ant solution = best Ant solution;
20     Ant count = Ant count + 1;
21   endwhile
22 endprocedure
23 procedure new active ant()
24   Initialize R_a = R_c, Nest = Nest of colony, τ_0 = 2, ϕ = 0.05;
25   While (ant is alive)
26     q = generate random value in [0, 1];
27     if (q <= q_0)
28       cur location = pre location + Random generated vector proportional to R_a;
29     else
30       cur location = pre location + The best NGV proportional to R_a;
31     endif-else
32     sum of violations = Check Constraint for violation(cur location);
33     penalty = Dynamic constraint handling(sum of violations, Colony number);
34     cur eval fun = function(cur location) + penalty;
35     if (memory is used)   (the use of a memorized NGV)
36       τ_best index^new = (1 − ϕ)τ_best index^old + ϕτ_0;
37       η_best index = Calculate cost of solution(cur eval fun);
38     else
39       NGV = calculate the gradient and normalize it;
40       Bad index = compute worst direction();
41       Memory[Bad index] = NGV;
42       τ_Bad index^new = (1 − ϕ)τ_Bad index^old + ϕτ_0;
43       η_Bad index = Calculate cost of solution(cur eval fun);
44     endif-else
45     Update ant routing Table();
46     pre eval fun = cur eval fun;
47     pre location = cur location;
48     if (R_a < Ant die radius)
49       Ant dies;
50     endif
51     R_a = R_a × DF;   (decreasing the radius by the Decrease Factor (DF))
52   endwhile
53   if (online delayed pheromone update)
54     [Best H index, Best H value] = Find direction with max heuristic value;
55     NGV Best H = Memory[Best H index];
56     SNGV = calculate the size of NGV;
57     Cost of the corresponding direction = Best H value + SNGV;
58     τ_Bh^new = (1 − ϕ)τ_Bh^old + ϕ(Cost of the best direction);
59     Update ant routing Table();
60   endif
61   return the global best ant solution;
62 endprocedure

The best direction of movement is taken either as the most probable best direction in the memory, as described above (Line 30 in the pseudo-code), or generated randomly as a vector with elements taken from a uniform distribution over [-1, 1] (Line 28 in the pseudo-code). A radius is considered for each colony, and it is reduced as the algorithm proceeds (Line 9 in the pseudo-code); therefore, the later colonies have a smaller radius of search. The radius is also diminished by the ants of a colony (Line 51 in the pseudo-code). All ants in a colony start their search with the radius defined for that colony and finish their search when the radius reaches zero (this reduction is performed by each ant in the colony). Consequently, radius reduction occurs at two levels: first at the colony level, and second for the ants that exist in an arbitrary colony.

After the movement of an ant to a new point, its feasibility is inspected in order to avoid violation of the variable bounds. For a move to a new feasible point, if a direction in the memory has been used, then the amount of pheromone and the heuristic value of the corresponding direction vector are updated according to Equations (12) and (13):

τ_i^new = (1 − ϕ)τ_i^old + ϕτ_0,  i ∈ direction vector indices,  (12)

η_i = 1/f_current, if f_current > 0, (a)
η_i = |f_current|, if f_current ≤ 0, (b)   i ∈ direction vector indices.  (13)

Equation (13) is used for minimization, while interchanging (a) and (b) is utilized for maximization.
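The direction-selection probabilities (Eqs. (9)-(10)) and the pheromone reinforcement (Eq. (12)) can be sketched as follows; function and variable names here are illustrative, not taken from the paper.

```python
import random

def selection_probabilities(tau, eta, beta=0.4):
    """Eqs. (9)-(10): decision variables a_i from pheromone tau_i and
    heuristic eta_i, normalized into selection probabilities P_i."""
    weights = [t * (h ** beta) for t, h in zip(tau, eta)]   # Eq. (9)
    s = sum(weights)
    a = [w / s for w in weights]
    total = sum(a)
    return [x / total for x in a]                           # Eq. (10)

def roulette_select(probs, rng=random.random):
    """Roulette-wheel choice of a movement direction from the ART."""
    r, acc = rng(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

def reinforce(tau, i, phi=0.05, tau_0=2.0):
    """Eq. (12): move tau_i toward tau_0 with learning rate phi."""
    tau[i] = (1.0 - phi) * tau[i] + phi * tau_0
    return tau
```

With the ε-greedy scheme described above, `roulette_select` is used with a small probability and the greedy (maximum-probability) direction otherwise.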
The absolute value of the current evaluation function is considered as the heuristic for evaluating the decision variable for that particular direction. For a random motion, however, a new random vector is generated as the trail of the ant movement. If the evaluation function of the new position shows an improvement, the new vector is viewed as a good direction, leading to a better situation. The NGV is calculated based on the coordinates of the previous and new points and their corresponding evaluation function values. The calculated NGV replaces the worst vector in memory, which is the one with the lowest probability of selection. In a similar manner to the case of memory-based movement, the ART is updated.

Each ant returns the best-visited point as its solution. The global solution of a colony is taken as the best solution among the solutions returned by all ants. The overall solution returned by a colony is the outcome of the foraging behavior of all the colony's members. Within each colony, the ants are restricted to move inside a circle with initial radius R_a = R_c (R_c is the radius of activities of the current colony). The radius of activities of an ant is decreased by a fixed rate after each movement until it is less than a predefined threshold, in which case its life cycle ends. The next ant starts from the nest with a new radius of activity initialized with the value of the radius of activity of the current colony. The whole procedure is repeated until the ants of a colony are exhausted.

Another essential difference between the proposed algorithm and general ACO is in the structure and use of memory. The memory in this algorithm is not private as in the case of general ACO. For the purpose of increasing the cooperation between individuals with respect to the concept of stigmergy, the memory is shared among all ants of a colony, because the routing table passes all information from one colony to the next.
Therefore, each ant is capable of sensing the modified environment by utilizing the memory, which contains better search directions in the space. The procedure new active ant of Table 1 is executed for each ant. Each active ant is assigned a certain number of tasks during a life cycle. The global solution returned by a colony is assigned as the nest for the next colony, and the above-described procedure of ants leaving the nest for food is repeated.

In order to balance exploitation and exploration, a radius of activities is defined for each colony, with R_c = 1 for the first colony. This radius is decreased exponentially as a function of the current nest generation (or current colony number) at the beginning of a new nest. Consequently, the later colonies will have a smaller search space. The reduction of the radius is based on the following:

R_c^new = R_c^old × e^{−t/(C×T_max)},  (14)

where t is the current nest generation (i.e., iteration number or colony number), T_max is the maximum number of iterations, and C is the learning rate (a constant). Global pheromone evaporation also takes place when the solution of the current colony, compared with the previous one, presents no improvement (the algorithm is in a stagnation situation and should forget the previous trail of pheromones). This is implemented based on the following:

τ^new = τ^old × evap_factor,  (15)

where evap_factor is a constant value in the range [0, 1] causing the reduction in the amount of pheromone deposition, and τ^old and τ^new are the old and new values of the amount of pheromone deposition, respectively. The procedure is repeated until either some defined criteria are satisfied or the maximum number of colonies is exhausted.

There exist several approaches for coping with constrained optimization problems, and most of them involve the use of penalty functions, which produce an unconstrained problem from the constrained one by using a modified evaluation function [24].
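The update rules of Eqs. (14) and (15) above can be sketched as follows, using the parameter values of Table 2 as defaults; applying the radius decay once per new nest is an assumption of this sketch.

```python
import math

def reduce_colony_radius(R_c, t, T_max, C=3.0):
    """Eq. (14): exponential decay of the colony radius with the colony
    number t, applied at the start of each new nest."""
    return R_c * math.exp(-t / (C * T_max))

def evaporate_pheromone(tau, evap_factor=0.95):
    """Eq. (15): global evaporation applied when the current colony
    shows no improvement over the previous one (stagnation)."""
    return [v * evap_factor for v in tau]
```

Successive colonies thus search inside shrinking circles, while evaporation lets the algorithm "forget" pheromone trails deposited during stagnation.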
evaluation(X) = f(X), if X ∈ F,
evaluation(X) = f(X) ± Penalty(X), otherwise,  (16)

where F is the feasible region and X stands for the solution vector; the plus sign is used when the goal is minimization and the minus sign for maximization. The penalty function is a function of a violation measure that quantifies how far an infeasible solution is from the feasible region F. Some form of distance measure is usually used to measure the violations. In this paper, a dynamic method of penalization, introduced in [25], is employed. According to that method, the penalty function is defined as follows:

penalty(X) = (C_no)^θ Σ_{j=1}^{k} V_j^λ(X),  (17)

where C_no is the colony number (i.e., t in the algorithm) and θ is a constant for adjusting the weight of the penalization as the search process progresses. The notation V_j (1 ≤ j ≤ k) denotes the j-th constraint violation, where k is the total number of constraints. Also, λ is a constant and is usually set to 2; instead of this constant, the absolute value of V_j can be used.

In this work, the potential of GCACO II for extracting interpretable fuzzy models for the sake of identification is investigated. This algorithm is utilized for the parameter tuning of a TS-FIS model produced by subtractive clustering. The mean square error (MSE) between the model output and the actual output is considered as the evaluation function.

4. Generating Interpretable Fuzzy Model

Similar fuzzy sets represent almost the same region in the universe of discourse of a fuzzy variable, i.e., they describe the same concept. Therefore, the first step in attaining model interpretability consists of finding and merging similar MFs. Structure learning by means of the subtractive clustering technique leads to a set of initial MFs, some of which may be highly similar and result in unnecessary rules. Thus, the model will lack transparency, and it seems useful to merge similar MFs.
In the following, an interpretability measure that has been utilized in the literature, by which similar MFs can be merged, is first introduced. Then, based on this measure, our proposed procedure for generating interpretable fuzzy models is described in detail. The proposed procedure is an evolutionary fuzzy system that employs GCACO II for producing approximate and interpretable fuzzy models.

− Proposed procedure

Let A and B be two fuzzy sets. The measure of similarity of these sets may vary from 0, indicating completely distinct sets, to 1, indicating completely similar sets. The most common similarity measure for fuzzy sets given in the literature [10, 25, 26] is based on the intersection and union operations and is defined as follows:

S(A, B) = |A ∩ B| / |A ∪ B| = |A ∩ B| / (|A| + |B| − |A ∩ B|),  (18)

where S is the similarity measure, |·| indicates the size of a set, and the intersection and union operators are denoted by ∩ and ∪, respectively [25]. The implementation of this measure in a continuous universe of discourse proves computationally intensive, particularly for the Gaussian MFs produced by subtractive clustering. Therefore, some simplification methods for calculating the similarity of fuzzy sets have been suggested; for example, a triangular function with the same centre c and a width proportional to σ is utilized in [26] for similarity evaluation. This method is adopted in this work for the calculation of the similarity between fuzzy sets. If the similarity measure for two fuzzy sets is greater than a predefined threshold, both sets are replaced by a new fuzzy set with parameters given by the following equations:

c_new = (σ_1 c_1 + σ_2 c_2) / (σ_1 + σ_2),  (19)

σ_new = (σ_1 + σ_2) / 2.  (20)

Assume that the following three rules have been generated by employing subtractive clustering for a system with n inputs and one output.
R_1: If x_1 is D_{1,1} and ··· and x_n is D_{1,n} then g_1(x_1, x_2, ··· , x_n)
R_2: If x_1 is D_{2,1} and ··· and x_n is D_{2,n} then g_2(x_1, x_2, ··· , x_n)  (21)
R_3: If x_1 is D_{3,1} and ··· and x_n is D_{3,n} then g_3(x_1, x_2, ··· , x_n),

g_i(x_1, x_2, ··· , x_n) = a_{1,i} x_1 + a_{2,i} x_2 + ··· + a_{n,i} x_n + a_{n+1,i},  (22)

where D_{i,j} indicates the i-th MF on the j-th input. The antecedent of the i-th rule is constituted from the conjunction (and) of the i-th MFs on each of the n inputs, and a_{1,i}, a_{2,i}, ··· , a_{n,i}, a_{n+1,i} are the output parameters of the TS-FIS. If the third MF of each input variable (D_{3,j}) merges with the other MFs on that input, then the third rule, constructed from the conjunction of the third MFs, will be omitted. In this manner, redundant rules may be removed from the rule base during the MF merging process. The procedure of membership function merging, one pair per iteration, continues until no more pairs satisfy the merging threshold. The function created after merging is a merging candidate in the following iterations. The flowchart of the algorithm for generating a comprehensible fuzzy system is shown in Fig.1.

Fig. 1 The algorithm for generating interpretable FIS

The subtractive clustering method is used for creating an initial FIS model. The FIS is refined and redundant rules are removed by the procedure described above, while GCACO II is utilized for parameter tuning. After the above procedure terminates, the simplified FIS is tuned by GCACO II in order to obtain an appropriate degree of accuracy (i.e., MSE). Accuracy can be considered as another criterion for stopping the procedure: a threshold imposed on the required degree of accuracy can result in the termination of the simplifying procedure and consequently the whole procedure. The algorithm proposed in this section is similar to that proposed by the authors in [10], in which differential evolution (DE) was employed in the tuning step.
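The similarity test and merging rule of Eqs. (18)-(20) can be sketched for Gaussian MFs as follows. A grid-based estimate of Eq. (18) (with min/max as intersection/union) stands in for the paper's cheaper triangular approximation; the grid bounds are illustrative.

```python
import numpy as np

def gauss_mf(x, c, sigma):
    """Gaussian membership function with centre c and width sigma."""
    return np.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))

def similarity(c1, s1, c2, s2, lo=-5.0, hi=5.0, n=2001):
    """Eq. (18) estimated on a grid: |A ∩ B| / |A ∪ B|, where the
    intersection and union are the pointwise min and max."""
    x = np.linspace(lo, hi, n)
    a, b = gauss_mf(x, c1, s1), gauss_mf(x, c2, s2)
    return float(np.minimum(a, b).sum() / np.maximum(a, b).sum())

def merge_mfs(c1, s1, c2, s2):
    """Eqs. (19)-(20): width-weighted centre and averaged width."""
    return (s1 * c1 + s2 * c2) / (s1 + s2), (s1 + s2) / 2.0
```

If the similarity of a pair exceeds the merging threshold (0.8 in the experiments of this paper), the pair is replaced by `merge_mfs(...)`, and rules whose antecedents thereby become identical are removed.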
The added advantage of GCACO II over DE is its ability to cope with constrained optimization problems. Due to the constraint-handling abilities of GCACO II, good models in terms of validity and performance are identified in the tuning step of the above algorithm. Validation criteria are utilized as constraints imposed on the performance objective, which causes valid models to be produced. In the case of NARX model validation, the noise model is not specifically estimated and, consequently, the residuals may be colored. Specific tests are required, and the estimated nonlinear model will only be unbiased if and only if [12, 27]:

XC_{u²ε²}(τ) = 0  ∀τ,  (23)
XC_{u²ε}(τ) = 0  ∀τ,  (24)
XC_{uε}(τ) = 0  ∀τ,  (25)

where XC denotes the cross-correlation, ε is the residual vector (containing all residuals), and u is the input vector (containing all inputs). The target value for these constraints is set to be the 95% confidence limit. The correlation-based validation objectives are contained in a (2τ + 1)-element vector. In order to express these functions as scalars, the infinity norm is utilized as follows:

N^inf_{u²ε²} = ||XC_{u²ε²}||_∞,  (26)
N^inf_{u²ε} = ||XC_{u²ε}||_∞,  (27)
N^inf_{uε} = ||XC_{uε}||_∞.  (28)

The identification process then evolves through regions of the search space where valid fuzzy models are located (i.e., where the above scalars are less than or equal to predefined values).

5. Computational Results

The proposed method is applied to the Lorenz system, the Mackey-Glass time series, and a flexible robot arm problem, and the obtained results are compared with those obtained by similar methods.

5.1. Simulation Conditions

The radius of the subtractive clustering algorithm is set to 0.8; the other parameters of this algorithm take their default values. The similarity threshold is selected as 0.8 in all case studies.
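Under the assumption that XC is the normalized cross-correlation and that the 95% confidence limit is the usual ±1.96/√N band, the validity constraints of Eqs. (23)-(28) can be sketched as:

```python
import numpy as np

def xcorr(a, b, max_lag=20):
    """Normalized cross-correlation of two series over lags
    -max_lag..max_lag, as a (2*max_lag + 1)-element vector."""
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    n = len(a)
    return np.array([np.mean(a[max(0, -k):n - max(0, k)] *
                             b[max(0, k):n - max(0, -k)])
                     for k in range(-max_lag, max_lag + 1)])

def validity_norms(u, eps, max_lag=20):
    """Infinity norms of the three correlation tests (Eqs. (26)-(28))."""
    return (np.abs(xcorr(u**2, eps**2, max_lag)).max(),
            np.abs(xcorr(u**2, eps, max_lag)).max(),
            np.abs(xcorr(u, eps, max_lag)).max())

def is_valid(u, eps, n_data, max_lag=20):
    """Check each norm against the assumed 95% limit 1.96/sqrt(N)."""
    limit = 1.96 / np.sqrt(n_data)
    return all(v <= limit for v in validity_norms(u, eps, max_lag))
```

These scalar norms are the constraints imposed on the performance objective during tuning; a model whose residuals remain correlated with the input fails the check.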
2000 and 1000 data samples are generated for the first and second case studies, respectively, by solving their differential equations. The 1000 data samples for the third example are taken randomly from the public-domain database for system identification [28]. The lags for outputs and inputs and the dead time are set to n_p = 4, n_q = 0 and n_d = 1, respectively, for the last case study. The parameters of GCACO II are given in Table 2 below.

Table 2: Parameter values of GCACO II used in this work.

  No. of ants: 50
  No. of colonies: 50
  Initial radius: 1
  Radius decrease factor: 0.7
  C (constant for decreasing colony radius): 3
  q_0 prob.: 0.35
  Evap. rate: 0.95
  ϕ: 0.03
  ρ: 0.05
  Mem. size: 10

5.2. Case Studies

Case 1 Lorenz System

The Lorenz system is described by the following differential equations [9]:

ẋ = −y² − z² − a(x − F_Lorenz),  (29)
ẏ = xy − bxz − y + G_Lorenz,  (30)
ż = bxy + xz − z.  (31)

The parameters of the Lorenz system are a = 0.25, b = 4.0, F_Lorenz = 8.0 and G_Lorenz = 1.0. In the simulation, x(t) is predicted from x(t−1), y(t−1) and z(t−1). Two thousand data points are obtained from Equations (29)-(31) using the fourth-order Runge-Kutta method with a step length of 0.05, where 1000 pairs of data are used for training and the other 1000 for testing. The sampled data pairs are shown in Fig.2. The best fuzzy model and its output are shown in Figs.3-6. The antecedent fuzzy MFs are depicted in Fig.4, and the only fuzzy rule is given in Fig.3. The solid curves in Fig.5 and Fig.6 depict the actual outputs for the first 400 samples of the train and test data, respectively; the dotted curves represent the output behavior of the obtained model on the same samples. The mean square errors (MSE) for the train and test data are MSE_train = 1.2345E-4 and MSE_test = 2.8456E-4.

Case 2 Mackey-Glass Time Series
Fig. 2 Inputs x(t−1), y(t−1) and z(t−1) and output x(t) of the Lorenz system

R_1: If (x(t−1) is Middle) and (y(t−1) is Middle) and (z(t−1) is Middle) then x(t) = 0.9952 x(t−1) − 0.0092 y(t−1) − 0.0281 z(t−1) + 0.0033

Fig. 3 Rule representation of the fuzzy system for the Lorenz system

Fig. 4 Distribution of MFs for the antecedent of the above rule

Fig. 5 The actual and model output for the first 400 samples of train data

Fig. 6 The actual and model output for the first 400 samples of test data

The Mackey-Glass time series is described as follows [8]:

ẋ = a x(t − r) / (1 + x^b(t − r)) − c x(t).  (32)

The parameters of the Mackey-Glass time series are a = 0.2, b = 10, c = 0.1 and r = 30. The goal is to predict x(t) from x(t−1), x(t−2) and x(t−3). 1000 data points are obtained from Equation (32) using the fourth-order Runge-Kutta method with a step length of 1 and the initial condition x(0) = 1.2, where 500 pairs of data are used for training and the other 500 for testing. The sampled data pairs are shown in Fig.7.

Fig. 7 Inputs x(t−1), x(t−2) and x(t−3) and output x(t) of the Mackey-Glass system

The best fuzzy model and its output are shown in Figs.8-11. The antecedent fuzzy MFs are depicted in Fig.9, and the only fuzzy rule is given in Fig.8.

R1: If (x(t−3) is Big) and (x(t−2) is Big) and (x(t−1) is Small) then x(t) = 0.0184 x(t−3) − 1.0149 x(t−2) + 1.9841 x(t−1) + 0.0110

Fig. 8 Rule representation of the fuzzy system for the Mackey-Glass system

Fig. 9 MFs of the antecedent of the above rule

Fig. 10 The actual and model output for the first 400 samples of train data

Fig. 11 The actual and model output for the first 400 samples of test data

The solid curves in Fig.10 and Fig.11 depict the actual outputs for the first 400 samples of the train and test data, respectively; the dotted curves represent the output behavior of the obtained model on the same samples. The MSEs for the train and test data are MSE_train = 3.2672E-5 and MSE_test = 3.6383E-5.

Case 3 Flexible Robot Arm

The arm is installed on an electrical motor. The transfer function represents the relation between the measured reaction torque of the structure on the ground and the acceleration of the flexible arm. Therefore, the input is the reaction torque of the structure and the output is the acceleration of the flexible arm. The applied input is a periodic sine sweep. As mentioned above, the 1000 data samples for the third example are taken randomly from the public-domain database for system identification [28]. Seventy percent of the data is selected randomly for training and the rest is used for testing. The best fuzzy model and its output are shown in Figs.12-15. The antecedent fuzzy MFs are depicted in Fig.13, and the only fuzzy rule is given in Fig.12.

R1: If u(t−1) is Small and y(t−1) is Middle and y(t−2) is Middle and y(t−3) is Big and y(t−4) is Small then y(t) = 0.5432 u(t−1) + 0.24341 y(t−1) + 0.3550 y(t−2) − 0.6754 y(t−3) + 0.29995 y(t−4) + 0.0723

Fig. 12 Rule representation of the fuzzy system for the flexible robot arm plant

Fig. 13 Antecedent MFs of the only rule

Fig.14 and Fig.15 depict the model and actual outputs for 200 samples of the train and test data, respectively. The calculated MSEs of the train and test data for the best models are 6.6129E-5 and 7.5843E-5, respectively.

5.3. Comparisons

As mentioned in previous sections, the authors have recently developed a similar algorithm in which DE was utilized instead of GCACO. In this section, Table 3 gives a comparison between the results obtained in this work and those of past research. As can be seen from Table 3, the results of this study are well comparable to the authors' recent studies in terms of accuracy and compactness of
Table 3: Comparing the best recently reported fuzzy models in the literature and that of this paper.

  Method       Case study   No. of rules   MSE train    MSE test
  [10]         No.1         1              2.223E-4     3.123E-4
  [10]         No.2         1              3.3116E-5    3.6563E-5
  [10]         No.3         1              6.8935E-5    7.8843E-5
  [11]         No.1         1              2.4566E-4    3.4453E-4
  [11]         No.2         1              3.7846E-5    3.8736E-5
  [11]         No.3         1              4.6129E-5    4.5843E-5
  [12]         No.1         1              1.2344E-4    2.7654E-4
  [12]         No.2         1              3.0101E-5    3.1121E-5
  [12]         No.3         1              3.9888E-5    3.8276E-5
  This paper   No.1         1              1.2355E-4    2.8456E-4
  This paper   No.2         1              3.2672E-5    3.6383E-5
  This paper   No.3         1              6.6129E-5    7.5843E-5

the rule base. Although the MSE values are not better than those of past research, they are close to them. Furthermore, the models obtained in this study have been checked for validity via the added advantage of GCACO II, namely its constraint handling. Consequently, this study considers all objectives of the system identification task, namely accuracy, interpretability, compactness and validity conditions. Recently, in [12], the authors proposed a hybrid evolutionary algorithm for the fuzzy identification of MIMO nonlinear systems. In that study, all objectives of fuzzy system identification were taken into account, and the proposed procedure employed the abilities of a multi-objective genetic algorithm as well as DE. Moreover, the type of MFs was considered to be Gaussian combinational MFs (abbreviated as Gauss2mf). Therefore, as is apparent from Table 3, the best results in terms of accuracy and compactness are those of the method described in [12]. Considering the abilities of the method introduced in [12] and regarding the results obtained in this paper, it is concluded that the proposed method based on GCACO II is well comparable to the others in terms of accuracy and interpretability of the resulting fuzzy models. Also, the procedure is easy to implement and computationally efficient.

6. Conclusion

The ability of GCACO for tuning an FIS of the TS type is investigated in this paper.
Chiu's subtractive clustering, as an automatic data-driven method, is utilized for constructing the primary FIS. This method, like the other clustering-based ones, has the advantage of avoiding the explosion of the rule base. Then an iterative two-step algorithm is employed to produce an FIS that is simplified in terms of the number of fuzzy sets and rules. In the first step, the parameters of the model are adjusted by utilizing GCACO. In the second step, the similar membership functions of the previously obtained model are merged. These two steps are performed until no more pairs of membership functions satisfy the merging criterion. Moreover, two well-known chaotic time series, namely Lorenz and Mackey-Glass, and an industrial plant (i.e., a flexible robot arm) are used as case studies. The results illustrate the success of the proposed simplification procedure along with GCACO for obtaining accurate and transparent fuzzy models. The models obtained in this study have been checked for validity via the added advantage of GCACO II, namely its constraint handling. Consequently, compared to similar past research, this study considers all objectives of the system identification task, namely accuracy, interpretability, compactness and validity conditions.

Acknowledgements

The authors would like to thank Dr. Malihe Maghfoori Farsangi for the time she has devoted to proofreading the paper.

References

1. Eksin I, Erol O K (2000) A fuzzy identification method for nonlinear systems. Turkish Journal of Electrical Engineering & Computer Sciences 8(2): 125-135
2. Jang J S R (1993) ANFIS: Adaptive-network-based fuzzy inference systems. IEEE Transactions on Systems, Man, and Cybernetics 23(3): 665-685
3. Sanchez L, Couso I, Corrales J A (2001) Combining GP operators with SA search to evolve fuzzy rule based classifiers. Information Sciences 136: 175-191
4.
Casillas J, Cordon O, Herrera F, Magdalena L (2003) Accuracy improvements in linguistic fuzzy modeling. Berlin: Springer
5. Paiva R P, Dourado A (2004) Interpretability and learning in neuro-fuzzy systems. Fuzzy Sets and Systems 147(1): 17-38
6. Jin Y, Sendhoff B (2003) Extracting interpretable fuzzy rules from RBF networks. Neural Processing Letters 17(2): 149-164
7. Wang H, Kwong S, Jin Y, Wei W, Man K F (2005) Agent-based evolutionary approach for interpretable rule-based knowledge extraction. IEEE Transactions on Systems, Man, and Cybernetics-Part C 35(2): 143-155
8. Wang H, Kwong S, Jin Y, Wei W, Man K F (2005) Multi-objective hierarchical genetic algorithm for interpretable fuzzy rule-based knowledge extraction. Fuzzy Sets and Systems 149(1): 149-186
9. Chiu S L (1994) Fuzzy model identification based on cluster estimation. Journal of Intelligent and Fuzzy Systems 2: 267-278
10. Eftekhari M, Katebi S D, Karimi M, Jahanmiri A H (2008) Eliciting transparent fuzzy model using differential evolution. Applied Soft Computing 8(1): 466-476
11. Eftekhari M, Katebi S D (2008) Extracting compact fuzzy rules for nonlinear system modeling using subtractive clustering, GA and unscented filter. Applied Mathematical Modelling 32(12): 2634-2651
12. Eftekhari M, Majidi M, Nezamabadi P H (2012) Securing interpretability of fuzzy models for modeling nonlinear MIMO systems using a hybrid of evolutionary algorithms. Iranian Journal of Fuzzy Systems 9(1): 61-77
13. Zhou S M, Gan J Q (2008) Low-level interpretability and high-level interpretability: A unified view of data-driven interpretable fuzzy system modeling. Fuzzy Sets and Systems 159(23): 3091-3131
14. Eftekhari M, Daei B, Katebi S D (2006) Gradient-based ant colony optimization for continuous spaces. Esteghlal Journal of Eng. 25(1): 33-45
15. Eftekhari M, Moosavi M R, Katebi S D (2006) Solving constrained continuous optimization problems with GCACO II. 11th Annual Conference of Computer Society of Iran: 180-188
16.
Abonyi J (2003) Fuzzy model identification for control. Boston: Birkhauser
17. Abonyi J, Babuska R, Szeifert F (2002) Modified Gath-Geva fuzzy clustering for identification of Takagi-Sugeno fuzzy models. IEEE Transactions on Systems, Man, and Cybernetics-Part B 32(5): 612-621
18. Feil B, Abonyi J, Madar J, Nemeth S, Arva P (2004) Identification and analysis of MIMO systems based on clustering algorithm. Acta Agraria Kaposvariensis 8(3): 191-203
19. Herrera F (2008) Genetic fuzzy systems: Taxonomy, current research trends and prospects. Evolutionary Intelligence 1(1): 27-46
20. Alonso S, Cordon O, Fernandez de Viana I, Herrera F (2004) Integrating evolutionary computation components in ant colony optimization evolutionary algorithms: An experimental study. In: L. Nunes de Castro, F. J. Von Zuben (Eds.), Recent Developments in Biologically Inspired Computing, Idea Group Publishing
21. Bilchev G, Parmee I C (1995) The ant colony metaphor for searching continuous design spaces. Lecture Notes in Computer Science 993: 25-39
22. Wodrich M, Bilchev G (1997) Cooperative distributed search: The ants way. Control and Cybernetics 26(3): 413-445
23. Socha K, Dorigo M (2008) Ant colony optimization for continuous domains. European Journal of Operational Research 185: 1155-1173
24. Michalewicz Z, Fogel D B (2005) How to solve it: Modern heuristics. Berlin: Springer-Verlag
25. Dubois D J, Prade H M (1980) Fuzzy sets and systems: Theory and applications. New York: Academic Press
26. Chao C T, Chen Y J, Teng C C (1996) Simplification of fuzzy-neural systems using similarity analysis. IEEE Transactions on Systems, Man, and Cybernetics-Part B 26(2): 344-354
27. Rodriguez-Vazquez K (1999) Multiobjective evolutionary algorithms in non-linear system identification. Ph.D. Thesis, Department of Automatic Control and Systems Engineering, The University of Sheffield
28. Moor B D (2010) DaISy: Database for the identification of systems.
Department of Electrical Engineering, ESAT/SISTA, K.U.Leuven, Belgium. http://www.esat.kuleuven.ac.be/sista/daisy

Journal: Fuzzy Information and Engineering (Taylor & Francis)
Published: Sep 1, 2013
