Agriculture Crop Suitability Prediction Using Rough Set on Intuitionistic Fuzzy Approximation Space and Neural Network

FUZZY INFORMATION AND ENGINEERING, 2019, VOL. 11, NO. 1, 64–85
https://doi.org/10.1080/16168658.2021.1886813

A. Anitha (a) and D. P. Acharjya (b)

(a) School of Information Technology and Engineering, VIT Vellore, India; (b) School of Computer Science and Engineering, VIT Vellore, India

ARTICLE HISTORY: Received 13 October 2016; Revised 25 March 2017; Accepted 10 October 2018

KEYWORDS: Almost indiscernible; intuitionistic fuzzy proximity relation; neural network; knowledge discovery; prediction; rough set

ABSTRACT

Agriculture plays a vital role in the Indian economy. Considering the overall geographical space versus population in India, 7% of the population is recorded in Tamil Nadu, with 3% of the water and 4% of the land resources. Thus an automated prediction system becomes essential for predicting the crop based on the nutritional security of the country. In this paper, an effort has been made to process the uncertainties by hybridizing rough set on intuitionistic fuzzy approximation space (RSIFAS) [Acharjya DP, Tripathy BK. Rough sets on intuitionistic fuzzy approximation spaces and knowledge representation. Int J Artif Int Comput Res. 2009;1(1):29–36.] and neural network [Hecht NR. Theory of the backpropagation neural network. Proceedings of the International Joint Conference on Neural Networks, 1 (1989), 593–605.]. RSIFAS identifies the almost indiscernibility among the natural resources and helps to reduce the computational procedure by employing data reduction techniques, whereas the neural network helps in the prediction process. It helps to find the crops that may be cultivated based on the available natural resources.
The proposed model is analysed on data accumulated from Vellore district of Tamil Nadu, India, and achieves an average classification accuracy of 93.7%. Compared with earlier models, it is found to give 6.9% better prediction accuracy.

CONTACT D. P. Acharjya dpacharjya@gmail.com

© 2021 The Author(s). Published by Taylor & Francis Group on behalf of the Fuzzy Information and Engineering Branch of the Operations Research Society, Guangdong Province Operations Research Society of China. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

In India, agriculture is the principal means of livelihood for over 58.4% of the population. In addition, agricultural merchandise is considered the main commodity for international trade. To sustain the growth of the Indian economy, a drastic growth in agricultural productivity is needed. Land and water, the main resources for agriculture, are inherently limited. Consequently, it is necessary to devise a lucrative cropping system that uses the accessible resources and increases productivity. Since market competition is high, deliberate planning is mandatory to improve performance and to achieve a profitable yield from the cropping system. Even perfect planning of the development and production of the cropping system may be set back by uncertainty in forecasting the harvest and the demand for the crop. Therefore, the information needed for future planning can be investigated by means of a prediction model. A prediction model developed with prior knowledge gives more accurate results in real-life situations.
Thus, the proposed prediction model uses the soil and water resources available in a region to forecast the production of agricultural crops with a reduced risk of loss. Owing to the lack of natural and human resources, many farmers resign themselves to converting agricultural land into marketable land. This attitude has to change so that farmers, and especially the young generation, are retained in agriculture as their main occupation; to that end, the income from farm holdings should be increased significantly.

The area of study of this paper is restricted to Vellore district in Tamil Nadu, where agriculture is the main vocation. The small and marginal farmers in this region play a key role in the overall improvement of agriculture and hence in the development of the Indian economy. Thus, the adoption of an appropriate cropping system by these farmers needs to be a focus. The Indian government has taken some measures to improve crop production, such as training farmers and relaxing seed costs and loan amounts. To tackle the increasing competition, it becomes all the more essential to develop a crop suitability information system that improves productivity and profit for the farmers. To develop such an effective system, data collected from various sources such as soil, water, seedling methodologies and meteorological conditions must be analysed properly instead of merely being archived.

Analysing data and discovering knowledge is a challenging and increasingly important task because the data contain uncertainties. Additionally, the discovered knowledge is not always useful to users, as it may not satisfy the user's choice owing to the presence of redundancy, inconsistency and vagueness. Many traditional tools used for discovering knowledge are deterministic, crisp and precise. Thus, it is essential to use intelligent techniques that can process the uncertainties present in the data.
The emergence of intelligent computing techniques like fuzzy set [1], rough set [2,3], rough set on fuzzy approximation space (RSFAS) [4,5], rough set on intuitionistic fuzzy approximation space (RSIFAS) [6], soft set [7], near set [8], fuzzy rough set [9], rough set on two universal sets [10], neutrosophic set [11] etc. plays a vital role in knowledge discovery. Further, RSFAS has been hybridised with Bayesian classification, soft set and neural network [12,13,14,15,16] in the development of prediction systems.

In this paper, an effort has been made to predict decisions from uncertain and imprecise data by means of RSIFAS and neural network. The concept of RSIFAS is based on the almost indiscernibility present in the data set. The objects in the information system are approximated by a pair of sets, called the lower and upper approximations, based on the intuitionistic fuzzy proximity relation. The motivation behind the use of RSIFAS is to obtain (α, β)-equivalence classes where the attribute values are not qualitative. Further, the classified information system is trained and tested with a back propagation neural network, which helps to explore decisions for unknown associations of the attribute values. This helps us to predict a specific crop to be cultivated in a specific area by considering various conditions such as soil, water characteristics and rainfall.

The remaining part of the paper is organised as follows. Section 2 presents the basics of RSIFAS, whereas Section 3 discusses the basics of the feed-forward back propagation neural network. The proposed research design is presented in Section 4. Section 5 analyses the performance of the trained data against the testing data according to known feature values. An experimental comparative study of the proposed model with various existing techniques is given in Section 6. Section 7 concludes the paper.

2. An Information System

Procuring knowledge for classification is one of the most essential aims of data mining and inductive learning. In real-life problems, however, simple classification is not enough, as the data contain uncertainties. To deal with such problems, classification using RSIFAS was introduced. Before discussing the classification power of RSIFAS, one should know what an information system is. An information system is a table that offers a suitable way to describe a finite set of objects of the universe by a finite set of attributes, thereby representing all available information and knowledge. From the viewpoint of rough set theory, an information system is commonly defined as a data set represented as a table in which every column head represents an attribute that can be measured for each object.

More formally, an information system is a quadruple $IS = (U, A, V, f)$, where $U = \{x_1, x_2, \cdots, x_n\}$ is a non-empty finite set of objects called the universe, $A = \{a_1, a_2, \cdots, a_m\}$ is a non-empty finite set of attributes, and $V = \cup_{a \in A} V_a$, where $V_a$ is the set of values that the attribute $a \in A$ may take. The mapping $f_a : (U \times A) \to V_a$ provides the information about each object. Further, if $A = (C \cup D)$, where $C$ is the set of conditional attributes and $D$ is the decision attribute, we call the information system a decision system.

For example, consider the information system shown in Table 1, where the attribute values are quantitative rather than qualitative. It is clear that the attribute values are almost identical rather than exactly matching each other. To deal with such almost similarity, the concept of RSIFAS is introduced.

Table 1. Information system.

| Objects | Max temp ($a_1$) | Min temp ($a_2$) | Avg. wind speed ($a_3$) | Avg. relative humidity ($a_4$) | Avg. evaporation rate ($a_5$) |
|---|---|---|---|---|---|
| $x_1$ | 36.6 | 20.9 | 8.6 | 73 | 4.4 |
| $x_2$ | 36.9 | 23.1 | 7.4 | 72 | 2.8 |
| $x_3$ | 43.7 | 24.8 | 6.2 | 70 | 2.6 |
| $x_4$ | 46.9 | 27.4 | 3.1 | 67 | 3.4 |
| $x_5$ | 46.1 | 27.2 | 7.4 | 62 | 5.1 |
| $x_6$ | 45.4 | 26.4 | 8.9 | 56 | 4.2 |

2.1. Foundations of Rough Set on Intuitionistic Fuzzy Approximation Space

Pawlak's rough set [2] identifies the indiscernibility between attribute values with the help of an equivalence relation. In several real-life applications, however, it is observed that the values of the attributes are not exactly the same but almost the same. To decide the amount of identity between two attribute values, the equivalence relation is replaced with a fuzzy tolerance relation on each domain of attributes [4]. This, in turn, fails to include the hesitation that may arise during the knowledge extraction phase. Therefore, the fuzzy tolerance relation is further replaced with an intuitionistic fuzzy tolerance relation, and the concept of RSIFAS was introduced [6]. For example, if over a particular period of time the maximum temperatures at two different places are 36.6°C and 36.9°C, then the temperatures at these places are approximately identical rather than completely identical. RSIFAS reduces to RSFAS if there is no hesitation. Similarly, RSIFAS reduces to rough set if there is no hesitation and the attribute values are exactly the same. Therefore, RSIFAS generalises Pawlak's approach of indiscernibility. For completeness, the notions and concepts of RSIFAS are briefly presented in this section.

Let $U$ ($U \neq \phi$) be a non-empty finite set of discourse called the universe, and let $x$ be a particular element of $U$. An intuitionistic fuzzy set $X$ of $U$ is defined as $\{\langle x, \mu_X(x), \nu_X(x) \rangle \mid x \in U\}$, where $\mu_X : U \to [0, 1]$ and $\nu_X : U \to [0, 1]$ define the degree of membership and the degree of non-membership, respectively, for every element $x \in U$ such that $0 \leq \mu_X(x) + \nu_X(x) \leq 1$. The value $\pi_X(x) = 1 - (\mu_X(x) + \nu_X(x))$ is called the hesitation part, which may contribute to the membership value, to the non-membership value, or to both. For simplicity, we will use $(\mu_X(x), \nu_X(x))$ to denote the intuitionistic fuzzy set $X$ [17].
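The constraint on the membership and non-membership degrees, and the resulting hesitation part, can be illustrated with a minimal Python sketch. The class name `IFValue` and the numerical values are hypothetical and serve only as an illustration; they do not come from the study data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IFValue:
    """Hypothetical container for one (membership, non-membership) pair."""
    mu: float  # degree of membership
    nu: float  # degree of non-membership

    def __post_init__(self):
        # Both degrees lie in [0, 1] and their sum must not exceed 1.
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0):
            raise ValueError("mu and nu must lie in [0, 1]")
        if self.mu + self.nu > 1.0 + 1e-12:
            raise ValueError("mu + nu must not exceed 1")

    @property
    def hesitation(self) -> float:
        """Hesitation part: pi = 1 - (mu + nu)."""
        return 1.0 - (self.mu + self.nu)


if __name__ == "__main__":
    value = IFValue(mu=0.6, nu=0.3)
    print(round(value.hesitation, 2))
```

When the hesitation is zero, the pair behaves like an ordinary fuzzy membership value, which is the sense in which RSIFAS reduces to RSFAS.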
An intuitionistic fuzzy relation $IR$ on $U$ is an intuitionistic fuzzy set defined on $(U \times U)$, characterised by the membership $\mu_{IR}$ and the non-membership $\nu_{IR}$, where

$$IR = \{(\mu_{IR}(x_i, x_j), \nu_{IR}(x_i, x_j)) \mid x_i, x_j \in U\}$$

An intuitionistic fuzzy relation $IR$ on $U$ is said to be an intuitionistic fuzzy (IF) proximity relation if it satisfies the following conditions, where $\mu_{IR}(x_i, x_j)$ represents the degree of membership and $\nu_{IR}(x_i, x_j)$ represents the degree of non-membership between two objects $x_i$ and $x_j$.

(1) $\mu_{IR}(x_i, x_i) = 1$ and $\nu_{IR}(x_i, x_i) = 0$ for all $x_i \in U$.
(2) $\mu_{IR}(x_i, x_j) = \mu_{IR}(x_j, x_i)$ and $\nu_{IR}(x_i, x_j) = \nu_{IR}(x_j, x_i)$ for all $x_i, x_j \in U$.

Let $J = \{(\alpha, \beta) \mid \alpha, \beta \in [0, 1] \text{ and } 0 \leq \alpha + \beta \leq 1\}$. Then for any $(\alpha, \beta) \in J$, the $(\alpha, \beta)$-cut is given as

$$IR_{\alpha,\beta} = \{(x_i, x_j) \mid \mu_{IR}(x_i, x_j) \geq \alpha \text{ and } \nu_{IR}(x_i, x_j) \leq \beta\}.$$

We say that two objects $x_i$ and $x_j$ are $(\alpha, \beta)$-similar with respect to $IR$ if $(x_i, x_j) \in IR_{\alpha,\beta}$, and we write $x_i \, IR(\alpha,\beta) \, x_j$. Two objects $x_i$ and $x_j$ are said to be $(\alpha, \beta)$-identical with respect to $IR$ if there exists a sequence of elements $u_1, u_2, \cdots, u_n$ in $U$ such that $x_i \, IR(\alpha,\beta) \, u_1$, $u_1 \, IR(\alpha,\beta) \, u_2$, $\cdots$, $u_n \, IR(\alpha,\beta) \, x_j$. In this case, we say that $x_i$ is transitively $(\alpha, \beta)$-similar to $x_j$ with respect to $IR$. It is clearly seen that, for any $(\alpha, \beta) \in J$, $IR(\alpha,\beta)$ is an equivalence relation on $U$. Let $IR^{*}_{\alpha,\beta}$ denote the set of equivalence classes generated by the equivalence relation $IR(\alpha,\beta)$. The $IR(\alpha,\beta)$-equivalence class of an element $x$ in $U$ is denoted as $[x]_{(\alpha,\beta)}$. The pair $K = (U, IR(\alpha, \beta))$ is called an intuitionistic fuzzy approximation space [6].

Let $X \subseteq U$. Then the $(\alpha, \beta)$-lower and $(\alpha, \beta)$-upper approximations of $X$ in the generalised approximation space $K = (U, IR(\alpha, \beta))$ are denoted as $(X_L^{\alpha,\beta}, X_U^{\alpha,\beta})$, where

$$X_L^{\alpha,\beta} = \cup \{Y \mid Y \in IR^{*}_{\alpha,\beta} \text{ and } Y \subseteq X\} \tag{1}$$

$$X_U^{\alpha,\beta} = \cup \{Y \mid Y \in IR^{*}_{\alpha,\beta} \text{ and } Y \cap X \neq \phi\} \tag{2}$$

A given set $X$ is said to be $(\alpha, \beta)$-rough if and only if $X_U^{\alpha,\beta} \neq X_L^{\alpha,\beta}$.
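Equations (1) and (2) can be sketched in a few lines of Python once the equivalence classes are known. This is a minimal illustration, not the authors' implementation: the function name `approximations` and the example classes and target set are hypothetical.

```python
def approximations(classes, target):
    """Return the lower and upper approximations of `target`.

    `classes` is an iterable of equivalence classes (sets of objects) and
    `target` is the set X to be approximated.
    """
    target = set(target)
    lower, upper = set(), set()
    for block in classes:
        block = set(block)
        if block <= target:   # class lies entirely inside X: Equation (1)
            lower |= block
        if block & target:    # class shares at least one object with X: Equation (2)
            upper |= block
    return lower, upper


if __name__ == "__main__":
    # Hypothetical equivalence classes and target set, for illustration only.
    classes = [{"x1", "x2"}, {"x3"}, {"x4", "x5"}]
    X = {"x1", "x2", "x4"}
    lower, upper = approximations(classes, X)
    print(sorted(lower), sorted(upper))
    print("rough" if lower != upper else "crisp")
```

In this hypothetical case the class {x4, x5} overlaps X without being contained in it, so the two approximations differ and X is rough.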
A set $X$ is said to be $(\alpha, \beta)$-crisp if $X_U^{\alpha,\beta} = X_L^{\alpha,\beta}$. Equivalently, a set $X$ is said to be $(\alpha, \beta)$-rough if the boundary $BND_{IR}^{\alpha,\beta} = X_U^{\alpha,\beta} - X_L^{\alpha,\beta}$ is such that $BND_{IR}^{\alpha,\beta} \neq \phi$.

3. Feed-forward Back Propagation Neural Network

An artificial neural network (ANN) is a model inspired by the organisation of the human brain. It is generally presented as a system of interconnected simple processing elements called neurons. The way messages are exchanged between these neurons has moved far from its biological inspiration. The exchange of messages is carried out by every neuron in the network after the input signal is received from the environment. The input signal is processed through hidden neurons and finally sent out as the output signal. Each neuron is connected with at least one other neuron, and each connection has a numeric weight [18,19]. These weights are generally tuned in the training phase, which makes the network adaptive to its input and capable of learning. The learning process is evaluated by a value called the weight coefficient. The set of input neurons is activated by an activation function and its output is passed to the neurons in the next layer. This process is repeated until the desired output neuron is approximated.

The construction of the feed-forward neural network is essential in categorising, establishing and summarising data. The architecture consists of three layers: the input layer, the hidden layer and the output layer. The input layer is the first layer, where the input is fed into the network, whereas the output layer is the last layer, where the desired output is produced. The layer(s) between the input and output layers are called hidden layers. The network resembles the human brain in that each neuron in one layer is connected with all the neurons in the next layer.
The interconnection initiated by the input layer, and the mapping between the input layer and the next layer, is characterised by the weight coefficient. More formally, the input from the $i$th node of the input layer to the $j$th node in the next hidden layer is denoted as $a_i$. The connection from the $i$th node to the $j$th node is characterised by the weight coefficient $w_{ij}$ and the threshold coefficient $v_i$ of the $i$th neuron. Based on all the inputs, each node determines a net input value $y_{in}$ by using Equation (3). The output value $y_{io}$ of the $i$th neuron is determined by Equation (4), where $g(y_{in})$ is the sigmoid function, which acts as the activation function in the back propagation neural network.

$$y_{in} = v_i + \sum w_{ij} a_i \tag{3}$$

$$y_{io} = g(y_{in}) \tag{4}$$

4. Research Design Development

Research design development and problem definition are most significant in applied research. They include collection of data, preparation of data, removal of noise, classification, identification of techniques, validation of the model and, moreover, comparison of the model with existing models. The proposed model consists of two stages. In the initial stage, RSIFAS is used for data classification, whereas in the final stage, the back propagation algorithm of the feed-forward neural network is used for the prediction of unseen associations of attribute values. An abstract view of the proposed research design is depicted in Figure 1.

Before the data are processed at the initial stage, a sequence of cleaning tasks such as noise removal, consistency checking and data completion is carried out to ensure that the data are as precise as possible. The target data are processed using the intuitionistic fuzzy tolerance relation to obtain the almost indiscernibility of data values for each attribute. The classification generated produces the (α, β)-equivalence classes, where α is the degree of belongingness and β is the degree of non-belongingness.
It is obvious that the degree of belongingness must be high and the degree of non-belongingness must be low to obtain an appropriate classification. On setting the belongingness to 1 (100%) and the non-belongingness to 0, the model fails to analyse the information system, as each class will contain exactly one object. This is because the attribute values present in the system are non-qualitative.

Figure 1. The proposed research design.

The empirical study that we consider is related to crop suitability prediction for Vellore district of Tamil Nadu. The information system contains attributes such as soil pH, moisture, organic matter etc. It provides information about various agriculture contingency factors of different places along with the crops that are cultivated in these places. A place may not be rich in all agriculture contingency factors for the production of every type of crop. However, some agriculture contingency factors may have greater importance than others for the production of a particular crop. On varying the values of α and β, the factors may deviate from each other. Indeed, if we decrease the value of α and increase the value of β, a number of the factors progressively become indispensable. The membership and non-membership relations have been designed such that the sum of their values lies in [0, 1] and, additionally, these relations must be symmetric. The first requirement necessitates a factor of 2 in the denominators of the non-membership functions [6,20].
The degree of belongingness ($\mu$) and the degree of non-belongingness ($\nu$) between two objects $x_i$ and $x_j$ are defined in Equations (5) and (6), respectively, where $V_{a_i}^{x_i}$ is the value of the object $x_i$ for the attribute $a_i$ and $\mathrm{Max}(V_{a_i})$ is the maximum value of that attribute.

$$\mu_R(x_i, x_j) = 1 - \frac{|V_{a_i}^{x_i} - V_{a_i}^{x_j}|}{\mathrm{Max}(V_{a_i})} \tag{5}$$

$$\nu_R(x_i, x_j) = \frac{|V_{a_i}^{x_i} - V_{a_i}^{x_j}|}{2 \times \mathrm{Max}(V_{a_i})} \tag{6}$$

With these definitions, $\mu_R(x_i, x_i) = 1$ and $\nu_R(x_i, x_i) = 0$, both relations are symmetric, and $\mu_R(x_i, x_j) + \nu_R(x_i, x_j) \leq 1$, as required of an intuitionistic fuzzy proximity relation.

The reduced qualitative information system is divided into a training data set of 55% and a testing data set of 45%. The training data set is fed into the neural network to predict the decision for new unseen objects. The testing data are used to validate the training phase and to ensure higher accuracy. The article uses the back propagation neural network in the final stage to obtain the decisions. The network consists of three layers, namely the input layer, the hidden layer and the output layer, as shown in Figure 2. The attribute values $a_i$; $1 \leq i \leq m$ of the training data set are fed into the input layer. In the subsequent hidden layer, the actual mapping between the input and output layers is carried out. The number of hidden nodes is generally determined by trial and error on the basis of the mean square error and the mean percentile error. Let the total number of hidden nodes be $h$, and let the hidden nodes be denoted as $z_j$; $1 \leq j \leq h$. The output nodes are denoted as $d_k$; $1 \leq k \leq n$, where $n$ is the total number of objects in the training data set.

Figure 2. Design of back propagation neural network.

The feed-forward back propagation algorithm [21] is basically a gradient descent model in which local minima are sought so that the network output converges to the desired output. To this end, the mean square error between the desired and the actual output is minimised. This learning consists of two computational phases, namely the forward pass and the backward pass. The forward pass is a feed-forward propagation of the inputs through the network. The following notions are used in the back propagation algorithm.
$A = \{a_1, a_2, a_3, \cdots, a_i, \cdots, a_m\}$: input attribute values (training vector), where $m = 15$;
$d = \{d_1, d_2, d_3, \cdots, d_k, \cdots, d_n\}$: observed decisions (target vector);
$T = \{t_1, t_2, t_3, \cdots, t_n\}$: actual decisions;
$z_j$: hidden node, where $1 \leq j \leq h$;
$v$: random weight vector connecting the input and hidden layer;
$w$: random weight vector connecting the hidden and output layer;
$bh_j$: bias on hidden unit $z_j$;
$bo_k$: bias on output unit $d_k$;
$err_k$: error at output node $d_k$;
$err_j$: error at hidden node $z_j$;
$\Delta v$: weight correction term at the input layer, $[\Delta v_{ij}]_{m \times h}$; $1 \leq i \leq m$;
$\Delta w$: weight correction term at the hidden layer, $[\Delta w_{jk}]_{h \times n}$;
$LR$: learning rate;
$E_{max}$: maximum number of epochs required for training;
epoch: one training loop over all the input vectors.

Algorithm 1 (Back Propagation Algorithm)

Input: Training vector 'A', bias on hidden unit 'bh', learning rate 'LR'
Output: The trained data set.

1. Initialise the weight vector of the input layer $v = [v_{ij}]_{m \times h}$ with small random values, typically between −1 and 1; i.e. $-1 \leq v_{ij} \leq 1$.
2. Initialise the weight vector of the hidden layer $w = [w_{jk}]_{h \times n}$ with small random values, not necessarily between −1 and 1.
3. Initialise the mean square error, MSE = 0; epoch = 0; and the learning rate LR.
4. Each input unit receives the input value $a_i$ and transmits this value to all units in the hidden layer.
5. Each hidden unit $z_j$; $1 \leq j \leq h$ computes its net input $z_{in_j}$ as defined below:

$$z_{in_j} = bh_j + \sum_{i=1}^{m} (a_i \times v_{ij})$$

The activation function is applied to every net input $z_{in_j}$, i.e. $z_j = g(z_{in_j})$, and these values are transmitted to all the units in the output layer.
6. Each output unit $d_k$; $k = 1, 2, \cdots, n$ computes its net input $d_{in_k}$ as defined below:

$$d_{in_k} = bo_k + \sum_{j=1}^{h} z_j \times w_{jk}$$

The activation function is applied to every net input $d_{in_k}$, i.e. $d_k = g(d_{in_k})$.
7. For each output unit $d_k$; $k = 1, 2, \cdots, n$, compute the mean square error MSE and the average mean square error (AMSE), given by

$$MSE = MSE + (t_k - d_k)^2; \quad AMSE = \frac{MSE}{n}$$

Increase epoch by 1, i.e. epoch = epoch + 1.
8. If (AMSE ≤ 0.5) or (epoch = $E_{max}$), then stop training; else repeat steps 9–12.
9. Each output unit $d_k$; $k = 1, 2, \cdots, n$ receives a target pattern corresponding to an input pattern. Compute the error term as given below:

$$\delta_k = d_k (1 - d_k)(t_k - d_k)$$

10. Each hidden unit $z_j$; $j = 1, 2, \cdots, h$ computes its error input as defined below:

$$\delta_{in_j} = \sum_{k=1}^{n} \delta_k w_{jk}$$

The error information term is then calculated as

$$\delta_j = \delta_{in_j} z_j (1 - z_j)$$

11. Each output unit $d_k$; $k = 1, 2, \cdots, n$ updates its weights by using the weight correction term $\Delta w_{jk}$:

$$\Delta w_{jk} = LR * \delta_k z_j \quad \text{for } j = 1, 2, \cdots, h$$

The bias correction term $\Delta bo_k$ is given as $\Delta bo_k = LR * \delta_k$. Thus, we have

$$w_{jk}(new) = w_{jk}(old) + \Delta w_{jk} \quad \text{and} \quad bo_k(new) = bo_k(old) + \Delta bo_k$$

12. Each hidden unit $z_j$; $j = 1, 2, \cdots, h$ updates its weights by using the weight correction term $\Delta v_{ij}$:

$$\Delta v_{ij} = LR * \delta_j a_i \quad \text{for } i = 1, 2, \cdots, m$$

The bias correction term $\Delta bh_j$ is given as $\Delta bh_j = LR * \delta_j$. Thus, we have the following equations; then go to step 4 and repeat the process.

$$v_{ij}(new) = v_{ij}(old) + \Delta v_{ij} \quad \text{and} \quad bh_j(new) = bh_j(old) + \Delta bh_j$$

5. An Empirical Study on Crop Suitability Prediction

The major objective of the research model under consideration is to analyse and predict the suitable place for cultivating an agricultural crop so as to yield the maximum benefit from the existing resources over various periods of time. Usually, a layman depends on an agriculture research centre or on advice from agriculture officers to decide which crops to grow on the land. In practice, however, this is a time-consuming process. The proposed model acts as a tool for a layman to identify the crop to be cultivated in a place based on the richness of the various components required by the specific crop.
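The neural network stage of this tool follows the back propagation procedure of Algorithm 1, which can be sketched in plain Python as below. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names, the zero initial biases, the per-sample weight updates, the stopping tolerance `tol` and the averaging of the squared error over all samples and outputs are choices made here for the sketch, and the toy data are hypothetical.

```python
import math
import random


def sigmoid(y):
    """Activation function g(y) used at the hidden and output units."""
    return 1.0 / (1.0 + math.exp(-y))


def forward(a, v, bh, w, bo):
    """Forward pass (steps 4-6): hidden outputs z and network outputs d."""
    h, n = len(bh), len(bo)
    z = [sigmoid(bh[j] + sum(a[i] * v[i][j] for i in range(len(a))))
         for j in range(h)]
    d = [sigmoid(bo[k] + sum(z[j] * w[j][k] for j in range(h)))
         for k in range(n)]
    return z, d


def train(samples, h, lr=0.5, max_epochs=2000, tol=1e-3, seed=0):
    """Train on a list of (input vector, target vector) pairs.

    Assumptions of this sketch: biases start at zero, weights are updated
    after every sample, and training stops when the averaged squared error
    falls to `tol` or `max_epochs` is reached.
    """
    rng = random.Random(seed)
    m, n = len(samples[0][0]), len(samples[0][1])
    v = [[rng.uniform(-1, 1) for _ in range(h)] for _ in range(m)]  # step 1
    w = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(h)]  # step 2
    bh, bo = [0.0] * h, [0.0] * n
    history = []
    for _ in range(max_epochs):
        squared_error = 0.0
        for a, t in samples:
            z, d = forward(a, v, bh, w, bo)
            squared_error += sum((tk - dk) ** 2 for tk, dk in zip(t, d))  # step 7
            # Step 9: error term at each output unit.
            delta_k = [dk * (1 - dk) * (tk - dk) for tk, dk in zip(t, d)]
            # Step 10: error term at each hidden unit (uses the old weights w).
            delta_j = [sum(delta_k[k] * w[j][k] for k in range(n)) * z[j] * (1 - z[j])
                       for j in range(h)]
            # Step 11: update hidden-to-output weights and output biases.
            for j in range(h):
                for k in range(n):
                    w[j][k] += lr * delta_k[k] * z[j]
            for k in range(n):
                bo[k] += lr * delta_k[k]
            # Step 12: update input-to-hidden weights and hidden biases.
            for i in range(m):
                for j in range(h):
                    v[i][j] += lr * delta_j[j] * a[i]
            for j in range(h):
                bh[j] += lr * delta_j[j]
        amse = squared_error / (len(samples) * n)
        history.append(amse)
        if amse <= tol:  # step 8, with an assumed tolerance
            break
    return v, bh, w, bo, history


if __name__ == "__main__":
    # Hypothetical toy data (logical OR), used only to exercise the sketch.
    data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
    v, bh, w, bo, history = train(data, h=3)
    print(len(history), history[0] > history[-1])
```

For the crop data, the input vectors would be the 15 attribute values of each object; in practice they would normally be scaled to a common range before training, which is an additional assumption not stated in the algorithm.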
To illustrate the research model, we considered a real-life problem pertaining to crop cultivation in Vellore district of Tamil Nadu. Historical data from 2011 to 2014 were collected from Krishi Vigyan Kendra of Vellore district. The major resources, such as soil and land classification, are considered on the basis of the survey of the agriculture department of Vellore district, Tamil Nadu. Additionally, the Tamil Nadu state agriculture department has divided Tamil Nadu into seven agro-climatic zones, namely the Cauvery delta zone, north-eastern zone, western zone, north-western zone, high altitude zone, southern zone and high rainfall zone, based on various components such as rainfall, soil, irrigation and other physical and ecological features. Among these, Vellore district is categorised under the north-eastern zone, which receives an average rainfall of 1099.1 mm per year. The index map of the study area as per Krishi Vigyan Kendra is depicted in Figure 3.

Figure 3. Index map of the study area.

Furthermore, Vellore district was divided into nine agricultural divisions in 2011, which are further separated into 20 blocks. According to Krishi Vigyan Kendra, a total of 4799 villages across the 20 blocks were documented as having agriculture as their main occupation. Most of the villages produce major agricultural crops such as paddy, cholam, cumbu, ragi, samai, red gram, black gram etc. Apart from these, some villages produce horticulture crops: fruit crops such as banana, mango, guava and sapota, and vegetable crops such as brinjal, tomato, onion and sweet potato. Some also yield flower crops such as jasmine, chrysanthemum and marigold, and spices such as chillies and turmeric. In this paper, an effort has been made to collect data from some villages whose main occupation is agriculture. The administrative block boundary map of Vellore district in 2009, on which the study is carried out, is shown in Figure 4.

Figure 4. Administrative blocks of the study area.
For better understanding, the agriculture divisions along with their blocks are presented in Table 2.

Table 2. Agricultural divisions in Vellore district

| S. no. | Agriculture division | Blocks |
|---|---|---|
| 1 | Vellore (1) | Vellore, Kaniyambadi, Anaicut |
| 2 | Gudiyatham (2) | Gudiyatham, K.V.Kuppam and Katpadi |
| 3 | Vaniyambadi (3) | Alangayam, Madhanur and Pernambattu |
| 4 | Tirupathur (4) | Tirupathur, Kandhili, Natrampalli and Jolarpet |
| 5 | Walajah (5) | Walajah and Sholingur |
| 6 | Arcot (6) | Arcot and Thimiri |
| 7 | Arakonam (7) | Arakonam, Nemili, Kaveripakkam |
| 8 | Ambur (8) | Madhanur |
| 9 | Katpadi (9) | K.V. Kuppam and Katpadi |

The most common attributes for crop production in Vellore district include soil components, water components, rainfall during the north-east monsoon, rainfall during the south-west monsoon, organic manure, moisture etc. Soil and water components differ from place to place and depend on several factors. So, it is essential to identify the availability of the NPK (Nitrogen, Phosphorus, Potassium) ratio in the soil at an appropriate stage before cultivation. This minimises the use of inorganic chemical fertilisers. These parameters form the attribute set of analysis. The data collected from Krishi Vigyan Kendra and the agriculture department are consolidated and presented in Tables 3 and 4.

Table 3. Notation representation table

| Attributes | Abbreviation | Notation | Possible values | Max value |
|---|---|---|---|---|
| Soil pH | SPH | $a_1$ | [5.4–8.5] | 8.5 |
| Moisture | MOI | $a_2$ | [5–12.2] | 12.2 |
| Organic matter | OM | $a_3$ | [0.65–1.98] | 1.98 |
| Nitrogen | N | $a_4$ | [200–800] | 800 |
| Phosphorous | P | $a_5$ | [40–533] | 533 |
| Potassium | K | $a_6$ | [115–1045] | 1045 |
| Copper | Cu | $a_7$ | [0.05–2] | 2 |
| Zinc | Zn | $a_8$ | [0.01–2] | 2 |
| Manganese | Mn | $a_9$ | [0.7–4.6] | 4.6 |
| Iron | Fe | $a_{10}$ | [1.98–99.6] | 99.6 |
| Water pH | WPH | $a_{11}$ | [6.2–8.5] | 8.5 |
| Calcium | Ca | $a_{12}$ | [11–420] | 420 |
| Nitrate | NO$_3$ | $a_{13}$ | [16–140] | 140 |
| Magnesium | Mg | $a_{14}$ | [21–280] | 280 |
| Rainfall | R R | $a_{15}$ | [773.4–1111.2] | 1111.2 |
| Places | PL | $d$ | – | |
Table 3 lists the notation, possible values and maximum value of each attribute, whereas Table 4 shows the consolidated sample data considered in our study.

The information system presented in Table 4 provides information about 20 crops that are cultivated in various blocks of the agriculture divisions of Vellore district. The information system contains essential attributes such as soil pH, moisture, organic matter etc., whereas the objects are the crops. The decision attribute is the agricultural division where the particular crop is essentially cultivated to obtain the maximum yield. The main objective of this study is to help farmers identify the crops suitable for their land. The maximum yield rate depends on various components such as soil, water, rainfall etc., and land and water are the crucial natural resources. Additionally, a cultivation land may not be rich in all the parameters needed for the highest productivity. These factors are, however, almost indiscernible and hence can be classified by using the intuitionistic fuzzy proximity relation.

Table 4. Sample agriculture information system

| Obj. | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ | $a_9$ | $a_{10}$ | $a_{11}$ | $a_{12}$ | $a_{13}$ | $a_{14}$ | $a_{15}$ | Places |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $x_1$ | 7.3 | 9 | 0.96 | 350 | 200 | 160 | 1.2 | 1.8 | 3.2 | 61.2 | 7.3 | 20 | 16 | 226.14 | 787.9 | 3 |
| $x_2$ | 7.2 | 11.7 | 0.99 | 450 | 130 | 115 | 1.2 | 1.09 | 1.088 | 75 | 8.5 | 21 | 17 | 25 | 1045.4 | 1 |
| $x_3$ | 7.21 | 11.5 | 0.91 | 360 | 200 | 645 | 0.5 | 1.5 | 0.8 | 69 | 7.1 | 72.3 | 45 | 77.3 | 1111.2 | 7 |
| $x_4$ | 7.35 | 9.5 | 0.78 | 432 | 40 | 150 | 1.6 | 3.3 | 1.7 | 61 | 7.36 | 23 | 63 | 280 | 1052.2 | 6 |
| $x_5$ | 7.5 | 7 | 0.78 | 200 | 44 | 162.86 | 0.05 | 1.2 | 2.49 | 57 | 6.3 | 39 | 78 | 259 | 995 | 2 |
| $x_6$ | 5.4 | 6.1 | 1.23 | 560 | 476 | 486 | 0.5 | 3.5 | 4.6 | 47 | 6.2 | 40.8 | 84 | 166 | 890 | 4 |
| $x_7$ | 7.47 | 8 | 1.32 | 475 | 120 | 310 | 0.45 | 1.1 | 1.2 | 1.98 | 7.43 | 11 | 78 | 26 | 999.3 | 2 |
| $x_8$ | 6.2 | 6.7 | 0.98 | 345 | 527 | 1045 | 0.9 | 4.7 | 2.7 | 2.2 | 6.35 | 80 | 56 | 250 | 894.3 | 4 |
| $x_9$ | 6.3 | 7 | 1.2 | 401 | 222 | 672 | 0.05 | 0.01 | 0.7 | 8.4 | 6.35 | 53.8 | 45 | 21 | 1037.5 | 1 |
| $x_{10}$ | 7.1 | 5 | 1.32 | 400 | 160 | 160 | 1.9 | 2.1 | 2.4 | 51.1 | 8.3 | 148 | 25 | 176.63 | 1004.4 | 2 |
| $x_{11}$ | 7.45 | 8 | 1.67 | 540 | 242 | 370 | 1.5 | 5 | 3.3 | 8.8 | 7.41 | 420 | 56 | 110 | 998.7 | 7 |
| $x_{12}$ | 7.2 | 11.9 | 1.53 | 200 | 160 | 220 | 1.8 | 1.2 | 3.4 | 61.4 | 7.4 | 20 | 140 | 211.55 | 773.4 | 3 |
| $x_{13}$ | 8.5 | 10 | 1.52 | 800 | 190 | 340 | 1.2 | 2.4 | 1.8 | 45 | 8.31 | 70 | 130 | 120 | 999 | 4 |
| $x_{14}$ | 7.32 | 12.1 | 1.98 | 645 | 140 | 120 | 1.4 | 2.2 | 3.5 | 64 | 7.42 | 12 | 18 | 27 | 1008.6 | 7 |
| $x_{15}$ | 7.4 | 9 | 1.32 | 450 | 160 | 325 | 1.6 | 1.6 | 1.8 | 62.1 | 7.2 | 45 | 41 | 23 | 885.2 | 3 |
| $x_{16}$ | 8.47 | 8 | 0.65 | 340 | 533 | 477 | 0.51 | 2.5 | 1.68 | 7.57 | 7 | 138 | 45 | 71 | 891.2 | 5 |
| $x_{17}$ | 7.1 | 11.8 | 0.98 | 340 | 349 | 476 | 0.5 | 3.6 | 0.8 | 4.5 | 7.2 | 128 | 23 | 211 | 1012.6 | 6 |
| $x_{18}$ | 5.5 | 10 | 0.88 | 650 | 170 | 150 | 1.1 | 1.9 | 4.6 | 51.5 | 7.28 | 60 | 126 | 130 | 880.5 | 4 |
| $x_{19}$ | 7.2 | 8 | 0.92 | 460 | 120 | 140 | 1.1 | 1.2 | 3.2 | 60.2 | 8.11 | 118 | 24 | 69.5 | 1032.2 | 1 |
| $x_{20}$ | 7.21 | 12.2 | 1.68 | 340 | 480 | 240 | 2 | 3.8 | 4.2 | 99.6 | 7.21 | 51 | 57 | 206.4 | 1008.1 | 5 |

5.1. Initial Stage of an Empirical Study

This section demonstrates the proposed model on data collected from Krishi Vigyan Kendra for extracting information. The collected data contain 26 attributes, to which the core and the reduct are applied for attribute reduction in order to maintain consistency. The reduced data set is then processed with the intuitionistic fuzzy proximity relation. To provide a clear understanding, we considered the sample data set presented in Table 4 and employed the intuitionistic fuzzy proximity relation. At the same time, rough set helps to eliminate the parameters that are superfluous in an information system. The computations are carried out by using Equations (5) and (6) [22].
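The computation for a single attribute can be sketched in Python as follows, using the soil pH values ($a_1$) of the first 11 objects of Table 4 and the maximum value 8.5 from Table 3. This is a minimal sketch, not the authors' code: the function names are hypothetical, the non-membership is taken as the absolute difference divided by twice the maximum (the form that yields zero non-membership of an object with itself, as the proximity conditions require), the thresholds 0.95 and 0.3 are illustrative, and the grouping by transitive similarity is implemented here with a simple union-find structure.

```python
# Soil pH (a1) of objects x1..x11, copied from Table 4; maximum value from Table 3.
SOIL_PH = {"x1": 7.3, "x2": 7.2, "x3": 7.21, "x4": 7.35, "x5": 7.5, "x6": 5.4,
           "x7": 7.47, "x8": 6.2, "x9": 6.3, "x10": 7.1, "x11": 7.45}
MAX_PH = 8.5


def membership(vi, vj, vmax):
    """Degree of belongingness between two attribute values."""
    return 1.0 - abs(vi - vj) / vmax


def non_membership(vi, vj, vmax):
    """Degree of non-belongingness (assumed form: |vi - vj| / (2 * vmax))."""
    return abs(vi - vj) / (2.0 * vmax)


def equivalence_classes(values, vmax, alpha, beta):
    """Group objects that are transitively (alpha, beta)-similar."""
    parent = {x: x for x in values}

    def find(x):
        # Follow parent links to the representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    names = list(values)
    for i, xi in enumerate(names):
        for xj in names[i + 1:]:
            mu = membership(values[xi], values[xj], vmax)
            nu = non_membership(values[xi], values[xj], vmax)
            if mu >= alpha and nu <= beta:  # the pair lies in the (alpha, beta)-cut
                parent[find(xi)] = find(xj)

    groups = {}
    for x in names:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


if __name__ == "__main__":
    print(round(membership(7.3, 7.2, MAX_PH), 2),
          round(non_membership(7.3, 7.2, MAX_PH), 2))
    for block in equivalence_classes(SOIL_PH, MAX_PH, alpha=0.95, beta=0.3):
        print(sorted(block))
```

Merging directly similar pairs with union-find yields the classes of the transitive closure without building the full relation matrix, which is convenient when the step is repeated for each of the 15 attributes.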
The results are presented in Table 5 for the attribute $a_1$ (soil pH) and in Table 6 for the attribute $a_2$ (moisture), for a random selection of 55% of the objects (11 objects) of Table 4. The process is repeated for all 15 attributes present in the considered information system. Let $IR_i$, $i = 1, 2, 3, \ldots, 15$, be the intuitionistic fuzzy proximity relation corresponding to the attribute $a_i$, $i = 1, 2, 3, \ldots, 15$. To limit the length of the paper, the computation of the intuitionistic fuzzy proximity relation for the other attributes is omitted.

Table 5. Intuitionistic fuzzy tolerance relation for the attribute $a_1$ (each cell lists $\mu, \nu$)

| $IR^{(\alpha,\beta)}$ | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ | $x_7$ | $x_8$ | $x_9$ | $x_{10}$ | $x_{11}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $x_1$ | 1, 0 | 0.99, 0.01 | 0.99, 0.00 | 0.99, 0.00 | 0.98, 0.01 | 0.79, 0.11 | 0.98, 0.01 | 0.88, 0.06 | 0.89, 0.06 | 0.98, 0.01 | 0.98, 0.01 |
| $x_2$ | 0.99, 0.01 | 1, 0 | 1.00, 0.00 | 0.98, 0.01 | 0.97, 0.02 | 0.80, 0.10 | 0.97, 0.02 | 0.89, 0.06 | 0.90, 0.05 | 0.99, 0.01 | 0.97, 0.01 |
| $x_3$ | 0.99, 0.00 | 1.00, 0.00 | 1, 0 | 0.98, 0.01 | 0.97, 0.02 | 0.80, 0.10 | 0.97, 0.01 | 0.89, 0.06 | 0.90, 0.05 | 0.99, 0.01 | 0.97, 0.01 |
| $x_4$ | 0.99, 0.00 | 0.98, 0.01 | 0.98, 0.01 | 1, 0 | 0.98, 0.01 | 0.78, 0.11 | 0.99, 0.01 | 0.87, 0.06 | 0.88, 0.06 | 0.97, 0.01 | 0.99, 0.01 |
| $x_5$ | 0.98, 0.01 | 0.97, 0.02 | 0.97, 0.02 | 0.98, 0.01 | 1, 0 | 0.77, 0.12 | 1.00, 0.00 | 0.86, 0.07 | 0.87, 0.07 | 0.96, 0.02 | 0.99, 0.00 |
| $x_6$ | 0.79, 0.11 | 0.80, 0.10 | 0.80, 0.10 | 0.78, 0.11 | 0.77, 0.12 | 1, 0 | 0.77, 0.12 | 0.91, 0.04 | 0.90, 0.05 | 0.81, 0.09 | 0.77, 0.10 |
| $x_7$ | 0.98, 0.01 | 0.97, 0.02 | 0.97, 0.01 | 0.99, 0.01 | 1.00, 0.00 | 0.77, 0.12 | 1, 0 | 0.86, 0.07 | 0.87, 0.07 | 0.96, 0.02 | 1.00, 0.00 |
| $x_8$ | 0.88, 0.06 | 0.89, 0.06 | 0.89, 0.06 | 0.87, 0.06 | 0.86, 0.07 | 0.91, 0.04 | 0.86, 0.07 | 1, 0 | 0.99, 0.01 | 0.90, 0.05 | 0.86, 0.07 |
| $x_9$ | 0.89, 0.06 | 0.90, 0.05 | 0.90, 0.05 | 0.88, 0.06 | 0.87, 0.07 | 0.90, 0.05 | 0.87, 0.07 | 0.99, 0.01 | 1, 0 | 0.91, 0.04 | 0.87, 0.06 |
| $x_{10}$ | 0.98, 0.01 | 0.99, 0.01 | 0.99, 0.01 | 0.97, 0.01 | 0.96, 0.02 | 0.81, 0.09 | 0.96, 0.02 | 0.90, 0.05 | 0.91, 0.04 | 1, 0 | 0.96, 0.02 |
| $x_{11}$ | 0.98, 0.01 | 0.97, 0.01 | 0.97, 0.01 | 0.99, 0.01 | 0.99, 0.00 | 0.77, 0.11 | 1.00, 0.00 | 0.86, 0.06 | 0.87, 0.06 | 0.96, 0.02 | 1, 0 |

Table 6. Intuitionistic fuzzy proximity relation for the attribute $a_2$ (each cell lists $\mu, \nu$)

| $IR^{(\alpha,\beta)}$ | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ | $x_7$ | $x_8$ | $x_9$ | $x_{10}$ | $x_{11}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $x_1$ | 1, 0 | 0.78, 0.11 | 0.80, 0.10 | 0.96, 0.02 | 0.84, 0.08 | 0.76, 0.12 | 0.92, 0.04 | 0.81, 0.09 | 0.84, 0.08 | 0.67, 0.16 | 0.92, 0.04 |
| $x_2$ | 0.78, 0.11 | 1, 0 | 0.98, 0.01 | 0.82, 0.09 | 0.61, 0.19 | 0.54, 0.23 | 0.70, 0.15 | 0.59, 0.20 | 0.61, 0.19 | 0.45, 0.27 | 0.70, 0.15 |
| $x_3$ | 0.80, 0.10 | 0.98, 0.01 | 1, 0 | 0.84, 0.08 | 0.63, 0.18 | 0.56, 0.22 | 0.71, 0.14 | 0.61, 0.20 | 0.63, 0.18 | 0.47, 0.27 | 0.71, 0.14 |
| $x_4$ | 0.96, 0.02 | 0.82, 0.09 | 0.84, 0.08 | 1, 0 | 0.80, 0.10 | 0.72, 0.14 | 0.88, 0.06 | 0.77, 0.11 | 0.80, 0.10 | 0.63, 0.18 | 0.88, 0.06 |
| $x_5$ | 0.84, 0.08 | 0.61, 0.19 | 0.63, 0.18 | 0.80, 0.10 | 1, 0 | 0.93, 0.04 | 0.92, 0.04 | 0.98, 0.01 | 1.00, 0.00 | 0.84, 0.08 | 0.92, 0.04 |
| $x_6$ | 0.76, 0.12 | 0.54, 0.23 | 0.56, 0.22 | 0.72, 0.14 | 0.93, 0.04 | 1, 0 | 0.84, 0.08 | 0.95, 0.02 | 0.93, 0.04 | 0.91, 0.05 | 0.84, 0.08 |
| $x_7$ | 0.92, 0.04 | 0.70, 0.15 | 0.71, 0.14 | 0.88, 0.06 | 0.92, 0.01 | 0.84, 0.08 | 1, 0 | 0.89, 0.05 | 0.92, 0.04 | 0.75, 0.12 | 1.00, 0.00 |
| $x_8$ | 0.81, 0.09 | 0.59, 0.20 | 0.61, 0.20 | 0.77, 0.11 | 0.98, 0.01 | 0.95, 0.27 | 0.89, 0.05 | 1, 0 | 0.98, 0.01 | 0.86, 0.07 | 0.89, 0.05 |
| $x_9$ | 0.84, 0.08 | 0.61, 0.19 | 0.63, 0.18 | 0.80, 0.10 | 1.00, 0.00 | 0.93, 0.04 | 0.92, 0.04 | 0.98, 0.01 | 1, 0 | 0.84, 0.08 | 0.92, 0.04 |
| $x_{10}$ | 0.67, 0.16 | 0.45, 0.27 | 0.47, 0.27 | 0.63, 0.18 | 0.84, 0.08 | 0.91, 0.05 | 0.75, 0.12 | 0.86, 0.07 | 0.84, 0.08 | 1, 0 | 0.75, 0.12 |
| $x_{11}$ | 0.92, 0.04 | 0.70, 0.15 | 0.71, 0.14 | 0.88, 0.06 | 0.92, 0.04 | 0.84, 0.08 | 1.00, 0.00 | 0.89, 0.05 | 0.92, 0.04 | 0.75, 0.12 | 1, 0 |

On considering the degree of membership and non-membership values as $\alpha \le 0.95$, $\beta \le 0.3$, it can be seen from Table 5 that $\mu(x_1, x_1) = 1.00$, $\nu(x_1, x_1) = 0$; $\mu(x_1, x_2) = 0.99$, $\nu(x_1, x_2) = 0.01$; $\mu(x_2, x_3) = 1.00$, $\nu(x_2, x_3) = 0$; $\mu(x_3, x_4) = 0.98$, $\nu(x_3, x_4) = 0.01$; $\mu(x_4, x_5) = 0.98$, $\nu(x_4, x_5) = 0.01$; $\mu(x_5, x_7) = 1.00$, $\nu(x_5, x_7) = 0$; $\mu(x_7, x_{10}) = 0.96$, $\nu(x_7, x_{10}) = 0.02$; $\mu(x_{10}, x_{11}) = 0.96$, $\nu(x_{10}, x_{11}) = 0.02$; $\mu(x_{11}, x_1) = 0.98$ and $\nu(x_{11}, x_1) = 0.01$. Therefore, the objects $x_1, x_2, x_3, x_4, x_5, x_7, x_{10}, x_{11}$ are $(\alpha, \beta)$-indiscernible. Also, the object $x_6$ is not $(\alpha, \beta)$-indiscernible with any other object. Thus, the almost equivalence classes generated are

$$U/R^{(\alpha,\beta)} = \{\{x_1, x_2, x_3, x_4, x_5, x_7, x_{10}, x_{11}\}, \{x_6\}, \{x_8, x_9\}\}$$

In the same way, the computation is carried out for the 20 crops (objects), and the almost equivalence classes obtained for the attributes $a_i$, $i = 1, 2, 3, \ldots, 15$, are given below. It is seen that the attribute values of soil pH ($a_1$) are classified into four categories, namely very high, high, moderate and low. Likewise, the values of the other attributes are also classified.
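The grouping above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: Equations (5) and (6) are not reproduced in this part of the paper, so the function `proximity` below uses an assumed form, $\mu = 1 - |a - b| / \text{max}$ and $\nu = |a - b| / (2\,\text{max})$, chosen only because it reproduces entries of Table 5 such as $(0.79, 0.11)$ for $(x_1, x_6)$. The maximum range of 9 for soil pH is likewise an assumption, since Table 3 is not shown here. The pH values are those of the first 11 objects of Table 4.

```python
# Soil pH (attribute a1) of the first 11 objects of Table 4.
PH = {1: 7.3, 2: 7.2, 3: 7.21, 4: 7.35, 5: 7.5, 6: 5.4,
      7: 7.47, 8: 6.2, 9: 6.3, 10: 7.1, 11: 7.45}
MAX_RANGE = 9.0  # assumed maximum range for soil pH (Table 3 is not shown here)


def proximity(a, b, max_range=MAX_RANGE):
    """Assumed form of the relation: mu = 1 - |a-b|/max, nu = |a-b|/(2*max)."""
    d = abs(a - b)
    return 1.0 - d / max_range, d / (2.0 * max_range)


def almost_equivalence_classes(values, alpha, beta):
    """Connected components of the graph whose edges are (alpha, beta)-similar pairs."""
    unseen = set(values)
    classes = []
    while unseen:
        start = min(unseen)
        unseen.remove(start)
        stack, component = [start], {start}
        while stack:
            i = stack.pop()
            for j in list(unseen):
                m, n = proximity(values[i], values[j])
                if m >= alpha and n <= beta:  # pair lies in the (alpha, beta)-cut
                    unseen.remove(j)
                    component.add(j)
                    stack.append(j)
        classes.append(sorted(component))
    return classes


mu_16, nu_16 = proximity(PH[1], PH[6])
print(round(mu_16, 2), round(nu_16, 2))
classes = almost_equivalence_classes(PH, alpha=0.95, beta=0.3)
print(classes)
```

Under these assumptions the components come out as the three classes stated in the text, with $x_6$ isolated and $x_8$, $x_9$ paired. Chaining through intermediate objects is what places $x_5$ and $x_{10}$ in the same class even though their direct membership value is only 0.96.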
$$U/IR_1^{(\alpha,\beta)} = \{\{x_1, x_2, x_3, x_4, x_5, x_7, x_{10}, x_{11}, x_{12}, x_{14}, x_{15}, x_{17}, x_{19}, x_{20}\}, \{x_8, x_9\}, \{x_{13}, x_{16}\}, \{x_6, x_{18}\}\}$$

$$U/IR_2^{(\alpha,\beta)} = \{\{x_1, x_4, x_{13}, x_{15}, x_{18}\}, \{x_5, x_6, x_8, x_9\}, \{x_2, x_3, x_{12}, x_{14}, x_{17}, x_{20}\}, \{x_7, x_{11}, x_{16}, x_{19}\}, \{x_{10}\}\}$$

$$U/IR_3^{(\alpha,\beta)} = \{\{x_1, x_2, x_3, x_4, x_5, x_8, x_{17}, x_{18}, x_{19}\}, \{x_6, x_7, x_9, x_{10}, x_{15}\}, \{x_{11}, x_{20}\}, \{x_{12}, x_{13}\}, \{x_{14}\}, \{x_{16}\}\}$$

$$U/IR_4^{(\alpha,\beta)} = \{\{x_1, x_2, x_3, x_4, x_7, x_8, x_9, x_{10}, x_{15}, x_{16}, x_{17}, x_{19}, x_{20}\}, \{x_5, x_{12}\}, \{x_{14}, x_{18}\}, \{x_6, x_{11}\}, \{x_{13}\}\}$$

$$U/IR_5^{(\alpha,\beta)} = \{\{x_1, x_2, x_3, x_7, x_9, x_{10}, x_{11}, x_{12}, x_{13}, x_{14}, x_{15}, x_{17}, x_{18}, x_{19}\}, \{x_6, x_{20}\}, \{x_8, x_{16}\}, \{x_4, x_5\}\}$$

$$U/IR_6^{(\alpha,\beta)} = \{\{x_1, x_2, x_4, x_5, x_{10}, x_{12}, x_{14}, x_{18}, x_{19}, x_{20}\}, \{x_6, x_{16}, x_{17}\}, \{x_3, x_9\}, \{x_7, x_{11}, x_{13}, x_{15}\}, \{x_8\}\}$$

$$U/IR_7^{(\alpha,\beta)} = \{\{x_1, x_2, x_{13}, x_{18}, x_{19}\}, \{x_3, x_6, x_7, x_{16}, x_{17}\}, \{x_4, x_5, x_9, x_{11}, x_{14}, x_{15}\}, \{x_{10}, x_{12}, x_{20}\}, \{x_8\}\}$$

$$U/IR_8^{(\alpha,\beta)} = \{\{x_1, x_3, x_{10}, x_{13}, x_{14}, x_{15}, x_{16}, x_{18}\}, \{x_4, x_6, x_{17}, x_{20}\}, \{x_2, x_5, x_7, x_{12}, x_{19}\}, \{x_9\}, \{x_{11}\}, \{x_8\}\}$$

$$U/IR_9^{(\alpha,\beta)} = \{\{x_1, x_{11}, x_{12}, x_{14}, x_{19}\}, \{x_2, x_7\}, \{x_3, x_9, x_{17}\}, \{x_4, x_{13}, x_{15}, x_{16}\}, \{x_5, x_8, x_{10}\}, \{x_6, x_{18}\}, \{x_{20}\}\}$$

$$U/IR_{10}^{(\alpha,\beta)} = \{\{x_1, x_3, x_4, x_5, x_{12}, x_{14}, x_{15}, x_{19}\}, \{x_2\}, \{x_6, x_{10}, x_{13}, x_{18}\}, \{x_7, x_8, x_9, x_{11}, x_{17}, x_{16}\}, \{x_{20}\}\}$$

$$U/IR_{11}^{(\alpha,\beta)} = \{\{x_1, x_3, x_4, x_7, x_{11}, x_{12}, x_{14}, x_{15}, x_{16}, x_{17}, x_{18}, x_{20}\}, \{x_2, x_{10}, x_{13}, x_{19}\}, \{x_5, x_6, x_8, x_9\}\}$$

$$U/IR_{12}^{(\alpha,\beta)} = \{\{x_1, x_2, x_4, x_5, x_6, x_7, x_{12}, x_{14}, x_{15}\}, \{x_3, x_8, x_9, x_{13}, x_{18}, x_{20}\}, \{x_{10}, x_{16}, x_{17}, x_{19}\}, \{x_{20}\}, \{x_{11}\}\}$$

$$U/IR_{13}^{(\alpha,\beta)} = \{\{x_1, x_2, x_{10}, x_{14}, x_{17}, x_{19}\}, \{x_3, x_9, x_{15}, x_{16}\}, \{x_4, x_8, x_{11}, x_{20}\}, \{x_5, x_6, x_7\}, \{x_{12}, x_{13}, x_{18}\}\}$$

$$U/IR_{14}^{(\alpha,\beta)} = \{\{x_1, x_{12}, x_{17}, x_{20}\}, \{x_2, x_7, x_9, x_{14}, x_{15}\}, \{x_3, x_{16}, x_{19}\}, \{x_4\}, \{x_5, x_8\}, \{x_6, x_{10}\}, \{x_{11}, x_{13}, x_{18}\}\}$$

$$U/IR_{15}^{(\alpha,\beta)} = \{\{x_1, x_{12}\}, \{x_2, x_3, x_4, x_5, x_7, x_9, x_{10}, x_{11}, x_{13}, x_{14}, x_{17}, x_{19}, x_{20}\}, \{x_6, x_8, x_{16}, x_{15}, x_{18}\}\}$$

Unlike the attribute $a_1$, the attribute $a_2$ is categorised into five categories, namely very high, high, moderate, low and very low. Similarly, the attributes $a_3, a_4, a_5, a_6, a_7, a_8, a_9, a_{10}, a_{11}, a_{12}, a_{13}, a_{14}, a_{15}$ are categorised into 6, 5, 4, 5, 5, 6, 7, 6, 3, 4, 6, 7 and 3 categories, respectively. The maximum number of categories is observed to be 7. Let the categories be very high (Vh), high (H), moderate (M), low (L), very low (Vl), poor (P) and negligible (Ne). This condenses the quantitative information system into a qualitative information system, as shown in Table 7.

5.2. Final Stage of Empirical Study

The steps involved in the final process of the empirical study are discussed in this section. Predicting the places for cultivating agricultural crops on real data sets is the main objective of this article. We used the back propagation feed-forward neural network (BPNN) method for this investigation. The method is based on minimising the mean square error (MSE) and the mean percentile error (MPE). The back propagation algorithm as discussed in Section 5.4 is used to train the data set. Based on the input attribute values, $y_{in}$ and $y_{out}$ are computed as discussed in Equations (3) and (4), respectively.

Back propagation neural network is a supervised learning technique, so the training process can be terminated by declaring certain conditions. The process terminates if the network has attained an average mean square error (MSE) ≤ 0.5 or has reached the predefined number of epochs.
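The stopping rule just described can be made concrete with a small sketch. The code below is an illustrative single-hidden-layer network in plain Python, not the authors' MATLAB model: the sigmoid activation, the weight initialisation, the per-sample update and the XOR toy data are all assumptions made for the example, and Equations (3) and (4) are not reproduced here. Only the defaults (17 hidden neurons, learning rate 0.5, 300 epochs, MSE threshold 0.5) echo values quoted in this section.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, n_hidden=17, lr=0.5, max_epochs=300, target_mse=0.5, seed=0):
    """Train a one-hidden-layer network with per-sample back propagation.

    Stops when the epoch-average MSE is <= target_mse or after max_epochs.
    Returns (epochs_run, final_mse).
    """
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # w_h[j] holds the input weights of hidden unit j plus a trailing bias.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    mse = float("inf")
    for epoch in range(1, max_epochs + 1):
        total = 0.0
        for x, t in samples:
            # Forward pass.
            h = [sigmoid(sum(w * xi for w, xi in zip(ws[:-1], x)) + ws[-1]) for ws in w_h]
            y = sigmoid(sum(w * hj for w, hj in zip(w_o[:-1], h)) + w_o[-1])
            err = t - y
            total += err * err
            # Backward pass (delta rule for sigmoid units), using the old w_o.
            d_o = err * y * (1.0 - y)
            d_h = [d_o * w_o[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for j in range(n_hidden):
                w_o[j] += lr * d_o * h[j]
            w_o[-1] += lr * d_o
            for j in range(n_hidden):
                for i in range(n_in):
                    w_h[j][i] += lr * d_h[j] * x[i]
                w_h[j][-1] += lr * d_h[j]
        mse = total / len(samples)
        if mse <= target_mse:  # first termination condition
            break
    return epoch, mse          # otherwise the epoch limit ended the loop


# Toy data (XOR), used only to exercise the loop; not the crop data.
xor = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
epochs, mse = train(xor)
print(epochs, round(mse, 3))
```

Passing `target_mse=0.0` makes the epoch limit the active stopping condition, which is a convenient way to compare both branches of the rule. The threshold of 0.5 is meaningful only relative to the scale of the network output, which in this sketch is assumed to lie in [0, 1].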
Generally, the number of neurons in the hidden layer is identified on a trial-and-error basis, using MSE and MPE to get better performance. The weight coefficients are recorded so as to identify the effect of the number of hidden neurons required to map the input space to the output space. The recorded results show that the best result is obtained with 17 hidden neurons in a single-hidden-layer architecture. With 17 neurons and a learning rate of 0.5, the MSE obtained is 0.188 with 300 epochs. It is also observed that on increasing the number of hidden neurons to more than 200 and the number of hidden layers to ≥ 2, the combinations could not achieve an MSE ≤ 0.188. So, the analysis is restricted to 17 hidden neurons with a single hidden layer.

Table 7. Qualitative information system of sample dataset

| Obj. | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ | $a_9$ | $a_{10}$ | $a_{11}$ | $a_{12}$ | $a_{13}$ | $a_{14}$ | $a_{15}$ | Places |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $x_1$ | H | H | Vl | L | M | Vl | M | L | M | M | M | Vl | P | M | L | Alangayam |
| $x_2$ | H | Vh | Vl | L | M | Vl | M | Vl | Vl | H | H | Vl | P | Ne | H | Anaicut |
| $x_3$ | H | Vh | Vl | L | M | H | Vl | L | Ne | M | M | L | Vl | P | H | Arakonam |
| $x_4$ | H | H | Vl | L | L | Vl | H | M | P | M | M | Vl | L | Vh | H | Arcot |
| $x_5$ | H | L | Vl | Vl | L | Vl | H | Vl | L | M | L | Vl | M | H | H | Gudiyatham |
| $x_6$ | L | L | L | M | H | M | Vl | M | Vh | L | L | Vl | M | L | M | Jolarpet |
| $x_7$ | H | M | L | L | M | L | Vl | Vl | Vl | Vl | M | Vl | M | Ne | H | K V Kuppam |
| $x_8$ | M | L | Vl | L | Vh | Vh | L | H | L | Vl | L | L | L | H | M | Kandeli |
| $x_9$ | M | L | L | L | M | H | H | P | Ne | Vl | L | L | Vl | Ne | H | Kaniyambadi |
| $x_{10}$ | H | Vl | L | L | M | Vl | Vh | L | L | L | H | H | P | L | H | Katpadi |
| $x_{11}$ | H | M | H | M | M | L | H | Vh | M | M | M | Vh | L | Vl | H | Kaveripakkam |
| $x_{12}$ | H | Vh | M | Vl | M | Vl | Vh | Vl | M | M | M | Vl | Vh | M | L | Madhanur |
| $x_{13}$ | Vh | H | M | Vh | M | L | M | L | P | L | H | L | H | Vl | H | Natrampalli |
| $x_{14}$ | H | Vh | Vh | H | M | Vl | H | L | M | M | M | Vl | P | Ne | H | Nemili |
| $x_{15}$ | H | H | L | L | M | L | H | L | P | M | M | Vl | Vl | Ne | M | Pemambattu |
| $x_{16}$ | Vh | M | P | L | Vh | M | Vl | L | P | Vl | M | H | Vl | P | M | Sholingur |
| $x_{17}$ | H | Vh | Vl | L | M | M | Vl | M | Ne | Vl | M | H | P | M | H | Thimiri |
| $x_{18}$ | L | H | Vl | H | M | Vl | M | L | Vh | L | M | L | H | Vl | M | Tirupathur |
| $x_{19}$ | H | M | Vl | L | M | Vl | M | Vl | M | M | H | H | P | P | H | Vellore |
| $x_{20}$ | H | Vh | H | L | H | Vl | Vh | M | H | Vh | M | L | L | M | H | Walajahpet |

Figure 5. Number of hidden nodes using MSE.
The results of MSE and MPE against the number of neurons are depicted in Figures 5 and 6, respectively.

Figure 6. Number of hidden nodes using MPE.

The trained model is then tested with the remaining nine objects $x_{12}, x_{13}, x_{14}, x_{15}, x_{16}, x_{17}, x_{18}, x_{19}, x_{20}$ of the qualitative information system presented in Table 7. The validation process is presented in Table 8. From Table 8, it is clear that all objects are correctly classified. Thus, the accuracy of the training process is computed as

$$\text{Accuracy} = \frac{\text{Supporting objects}}{\text{Total number of objects}} = \frac{9}{9} = 100\%$$

In the experimental study, however, an average classification accuracy of 93.7% is achieved on increasing the number of objects to 2193. The validation process, along with an experimental comparative study, is carried out in Section 6 to check its viability.

Table 8. Validating the training data

| Obj. | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ | $a_9$ | $a_{10}$ | $a_{11}$ | $a_{12}$ | $a_{13}$ | $a_{14}$ | $a_{15}$ | Recorded places | Observed places |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $x_{12}$ | H | Vh | M | Vl | M | Vl | Vh | Vl | M | M | M | Vl | Vh | M | L | 8 | 8 |
| $x_{13}$ | Vh | H | M | Vh | M | L | M | L | P | L | H | L | H | Vl | H | 4 | 4 |
| $x_{14}$ | H | Vh | Vh | H | M | Vl | H | L | M | M | M | Vl | P | Ne | H | 7 | 7 |
| $x_{15}$ | H | H | L | L | M | L | H | L | P | M | M | Vl | Vl | Ne | M | 3 | 3 |
| $x_{16}$ | Vh | M | P | L | Vh | M | Vl | L | P | Vl | M | H | Vl | P | M | 5 | 5 |
| $x_{17}$ | H | Vh | Vl | L | M | M | Vl | M | Ne | Vl | M | H | P | M | H | 6 | 6 |
| $x_{18}$ | L | H | Vl | H | M | Vl | M | L | Vh | L | M | L | H | Vl | M | 4 | 4 |
| $x_{19}$ | H | M | Vl | L | M | Vl | M | Vl | M | M | H | H | P | P | H | 1 | 1 |
| $x_{20}$ | H | Vh | H | L | H | Vl | Vh | M | H | Vh | M | L | L | M | H | 5 | 5 |

6. Comparative Analysis and Results

Experimental analysis has been carried out to assess the efficiency of the proposed model, RSIFASANN. The experiments were conducted on a computer with an Intel Pentium processor, 8 GB RAM, the Windows 10 operating system and MATLAB R2008a. For analysis, data were collected from Krishi Vigyan Kendra (KVK), Vellore, India. Data for 4799 villages were collected. After careful observation, it was identified that 2193 villages have agricultural crop production as their main occupation.
The intuitionistic fuzzy proximity relation is employed on the whole data set to obtain the almost equivalence classes. This phase changes the quantitative information system into a qualitative information system. Further, the qualitative data set of 2193 objects is validated with the trained model. Additionally, we have chosen for comparison a model which integrates Bayesian classification and RSFAS (BCRSFAS) [12]. The proposed model is also compared with the previous work of hybridising RSFAS with a neural network (RSFASANN). We randomly selected 220 objects and predicted the decision using BCRSFAS and the proposed model RSIFASANN. The number of objects is then increased randomly in steps of 220, and the classification accuracy of the models is checked at each step. The process is repeated until the whole data set of 2193 objects is covered. The results obtained are presented in Table 9. The average accuracy obtained by the proposed model RSIFASANN is 93.7%. On average, the accuracy of RSIFASANN is higher than that of RSFASANN, and the accuracy of RSFASANN is higher than that of BCRSFAS.

Table 9. Comparative analysis and results.

| Objects | Supporting objects: RSIFASANN | Supporting objects: RSFASANN | Supporting objects: BCRSFAS | Accuracy: RSIFASANN | Accuracy: RSFASANN | Accuracy: BCRSFAS |
|---|---|---|---|---|---|---|
| 220 | 203 | 198 | 184 | 0.923 | 0.900 | 0.836 |
| 440 | 408 | 399 | 370 | 0.927 | 0.907 | 0.841 |
| 660 | 616 | 611 | 560 | 0.933 | 0.926 | 0.848 |
| 880 | 823 | 825 | 750 | 0.935 | 0.938 | 0.852 |
| 1100 | 1031 | 1030 | 935 | 0.937 | 0.936 | 0.850 |
| 1320 | 1236 | 1240 | 1140 | 0.936 | 0.939 | 0.864 |
| 1540 | 1443 | 1443 | 1369 | 0.937 | 0.937 | 0.889 |
| 1760 | 1656 | 1645 | 1578 | 0.941 | 0.935 | 0.897 |
| 1980 | 1875 | 1874 | 1779 | 0.947 | 0.946 | 0.898 |
| 2193 | 2090 | 2076 | 1975 | 0.953 | 0.947 | 0.901 |
| Average accuracy | | | | 0.937 | 0.931 | 0.868 |

Figure 7. Experimental comparative graph.

The comparative graph is depicted in Figure 7 for better visualisation. From the above analysis, it is clear that the average classification accuracy of RSIFASANN is higher than that of the other two models, and hence it can be considered a better model.

6.1. N-fold Cross-validation

Generally, a classifier is induced from the training data using a learning algorithm. It is a known fact that every classifier is associated with some prediction error. This prediction error is unknown and difficult to calculate. At the same time, it is essential to estimate the error from the data while analysing the data in the training phase. The error estimated from the data considered is called the estimated prediction error, and it is to be validated by means of its variance and bias.

In the proposed technique, the data set is divided into training data (55%) and testing data (45%). The back propagation algorithm is used as the classifier, and the estimated prediction error is calculated from the mean square error and the mean percentile error by training the model with various learning rates. The MSE obtained is 0.188 on training the model with one hidden layer. Even though the model was also tested with more than one hidden layer, the results are convincing enough to retain a single hidden layer. Thus, out of the 2193 objects, 1203 were used to train the model with the back propagation algorithm and the remaining 990 were tested with the least mean square error. Further, the validation is performed using N-fold cross-validation and the results are presented as follows.

Figure 8. Mean square error of Fold 1 for N = 10.

Figure 9. Overall mean square error over all folds for N = 10.

In N-fold cross-validation, the data set is divided into N folds, a classifier is learned using (N − 1) folds, and an error value is calculated by testing the classifier on the remaining fold. Finally, the N-CV estimate of the error is the average of the errors committed in each fold. Thus, the N-CV error estimator depends on two factors: the training set and the partition into folds. The experimental analysis is performed using the R language.
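The N-fold procedure described above can be sketched as follows. This is an illustrative Python outline rather than the R code used in the study: the contiguous, unshuffled split, the stand-in learner that always predicts the training mean, and the synthetic labels are all assumptions made to keep the example self-contained. The names `fold_indices`, `cross_validate`, `fit_mean` and `predict_mean` are hypothetical.

```python
def fold_indices(n_items, n_folds):
    """Split range(n_items) into n_folds contiguous folds of near-equal size."""
    base, extra = divmod(n_items, n_folds)
    folds, start = [], 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(xs, ys, n_folds, fit, predict):
    """Return (per-fold MSE list, their average) for an N-fold split."""
    fold_errors = []
    for test_idx in fold_indices(len(xs), n_folds):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        # Learn on the other N - 1 folds, then test on the held-out fold.
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        sq = [(predict(model, xs[i]) - ys[i]) ** 2 for i in test_idx]
        fold_errors.append(sum(sq) / len(sq))
    return fold_errors, sum(fold_errors) / len(fold_errors)


# Stand-in learner (an assumption for illustration): always predict the training mean.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


xs = list(range(990))
ys = [float(i % 8 + 1) for i in xs]  # synthetic labels, not the KVK data
errors, overall = cross_validate(xs, ys, 10, fit_mean, predict_mean)
print(len(errors), len(fold_indices(990, 10)[0]), round(overall, 3))
```

With 990 objects and N = 10 every fold holds 99 objects, and with N = 15 every fold holds 66. In practice the objects would normally be shuffled before splitting, and the mean predictor would be replaced by the trained network.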
The data set contains 15 conditional attributes and one predictive attribute. The data set is divided into various numbers of folds, such as N = 10, 15, 20 and 25, and the MSE is recorded for each number of folds. A sample of the results computed using the R language for N = 10 is given in Figure 8, and the overall MSE is recorded in Figure 9. The mean square error obtained for fold 1 is 2.6, whereas the overall mean square error obtained is 2.44. The mean square error and the overall mean square error for varying N are presented in Table 10. It is seen from Table 10 that the average MSE obtained is greater than the average MSE obtained using the neural network. Thus, we can say that the validation carried out by hybridising rough computing with a neural network provides better accuracy in prediction.

Table 10. Overall mean square error across various folds

| Number of folds (N) | Overall MSE | Observations in test set |
|---|---|---|
| 10 | 2.44 | 99 |
| 15 | 2.43 | 66 |
| 20 | 2.44 | 49 |
| 25 | 2.44 | 39 |
| 30 | 2.43 | 33 |
| Average MSE | 0.2436 | |

7. Conclusion

In this paper, we hybridised RSIFAS with a neural network for the prediction of unseen associations of attribute values. The initial process of the proposed model reduces the quantitative information system to a qualitative information system using RSIFAS. The final process predicts the decision for unseen associations using a back propagation neural network. The model is analysed over 20 blocks of Vellore district, Tamil Nadu. The experimental analysis shows that the proposed model attained an average classification accuracy of 93.7%, whereas that of BCRSFAS is 86.8%. The classification accuracy of the proposed model is thus 6.9 percentage points higher than that of BCRSFAS. Additionally, it helps farmers decide on the crops to be cultivated on their land.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes on contributors

Dr. A. Anitha is an Associate Professor in the School of Information Technology at VIT, Vellore, India. She received the MCA degree from Adhi Parasakthi College of Science, Kalavai, Tamil Nadu, India. She has published many international journal and conference papers. Her research interests include data mining, fuzzy logic, neural networks and rough sets. She is associated with the professional body CSI.

Dr. D. P. Acharjya is a Professor in the School of Computing Sciences and Engineering at VIT, Vellore, India. He received his MSc from NIT, Rourkela, India; M. Tech. in Computer Science from Utkal University, India; and PhD in Computer Science from Berhampur University, India. He has been awarded the Gold Medal in M. Sc.; Eminent Academician Award; Outstanding Educator and Scholar Award; The Best Citizens of India Award; and Bharat Vikas Award from various organizations of India. He has authored 84 international and national journal and conference papers. Besides, he has published 4 books and 17 book chapters with international publishers. In addition, he has edited 7 books with international publishers like CRC Press; Springer; and IGI Global, USA. His research interests include rough sets, knowledge representation, machine learning, bio-inspired computing, and business intelligence. He is associated with many professional bodies, such as ACM, IACSIT, IAENG, CSTA, IRSS, CSI, ISTE, OITS, ISIAM, IMS, and AMTI.

References

[1] Zadeh LA. Fuzzy sets. Inf Control. 1965;8:338–353.
[2] Pawlak Z. Rough sets. Int J Comp Inform Sci. 1982;11:341–356.
[3] Pawlak Z. Rough sets – theoretical aspects of reasoning about data. Dordrecht: Kluwer Academic Publishers; 1991.
[4] De SK. Some aspects of fuzzy sets, rough sets and intuitionistic fuzzy sets [PhD Thesis]. Kharagpur: IIT, India; 1999.
[5] Acharjya DP, Tripathy BK. Rough sets on fuzzy approximation spaces and applications to distributed knowledge systems. Int J Artif Intell Soft Comput Inderscience. 2008;1(1):1–14.
[6] Acharjya DP, Tripathy BK. Rough sets on intuitionistic fuzzy approximation spaces and knowledge representation. Int J Artif Int Comput Res. 2009;1(1):29–36.
[7] Molodtsov D. Soft set theory-first results. Comp Math Appl. 1999;37(4/5):19–31.
[8] Peters J. Near sets-general theory about nearness of objects. Appl Math Sci. 2007;1:2609–2629.
[9] Dubois D, Prade H. Rough fuzzy sets and fuzzy rough set. Int J Gen Syst. 1990;17(2/3):191–209.
[10] Liu G. Rough set theory based on two universal sets and its applications. Knowl Base Syst. 2010;23:110–115.
[11] Smarandache F. Neutrosophic set – a generalization of the intuitionistic fuzzy set. Int J Pure Appl Math. 2005;24:287–297.
[12] Acharjya DP, Roy D, Rahaman AM. Prediction of missing associations using rough computing and Bayesian classification. Int J Intell Syst Appls. 2012;4(11):1–13.
[13] Das TK, Acharjya DP. A decision making model using soft set and rough set on fuzzy approximation spaces. Int J Intel Syst Technol Applic. 2014;13(3):170–186.
[14] Anitha A, Acharjya DP. Neural network and rough set hybrid scheme prediction of missing associations. Int J Bioinform Res Appl. 2015;11(6):503–524.
[15] Ahn BS, Cho SS, Kim CY. The integrated methodology of rough set theory and artificial neural network for business failure prediction. Expert Syst Appl. 2000;18(2):65–74.
[16] Rao DVJ, Mitra P. A rough association rule based approach for class prediction with missing attribute values. Proceedings of the 2nd Indian International Conference on Artificial Intelligence; 2005. 20–22.
[17] Atanassov KT. Intuitionistic fuzzy sets. Fuzzy Sets Syst. 1986;20:87–96.
[18] Rumelhart DE, McClelland JL. Parallel distributed processing: explorations in the microstructure of cognition. Foundations. Cambridge: MIT Press; 1986.
[19] Lippmann RP. An introduction to computing with neural nets. IEEE ASSP Mag. 1987;4(1):4–22.
[20] Acharjya DP. Knowledge extraction from information system using rough computing. In: M Usman, editor. Improving knowledge discovery through the integration of data mining techniques. IGI Global, Pennsylvania, USA, 2015, p. 161–182.
[21] Hecht-Nielsen R. Theory of the backpropagation neural network. Proceedings of the International Joint Conference on Neural Networks, 1 (1989), 593–605.
[22] Tripathy BK, Acharjya DP. Knowledge mining using ordering rules and rough sets on fuzzy approximation spaces. Int J Adv Sci Techn. 2010;1(3):41–50.

Agriculture Crop Suitability Prediction Using Rough Set on Intuitionistic Fuzzy Approximation Space and Neural Network

Fuzzy Information and Engineering , Volume 11 (1): 22 – Jan 2, 2019

Publisher: Taylor & Francis
Copyright: © 2021 The Author(s). Published by Taylor & Francis Group on behalf of the Fuzzy Information and Engineering Branch of the Operations Research Society.
ISSN: 1616-8666
eISSN: 1616-8658
DOI: 10.1080/16168658.2021.1886813

Abstract

FUZZY INFORMATION AND ENGINEERING
2019, VOL. 11, NO. 1, 64–85
https://doi.org/10.1080/16168658.2021.1886813

Agriculture Crop Suitability Prediction Using Rough Set on Intuitionistic Fuzzy Approximation Space and Neural Network

A. Anitha$^{a}$ and D. P. Acharjya$^{b}$

$^{a}$School of Information Technology and Engineering, VIT Vellore, India; $^{b}$School of Computer Science and Engineering, VIT Vellore, India

ABSTRACT

Agriculture plays a vital role in the Indian economy. Considering the overall geographical space versus population in India, 7% of the population is chronicled in Tamilnadu, with 3% of the water and 4% of the land resources. Thus an automated prediction system becomes essential for predicting the crop based on the nutritional security of the country. In this paper, effort has been made to process the uncertainties by hybridizing rough set on intuitionistic fuzzy approximation space (RSIFAS) [Acharjya DP, Tripathy BK. Rough sets on intuitionistic fuzzy approximation spaces and knowledge representation. Int J Artif Int Comput Res. 2009;1 (1):29–36.] and neural network [Hecht NR. Theory of the backpropagation neural network. Proceedings of the international Joint Conference on neural networks, 1 (1989), 593–605.]. RSIFAS identifies the almost indiscernibility among the natural resources and helps in reducing the computational procedure by employing data reduction techniques, whereas the neural network helps in the prediction process. It helps to find the crops that may be cultivated based on the available natural resources. The proposed model is analyzed on data accumulated from Vellore district of Tamilnadu, India and achieved 93.7% average classification accuracy. The model is compared with earlier models and found to give 6.9% better prediction accuracy.

ARTICLE HISTORY

Received 13 October 2016; Revised 25 March 2017; Accepted 10 October 2018

KEYWORDS

Almost indiscernible; intuitionistic fuzzy proximity relation; neural network; knowledge discovery; prediction; rough set

CONTACT D. P. Acharjya dpacharjya@gmail.com

© 2021 The Author(s). Published by Taylor & Francis Group on behalf of the Fuzzy Information and Engineering Branch of the Operations Research Society, Guangdong Province Operations Research Society of China. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

In India, agriculture is the principal means of livelihood for over 58.4% of the population. In addition, agricultural merchandise is considered the main commodity for international trading. To sustain the growth of the Indian economy, there is a need for drastic growth in agricultural productivity. For agriculture, land and water are the main resources, and both are inadequate in nature. Consequently, it is necessary to devise a lucrative cropping system with the accessible resources and to increase productivity. Since market competition is high, premeditated planning is mandatory to improve performance and accomplish a profitable yield in the cropping system. Even careful planning in the development and production of the cropping system may be set back by uncertainty in forecasting the harvest and the demand for the crop. Therefore, the information needed for future planning can be obtained from a prediction model. A prediction model developed with prior knowledge gives more accuracy in real-life situations. Thus, the proposed prediction model is based on the soil and water resources available in a region and forecasts the production of agricultural crops with reduced risk of loss. Due to the lack of natural and human resources, many farmers resign themselves to converting agricultural land into marketable land.
This attitude has to be changed so as to retain farmers, and especially to encourage the young generation to take up agriculture as their main occupation, and the income from farm holdings should be increased significantly.

The area of study of this paper is restricted to Vellore district in Tamil Nadu, where agriculture is the main vocation. The small and marginal farmers in this region play a key role in the overall improvement of agriculture towards the development of the Indian economy. Thus, the adoption of an appropriate cropping system by these farmers needs to be the focus. The Indian government has taken some measures to improve crop production, such as training the farmers and relaxing the seed cost and loan amount. To tackle the increasing competition, it becomes more essential to develop a crop suitability information system to improve productivity and profit for the farmers. To develop such an effective system, data collected from various sources such as soil, water, seedling methodologies and meteorological conditions must be analysed properly instead of being saved as archives.

Analysing data and discovering knowledge is a challenging and increasingly important task, as the data contains uncertainties. Additionally, the data is not always useful to users, as it may not satisfy the user's choice due to the presence of redundancy, inconsistency and vagueness. Many traditional tools used for discovering knowledge are deterministic, crisp and precise. Thus, it is essential to use intelligent techniques to process the uncertainties present in the data. The emergence of intelligent computing techniques like fuzzy set [1], rough set [2,3], rough set on fuzzy approximation space (RSFAS) [4,5], rough set on intuitionistic fuzzy approximation space (RSIFAS) [6], soft set [7], near set [8], fuzzy rough set [9], rough set on two universal sets [10], neutrosophic set [11] etc. plays a vital role in knowledge discovery.
Further, RSFAS is hybridised with Bayesian classification, soft set and neural network [12,13,14,15,16] in the development of prediction systems.

In this paper, effort has been taken to predict decisions from uncertain and imprecise data by means of RSIFAS and a neural network. The concept of RSIFAS is based on the almost indiscernibility present in the data set. The objects in the information system are approximated by a pair of sets, called the lower and upper approximations, based on the intuitionistic fuzzy proximity relation. The motivation behind the utilisation of RSIFAS is to obtain $(\alpha, \beta)$-equivalence classes where the attribute values are not qualitative. Further, the classified information system is trained and tested with a back propagation neural network, which helps to explore decisions for unknown associations of the attribute values. This helps us to predict a specific crop that is to be cultivated in a specific area on considering various conditions such as soil, water characteristics and rainfall.

The remaining part of the paper is organised as follows. Section 2 presents the basics of RSIFAS, whereas Section 3 discusses the basics of the feed-forward back propagation neural network. The proposed research design is presented in Section 4. Section 5 analyses the performance of the trained data against the testing data according to known feature values. An experimental comparative study of the proposed model with various existing techniques is given in Section 6. The paper is concluded in Section 7.

2. An Information System

Procuring knowledge for classification is one of the most essential aims of data mining and inductive learning. In real-life problems, however, simple classification is not enough, as the data contains uncertainties. To deal with such problems, classification using RSIFAS was introduced. Before we discuss the classification power of RSIFAS, one should know about an information system. An information system is a table that offers a suitable way to describe a finite set of objects of the universe by a finite set of attributes, thereby representing all available information and knowledge. From the view of rough set theory, it is common to define the information system as a data set represented as a table in which every column head represents an attribute that can be measured for each object.

More formally, an information system is a quadruple $IS = (U, A, V, f)$, where $U = \{x_1, x_2, \cdots, x_n\}$ is a non-empty finite set of objects called the universe and $A = \{a_1, a_2, \cdots, a_m\}$ is a non-empty finite set of attributes, $V = \cup_{a \in A} V_a$, where $V_a$ is the set of values that attribute $a \in A$ may take. The mapping $f_a : (U \times A) \to V_a$ provides the information about each object. Further, if $A = (C \cup D)$, where $C$ is the set of conditional attributes and $D$ is the decision attribute, we call the information system a decision system.

For example, consider the information system shown in Table 1, where the attribute values are quantitative rather than qualitative. It is clear that the attribute values are almost identical rather than exactly matching each other. To deal with such almost similarity, the concept of RSIFAS is introduced.

Table 1. Information system.

| Objects | Max temp ($a_1$) | Min temp ($a_2$) | Avg. wind speed ($a_3$) | Avg. relative humidity ($a_4$) | Avg. evaporation rate ($a_5$) |
|---|---|---|---|---|---|
| $x_1$ | 36.6 | 20.9 | 8.6 | 73 | 4.4 |
| $x_2$ | 36.9 | 23.1 | 7.4 | 72 | 2.8 |
| $x_3$ | 43.7 | 24.8 | 6.2 | 70 | 2.6 |
| $x_4$ | 46.9 | 27.4 | 3.1 | 67 | 3.4 |
| $x_5$ | 46.1 | 27.2 | 7.4 | 62 | 5.1 |
| $x_6$ | 45.4 | 26.4 | 8.9 | 56 | 4.2 |

2.1. Foundations of Rough Set on Intuitionistic Fuzzy Approximation Space

Pawlak’s rough set [2] is used to identify the indiscernibility between the attribute values with the help of an equivalence relation.
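To make the role of the equivalence relation concrete, the short Python sketch below stores Table 1 as an information system and partitions the objects by exact equality on a single attribute. It is an illustration only: the attribute labels and the helper `indiscernibility_classes` are hypothetical names introduced here, not part of the paper.

```python
from collections import defaultdict

# Table 1 as an information system IS = (U, A, V, f); the attribute names are
# shorthand labels chosen here for illustration.
ATTRIBUTES = ["max_temp", "min_temp", "wind_speed", "humidity", "evaporation"]
TABLE_1 = {
    "x1": [36.6, 20.9, 8.6, 73, 4.4],
    "x2": [36.9, 23.1, 7.4, 72, 2.8],
    "x3": [43.7, 24.8, 6.2, 70, 2.6],
    "x4": [46.9, 27.4, 3.1, 67, 3.4],
    "x5": [46.1, 27.2, 7.4, 62, 5.1],
    "x6": [45.4, 26.4, 8.9, 56, 4.2],
}


def f(obj, attr):
    """Information function: the value of attribute attr for object obj."""
    return TABLE_1[obj][ATTRIBUTES.index(attr)]


def indiscernibility_classes(attr):
    """Partition U by exact equality of values on one attribute."""
    groups = defaultdict(list)
    for obj in TABLE_1:
        groups[f(obj, attr)].append(obj)
    return sorted(groups.values())


wind_classes = indiscernibility_classes("wind_speed")
temp_classes = indiscernibility_classes("max_temp")
print(wind_classes)
print(temp_classes)
```

On average wind speed only $x_2$ and $x_5$ share a value (7.4), so they form the single non-trivial class. On maximum temperature every object ends up alone, even though 36.6 and 36.9 are nearly the same, which is the limitation that the almost-indiscernibility of RSIFAS is meant to address.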
But, in several real-life applications, it is observed that the values of the attributes are not exactly the same but almost the same. To decide the amount of identity between two attribute values, the equivalence relation is replaced with a fuzzy tolerance relation on each domain of attributes [4]. This, again, fails to include the hesitation that may arise during the knowledge extraction phase. Therefore, the fuzzy tolerance relation is further replaced with an intuitionistic fuzzy tolerance relation, and the concept of RSIFAS was introduced [6]. For example, if at a particular time the maximum temperatures at two different places are 36.6°C and 36.9°C, then the temperatures at these places are approximately identical rather than completely identical. RSIFAS reduces to RSFAS if there is no hesitation. Similarly, RSIFAS reduces to rough set if there is no hesitation and the attribute values are exactly the same. Therefore, RSIFAS generalises Pawlak's approach of indiscernibility. To keep the article self-contained, the basic notions and concepts of RSIFAS are briefly presented in this section.

Let $U$ $(\neq \phi)$ be a non-empty finite set called the universe of discourse, and let $x$ be a particular element of $U$. An intuitionistic fuzzy set $X$ of $U$ is defined as $\{\langle x, \mu_X(x), \nu_X(x) \rangle \mid x \in U\}$, where $\mu_X : U \to [0, 1]$ and $\nu_X : U \to [0, 1]$ define the degree of membership and the degree of non-membership, respectively, for every element $x \in U$ such that $0 \le \mu_X(x) + \nu_X(x) \le 1$. The value $\pi_X(x) = 1 - (\mu_X(x) + \nu_X(x))$ is called the hesitation part, which may contribute to the membership value, the non-membership value or both. For simplicity, we will use $(\mu_X(x), \nu_X(x))$ to denote the intuitionistic fuzzy set $X$ [17].
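As a small illustration of the definition above, the following sketch stores a membership and non-membership pair, checks the constraint $0 \le \mu + \nu \le 1$, and derives the hesitation part. The class name `IFValue` and the three sample pairs are hypothetical and chosen only for illustration; they are not taken from the data of this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IFValue:
    """Membership / non-membership pair of one element (hypothetical helper)."""
    mu: float
    nu: float

    def __post_init__(self):
        # Both degrees must lie in [0, 1] and must not sum to more than 1.
        if not (0.0 <= self.mu <= 1.0 and 0.0 <= self.nu <= 1.0):
            raise ValueError("mu and nu must lie in [0, 1]")
        if self.mu + self.nu > 1.0 + 1e-12:
            raise ValueError("mu + nu must not exceed 1")

    @property
    def hesitation(self) -> float:
        # pi(x) = 1 - (mu(x) + nu(x))
        return 1.0 - (self.mu + self.nu)


# Illustrative values only.
sample = {"x1": IFValue(0.7, 0.2), "x2": IFValue(0.5, 0.5), "x3": IFValue(0.4, 0.3)}
for name, value in sample.items():
    print(name, round(value.hesitation, 2))
```

When the hesitation is zero for every element, as for `x2` here, the pair carries no more information than an ordinary fuzzy membership degree.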
An intuitionistic fuzzy relation $IR$ on $U$ is an intuitionistic fuzzy set defined on $(U \times U)$, characterised by the membership $\mu_{IR}$ and the non-membership $\nu_{IR}$, where

$$IR = \{(\mu_{IR}(x_i, x_j), \nu_{IR}(x_i, x_j)) \mid x_i, x_j \in U\}$$

An intuitionistic fuzzy relation $IR$ on $U$ is said to be an intuitionistic fuzzy (IF) proximity relation if it satisfies the following conditions, where $\mu_{IR}(x_i, x_j)$ represents the degree of membership and $\nu_{IR}(x_i, x_j)$ represents the degree of non-membership between two objects $x_i$ and $x_j$.

(1) $\mu_{IR}(x_i, x_i) = 1$ and $\nu_{IR}(x_i, x_i) = 0$ for all $x_i \in U$.
(2) $\mu_{IR}(x_i, x_j) = \mu_{IR}(x_j, x_i)$ and $\nu_{IR}(x_i, x_j) = \nu_{IR}(x_j, x_i)$ for all $x_i, x_j \in U$.

Let $J = \{(\alpha, \beta) \mid \alpha, \beta \in [0, 1] \text{ and } 0 \le \alpha + \beta \le 1\}$. Then for any $(\alpha, \beta) \in J$, the $(\alpha, \beta)$-cut is given as $IR_{\alpha,\beta} = \{(x_i, x_j) \mid \mu_{IR}(x_i, x_j) \ge \alpha \text{ and } \nu_{IR}(x_i, x_j) \le \beta\}$. We say that two objects $x_i$ and $x_j$ are $(\alpha, \beta)$-similar with respect to $IR$ if $(x_i, x_j) \in IR_{\alpha,\beta}$, and we write $x_i \, IR_{(\alpha,\beta)} \, x_j$. Two objects $x_i$ and $x_j$ are said to be $(\alpha, \beta)$-identical with respect to $IR$ if there exists a sequence of elements $u_1, u_2, \cdots, u_n$ in $U$ such that $x_i \, IR_{(\alpha,\beta)} \, u_1$, $u_1 \, IR_{(\alpha,\beta)} \, u_2$, $\cdots$, $u_n \, IR_{(\alpha,\beta)} \, x_j$. In this case, we say that $x_i$ is transitively $(\alpha, \beta)$-similar to $x_j$ with respect to $IR$. It is clearly seen that for any $(\alpha, \beta) \in J$, the relation of being $(\alpha, \beta)$-identical, denoted $IR(\alpha, \beta)$, is an equivalence relation on $U$. Let $IR^{*}_{\alpha,\beta}$ denote the set of equivalence classes generated by the equivalence relation $IR(\alpha, \beta)$. The equivalence class of an element $x$ in $U$ is denoted as $[x]_{(\alpha,\beta)}$. The pair $K = (U, IR(\alpha, \beta))$ is called an intuitionistic fuzzy approximation space [6].

Let $X \subseteq U$. Then the $(\alpha, \beta)$-lower and $(\alpha, \beta)$-upper approximations of $X$ in the generalised approximation space $K = (U, IR(\alpha, \beta))$ are denoted as $(X_L^{\alpha,\beta}, X_U^{\alpha,\beta})$, where

$$X_L^{\alpha,\beta} = \cup \{Y \mid Y \in IR^{*}_{\alpha,\beta} \text{ and } Y \subseteq X\} \tag{1}$$

$$X_U^{\alpha,\beta} = \cup \{Y \mid Y \in IR^{*}_{\alpha,\beta} \text{ and } Y \cap X \neq \phi\} \tag{2}$$

A given set $X$ is said to be $(\alpha, \beta)$-rough if and only if $X_U^{\alpha,\beta} \neq X_L^{\alpha,\beta}$.
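The definitions of this section can be sketched in a few lines of code. The sketch below assumes the relation is already available as two symmetric lookup tables `mu` and `nu`; it groups objects by the transitive closure of the $(\alpha, \beta)$-cut with a union-find structure and then forms the lower and upper approximations of Equations (1) and (2). The function names and the four-object relation are hypothetical, chosen only to show a pair that is transitively similar without being directly similar.

```python
from itertools import combinations


def equivalence_classes(objects, mu, nu, alpha, beta):
    """Classes of the transitive closure of the (alpha, beta)-cut (union-find)."""
    parent = {x: x for x in objects}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(objects, 2):
        if mu[a][b] >= alpha and nu[a][b] <= beta:  # (alpha, beta)-similar pair
            parent[find(a)] = find(b)

    classes = {}
    for x in objects:
        classes.setdefault(find(x), set()).add(x)
    return list(classes.values())


def approximations(classes, target):
    """Lower and upper approximation of `target`, as in Equations (1) and (2)."""
    lower = set().union(*(c for c in classes if c <= target))
    upper = set().union(*(c for c in classes if c & target))
    return lower, upper


def symmetric(objects, pairs, diagonal):
    """Expand {(a, b): value} into a full symmetric table with a fixed diagonal."""
    table = {a: {a: diagonal} for a in objects}
    for (a, b), value in pairs.items():
        table[a][b] = table[b][a] = value
    return table


# Hypothetical relation values, chosen only to illustrate the definitions.
objects = ["x1", "x2", "x3", "x4"]
mu = symmetric(objects, {("x1", "x2"): 0.97, ("x2", "x3"): 0.96, ("x1", "x3"): 0.90,
                         ("x1", "x4"): 0.60, ("x2", "x4"): 0.62, ("x3", "x4"): 0.65}, 1.0)
nu = symmetric(objects, {("x1", "x2"): 0.02, ("x2", "x3"): 0.02, ("x1", "x3"): 0.05,
                         ("x1", "x4"): 0.20, ("x2", "x4"): 0.19, ("x3", "x4"): 0.18}, 0.0)

classes = equivalence_classes(objects, mu, nu, alpha=0.95, beta=0.05)
lower, upper = approximations(classes, {"x1", "x4"})
boundary = upper - lower
print(classes, lower, upper, boundary)
```

Here `x1` and `x3` are not directly similar, yet they fall into one class through `x2`. The target set `{x1, x4}` has a non-empty boundary, so it is rough in this toy space.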
Likewise, a given set $X$ is said to be $(\alpha, \beta)$-crisp if $X_U^{\alpha,\beta} = X_L^{\alpha,\beta}$. Equivalently, a set $X$ is said to be $(\alpha, \beta)$-rough if the boundary $BND_{IR}^{\alpha,\beta} = X_U^{\alpha,\beta} - X_L^{\alpha,\beta}$ is such that $BND_{IR}^{\alpha,\beta} \neq \phi$.

3. Feed-forward Back Propagation Neural Network

An artificial neural network (ANN) is a model inspired by the organisation of the human brain. It is generally presented as a system of interconnected simple processing elements called neurons. The way messages are exchanged between neurons has moved far from the biological inspiration. The exchange of messages is carried out by every neuron in the network after the input signal is received from the environment. The input signal is processed through hidden neurons and finally sent as the output signal. Each neuron is connected with at least one other neuron, and each connection has a numeric weight [18,19]. These weights are generally tuned in the training phase. This makes the network adaptive to input and capable of learning. The learning process is evaluated by a value called the weight coefficient. The set of input neurons is activated by an activation function and is passed to the neurons in the next layer. This process is repeated until the desired output neuron is approximated.

The construction of the feed-forward neural network is essential in categorising, establishing and summarising data. The architecture consists of three layers: the input layer, the hidden layer and the output layer. The input layer is the first layer, where the input is fed into the network, whereas the output layer is the last layer, where the desired output is produced. The layer(s) between the input and output layers are called hidden layers. The network is constructed like the human brain in that each neuron in one layer is connected with all the neurons in the next layer.
The interconnections leaving the input layer, and the mapping between the input layer and the next layer, are characterised by the weight coefficients. More formally, the input from the $i$th node of the input layer to the $j$th node in the next hidden layer is denoted as $a_i$. The connection from the $i$th node to the $j$th node is characterised by the weight coefficient $w_{ij}$ and the threshold coefficient $v_i$ of the $i$th neuron. Based on all the inputs, each node determines a net input value $y_{in}$ by using Equation (3). The output value $y_{io}$ of the $i$th neuron is determined by Equation (4), where $g(y_{in})$ is the sigmoid function, which acts as the activation function in the back propagation neural network.

$$y_{in} = v_i + \sum w_{ij} a_i \tag{3}$$

$$y_{io} = g(y_{in}) \tag{4}$$

4. Research Design Development

Research design development and problem definition are the most significant steps in applied research. They include collection of data, preparation of data, removal of noise, classification, identification of techniques, validation of the model and, moreover, comparison of the model with the existing models. The proposed model consists of two stages. In the initial stage, RSIFAS is used for data classification, whereas in the final stage, the back propagation algorithm of the feed-forward neural network is used for the prediction of unseen associations of attribute values. An abstract view of the proposed research design is depicted in Figure 1.

Before the data are processed at the initial stage, a sequence of cleaning tasks such as noise removal, consistency checking and data completion is carried out to ascertain that the data are as precise as possible. The target data are processed using the intuitionistic fuzzy tolerance relation to obtain the almost indiscernibility of data values for each attribute. The classification generated produces the $(\alpha, \beta)$-equivalence classes, where $\alpha$ is the degree of belongingness and $\beta$ the degree of non-belongingness, respectively.
It is obvious that the degree of belongingness must be high and the degree of non-belongingness must be low to get an appropriate classification. On setting the belongingness to 1 (100%) and the non-belongingness to 0, the model fails to analyse the information system, as each class will contain exactly one object. This is because the attribute values present in the system are non-qualitative.

Figure 1. The proposed research design.

The empirical study that we consider is related to crop suitability prediction for Vellore district of Tamil Nadu. The information system contains attributes such as soil pH, moisture, organic matter etc. It provides information about various agriculture contingency factors of different places along with the crops that are cultivated in these places. A place may not be rich in all agriculture contingency factors for the production of every type of crop. However, some agriculture contingency factors may have greater importance for the production of a particular crop than the others. On varying the values of $\alpha$ and $\beta$, the factors may deviate from each other. Indeed, if we decrement the value of $\alpha$ and increment the value of $\beta$, the number of indispensable factors changes progressively. The membership and non-membership relations have been designed such that the sum of their values lies in $[0, 1]$ and, additionally, these relations must be symmetric. The first requirement necessitates a factor of 2 in the denominators of the non-membership functions [6,20].
The degree of belongingness ($\mu$) and the degree of non-belongingness ($\nu$) between two objects $x_i$ and $x_j$ are defined in Equations (5) and (6), respectively, where $V_a^{x_i}$ is the value of the object $x_i$ for the attribute $a$.

$$\mu_R(x_i, x_j) = 1 - \frac{|V_a^{x_i} - V_a^{x_j}|}{Max(V_a)} \tag{5}$$

$$\nu_R(x_i, x_j) = \frac{|V_a^{x_i} - V_a^{x_j}|}{2 \times Max(V_a)} \tag{6}$$

With these definitions, $\mu_R(x_i, x_i) = 1$, $\nu_R(x_i, x_i) = 0$ and $\mu_R + \nu_R \le 1$, as required of an intuitionistic fuzzy proximity relation.

The reduced qualitative information system is divided into a training data set of 55% and a testing data set of 45%. The training data set is fed into the neural network to predict the decision for new unseen objects. The testing data are used to validate the training phase and to ensure higher accuracy. The article uses a back propagation neural network in the final stage to obtain the decisions. The process consists of three layers, namely the input layer, the hidden layer and the output layer, as shown in Figure 2. The attribute values $a_i$; $1 \le i \le m$ of the training data set are fed into the input layer. In the subsequent hidden layer, the actual mapping between the input and output layers is carried out. The number of hidden nodes is generally determined by trial and error, based on the mean square error and the mean percentile error. Let us assume the total number of hidden nodes to be $h$. Let us denote a hidden node as $z_j$; $1 \le j \le h$. The output nodes are denoted as $d_k$; $1 \le k \le n$, where $n$ is the total number of objects in the training data set.

Figure 2. Design of back propagation neural network.

The feed-forward back propagation algorithm [21] is basically a gradient descent model in which local minima are identified so that the network output converges to the desired output. To this end, the mean square error between the desired and the actual output is minimised. This learning consists of two computational phases, the forward pass and the backward pass. The forward pass is a feed-forward propagation of the inputs through the network. The following notations are used in the back propagation algorithm.
$A = \{a_1, a_2, a_3, \cdots, a_i, \cdots, a_m\}$: input attribute values (training vector), where $m = 15$;
$d = \{d_1, d_2, d_3, \cdots, d_k, \cdots, d_n\}$: observed decisions (target vector);
$T = \{t_1, t_2, t_3, \cdots, t_m\}$: actual decisions;
$z_j$: hidden node, where $1 \le j \le h$;
$v$: random weight vector connecting the input and hidden layer;
$w$: random weight vector connecting the hidden and output layer;
$bh_j$: bias on hidden unit; $bo_k$: bias on output unit;
$err_k$: error at output node $d_k$; $err_j$: error at hidden node $z_j$;
$\Delta v$: weight correction term at the input layer, $[\Delta v_{ij}]_{m \times h}$; $1 \le i \le m$;
$\Delta w$: weight correction term at the hidden layer, $[\Delta w_{jk}]_{h \times n}$;
LR: learning rate;
$E_{max}$: maximum number of epochs required for training;
epoch: one training loop over all the input vectors.

Algorithm 1 (Back Propagation Algorithm)

Input: Training vector 'A', bias on hidden unit 'bh', learning rate 'LR'
Output: The trained data set.

1. Initialise the weight vector of the input layer $v = [v_{ij}]_{m \times h}$ with small random values, typically between $-1$ and $1$; i.e. $-1 \le v_{ij} \le 1$.
2. Initialise the weight vector of the hidden layer $w = [w_{jk}]_{h \times n}$ with small random values, not necessarily between $-1$ and $1$.
3. Initialise the mean square error, MSE = 0; epoch = 0 and the learning rate LR.
4. Each input unit receives the input value $a_i$ and transmits this value to all units in the hidden layer.
5. Each hidden unit $z_j$; $1 \le j \le h$ computes its interconnection weight $z_{in_j}$ as defined below:
$$z_{in_j} = bh_j + \sum_{i=1}^{m} (a_i \times v_{ij})$$
Apply the activation function to every interconnection weight $z_{in_j}$, i.e. $z_j = g(z_{in_j})$, and transmit these values to all the units in the output layer.
6. Each output unit $d_k$; $k = 1, 2, \cdots, n$ computes its interconnection weight $d_{in_k}$ as defined below:
$$d_{in_k} = bo_k + \sum_{j=1}^{h} z_j \times w_{jk}$$
Apply the activation function to every interconnection weight $d_{in_k}$; $d_k = g(d_{in_k})$.
7. For each output unit $d_k$; $k = 1, 2, \cdots, n$, accumulate the mean square error MSE as given below, and obtain the average mean square error (AMSE) by averaging the accumulated MSE:
$$MSE = MSE + (t_k - d_k)^2$$
Increase epoch by 1, i.e. epoch = epoch + 1.
8. If (AMSE $\le 0.5$) or (epoch = $E_{max}$), then stop training; else repeat steps 9–12.
9. Each output unit $d_k$; $k = 1, 2, \cdots, n$ receives a target pattern corresponding to an input pattern. Compute the error term as given below:
$$\delta_k = d_k (1 - d_k)(t_k - d_k)$$
10. Each hidden unit $z_j$; $j = 1, 2, \cdots, h$ computes its error interconnection weight as defined below:
$$\delta_{in_j} = \sum_{k=1}^{n} \delta_k w_{jk}$$
The error information term can be calculated as
$$\delta_j = \delta_{in_j} z_j (1 - z_j)$$
11. Each output unit $d_k$; $k = 1, 2, \cdots, n$ updates its weights by using the weight correction term $\Delta w_{jk}$:
$$\Delta w_{jk} = LR * \delta_k z_j \text{ for } j = 1, 2, \cdots, h$$
The bias correction term $\Delta bo_k$ is given as $\Delta bo_k = LR * \delta_k$. Thus, we have
$$w_{jk}(new) = w_{jk}(old) + \Delta w_{jk} \text{ and } bo_k(new) = bo_k(old) + \Delta bo_k$$
12. Each hidden unit $z_j$; $j = 1, 2, \cdots, h$ updates its weights by using the weight correction term $\Delta v_{ij}$:
$$\Delta v_{ij} = LR * \delta_j a_i \text{ for } i = 1, 2, \cdots, m$$
The bias correction term $\Delta bh_j$ is given as $\Delta bh_j = LR * \delta_j$. Thus, we have the following equations; then go to step 4 and repeat the process.
$$v_{ij}(new) = v_{ij}(old) + \Delta v_{ij} \text{ and } bh_j(new) = bh_j(old) + \Delta bh_j$$

5. An Empirical Study on Crop Suitability Prediction

The major objective of the research model under consideration is to analyse and predict the suitable place for cultivating an agricultural crop so as to yield maximum benefit with the existing resources over various periods of time. Usually, a layman depends on an agriculture research centre or on advice from agriculture officers to choose the crops for their land. In practice, however, this is a time-consuming process. The proposed model acts as a tool for a layman to identify the crop to be cultivated in a place based on the richness of the various components required by the specific crop.
To demonstrate the research model, we considered a real-life problem pertaining to crop cultivation in Vellore district of Tamil Nadu. Historical data from 2011 to 2014 were collected from Krishi Vigyan Kendra of Vellore district. The major resources, such as soil and land classification, are considered based on the survey of the agriculture department of Vellore district, Tamil Nadu. Additionally, the Tamil Nadu state agriculture department has divided Tamil Nadu into seven agro-climate zones, namely the Cauvery delta zone, north-eastern zone, western zone, north-western zone, high altitude zone, southern zone and high rainfall zone, based on various components such as rainfall, soil, irrigation and other physical and ecological features. Among these, Vellore district is categorised under the north-eastern zone, which receives an average rainfall of 1099.1 mm per year. The index map of the study area as per Krishi Vigyan Kendra is depicted in Figure 3.

Figure 3. Index map of the study area.

Figure 4. Administrative blocks of the study area.

Furthermore, Vellore district was divided into nine agricultural divisions in 2011, which are further separated into 20 blocks. A total of 4799 villages in the 20 blocks whose main occupation is agriculture were documented according to Krishi Vigyan Kendra. Most of the villages produce major agricultural crops such as paddy, cholam, cumbu, ragi, samai, red gram, black gram etc. Apart from this, some villages produce horticulture crops, including fruit crops such as banana, mango, guava and sapota, and vegetable crops such as brinjal, tomato, onion and sweet potato. Some also yield flower crops such as jasmine, chrysanthemum and marigold, and spices such as chillies and turmeric. In this paper, an effort has been made to collect data from some villages whose main occupation is agriculture. The administrative block boundary map of Vellore district in 2009, on which the study is carried out, is shown in Figure 4.
For better understanding, the agriculture divisions along with the blocks are presented in Table 2.

Table 2. Agricultural divisions in Vellore district

| S. no. | Agriculture division | Blocks |
|---|---|---|
| 1 | Vellore (1) | Vellore, Kaniyambadi, Anaicut |
| 2 | Gudiyatham (2) | Gudiyatham, K.V.Kuppam and Katpadi |
| 3 | Vaniyambadi (3) | Alangayam, Madhanur and Pernambattu |
| 4 | Tirupathur (4) | Tirupathur, Kandhili, Natrampalli and Jolarpet |
| 5 | Walajah (5) | Walajah and Sholingur |
| 6 | Arcot (6) | Arcot and Thimiri |
| 7 | Arakonam (7) | Arakonam, Nemili, Kaveripakkam |
| 8 | Ambur (8) | Madhanur |
| 9 | Katpadi (9) | K.V. Kuppam, and Katpadi |

The most common attributes for crop production in Vellore district include soil components, water components, rainfall during the north-east monsoon, rainfall during the south-west monsoon, organic manure, moisture etc. Soil and water components differ from place to place and depend on several factors. So, it is essential to identify the availability of the NPK (Nitrogen, Phosphorus, Potassium) ratio in the soil at an appropriate stage before cultivation. This minimises the use of inorganic chemical fertilisers. These parameters form the attribute set of the analysis. The data collected from Krishi Vigyan Kendra and the agriculture department are consolidated and presented in Tables 3 and 4.

Table 3. Notation representation table

| Attributes | Abbreviation | Notation | Possible values | Max value |
|---|---|---|---|---|
| Soil pH | SPH | $a_1$ | [5.4–8.5] | 8.5 |
| Moisture | MOI | $a_2$ | [5–12.2] | 12.2 |
| Organic matter | OM | $a_3$ | [0.65–1.98] | 1.98 |
| Nitrogen | N | $a_4$ | [200–800] | 800 |
| Phosphorous | P | $a_5$ | [40–533] | 533 |
| Potassium | K | $a_6$ | [115–1045] | 1045 |
| Copper | Cu | $a_7$ | [0.05–2] | 2 |
| Zinc | Zn | $a_8$ | [0.01–2] | 2 |
| Manganese | Mn | $a_9$ | [0.7–4.6] | 4.6 |
| Iron | Fe | $a_{10}$ | [1.98–99.6] | 99.6 |
| Water pH | WPH | $a_{11}$ | [6.2–8.5] | 8.5 |
| Calcium | Ca | $a_{12}$ | [11–420] | 420 |
| Nitrate | NO$_3$ | $a_{13}$ | [16–140] | 140 |
| Magnesium | Mg | $a_{14}$ | [21–280] | 280 |
| Rainfall | R | $a_{15}$ | [773.4–1111.2] | 1111.2 |
| Places | PL | $d$ | – | – |
Table 3 presents the notation, the possible values and the maximum value of each attribute, whereas Table 4 shows the consolidated sample data considered in our study.

The information system presented in Table 4 provides information about 20 crops that are cultivated in various blocks of the agriculture divisions of Vellore district. The information system contains essential attributes such as soil pH, moisture, organic matter etc., whereas the objects are crops. The decision attribute is the agricultural division where the particular crop is essentially cultivated to get maximum yield. The main objective of this study is to help farmers in identifying the crops suitable for their land. The maximum yield rate depends on various components such as soil, water and rainfall, and land and water are the crucial natural resources. Additionally, cultivable land may not be rich in all the parameters needed to produce the highest yield. These factors are, however, almost indiscernible and hence can be classified by using the intuitionistic fuzzy proximity relation.

Table 4. Sample agriculture information system

| Obj. | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ | $a_9$ | $a_{10}$ | $a_{11}$ | $a_{12}$ | $a_{13}$ | $a_{14}$ | $a_{15}$ | Places |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $x_1$ | 7.3 | 9 | 0.96 | 350 | 200 | 160 | 1.2 | 1.8 | 3.2 | 61.2 | 7.3 | 20 | 16 | 226.14 | 787.9 | 3 |
| $x_2$ | 7.2 | 11.7 | 0.99 | 450 | 130 | 115 | 1.2 | 1.09 | 1.088 | 75 | 8.5 | 21 | 17 | 25 | 1045.4 | 1 |
| $x_3$ | 7.21 | 11.5 | 0.91 | 360 | 200 | 645 | 0.5 | 1.5 | 0.8 | 69 | 7.1 | 72.3 | 45 | 77.3 | 1111.2 | 7 |
| $x_4$ | 7.35 | 9.5 | 0.78 | 432 | 40 | 150 | 1.6 | 3.3 | 1.7 | 61 | 7.36 | 23 | 63 | 280 | 1052.2 | 6 |
| $x_5$ | 7.5 | 7 | 0.78 | 200 | 44 | 162.86 | 0.05 | 1.2 | 2.49 | 57 | 6.3 | 39 | 78 | 259 | 995 | 2 |
| $x_6$ | 5.4 | 6.1 | 1.23 | 560 | 476 | 486 | 0.5 | 3.5 | 4.6 | 47 | 6.2 | 40.8 | 84 | 166 | 890 | 4 |
| $x_7$ | 7.47 | 8 | 1.32 | 475 | 120 | 310 | 0.45 | 1.1 | 1.2 | 1.98 | 7.43 | 11 | 78 | 26 | 999.3 | 2 |
| $x_8$ | 6.2 | 6.7 | 0.98 | 345 | 527 | 1045 | 0.9 | 4.7 | 2.7 | 2.2 | 6.35 | 80 | 56 | 250 | 894.3 | 4 |
| $x_9$ | 6.3 | 7 | 1.2 | 401 | 222 | 672 | 0.05 | 0.01 | 0.7 | 8.4 | 6.35 | 53.8 | 45 | 21 | 1037.5 | 1 |
| $x_{10}$ | 7.1 | 5 | 1.32 | 400 | 160 | 160 | 1.9 | 2.1 | 2.4 | 51.1 | 8.3 | 148 | 25 | 176.63 | 1004.4 | 2 |
| $x_{11}$ | 7.45 | 8 | 1.67 | 540 | 242 | 370 | 1.5 | 5 | 3.3 | 8.8 | 7.41 | 420 | 56 | 110 | 998.7 | 7 |
| $x_{12}$ | 7.2 | 11.9 | 1.53 | 200 | 160 | 220 | 1.8 | 1.2 | 3.4 | 61.4 | 7.4 | 20 | 140 | 211.55 | 773.4 | 3 |
| $x_{13}$ | 8.5 | 10 | 1.52 | 800 | 190 | 340 | 1.2 | 2.4 | 1.8 | 45 | 8.31 | 70 | 130 | 120 | 999 | 4 |
| $x_{14}$ | 7.32 | 12.1 | 1.98 | 645 | 140 | 120 | 1.4 | 2.2 | 3.5 | 64 | 7.42 | 12 | 18 | 27 | 1008.6 | 7 |
| $x_{15}$ | 7.4 | 9 | 1.32 | 450 | 160 | 325 | 1.6 | 1.6 | 1.8 | 62.1 | 7.2 | 45 | 41 | 23 | 885.2 | 3 |
| $x_{16}$ | 8.47 | 8 | 0.65 | 340 | 533 | 477 | 0.51 | 2.5 | 1.68 | 7.57 | 7 | 138 | 45 | 71 | 891.2 | 5 |
| $x_{17}$ | 7.1 | 11.8 | 0.98 | 340 | 349 | 476 | 0.5 | 3.6 | 0.8 | 4.5 | 7.2 | 128 | 23 | 211 | 1012.6 | 6 |
| $x_{18}$ | 5.5 | 10 | 0.88 | 650 | 170 | 150 | 1.1 | 1.9 | 4.6 | 51.5 | 7.28 | 60 | 126 | 130 | 880.5 | 4 |
| $x_{19}$ | 7.2 | 8 | 0.92 | 460 | 120 | 140 | 1.1 | 1.2 | 3.2 | 60.2 | 8.11 | 118 | 24 | 69.5 | 1032.2 | 1 |
| $x_{20}$ | 7.21 | 12.2 | 1.68 | 340 | 480 | 240 | 2 | 3.8 | 4.2 | 99.6 | 7.21 | 51 | 57 | 206.4 | 1008.1 | 5 |

5.1. Initial Stage of an Empirical Study

This section demonstrates the proposed model by considering data collected from Krishi Vigyan Kendra for extracting information. The collected data contain 26 attributes; to maintain consistency, the core and the reduct are applied to them for attribute reduction. The reduced data set is then processed with the intuitionistic fuzzy proximity relation. To provide a clear understanding, we considered the sample data set presented in Table 4 and employed the intuitionistic fuzzy proximity relation. Simultaneously, rough set helps to eliminate the parameters that are superfluous in an information system. The computations are carried out by using Equations (5) and (6) [22].
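The computation can be sketched for a single attribute as follows. The sketch takes the soil pH ($a_1$) values of the first 11 objects of Table 4 and the maximum value 8.5 from Table 3, and it assumes the non-membership degree is $|V_a^{x_i} - V_a^{x_j}| / (2 \times Max(V_a))$, which keeps the non-membership of an object with itself at 0. The function names and the thresholds `alpha=0.95`, `beta=0.3` are illustrative assumptions, and second-decimal values may differ slightly from a printed table depending on the maximum value and rounding used.

```python
# Soil pH (a1) of the first 11 objects of Table 4; maximum value from Table 3.
soil_ph = {"x1": 7.3, "x2": 7.2, "x3": 7.21, "x4": 7.35, "x5": 7.5, "x6": 5.4,
           "x7": 7.47, "x8": 6.2, "x9": 6.3, "x10": 7.1, "x11": 7.45}
MAX_SOIL_PH = 8.5


def membership(u, v, max_value):
    """Degree of belongingness: 1 - |u - v| / max."""
    return 1.0 - abs(u - v) / max_value


def non_membership(u, v, max_value):
    """Degree of non-belongingness, assumed here to be |u - v| / (2 * max)."""
    return abs(u - v) / (2.0 * max_value)


def almost_equivalence_classes(values, max_value, alpha, beta):
    """Group objects that are directly or transitively (alpha, beta)-similar."""
    unvisited = set(values)
    result = []
    for start in values:
        if start not in unvisited:
            continue
        unvisited.discard(start)
        block, frontier = {start}, [start]
        while frontier:  # graph search over (alpha, beta)-similar pairs
            a = frontier.pop()
            for b in list(unvisited):
                if (membership(values[a], values[b], max_value) >= alpha
                        and non_membership(values[a], values[b], max_value) <= beta):
                    unvisited.discard(b)
                    block.add(b)
                    frontier.append(b)
        result.append(block)
    return result


mu_12 = membership(soil_ph["x1"], soil_ph["x2"], MAX_SOIL_PH)
nu_12 = non_membership(soil_ph["x1"], soil_ph["x2"], MAX_SOIL_PH)
print(round(mu_12, 2), round(nu_12, 2))

ph_classes = almost_equivalence_classes(soil_ph, MAX_SOIL_PH, alpha=0.95, beta=0.3)
for block in ph_classes:
    print(sorted(block, key=lambda name: int(name[1:])))
```

Under these assumptions the soil pH values split into three groups: the objects with pH between 7.1 and 7.5, the pair with pH 6.2 and 6.3, and the single object with pH 5.4.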
The results are presented in Table 5 for the attribute $a_1$ (Soil pH) and in Table 6 for the attribute $a_2$ (Moisture), on considering a random selection of 55% of the total objects (11 objects) shown in Table 4. The process is repeated for all the 15 attributes present in the considered information system. Let $IR_i$, $i = 1, 2, 3, \cdots, 15$ be the intuitionistic fuzzy proximity relation corresponding to the attributes $a_i$, $i = 1, 2, 3, \cdots, 15$. Considering the length of the paper, the computation of the intuitionistic fuzzy proximity relation for the other attributes is omitted.

Table 5. Intuitionistic fuzzy tolerance relation for the attribute $a_1$

| $IR_{(\alpha,\beta)}$ | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ | $x_7$ | $x_8$ | $x_9$ | $x_{10}$ | $x_{11}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $x_1$ | 1, 0 | 0.99, 0.01 | 0.99, 0.00 | 0.99, 0.00 | 0.98, 0.01 | 0.79, 0.11 | 0.98, 0.01 | 0.88, 0.06 | 0.89, 0.06 | 0.98, 0.01 | 0.98, 0.01 |
| $x_2$ | 0.99, 0.01 | 1, 0 | 1.00, 0.00 | 0.98, 0.01 | 0.97, 0.02 | 0.80, 0.10 | 0.97, 0.02 | 0.89, 0.06 | 0.90, 0.05 | 0.99, 0.01 | 0.97, 0.01 |
| $x_3$ | 0.99, 0.00 | 1.00, 0.00 | 1, 0 | 0.98, 0.01 | 0.97, 0.02 | 0.80, 0.10 | 0.97, 0.01 | 0.89, 0.06 | 0.90, 0.05 | 0.99, 0.01 | 0.97, 0.01 |
| $x_4$ | 0.99, 0.00 | 0.98, 0.01 | 0.98, 0.01 | 1, 0 | 0.98, 0.01 | 0.78, 0.11 | 0.99, 0.01 | 0.87, 0.06 | 0.88, 0.06 | 0.97, 0.01 | 0.99, 0.01 |
| $x_5$ | 0.98, 0.01 | 0.97, 0.02 | 0.97, 0.02 | 0.98, 0.01 | 1, 0 | 0.77, 0.12 | 1.00, 0.00 | 0.86, 0.07 | 0.87, 0.07 | 0.96, 0.02 | 0.99, 0.00 |
| $x_6$ | 0.79, 0.11 | 0.80, 0.10 | 0.80, 0.10 | 0.78, 0.11 | 0.77, 0.12 | 1, 0 | 0.77, 0.12 | 0.91, 0.04 | 0.90, 0.05 | 0.81, 0.09 | 0.77, 0.10 |
| $x_7$ | 0.98, 0.01 | 0.97, 0.02 | 0.97, 0.01 | 0.99, 0.01 | 1.00, 0.00 | 0.77, 0.12 | 1, 0 | 0.86, 0.07 | 0.87, 0.07 | 0.96, 0.02 | 1.00, 0.00 |
| $x_8$ | 0.88, 0.06 | 0.89, 0.06 | 0.89, 0.06 | 0.87, 0.06 | 0.86, 0.07 | 0.91, 0.04 | 0.86, 0.07 | 1, 0 | 0.99, 0.01 | 0.90, 0.05 | 0.86, 0.07 |
| $x_9$ | 0.89, 0.06 | 0.90, 0.05 | 0.90, 0.05 | 0.88, 0.06 | 0.87, 0.07 | 0.90, 0.05 | 0.87, 0.07 | 0.99, 0.01 | 1, 0 | 0.91, 0.04 | 0.87, 0.06 |
| $x_{10}$ | 0.98, 0.01 | 0.99, 0.01 | 0.99, 0.01 | 0.97, 0.01 | 0.96, 0.02 | 0.81, 0.09 | 0.96, 0.02 | 0.90, 0.05 | 0.91, 0.04 | 1, 0 | 0.96, 0.02 |
| $x_{11}$ | 0.98, 0.01 | 0.97, 0.01 | 0.97, 0.01 | 0.99, 0.01 | 0.99, 0.00 | 0.77, 0.11 | 1.00, 0.00 | 0.86, 0.06 | 0.87, 0.06 | 0.96, 0.02 | 1, 0 |

Table 6. Intuitionistic fuzzy proximity relation for the attribute $a_2$

| $IR_{(\alpha,\beta)}$ | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$ | $x_7$ | $x_8$ | $x_9$ | $x_{10}$ | $x_{11}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $x_1$ | 1, 0 | 0.78, 0.11 | 0.80, 0.10 | 0.96, 0.02 | 0.84, 0.08 | 0.76, 0.12 | 0.92, 0.04 | 0.81, 0.09 | 0.84, 0.08 | 0.67, 0.16 | 0.92, 0.04 |
| $x_2$ | 0.78, 0.11 | 1, 0 | 0.98, 0.01 | 0.82, 0.09 | 0.61, 0.19 | 0.54, 0.23 | 0.70, 0.15 | 0.59, 0.20 | 0.61, 0.19 | 0.45, 0.27 | 0.70, 0.15 |
| $x_3$ | 0.80, 0.10 | 0.98, 0.01 | 1, 0 | 0.84, 0.08 | 0.63, 0.18 | 0.56, 0.22 | 0.71, 0.14 | 0.61, 0.20 | 0.63, 0.18 | 0.47, 0.27 | 0.71, 0.14 |
| $x_4$ | 0.96, 0.02 | 0.82, 0.09 | 0.84, 0.08 | 1, 0 | 0.80, 0.10 | 0.72, 0.14 | 0.88, 0.06 | 0.77, 0.11 | 0.80, 0.10 | 0.63, 0.18 | 0.88, 0.06 |
| $x_5$ | 0.84, 0.08 | 0.61, 0.19 | 0.63, 0.18 | 0.80, 0.10 | 1, 0 | 0.93, 0.04 | 0.92, 0.04 | 0.98, 0.01 | 1.00, 0.00 | 0.84, 0.08 | 0.92, 0.04 |
| $x_6$ | 0.76, 0.12 | 0.54, 0.23 | 0.56, 0.22 | 0.72, 0.14 | 0.93, 0.04 | 1, 0 | 0.84, 0.08 | 0.95, 0.02 | 0.93, 0.04 | 0.91, 0.05 | 0.84, 0.08 |
| $x_7$ | 0.92, 0.04 | 0.70, 0.15 | 0.71, 0.14 | 0.88, 0.06 | 0.92, 0.01 | 0.84, 0.08 | 1, 0 | 0.89, 0.05 | 0.92, 0.04 | 0.75, 0.12 | 1.00, 0.00 |
| $x_8$ | 0.81, 0.09 | 0.59, 0.20 | 0.61, 0.20 | 0.77, 0.11 | 0.98, 0.01 | 0.95, 0.27 | 0.89, 0.05 | 1, 0 | 0.98, 0.01 | 0.86, 0.07 | 0.89, 0.05 |
| $x_9$ | 0.84, 0.08 | 0.61, 0.19 | 0.63, 0.18 | 0.80, 0.10 | 1.00, 0.00 | 0.93, 0.04 | 0.92, 0.04 | 0.98, 0.01 | 1, 0 | 0.84, 0.08 | 0.92, 0.04 |
| $x_{10}$ | 0.67, 0.16 | 0.45, 0.27 | 0.47, 0.27 | 0.63, 0.18 | 0.84, 0.08 | 0.91, 0.05 | 0.75, 0.12 | 0.86, 0.07 | 0.84, 0.08 | 1, 0 | 0.75, 0.12 |
| $x_{11}$ | 0.92, 0.04 | 0.70, 0.15 | 0.71, 0.14 | 0.88, 0.06 | 0.92, 0.04 | 0.84, 0.08 | 1.00, 0.00 | 0.89, 0.05 | 0.92, 0.04 | 0.75, 0.12 | 1, 0 |

On requiring a degree of membership of at least 0.95 and a degree of non-membership of at most 0.3 (i.e. $\alpha = 0.95$, $\beta = 0.3$), it can be seen from Table 5 that $\mu(x_1, x_1) = 1.00$, $\nu(x_1, x_1) = 0$; $\mu(x_1, x_2) = 0.99$, $\nu(x_1, x_2) = 0.01$; $\mu(x_2, x_3) = 1.00$, $\nu(x_2, x_3) = 0$; $\mu(x_3, x_4) = 0.98$, $\nu(x_3, x_4) = 0.01$; $\mu(x_4, x_5) = 0.98$, $\nu(x_4, x_5) = 0.01$; $\mu(x_5, x_7) = 1.00$, $\nu(x_5, x_7) = 0$; $\mu(x_7, x_{10}) = 0.96$, $\nu(x_7, x_{10}) = 0.02$; $\mu(x_{10}, x_{11}) = 0.96$, $\nu(x_{10}, x_{11}) = 0.02$; $\mu(x_{11}, x_1) = 0.98$ and $\nu(x_{11}, x_1) = 0.01$. Therefore, the objects $x_1, x_2, x_3, x_4, x_5, x_7, x_{10}, x_{11}$ are $(\alpha, \beta)$-indiscernible. Also, the object $x_6$ is not $(\alpha, \beta)$-indiscernible with any other object. Thus, the almost equivalence classes generated are given as

$$U/R_{(\alpha,\beta)} = \{\{x_1, x_2, x_3, x_4, x_5, x_7, x_{10}, x_{11}\}, \{x_6\}, \{x_8, x_9\}\}$$

In the same way, the computation is carried out for all 20 crops (objects), and the almost equivalence classes obtained for the attributes $a_i$, $i = 1, 2, 3, \cdots, 15$ are given below. It is seen that the attribute values of soil pH ($a_1$) are classified into four categories, namely very high, high, moderate and low. Likewise, the attribute values of the other attributes are also classified.
$U/IR_1^{(\alpha,\beta)} = \{\{x_1, x_2, x_3, x_4, x_5, x_7, x_{10}, x_{11}, x_{12}, x_{14}, x_{15}, x_{17}, x_{19}, x_{20}\}, \{x_8, x_9\}, \{x_{13}, x_{16}\}, \{x_6, x_{18}\}\}$

$U/IR_2^{(\alpha,\beta)} = \{\{x_1, x_4, x_{13}, x_{15}, x_{18}\}, \{x_5, x_6, x_8, x_9\}, \{x_2, x_3, x_{12}, x_{14}, x_{17}, x_{20}\}, \{x_7, x_{11}, x_{16}, x_{19}\}, \{x_{10}\}\}$

$U/IR_3^{(\alpha,\beta)} = \{\{x_1, x_2, x_3, x_4, x_5, x_8, x_{17}, x_{18}, x_{19}\}, \{x_6, x_7, x_9, x_{10}, x_{15}\}, \{x_{11}, x_{20}\}, \{x_{12}, x_{13}\}, \{x_{14}\}, \{x_{16}\}\}$

$U/IR_4^{(\alpha,\beta)} = \{\{x_1, x_2, x_3, x_4, x_7, x_8, x_9, x_{10}, x_{15}, x_{16}, x_{17}, x_{19}, x_{20}\}, \{x_5, x_{12}\}, \{x_{14}, x_{18}\}, \{x_6, x_{11}\}, \{x_{13}\}\}$

$U/IR_5^{(\alpha,\beta)} = \{\{x_1, x_2, x_3, x_7, x_9, x_{10}, x_{11}, x_{12}, x_{13}, x_{14}, x_{15}, x_{17}, x_{18}, x_{19}\}, \{x_6, x_{20}\}, \{x_8, x_{16}\}, \{x_4, x_5\}\}$

$U/IR_6^{(\alpha,\beta)} = \{\{x_1, x_2, x_4, x_5, x_{10}, x_{12}, x_{14}, x_{18}, x_{19}, x_{20}\}, \{x_6, x_{16}, x_{17}\}, \{x_3, x_9\}, \{x_7, x_{11}, x_{13}, x_{15}\}, \{x_8\}\}$

$U/IR_7^{(\alpha,\beta)} = \{\{x_1, x_2, x_{13}, x_{18}, x_{19}\}, \{x_3, x_6, x_7, x_{16}, x_{17}\}, \{x_4, x_5, x_9, x_{11}, x_{14}, x_{15}\}, \{x_{10}, x_{12}, x_{20}\}, \{x_8\}\}$

$U/IR_8^{(\alpha,\beta)} = \{\{x_1, x_3, x_{10}, x_{13}, x_{14}, x_{15}, x_{16}, x_{18}\}, \{x_4, x_6, x_{17}, x_{20}\}, \{x_2, x_5, x_7, x_{12}, x_{19}\}, \{x_9\}, \{x_{11}\}, \{x_8\}\}$

$U/IR_9^{(\alpha,\beta)} = \{\{x_1, x_{11}, x_{12}, x_{14}, x_{19}\}, \{x_2, x_7\}, \{x_3, x_9, x_{17}\}, \{x_4, x_{13}, x_{15}, x_{16}\}, \{x_5, x_8, x_{10}\}, \{x_6, x_{18}\}, \{x_{20}\}\}$

$U/IR_{10}^{(\alpha,\beta)} = \{\{x_1, x_3, x_4, x_5, x_{12}, x_{14}, x_{15}, x_{19}\}, \{x_2\}, \{x_6, x_{10}, x_{13}, x_{18}\}, \{x_7, x_8, x_9, x_{11}, x_{17}, x_{16}\}, \{x_{20}\}\}$

$U/IR_{11}^{(\alpha,\beta)} = \{\{x_1, x_3, x_4, x_7, x_{11}, x_{12}, x_{14}, x_{15}, x_{16}, x_{17}, x_{18}, x_{20}\}, \{x_2, x_{10}, x_{13}, x_{19}\}, \{x_5, x_6, x_8, x_9\}\}$

$U/IR_{12}^{(\alpha,\beta)} = \{\{x_1, x_2, x_4, x_5, x_6, x_7, x_{12}, x_{14}, x_{15}\}, \{x_3, x_8, x_9, x_{13}, x_{18}, x_{20}\}, \{x_{10}, x_{16}, x_{17}, x_{19}\}, \{x_{20}\}, \{x_{11}\}\}$

$U/IR_{13}^{(\alpha,\beta)} = \{\{x_1, x_2, x_{10}, x_{14}, x_{17}, x_{19}\}, \{x_3, x_9, x_{15}, x_{16}\}, \{x_4, x_8, x_{11}, x_{20}\}, \{x_5, x_6, x_7\}, \{x_{12}, x_{13}, x_{18}\}\}$

$U/IR_{14}^{(\alpha,\beta)} = \{\{x_1, x_{12}, x_{17}, x_{20}\}, \{x_2, x_7, x_9, x_{14}, x_{15}\}, \{x_3, x_{16}, x_{19}\}, \{x_4\}, \{x_5, x_8\}, \{x_6, x_{10}\}, \{x_{11}, x_{13}, x_{18}\}\}$

$U/IR_{15}^{(\alpha,\beta)} = \{\{x_1, x_{12}\}, \{x_2, x_3, x_4, x_5, x_7, x_9, x_{10}, x_{11}, x_{13}, x_{14}, x_{17}, x_{19}, x_{20}\}, \{x_6, x_8, x_{16}, x_{15}, x_{18}\}\}$

Unlike the attribute $a_1$, the attribute $a_2$ is classified into five categories, namely very high, high, moderate, low and very low. Similarly, the attributes $a_3, a_4, a_5, a_6, a_7, a_8, a_9, a_{10}, a_{11}, a_{12}, a_{13}, a_{14}, a_{15}$ are classified into 6, 5, 4, 5, 5, 6, 7, 6, 3, 4, 6, 7 and 3 categories, respectively. The maximum number of categories is observed to be 7. Let the categories be very high (Vh), high (H), moderate (M), low (L), very low (Vl), poor (P) and negligible (Ne). This condenses the quantitative information system into a qualitative information system, as shown in Table 7.

5.2. Final Stage of Empirical Study

The steps involved in the final stage of the empirical study are discussed in this section. Predicting the places for cultivating agricultural crops on real data sets is the main objective of this article. We used the back propagation feed-forward neural network (BPNN) method for the investigation. The method is based on minimising the mean square error (MSE) and the mean percentile error (MPE). The back propagation algorithm discussed in Section 4 is used to train the data set. Based on the input attribute values, $y_{in}$ and $y_{out}$ are computed as discussed in Equations (3) and (4), respectively. The back propagation neural network is a supervised learning technique, and so the training process can be terminated by declaring certain conditions. The process terminates if the network has attained an average mean square error (MSE) $\le 0.5$ or has reached the predefined number of epochs.
Table 7. Qualitative information system of sample dataset

| Obj. | $a_1$ | $a_2$ | $a_3$ | $a_4$ | $a_5$ | $a_6$ | $a_7$ | $a_8$ | $a_9$ | $a_{10}$ | $a_{11}$ | $a_{12}$ | $a_{13}$ | $a_{14}$ | $a_{15}$ | Places |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| $x_1$ | H | H | Vl | L | M | Vl | M | L | M | M | M | Vl | P | M | L | Alangayam |
| $x_2$ | H | Vh | Vl | L | M | Vl | M | Vl | Vl | H | H | Vl | P | Ne | H | Anaicut |
| $x_3$ | H | Vh | Vl | L | M | H | Vl | L | Ne | M | M | L | Vl | P | H | Arakonam |
| $x_4$ | H | H | Vl | L | L | Vl | H | M | P | M | M | Vl | L | Vh | H | Arcot |
| $x_5$ | H | L | Vl | Vl | L | Vl | H | Vl | L | M | L | Vl | M | H | H | Gudiyatham |
| $x_6$ | L | L | L | M | H | M | Vl | M | Vh | L | L | Vl | M | L | M | Jolarpet |
| $x_7$ | H | M | L | L | M | L | Vl | Vl | Vl | Vl | M | Vl | M | Ne | H | K V Kuppam |
| $x_8$ | M | L | Vl | L | Vh | Vh | L | H | L | Vl | L | L | L | H | M | Kandeli |
| $x_9$ | M | L | L | L | M | H | H | P | Ne | Vl | L | L | Vl | Ne | H | Kaniyambadi |
| $x_{10}$ | H | Vl | L | L | M | Vl | Vh | L | L | L | H | H | P | L | H | Katpadi |
| $x_{11}$ | H | M | H | M | M | L | H | Vh | M | M | M | Vh | L | Vl | H | Kaveripakkam |
| $x_{12}$ | H | Vh | M | Vl | M | Vl | Vh | Vl | M | M | M | Vl | Vh | M | L | Madhanur |
| $x_{13}$ | Vh | H | M | Vh | M | L | M | L | P | L | H | L | H | Vl | H | Natrampalli |
| $x_{14}$ | H | Vh | Vh | H | M | Vl | H | L | M | M | M | Vl | P | Ne | H | Nemili |
| $x_{15}$ | H | H | L | L | M | L | H | L | P | M | M | Vl | Vl | Ne | M | Pernambattu |
| $x_{16}$ | Vh | M | P | L | Vh | M | Vl | L | P | Vl | M | H | Vl | P | M | Sholingur |
| $x_{17}$ | H | Vh | Vl | L | M | M | Vl | M | Ne | Vl | M | H | P | M | H | Thimiri |
| $x_{18}$ | L | H | Vl | H | M | Vl | M | L | Vh | L | M | L | H | Vl | M | Tirupathur |
| $x_{19}$ | H | M | Vl | L | M | Vl | M | Vl | M | M | H | H | P | P | H | Vellore |
| $x_{20}$ | H | Vh | H | L | H | Vl | Vh | M | H | Vh | M | L | L | M | H | Walajahpet |

Figure 5. Number of hidden nodes using MSE.

Generally, the number of neurons in the hidden layer is identified by trial and error, based on MSE and MPE, to get better performance. The weight coefficient is recorded so as to identify the effect of the number of hidden neurons required to map the input space to the output space. The recorded results show that the best result is obtained with 17 hidden neurons in a single hidden layer architecture. With the number of neurons kept at 17 and the learning rate at 0.5, the MSE obtained is 0.188 with 300 epochs. It is also observed that on increasing the number of hidden neurons to more than 200 and the number of hidden layers to $\ge 2$, the combinations could not achieve an MSE $\le 0.188$. So, the analysis is restricted to 17 hidden neurons with a single hidden layer.
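A training loop of this kind, with one hidden layer, sigmoid activations and the two stopping conditions (error tolerance or maximum number of epochs), can be sketched in plain Python as below. This is a minimal illustration and not the MATLAB implementation used in the study: the function names, the zero initial biases, the per-epoch averaging of the squared error and the toy OR data set are all assumptions made for the example. The defaults mirror the reported setting of 17 hidden neurons, learning rate 0.5 and 300 epochs.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(a, v, w, bh, bo):
    """Propagate one input vector through the hidden and output layers."""
    z = [sigmoid(bh[j] + sum(a[i] * v[i][j] for i in range(len(a))))
         for j in range(len(bh))]
    d = [sigmoid(bo[k] + sum(z[j] * w[j][k] for j in range(len(z))))
         for k in range(len(bo))]
    return z, d


def train(samples, targets, hidden=17, lr=0.5, max_epochs=300, tolerance=0.5, seed=0):
    rng = random.Random(seed)
    m, n = len(samples[0]), len(targets[0])
    v = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(m)]
    w = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(hidden)]
    bh, bo = [0.0] * hidden, [0.0] * n  # assumption: biases start at zero
    history = []
    for _ in range(max_epochs):
        squared_error = 0.0
        for a, t in zip(samples, targets):
            z, d = forward(a, v, w, bh, bo)
            squared_error += sum((t[k] - d[k]) ** 2 for k in range(n))
            # Error terms of the output and hidden units (computed before updates).
            delta_k = [d[k] * (1 - d[k]) * (t[k] - d[k]) for k in range(n)]
            delta_j = [z[j] * (1 - z[j]) * sum(delta_k[k] * w[j][k] for k in range(n))
                       for j in range(hidden)]
            for k in range(n):  # hidden-to-output weights and output biases
                bo[k] += lr * delta_k[k]
                for j in range(hidden):
                    w[j][k] += lr * delta_k[k] * z[j]
            for j in range(hidden):  # input-to-hidden weights and hidden biases
                bh[j] += lr * delta_j[j]
                for i in range(m):
                    v[i][j] += lr * delta_j[j] * a[i]
        # Assumption: the average error is taken over all outputs of all samples.
        history.append(squared_error / (len(samples) * n))
        if history[-1] <= tolerance:
            break
    return (v, w, bh, bo), history


# Toy data (logical OR), used only to exercise the loop.
samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [[0], [1], [1], [1]]
weights, history = train(samples, targets, hidden=4, max_epochs=2000, tolerance=0.01)
print(len(history), round(history[0], 3), round(history[-1], 3))
```

For the qualitative data of Table 7, the categories would first have to be encoded numerically and the decision attribute mapped to output units; that encoding is not specified here and is left open in the sketch.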
The results of MSE and MPE against the number of neurons are depicted in Figures 5 and 6, respectively. The trained model is then tested with the remaining nine objects $x_{12}, x_{13}, x_{14}, x_{15}, x_{16}, x_{17}, x_{18}, x_{19}, x_{20}$ of the qualitative information system presented in Table 7. The validation process is presented in Table 8. From Table 8, it is clear that all nine objects are correctly classified. Thus, the accuracy is computed as:

$$\text{Accuracy} = \frac{\text{Supporting objects}}{\text{Total number of objects}} = \frac{9}{9} = 100\%$$

However, in the experimental study, an average classification accuracy of 93.7% is achieved when the number of objects is increased to 2193. The validation process, along with an experimental comparative study, is carried out in Section 6 to check the viability of the model.

Figure 6. Number of hidden nodes using MPE.

Table 8. Validating the training data

Obj.  a1  a2  a3  a4  a5  a6  a7  a8  a9  a10 a11 a12 a13 a14 a15 Recorded places  Observed places
x12   H   Vh  M   Vl  M   Vl  Vh  Vl  M   M   M   Vl  Vh  M   L   8                8
x13   Vh  H   M   Vh  M   L   M   L   P   L   H   L   H   Vl  H   4                4
x14   H   Vh  Vh  H   M   Vl  H   L   M   M   M   Vl  P   Ne  H   7                7
x15   H   H   L   L   M   L   H   L   P   M   M   Vl  Vl  Ne  M   3                3
x16   Vh  M   P   L   Vh  M   Vl  L   P   Vl  M   H   Vl  P   M   5                5
x17   H   Vh  Vl  L   M   M   Vl  M   Ne  Vl  M   H   P   M   H   6                6
x18   L   H   Vl  H   M   Vl  M   L   Vh  L   M   L   H   Vl  M   4                4
x19   H   M   Vl  L   M   Vl  M   Vl  M   M   H   H   P   P   H   1                1
x20   H   Vh  H   L   H   Vl  Vh  M   H   Vh  M   L   L   M   H   5                5

6. Comparative Analysis and Results

Experimental analysis has been carried out to assess the efficiency of the proposed model, RSIFASANN. The experiments were conducted on a computer with an Intel Pentium processor, 8 GB RAM, the Windows 10 operating system and MATLAB R2008a. For analysis purposes, data were collected from Krishi Vigyan Kendra (KVK), Vellore, India. Data for 4799 villages were collected, but after careful observation, 2193 villages were identified as having agricultural crop production as their main occupation.
The intuitionistic fuzzy proximity relation is employed on the whole data set to obtain the almost equivalence classes. This phase changes the quantitative information system into a qualitative information system. Further, the qualitative data set of 2193 objects is validated with the trained model. Additionally, we have chosen for comparison a model which integrates Bayesian classification and RSFAS (BCRSFAS) [12]. The proposed model is also compared with the previous work of hybridising RSFAS with a neural network (RSFASANN). We randomly selected 220 objects and predicted the decision using BCRSFAS, RSFASANN and the proposed model RSIFASANN. Further, the number of objects is increased in steps of 220 randomly selected objects, and the classification accuracy of each model is checked. The process is repeated until the whole data set of 2193 objects is covered. The results obtained are presented in Table 9. The average accuracy obtained by the proposed model RSIFASANN is 93.7%. On average, the accuracy of RSIFASANN (93.7%) is higher than that of RSFASANN (93.1%), which in turn is higher than that of BCRSFAS (86.8%).

Table 9. Comparative analysis and results.

                  Supporting objects               Accuracy obtained
Objects           RSIFASANN  RSFASANN  BCRSFAS     RSIFASANN  RSFASANN  BCRSFAS
220               203        198       184         0.923      0.900     0.836
440               408        399       370         0.927      0.907     0.841
660               616        611       560         0.933      0.926     0.848
880               823        825       750         0.935      0.938     0.852
1100              1031       1030      935         0.937      0.936     0.850
1320              1236       1240      1140        0.936      0.939     0.864
1540              1443       1443      1369        0.937      0.937     0.889
1760              1656       1645      1578        0.941      0.935     0.897
1980              1875       1874      1779        0.947      0.946     0.898
2193              2090       2076      1975        0.953      0.947     0.901
Average accuracy                                   0.937      0.931     0.868

Figure 7. Experimental comparative graph.

The comparative graph is depicted in Figure 7 for better visualisation. From the above analysis, it is clear that the average classification accuracy of RSIFASANN is higher than that of the other two models, and hence it can be considered a better model.

6.1. N-fold Cross-validation

Generally, a classifier is induced from the training data using a learning algorithm. Every classifier is associated with some prediction error, but the true prediction error is unknown and difficult to calculate. At the same time, it is essential to estimate the error from the data while analysing the data in the training phase. The error estimated from the data considered is called the estimated prediction error. This estimated prediction error is to be validated by means of its variance and bias.

In the proposed technique, the data set is divided into training data (55%) and testing data (45%). The back propagation algorithm is used as the classifier, and the estimated prediction error is calculated based on the mean square error and the mean percentile error by training the model with varying learning rates. The MSE obtained is 0.188 on training the model with one hidden layer. Although the model was also tested with more than one hidden layer, the results are convincing enough to retain a single hidden layer. Thus, out of 2193 objects, 1203 objects were used to train the model using the back propagation algorithm, and the remaining 990 objects were tested with the least mean square error. Further, the validation is performed using N-fold cross-validation, and the results are presented as follows.

Figure 8. Mean square error of Fold 1 for N = 10.

Figure 9. Overall mean square error over all folds for N = 10.

In N-fold cross-validation, the data set is divided into N folds, a classifier is learned using (N − 1) folds, and an error value is calculated by testing the classifier on the remaining fold. Finally, the N-CV estimate of the error is the average of the errors committed in each fold. Thus, the N-CV error estimator depends on two factors: the training set and the partition into folds. The experimental analysis is performed using the R language.
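The N-fold procedure described above can be sketched in Python as follows. This is an illustrative sketch rather than the R code used in the study: the function names `make_folds` and `cross_validate` are hypothetical, the learner is passed in as a pair of callables, and the mean predictor used in the demonstration is a deliberately trivial stand-in for the back propagation network.

```python
import random


def make_folds(n_items, n_folds, seed=0):
    """Shuffle the indices 0..n_items-1 and deal them into n_folds folds.

    Fold sizes differ by at most one item.
    """
    if not 1 < n_folds <= n_items:
        raise ValueError("need 2 <= n_folds <= n_items")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_folds] for k in range(n_folds)]


def cross_validate(xs, ys, n_folds, fit, predict, seed=0):
    """Estimate the prediction error by N-fold cross-validation.

    For each fold, a model is fitted on the other N - 1 folds and its mean
    squared error is measured on the held-out fold.
    Returns (list of per-fold MSE values, their average).
    """
    folds = make_folds(len(xs), n_folds, seed)
    fold_errors = []
    for k, test_idx in enumerate(folds):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        squared = [(ys[i] - predict(model, xs[i])) ** 2 for i in test_idx]
        fold_errors.append(sum(squared) / len(squared))
    return fold_errors, sum(fold_errors) / len(fold_errors)


# Hypothetical stand-in learner: always predicts the mean training target.
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


# Made-up data, only to show the call pattern.
demo_x = list(range(20))
demo_y = [float(v % 4) for v in demo_x]
per_fold, overall = cross_validate(demo_x, demo_y, 5, fit_mean, predict_mean)
print("per-fold MSE:", per_fold)
print("overall MSE:", overall)
```

Since the estimate depends on how the objects are partitioned, the shuffle is seeded so that a run can be repeated; changing `seed` gives a different partition and, in general, a slightly different error estimate.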
The data set contains 15 conditional attributes and one predictive attribute. The data set is divided into various numbers of folds, such as N = 10, 15, 20, 25 and 30. The MSE values are recorded for each number of folds. A sample of the results computed using the R language for N = 10 is given in Figure 8, and the overall MSE is recorded in Figure 9. The mean square error obtained for fold 1 is 2.6, whereas the overall mean square error obtained is 2.44. The mean square error and the overall mean square error were analysed for varying N, and the results are presented in Table 10.

It is seen from Table 10 that the average MSE obtained is greater than the average MSE obtained using the neural network. Thus, we can say that the validation carried out by hybridising rough computing with a neural network provides better accuracy in prediction.

Table 10. Overall mean square error across various folds

Number of folds (N)  Overall MSE  Observations in test set
10                   2.44         99
15                   2.43         66
20                   2.44         49
25                   2.44         39
30                   2.43         33
Average MSE          0.2436

7. Conclusion

In this paper, we hybridised RSIFAS with a neural network for the prediction of unseen associations of attribute values. The initial process of the proposed model reduces the quantitative information system to a qualitative information system using RSIFAS. The final process predicts the decision of unseen associations using a back propagation neural network. The model is analysed over 20 blocks of Vellore district, Tamil Nadu. The experimental analysis shows that the proposed model attained an average classification accuracy of 93.7%, whereas that of BCRSFAS is 86.8%. The classification accuracy of the proposed model is thus 6.9 percentage points higher than that of BCRSFAS. Additionally, the model helps farmers decide which crops to cultivate on their land.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes on contributors

Dr. A. Anitha is an Associate Professor in the School of Information Technology at VIT, Vellore, India. She received the MCA degree from Adhi Parasakthi College of Science, Kalavai, Tamil Nadu, India. She has published many international journal and conference papers. Her research interests include data mining, fuzzy logic, neural networks and rough sets. She is associated with the professional body CSI.

Dr. D. P. Acharjya is a Professor in the School of Computing Sciences and Engineering at VIT, Vellore, India. He received his MSc from NIT, Rourkela, India; M. Tech. in Computer Science from Utkal University, India; and PhD in Computer Science from Berhampur University, India. He has been awarded the Gold Medal in M. Sc.; Eminent Academician Award; Outstanding Educator and Scholar Award; The Best Citizens of India Award; and Bharat Vikas Award from various organizations of India. He has authored 84 international and national journal and conference papers. Besides, he has published 4 books and 17 book chapters with international publishers. In addition, he has edited 7 books with international publishers like CRC Press; Springer; and IGI Global, USA. His research interests include rough sets, knowledge representation, machine learning, bio-inspired computing, and business intelligence. He is associated with many professional bodies, such as ACM, IACSIT, IAENG, CSTA, IRSS, CSI, ISTE, OITS, ISIAM, IMS, and AMTI.

References

[1] Zadeh LA. Fuzzy sets. Inf Control. 1965;8:338–353.
[2] Pawlak Z. Rough sets. Int J Comp Inform Sci. 1982;11:341–356.
[3] Pawlak Z. Rough sets – theoretical aspects of reasoning about data. Dordrecht: Kluwer Academic Publishers; 1991.
[4] De SK. Some aspects of fuzzy sets, rough sets and intuitionistic fuzzy sets [PhD Thesis]. Kharagpur: IIT, India; 1999.
[5] Acharjya DP, Tripathy BK. Rough sets on fuzzy approximation spaces and applications to distributed knowledge systems. Int J Artif Intell Soft Comput Inderscience. 2008;1(1):1–14.
[6] Acharjya DP, Tripathy BK. Rough sets on intuitionistic fuzzy approximation spaces and knowledge representation. Int J Artif Int Comput Res. 2009;1(1):29–36.
[7] Molodtsov D. Soft set theory – first results. Comp Math Appl. 1999;37(4/5):19–31.
[8] Peters J. Near sets – general theory about nearness of objects. Appl Math Sci. 2007;1:2609–2629.
[9] Dubois D, Prade H. Rough fuzzy sets and fuzzy rough sets. Int J Gen Syst. 1990;17(2/3):191–209.
[10] Liu G. Rough set theory based on two universal sets and its applications. Knowl Base Syst. 2010;23:110–115.
[11] Smarandache F. Neutrosophic set – a generalization of the intuitionistic fuzzy set. Int J Pure Appl Math. 2005;24:287–297.
[12] Acharjya DP, Roy D, Rahaman AM. Prediction of missing associations using rough computing and Bayesian classification. Int J Intell Syst Appls. 2012;4(11):1–13.
[13] Das TK, Acharjya DP. A decision making model using soft set and rough set on fuzzy approximation spaces. Int J Intel Syst Technol Applic. 2014;13(3):170–186.
[14] Anitha A, Acharjya DP. Neural network and rough set hybrid scheme prediction of missing associations. Int J Bioinform Res Appl. 2015;11(6):503–524.
[15] Ahn BS, Cho SS, Kim CY. The integrated methodology of rough set theory and artificial neural network for business failure prediction. Expert Syst Appl. 2000;18(2):65–74.
[16] Rao DVJ, Mitra P. A rough association rule based approach for class prediction with missing attribute values. Proceedings of the 2nd Indian International Conference on Artificial Intelligence; 2005. 20–22.
[17] Atanassov KT. Intuitionistic fuzzy sets. Fuzzy Sets Syst. 1986;20:87–96.
[18] Rumelhart DE, McClelland JL. Parallel distributed processing: explorations in the microstructure of cognition. Volume 1: Foundations. Cambridge: MIT Press; 1986.
[19] Lippmann RP. An introduction to computing with neural nets. IEEE ASSP Mag. 1987;4(1):4–22.
[20] Acharjya DP. Knowledge extraction from information system using rough computing. In: M Usman, editor. Improving knowledge discovery through the integration of data mining techniques. Pennsylvania: IGI Global; 2015. p. 161–182.
[21] Hecht-Nielsen R. Theory of the backpropagation neural network. Proceedings of the International Joint Conference on Neural Networks. 1989;1:593–605.
[22] Tripathy BK, Acharjya DP. Knowledge mining using ordering rules and rough sets on fuzzy approximation spaces. Int J Adv Sci Techn. 2010;1(3):41–50.

Journal

Fuzzy Information and Engineering, Taylor & Francis

Published: Jan 2, 2019

Keywords: Almost indiscernible; intuitionistic fuzzy proximity relation; neural network; knowledge discovery; prediction; rough set

References