LSTM DSS Automatism and Dataset Optimization for Diabetes Prediction

Applied Sciences, Article

Alessandro Massaro *, Vincenzo Maritati, Daniele Giannone, Daniele Convertini and Angelo Galiano

Dyrecta Lab srl, Via Vescovo Simplicio 45, 70014 Conversano, Italy
* Correspondence: alessandro.massaro@dyrecta.com; Tel.: +39-080-4958477
† This work is an extended version of our research published in 2018 at the conference "AEIT 2018 International Annual Conference" held in Bari, Italy, 3–5 October 2018.

Received: 28 June 2019; Accepted: 21 August 2019; Published: 28 August 2019
Appl. Sci. 2019, 9, 3532; doi:10.3390/app9173532

Featured Application: Implementation of a DSS for a patient management system based on LSTM for homecare assistance.

Abstract: The paper focuses on the application of a Long Short-Term Memory (LSTM) neural network enabling patient health status prediction, with attention on diabetes. The proposed topic is an upgrade of a Multi-Layer Perceptron (MLP) algorithm that can be fully embedded into an Enterprise Resource Planning (ERP) platform. The LSTM approach is applied to multi-attribute data processing and is integrated into an information system based on patient management. To validate the proposed model, we adopted a dataset commonly used in the literature for data mining model testing. The study focuses on the procedure to follow for a correct LSTM data analysis using artificial records (LSTM-AR-), improving training dataset stability and test accuracy compared with traditional MLP and LSTM approaches. The increase of artificial data is important for all cases where only few training data are available, as in most practical cases. The paper represents a practical application of the LSTM approach in decision support systems (DSSs) suitable for homecare assistance and for de-hospitalization processes. The main goal of the paper is to provide guidelines for the application of LSTM neural networks in type I and II diabetes prediction adopting automatic procedures. A percentage improvement of test set accuracy of 6.5% has been observed by applying the LSTM-AR- approach, comparing results with up-to-date MLP works. The LSTM-AR- neural network can be applied as an alternative approach for all homecare platforms where not enough sequential training data are available.

Keywords: LSTM; DSS; diabetes prediction; homecare assistance information system; multi-attribute analysis; artificial training dataset

1. Introduction

A research topic in telemedicine is predictive diagnostics improved by artificial intelligence (AI). Different open source tools [1–4] such as RapidMiner Studio, Weka, Konstanz Information Miner (KNIME), Orange Canvas, Keras, TensorFlow, and Theano can be applied for this purpose, implementing generic artificial neural networks (ANN) predicting patient health status. These tools are suitable for decision support systems (DSS) based on artificial intelligence algorithms [5–13] predicting diagnosis [14–16]. Specifically, references [5,6,10–13] discuss how data mining can support hospital and assistance processes, while references [7–9,14–16] provide different healthcare applications where artificial intelligence plays an important role in decision making processes enabled by health status prediction.
In accordance with homecare assistance facilities for de-hospitalization processes, the use of certified smart sensors transmitting data in a cloud network could remotely control patients at home [17]. The sensors enabling homecare assistance can be implemented in a more complex hospital information system embedding automatic alerting conditions based on different risk levels [18]. In this direction, KNIME workflows can be easily interfaced as a Graphical User Interface (GUI) to the control room information system, thus allowing connectivity with big data systems and timed data processing by cron job runs managing the multilayer perceptron (MLP) ANN analyses [19]. Figure 1a illustrates an example of the basic information system architecture of the MLP ANN network linked with the control room and big data system for homecare assistance [19], and Figure 1b schematizes the related KNIME workflow by distinguishing the data process phases such as time delay for the workflow execution, python node enabling data input access, data pre-processing, data model processing, and reporting [19].

Figure 1. (a) Architecture of homecare smart assistance platform based on artificial neural networks (ANN) data processing [19]; (b) Konstanz Information Miner (KNIME) workflow implementing a traditional ANN multi-layer perceptron (MLP) [19].

ANN models implemented by workflows with objects are user friendly but cannot be easily implemented into Enterprise Resource Planning (ERP) software. For this purpose, it is preferable to embed ANN scripts directly into the ERP framework, thus facilitating the DSS platform implementation and execution.
For this purpose, it is preferable to adopt the Python language, which can be easily embedded in different ERP frameworks. In previous literature the Long Short-Term Memory (LSTM) neural network has been adopted for predictive diagnostics, assuring good performance results [20–22]. Following this direction, the traditional ANN MLP prediction network, applied in the work [19] using a single attribute labeling, has been substituted by an LSTM neural network based on a multi-attribute analysis. The passage from the workflow implementation to the Python script is necessary in order to properly design a neural network embedded into an ERP platform, potentially enabling data processing automatisms. In order to check the performance of the upgraded network, the experimental dataset of [23,24] has been processed, representing a good dataset for testing LSTM neural networks. The experimental dataset [24] has been adopted in the literature for different data mining tests [24–29]. Specifically, in reference [25] the K-means algorithm has been applied for predicting diabetes, in reference [26] some authors applied synthetic data in order to balance a machine learning dataset model, while references [27–29] have analyzed different machine learning algorithms for diabetes prediction.

Concerning data mining algorithms, some researchers focused their attention on the formulation of decision tree models for Type 2 Diabetes Mellitus (T2DM) [30]. Other studies analyzed the sensitivity of machine learning algorithms to self-monitored blood glucose (SMBG) readings [31], thus enhancing the importance of constructing a good learning model. The deep learning approach has also been adopted for the prediction of blood glucose levels [32].
Furthermore, data mining algorithms can be applied for prediction and prevention of complications associated with diabetes [33,34]. According to the World Health Organization, the number of people with diabetes rose from 108 million in 1980 to 422 million in 2014. For this reason, a good DSS could support diagnostic prediction, thus facilitating diabetes care. This paper develops an LSTM neural network suitable for DSS platforms, upgrading the architecture of Figure 1 by adding the following specifications:

- LSTM Python script enabling software verticalization and integration in ERP platforms oriented on patient management;
- Integration of the LSTM neural network into the information system collecting patient information and patient data;
- Creation of different data models allowing data pre-processing and new facilities oriented on patient management;
- Creation of a prediction model based on the simultaneous analysis of multiple attributes;
- Adoption of artificial data in order to improve the training dataset;
- Possibility to choose the best prediction models by reading different model outputs.

2. Materials and Methods

Based on several studies, we found that a commonly used dataset for health data mining is the Pima Indians Diabetes Dataset from the University of California, Irvine (UCI) Machine Learning Database [24–29]. The dataset consists of several medical predictor (independent) variables and one target (dependent) variable, Outcome. Independent variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on:

- PregnanciesNumber (PN): number of pregnancies;
- GlucosePlasma (GP): glucose concentration (after 2 h of oral glucose tolerance test);
- BloodPressureDiastolic (BPD): blood pressure (mm Hg);
- SkinThicknessTriceps (STT): skin fold thickness (mm);
- Insulin2-Hour (I): serum insulin (mu U/mL);
- BMIBody (BMI): body mass index (weight in kg/(height in m)^2);
- DiabetesPedigreeFunctionDiabetes (DPFD): diabetes pedigree function;
- AgeAge (AA): age in years;
- OutcomeClass (OC): binary variable (0 indicates the no-diabetes status of 500 samples, and 1 indicates the diabetes status of the remaining 268 cases of the dataset).

Figure 2 illustrates the statistical distribution of the above listed attributes plotted by the RapidMiner tool.

Figure 2. (a–i) Attribute dataset statistic distribution.
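For concreteness, a minimal pandas sketch of the dataset loading and of the per-attribute statistics summarized in Figure 2 follows; the file name diabetes.csv and the headerless column order are our assumptions based on the attribute list above, not details taken from the paper:

# Load the Pima Indians Diabetes dataset and inspect its statistics.
import pandas as pd

COLUMNS = ["PN", "GP", "BPD", "STT", "I", "BMI", "DPFD", "AA", "OC"]

df = pd.read_csv("diabetes.csv", header=None, names=COLUMNS)

print(df.shape)                 # expected: (768, 9)
print(df.describe())            # mean, std, min/max per attribute (cf. Figure 2)
print(df["OC"].value_counts())  # class balance of the labeled outcome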
In general, before processing data with data mining algorithms, it is important to analyze the correlation between attributes in order to choose the less correlated variables: processing strongly correlated variables can introduce redundancies and calculus sensitivities into the system, which can alter the results and increase the data processing error or the prediction error. These considerations are also valid for LSTM processing. A method to estimate the correlation between variables, generating a weights vector based on these correlations, is the evaluation of Pearson's correlation coefficient. The algorithm calculates this coefficient, which is the covariance of the two variables divided by the product of their standard deviations [35,36]:

$$\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sigma_X \, \sigma_Y} \quad (1)$$

where cov(X,Y) is the covariance of the variables X and Y ($\sigma_{XY}$), and $\sigma_X$ and $\sigma_Y$ are the standard deviations of X and Y, respectively.

By observing the correlation matrix of Table 1 (data processing of the experimental dataset), it is clear that the attributes are not strongly correlated.

Table 1. Correlation matrix between experimental dataset attributes.

        PN      GP      BPD     STT     I       BMI     DPFD    AA      OC
PN      1       0.13    0.14   −0.08   −0.07    0.02   −0.03    0.54    0.22
GP      0.13    1       0.15    0.06    0.03    0.22    0.14    0.26    0.47
BPD     0.14    0.15    1       0.21    0.09    0.28    0.04    0.24    0.07
STT    −0.08    0.06    0.21    1       0.04    0.39    0.18   −0.11    0.07
I      −0.07    0.33    0.09    0.44    1       0.2     0.19   −0.04    0.13
BMI     0.02    0.22    0.28    0.39    0.02    1       0.14    0.04    0.29
DPFD   −0.03    0.14    0.04    0.18    0.19    0.14    1       0.03    0.17
AA      0.54    0.26    0.24   −0.11   −0.04    0.04    0.03    1       0.24
OC      0.22    0.47    0.07    0.07    0.1     0.29    0.17    0.24    1

A first check of correlation can also be performed by directly observing the 2D plots between couples of variables. Focusing on the OutcomeClass variable indicating diabetic status, it is evident from Figures 3–5 that the classes 1 and 0 are generally not distinguishable as functions of the other variables (data overlapping). This confirms the results found in the correlation matrix and provides information about sample dispersion.

Figure 3. Outcome versus DiabetesPedigree function.
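Table 1 can be reproduced with a one-line pandas call; a short sketch, reusing the DataFrame of the previous snippet:

# Pairwise Pearson coefficients (Equation (1)) over the nine attributes.
corr = df.corr(method="pearson")
print(corr.round(2))  # compare with Table 1

# Optional: flag the more correlated pairs; the 0.5 cut-off is illustrative.
strong = corr.abs() > 0.5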
Figure 4. Outcome versus Insulin function.

Figure 5. Outcome versus BMI function.

The prediction of the outcome variable (labeled variable) was performed by the LSTM neural network. The LSTM basic architecture [37] is composed of a cell, an input gate, an output gate, and a forget gate. Each cell recalls values over arbitrary time intervals, while the three gates regulate the flow of information into and out of the cell. Figure 6 sketches the LSTM neural network cell, where the input (input activation at time step t, i_t), output (output activation at time step t, o_t), and forget (forget activation at time step t, f_t) gates behave as neuron computations in a feed-forward or multi-layer neural network: the gates calculate their activations at time step t by considering the activation of the memory cell C at time step t−1. More details about the LSTM neural network models are in the script comments of Appendix A.

Figure 6. Long short-term memory (LSTM) calculation cell (the symbol × represents the multiplication operator between inputs, and ∫ represents the application of a differentiable function).
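The paper does not spell out the gate equations; for reference, the standard LSTM formulation [37] that Figure 6 depicts is (with $\sigma$ the logistic sigmoid, $\odot$ the element-wise product, and $W$, $U$, $b$ the input weights, recurrent weights, and biases of each gate, our notation):

$$\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tanh(W_C x_t + U_C h_{t-1} + b_C) \\
h_t &= o_t \odot \tanh(C_t)
\end{aligned}$$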
The output parameters indicating the LSTM performance are the model accuracy, the model loss, and the Receiver Operating Characteristic (ROC) curve indicating the Area Under the ROC Curve (AUC, a performance indicator). The loss value defines how well the LSTM neural network model behaves after each iteration of optimization (ideally, one would expect the reduction of loss after each, or several, iterations). The accuracy parameter is defined as:

$$\mathrm{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} = \frac{TP + TN}{TP + TN + FP + FN} \quad (2)$$

where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.

The loss function is a binary cross-entropy, used for problems involving yes/no (binary) decisions. For instance, in multi-label problems, where an example can belong simultaneously to multiple classes, the model tries to decide for each class whether the example belongs to that class or not. This performance indicator is estimated as:

$$\mathrm{Loss}(y, y_p) = -\sum_{i=0}^{N} \left[ y_i \log(y_{p,i}) + (1 - y_i) \log(1 - y_{p,i}) \right] \quad (3)$$

where y_p is the predicted value.

As calculation tools, the Keras API and the TensorFlow library have been adopted: Keras is a high-level API suitable for building and training deep learning models (LSTM), and TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks.
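As a concrete illustration of Equations (2) and (3), a minimal NumPy sketch (function and variable names are ours, not from the paper's script):

import numpy as np

def accuracy(y_true, y_prob, threshold=0.5):
    # Equation (2): fraction of correct predictions, (TP + TN) / total
    y_pred = (y_prob >= threshold).astype(int)
    return np.mean(y_pred == y_true)

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    # Equation (3): summed binary cross-entropy; clip to avoid log(0)
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.sum(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.6, 0.4])
print(accuracy(y, p), binary_cross_entropy(y, p))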
3. Results

In this section the LSTM neural network results are shown, enhancing some aspects of model consistency as a function of the training and testing dataset percentages used for the calculation.

Training and Testing Dataset

The training and the testing datasets were randomly extracted from the whole dataset of 768 records. This decreases the error calculation of the LSTM network by limiting data redundancy and, consequently, data correlation and sensitivity. Table 2 shows an extract of the output results indicating the diabetic outcome prediction, where the predicted OC is the output and the other listed variables are the input testing attributes.

Table 2. Example of predicted outcomes (diabetes prediction): OC is the labeled class.

PN   GP    BPD   STT   I     BMI    DPFD    AA   OC (Predicted)
6    148   72    35    0     33.6   0.627   50   1
1    85    66    29    0     26.6   0.351   31   0
8    183   64    0     0     23.3   0.672   32   1
1    89    66    23    94    28.1   0.167   21   0
0    137   40    35    168   43.1   2.288   33   1
5    116   74    0     0     25.6   0.201   30   0
3    78    50    32    88    31.0   0.248   26   1
10   115   0     0     0     35.3   0.134   29   0
2    197   70    45    543   30.5   0.158   53   1
8    125   96    0     0     0      0.232   54   1

In order to estimate the outcome prediction, the LSTM neural network has been applied by changing the partitioning between the experimental and training datasets. Different calculations have been performed by changing the testing dataset percentage (a sketch of this partitioning loop follows): in particular, Figures 7–11 illustrate the accuracy, the loss, and the ROC curve for testing dataset percentages of 5%, 10%, 15%, 20%, and 25%, respectively.
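The Appendix A script obtains its split through Keras' validation_split; an equivalent explicit partitioning over the five testing percentages could look like the following sketch (train_test_split is from scikit-learn; df and COLUMNS come from the Section 2 snippet; the fixed seed mirrors the one in the appendix script):

from sklearn.model_selection import train_test_split

X = df[COLUMNS[:8]].values
Y = df["OC"].values

for test_size in (0.05, 0.10, 0.15, 0.20, 0.25):
    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=test_size, random_state=42)
    # reshape to (samples, 8, 1), fit the Appendix A LSTM,
    # and record accuracy, loss, and AUC for this partitioning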
Figure 7. LSTM results (training dataset = 95%, testing dataset = 5%): (a) Model accuracy versus epochs; (b) model loss versus epochs; (c) receiver operating characteristics (ROC) curve.

Figure 8. LSTM results (training dataset = 90%, testing dataset = 10%): (a) Model accuracy versus epochs; (b) model loss versus epochs; (c) ROC curve.

Figure 9. LSTM results (training dataset = 85%, testing dataset = 15%): (a) Model accuracy versus epochs; (b) model loss versus epochs; (c) ROC curve.

Figure 10. LSTM results (training dataset = 80%, testing dataset = 20%): (a) Model accuracy versus epochs; (b) model loss versus epochs; (c) ROC curve.

Figure 11. LSTM results (training dataset = 75%, testing dataset = 25%): (a) Model accuracy versus epochs; (b) model loss versus epochs; (c) ROC curve.

The best convergence was observed for the model accuracy of Figure 10, thus confirming that a good balance between the test and training models was achieved (case of the testing dataset of 20%).
Evident overfitting was observed in the model accuracy of Figure 9, related to 15% of the testing dataset (no good balancing of parameters).

From the ROC curves the AUC values can be calculated. Table 3 summarizes the results of the AUC, accuracy, and loss of the adopted models, where the green color indicates a better result.

Table 3. LSTM neural network model and decision support systems (DSS) reading automatism: area under the curve (AUC), accuracy, and loss results.

Testing Samples   5%     10%    15%    20%   25%
AUC %             87.7   87     83.9   82    86.7
Accuracy %        75     73     70     75    76
Loss %            100    100    70     55    65

The red and the green colors refer to values above or below thresholds considered valid for LSTM outputs. Specifically, the following thresholds have been considered: 86% for the AUC, 75% for the accuracy, and 60% for the loss. The thresholds could be integrated into an automatic procedure able to select the best model to apply (a sketch of such a procedure follows). Appendix A lists the Python script used for the testing.
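A minimal sketch of such a selection automatism, hard-coding the Table 3 values and the thresholds stated above (the data structure and function names are ours, purely illustrative):

THRESHOLDS = {"auc": 86.0, "accuracy": 75.0, "loss": 60.0}

MODELS = {
    "5%":  {"auc": 87.7, "accuracy": 75, "loss": 100},
    "10%": {"auc": 87.0, "accuracy": 73, "loss": 100},
    "15%": {"auc": 83.9, "accuracy": 70, "loss": 70},
    "20%": {"auc": 82.0, "accuracy": 75, "loss": 55},
    "25%": {"auc": 86.7, "accuracy": 76, "loss": 65},
}

def passes(m):
    # one pass/fail flag per indicator, against the stated thresholds
    return {"auc": m["auc"] >= THRESHOLDS["auc"],
            "accuracy": m["accuracy"] >= THRESHOLDS["accuracy"],
            "loss": m["loss"] <= THRESHOLDS["loss"]}

for name, m in MODELS.items():
    print(name, passes(m))  # the 20% case passes both accuracy and loss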
The LSTM approach has been improved by implementing a new approach to training dataset construction based on artificial data creation (LSTM with artificial records, LSTM-AR-). For the case of the 20% testing dataset, characterized by the best compromise between the accuracy and loss parameters, a new training dataset has been created following these criteria (a sketch of the generation procedure is given after this list):

- Choose the attributes characterized by a higher correlation compared with the other attributes (in the case of study, insulin correlated with glucose, and skin thickness correlated with BMI);
- Split the dataset between patients having diabetes or not (first partition);
- Further split the first partition by considering the age (second partition);
- Split the second partition into a third one representing pregnant women (third partition);
- Change the correlated attributes by a low quantity for the value couples glucose and insulin (by increasing insulin, the glucose is decreased by the same amount in order to balance the parameter variation), and skin thickness and BMI, of the same person belonging to the same partition.

The goal of the proposed criteria is to generate artificial records improving the training dataset stability and the test accuracy. The increase in artificial data is important for all cases where only few training data are available, as in most practical cases. In the case of this study, a training dataset of 10,000 records has been created, of which only 768 are real. Cross validation has been performed on the traditional MLP method and on the LSTM using artificial records (LSTM-AR-).
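A sketch of the artificial-record generation following the criteria above; the perturbation size delta, the ten-year age banding, and the helper names are our illustrative assumptions (df is the 768-record DataFrame with the column names of Section 2):

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

def partition_key(df):
    # first partition: diabetes outcome; second: age band; third: pregnancy
    return [df["OC"], df["AA"] // 10, df["PN"] > 0]

def artificial_batch(group, delta=0.02):
    batch = group.copy()
    eps = rng.uniform(-delta, delta, size=len(batch))
    # balanced variation of the correlated couple (insulin, glucose):
    # what is added to one attribute is removed from the other
    batch["I"] *= 1 + eps
    batch["GP"] *= 1 - eps
    eps = rng.uniform(-delta, delta, size=len(batch))
    batch["STT"] *= 1 + eps
    batch["BMI"] *= 1 - eps
    return batch

def build_training_set(df, n_target=10_000):
    parts = [df]  # keep the 768 real records
    while sum(len(p) for p in parts) < n_target:
        parts.append(df.groupby(partition_key(df), group_keys=False)
                       .apply(artificial_batch))
    return pd.concat(parts, ignore_index=True).iloc[:n_target]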
Table 4 provides a benchmark comparing the test set accuracy of the traditional MLP [38], the traditional LSTM algorithm, and the innovative LSTM-AR- approach.

Table 4. Cross validation of results.

Method     Test Set Accuracy %
MLP        77.5 [38]
LSTM       75
LSTM-AR-   84

Observing the comparison, an efficiency increase of the LSTM-AR- of 9% is evident compared with the traditional LSTM approach, and of 6.5% compared with the MLP method optimized for the diabetes prediction model [38]. Figures 12–14 illustrate the accuracy, the loss, and the ROC curve of the LSTM-AR- outputs.

Figure 12. LSTM-AR- results (training dataset = 80%, testing dataset = 20%): Model accuracy versus epochs.

Figure 13. LSTM-AR- results (training dataset = 80%, testing dataset = 20%): Model loss versus epochs.

Figure 14. LSTM-AR- results (training dataset = 80%, testing dataset = 20%): ROC curve.

In particular, the model accuracy of Figure 12 proves that a good parameter balancing is achieved in terms of convergence, and no overfitting is observed. Table 5 reports the comparison between the traditional LSTM approach and the LSTM-AR- one, where an efficiency improvement of the LSTM-AR- is observed.

Table 5. LSTM/LSTM-AR models and DSS reading automatism: AUC, accuracy and loss results.

Testing Samples   LSTM (20%)   LSTM-AR (20%)
AUC %             82           89
Accuracy %        75           84
Loss %            55           50

For the other testing dataset percentages (5%, 10%, 15%, 25%) the same increase/decrease percentages have been observed as in the case of Table 5.

4. Discussion

The proposed results allow us to define guidelines to adopt for LSTM data processing, and in general for data analysis in health applications using a generic dataset. The main steps are summarized as follows:

- Calculation of the correlation matrix (analysis of correlation and weights between variables);
- Check of 2D variable functions (check of sample dispersion);
- Calculation of the LSTM prediction of diabetic outcomes by changing the partitioning between the testing and the training dataset;
- Choice procedures for the best LSTM model.

In order to apply the LSTM correctly, one approach is to balance both the loss and accuracy indicators. Observing Table 3, the case of the testing dataset of 20% represents a good case of this balancing but yields a relatively low AUC value compared with the other cases. For this purpose, it is important to compare the outputs of the model with the cases of good AUC performance related to testing samples of 5%, 10%, and 25%. This "cross comparison" will facilitate a better understanding of which samples can be classified in false positive or false negative classes.
Observing the correlation matrix results of Table 1, we note that GlucosePlasma (GP) and OutcomeClass (OC) are correlated by a factor of 0.47, and PregnanciesNumber (PN) and AgeAge (AA) are correlated by a factor of 0.54. These attributes could therefore contribute negatively to the model convergence and to the AUC values. In other dataset cases, the correlations between attributes can be stronger, adding further complexity to the LSTM output analysis. For this reason, it is important to compare the results of different models in order to find the best reading procedure involving:

- The extraction of outliers related to wrong measurements, to be neglected from the training and testing dataset;
- The combined analysis of the therapeutic plan of the monitored patient;
- The analysis of possible failures of the adopted sensors;
- A dynamical update of the training model by changing anomalous data records;
- The digital traceability of the assistance pattern in order to choose the patients more suitable to construct the training model;
- A pre-clustering of patients (data pre-processing performed by combining different attributes such as age, pathology, therapeutic plan, etc.).

We note that in medical and clinical analysis the AUC is considered as a classifier able to discriminate the capacity of a test (see Table 6) [39]. All the AUC values found during the test are classified as "moderately accurate test". Also for this reason, it is important to focus the attention on the convergence between the loss and accuracy parameters.

Table 6. AUC values [39].

AUC < 0.5           No sense test
AUC = 0.5           Non-informative test
0.5 < AUC ≤ 0.7     Inaccurate test
0.7 < AUC ≤ 0.9     Moderately accurate test
0.9 < AUC < 1       Highly accurate test
AUC = 1             Perfect test

The sensitivity of the LSTM neural network is then correlated with the specific model used and with the chosen dataset. The possibility to find common data patterns is then important to formulate a correct training dataset. The goal is to perform a preliminary cross-analysis by considering all the patient information collected into a database system (see Appendix B representing the adopted experimental database). The cross-analysis will contribute to creating the best LSTM training model. A good procedure to follow is:

- Phase 1: Collecting patient data (by means of a well-structured database system allowing different data mining processing);
- Phase 2: Pre-clustering and filtering of patient data (construction of a stable training dataset);
- Phase 3: Pre-analysis of correlations between attributes and analysis of data dispersions;
- Phase 4: Execution of the LSTM neural network algorithm by processing different attributes simultaneously (multi-attribute data processing);
- Phase 5: Comparison of results by changing the testing dataset;
- Phase 6: Choice of the best model to adopt following the analysis of phase 5.

We observe that by repeating the calculation on random testing datasets, the same range of values is obtained for the plots of Figures 8–15, thus confirming the validity of the result discussion.
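As a small illustration of the Table 6 reading automatism, a direct mapping from an AUC value to its class (the boundary handling follows Table 6; the function name is ours):

def classify_auc(auc):
    # discriminating capacity of a test according to Table 6 [39]
    if auc < 0.5:
        return "no sense test"
    if auc == 0.5:
        return "non-informative test"
    if auc <= 0.7:
        return "inaccurate test"
    if auc <= 0.9:
        return "moderately accurate test"
    if auc < 1.0:
        return "highly accurate test"
    return "perfect test"

print(classify_auc(0.82))  # 20% testing case of Table 3: moderately accurate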
The limitations and advantages of the proposed study are summarized in Table 7:

Table 7. Limitations and advantages of the proposed study.

Advantages                                              | Limitations
DSS tool for diabetes prediction ready to use           | Accurate training dataset required
Multi-attribute analysis                                | Redundancy of data processing (correlated attributes)
Reading procedure of output results                     | Presence of false positives and false negatives due to wrong measurements
Choice of the best model according to simultaneous      | Finding a true compromise of efficiency parameter values
analyses (accuracy, loss, and AUC)                      |
Network having a memory used for the data processing    | It is necessary to acquire a correct temporal data sequence
Powerful approach if compared with the ANN MLP method   | High computational cost

Concerning dataset optimization, the LSTM performance has been increased by adding artificial data into the training dataset, defining the DSS automatism represented by the flowchart of Figure 15: the LSTM neural network model is applied automatically when the training dataset is constructed with enough data; otherwise a new training dataset is formulated with artificial data (LSTM-AR- model) following the criteria discussed in Section 3. The flowchart of Figure 15 summarizes all the concepts discussed in this paper.

Figure 15. Flowchart representing automatisms for the LSTM/LSTM-AR- DSS model predicting diabetes (initial training dataset construction and correlation analysis; if the training dataset does not contain enough data, a new training dataset is constructed with artificial data; the testing dataset percentage is varied until a correct accuracy, loss, and AUC balancing yields the prediction results).
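A compact sketch of the Figure 15 logic; the minimum-size threshold and the train_and_evaluate stub are our illustrative assumptions, and build_training_set is the sketch given in Section 3:

MIN_TRAINING_RECORDS = 10_000  # illustrative threshold for "enough data"

def balanced(metrics):
    # accuracy/loss/AUC balancing check with the Section 3 thresholds
    return (metrics["accuracy"] >= 75 and metrics["loss"] <= 60
            and metrics["auc"] >= 70)

def train_and_evaluate(training, test_size):
    # placeholder: split, train the Appendix A LSTM, return the indicators
    return {"accuracy": 84, "loss": 50, "auc": 89}

def run_dss(df):
    if len(df) >= MIN_TRAINING_RECORDS:
        training = df                      # plain LSTM branch
    else:
        training = build_training_set(df)  # LSTM-AR- branch (Section 3)
    for test_size in (0.05, 0.10, 0.15, 0.20, 0.25):
        metrics = train_and_evaluate(training, test_size)
        if balanced(metrics):
            return metrics                 # prediction results
    return None  # no balanced model found: revise the training dataset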
In particular LSTM and the LSTM-AR- exhibit a percentage of Accuracy and Loss of 4% if compared with MLP results. improvement of Accuracy and Loss of 4% if compared with MLP results. Table 8. LSTM, LSTM-AR, and MLP models: AUC, accuracy and loss results by considering the dataset found in reference [40]. Testing Samples LSTM (20%) LSTM-AR (20%) MLP (20%) AUC % 91 91 94 Accuracy % 86 86 82 Loss % 10 10 14 In this case, the artificial records (454,300 artificial records) have been created by considering the sequential dataset by extracting sub- data set sequences with traditional sliding window approach. The MLP network is optimized for the new performed test (1 hidden layer enabling 30 neurons). Appendix C indicates the adopted MLP network. The adopted LSTM is the recurrent neural network—RNN—described in Appendix A (where sequential datasets will not be considered in structure reshaping). In this last case, the selected epochs number is 200 because over 200 there was no performance improvement. Figures 16 and 17 illustrate two graphs proving this cross validation method [41]. For all the other cases, the choice of the epochs number followed the same criterion. Appl. Sci. 2019, 9, x 15 of 23 Table 8. LSTM, LSTM-AR, and MLP models: AUC, accuracy and loss results by considering the dataset found in reference [40]. Testing Samples LSTM (20%) LSTM-AR (20%) MLP (20%) AUC % 91 91 94 Accuracy % 86 86 82 Loss % 10 10 14 In this case, the artificial records (454,300 artificial records) have been created by considering the sequential dataset by extracting sub- data set sequences with traditional sliding window approach. The MLP network is optimized for the new performed test (1 hidden layer enabling 30 neurons). Appendix C indicates the adopted MLP network. The adopted LSTM is the recurrent neural network—RNN—described in Appendix A (where sequential datasets will not be considered in structure reshaping). In this last case, the selected epochs number is 200 because over 200 there was no performance Appl. Sci. i2019 mpr,o 9v , emen 3532 t. Figures 16 and 17 illustrate two graphs proving this cross validation method [41]. For 15 of 22 all the other cases, the choice of the epochs number followed the same criterion. Appl. Sci. 2019, 9, x 16 of 23 Figure Fig16. ure 1 Accuracy 6. Accuracy plot plotusing using da dataset taset fofound und in r in efer refer ence ence [40]. [40]. Figure Figure 17. 1 Loss 7. Loss plot plousing t using dataset dataset fo found und in in refer refer encence e [40].[ 40]. The illustrated outputs are the average result of 25 trainings. The illustrated outputs are the average result of 25 trainings. As observed in Table 6, the LSTM-AR- approach is characterized by the same performance of As observed in Table 6, the LSTM-AR- approach is characterized by the same performance of the the LSTM method by confirming that it is suitable for all homecare platforms where not enough in LSTM method by confirming that it is suitable for all homecare platforms where not enough in the the training sequential dataset is available. training sequential dataset is available. 5. Conclusions 5. Conclusions The proposed paper shows how important the data sensitivity analysis in LSTM diabetes is, and predictions also considered patient attributes characterized by low correlations. The high The proposed paper shows how important the data sensitivity analysis in LSTM diabetes is, and sensitivity is mainly due to the creation of the training and testing dataset. 
Figure 16. Accuracy plot using the dataset found in reference [40].

Figure 17. Loss plot using the dataset found in reference [40].

The illustrated outputs are the average result of 25 trainings. As observed in Table 8, the LSTM-AR- approach is characterized by the same performance as the LSTM method, confirming that it is suitable for all homecare platforms where not enough sequential training data are available.

5. Conclusions

The proposed paper shows the importance of data sensitivity analysis in LSTM diabetes prediction, also considering patient attributes characterized by low correlations. The high sensitivity is mainly due to the creation of the training and testing datasets. The research focuses on the sensitivity analysis versus the testing dataset partitioning, by means of a stable experimental dataset tested in the literature. Following the performed analysis, a useful guideline has been formulated to execute correct data processing and analysis by means of the LSTM neural network algorithm, processing different patient attributes. The discussion is mainly focused on the simultaneous analysis and comparison of LSTM performance indicators such as accuracy, loss, and AUC. The study is completed by presenting the Python code used for the calculation and the database design of an information system providing more information suitable for data pre-processing and data processing. The structured database is integrated into the DSS information system oriented on homecare assistance, providing prediction results, different analysis models, and predicted health risks. The choice to use different test set sizes is dictated by the fact that many datasets are not available with a perfect sequential structure (missing values, non-periodical measurements, human measurement errors, exchanged records, etc.) and are characterized by different dimensions. For these reasons, a criterion has been formulated for a generic dataset by changing the testing size, where all the proposed results are the average results of 25 trainings. The work also proposes an innovative approach based on the construction of an efficient artificial training dataset based on the weak variation of correlated attributes. The approach, named LSTM-AR-, can be applied to other applications and datasets different from diabetes prediction, following the same logic improved for the proposed DSS automatism. The LSTM-AR- approach can be adopted for all platforms characterized by a poor training dataset.

Author Contributions: Conceptualization, A.M. and V.M.; methodology, V.M. and A.M.; software, V.M., D.C., D.G.; validation, A.M.; formal analysis, A.M.; investigation, A.G. and A.M.; resources, A.G.; data curation, A.M.; writing—original draft preparation, A.M.; supervision, A.G. and V.M.; project administration, A.G.

Funding: This research received no external funding.
Acknowledgments: The work has been developed in the frameworks of the project: "Piattaforma B.I. intelligente di management risorse e di monitoraggio costi di assistenza sanitaria 'Healthcare Assistance Platform: Management and Resources Allocation'". The authors gratefully thank the researchers: V. Calati, D. Carella, A. Colonna, R. Cosmo, G. Fanelli, R. Guglielmi, A. Leogrande, A. Lombardi, A. Lorusso, N. Malfettone, F. S. Massari, G. Meuli, L. Pellicani, R. Porfido, D. Suma, F. Tarulli, and E. Valenzano.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A

In this appendix the Python code used to check the adopted LSTM algorithm is listed.

# Visualize training history
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
import matplotlib.pyplot as plt
import numpy
from sklearn import preprocessing
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from matplotlib import pyplot

# Random seed (a fixed seed makes the run reproducible)
seed = 42
numpy.random.seed(seed)

# Dataset loading (csv format)
dataset = numpy.loadtxt("C:/user/pime_indian_paper/dataset/diabetes3.csv", delimiter=",")

# Dataset normalization
normalized = preprocessing.normalize(dataset, norm='max', axis=0, copy=True)

# Partitioning example: 80% as training set and 20% of samples as test dataset
X = normalized[:, 0:8]
Y = normalized[:, 8]

# Dataset structure: one column of eight rows (time sequence data format).
# We modify the dataset structure so that it has a column with 8 rows instead
# of a row with 8 columns (structure implementing a temporal sequence).
# For sequential datasets the following reshaping is not considered.
X = X.reshape(768, 8, 1)

# LSTM model creation.
# We will use an LSTM (Long Short-Term Memory) network. Recurrent networks
# take as input not only the current input example they see, but also what
# they have previously perceived. The decision taken by a recurrent network
# at time t-1 influences the decision it will reach a moment later, at time
# t: recurrent networks have two sources of input, the present and the
# recent past.
# We will use on each neuron the RELU activation function, which flattens
# the response to all negative values to zero, while leaving everything
# unchanged for values equal to or greater than zero (normalization).
# We will use on each neuron the ReLU activation function, which flattens the
# response of all negative values to zero, while leaving everything unchanged
# for values equal to or greater than zero.
model = Sequential()
model.add(LSTM(32, input_shape=(8, 1), return_sequences=True, kernel_initializer='uniform', activation='relu'))
model.add(LSTM(64, kernel_initializer='uniform', return_sequences=True, activation='relu'))
model.add(LSTM(128, kernel_initializer='uniform', activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Loss function.
# We compile the model using the Nadam optimizer, which combines the
# peculiarities of the RMSProp optimizer with the momentum concept.
# The loss function is computed through the binary cross-entropy.
model.compile(loss='binary_crossentropy', optimizer='nadam', metrics=['accuracy'])
model.summary()

# Fit the model
history = model.fit(X, Y, validation_split=0.33, epochs=300, batch_size=64, verbose=1)

# Graphical reporting: accuracy plot
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.savefig('accuracy.png')
plt.show()

# Loss plot
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.savefig('loss.png')
plt.show()

# Model saving
model.save('pima_indian.model')

# ROC curve
probs = model.predict_proba(X)
probs = probs[:, 0]
auc = roc_auc_score(Y, probs)
print('AUC: %.3f' % auc)
fpr, tpr, thresholds = roc_curve(Y, probs)
pyplot.plot([0, 1], [0, 1], linestyle='--')
pyplot.plot(fpr, tpr, marker='.')
pyplot.savefig('roc.png')
pyplot.show()
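The script above relies on Keras' validation_split to carve out the evaluation data. As a complementary, minimal sketch (an editorial addition, not part of the original appendix; build_model is a hypothetical helper wrapping the model definition above), the varying test-set percentages of Section 3 can also be reproduced with an explicit hold-out split:

from sklearn.model_selection import train_test_split

def build_model():
    # Hypothetical helper: rebuilds the Sequential LSTM defined above.
    m = Sequential()
    m.add(LSTM(32, input_shape=(8, 1), return_sequences=True, activation='relu'))
    m.add(LSTM(64, return_sequences=True, activation='relu'))
    m.add(LSTM(128, activation='relu'))
    m.add(Dense(1, activation='sigmoid'))
    m.compile(loss='binary_crossentropy', optimizer='nadam', metrics=['accuracy'])
    return m

# Example: hold out 20% of the records as a test set.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, random_state=seed)
model_20 = build_model()
model_20.fit(X_train, Y_train, epochs=300, batch_size=64, verbose=0)
test_loss, test_acc = model_20.evaluate(X_test, Y_test, verbose=0)
print('test accuracy: %.3f, test loss: %.3f' % (test_acc, test_loss))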
Appendix B

In this appendix section is indicated the whole dataset structure of the information system monitoring patients at home. Figure A1 illustrates the database layout design upgrading the information system architecture of Figure 1a.

Figure A1. Database structure of the information system oriented on homecare assistance monitoring and integrating LSTM algorithms.

Below are indicated the main requirements of the designed database enhancing possible facilities.

1. Model for alarm prediction

The patient ID, of sex SEX, having pathology pathology, born on BithDay, is examined at home by the id_devices providing measurements which will be stored in the database. The patient therapeutic status is indicated by id-therapia. Alarms is a tuple that contains the relative code_parameter together with the min and max values of the parameter that produced the alarm (thresholds).

2. Predictive model of patient's health status

It is possible to predict the status of patients by applying the LSTM algorithm to the historical data of the vital parameters dataset.

3. Classification of the adequacy of therapy for each patient who has experienced an alarm

All patients with id_parameters_threshold having a value above or below the threshold limit are involved in a particular therapeutic status identified by id_therap_status, and by particular measures of pathology. Id_therap is the therapy that the patient is following. Every patient with a pathology follows a specific therapeutic program. If the patient's state of health is recorded as critical, then it will be possible to use an LSTM-based program which, based on historical data, will provide support about the adequacy of his therapy.

4. Support for the diagnosis and prognosis of the disease

Starting from the analysis of historical data, it is possible to establish the temporal evolution of the pathology. For example, it is possible to identify the patient that is most "similar" to the current patient. The patient id_patient is hospitalized on the first day by communicating messages to the operator, who receives documents of a given typology (document_typology), containing filename (properly processed). The LSTM will provide a diagnostic indication of the pathology and a prognosis on its temporal evolution.

5. Evaluation of the human resources (operators)

The patient assistance operations will provide important information about Key Performance Indicators (KPI) of operators.

6. Documents data processing for the development of assisted diagnosis

The data processing of all the documents and files about a patient will allow the optimization of the homecare assistance process.

7. Analysis of the relationships between classes

Proper association rules allow us to obtain interesting relationships within the database attributes. In this case it is possible to establish relationships of the type: [id_patient, pathology] -> "parameters", thus supporting the prognostic.

8. Analysis of devices

The device records will contain the identification number of the device associated with the patient. All data of devices will be contained in tables associated with the patient ID.

9. Inspection of the pharmacological dosage administered to the patient

Each patient id_patient is associated with a therapy id_therapy. An important relationship to analyze is: [id_patient, id_therapy] -> "alarm".

10. Real-time criticality analysis

The constantly monitored patient conditions can be displayed in real time. Historical measures can be applied in order to predict critical moments.
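As an illustrative sketch of requirement 9 (an editorial addition; the file names and columns therapies.csv, alarms.csv, id_patient, and id_therapy are assumptions suggested by the identifiers above, not the actual schema of Figure A1):

import pandas as pd

# Hypothetical extracts of the tables sketched in Figure A1.
therapies = pd.read_csv('therapies.csv')  # columns: id_patient, id_therapy, ...
alarms = pd.read_csv('alarms.csv')        # columns: id_patient, code_parameter, ...

# Count alarms per (patient, therapy) pair: an empirical view of the
# [id_patient, id_therapy] -> "alarm" relationship.
merged = therapies.merge(alarms, on='id_patient', how='left')
n_alarms = merged.groupby(['id_patient', 'id_therapy']).size().rename('n_alarms')
print(n_alarms.sort_values(ascending=False).head())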
Appendix C

Below is listed the MLP script enabling data processing.

# Note: this fragment assumes the same imports as Appendix A, plus the
# optimizer class (from keras.optimizers import Nadam) and a custom AUC
# metric auroc, whose definition is not reported in the paper.
model = Sequential()
model.add(Dense(93, input_shape=(1200,), activation='relu'))
model.add(Dense(93, activation='relu'))
model.add(Dense(1, activation='relu'))
model.compile(metrics=['accuracy', auroc], optimizer=Nadam(lr=0.002, schedule_decay=0.004), loss='mean_squared_error')
model.summary()

Below is the reported model summary:

Layer (type)     Output Shape   Param #
dense_1 (Dense)  (None, 93)     111693
dense_2 (Dense)  (None, 93)     8742
dense_3 (Dense)  (None, 1)      94
Total params: 120,529
Trainable params: 120,529
Non-trainable params: 0

References

1. Wimmer, H.; Powell, L.M. A comparison of open source tools for data science. J. Inf. Syst. Appl. Res. 2016, 9, 4–12.
2. Al-Khoder, A.; Harmouch, H. Evaluating four of the most popular open source and free data mining tools. Int. J. Acad. Sci. Res. 2015, 3, 13–23.
3. Gulli, A.; Pal, S. Deep Learning with Keras: Implement Neural Networks with Keras on Theano and TensorFlow; Packt Publishing: Birmingham, UK, 2017; ISBN 978-1-78712-842-2.
4. Kovalev, V.; Kalinovsky, A.; Kovalev, S. Deep learning with Theano, Torch, Caffe, TensorFlow, and deeplearning4j: Which one is the best in speed and accuracy? In Proceedings of the XIII International Conference on Pattern Recognition and Information Processing, Minsk, Belarus, 3–5 October 2016; Belarus State University: Minsk, Belarus, 2016; pp. 99–103.
5. Li, J.-S.; Yu, H.-Y.; Zhang, X.-G. Data mining in hospital information system. In New Fundamental Technologies in Data Mining; Funatsu, K., Ed.; Intech: London, UK, 2011.
6. Goodwin, L.; VanDyne, M.; Lin, S. Data mining issues and opportunities for building nursing knowledge. J. Biomed. Inform. 2003, 36, 379–388. [CrossRef]
7. Belacel, N.; Boulassel, M.R. Multicriteria fuzzy assignment method: A useful tool to assist medical diagnosis. Artif. Intell. Med. 2001, 21, 201–207. [CrossRef]
8. Demšar, J.; Zupan, B.; Aoki, N.; Wall, M.J.; Granchi, T.H.; Beck, J.R. Feature mining and predictive model construction from severe trauma patient's data. Int. J. Med. Inform. 2001, 36, 41–50. [CrossRef]
9. Kusiak, A.; Dixon, B.; Shital, S. Predicting survival time for kidney dialysis patients: A data mining approach. Comput. Biol. Med. 2005, 35, 311–327. [CrossRef]
10. Yu, H.-Y.; Li, J.-S. Data mining analysis of inpatient fees in hospital information system. In Proceedings of the IEEE International Symposium on IT in Medicine & Education (ITME2009), Jinan, China, 14–16 August 2009.
11. Chae, Y.M.; Kim, H.S. Analysis of healthcare quality indicator using data mining and decision support system. Exp. Syst. Appl. 2003, 24, 167–172. [CrossRef]
12. Morando, M.; Ponte, S.; Ferrara, E.; Dellepiane, S. Definition of motion and biophysical indicators for home-based rehabilitation through serious games. Information 2018, 9, 105. [CrossRef]
13. Ozcan, Y.A. Quantitative Methods in Health Care Management, 2nd ed.; Jossey-Bass: San Francisco, CA, USA, 2009; pp. 10–44.
14. Ghavami, P.; Kapur, K. Artificial neural network-enabled prognostics for patient health management. In Proceedings of the IEEE Conference on Prognostics and Health Management (PHM), Denver, CO, USA, 18–21 June 2012.
15. Grossi, E. Artificial neural networks and predictive medicine: A revolutionary paradigm shift. In Artificial Neural Networks—Methodological Advances and Biomedical Applications, 1st ed.; Suzuki, K., Ed.; InTech: Rijeka, Croatia, 2011; Volume 1, pp. 130–150.
16. Adhikari, N.C.D. Prevention of heart problem using artificial intelligence. Int. J. Artif. Intell. Appl. 2018, 9, 21–35. [CrossRef]
17. Galiano, A.; Massaro, A.; Boussahel, B.; Barbuzzi, D.; Tarulli, F.; Pellicani, L.; Renna, L.; Guarini, A.; De Tullio, G.; Nardelli, G.; et al. Improvements in haematology for home health assistance and monitoring by a web based communication system. In Proceedings of the IEEE International Symposium on Medical Measurements and Applications (MeMeA), Benevento, Italy, 15–18 May 2016.
18. Massaro, A.; Maritati, V.; Savino, N.; Galiano, A.; Convertini, D.; De Fonte, E.; Di Muro, M. A study of a health resources management platform integrating neural networks and DSS telemedicine for homecare assistance. Information 2018, 9, 176. [CrossRef]
19. Massaro, A.; Maritati, V.; Savino, N.; Galiano, A. Neural networks for automated smart health platforms oriented on heart predictive diagnostic big data systems. In Proceedings of the AEIT 2018 International Annual Conference, Bari, Italy, 3–5 October 2018.
20. Saadatnejad, S.; Oveisi, M.; Hashemi, M. LSTM-based ECG classification for continuous monitoring on personal wearable devices. IEEE J. Biomed. Health Inform. 2019. [CrossRef] [PubMed]
21. Pham, T.; Tran, T.; Phung, D.; Venkatesh, S. Predicting healthcare trajectories from medical records: A deep learning approach. J. Biomed. Inform. 2017, 69, 218–229. [CrossRef] [PubMed]
22. Kaji, D.A.; Zech, J.R.; Kim, J.S.; Cho, S.K.; Dangayach, N.S.; Costa, A.B.; Oermann, E.K. An attention based deep learning model of clinical events in the intensive care unit. PLoS ONE 2019, 14, e0211057. [CrossRef] [PubMed]
23. Pima Indians Diabetes Database. Available online: https://gist.github.com/ktisha/c21e73a1bd1700294ef790c56c8aec1f (accessed on 27 August 2019).
24. Predict the Onset of Diabetes Based on Diagnostic Measures. Available online: https://www.kaggle.com/uciml/pima-indians-diabetes-database (accessed on 21 June 2019).
25. Wu, H.; Yang, S.; Huang, Z.; He, J.; Wang, X. Type 2 diabetes mellitus prediction model based on data mining. Inform. Med. Unlocked 2018, 10, 100–107. [CrossRef]
26. Luo, M.; Ke Wang, M.; Cai, Z.; Liu, A.; Li, Y.; Cheang, C.F. Using imbalanced triangle synthetic data for machine learning anomaly detection. Comput. Mater. Contin. 2019, 58, 15–26. [CrossRef]
27. Al Helal, M.; Chowdhury, A.I.; Islam, A.; Ahmed, E.; Mahmud, S.; Hossain, S. An optimization approach to improve classification performance in cancer and diabetes prediction. In Proceedings of the International Conference on Electrical, Computer and Communication Engineering (ECCE), Cox's Bazar, Bangladesh, 7–9 February 2019.
28. Li, T.; Fong, S. A fast feature selection method based on coefficient of variation for diabetics prediction using machine learning. Int. J. Extr. Autom. Connect. Health 2019, 1, 1–11. [CrossRef]
29. Puneet, M.; Singh, Y.A. Impact of preprocessing methods on healthcare predictions. In Proceedings of the 2nd International Conference on Advanced Computing and Software Engineering (ICACSE), Sultanpur, India, 8–9 February 2019.
30. Stranieri, A.; Yatsko, A.; Jelinek, H.F.; Venkatraman, S. Data-analytically derived flexible HbA1c thresholds for type 2 diabetes mellitus diagnostic. Artif. Intell. Res. 2016, 5, 111–134.
31. Sudharsan, B.; Peeples, M.M.; Shomali, M.E. Hypoglycemia prediction using machine learning models for patients with type 2 diabetes. J. Diabetes Sci. Technol. 2015, 9, 86–90. [CrossRef]
32. Mhaskar, H.N.; Pereverzyev, S.V.; Van Der Walt, M.D. A deep learning approach to diabetic blood glucose prediction. Front. Appl. Math. Stat. 2017, 3, 1–14. [CrossRef]
33. Contreras, I.; Vehi, J. Artificial intelligence for diabetes management and decision support: Literature review. J. Med. Internet Res. 2018, 20, 1–24. [CrossRef]
34. Bosnyak, Z.; Zhou, F.L.; Jimenez, J.; Berria, R. Predictive modeling of hypoglycemia risk with basal insulin use in type 2 diabetes: Use of machine learning in the LIGHTNING study. Diabetes Ther. 2019, 10, 605–615. [CrossRef] [PubMed]
35. Massaro, A.; Meuli, G.; Galiano, A. Intelligent electrical multi outlets controlled and activated by a data mining engine oriented to building electrical management. Int. J. Soft Comput. Artif. Intell. Appl. 2018, 7, 1–20. [CrossRef]
36. Myers, J.L.; Well, A.D. Research Design and Statistical Analysis, 2nd ed.; Lawrence Erlbaum: Mahwah, NJ, USA, 2003.
37. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef] [PubMed]
38. Mohapatra, S.K.; Mihir, J.K.S.; Mohanty, N. Detection of diabetes using multilayer perceptron. In International Conference on Intelligent Computing and Applications, Advances in Intelligent Systems and Computing; Bhaskar, M.A., Dash, S.S., Das, S., Panigrahi, B.K., Eds.; Springer: Singapore, 2019.
39. Swets, J.A. Measuring the accuracy of diagnostic systems. Science 1988, 240, 1285–1293. [CrossRef] [PubMed]
40. Diabetes Data Set. Available online: https://archive.ics.uci.edu/ml/datasets/Diabetes (accessed on 19 August 2019).
41. Chui, K.T.; Fung, D.C.L.; Lytras, M.D. Predicting at-risk University students in a virtual learning environment via a machine learning algorithm. Comput. Hum. Behav. 2018, in press. [CrossRef]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

The use of certified smart sensors transmitting data in a cloud network could remotely control the patients at home [17]. The sensors enabling homecare assistance can be implemented into a more complex hospital information system embedding automatic alerting conditions based on different risk levels [18]. In this direction, KNIME workflows can be easily interfaced as a Graphical User Interface (GUI) to the control room information system, thus allowing the connectivity with big data systems and the timed execution of data processes by cron job runs managing the multilayer perceptron (MLP) ANN analyses [19]. Figure 1a illustrates an example information system basic architecture of the MLP ANN network linked with the control room and big data system for homecare assistance [19], and Figure 1b schematizes the related KNIME workflow by distinguishing the data process phases such as time delay for the workflow execution, python node enabling data input access, data pre-processing, data model processing, and reporting [19].

Figure 1. (a) Architecture of homecare smart assistance platform based on artificial neural networks (ANN) data processing [19]; (b) Konstanz Information Miner (KNIME) workflow implementing a traditional ANN multi-layer perceptron (MLP) [19].

The ANN models implemented by workflows with objects are user friendly but cannot be easily implemented into Enterprise Resource Planning (ERP) software. For this purpose it is preferable to embed ANN scripts directly into the ERP framework, thus facilitating the DSS platform implementation and execution.
For this purpose, it is preferable to adopt the python language, which can be easily embedded in different ERP frameworks. In previous literature the Long Short-Term Memory (LSTM) neural network has been adopted for predictive diagnostics, assuring good performance results [20–22]. Following this direction, the traditional ANN MLP prediction network, applied in the work [19] using a single attribute labeling, has been substituted by an LSTM neural network based on a multi-attribute analysis. The passage from the workflow implementation to the python script is necessary in order to properly design a neural network embedded into an ERP platform, potentially enabling data processing automatisms. In order to check the performance of the upgraded network, the experimental dataset of [23,24] has been processed, representing a good dataset for testing the LSTM neural network. The experimental dataset [24] has been adopted in the literature for different data mining testing [24–29]. Specifically, in reference [25] the K-means algorithm has been applied for predicting diabetes, in reference [26] some authors applied synthetic data in order to balance a machine learning dataset model, while references [27–29] have analyzed different machine learning algorithms for diabetes prediction.

Concerning data mining algorithms, some researchers focused their attention on the formulation of decision tree models for Type 2 Diabetes Mellitus (T2DM) [30]. Other studies analyzed the sensitivity of machine learning algorithms about self-monitored blood glucose (SMBG) readings [31], thus enhancing the importance of constructing a good learning model. The deep learning approach has also been adopted for the prediction of blood glucose levels [32]. Furthermore, data mining algorithms can be applied for prediction and prevention of complications associated with diabetes [33,34].
According to the World Health Organization, the number of people with diabetes has risen from 108 million in 1980 to 422 million in 2014. For this reason, a good DSS could support diagnostic prediction, thus facilitating diabetes care. This paper develops an LSTM neural network suitable for DSS platforms, upgrading the architecture of Figure 1 by adding the following specifications:

- LSTM python script enabling software verticalization and integration in ERP platforms oriented on patient management;
- Integration of the LSTM neural network into the information system collecting patient information and patient data;
- Creation of different data models allowing data pre-processing and new facilities oriented on patient management;
- Creation of a prediction model based on the simultaneous analysis of multiple attributes;
- Adoption of artificial data in order to improve the training dataset;
- Possibility to choose the best prediction models by reading different model outputs.

2. Materials and Methods

Based on several studies, we found that a commonly used dataset for health data mining was the Pima Indians Diabetes Dataset from the University of California, Irvine (UCI) Machine Learning Database [24–29]. The dataset consists of several medical predictor (independent) variables and one target (dependent) variable, Outcome. Independent variables include the number of pregnancies the patient has had, their BMI, insulin level, age, and so on:

PregnanciesNumber (PN): Number of pregnancies;
GlucosePlasma (GP): Glucose concentration (after 2 h of oral glucose tolerance test);
BloodPressureDiastolic (BPD): Blood pressure (mm Hg);
SkinThicknessTriceps (STT): Skin fold thickness (mm);
Insulin2-Hour (I): Serum insulin (mu U/mL);
BMIBody (BMI): Mass index (weight in kg/(height in m)^2);
DiabetesPedigreeFunctionDiabetes (DPFD): Pedigree function;
AgeAge (AA): Years old;
OutcomeClass (OC): Binary variable (0 indicates the no-diabetes status of 500 samples, and 1 indicates the diabetes status of the remaining 268 cases of the dataset).

Figure 2 illustrates the statistic distribution of the above listed attributes plotted by the RapidMiner tool.

Figure 2. (a–i) Attribute dataset statistic distribution.
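The same distribution check can be reproduced outside RapidMiner; a minimal sketch follows (an editorial addition, assuming a local copy of the CSV used in Appendix A with the nine columns listed above):

import pandas as pd
import matplotlib.pyplot as plt

cols = ['PN', 'GP', 'BPD', 'STT', 'I', 'BMI', 'DPFD', 'AA', 'OC']
df = pd.read_csv('diabetes3.csv', header=None, names=cols)

# Per-attribute summary statistics (count, mean, std, min, quartiles, max),
# a numerical counterpart of the histograms of Figure 2.
print(df.describe())

# Histograms of all nine attributes.
df.hist(bins=20)
plt.show()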
In general, before processing data by data mining algorithms, it is important to analyze the correlation between attributes in order to choose the less correlated variables: by processing strongly correlated variables, redundancies and calculus sensitivities can be introduced into the system, which can alter the results and increase the data process error or the prediction error. These considerations are valid also for LSTM processing. A method to estimate the correlation between variables, generating a weights vector based on these correlations, is Pearson's correlation coefficient evaluation. The algorithm calculates this coefficient, which is the covariance of the two variables divided by the product of their standard deviations [35,36]:

\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sigma_X \, \sigma_Y}   (1)

being cov(X,Y) the covariance of the variables X and Y (\sigma_{XY}), and \sigma_X and \sigma_Y the standard deviations of the variables X and Y, respectively.

By observing the calculated correlation matrix of Table 1 (data processing of the experimental dataset) it is clear that all the attributes are not strongly correlated.

Table 1. Correlation matrix between experimental dataset attributes.

      PN     GP     BPD    STT    I      BMI    DPFD   AA     OC
PN    1      0.13   0.14   -0.08  -0.07  0.02   -0.03  0.54   0.22
GP    0.13   1      0.15   0.06   0.03   0.22   0.14   0.26   0.47
BPD   0.14   0.15   1      0.21   0.09   0.28   0.04   0.24   0.07
STT   -0.08  0.06   0.21   1      0.04   0.39   0.18   -0.11  0.07
I     -0.07  0.33   0.09   0.44   1      0.2    0.19   -0.04  0.13
BMI   0.02   0.22   0.28   0.39   0.02   1      0.14   0.04   0.29
DPFD  -0.03  0.14   0.04   0.18   0.19   0.14   1      0.03   0.17
AA    0.54   0.26   0.24   -0.11  -0.04  0.04   0.03   1      0.24
OC    0.22   0.47   0.07   0.07   0.1    0.29   0.17   0.24   1

A first check of correlation can also be performed by directly observing the 2D plots between couples of variables. By focusing the attention on the OutcomeClass variable indicating diabetic status, it was evident from Figures 3–5 that generally the classes 1 and 0 were not distinguished in the function of the other variables (data overlapping). This confirmed the results found in the correlation matrix and provided information about sample dispersion.

Figure 3. Outcome versus DiabetesPedigree function.

Figure 4. Outcome versus Insulin function.

Figure 5. Outcome versus BMI function.
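Table 1 can be reproduced directly from the dataset; a minimal sketch (an editorial addition, reusing the hypothetical df DataFrame from the earlier snippet):

# Pearson correlation matrix, i.e., Equation (1) applied pairwise to the
# eight predictors plus the outcome; compare with Table 1.
corr = df.corr(method='pearson')
print(corr.round(2))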
The prediction results about the outcome variable (labeled variable) were performed by the LSTM neural network. The LSTM basic architecture [37] was composed of a cell, an input gate, an output gate, and a forget gate. Each cell recalled values over arbitrary time intervals, while the 3 gates regulated the flow of information into and out of the cell. Figure 6 sketches a scheme of the LSTM neural network cell, where the input (input activation at the time step t, i_t), output (output activation at the time step t, o_t), and forget (forget activation at the time step t, f_t) gates behaved as neurons computing in a feed-forward or multi-layer neural network: the gates calculated their activations at time step t by considering the activation of the memory cell C at time step t-1. More details about the LSTM neural network models are in the script comments of Appendix A.

Figure 6. Long short-term memory (LSTM) calculation cell (symbol x represents the multiplication operator between inputs, and ∫ represents the application of a differentiable function).
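For reference, the gate activations sketched in Figure 6 can be written explicitly. The following is an editorial addition using the commonly adopted LSTM formulation with forget gates (a later refinement of [37]), where W, U, and b denote the learned weights and biases, \sigma the logistic function, and \odot the element-wise product:

f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
h_t = o_t \odot \tanh(c_t)

The cell state c_t carries the memory across time steps: the forget gate f_t decides what to discard from c_{t-1}, the input gate i_t what to write, and the output gate o_t what to expose as the hidden state h_t.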
The output parameters indicating the LSTM performance are the model accuracy, the model loss, and the Receiver Operating Characteristic (ROC) curve indicating the Area Under the ROC Curve (AUC) performance indicator. The loss value defines how well the LSTM neural network model behaves after each iteration of optimization (ideally, one would expect the reduction of loss after each, or several, iterations). The accuracy parameter is defined as:

\mathrm{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} = \frac{TP + TN}{TP + TN + FP + FN}   (2)

being TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.

The loss function is a binary cross-entropy used for problems involving yes/no (binary) decisions. For instance, in multi-label problems, where an example can belong simultaneously to multiple classes, the model tries to decide for each class whether the example belongs to that class or not. This performance indicator is estimated as:

\mathrm{Loss}(y, y_p) = -\sum_{i=0}^{N} \left[ y_i \log(y_{p,i}) + (1 - y_i)\log(1 - y_{p,i}) \right]   (3)

where y_p is the predicted value.

As calculation tools, the Keras API and the TensorFlow library have been adopted: Keras is a high-level API suitable for building and training deep learning models (LSTM), and TensorFlow is a free and open-source software library for dataflow and differentiable programming across a range of tasks.
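Equations (2) and (3) can be checked numerically; a minimal sketch (an editorial addition, using scikit-learn with hypothetical label and prediction arrays):

import numpy as np
from sklearn.metrics import accuracy_score, log_loss

# Hypothetical ground-truth labels and predicted probabilities.
y_true = np.array([1, 0, 1, 1, 0])
y_prob = np.array([0.9, 0.2, 0.6, 0.3, 0.1])
y_pred = (y_prob > 0.5).astype(int)

# Equation (2): accuracy = (TP + TN) / (TP + TN + FP + FN).
print('accuracy:', accuracy_score(y_true, y_pred))

# Equation (3): binary cross-entropy (log_loss averages over the samples).
print('binary cross-entropy:', log_loss(y_true, y_prob))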
3. Results

In this section the LSTM neural network results are shown, enhancing some aspects of model consistency in function of the training and testing dataset percentage used for the calculation.

Training and Testing Dataset

The training and the testing dataset were randomly extracted from the whole dataset made of 768 records. This allows a decrease in the error calculation of the LSTM network by limiting data redundancy and, consequently, data correlation and sensitivity. Table 2 illustrates a table extracted from the output results indicating the diabetic outcome prediction, where the predicted OC is the output and the other listed variables are the input testing attributes.

Table 2. Example of predicted outcomes (diabetes prediction): OC is the labeled class.

PN  GP   BPD  STT  I    BMI   DPFD   AA  OC (Predicted)
6   148  72   35   0    33.6  0.627  50  1
1   85   66   29   0    26.6  0.351  31  0
8   183  64   0    0    23.3  0.672  32  1
1   89   66   23   94   28.1  0.167  21  0
0   137  40   35   168  43.1  2.288  33  1
5   116  74   0    0    25.6  0.201  30  0
3   78   50   32   88   31.0  0.248  26  1
10  115  0    0    0    35.3  0.134  29  0
2   197  70   45   543  30.5  0.158  53  1
8   125  96   0    0    0     0.232  54  1

In order to estimate the outcome prediction, the LSTM neural network has been applied by changing the partitioning between the training and the testing dataset. Different calculations have been performed by changing the testing dataset percentage. In particular, Figures 7–11 illustrate the accuracy, the losses, and the ROC curve for the cases of testing dataset percentages of 5%, 10%, 15%, 20%, and 25%, respectively.
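A minimal sketch of how this sweep over testing percentages can be automated (an editorial addition; build_model is the hypothetical helper sketched after Appendix A, and X, Y, seed are the objects prepared there):

from sklearn.model_selection import train_test_split

results = {}
for test_frac in [0.05, 0.10, 0.15, 0.20, 0.25]:
    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=test_frac, random_state=seed)
    m = build_model()
    m.fit(X_tr, Y_tr, epochs=300, batch_size=64, verbose=0)
    loss, acc = m.evaluate(X_te, Y_te, verbose=0)
    results[test_frac] = {'accuracy': acc, 'loss': loss}  # compare with Table 3

print(results)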
Figure 7. LSTM results (training dataset = 95%, testing dataset = 5%): (a) Model accuracy versus epochs; (b) model loss versus epochs; (c) receiver operating characteristics (ROC) curve.

Figure 8. LSTM results (training dataset = 90%, testing dataset = 10%): (a) Model accuracy versus epochs; (b) model loss versus epochs; (c) ROC curve.

Figure 9. LSTM results (training dataset = 85%, testing dataset = 15%): (a) Model accuracy versus epochs; (b) model loss versus epochs; (c) ROC curve.

Figure 10. LSTM results (training dataset = 80%, testing dataset = 20%): (a) Model accuracy versus epochs; (b) model loss versus epochs; (c) ROC curve.

Figure 11. LSTM results (training dataset = 75%, testing dataset = 25%): (a) Model accuracy versus epochs; (b) model loss versus epochs; (c) ROC curve.

The best convergence was observed for the model accuracy of Figure 10, thus confirming that a good balancing between test and train model was achieved (case of testing dataset of 20%).
Evident overfitting was observed in the model accuracy of Figure 9, related to the 15% testing dataset (no good balancing of parameters). From the ROC curves the AUC values can be calculated. Summarized in Table 3 are the results of the AUC, accuracy, and loss of the adopted models, where the green color indicates a better result.

Table 3. LSTM neural network model and decision support systems (DSS) reading automatism: area under the curve (AUC), accuracy, and loss results.

Testing Samples  5%    10%   15%   20%   25%
AUC %            87.7  87    83.9  82    86.7
Accuracy %       75    73    70    75    76
Loss %           100   100   70    55    65

The red and the green colors refer to values above or below thresholds considered valid for LSTM outputs. Specifically, the following thresholds have been considered: 86% for the AUC, 75% for the accuracy, and 60% for the loss. The thresholds could be integrated into an automatic procedure able to select the best model to apply. In Appendix A is listed the python script used for the testing.
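A minimal sketch of such an automatic threshold check (an editorial addition; the threshold values are those listed above, the metric dictionary mirrors Table 3, and the tie-breaking rule is one possible reading of the accuracy/loss balancing criterion):

# Thresholds considered valid for the LSTM outputs.
THRESHOLDS = {'auc': 86.0, 'accuracy': 75.0, 'loss': 60.0}

# AUC, accuracy, and loss per testing-set percentage, as in Table 3.
models = {
    '5%':  {'auc': 87.7, 'accuracy': 75, 'loss': 100},
    '10%': {'auc': 87.0, 'accuracy': 73, 'loss': 100},
    '15%': {'auc': 83.9, 'accuracy': 70, 'loss': 70},
    '20%': {'auc': 82.0, 'accuracy': 75, 'loss': 55},
    '25%': {'auc': 86.7, 'accuracy': 76, 'loss': 65},
}

def passed(m):
    # One flag per indicator: AUC and accuracy must reach their thresholds,
    # loss must stay at or below its threshold.
    return {'auc': m['auc'] >= THRESHOLDS['auc'],
            'accuracy': m['accuracy'] >= THRESHOLDS['accuracy'],
            'loss': m['loss'] <= THRESHOLDS['loss']}

# Select the model satisfying most thresholds, breaking ties by lower loss.
best = max(models, key=lambda k: (sum(passed(models[k]).values()), -models[k]['loss']))
print('selected partitioning:', best, passed(models[best]))  # selects the 20% case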
The LSTM approach has been improved by implementing a new approach to the training dataset construction based on artificial data creation (LSTM artificial records, AR). In the case of 20% of the testing dataset, characterized by the best compromise between the accuracy and loss parameters, a new training dataset has been created following these criteria:

- Choose the attributes characterized by a higher correlation if compared with other attributes (in the case of study, insulin correlated with glucose, and skin thickness correlated with BMI);
- Split the dataset for patients having diabetes or not (first partition);
- The first partition has been furthermore split by considering the age (second partition);
- The second partition is then split into a third one representing pregnant women (third partition);
- Change the correlated attributes by a low quantity for the value couples glucose and insulin (by increasing insulin, the glucose is decreased by the same entity in order to balance the parameter variation), and skin thickness and BMI, of the same person belonging to the same partition.

The goal of the proposed criteria is to generate artificial records improving the training dataset stability and test accuracy. The increase in artificial data is important for all cases where only few data of the training dataset are available, as for more practical cases. In the case of this study, a training dataset of 10,000 records has been created, where only 768 are real.
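A minimal sketch of the record-perturbation idea (an editorial addition: a loose interpretation of the criteria above, not the authors' actual generator; the partition bins, the perturbation size delta, the helper name make_artificial, and the reuse of the hypothetical df DataFrame are illustrative assumptions):

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

def make_artificial(df, n_target=10000, delta=0.02):
    # Partition by outcome, age band, and pregnancy status (criteria 2-4).
    bands = pd.cut(df['AA'], bins=[0, 30, 50, 120], labels=False)
    groups = df.groupby([df['OC'], bands, df['PN'] > 0], group_keys=False)
    records = [df]
    while sum(len(r) for r in records) < n_target:
        # Sample real records within each partition and weakly perturb the
        # correlated couples (criterion 5): insulin/glucose and STT/BMI.
        sample = groups.apply(lambda g: g.sample(min(len(g), 5))).copy()
        eps = delta * rng.standard_normal(len(sample))
        sample['I'] = sample['I'] * (1 + eps)
        sample['GP'] = sample['GP'] * (1 - eps)    # balanced opposite shift
        eps2 = delta * rng.standard_normal(len(sample))
        sample['STT'] = sample['STT'] * (1 + eps2)
        sample['BMI'] = sample['BMI'] * (1 - eps2)
        records.append(sample)
    return pd.concat(records, ignore_index=True).iloc[:n_target]

augmented = make_artificial(df)
print(augmented.shape)  # (10000, 9): 768 real records plus artificial ones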
The cross validation has been performed on MLP traditional methods, and on LSTM using artificial records (LSTM-AR-). In Table 4, a benchmarking of the test set accuracy parameter is provided between the traditional MLP [38], the LSTM traditional algorithm, and the innovative LSTM-AR- approach.

Table 4. Cross validation of results.

Method    Test Set Accuracy %
MLP       77.5 [38]
LSTM      75
LSTM-AR-  84

Observing the comparison, an efficiency increase of the LSTM-AR- of 9% is evident if compared with the LSTM traditional approach, and of 6.5% if compared with the MLP method optimized for the diabetes prediction model [38]. Figures 12–14 illustrate the accuracy, the loss, and the ROC curve of the LSTM-AR- outputs.

Figure 12. LSTM-AR- results (training dataset = 80%, testing dataset = 20%): Model accuracy versus epochs.

Figure 13. LSTM-AR- results (training dataset = 80%, testing dataset = 20%): Model loss versus epochs.

Figure 14. LSTM-AR- results (training dataset = 80%, testing dataset = 20%): ROC curve.

In particular, the model accuracy of Figure 12 proves that a good parameter balancing is achieved in terms of convergence, and no overfitting is observed. Table 5 reports the comparison between the traditional LSTM approach and the LSTM-AR- one, where an efficiency improvement performed by LSTM-AR- is observed.

Table 5. LSTM/LSTM-AR models and DSS reading automatism: AUC, accuracy, and loss results.

Testing Samples  LSTM (20%)  LSTM-AR (20%)
AUC %            82          89
Accuracy %       75          84
Loss %           55          50

For the other testing datasets (5%, 10%, 15%, 25%) the same increase/decrease percentage has been observed as in the case of Table 5.

4. Discussion

The proposed results allow us to define guidelines to adopt for LSTM data processing, in general for data analysis in health applications using a generic dataset. The main steps are summarized as follows:

- Calculation of the correlation matrix (analysis of correlation and weights between variables);
- Check of 2D variable functions (check of sample dispersion);
- Calculation of the LSTM prediction of diabetic outcomes by changing the partitioning between the testing and the training dataset;
- Choice procedures of the best LSTM model.

In order to apply the LSTM correctly, one approach is to balance both the loss and accuracy indicators. By observing Table 3, the case of the testing dataset of 20% represents a good case of this balancing, but shows a relatively low AUC value if compared with the other cases. For this purpose, it is important to compare the outputs of the model with the cases of good AUC performance related to the testing samples of 5%, 10%, and 25%. This "cross comparison", sketched below, will facilitate a better understanding of which samples can be classified in false positive or false negative classes.
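A minimal sketch of this cross comparison (an editorial addition; fitted_models is a hypothetical dict of the models trained in the sweep of Section 3, keyed by testing percentage, and X_eval, Y_eval a hypothetical common evaluation split):

import numpy as np

fp_sets, fn_sets = [], []
for frac, m in fitted_models.items():
    pred = (m.predict(X_eval)[:, 0] > 0.5).astype(int)
    fp_sets.append(set(np.where((pred == 1) & (Y_eval == 0))[0]))
    fn_sets.append(set(np.where((pred == 0) & (Y_eval == 1))[0]))

# Records misclassified by every model are the strongest candidates for the
# false positive / false negative classes discussed above.
print('persistent false positives:', sorted(set.intersection(*fp_sets)))
print('persistent false negatives:', sorted(set.intersection(*fn_sets)))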
Observing the correlation matrix results of Table 1, we note that GlucosePlasma (GP) and OutcomeClass (OC) are correlated by a factor of 0.47, and PregnanciesNumber (PN) and AgeAge (AA) are correlated by a factor of 0.57. These attributes could therefore contribute negatively to the model convergence and to the AUC values. In other dataset cases, the correlations between attributes can be stronger, adding further complexity to the LSTM output analysis. For this reason, it is important to compare the results of different models in order to find the best reading procedure, involving:
- The extraction of outliers related to wrong measurements, to be neglected in the training and testing dataset;
- The combined analysis of the therapeutic plan of the monitored patient;
- The analysis of possible failures of the adopted sensors;
- A dynamical update of the training model by changing anomalous data records;
- The digital traceability of the assistance pattern, in order to choose the patients most suitable to construct the training model;
- A pre-clustering of patients (data pre-processing performed by combining different attributes such as age, pathology, therapeutic plan, etc.).

We note that in medical and clinical analysis the AUC is considered a classifier able to discriminate the capacity of a test (see Table 6) [39]. All the AUC values found during the test are classified as "moderately accurate test". Also for this reason, it is important to focus the attention on the convergence between the Loss and Accuracy parameters (a code transcription of Table 6 is reported at the end of this passage).

Table 6. AUC values [39].

AUC value                        Classification of the discriminating capacity of a test
AUC < 0.5 (50%)                  No sense test
AUC = 0.5 (50%)                  Non-informative test
0.5 (50%) < AUC <= 0.7 (70%)     Inaccurate test
0.7 (70%) < AUC <= 0.9 (90%)     Moderately accurate test
0.9 (90%) < AUC < 1 (100%)       Highly accurate test
AUC = 1 (100%)                   Perfect test

The sensitivity of the LSTM neural network is then correlated with the specific model used and with the chosen dataset. The possibility to find common data patterns is then important to formulate a correct training dataset. The goal is to perform a preliminary cross-analysis by considering all the patient information, which is collected into a database system (see Appendix B, representing the adopted experimental database). The cross-analysis will contribute to creating the best LSTM training model. A good procedure to follow is:
- Phase 1: Collecting patient data (by means of a well-structured database system allowing different data mining processing);
- Phase 2: Pre-clustering and filtering of patient data (construction of a stable training dataset);
- Phase 3: Pre-analysis of correlations between attributes and analysis of data dispersions;
- Phase 4: Execution of the LSTM neural network algorithm by processing different attributes simultaneously (multi-attribute data processing);
- Phase 5: Comparison of results by changing the testing dataset;
- Phase 6: Choice of the best model to adopt following the analysis of phase 5.

We observe that, by repeating the calculation with random testing datasets, the same range of values is obtained in the plots of Figures 8–15, thus confirming the validity of the result discussion.
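For the DSS reading automatism, the classification of Table 6 can be transcribed directly into a small helper; this is a minimal sketch whose bin edges follow the table above.

# Transcription of Table 6: classify the discriminating capacity of a test
# from its AUC value (bin edges as reported by Swets [39]).
def classify_auc(auc):
    if auc < 0.5:
        return "No sense test"
    if auc == 0.5:
        return "Non-informative test"
    if auc <= 0.7:
        return "Inaccurate test"
    if auc <= 0.9:
        return "Moderately accurate test"
    if auc < 1.0:
        return "Highly accurate test"
    return "Perfect test"

print(classify_auc(0.82))  # the 82% AUC of Table 5 -> "Moderately accurate test"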
The limitations and advantages of the proposed study are summarized in the following Table 7:

Table 7. Limitations and advantages of the proposed study.

Advantages                                                  | Limitations
DSS tool for diabetes prediction ready to use               | An accurate training dataset is required
Multi-attribute analysis                                    | Redundancy of data processing (correlated attributes)
Reading procedure of the output results                     | Presence of false positives and false negatives due to wrong measurements
Choice of the best model according to simultaneous         | Finding a true compromise of efficiency parameter
analyses (accuracy, loss, and AUC)                          | values
Network having a memory used for the data processing        | It is necessary to acquire a correct temporal data sequence
Powerful approach if compared with the ANN MLP method       | High computational cost

Concerning dataset optimization, the LSTM performance has been increased by adding artificial data into the training dataset, defining the DSS automatism represented by the flowchart of Figure 15: the LSTM neural network model is applied automatically when the training dataset is constructed with enough data; otherwise, a new training dataset is formulated from artificial data (LSTM-AR- model), following the criteria discussed in Section 3. The flowchart of Figure 15 summarizes all the concepts discussed in this paper.

Figure 15. Flowchart representing automatisms for LSTM/LSTM-AR- DSS model predicting diabetes.
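The decision logic of the flowchart can be sketched as follows. This is a minimal illustration under explicit assumptions: the minimum record count, the balancing criterion, and the two injected callables standing in for the model training and the artificial-data generation of Section 3 are hypothetical placeholders, not the paper's implementation.

# Sketch of the Figure 15 automatism; min_records, the balancing test, and
# the injected callables are illustrative assumptions.
def dss_automatism(training_set, train_and_evaluate, build_artificial_dataset,
                   min_records=1000):
    # Decision box: is the training dataset constructed with enough data?
    if len(training_set) < min_records:
        training_set = build_artificial_dataset(training_set)  # LSTM-AR- branch
        model_name = "LSTM-AR-"
    else:
        model_name = "LSTM"
    # Testing dataset percentage variation until accuracy/loss/AUC are balanced
    for test_fraction in (0.05, 0.10, 0.15, 0.20, 0.25):
        accuracy, loss, auc = train_and_evaluate(training_set, test_fraction)
        if auc >= 0.7 and abs(accuracy - (1.0 - loss)) <= 0.15:  # crude balancing check (assumption)
            return model_name, (accuracy, loss, auc)             # prediction results
    return model_name, None  # no balanced configuration found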
In order to test the LSTM-AR- algorithm on a time series dataset, the sequential dataset of reference [40] (9086 time series data generated by 70 patients) has been considered. This dataset is suitable for many architectures related to homecare smart assistance platforms. By observing the results of Table 8, it is clear that the LSTM and LSTM-AR- approaches are characterized by the same performances. In particular, LSTM and LSTM-AR- exhibit a percentage improvement of Accuracy and Loss of 4% if compared with the MLP results.

Table 8. LSTM, LSTM-AR, and MLP models: AUC, accuracy, and loss results by considering the dataset found in reference [40].

Testing Samples    LSTM (20%)    LSTM-AR (20%)    MLP (20%)
AUC %              91            91               94
Accuracy %         86            86               82
Loss %             10            10               14

In this case, the artificial records (454,300 artificial records) have been created from the sequential dataset by extracting sub-dataset sequences with the traditional sliding window approach (a minimal sketch of this extraction is reported at the end of this section). The MLP network is optimized for the new performed test (1 hidden layer enabling 30 neurons); Appendix C indicates the adopted MLP network. The adopted LSTM is the recurrent neural network (RNN) described in Appendix A (where, for sequential datasets, the structure reshaping is not applied). In this last case, the selected number of epochs is 200, because above 200 no performance improvement was observed. Figures 16 and 17 illustrate two graphs proving this cross-validation method [41]. For all the other cases, the choice of the number of epochs followed the same criterion.

Figure 16. Accuracy plot using the dataset found in reference [40].

Figure 17. Loss plot using the dataset found in reference [40].

The illustrated outputs are the average result of 25 trainings. As observed in Table 8, the LSTM-AR- approach is characterized by the same performance as the LSTM method, confirming that it is suitable for all homecare platforms where not enough training sequential data is available.
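The sliding-window extraction used above to generate the artificial records can be sketched as follows; the window length and stride are illustrative assumptions, not the settings that produced the 454,300 records.

# Artificial record creation by sliding-window extraction of sub-sequences
# from a sequential signal (window and stride values are assumptions).
import numpy as np

def sliding_windows(series, window=8, stride=1):
    # Return all overlapping sub-sequences of the given length
    return np.array([series[i:i + window]
                     for i in range(0, len(series) - window + 1, stride)])

signal = np.arange(20, dtype=float)  # stand-in for one patient's measurement series
records = sliding_windows(signal)
print(records.shape)                 # (13, 8): each row is one artificial record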
5. Conclusions

The proposed paper shows how important the data sensitivity analysis in LSTM diabetes prediction is, also considering patient attributes characterized by low correlations. The high sensitivity is mainly due to the creation of the training and testing dataset. The research is focused on the sensitivity analysis versus the testing dataset partitioning, by means of a stable experimental dataset tested in the literature. Following the performed analysis, a useful guideline to execute correct data processing and analysis by means of the LSTM neural network algorithm, processing different patient attributes, has been formulated. The discussion is mainly focused on the simultaneous analysis and comparison of the LSTM performance indicators such as accuracy, loss, and AUC. The study is completed by presenting the Python code used for the calculation and the database design of an information system providing more information suitable for the data pre-processing and for the data processing. The structured database is integrated into the DSS information system oriented on homecare assistance, providing prediction results, different analysis models, and predicted health risks. The choice to use different test set sizes is dictated by the fact that many datasets are not available with a perfect sequential structure (missing values, non-periodical measurements, human measurement errors, exchanged records, etc.), and are characterized by different dimensions. For these reasons, a criterion has been formulated for a generic dataset by changing the testing size, where all the proposed results are the average results of 25 trainings. The work also proposes an innovative approach based on the construction of an efficient artificial training dataset, based on the weak variation of correlated attributes. The approach, named LSTM-AR-, can be applied to other applications and datasets different from diabetes prediction, following the same logic improved for the proposed DSS automatism. The LSTM-AR- approach can be adopted for all the platforms characterized by a poor training dataset.

Author Contributions: Conceptualization, A.M. and V.M.; methodology, V.M. and A.M.; software, V.M., D.C., and D.G.; validation, A.M.; formal analysis, A.M.; investigation, A.G. and A.M.; resources, A.G.; data curation, A.M.; writing—original draft preparation, A.M.; supervision, A.G. and V.M.; project administration, A.G.

Funding: This research received no external funding.
Acknowledgments: The work has been developed in the framework of the project: "Piattaforma B.I. intelligente di management risorse e di monitoraggio costi di assistenza sanitaria 'Healthcare Assistance Platform: Management and Resources Allocation'". The authors gratefully thank the researchers: V. Calati, D. Carella, A. Colonna, R. Cosmo, G. Fanelli, R. Guglielmi, A. Leogrande, A. Lombardi, A. Lorusso, N. Malfettone, F. S. Massari, G. Meuli, L. Pellicani, R. Porfido, D. Suma, F. Tarulli, and E. Valenzano.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A

This appendix lists the Python code used for the check of the adopted LSTM algorithm.

# Visualize training history
from keras.models import Sequential
from keras.layers import LSTM
from keras.layers import Dense
import matplotlib.pyplot as plt
import numpy
from sklearn import preprocessing
from sklearn.metrics import roc_curve
from sklearn.metrics import roc_auc_score
from matplotlib import pyplot

# A random seed is fixed for reproducibility
seed = 42
numpy.random.seed(seed)

# Dataset loading (csv format)
dataset = numpy.loadtxt("C:/user/pime_indian_paper/dataset/diabetes3.csv", delimiter=",")

# Dataset normalization (max norm, per column)
normalized = preprocessing.normalize(dataset, norm='max', axis=0, copy=True)

# Partitioning example: 80% as training set and 20% as test dataset
X = normalized[:, 0:8]
Y = normalized[:, 8]

# Dataset structure: one column of eight rows (time-sequence data format).
# The dataset structure is modified so that each sample is a column with
# 8 rows instead of a row with 8 columns (a structure implementing a
# temporal sequence). For sequential datasets this reshaping is not applied.
X = X.reshape(768, 8, 1)

# LSTM model creation.
# An LSTM (Long Short-Term Memory) network is used. Recurrent networks take
# as input not only the current input example, but also what they have
# previously perceived: the decision taken at time t-1 influences the
# decision reached a moment later at time t, so recurrent networks have
# two sources of input, the present and the recent past.
# On each neuron the ReLU activation function is used: it flattens the
# response to zero for all negative values, while leaving values equal to
# or greater than zero unchanged.
model = Sequential()
model.add(LSTM(32, input_shape=(8, 1), return_sequences=True, kernel_initializer='uniform', activation='relu'))
model.add(LSTM(64, kernel_initializer='uniform', return_sequences=True, activation='relu'))
model.add(LSTM(128, kernel_initializer='uniform', activation='relu'))
model.add(Dense(256, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(64, activation='relu'))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

# Loss function.
# The model is compiled with the Nadam optimizer, which combines the
# peculiarities of the RMSProp optimizer with the momentum concept; the
# loss is computed through the binary cross-entropy.
model.compile(loss='binary_crossentropy', optimizer='nadam', metrics=['accuracy'])
model.summary()

# Fit the model
history = model.fit(X, Y, validation_split=0.33, epochs=300, batch_size=64, verbose=1)

# Graphical reporting: accuracy plot
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.savefig('accuracy.png')
plt.show()

# Outputs plotting: loss plot
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.savefig('loss.png')
plt.show()

# Model saving
model.save('pima_indian.model')

# ROC curve
probs = model.predict_proba(X)
probs = probs[:, 0]
auc = roc_auc_score(Y, probs)
print('AUC: %.3f' % auc)
fpr, tpr, thresholds = roc_curve(Y, probs)
pyplot.plot([0, 1], [0, 1], linestyle='--')
pyplot.plot(fpr, tpr, marker='.')
pyplot.savefig('roc.png')
pyplot.show()

Appendix B

This appendix section indicates the whole dataset structure of the information system monitoring patients at home. Figure A1 illustrates the database layout design, upgrading the information system architecture of Figure 1a.

Figure A1. Database structure of the information system oriented on homecare assistance monitoring and integrating LSTM algorithms.

Below are indicated the main requirements of the designed database enhancing possible facilities.

1. Model for alarm prediction

The patient ID, of sex SEX, having pathology pathology, born on BithDay, is examined at home by the id_devices providing measurements which will be stored in the database. The patient therapeutic status is indicated by id-therapia. Alarms is a tuple that contains the relative code_parameter together with the min and max values of the parameter that produced the alarm (thresholds).

2. Predictive model of patient's health status

It is possible to predict the status of patients by applying the LSTM algorithm based on the historical data processing of the vital parameters dataset.
3. Classification of the adequacy of therapy for each patient who has experienced an alarm

All patients with id_parameters_threshold having a value above or below the threshold limit are involved in a particular therapeutic status identified by id_therap_status, and by particular measures of pathology. Id_therap is the therapy that the patient is following. Every patient with a pathology follows a specific therapeutic program. If the patient's state of health is recorded as critical, then it will be possible to use an LSTM-based program which, based on historical data, will provide support about the adequacy of his therapy.

4. Support for the diagnosis and prognosis of the disease

Starting with the analysis of historical data, it is possible to establish the temporal evolution of the pathology. For example, it is possible to identify the patient that is most "similar" to the current patient. The patient id_patient is hospitalized on the first day by communicating messages to the operator, who receives documents typology (document_typology), containing filename (properly processed). The LSTM will provide a diagnostic indication of the pathology and a prognosis on its temporal evolution.

5. Evaluation of the human resources (operators)

The patient assistance operations will provide important information about the Key Performance Indicators (KPIs) of operators.

6. Documents data processing for the development of assisted diagnosis assistances

The data processing of all the documents and files about a patient will allow to optimize the homecare assistance process.

7. Analysis of the relationships between classes

Proper association rules allow us to obtain interesting relationships within the database attributes. In this case it is possible to establish relationships of the type: [id_patient, pathology] -> "parameters", thus supporting the prognostic.

8. Analysis of devices

The device records will contain the identification number of the device associated with the patient. All data of devices will be contained in tables associated with the patient ID.

9. Inspection of the pharmacological dosage administered to the patient

Each patient id_patient is associated with a therapy id_therapy. An important relationship to analyze is: [id_patient, id_therapy] -> "alarm".
10. Real time criticality analysis

The constantly monitored patient conditions can be displayed in real time. Historical measures can be applied in order to predict critical moments.

Appendix C

Below is listed the MLP script enabling the data processing.

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Nadam
from sklearn.metrics import roc_auc_score
import tensorflow as tf

# Custom AUC metric; its definition is not reported in the paper, so this
# is an assumed implementation based on a common Keras/TF 1.x pattern
def auroc(y_true, y_pred):
    return tf.py_func(roc_auc_score, (y_true, y_pred), tf.double)

model = Sequential()
model.add(Dense(93, input_shape=(1200,), activation='relu'))
model.add(Dense(93, activation='relu'))
model.add(Dense(1, activation='relu'))
model.compile(metrics=['accuracy', auroc],
              optimizer=Nadam(lr=0.002, schedule_decay=0.004),
              loss='mean_squared_error')
model.summary()

The model summary is reported below:

Layer (type)       Output Shape    Param #
dense_1 (Dense)    (None, 93)      111693
dense_2 (Dense)    (None, 93)      8742
dense_3 (Dense)    (None, 1)       94

Total params: 120,529
Trainable params: 120,529
Non-trainable params: 0

References

1. Wimmer, H.; Powell, L.M. A comparison of open source tools for data science. J. Inf. Syst. Appl. Res. 2016, 9, 4–12.
2. Al-Khoder, A.; Harmouch, H. Evaluating four of the most popular open source and free data mining tools. Int. J. Acad. Sci. Res. 2015, 3, 13–23.
3. Gulli, A.; Pal, S. Deep Learning with Keras: Implement Neural Networks with Keras on Theano and TensorFlow; Packt Publishing: Birmingham, UK, 2017; ISBN 978-1-78712-842-2.
4. Kovalev, V.; Kalinovsky, A.; Kovalev, S. Deep learning with Theano, Torch, Caffe, TensorFlow, and deeplearning4j: Which one is the best in speed and accuracy? In Proceedings of the XIII International Conference on Pattern Recognition and Information Processing, Minsk, Belarus, 3–5 October 2016; Belarus State University: Minsk, Belarus, 2016; pp. 99–103.
5. Li, J.-S.; Yu, H.-Y.; Zhang, X.-G. Data mining in hospital information system. In New Fundamental Technologies in Data Mining; Funatsu, K., Ed.; InTech: London, UK, 2011.
6. Goodwin, L.; VanDyne, M.; Lin, S. Data mining issues and opportunities for building nursing knowledge. J. Biomed. Inform. 2003, 36, 379–388. [CrossRef]
7. Belacel, N.; Boulassel, M.R. Multicriteria fuzzy assignment method: A useful tool to assist medical diagnosis. Artif. Intell. Med. 2001, 21, 201–207. [CrossRef]
8. Demšar, J.; Zupan, B.; Aoki, N.; Wall, M.J.; Granchi, T.H.; Beck, J.R. Feature mining and predictive model construction from severe trauma patients' data. Int. J. Med. Inform. 2001, 36, 41–50. [CrossRef]
9. Kusiak, A.; Dixon, B.; Shital, S. Predicting survival time for kidney dialysis patients: A data mining approach. Comput. Biol. Med. 2005, 35, 311–327. [CrossRef]
10. Yu, H.-Y.; Li, J.-S. Data mining analysis of inpatient fees in hospital information system. In Proceedings of the IEEE International Symposium on IT in Medicine & Education (ITME2009), Jinan, China, 14–16 August 2009.
11. Chae, Y.M.; Kim, H.S. Analysis of healthcare quality indicator using data mining and decision support system. Exp. Syst. Appl. 2003, 24, 167–172. [CrossRef]
12. Morando, M.; Ponte, S.; Ferrara, E.; Dellepiane, S. Definition of motion and biophysical indicators for home-based rehabilitation through serious games. Information 2018, 9, 105. [CrossRef]
13. Ozcan, Y.A. Quantitative Methods in Health Care Management, 2nd ed.; Josey-Bass: San Francisco, CA, USA, 2009; pp. 10–44.
14. Ghavami, P.; Kapur, K. Artificial neural network-enabled prognostics for patient health management. In Proceedings of the IEEE Conference on Prognostics and Health Management (PHM), Denver, CO, USA, 18–21 June 2012.
15. Grossi, E. Artificial neural networks and predictive medicine: A revolutionary paradigm shift. In Artificial Neural Networks—Methodological Advances and Biomedical Applications, 1st ed.; Suzuki, K., Ed.; InTech: Rijeka, Croatia, 2011; Volume 1, pp. 130–150.
16. Adhikari, N.C.D. Prevention of heart problem using artificial intelligence. Int. J. Artif. Intell. Appl. 2018, 9, 21–35. [CrossRef]
17. Galiano, A.; Massaro, A.; Boussahel, B.; Barbuzzi, D.; Tarulli, F.; Pellicani, L.; Renna, L.; Guarini, A.; De Tullio, G.; Nardelli, G.; et al. Improvements in haematology for home health assistance and monitoring by a web based communication system. In Proceedings of the IEEE International Symposium on Medical Measurements and Applications (MeMeA), Benevento, Italy, 15–18 May 2016.
18. Massaro, A.; Maritati, V.; Savino, N.; Galiano, A.; Convertini, D.; De Fonte, E.; Di Muro, M. A study of a health resources management platform integrating neural networks and DSS telemedicine for homecare assistance. Information 2018, 9, 176. [CrossRef]
19. Massaro, A.; Maritati, V.; Savino, N.; Galiano, A. Neural networks for automated smart health platforms oriented on heart predictive diagnostic big data systems. In Proceedings of the AEIT 2018 International Annual Conference, Bari, Italy, 3–5 October 2018.
20. Saadatnejad, S.; Oveisi, M.; Hashemi, M. LSTM-based ECG classification for continuous monitoring on personal wearable devices. IEEE J. Biomed. Health Inform. 2019. [CrossRef] [PubMed]
21. Pham, T.; Tran, T.; Phung, D.; Venkatesh, S. Predicting healthcare trajectories from medical records: A deep learning approach. J. Biomed. Inform. 2017, 69, 218–229. [CrossRef] [PubMed]
22. Kaji, D.A.; Zech, J.R.; Kim, J.S.; Cho, S.K.; Dangayach, N.S.; Costa, A.B.; Oermann, E.K. An attention based deep learning model of clinical events in the intensive care unit. PLoS ONE 2019, 14, e0211057. [CrossRef] [PubMed]
23. Pima Indians Diabetes Database. Available online: https://gist.github.com/ktisha/c21e73a1bd1700294ef790c56c8aec1f (accessed on 27 August 2019).
24. Predict the Onset of Diabetes Based on Diagnostic Measures. Available online: https://www.kaggle.com/uciml/pima-indians-diabetes-database (accessed on 21 June 2019).
25. Wu, H.; Yang, S.; Huang, Z.; He, J.; Wang, X. Type 2 diabetes mellitus prediction model based on data mining. Inform. Med. Unlocked 2018, 10, 100–107. [CrossRef]
26. Luo, M.; Ke Wang, M.; Cai, Z.; Liu, A.; Li, Y.; Cheang, C.F. Using imbalanced triangle synthetic data for machine learning anomaly detection. Comput. Mater. Contin. 2019, 58, 15–26. [CrossRef]
27. Al Helal, M.; Chowdhury, A.I.; Islam, A.; Ahmed, E.; Mahmud, S.; Hossain, S. An optimization approach to improve classification performance in cancer and diabetes prediction. In Proceedings of the International Conference on Electrical, Computer and Communication Engineering (ECCE), Cox's Bazar, Bangladesh, 7–9 February 2019.
28. Li, T.; Fong, S. A fast feature selection method based on coefficient of variation for diabetics prediction using machine learning. Int. J. Extr. Autom. Connect. Health 2019, 1, 1–11. [CrossRef]
29. Puneet, M.; Singh, Y.A. Impact of preprocessing methods on healthcare predictions. In Proceedings of the 2nd International Conference on Advanced Computing and Software Engineering (ICACSE), Sultanpur, India, 8–9 February 2019.
30. Stranieri, A.; Yatsko, A.; Jelinek, H.F.; Venkatraman, S. Data-analytically derived flexible HbA1c thresholds for type 2 diabetes mellitus diagnostic. Artif. Intell. Res. 2016, 5, 111–134.
31. Sudharsan, B.; Peeples, M.M.; Shomali, M.E. Hypoglycemia prediction using machine learning models for patients with type 2 diabetes. J. Diabetes Sci. Technol. 2015, 9, 86–90. [CrossRef]
32. Mhaskar, H.N.; Pereverzyev, S.V.; Van Der Walt, M.D. A deep learning approach to diabetic blood glucose prediction. Front. Appl. Math. Stat. 2017, 3, 1–14. [CrossRef]
33. Contreras, I.; Vehi, J. Artificial intelligence for diabetes management and decision support: Literature review. J. Med. Internet Res. 2018, 20, 1–24. [CrossRef]
34. Bosnyak, Z.; Zhou, F.L.; Jimenez, J.; Berria, R. Predictive modeling of hypoglycemia risk with basal insulin use in type 2 diabetes: Use of machine learning in the LIGHTNING study. Diabetes Ther. 2019, 10, 605–615. [CrossRef] [PubMed]
35. Massaro, A.; Meuli, G.; Galiano, A. Intelligent electrical multi outlets controlled and activated by a data mining engine oriented to building electrical management. Int. J. Soft Comput. Artif. Intell. Appl. 2018, 7, 1–20. [CrossRef]
36. Myers, J.L.; Well, A.D. Research Design and Statistical Analysis, 2nd ed.; Lawrence Erlbaum: Mahwah, NJ, USA, 2003.
37. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef] [PubMed]
38. Mohapatra, S.K.; Mihir, J.K.S.; Mohanty, N. Detection of diabetes using multilayer perceptron. In International Conference on Intelligent Computing and Applications, Advances in Intelligent Systems and Computing; Bhaskar, M.A., Dash, S.S., Das, S., Panigrahi, B.K., Eds.; Springer: Singapore, 2019.
39. Swets, J.A. Measuring the accuracy of diagnostic systems. Science 1988, 240, 1285–1293. [CrossRef] [PubMed]
40. Diabetes Data Set. Available online: https://archive.ics.uci.edu/ml/datasets/Diabetes (accessed on 19 August 2019).
41. Chui, K.T.; Fung, D.C.L.; Lytras, M.D. Predicting at-risk University students in a virtual learning environment via a machine learning algorithm. Comput. Hum. Behav. 2018, in press. [CrossRef]

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
