Comparison of Different Machine Learning Techniques to Predict Diabetic Kidney Disease

Hindawi
Journal of Healthcare Engineering
Volume 2022, Article ID 7378307, 9 pages
https://doi.org/10.1155/2022/7378307

Research Article

Satish Kumar David, Mohamed Rafiullah, and Khalid Siddiqui

Strategic Center for Diabetes Research, College of Medicine, King Saud University, Riyadh, Saudi Arabia

Correspondence should be addressed to Satish Kumar David; satishkdavid@gmail.com

Received 1 December 2021; Revised 10 March 2022; Accepted 21 March 2022; Published 1 April 2022

Academic Editor: M. Praveen Kumar Reddy

Copyright © 2022 Satish Kumar David et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background. Diabetic kidney disease (DKD), one of the complications of diabetes in patients, leads to progressive loss of kidney function. Timely intervention is known to improve outcomes. Therefore, screening patients to identify high-risk populations is important. Machine learning classification techniques can be applied to patient datasets to identify high-risk patients by building a predictive model. Objective. This study aims to identify a suitable classification technique for predicting DKD by applying different classification techniques to a DKD dataset and comparing their performance using WEKA machine learning software. Methods. The performance of nine different classification techniques was analyzed on a DKD dataset with 410 instances and 18 attributes. Data preprocessing was carried out using the PartitionMembershipFilter. A 10-fold cross-validation was performed on the dataset. The performance was assessed on the basis of the execution time, accuracy, correctly and incorrectly classified instances, kappa statistics (K), mean absolute error, root mean squared error, and true values of the confusion matrix. Results. With an accuracy of 93.6585% and a higher K value (0.8731), IBK and random tree classification techniques were found to be the best performing techniques. Moreover, they also exhibited the lowest root mean squared error rate (0.2496). There were 15 false-positive instances and 11 false-negative instances with these prediction models. Conclusions. This study identified IBK and random tree classification techniques as the best performing classifiers and accurate prediction methods for DKD.

1. Introduction

Advancements in information technology have led to the creation of enormous volumes of data. Besides, the developments in healthcare database management systems have resulted in a vast number of medical databases. Managing large volumes of heterogeneous data and creating useful knowledge from them has become an important field of research known as data mining. It is a way of discovering innovative, valuable, valid, and reasonable patterns in data [1]. There are two data mining techniques, namely, unsupervised and supervised learning techniques. Unsupervised learning techniques identify novel patterns with minimum human supervision. They work with unlabeled data, look for a hidden pattern in the data, and build a model based on the results obtained. A commonly used unsupervised technique is clustering [2]. Supervised learning techniques require labeled training data. They analyze the training example data to deduce a pattern that can be applied to new example data. Classification, statistical regression, and association rules are commonly used supervised learning techniques in medical and clinical research [3]. Classification methods are used to classify, detect, and analyze disease datasets to build a prediction model [4].

Machine learning is an integral part of artificial intelligence (AI) that allows systems to perform a specific task without using explicit programming. It works by creating patterns and inferences by building a model based on a training dataset. Machine learning involves developing computer programs that can use data to learn for themselves [5]. Waikato environment for knowledge analysis (WEKA) is a data mining software that contains algorithms for data analysis and predictive modeling. It consists of all the major learning techniques for classification and regression, such as Bayesian classifiers, decision trees, rule sets, support vector machines, logistic and multilayer perceptrons, linear regression, and nearest-neighbor methods. It also has “meta-learners” such as bagging, stacking, boosting, and schemes that perform automatic parameter tuning using cross-validation, cost-sensitive classification, etc. [6]. A comparison of the advantages and disadvantages of these classifiers is presented in Supplementary Table 1.

Learning algorithms need to be validated as the dataset may not truly represent the population. Cross-validation, hold-out set, and resubstitution are some of the validation techniques. There are standard quantitative performance parameters such as accuracy and root mean squared error available in WEKA software. It also provides graphical performance indicators such as receiver operating characteristic curves and precision-recall curves. The visualization tools available in WEKA allow the identification of outliers [7].

Diabetic kidney disease (DKD) is one of the most common complications of diabetes that causes increased mortality and morbidity in patients [8]. It occurs in 20–40% of people with diabetes. DKD is the single largest cause of end-stage renal disease (ESRD) worldwide and has become an enormous burden on healthcare systems [9]. Patients in the early stage of diabetic nephropathy are characterized by microalbuminuria (albumin-to-creatinine ratio (ACR) of 30–299 mg/g). In many patients, it usually progresses to macroalbuminuria (ACR of ≥300 mg/g) followed by ESRD. However, screening patients early for diabetic nephropathy will help delay the onset of microalbuminuria and may prevent the progression of micro to macroalbuminuria and ESRD [10]. Standard methods to detect renal impairment involve specialized blood and urine tests. However, data mining techniques can be applied to the available datasets to establish a prediction model that can be used for detecting DKD cases.

An AI technique was used to build a predictive model that detected DKD aggravation with 71% accuracy [11]. Machine learning methods were used to predict the initiation of renal replacement therapy in chronic kidney disease (CKD) patients. Only the comorbidity data were used to build the prediction model. The area under the receiver operating characteristic curve for predicting the initiation of renal replacement therapy within a year from CKD diagnosis was found to be 0.773 [12]. An AI-based recursive rule extraction technique was used to derive lower urinary albumin to creatinine ratio cutoffs for the early detection of DKD. This technique identified two cutoff values with an accuracy of 77.56% [13]. Ravizza et al. developed a model from real-world data of people with type 2 diabetes for detecting chronic kidney disease. The area under the receiver operating characteristic curve of the model was 0.7937 [14].

2. Recent Works

A model for the early detection of diabetic retinopathy was developed using deep learning techniques. The dataset was preprocessed before the classification. A standard scaler technique was used to normalize the data, and principal component analysis was used to extract the data. Dimensionality reduction was carried out using the firefly algorithm. The accuracy of the deep neural network model was found to be 97%, and it outperformed other classification techniques such as support vector machines, KNN, decision tree, NB, and XGBoost-based models [15]. Chowdhury et al. analyzed the data from the epidemiology of diabetes interventions and complications clinical trials to develop a prediction model based on different machine learning algorithms. It included 1375 patients with type 1 diabetes and 19 attributes. The random forest model was found to be best (96%), followed by a light gradient-boosted machine (95%) [16]. XGBoost and random forest algorithms were used to develop a model to predict the 5-year risk of CKD. The dataset included 88,973 individuals. The AUC was 0.75 for predicting any stage of CKD and 0.82 for severe endpoints. The models outperformed the Centers for Disease Control and Prevention (CDC) risk score [17].

The currently available techniques use specific methods for building the DKD prediction models. A comparative analysis is needed to identify an accurate method for the prediction of DKD. In this study, we aimed to identify an accurate classification technique for predicting DKD by comparing different classification techniques applied to a DKD dataset using WEKA machine learning software. Here we report the use of a machine learning technique to detect patients with DKD using known cases of DKD as a training dataset.

3. Materials and Methods

Clinical and biochemical data of patients who had DKD were gathered for this study. Figure 1 shows the risk factors affecting diabetic kidney disease.

The collected data were converted to an ARFF file. ARFF is an acronym that stands for attribute-relation file format. It is an extension of the CSV file format where a header is used. This header provides metadata about the data types in the columns. The data was saved with an extension of CSV from Microsoft Excel and then opened in WEKA using the “ArffViewer” under the “Tools” option to save it with an ARFF extension. This conversion has to be done in order for the data to be used in WEKA. A 10-fold cross-validation was performed on the dataset, and then the data was analyzed using WEKA. Different machine learning classification techniques were applied, and the outcomes were compared (Figure 2). The best performing technique was identified based on findings to predict DKD (Figure 3).

Figure 1: Risk factors affecting diabetic kidney disease (obesity, heart problem or stroke, family history, high blood pressure, smoking, age, and diabetes).

Figure 2: Block diagram of the proposed research (DKD dataset; data preprocessing with data transformation, feature extraction, feature selection, data sampling, and data splitting; model training and testing with the nine classifiers; prediction on each classifier; aggregation of prediction results; performance evaluation; determination of the prediction result as DKD or non-DKD).

Figure 3: Schematic illustration of the methodology used for identifying the best performing classification technique (diabetes kidney disease dataset from the SCDR cohort; data preprocessing; various classification methods in WEKA; prediction and comparison of results; choice of the best method).

3.1. Dataset. The diabetic kidney disease dataset was gathered from our previous DKD cohort [18]. There are 410 instances and 18 attributes (14 numeric and 4 nominal) that were used in the analysis of the prediction of DKD. The dataset attributes are age (years), gender (male/female), serum albumin (mg/dL), sodium (mmol/L), potassium (mmol/L), urea (mg/dL), glucose (mg/dL), creatinine (mg/dL), HbA1c (%), Hb (g/dL), white blood cell counts (WBCs) (10^9/L), red blood cell counts (RBCs) (10^12/L), Hb (%), platelets counts (10 /L) (M/µl), systolic BP sitting condition (mmHg), diastolic BP sitting condition (mmHg), hypertension (yes/no), and retinopathy (yes/no). The attribute nephropathy was classified into two classes as DKD and not DKD. The 410 patients with diabetes were classified according to their urinary albumin-to-creatinine ratio (ACR) using American Diabetes Association (ADA) criteria for diabetic nephropathy stage cutoff and eGFR values.

3.2. Preprocessing. Preprocessing is a data mining technique that involves transforming raw data into an understandable format. WEKA now also has a PartitionMembershipFilter that can apply any PartitionGenerator to a given dataset to obtain these vectors for all instances. For preprocessing, a partition membership filter is used.

There are four interfaces to WEKA which can be started from the main GUI Chooser window. Figure 4 shows the DKD dataset after loading in the explorer window of the WEKA tool. The visualization section with blue and red code indicates the data in the form of a graph. In WEKA, results are partitioned into several subitems for easier analysis, evaluation, and simulation. It begins with partitioning correctly and incorrectly classified instances in numeric and percentage values, followed by the computation of Kappa statistics, mean absolute error, and root mean squared error in numeric values.

Figure 4: WEKA-Explorer window.

3.3. Classification. Classification is a data mining algorithm to find out the output of a new data instance. In this study, different classifiers were applied on the DKD dataset for comparing their accuracy, correctly classified instances, incorrectly classified instances, error rate, and execution time to evaluate overall performance and identify the best classifier for DKD prediction. The nine different classification techniques that were used in the study are as follows: random forest, J48, Naïve Bayes, REP tree, random tree, multilayer perceptron, AdaBoostM1, Hoeffding Tree, and IBK. The 10-fold cross-validation is the standard method of evaluation for different machine learning techniques. The dataset was divided into ten equal subsets, with one subset used for testing and the remaining nine for training. This was continued until all the subsets had been used for testing. We applied the 10-fold cross-validation test for evaluating the performance of different classifiers, as shown in Figures 5–8. The predictions for each test instance are then listed in the “Classifier Output” pane in WEKA.

WEKA machine learning software was used for learning different models, preprocessing, and feature selection schemes to identify the best classification method by comparison.

Figure 5: Classifier IBK result.

Figure 6: Classifier random tree result.

Figure 7: Classifier random forest result.

Figure 8: Classifier AdaBoostM1 result.

4. Results and Discussion

Table 1 shows the comparative results from the 10-fold cross-validation testing of different classifiers.

Table 1: Comparison of different classifiers applied on the DKD dataset.

| Classifier | Execution time (seconds) | Accuracy (%) | Correctly classified instances | Incorrectly classified instances |
| IBK | 0 | 93.6585 | 384 | 26 |
| Random tree | 0.01 | 93.6585 | 384 | 26 |
| Random forest | 0.28 | 93.4146 | 383 | 27 |
| Multilayer perceptron | 8.3 | 93.1707 | 382 | 28 |
| J48 | 0.13 | 89.7561 | 368 | 42 |
| Hoeffding tree | 0.04 | 86.0976 | 353 | 57 |
| REP tree | 0.08 | 85.122 | 349 | 61 |
| Naïve Bayes | 0.01 | 80.9756 | 332 | 78 |
| AdaBoostM1 | 0.11 | 79.0244 | 324 | 86 |

Results show that IBK and multilayer perceptron are the fastest and slowest classifiers, respectively. The accuracy of the classifiers is comparable to each other. However, the IBK and random tree methods are the most accurate (93.6585%). The IBK and random tree methods share the highest number of correctly classified instances (384), followed by the random forest method (383). Accordingly, the IBK and random tree methods have the fewest incorrectly classified instances. AdaBoostM1 was found to be the lowest in accuracy and correctly classified instances and has the highest number of incorrectly classified instances among all the classifiers. Both IBK and random tree techniques are found to be superior to other classifiers in terms of execution time, accuracy, correctly classified instances, and incorrectly classified instances.

Table 2 shows the results of Kappa statistics (K), mean absolute error (MAE), and root mean squared error (RMSE) for the different classification methods.

Table 2: Classification results from WEKA.

| Classifier | Kappa statistics (K) | Mean absolute error (MAE) | Root mean squared error (RMSE) |
| IBK | 0.8731 | 0.1096 | 0.2496 |
| Random tree | 0.8731 | 0.1093 | 0.2497 |
| Random forest | 0.8681 | 0.1267 | 0.2542 |
| Multilayer perceptron | 0.8633 | 0.1117 | 0.2513 |
| J48 | 0.7947 | 0.1595 | 0.3074 |
| Hoeffding tree | 0.7223 | 0.1389 | 0.3696 |
| REP tree | 0.7025 | 0.2194 | 0.3565 |
| Naïve Bayes | 0.6199 | 0.1899 | 0.4261 |
| AdaBoostM1 | 0.5827 | 0.3246 | 0.4009 |

A Kappa statistics (K) value greater than 0 means the classifier is doing better than chance agreement. IBK and random tree have shown greater K values than the other classifiers in this study. Mean absolute error (MAE) values indicate how close the prediction result is to the actual values. The results show that the random tree classifier has the lowest MAE. Therefore, the prediction result of the random tree classifier is very close to the true cases of DKD. Root mean squared error (RMSE) rates are used to identify the best classification technique when their MAE values are found to be similar. The IBK classifier achieved the lowest RMSE rate when compared to other classifiers. With the lowest K value and higher MAE and RMSE rates, the prediction values of AdaBoostM1 are considered to be the least significant. On the other hand, both the IBK and random tree techniques are found to achieve better prediction results, and the other classifiers’ prediction results are average.

Table 3 shows the confusion matrix of the classification methods.

Table 3: Confusion matrix of different classifiers. Rows give the actual state (clinical definition; 197 DKD and 213 not DKD), and columns give the prediction.

| Classifier | Actual state | Predicted DKD | Predicted not DKD |
| IBK | DKD | 186 | 11 |
| IBK | Not DKD | 15 | 198 |
| Random tree | DKD | 186 | 11 |
| Random tree | Not DKD | 15 | 198 |
| Random forest | DKD | 184 | 13 |
| Random forest | Not DKD | 14 | 199 |
| Multilayer perceptron | DKD | 184 | 13 |
| Multilayer perceptron | Not DKD | 15 | 198 |
| J48 | DKD | 174 | 23 |
| J48 | Not DKD | 19 | 194 |
| Hoeffding tree | DKD | 36 | 177 |
| Hoeffding tree | Not DKD | 81 | 116 |
| REP tree | DKD | 171 | 26 |
| REP tree | Not DKD | 35 | 178 |
| Naïve Bayes | DKD | 165 | 32 |
| Naïve Bayes | Not DKD | 46 | 167 |
| AdaBoostM1 | DKD | 172 | 25 |
| AdaBoostM1 | Not DKD | 61 | 152 |

The confusion matrix table describes the performance of different classification models on the DKD test dataset for which the actual DKD cases are known. The IBK classifier correctly identified 93.0% of patients as not having DKD and 94.42% of patients as having DKD. There were 7.46% of false-positive cases and 5.26% of false-negative cases. It has the best prediction performance among all the classifiers investigated. Our results are comparable to the previously reported prediction models for DKD (Table 4). A maximum accuracy level was achieved when a recursive feature elimination technique was used to choose the attributes [19].

Many studies have reported different classifiers for the prediction of DKD. A probabilistic neural network method was found to provide better classification and prediction performance in determining the stages of DKD [23]. BayesNet and REP tree algorithms showed accurate performance in the prediction of chronic kidney disease [24]. However, in another study, J48 was found to be suitable for screening DKD [20]. The gradient boosting classifier was found to be an accurate method for the detection of DKD with the least number of predictors [25]. The C4.5 classifier efficiently predicted chronic kidney disease from a high-dimensional dataset [26]. A review found that many researchers have used KNN, ANN, Naïve Bayes, SVM, and decision tree (J48, C4.5) for the prediction of chronic kidney disease from the given dataset. The most accurate classifier was SVM (98.5%), and the least accurate was the Bayes network (57.5%) [27].

The AdaBoost classifier algorithm was found to be highly accurate (0.917) for the prediction of diabetic nephropathy in a dataset of 884 patients and 70 attributes. When the attributes were decreased to the top 5 only, the performance was not affected [28]. Our results show that IBK and random tree classifiers with a dataset of 410 patients and 18 attributes achieved an accuracy of 93.6585%. A systematic review on machine learning methods for prediction of diabetes complications found that the random forest algorithm is the overall best performing classifier [29]. We found that the IBK algorithm is the best performing classifier; IBK is WEKA's implementation of the k-nearest neighbors (KNN) algorithm, which is generally regarded as one of the best classifiers.

Random forest and simple logistic regression methods were shown to have better performance in the prediction of nephropathy in type 2 diabetes from the ACCORD trial dataset [30]. Pasadana et al. also found the random forest classifier to be the best technique for DKD prediction [31]. Random forest regression was used to build a model with data from real-world electronic medical records to predict future kidney functions accurately and provide clinical decision support [32].
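
The accuracy and kappa values reported for IBK can be cross-checked directly from its confusion matrix (186 and 11 for actual DKD, 15 and 198 for actual not DKD). The following is a minimal Python sketch of that arithmetic, not WEKA's own code: the function name `binary_metrics` and its argument names are illustrative, and DKD is assumed to be the positive class.

```python
def binary_metrics(tp, fn, fp, tn):
    """Accuracy, Cohen's kappa, sensitivity and specificity from a 2x2 matrix.

    tp/fn: actual DKD predicted as DKD / not DKD (assumed positive class).
    fp/tn: actual not DKD predicted as DKD / not DKD.
    """
    total = tp + fn + fp + tn
    # Observed agreement: share of instances on the diagonal.
    observed = (tp + tn) / total
    # Agreement expected by chance, from row and column marginals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "accuracy_pct": 100 * observed,
        "kappa": kappa,
        "sensitivity_pct": 100 * tp / (tp + fn),
        "specificity_pct": 100 * tn / (tn + fp),
    }


# IBK counts as reported in the confusion matrix table.
ibk = binary_metrics(tp=186, fn=11, fp=15, tn=198)
for name, value in ibk.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the sketch yields an accuracy of 93.6585% and a kappa of 0.8731 when rounded to four decimals, which agrees with the reported values; the same function can be applied to the counts of any other classifier in the table.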
In the present study, based on the performance evaluation of classifiers on the DKD dataset, we found that the IBK and random tree classifiers exhibited the best performance compared to the other classifiers like J48, Naïve Bayes, REP tree, AdaBoostM1, Hoeffding Tree, random forest, and multilayer perceptron.

The predictive models can be used in real-life situations when extensive invasive tests are not possible. High-risk patients may be identified using the available dataset. Our predictive model was developed using easily available routine laboratory parameters. Therefore, screening patients to identify those who are vulnerable for developing kidney disease is possible in primary clinics. It will help the clinicians to decide on starting intensive preventive therapy for the high-risk patients.

Table 4: Comparison of recent works of predictive models for diabetic kidney disease or diabetic nephropathy.

| Source | Dataset | Model | Complication | Accuracy (%) |
| Sobrinho et al., 2020 [20] | 114 instances and 8 attributes | J48 decision tree | DKD | 95 |
| Senan et al., 2021 [19] | 400 instances and 24 attributes | Recursive feature elimination to choose attributes followed by random forest classification | DKD | 100 |
| Almansour et al., 2019 [21] | 400 instances and 24 attributes | Artificial neural network | CKD | 99.7 |
| Khanam and Foo, 2021 [22] | 768 instances and 9 attributes | Neural network | Diabetes | 88.6 |
| Our study | 410 instances and 18 attributes | IBK and random tree | DKD | 93.6585 |

5. Conclusions

In this paper, we have applied different classification techniques to a DKD dataset for the prediction of DKD. IBK and random tree classification techniques are identified as the best performing classifiers and accurate prediction methods for DKD. These techniques may be used to detect DKD patients with easily available basic lab parameters. Using data mining techniques for predictive analytics, especially in the medical field, can save time and money. Our study compared nine different types of classification algorithms using the WEKA data mining tool to identify the best classifier that is suitable for the DKD dataset. These models will be useful in the early prediction of chronic kidney disease to take proactive interventions and reduce the mortality and morbidity associated with the disease. The prediction models may be developed further for predicting the progression of DKD in vulnerable patients.

Abbreviations

DKD: Diabetic kidney disease
K: Kappa statistics
MAE: Mean absolute error
RMSE: Root mean squared error
WEKA: Waikato environment for knowledge analysis
ESRD: End-stage renal disease
AI: Artificial intelligence.

Data Availability

The data are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

SKD and MR developed the concept, designed the study, conducted the analysis, reviewed the results, and prepared the manuscript; KS reviewed the dataset, results, and manuscript.

Acknowledgments

This work was funded by the National Plan for Science, Technology and Innovation (MAARIFAH), King Abdulaziz City for Science and Technology, Kingdom of Saudi Arabia, grant to the Strategic Center for Diabetes Research, College of Medicine, King Saud University, Riyadh, Saudi Arabia.

Supplementary Materials

Supplementary Table 1. Comparison of advantages and disadvantages of different classifiers used in our study. (Supplementary Materials)

References

[1] U. Fayyad and P. Stolorz, “Data mining and KDD: Promise and challenges,” Future Generation Computer Systems, vol. 13, no. 2, pp. 99–115, 1997.
[2] L. Guerra, L. M. McGarry, V. Robles, C. Bielza, P. Larrañaga, and R. Yuste, “Comparison between supervised and unsupervised classifications of neuronal cell types: a case study,” Developmental Neurobiology, vol. 71, no. 1, pp. 71–82, 2011.
[3] I. Yoo, P. Alafaireet, M. Marinov et al., “Data mining in healthcare and biomedicine: a survey of the literature,” Journal of Medical Systems, vol. 36, no. 4, pp. 2431–2448, 2012.
[4] H. Polat, H. Danaei Mehr, and A. Cetin, “Diagnosis of chronic kidney disease based on support vector machine by feature selection methods,” Journal of Medical Systems, vol. 41, no. 4, p. 55, 2017.
[5] O. Corporation, Machine Learning-Based Adaptive Intelligence: The Future of Cybersecurity Executive Summary. January, 2018.
[6] S. K. David, A. Saeb, and K. Al. Rubeaan, “Comparative analysis of data mining tools and classification techniques using WEKA in medical Bioinformatics,” Computer Engineering and Intelligent, vol. 4, no. 13, pp. 28–39, 2013.
[7] R. R. Bouckaert, E. Frank, M. Hall et al., WEKA Manual for Version 3, pp. 7-8, 2013.
[8] S.-Y. Lee and M. E. Choi, “Urinary biomarkers for early diabetic nephropathy: beyond albuminuria,” Pediatric Nephrology, vol. 30, no. 7, pp. 1063–1075, 2015.
[9] W. G. Couser, G. Remuzzi, S. Mendis, and M. Tonelli, “The contribution of chronic kidney disease to the global burden of major noncommunicable diseases,” Kidney International, vol. 80, no. 12, pp. 1258–1270, 2011.
[10] American Diabetes Association, “Standards of medical care in diabetes,” Diabetes Care, vol. 28, no. 1, pp. s4–s36, 2005.
[11] M. Makino, R. Yoshimoto, M. Ono et al., “Artificial intelligence predicts the progression of diabetic kidney disease using big data machine learning,” Scientific Reports, vol. 9, no. 1, pp. 1–9, 2019.
[12] E. Dovgan, A. Gradišek, M. Luštrek et al., “Using machine learning models to predict the initiation of renal replacement therapy among chronic kidney disease patients,” PLoS One, vol. 15, no. 6, p. e0233976, 2020.
[13] Y. Hayashi, “Detection of lower albuminuria levels and early development of diabetic kidney disease using an artificial intelligence-based rule extraction Approach,” Diagnostics, vol. 9, no. 4, 2019.
[14] S. Ravizza, T. Huschto, A. Adamov et al., “Predicting the early risk of chronic kidney disease in patients with diabetes using real-world data,” Nature Medicine, vol. 25, no. 1, pp. 57–59.
[15] T. R. Gadekallu, N. Khare, S. Bhattacharya et al., “Early detection of diabetic retinopathy using pca-firefly based deep learning model,” Electronics, vol. 9, no. 2, pp. 1–16, 2020.
[16] N. H. Chowdhury, M. B. Reaz, F. Haque et al., “Performance analysis of Conventional machine learning algorithms for identification of chronic kidney disease in type 1 diabetes mellitus patients,” Diagnostics, vol. 11, no. 12, 2021.
[17] A. Allen, Z. Iqbal, A. Green-Saxena et al., “Prediction of diabetic kidney disease with machine learning algorithms, upon the initial diagnosis of type 2 diabetes mellitus,” BMJ Open Diabetes Research & Care, vol. 10, no. 1, p. e002560, 2022.
[18] K. Al-Rubeaan, K. Siddiqui, M. Alghonaim, A. M. Youssef, and D. AlNaqeb, “The Saudi Diabetic Kidney Disease study (Saudi-DKD): clinical characteristics and biochemical parameters,” Annals of Saudi Medicine, vol. 38, no. 1, pp. 46–56.
[19] E. M. Senan, M. H. Al-Adhaileh, F. W. Alsaade et al., “Diagnosis of chronic kidney disease using Effective classification algorithms and recursive feature Elimination techniques,” Journal of Healthcare Engineering, vol. 2021, p. 1004767, 2021.
[20] A. Sobrinho, A. C. M. D. S. Queiroz, L. D. Da Silva, E. D. B. Costa, M. E. Pinheiro, and A. Perkusich, “Computer-aided diagnosis of chronic kidney disease in developing Countries: a comparative analysis of machine learning techniques,” IEEE Access, vol. 8, pp. 25407–25419, 2020.
[21] N. A. Almansour, H. F. Syed, N. R. Khayat et al., “Neural network and support vector machine for the prediction of chronic kidney disease: a comparative study,” Computers in Biology and Medicine, vol. 109, pp. 101–111, 2019.
[22] J. J. Khanam and S. Y. Foo, “A comparison of machine learning algorithms for diabetes prediction,” ICT Express, vol. 7, no. 4, pp. 432–439, 2021.
[23] E.-H. A. Rady and A. S. Anwar, “Prediction of kidney disease stages using data mining algorithms,” Informatics in Medicine Unlocked, vol. 15, p. 100178, 2019.
[24] M. Sohail, H. M. Ahmed, M. Shabbir, and K. Noor, “Predicting chronic kidney disease by using classification algorithms in,” WE!, vol. 11, no. 6, pp. 1047–1050, 2020.
[25] M. Almasoud and T. E. Ward, “Detection of chronic kidney disease using machine learning algorithms with least number of predictors,” International Journal of Advanced Computer Science and Applications, vol. 10, no. 8, pp. 89–96, 2019.
[26] J. Sarada and N. V. M. Lakshmi, “Data analytics on chronic kidney disease data,” in Proceedings of the IADS International Conference on Computing, Communications & Data Engineering (CCODE), 2018.
[27] S. Zeynu, A. Professor, and S. Patil, “Survey on prediction of chronic kidney disease using data mining classification techniques and feature selection,” Shruti Patil, vol. 118, no. 8, pp. 149–156, 2018.
[28] Y. Jian, M. Pasquier, A. Sagahyroon, and F. Aloul, “A machine learning Approach to predicting diabetes complications,” Healthcare, vol. 9, no. 12, 2021.
[29] K. R. Tan, J. J. B. Seng, Y. H. Kwan et al., “Evaluation of machine learning methods developed for prediction of diabetes complications: a systematic review,” Journal of Diabetes Science and Technology, p. 193229682110569, 2021.
[30] V. Rodriguez-Romero, R. F. Bergstrom, B. S. Decker, G. Lahu, M. Vakilynejad, and R. R. Bies, “Prediction of nephropathy in type 2 diabetes: an analysis of the ACCORD trial applying machine learning techniques,” Clinical and Translational Science, vol. 12, no. 5, pp. 519–528, 2019.
[31] I. A. Pasadana, D. Hartama, M. Zarlis et al., “Chronic kidney disease prediction by using different decision tree techniques,” Journal of Physics: Conference Series, vol. 1255, p. 12024, 2019.
[32] J. Zhao, S. Gu, and A. McDermaid, “Predicting outcomes of chronic kidney disease from EMR data based on Random Forest Regression,” Mathematical Biosciences, vol. 310, pp. 24–30, 2019.


Publisher
Hindawi Publishing Corporation
Copyright
Copyright © 2022 Satish Kumar David et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ISSN
2040-2295
eISSN
2040-2309
DOI
10.1155/2022/7378307

Abstract

Hindawi Journal of Healthcare Engineering Volume 2022, Article ID 7378307, 9 pages https://doi.org/10.1155/2022/7378307 Research Article Comparison of Different Machine Learning Techniques to Predict Diabetic Kidney Disease Satish Kumar David , Mohamed Rafiullah , and Khalid Siddiqui Strategic Center for Diabetes Research, College of Medicine, King Saud University, Riyadh, Saudi Arabia Correspondence should be addressed to Satish Kumar David; satishkdavid@gmail.com Received 1 December 2021; Revised 10 March 2022; Accepted 21 March 2022; Published 1 April 2022 Academic Editor: M. Praveen Kumar Reddy Copyright © 2022 Satish Kumar David et al. &is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Background. Diabetic kidney disease (DKD), one of the complications of diabetes in patients, leads to progressive loss of kidney function. Timely intervention is known to improve outcomes. &erefore, screening patients to identify high-risk populations is important. Machine learning classification techniques can be applied to patient datasets to identify high-risk patients by building a predictive model. Objective. &is study aims to identify a suitable classification technique for predicting DKD by applying different classification techniques to a DKD dataset and comparing their performance using WEKA machine learning software. Methods. &e performance of nine different classification techniques was analyzed on a DKD dataset with 410 instances and 18 attributes. Data preprocessing was carried out using the PartitionMembershipFilter. A 10-fold cross validation was performed on the dataset. &e performance was assessed on the basis of the execution time, accuracy, correctly and incorrectly classified instances, kappa statistics (K), mean absolute error, root mean squared error, and true values of the confusion matrix. 
Results. With an accuracy of 93.6585% and the highest K value (0.8731), IBK and random tree classification techniques were found to be the best performing techniques. They also exhibited the lowest root mean squared errors (0.2496 and 0.2497, respectively). There were 15 false-positive instances and 11 false-negative instances with these prediction models.

Conclusions. This study identified IBK and random tree classification techniques as the best performing classifiers and accurate prediction methods for DKD.

1. Introduction

Advancements in information technology have led to the creation of enormous volumes of data. In addition, developments in healthcare database management systems have resulted in a vast number of medical databases. Managing large volumes of heterogeneous data and creating useful knowledge from them has become an important field of research known as data mining. It is a way of discovering innovative, valuable, valid, and reasonable patterns in data [1]. There are two kinds of data mining techniques, namely, unsupervised and supervised learning techniques. Unsupervised learning techniques identify novel patterns with minimum human supervision. They work with unlabeled data, look for a hidden pattern in the data, and build a model based on the results obtained. A commonly used unsupervised technique is clustering [2]. Supervised learning techniques require labeled training data. They analyze the training examples to deduce a pattern that can be applied to new examples. Classification, statistical regression, and association rules are commonly used supervised learning techniques in medical and clinical research [3]. Classification methods are used to classify, detect, and analyze disease datasets to build a prediction model [4].

Machine learning is an integral part of artificial intelligence (AI) that allows systems to perform a specific task without explicit programming. It works by creating patterns and inferences by building a model based on a training dataset. Machine learning involves developing computer programs that can use data to learn for themselves [5].

Waikato environment for knowledge analysis (WEKA) is a data mining software that contains algorithms for data analysis and predictive modeling. It includes all the major learning techniques for classification and regression, such as Bayesian classifiers, decision trees, rule sets, support vector machines, logistic and multilayer perceptrons, linear regression, and nearest-neighbor methods. It also has “meta-learners” such as bagging, stacking, boosting, and schemes that perform automatic parameter tuning using cross-validation, cost-sensitive classification, etc. [6]. A comparison of the advantages and disadvantages of these classifiers is presented in Supplementary Table 1.

Learning algorithms need to be validated because the dataset may not truly represent the population. Cross-validation, hold-out set, and resubstitution are some of the validation techniques. Standard quantitative performance parameters such as accuracy and root mean squared error are available in WEKA software. It also provides graphical performance indicators such as receiver operating characteristic curves and precision-recall curves. The visualization tools available in WEKA allow the identification of outliers [7].

Diabetic kidney disease (DKD) is one of the most common complications of diabetes that causes increased mortality and morbidity in patients [8]. It occurs in 20–40% of people with diabetes. DKD is the single largest cause of end-stage renal disease (ESRD) worldwide and has become an enormous burden on healthcare systems [9]. The early stage of diabetic nephropathy is characterized by microalbuminuria (albumin-to-creatinine ratio (ACR) of 30–299 mg/g). In many patients, it progresses to macroalbuminuria (ACR of ≥300 mg/g) followed by ESRD. However, screening patients early for diabetic nephropathy will help delay the onset of microalbuminuria and may prevent the progression from micro- to macroalbuminuria and ESRD [10]. Standard methods to detect renal impairment involve specialized blood and urine tests. However, data mining techniques can be applied to the available datasets to establish a prediction model that can be used for detecting DKD cases.

An AI technique was used to build a predictive model that detected DKD aggravation with 71% accuracy [11]. Machine learning methods were used to predict the initiation of renal replacement therapy in chronic kidney disease (CKD) patients. Only the comorbidity data were used to build the prediction model. The area under the receiver operating characteristic curve for predicting the initiation of renal replacement therapy within a year from CKD diagnosis was found to be 0.773 [12]. An AI-based recursive rule extraction technique was used to derive lower urinary albumin-to-creatinine ratio cutoffs for the early detection of DKD. This technique identified two cutoff values with an accuracy of 77.56% [13]. Ravizza et al. developed a model from real-world data of people with type 2 diabetes for detecting chronic kidney disease. The area under the receiver operating characteristic curve of the model was 0.7937 [14].

2. Recent Works

A model for the early detection of diabetic retinopathy was developed using deep learning techniques. The dataset was preprocessed before the classification. A standard scaler technique was used to normalize the data, and principal component analysis was used to extract the data. Dimensionality reduction was carried out using the firefly algorithm. The accuracy of the deep neural network model was found to be 97%, and it outperformed other classification techniques such as support vector machines, KNN, decision tree, NB, and XGBoost-based models [15]. Chowdhury et al. analyzed the data from the epidemiology of diabetes interventions and complications clinical trials to develop a prediction model based on different machine learning algorithms. It included 1375 patients with type 1 diabetes and 19 attributes. The random forest model was found to be the best (96%), followed by a light gradient-boosted machine (95%) [16]. XGBoost and random forest algorithms were used to develop a model to predict the 5-year risk of CKD. The dataset included 88,973 individuals. The AUC was 0.75 for predicting any stage of CKD and 0.82 for severe endpoints. The models outperformed the Centers for Disease Control and Prevention (CDC) risk score [17].

The currently available techniques use specific methods for building the DKD prediction models. A comparative analysis is needed to identify an accurate method for the prediction of DKD. In this study, we aimed to identify an accurate classification technique for predicting DKD by comparing different classification techniques applied to a DKD dataset using WEKA machine learning software. Here we report the use of a machine learning technique to detect patients with DKD using known cases of DKD as a training dataset.

3. Materials and Methods

Clinical and biochemical data of patients who had DKD were gathered for this study. Figure 1 shows the risk factors affecting diabetic kidney disease.

Figure 1: Risk factors affecting diabetic kidney disease. [Diagram labels: obesity, family history, heart problem or stroke, high blood pressure, smoking, age, diabetes.]

The data collected were converted into an ARFF file. ARFF is an acronym that stands for attribute-relation file format. It is an extension of the CSV file format in which a header is used. This header provides metadata about the data types in the columns. The data were saved with a CSV extension from Microsoft Excel and then opened in WEKA using the “ArffViewer” under the “Tools” option to save them with an ARFF extension. This conversion has to be done in order for the data to be used in WEKA. A 10-fold cross-validation was performed on the dataset, and then the data were analyzed using WEKA. Different machine learning classification techniques were applied, and the outcomes were compared (Figure 2). The best performing technique for predicting DKD was identified based on the findings (Figure 3).

Figure 2: Block diagram of the proposed research. [Stages shown: DKD dataset; data preprocessing (data transformation, feature extraction, feature selection, data sampling, data splitting); model training and testing with the nine classifiers; prediction by each classifier; aggregation of prediction results; performance evaluation (DKD or non-DKD).]

Figure 3: Schematic illustration of the methodology used for identifying the best performing classification technique. [Stages shown: diabetic kidney disease dataset from the SCDR cohort; data preprocessing; WEKA classification methods; results comparison; prediction and choice of the best technique.]

3.1. Dataset. The diabetic kidney disease dataset was gathered from our previous DKD cohort [18]. There are 410 instances and 18 attributes (14 numeric and 4 nominal) that were used in the analysis of the prediction of DKD. The dataset attributes are age (years), gender (male/female), serum albumin (mg/dL), sodium (mmol/L), potassium (mmol/L), urea (mg/dL), glucose (mg/dL), creatinine (mg/dL), HbA1c (%), Hb (g/dL), white blood cell counts (WBCs) (10⁹/L), red blood cell counts (RBCs) (10¹²/L), Hb (%), platelets counts (10 /L) (M/µl), systolic BP sitting condition (mmHg), diastolic BP sitting condition (mmHg), hypertension (yes/no), and retinopathy (yes/no). The attribute nephropathy was classified into two classes, DKD and not DKD. The 410 patients with diabetes were classified according to their urinary albumin-to-creatinine ratio (ACR) using the American Diabetes Association (ADA) criteria for diabetic nephropathy stage cutoff and eGFR values.

3.2. Preprocessing. Preprocessing is a data mining technique that involves transforming raw data into an understandable format. WEKA has a PartitionMembershipFilter that can apply any PartitionGenerator to a given dataset to obtain partition membership vectors for all instances. For preprocessing, this partition membership filter was used.

There are four interfaces to WEKA, which can be started from the main GUI Chooser window. Figure 4 shows the DKD dataset after loading in the Explorer window of the WEKA tool. The visualization section uses blue and red color coding to display the data in graphical form. In WEKA, results are partitioned into several subitems for easier analysis, evaluation, and simulation. The output begins with the correctly and incorrectly classified instances in numeric and percentage values, followed by the kappa statistic, mean absolute error, and root mean squared error in numeric values.

Figure 4: WEKA-Explorer window.

3.3. Classification. Classification is a data mining technique that predicts the class of a new data instance. In this study, different classifiers were applied to the DKD dataset to compare their accuracy, correctly classified instances, incorrectly classified instances, error rate, and execution time, in order to evaluate overall performance and identify the best classifier for DKD prediction. The nine classification techniques used in the study are as follows: random forest, J48, Naïve Bayes, REP tree, random tree, multilayer perceptron, AdaBoostM1, Hoeffding tree, and IBK. The 10-fold cross-validation is the standard method of evaluation for different machine learning techniques. The dataset was divided into ten equal subsets; in each round, one subset was used for testing and the remaining nine for training. This was repeated until every subset had been used for testing. We applied the 10-fold cross-validation test for evaluating the performance of different classifiers, as shown in Figures 5–8. The predictions for each test instance are then listed in the “Classifier Output” pane in WEKA.

WEKA machine learning software was used for learning different models, preprocessing, and feature selection schemes to identify the best classification method by comparison.

Figure 5: Classifier IBK result.

Figure 6: Classifier random tree result.

Figure 7: Classifier random forest result.

Figure 8: Classifier AdaBoostM1 result.

4. Results and Discussion

Table 1 shows the comparative results from the 10-fold cross-validation testing of different classifiers.

Table 1: Comparison of different classifiers applied on the DKD dataset.

| Classifier | Execution time (seconds) | Accuracy (%) | Correctly classified instances | Incorrectly classified instances |
| --- | --- | --- | --- | --- |
| IBK | 0 | 93.6585 | 384 | 26 |
| Random tree | 0.01 | 93.6585 | 384 | 26 |
| Random forest | 0.28 | 93.4146 | 383 | 27 |
| Multilayer perceptron | 8.3 | 93.1707 | 382 | 28 |
| J48 | 0.13 | 89.7561 | 368 | 42 |
| Hoeffding tree | 0.04 | 86.0976 | 353 | 57 |
| REP tree | 0.08 | 85.122 | 349 | 61 |
| Naïve Bayes | 0.01 | 80.9756 | 332 | 78 |
| AdaBoostM1 | 0.11 | 79.0244 | 324 | 86 |

The results show that IBK and multilayer perceptron are the fastest and slowest classifiers, respectively. The accuracies of the four best classifiers are comparable to each other; the IBK and random tree methods are the most accurate (93.6585%). The IBK and random tree methods have the highest number of correctly classified instances (384), followed by the random forest method (383), and accordingly the lowest number of incorrectly classified instances (26). AdaBoostM1 has the lowest accuracy and number of correctly classified instances and the highest number of incorrectly classified instances among all the classifiers. Both IBK and random tree techniques are superior to the other classifiers in terms of execution time, accuracy, correctly classified instances, and incorrectly classified instances.

Table 2 shows the results of kappa statistics (K), mean absolute error (MAE), and root mean squared error (RMSE) for the different classification methods.

Table 2: Classification results from WEKA.

| Classifier | Kappa statistics (K) | Mean absolute error (MAE) | Root mean squared error (RMSE) |
| --- | --- | --- | --- |
| IBK | 0.8731 | 0.1096 | 0.2496 |
| Random tree | 0.8731 | 0.1093 | 0.2497 |
| Random forest | 0.8681 | 0.1267 | 0.2542 |
| Multilayer perceptron | 0.8633 | 0.1117 | 0.2513 |
| J48 | 0.7947 | 0.1595 | 0.3074 |
| Hoeffding tree | 0.7223 | 0.1389 | 0.3696 |
| REP tree | 0.7025 | 0.2194 | 0.3565 |
| Naïve Bayes | 0.6199 | 0.1899 | 0.4261 |
| AdaBoostM1 | 0.5827 | 0.3246 | 0.4009 |

A kappa statistic (K) greater than 0 means that the classifier agrees with the true labels more often than would be expected by chance. IBK and random tree showed greater K values than the other classifiers in this study. MAE values indicate how close the prediction result is to the actual values. The results show that the random tree classifier has the lowest MAE. Therefore, the prediction result of the random tree classifier is very close to the true cases of DKD. RMSE rates are used to identify the best classification technique when the MAE values are found to be similar. The IBK classifier achieved the lowest RMSE when compared to the other classifiers. With the lowest K value, the highest MAE, and a high RMSE, the predictions of AdaBoostM1 are considered to be the least reliable. On the other hand, both the IBK and random tree techniques achieved better prediction results, and the other classifiers’ prediction results are average.

Table 3 shows the confusion matrix of the classification methods.

Table 3: Confusion matrix of different classifiers. Actual state is the clinical definition (197 DKD and 213 not DKD).

| Classifier | Actual state | Predicted DKD | Predicted not DKD |
| --- | --- | --- | --- |
| IBK | DKD | 186 | 11 |
| IBK | Not DKD | 15 | 198 |
| Random tree | DKD | 186 | 11 |
| Random tree | Not DKD | 15 | 198 |
| Random forest | DKD | 184 | 13 |
| Random forest | Not DKD | 14 | 199 |
| Multilayer perceptron | DKD | 184 | 13 |
| Multilayer perceptron | Not DKD | 15 | 198 |
| J48 | DKD | 174 | 23 |
| J48 | Not DKD | 19 | 194 |
| Hoeffding tree | DKD | 36 | 177 |
| Hoeffding tree | Not DKD | 81 | 116 |
| REP tree | DKD | 171 | 26 |
| REP tree | Not DKD | 35 | 178 |
| Naïve Bayes | DKD | 165 | 32 |
| Naïve Bayes | Not DKD | 46 | 167 |
| AdaBoostM1 | DKD | 172 | 25 |
| AdaBoostM1 | Not DKD | 61 | 152 |

The confusion matrix describes the performance of the different classification models on the DKD test dataset, for which the actual DKD cases are known. The IBK classifier correctly identified 93.0% of the patients without DKD (198 of 213) and 94.42% of the patients with DKD (186 of 197). Of the instances predicted as DKD, 7.46% were false positives (15 of 201), and of those predicted as not DKD, 5.26% were false negatives (11 of 209). It has the best prediction performance among all the classifiers investigated. Our results are comparable to the previously reported prediction models for DKD (Table 4). A maximum accuracy level was achieved when a recursive feature elimination technique was used to choose the attributes [19].

Table 4: Comparison of recent works of predictive models for diabetic kidney disease or diabetic nephropathy.

| Source | Dataset | Model | Complication | Accuracy (%) |
| --- | --- | --- | --- | --- |
| Sobrinho et al., 2020 [20] | 114 instances and 8 attributes | J48 decision tree | DKD | 95 |
| Senan et al., 2021 [19] | 400 instances and 24 attributes | Recursive feature elimination to choose attributes followed by random forest classification | DKD | 100 |
| Almansour et al., 2019 [21] | 400 instances and 24 attributes | Artificial neural network | CKD | 99.7 |
| Khanam and Foo, 2021 [22] | 768 instances and 9 attributes | Neural network | Diabetes | 88.6 |
| Our study | 410 instances and 18 attributes | IBK and random tree | DKD | 93.6585 |

Many studies have reported different classifiers for the prediction of DKD. A probabilistic neural network method was found to provide better classification and prediction performance in determining the stages of DKD [23]. BayesNet and REP tree algorithms showed accurate performance in the prediction of chronic kidney disease [24]. However, in another study, J48 was found to be suitable for screening DKD [20]. The gradient boosting classifier was the most accurate method in the detection of DKD with the least number of predictors [25]. The C4.5 classifier efficiently predicted chronic kidney disease from a high-dimensional dataset [26]. A review found that many researchers have used KNN, ANN, Naïve Bayes, SVM, and decision trees (J48, C4.5) for the prediction of chronic kidney disease from a given dataset. The most accurate classifier was SVM (98.5%), and the least accurate was the Bayes network (57.5%) [27].

The AdaBoost classifier algorithm was found to be highly accurate (0.917) for the prediction of diabetic nephropathy in a dataset of 884 patients and 70 attributes. When the attributes were reduced to the top 5 only, the performance was not affected [28]. Our results show that IBK and random tree classifiers with a dataset of 410 patients and 18 attributes achieved an accuracy of 93.6585%. A systematic review on machine learning methods for prediction of diabetes complications found that the random forest algorithm is the overall best performing classifier [29]. We found that the IBK algorithm was the best performing classifier. IBK is WEKA's implementation of the k-nearest neighbor (KNN) algorithm, which is generally regarded as one of the best classifiers.

Random forest and simple logistic regression methods were shown to have better performance in the prediction of nephropathy in type 2 diabetes from the ACCORD trial dataset [30]. Pasadana et al. also found the random forest classifier to be the best technique for DKD prediction [31]. Random forest regression was used to build a model with data from real-world electronic medical records to predict future kidney functions accurately and provide clinical decision support [32]. In the present study, based on the performance evaluation of classifiers on the DKD dataset, we found that the IBK and random tree classifiers exhibited the best performance compared to the other classifiers, namely J48, Naïve Bayes, REP tree, AdaBoostM1, Hoeffding tree, random forest, and multilayer perceptron.

The predictive models can be used in real-life situations when extensive invasive tests are not possible. High-risk patients may be identified using the available dataset. Our predictive model was developed using easily available routine laboratory parameters. Therefore, screening patients to identify those who are vulnerable to developing kidney disease is possible in primary clinics. It will help clinicians to decide on starting intensive preventive therapy for high-risk patients.

5. Conclusions

In this paper, we have applied different classification techniques to a DKD dataset for the prediction of DKD. IBK and random tree classification techniques are identified as the best performing classifiers and accurate prediction methods for DKD. These techniques may be used to detect DKD patients with easily available basic lab parameters. Using data mining techniques for predictive analytics, especially in the medical field, can save time and money. Our study compared nine different types of classification algorithms using the WEKA data mining tool to identify the best classifier that is suitable for the DKD dataset. These models will be useful in the early prediction of chronic kidney disease to take proactive interventions and reduce the mortality and morbidity associated with the disease. The prediction models may be developed further for predicting the progression of DKD in vulnerable patients.

Abbreviations

DKD: Diabetic kidney disease
K: Kappa statistics
MAE: Mean absolute error
RMSE: Root mean squared error
WEKA: Waikato environment for knowledge analysis
ESRD: End-stage renal disease
AI: Artificial intelligence

Data Availability

The data are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Authors’ Contributions

SKD and MR developed the concept, designed the study, conducted the analysis, reviewed the results, and prepared the manuscript; KS reviewed the dataset, results, and manuscript.

Acknowledgments

This work was funded by the National Plan for Science, Technology and Innovation (MAARIFAH), King Abdulaziz City for Science and Technology, Kingdom of Saudi Arabia, grant to the Strategic Center for Diabetes Research, College of Medicine, King Saud University, Riyadh, Saudi Arabia.

Supplementary Materials

Supplementary Table 1. Comparison of advantages and disadvantages of different classifiers used in our study. (Supplementary Materials)

References

[1] U. Fayyad and P. Stolorz, “Data mining and KDD: Promise and challenges,” Future Generation Computer Systems, vol. 13, no. 2, pp. 99–115, 1997.
[2] L. Guerra, L. M. McGarry, V. Robles, C. Bielza, P. Larrañaga, and R. Yuste, “Comparison between supervised and unsupervised classifications of neuronal cell types: a case study,” Developmental Neurobiology, vol. 71, no. 1, pp. 71–82, 2011.
[3] I. Yoo, P. Alafaireet, M. Marinov et al., “Data mining in healthcare and biomedicine: a survey of the literature,” Journal of Medical Systems, vol. 36, no. 4, pp. 2431–2448, 2012.
[4] H. Polat, H. Danaei Mehr, and A. Cetin, “Diagnosis of chronic kidney disease based on support vector machine by feature selection methods,” Journal of Medical Systems, vol. 41, no. 4, p. 55, 2017.
[5] O. Corporation, Machine Learning-Based Adaptive Intelligence: The Future of Cybersecurity Executive Summary, January 2018.
[6] S. K. David, A. Saeb, and K. Al. Rubeaan, “Comparative analysis of data mining tools and classification techniques using WEKA in medical Bioinformatics,” Computer Engineering and Intelligent, vol. 4, no. 13, pp. 28–39, 2013.
[7] R. R. Bouckaert, E. Frank, M. Hall et al., WEKA Manual for Version 3, pp. 7-8, 2013.
[8] S.-Y. Lee and M. E. Choi, “Urinary biomarkers for early diabetic nephropathy: beyond albuminuria,” Pediatric Nephrology, vol. 30, no. 7, pp. 1063–1075, 2015.
[9] W. G. Couser, G. Remuzzi, S. Mendis, and M. Tonelli, “The contribution of chronic kidney disease to the global burden of major noncommunicable diseases,” Kidney International, vol. 80, no. 12, pp. 1258–1270, 2011.
[10] American Diabetes Association, “Standards of medical care in diabetes,” Diabetes Care, vol. 28, no. 1, pp. s4–s36, 2005.
[11] M. Makino, R. Yoshimoto, M. Ono et al., “Artificial intelligence predicts the progression of diabetic kidney disease using big data machine learning,” Scientific Reports, vol. 9, no. 1, pp. 1–9, 2019.
[12] E. Dovgan, A. Gradišek, M. Luštrek et al., “Using machine learning models to predict the initiation of renal replacement therapy among chronic kidney disease patients,” PLoS One, vol. 15, no. 6, p. e0233976, 2020.
[13] Y. Hayashi, “Detection of lower albuminuria levels and early development of diabetic kidney disease using an artificial intelligence-based rule extraction Approach,” Diagnostics, vol. 9, no. 4, 2019.
[14] S. Ravizza, T. Huschto, A. Adamov et al., “Predicting the early risk of chronic kidney disease in patients with diabetes using real-world data,” Nature Medicine, vol. 25, no. 1, pp. 57–59.
[15] T. R. Gadekallu, N. Khare, S. Bhattacharya et al., “Early detection of diabetic retinopathy using pca-firefly based deep learning model,” Electronics, vol. 9, no. 2, pp. 1–16, 2020.
[16] N. H. Chowdhury, M. B. Reaz, F. Haque et al., “Performance analysis of Conventional machine learning algorithms for identification of chronic kidney disease in type 1 diabetes mellitus patients,” Diagnostics, vol. 11, no. 12, 2021.
[17] A. Allen, Z. Iqbal, A. Green-Saxena et al., “Prediction of diabetic kidney disease with machine learning algorithms, upon the initial diagnosis of type 2 diabetes mellitus,” BMJ Open Diabetes Research & Care, vol. 10, no. 1, p. e002560, 2022.
[18] K. Al-Rubeaan, K. Siddiqui, M. Alghonaim, A. M. Youssef, and D. AlNaqeb, “The Saudi Diabetic Kidney Disease study (Saudi-DKD): clinical characteristics and biochemical parameters,” Annals of Saudi Medicine, vol. 38, no. 1, pp. 46–56.
[19] E. M. Senan, M. H. Al-Adhaileh, F. W. Alsaade et al., “Diagnosis of chronic kidney disease using Effective classification algorithms and recursive feature Elimination techniques,” Journal of Healthcare Engineering, vol. 2021, p. 1004767, 2021.
[20] A. Sobrinho, A. C. M. D. S. Queiroz, L. D. Da Silva, E. D. B. Costa, M. E. Pinheiro, and A. Perkusich, “Computer-aided diagnosis of chronic kidney disease in developing Countries: a comparative analysis of machine learning techniques,” IEEE Access, vol. 8, pp. 25407–25419, 2020.
[21] N. A. Almansour, H. F. Syed, N. R. Khayat et al., “Neural network and support vector machine for the prediction of chronic kidney disease: a comparative study,” Computers in Biology and Medicine, vol. 109, pp. 101–111, 2019.
[22] J. J. Khanam and S. Y. Foo, “A comparison of machine learning algorithms for diabetes prediction,” ICT Express, vol. 7, no. 4, pp. 432–439, 2021.
[23] E.-H. A. Rady and A. S. Anwar, “Prediction of kidney disease stages using data mining algorithms,” Informatics in Medicine Unlocked, vol. 15, p. 100178, 2019.
[24] M. Sohail, H. M. Ahmed, M. Shabbir, and K. Noor, “Predicting chronic kidney disease by using classification algorithms in,” WE!, vol. 11, no. 6, pp. 1047–1050, 2020.
[25] M. Almasoud and T. E. Ward, “Detection of chronic kidney disease using machine learning algorithms with least number of predictors,” International Journal of Advanced Computer Science and Applications, vol. 10, no. 8, pp. 89–96, 2019.
[26] J. Sarada and N. V. M. Lakshmi, “Data analytics on chronic kidney disease data,” in Proceedings of the IADS International Conference on Computing, Communications & Data Engineering (CCODE), 2018.
[27] S. Zeynu, A. Professor, and S. Patil, “Survey on prediction of chronic kidney disease using data mining classification techniques and feature selection,” Shruti Patil, vol. 118, no. 8, pp. 149–156, 2018.
[28] Y. Jian, M. Pasquier, A. Sagahyroon, and F. Aloul, “A machine learning Approach to predicting diabetes complications,” Healthcare, vol. 9, no. 12, 2021.
[29] K. R. Tan, J. J. B. Seng, Y. H. Kwan et al., “Evaluation of machine learning methods developed for prediction of diabetes complications: a systematic review,” Journal of Diabetes Science and Technology, p. 193229682110569, 2021.
[30] V. Rodriguez-Romero, R. F. Bergstrom, B. S. Decker, G. Lahu, M. Vakilynejad, and R. R. Bies, “Prediction of nephropathy in type 2 diabetes: an analysis of the ACCORD trial applying machine learning techniques,” Clinical and Translational Science, vol. 12, no. 5, pp. 519–528, 2019.
[31] I. A. Pasadana, D. Hartama, M. Zarlis et al., “Chronic kidney disease prediction by using different decision tree techniques,” Journal of Physics: Conference Series, vol. 1255, p. 12024, 2019.
[32] J. Zhao, S. Gu, and A. McDermaid, “Predicting outcomes of chronic kidney disease from EMR data based on Random Forest Regression,” Mathematical Biosciences, vol. 310, pp. 24–30, 2019.
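As a worked check of the evaluation measures discussed in Section 4, the following sketch recomputes accuracy, the kappa statistic, sensitivity, and specificity from the IBK confusion-matrix counts reported in Table 3 (186, 11, 15, and 198). The function name `confusion_metrics` and its dictionary keys are illustrative assumptions and are not part of WEKA; the study itself read these values from WEKA's output rather than computing them this way.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Summary measures for a 2x2 confusion matrix (hypothetical helper).

    tp: actual DKD predicted as DKD         fn: actual DKD predicted as not DKD
    fp: actual not DKD predicted as DKD     tn: actual not DKD predicted as not DKD
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total          # observed agreement
    sensitivity = tp / (tp + fn)          # share of actual DKD cases detected
    specificity = tn / (tn + fp)          # share of non-DKD cases cleared
    # Agreement expected by chance, from the row and column totals.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)  # Cohen's kappa
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# IBK counts as reported in Table 3.
ibk = confusion_metrics(tp=186, fn=11, fp=15, tn=198)
for name, value in ibk.items():
    print(f"{name}: {value:.4f}")
```

With these counts the accuracy and kappa agree with the 93.6585% and 0.8731 reported for IBK, which suggests the tables are internally consistent for this classifier.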

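The 10-fold cross-validation procedure described in Section 3.3 can be sketched as follows. This is a minimal illustration under stated assumptions: the helper `k_fold_indices`, the shuffling seed, and the plain (unstratified) fold assignment are all hypothetical choices made here, and the sketch does not attempt to reproduce how WEKA assigns instances to folds internally. It only shows that, with 410 instances, each round tests on 41 instances and trains on the other 369, and that every instance is tested exactly once.

```python
import random


def k_fold_indices(n_instances, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper: shuffles the instance indices once, deals them
    into k folds, then uses each fold in turn as the test set.
    """
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    for i in range(k):
        test = folds[i]
        # Training set: every fold except the one held out for testing.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 410 instances, as in the DKD dataset used in the study.
splits = list(k_fold_indices(410, k=10))
for round_no, (train, test) in enumerate(splits, start=1):
    print(f"round {round_no}: train={len(train)} test={len(test)}")
```

A classifier would be fitted on `train` and scored on `test` in each round, and the per-round counts of correct and incorrect predictions summed to give totals such as those in Table 1.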
Journal

Journal of Healthcare Engineering, Hindawi Publishing Corporation

Published: Apr 1, 2022

References