Machine Learning Models to Predict In-Hospital Mortality among Inpatients with COVID-19: Underestimation and Overestimation Bias Analysis in Subgroup Populations

Hindawi Journal of Healthcare Engineering, Volume 2022, Article ID 1644910, 13 pages. https://doi.org/10.1155/2022/1644910

Research Article

Javad Zarei (1), Amir Jamshidnezhad (1), Maryam Haddadzadeh Shoushtari (2), Ali Mohammad Hadianfard (1), Maria Cheraghi (3), and Abbas Sheikhtaheri (4)

(1) Department of Health Information Technology, School of Allied Medical Sciences, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
(2) Air Pollution and Respiratory Diseases Research Center, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
(3) Social Determinant of Health Research Center, Department of Public Health, School of Health, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran
(4) Department of Health Information Management, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran

Correspondence should be addressed to Abbas Sheikhtaheri; sheikhtaheri.a@iums.ac.ir

Received 1 February 2022; Revised 17 April 2022; Accepted 22 May 2022; Published 23 June 2022

Academic Editor: Cosimo Ieracitano

Copyright © 2022 Javad Zarei et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Prediction of death among COVID-19 patients can help healthcare providers manage these patients better. We aimed to develop machine learning models to predict in-hospital death among COVID-19 inpatients. We developed different models using different feature sets and datasets created with a data balancing method, using demographic and clinical data from a multicenter COVID-19 registry. We extracted 10,657 records for patients confirmed by PCR or CT scan who were hospitalized for at least 24 hours, as of the end of March 2021. The death rate was 16.06%. Generally, models with 60 and 40 features performed better. Among the 240 models, the C5 models with 60 and 40 features performed well. The C5 model with 60 features outperformed the rest on all evaluation metrics; however, in external validation, C5 with 32 features performed better. This model had high accuracy (91.18%), F-score (0.916), area under the curve (0.96), sensitivity (94.2%), and specificity (88%). The model suggested in this study uses simple, readily available data and can be applied to predict death among COVID-19 patients. Furthermore, we concluded that machine learning models may perform differently in different subpopulations in terms of gender and age groups.

1. Introduction

Despite more than two years having passed since the start of the COVID-19 pandemic and the rollout of vaccination in many countries, the disease's prevalence and mortality have not slowed down, and many countries are still experiencing high peaks [1]. In addition, multiple mutations in the virus have become a new challenge to controlling the disease, leading to further spread and increased mortality [2-4]. Up to April 16, 2022, more than 500 million cases of the disease and more than 6 million deaths due to COVID-19 had been reported globally, with more than 7 million cases and 140,000 deaths in Iran [1].

Since the beginning of the COVID-19 pandemic, one of the most critical challenges for healthcare systems has been the increase in the number of patients with severe symptoms and the growing demand for hospitalization. In developing countries, which do not have sufficient healthcare infrastructure, the increase in inpatients has put a heavy burden on the healthcare system. Moreover, numerous studies have reported various risk factors such as old age, male gender, and underlying medical conditions (such as hypertension, cardiovascular disease, diabetes, COPD, cancer, and obesity) for the deterioration of COVID-19 patients [5-9].
The use of modern and noninvasive methods to triage patients into specific, known categories at the early stages of the disease is beneficial [10]. One of these approaches is the use of predictive models based on machine learning [11, 12]. For example, developing predictive models based on mortality risk factors can help prevent mortality through controlling acute conditions and planning in intensive care units [13]. Furthermore, machine learning can classify patients based on their risk of deterioration and predict the likelihood of death so that resources can be managed optimally [14, 15].

To date, several studies have been published on the application of machine learning to develop diagnostic models or predict the death of patients due to COVID-19 [14-23]. For example, several deep learning models have been reported to diagnose COVID-19 based on images [24]. In one study, researchers developed an enhanced fuzzy-based deep learning model to differentiate between COVID-19 and infectious pneumonia (non-COVID-19) based on portable CXRs and achieved up to 81% accuracy; their fuzzy model had only three misclassifications on the validation dataset [24].

As for death prediction, several studies have also been published [16, 25-28]. Their results indicated that machine learning-based predictive methods had reliable predictability and could identify the correlations between intervening variables in the complex and ambiguous conditions caused by COVID-19. Therefore, they can be used to predict such situations in the future.

Although those techniques have been tested on some regional datasets of the risk factors, the performance of the models can be improved when they are applied to datasets from other countries such as Iran, where the prevalence of COVID-19 and related deaths is high. Iran is one of the first countries to face a widespread outbreak of the disease and has experienced more than four major epidemic waves with high mortality rates [29, 30]. As a result, due to the high prevalence and mortality rate of COVID-19 in Iran and the limitation of healthcare resources [31, 32], it is vital to have a prediction model based on Iranian conditions and local data. Therefore, this study aimed to fit a model for predicting death caused by COVID-19 based on machine learning algorithms. Many previous models are based on laboratory, imaging, or treatment data [16, 25-28]; instead, we suggest models based on readily available demographic data, symptoms, and comorbidities. We also conducted a bias analysis of the machine learning models on subgroups of the patient population to show the bias of these models.

2. Materials and Methods

2.1. Population and Data. We extracted data from the Khuzestan COVID-19 registry system belonging to Ahvaz Jundishapur University of Medical Sciences (AJUMS). From the beginning of the pandemic, this registry has collected data from suspected (based on clinical signs) and confirmed (based on the results of PCR or CT scan) outpatients and inpatients in Khuzestan province, Iran. The registry collects demographic data, signs and symptoms, patient outcomes, PCR and CT results, and comorbidities from 38 hospitals. The details of data collection and data quality control were published elsewhere [30].

We included only patients with a confirmed diagnosis of COVID-19 based on PCR test or CT scan results for this modeling study. Furthermore, we included only patients who were hospitalized for more than 24 hours. Because outpatients and hospitalized patients with a short stay (less than 24 hours) had a lot of missing data, we excluded these cases from the final analysis. We included patients from all age groups. Finally, we extracted data for 10,657 patients. The frequency of nonsurviving patients (until discharge) was 1711 (16.06%); 8946 patients (83.94%) were discharged alive. Figure 1 shows the steps of this study.

[Figure 1: Overview of the study steps: data extraction from the registry; exclusion of outpatients, suspected patients, and patients hospitalized for less than 24 h; missing value imputation; 70%/30% train/test split; development of four feature sets (17, 32, 40, and 60 features); data balancing (original dataset 1 with 16.06% deaths, dataset 2 with 36.5% deaths, dataset 3 with 49.98% deaths); model development and evaluation; model evaluation and comparison on test data.]

2.2. Data Preprocessing

2.2.1. Imputing Missing Variables. Because of the data quality controls in the registry, the database had a low rate of missing data: 28 variables had a missing rate below 4% (Supplement 1, Table S1). In machine learning, data imputation is a standard approach to improve model performance, and methods such as imputation with the mean, median, or mode are common. We imputed the missing values with the mean for age and with the most frequent value (mode) for nonnumerical variables [11, 33].
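The imputation above was performed in IBM SPSS. As a rough illustration of the same mean/mode strategy, a minimal pandas sketch is shown below; the DataFrame layout and the `age` column name are assumptions, and this is not the authors' actual pipeline.

```python
import pandas as pd

def impute_registry(df: pd.DataFrame, numeric_cols=("age",)) -> pd.DataFrame:
    """Fill missing values: column mean for numeric variables,
    most frequent value (mode) for everything else."""
    df = df.copy()
    for col in df.columns:
        if df[col].isna().all():
            continue  # nothing to learn a fill value from
        if col in numeric_cols:
            df[col] = df[col].fillna(df[col].mean())
        else:
            df[col] = df[col].fillna(df[col].mode().iloc[0])
    return df

# Example with hypothetical registry data:
# clean_df = impute_registry(raw_df)
```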
2.2.2. Features and Feature Selection. The outcome measure of the study is in-hospital mortality until discharge, recorded as a binary variable (yes/no). The dataset contains 60 input variables. Age and the number of comorbidities are numerical; blood oxygen saturation level (PO2) has two values, below and above 93%. We created three dummy variables for the diagnosis method (only positive PCR, only abnormal CT, positive PCR and abnormal CT). The remaining variables have two values: yes or no.

For feature selection, we applied univariate analysis using Chi-square or Fisher exact tests for nonnumerical variables and the Mann-Whitney U test for age and the number of comorbidities (due to their non-normal distribution). We created different feature sets to build the prediction models. The first feature set (17 features) was created with the feature selection node in IBM SPSS Modeler; this node identifies important features based on univariate analysis as well as the frequency of missing values and the percentage of records with the same value. The second set (32 features) consisted of variables that were significant in the univariate analysis (P value < 0.05). The third set (40 features) also included the marginal variables from the univariate analysis (P value < 0.2). The fourth set included all 60 variables. Table 1 shows the variables in each of these feature sets.
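A minimal open-source sketch of this univariate screening is given below. It assumes a pandas DataFrame with a binary `death` column; the column names and the 2x2 Fisher fallback rule are assumptions, and the authors' actual selection was done in SPSS.

```python
import pandas as pd
from scipy.stats import chi2_contingency, fisher_exact, mannwhitneyu

def univariate_pvalues(df: pd.DataFrame, outcome: str = "death",
                       numeric=("age", "n_comorbidities")) -> pd.Series:
    """One p-value per candidate feature against the binary outcome."""
    died = df[outcome] == 1
    pvals = {}
    for col in df.columns.drop(outcome):
        if col in numeric:
            # Mann-Whitney U for non-normally distributed numeric variables
            _, p = mannwhitneyu(df.loc[died, col], df.loc[~died, col])
        else:
            table = pd.crosstab(df[col], df[outcome])
            if table.shape == (2, 2) and (table.values < 5).any():
                _, p = fisher_exact(table)       # small counts: exact test
            else:
                _, p, _, _ = chi2_contingency(table)
        pvals[col] = p
    return pd.Series(pvals).sort_values()

# p = univariate_pvalues(df)
# feature_set_2 = p[p < 0.05].index.tolist()   # analogous to the 32-feature set
# feature_set_3 = p[p < 0.20].index.tolist()   # analogous to the 40-feature set
```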
2.2.3. Data Balancing. We first developed our models with a variety of machine learning algorithms on the original dataset (dataset 1). The performance of these models was poor in terms of sensitivity because of the small number of samples in the death class (83.94% surviving vs. 16.06% nonsurviving, ratio = 5.23), so the models did not predict death well. There are various methods, such as oversampling the minority class or undersampling the majority class, to solve this problem [11, 12]. We oversampled the death cases to create more balanced datasets: datasets 2 and 3 included 5,133 (36.5%, ratio = 1.74) and 8,938 (49.98%, ratio = 1) nonsurviving patients, respectively. We developed our models with all four feature sets on these three datasets.
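The exact resampling routine used in SPSS Modeler is not described in the paper; the sketch below illustrates one common way to reach the class ratios of datasets 2 and 3 by randomly duplicating death-class rows. The `death` column name is an assumption, and in practice only the training portion should be resampled so that duplicates do not leak into the test set.

```python
import pandas as pd

def oversample_deaths(data: pd.DataFrame, outcome: str = "death",
                      target_ratio: float = 1.0, seed: int = 42) -> pd.DataFrame:
    """Duplicate minority-class (death) rows at random until the
    survivor:death ratio reaches target_ratio (1.0 = fully balanced)."""
    dead = data[data[outcome] == 1]
    alive = data[data[outcome] == 0]
    n_extra = int(round(len(alive) / target_ratio)) - len(dead)
    if n_extra <= 0:
        return data
    extra = dead.sample(n=n_extra, replace=True, random_state=seed)
    # Shuffle so the duplicated rows are not clustered at the end
    return pd.concat([data, extra]).sample(frac=1, random_state=seed)

# dataset2 = oversample_deaths(train_df, target_ratio=1.74)  # ~36.5% deaths
# dataset3 = oversample_deaths(train_df, target_ratio=1.0)   # ~50% deaths
```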
Table 1: Different feature sets.

| Feature set | Method | Number of features | Features |
|---|---|---|---|
| 1 | Feature selection node (default setting) | 17 | Age, contact with COVID-19 patients, cough, diabetes, diagnosis only by abnormal CT, diagnosis only by positive PCR, diagnosis by positive PCR and abnormal CT, gender, heart diseases, HTN, ICU admission, intubation, muscle ache, number of comorbidities, oxygen therapy, blood oxygen saturation level, and respiratory distress |
| 2 | Univariate analysis (P value < 0.05) | 32 | Age, cancer, chronic kidney disease, chronic liver disease, contact (with a probable or confirmed case in the 14 days before the onset of symptoms), convulsion, cough, diabetes, diagnosis only by abnormal CT, diagnosis only by positive PCR, diagnosis by positive PCR and abnormal CT, dialysis, diarrhea, dizziness, drug abuse, gender, headache, heart diseases, HIV/AIDS, HTN, ICU admission, immune diseases, intubation, nervous system diseases, number of comorbidities, other chronic lung diseases, oxygen therapy, paralysis, blood oxygen saturation level, pregnancy, respiratory distress, and unconsciousness |
| 3 | Univariate analysis (P value < 0.2) | 40 | Feature set 2 plus asthma, chronic hematology diseases, mental disorders, muscle ache, other diseases (comorbidities), drowsiness, gustatory dysfunction, and weakness |
| 4 | All features | 60 | Feature set 3 plus abdominal pain, autoimmune disease, chest pain, chills, constipation, ocular manifestations, fever, GI bleeding, hemoptysis, nausea, anorexia, other GI signs, paresis, runny nose, skin manifestations, sore throat, olfactory dysfunction, smoking, sweating, and vomiting |

2.3. Model Development and Evaluation. We randomly divided the data into a training set (70%) and a testing set (30%) and developed our models using common machine learning algorithms that are usually reported to perform well in medicine, including Multilayer Perceptron (MLP) neural networks [11, 12, 34], Chi-Squared Automatic Interaction Detection (CHAID), C5, and Random Forest (RF) decision trees [11, 12, 33, 34], Support Vector Machine (SVM) with Radial Basis Function (RBF) kernel [12, 35, 36], and Bayesian network [12, 37-39].

We first developed models based on the default parameter settings. We developed CHAID decision trees with a maximum depth of five and a minimum of two records in the nodes, and implemented the C5 tree with a minimum of two records in nodes. RF was implemented with a maximum depth of 10 and a minimum of five records in nodes, using 100 models. The SVM model was implemented with a regularization parameter of 10 and a gamma of 0.1. We additionally developed MLPs using different numbers of neurons (5, 10, 15, and 20) in one and two hidden layers, and also with the number of neurons suggested by the software. We also implemented the best CHAID, C5, and MLP models with the boosting ensemble method and 10-fold cross-validation. Furthermore, we implemented stack models (combining individual models) [40]. Our analysis showed that models developed on dataset 3 generally performed better; therefore, we developed stack models, based on the best individual models, on this dataset with the different feature sets.

2.4. External Validation. For external validation, we extracted 1734 records from the Khuzestan COVID-19 registry system. These data come from four different hospitals and different timeframes; therefore, they were not used in training or testing the models. This dataset contained 1425 surviving and 309 nonsurviving patients. Inclusion and exclusion criteria were similar to those of the training/testing dataset, described in Section 2.1. The best performing models selected in the previous step, as well as the ensemble models, were validated using this dataset.

2.5. Subpopulation Bias Analysis. Previous studies show that predictive models may perform differently in different subpopulations, for example, in different sex or age groups [41, 42]. To assess this effect, we adopted the method suggested by Seyyed-Kalantari et al., who used the false-positive rate (FPR) and false-negative rate (FNR) in subpopulations to assess the underdiagnosis and overdiagnosis of machine learning models [41]. We similarly calculated FNR and FPR to assess the underprediction or overprediction of death in our models. To this end, we used the best performing models in the external evaluation and the external dataset.

2.6. Analysis. We applied IBM SPSS Statistics version 23 for statistical analysis and IBM SPSS Modeler version 18 to develop and evaluate the machine learning models. We evaluated and compared the models using the confusion matrix, accuracy, precision, sensitivity, specificity, F-score, and area under the curve (AUC). To select the best performing models, we compared the models obtained from each dataset-feature set combination with each other based on AUC and F-score.

2.7. Ethical Considerations. This study received ethical approval from the Ethics Research Committee of Ahvaz Jundishapur University of Medical Sciences (IR.AJUMS.REC.1400.325).
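The models themselves were built in IBM SPSS Modeler; C5, CHAID, and the Bayesian network node have no exact scikit-learn equivalents. The sketch below only illustrates the 70/30 split, the stated SVM/RF/MLP settings, and the metrics of Section 2.6 with open-source stand-ins; the variable names (`X`, `y`) are placeholders.

```python
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

def evaluate(model, X_tr, y_tr, X_te, y_te):
    """Fit a model and report the metrics used in the paper."""
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # recall of the death class
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f_score": f1_score(y_te, pred),
        "auc": roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]),
    }

# X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=1)
models = {
    "RF (100 trees, depth 10)": RandomForestClassifier(
        n_estimators=100, max_depth=10, min_samples_leaf=5),
    "SVM (RBF, C=10, gamma=0.1)": SVC(C=10, gamma=0.1, probability=True),
    "MLP (two hidden layers of 10)": MLPClassifier(
        hidden_layer_sizes=(10, 10), max_iter=500),
}
# results = {name: evaluate(m, X_tr, y_tr, X_te, y_te) for name, m in models.items()}
```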
3. Results

3.1. Descriptive Data. We extracted data for 10,657 patients from the Khuzestan COVID-19 registry [30]. The frequency of nonsurviving patients (until discharge) was 1711 (16.06%); 8946 patients (83.94%) were discharged alive. Table 2 shows that death due to COVID-19 was significantly higher among men, older patients, and those who had been in contact with infected individuals. In addition, respiratory distress, convulsion, altered consciousness, and paralysis were more common among the nonsurviving patients, whereas cough, headache, diarrhea, and dizziness were less prevalent among them. Furthermore, oxygen saturation status was better among the recovered patients than among those who died. Moreover, comorbidities and risk factors (excluding pregnancy), as well as intubation, oxygen therapy at the beginning of hospitalization, and ICU admission, were significantly more frequent among the dead.

Table 2: Comparison of surviving and nonsurviving patients. Values are n (%) unless stated otherwise.

| Variable | Alive (n = 8946) | Dead (n = 1711) | Total (n = 10657) | P value |
|---|---|---|---|---|
| Age, mean ± SD, years | 54 ± 18.3 | 65.7 ± 16.2 | 55.88 ± 18.46 | <0.0001 |
| Age, median (Q1, Q3) | 56 (42, 67) | 67 (57, 77) | 58 (43, 69) | |
| Sex, male | 4611 (51.5) | 1010 (59) | 5621 (52.7) | <0.0001 |
| Contact with infected people (yes) | 3169 (35.4) | 706 (41.3) | 3875 (36.4) | <0.0001 |
| Signs and symptoms | | | | |
| Cough (yes) | 5296 (59.2) | 899 (52.5) | 6195 (58.1) | <0.0001 |
| Respiratory distress (yes) | 5021 (56.1) | 1288 (75.3) | 6309 (59.2) | <0.0001 |
| Fever (yes) | 4225 (47.2) | 802 (46.9) | 5027 (47.2) | 0.788 |
| Muscle aches (yes) | 2417 (27) | 426 (24.9) | 2843 (26.7) | 0.069 |
| Chills (yes) | 70 (0.8) | 9 (0.5) | 79 (0.7) | 0.257 |
| Vomiting (yes) | 452 (5.1) | 79 (4.9) | 531 (5) | 0.448 |
| Headache (yes) | 480 (5.4) | 51 (3) | 531 (5) | <0.0001 |
| Chest pain (yes) | 304 (3.4) | 61 (3.6) | 365 (3.4) | 0.728 |
| Diarrhea (yes) | 315 (3.5) | 40 (2.3) | 355 (3.3) | 0.012 |
| Sore throat (yes) | 48 (0.2) | 4 (0.2) | 52 (0.5) | 0.100 |
| Gustatory dysfunction (yes) | 98 (1.1) | 10 (0.6) | 108 (1) | 0.053 |
| Olfactory dysfunction (yes) | 123 (1.4) | 19 (1.1) | 142 (1.3) | 0.382 |
| Abdominal pain (yes) | 203 (2.3) | 31 (1.8) | 234 (2.2) | 0.237 |
| Runny nose (yes) | 8 (0.1) | 0 (0.0) | 8 (0.1) | 0.216 |
| Convulsion (yes) | 42 (0.5) | 19 (1.1) | 61 (0.6) | 0.001 |
| Altered consciousness (yes) | 213 (2.4) | 419 (24.5) | 633 (5.9) | <0.0001 |
| GI bleeding (yes) | 5 (0.1) | 0 (0.0) | 5 (0.0) | 0.417 |
| Skin lesion/rash (yes) | 11 (0.1) | 3 (0.2) | 14 (0.1) | 0.584 |
| Dizziness (yes) | 249 (2.8) | 30 (1.8) | 279 (2.6) | 0.014 |
| Paresis (yes) | 54 (0.6) | 11 (0.6) | 65 (0.6) | 0.848 |
| Paralysis (yes) | 22 (0.2) | 13 (0.8) | 35 (0.3) | 0.001 |
| Weakness (yes) | 350 (3.9) | 80 (4.7) | 430 (4) | 0.142 |
| Sweating (yes) | 11 (0.1) | 2 (0.1) | 13 (0.1) | 0.947 |
| Ocular manifestations (yes) | 3 (0.0) | 0 (0.0) | 3 (0.0) | 0.449 |
| Hemoptysis (yes) | 6 (0.1) | 2 (0.1) | 8 (0.1) | 0.491 |
| Drowsiness (yes) | 3 (0.0) | 2 (0.1) | 5 (0.0) | 0.185 |
| Constipation (yes) | 7 (0.1) | 1 (0.1) | 8 (0.1) | 0.784 |
| Nausea (yes) | 478 (5.3) | 89 (5.2) | 567 (5.3) | 0.811 |
| Anorexia (yes) | 724 (8.1) | 138 (8.1) | 862 (8.1) | 0.969 |
| Other GI symptoms (yes) | 7 (0.1) | 0 (0.0) | 7 (0.1) | 0.247 |
| Blood oxygen saturation level | | | | <0.0001 |
| - Less than 93% | 2046 (22.9) | 934 (54.6) | 2980 (28) | |
| - 93% or more | 6900 (77.1) | 777 (45.4) | 7677 (72) | |
| Comorbidity | | | | |
| Any comorbidity (yes) | 3314 (37) | 826 (48.3) | 4140 (38.8) | <0.0001 |
| Number of comorbidities | | | | <0.0001 |
| - 0 | 5632 (63) | 885 (51.7) | 6517 (61.2) | |
| - 1 | 1868 (20.9) | 391 (22.9) | 2259 (21.2) | |
| - 2 | 946 (10.6) | 275 (16.1) | 1221 (11.5) | |
| - 3 | 396 (4.4) | 112 (6.5) | 508 (4.8) | |
| - >3 | 104 (1.1) | 48 (2.8) | 152 (1.5) | |
| Number of comorbidities, mean ± SD | 0.6 ± 0.9 | 0.87 ± 1.1 | 0.65 ± 0.97 | <0.0001 |
| Hypertension (yes) | 1291 (14.4) | 356 (20.8) | 1647 (15.5) | <0.0001 |
| Heart diseases (yes) | 1102 (12.3) | 294 (17.2) | 1396 (13.11) | <0.0001 |
| Diabetes (yes) | 1577 (17.6) | 376 (22) | 1953 (18.3) | <0.0001 |
| Immunodeficiency diseases (yes) | 32 (0.4) | 13 (0.8) | 45 (0.4) | 0.019 |
| Asthma (yes) | 198 (2.2) | 28 (1.6) | 226 (2.1) | 0.129 |
| Neurological diseases (yes) | 140 (1.6) | 49 (2.9) | 189 (1.8) | <0.0001 |
| Chronic kidney diseases (yes) | 289 (3.2) | 114 (6.7) | 403 (3.8) | <0.0001 |
| Dialysis (yes) | 78 (0.9) | 33 (1.9) | 111 (1) | <0.0001 |
| Other chronic lung diseases (yes) | 136 (1.5) | 44 (2.6) | 180 (1.7) | 0.002 |
| Chronic hematologic diseases (yes) | 74 (0.8) | 20 (1.2) | 94 (0.9) | 0.166 |
| Cancer (yes) | 172 (1.9) | 80 (4.7) | 252 (2.4) | <0.0001 |
| Autoimmune diseases (yes) | 2 (0.0) | 0 (0.0) | 2 (0.0) | 0.536 |
| Chronic liver diseases (yes) | 46 (0.5) | 16 (0.9) | 62 (0.6) | 0.036 |
| HIV/AIDS (yes) | 7 (0.1) | 5 (0.3) | 12 (0.1) | 0.016 |
| Mental disorders (yes) | 26 (0.3) | 2 (0.1) | 28 (0.3) | 0.198 |
| Smoking (yes) | 143 (1.6) | 33 (1.9) | 176 (1.7) | 0.326 |
| Drug abuse (yes) | 54 (0.6) | 21 (1.2) | 75 (0.7) | 0.005 |
| Other comorbidities (yes) | 286 (3.2) | 69 (4) | 355 (3.3) | 0.078 |
| Pregnancy | 63 (0.7) | 2 (0.1) | 65 (0.6) | 0.004 |
| Care and treatment | | | | |
| Intubation (yes) | 308 (3.44) | 962 (56.2) | 1270 (11.9) | <0.0001 |
| ICU care (yes) | 1323 (14.8) | 1088 (63.6) | 2411 (22.6) | <0.0001 |
| Oxygen therapy (yes) | 2921 (32.7) | 682 (39.9) | 3603 (33.8) | <0.0001 |
| Diagnosis method | | | | |
| - Only abnormal CT | 3197 (35.7) | 583 (31.4) | 3735 (35) | <0.0001 |
| - Only positive PCR | 1161 (13) | 160 (9.4) | 1321 (12.4) | <0.0001 |
| - Positive PCR and abnormal CT | 4588 (51.3) | 1013 (59.2) | 5601 (52.6) | <0.0001 |

P values below 0.05 indicate a significant difference between surviving and nonsurviving patients.

3.2. The Machine Learning Algorithms and Their Evaluation. The results of the various models with different settings on the three datasets and four feature sets are reported as follows.

3.2.1. The Machine Learning Algorithms on the Original Dataset 1. The details of the models' performance are given in Supplement 1 (Tables S2-S5). The lowest and highest accuracy of the models on the original dataset 1 were 84.52% (RF with 32 features) and 91.12% (Bayesian network with 32 features), respectively. The minimum and maximum AUC were 0.757 (C5 with 32 features) and 0.914 (Bayesian network with 32 features), respectively. The sensitivity for predicting death on the original dataset 1 was low, between 0.484 (MLP with 60 features) and 0.775 (RF with 32 features), which indicates that the sensitivity of the models on imbalanced data is not appropriate. Table 3 shows the performance of the top 10 models on the test data of dataset 1; the best two models were the Bayesian network and the CHAID tree with 32 features. The ROC curves for the best models are presented in Supplementary Figure S1.

Table 3: Top 10 models developed on the original dataset 1.

| Model | Setting | Feature set | Accuracy | Sensitivity | Specificity | Precision | F-score | AUC |
|---|---|---|---|---|---|---|---|---|
| Bayesian network | Default | 2 | 91.12 | 64.7 | 96.2 | 76.4 | 0.701 | 0.914 |
| CHAID | Default | 2 | 90.76 | 54 | 97.8 | 82.6 | 0.653 | 0.909 |
| MLP | 2.5.5 boosting | 1 | 90.63 | 53.6 | 97.7 | 81.5 | 0.647 | 0.904 |
| MLP | 1.10 boosting | 3 | 90.79 | 54 | 97.8 | 82.3 | 0.652 | 0.903 |
| C5 | Boosting | 2 | 90.7 | 56.4 | 97.3 | 79.9 | 0.662 | 0.901 |
| MLP | 2.10.10 | 2 | 90.55 | 53.4 | 97.7 | 81.5 | 0.646 | 0.901 |
| MLP | 2.5.5 | 1 | 90.31 | 55.4 | 97 | 77.6 | 0.646 | 0.901 |
| RF | Default | 2 | 84.52 | 77.5 | 85.9 | 51.3 | 0.617 | 0.9 |
| MLP | 2.20.20 | 3 | 90.51 | 53.6 | 97.5 | 80.5 | 0.643 | 0.899 |
| Bayesian network | Default | 1 | 90.46 | 55.5 | 97.1 | 78.5 | 0.65 | 0.899 |

Note: for MLPs, the setting digits indicate the number of hidden layers and the number of neurons in hidden layers 1 and 2 (this notation also applies to Tables 4, 5, and 7).

3.2.2. The Machine Learning Algorithms on Dataset 2. The details of the models' performance on dataset 2 are given in Supplement 1 (Tables S6-S9). The lowest and highest accuracy were 82.64% (MLP with 60 features) and 87.86% (RF with 60 features), respectively. The minimum and maximum AUC were 0.888 (MLP with 60 features) and 0.942 (SVM with 60 features), respectively. The sensitivity for predicting death was between 0.658 (MLP) and 0.861 (CHAID tree with 32 features). The best results for each algorithm on dataset 2 are shown in Supplementary Figure S2. According to Table 4, the SVM and C5 models had the best performance on 60 and 40 features, respectively.

Table 4: Top 10 models developed on dataset 2.

| Model | Setting | Feature set | Accuracy | Sensitivity | Specificity | Precision | F-score | AUC |
|---|---|---|---|---|---|---|---|---|
| SVM | RBF, default | 4 | 87.83 | 83.4 | 90.3 | 82.9 | 0.832 | 0.942 |
| C5 | Boosting | 3 | 87.44 | 81.8 | 90.6 | 82.7 | 0.822 | 0.94 |
| SVM | RBF, default | 3 | 87.59 | 82.7 | 90.3 | 82.4 | 0.826 | 0.938 |
| C5 | Boosting | 4 | 87.88 | 79.9 | 92.4 | 85.5 | 0.826 | 0.938 |
| RF | Default | 4 | 87.86 | 85.7 | 89.1 | 81.5 | 0.836 | 0.931 |
| C5 | Boosting | 2 | 86.68 | 78.5 | 91.5 | 84.3 | 0.813 | 0.927 |
| C5 | Boosting | 1 | 85.99 | 77.2 | 90.8 | 82.2 | 0.797 | 0.926 |
| SVM | RBF, default | 2 | 86.61 | 79 | 91.1 | 83.7 | 0.813 | 0.926 |
| MLP | 1.10 | 3 | 85.38 | 77 | 90 | 80.9 | 0.789 | 0.923 |
| RF | Default | 1 | 85.26 | 85.2 | 85.3 | 76.2 | 0.804 | 0.923 |
3.2.3. The Machine Learning Algorithms on Dataset 3. The details of the models' performance on dataset 3 are given in Supplement 1 (Tables S10-S13). The lowest and highest accuracy were 81.27% (CHAID tree with 32 features) and 92.77% (C5 with 60 features), respectively. The minimum and maximum AUC were 0.899 (CHAID with 32 features) and 0.972 (C5 with 60 features), respectively. The sensitivity for predicting death was between 0.752 (MLP with 60 features) and 0.951 (C5 tree with 60 features). The best results for each algorithm on dataset 3 are shown in Supplementary Figure S3. According to Table 5, the C5 model had the best performance with the different feature sets, and the SVM with 60 features was also one of the optimal models.

Table 5: Top 10 models developed on dataset 3.

| Model | Setting | Feature set | Accuracy | Sensitivity | Specificity | Precision | F-score | AUC |
|---|---|---|---|---|---|---|---|---|
| C5 | Boosting | 4 | 92.77 | 95.1 | 90.5 | 90.8 | 0.929 | 0.972 |
| C5 | Boosting | 3 | 91.74 | 93.6 | 89.8 | 90.5 | 0.92 | 0.965 |
| C5 | Boosting | 2 | 91.18 | 94.2 | 88 | 89.1 | 0.916 | 0.96 |
| SVM | RBF, default | 4 | 90.16 | 92.7 | 87.7 | 88.1 | 0.903 | 0.956 |
| C5 | Boosting | 1 | 89.28 | 91.3 | 87.3 | 87.7 | 0.895 | 0.952 |
| SVM | RBF, default | 3 | 88.81 | 90.5 | 87.1 | 87.9 | 0.892 | 0.944 |
| MLP | 2.15.15 boosting | 3 | 88.59 | 90.2 | 86.9 | 87.7 | 0.889 | 0.94 |
| MLP | 2.12.12 boosting | 4 | 87.61 | 88.5 | 86.8 | 86.8 | 0.876 | 0.938 |
| C5 | Default | 3 | 87.4 | 89.8 | 85 | 86.1 | 0.879 | 0.934 |
| SVM | RBF, default | 2 | 86.34 | 86.6 | 86.1 | 86.6 | 0.866 | 0.932 |
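The best-performing models here are boosted C5 trees. C5.0 itself is proprietary (available in SPSS Modeler and the R C50 package), so the snippet below is only a rough open-source analogue: adaptive boosting over entropy-based decision trees. The tree depth and number of boosting rounds are assumptions, not the paper's configuration.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

# Entropy-based trees with at least two records per leaf, combined by adaptive
# boosting; scikit-learn >= 1.2 calls the argument `estimator`
# (older releases name the same argument `base_estimator`).
boosted_tree = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(criterion="entropy",
                                     min_samples_leaf=2,
                                     max_depth=5),   # assumed depth cap
    n_estimators=10,                                 # assumed boosting rounds
    random_state=1,
)
# boosted_tree.fit(X_tr, y_tr)
# print(evaluate(boosted_tree, X_tr, y_tr, X_te, y_te))  # metrics helper from the Section 2.6 sketch
```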
3.3. Ensemble Models. Table 6 indicates that the best ensemble model had 89.13% accuracy and 0.961 AUC. However, comparing these models with the corresponding individual models (Table 5) shows that the C5 models performed better than the ensemble models, even though the ensemble models were better than the other individual models.

Table 6: Ensemble models developed on dataset 3.

| ID | Included models | Feature set | Accuracy | Sensitivity | Specificity | Precision | F-score | AUC |
|---|---|---|---|---|---|---|---|---|
| 1 | Table S10 | 1 | 86.10 | 0.799 | 0.924 | 0.914 | 0.853 | 0.954 |
| 2 | Table S11 | 2 | 87.39 | 0.859 | 0.889 | 0.888 | 0.873 | 0.954 |
| 3 | Table S12 | 3 | 87.26 | 0.831 | 0.915 | 0.908 | 0.867 | 0.954 |
| 4 | Table S13 | 4 | 89.13 | 0.864 | 0.919 | 0.916 | 0.890 | 0.961 |
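Section 2.3 describes these stack models as combinations of the best individual models, but the combination rule used in SPSS Modeler is not reported. The sketch below therefore assumes a logistic-regression meta-learner over scikit-learn stand-ins for the base models; it illustrates stacking in general, not the authors' exact ensemble.

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, max_depth=10)),
        ("svm", SVC(C=10, gamma=0.1, probability=True)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(15, 15), max_iter=500)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # assumed meta-learner
    cv=10,  # 10-fold cross-validation to build the meta-features
)
# stack.fit(X_tr, y_tr)
# print(evaluate(stack, X_tr, y_tr, X_te, y_te))
```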
3.4. External Validation. We evaluated all ensemble models (Table 6) and the top 10 models developed on dataset 3 (Table 5) using the external dataset. As shown in Table 7, the C5 boosting models with feature sets 1 and 2 had the best scores.

Table 7: External validation of the dataset 3 models.

| Model | Setting | Feature set | Accuracy | Sensitivity | Specificity | Precision | F-score | AUC |
|---|---|---|---|---|---|---|---|---|
| C5 | Boosting | 1 | 92.56 | 0.955 | 0.919 | 0.720 | 0.821 | 0.974 |
| C5 | Boosting | 2 | 91.81 | 0.964 | 0.908 | 0.695 | 0.808 | 0.98 |
| SVM | RBF, default | 3 | 91.00 | 0.848 | 0.924 | 0.706 | 0.771 | 0.955 |
| Ensemble 2 | - | 2 | 87.77 | 0.861 | 0.881 | 0.611 | 0.715 | 0.954 |
| SVM | RBF, default | 2 | 88.24 | 0.890 | 0.881 | 0.618 | 0.729 | 0.953 |
| Ensemble 1 | - | 1 | 88.75 | 0.819 | 0.902 | 0.645 | 0.722 | 0.949 |
| C5 | Boosting | 3 | 86.51 | 0.935 | 0.850 | 0.575 | 0.712 | 0.948 |
| Ensemble 3 | - | 3 | 88.18 | 0.783 | 0.903 | 0.637 | 0.702 | 0.931 |
| MLP | 2.15.15 boosting | 3 | 87.95 | 0.767 | 0.904 | 0.634 | 0.694 | 0.914 |
| MLP | 2.12.12 boosting | 4 | 87.31 | 0.754 | 0.899 | 0.618 | 0.679 | 0.914 |
| Ensemble 4 | - | 4 | 86.62 | 0.770 | 0.887 | 0.596 | 0.672 | 0.91 |
| C5 | Boosting | 4 | 85.64 | 0.748 | 0.880 | 0.575 | 0.650 | 0.889 |
| C5 | Default | 3 | 85.24 | 0.780 | 0.868 | 0.562 | 0.653 | 0.887 |
| SVM | RBF, default | 4 | 83.79 | 0.725 | 0.862 | 0.533 | 0.615 | 0.868 |

3.5. Subpopulation Bias Analysis. We selected the four best models from the external validation for subpopulation bias analysis (Supplement 1, Table S14). Figures 2 and 3 show the FPR and FNR of these models. As these figures indicate, most of the models perform better on female patients than on male patients, and their performance decreases in older patients. With regard to FPR, Figure 2 indicates that the SVM and the C5 model on feature set 2 give less biased predictions across gender and age groups. Additionally, Figure 3 shows that the C5 model on feature set 2 gives less biased predictions in terms of FNR.

[Figure 2: Subgroup false-positive rate (FPR), by sex and age group, for different models. (a) C5 model on feature set 1. (b) C5 model on feature set 2. (c) SVM model on feature set 3. (d) Ensemble model on feature set 2.]

[Figure 3: Subgroup false-negative rate (FNR), by sex and age group, for different models. (a) C5 model on feature set 1. (b) C5 model on feature set 2. (c) SVM model on feature set 3. (d) Ensemble model on feature set 2.]
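The subgroup analysis reported above computes FPR and FNR separately per sex and age band on the external dataset. A small sketch of that computation is shown below; the age-band edges and variable names are assumptions.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

def subgroup_error_rates(y_true, y_pred, groups) -> pd.DataFrame:
    """False-positive and false-negative rate of death prediction per subgroup."""
    frame = pd.DataFrame({"y": y_true, "pred": y_pred, "group": groups})
    rows = []
    for name, sub in frame.groupby("group", observed=True):
        tn, fp, fn, tp = confusion_matrix(sub["y"], sub["pred"], labels=[0, 1]).ravel()
        rows.append({
            "subgroup": name,
            "FPR": fp / (fp + tn) if (fp + tn) else float("nan"),  # over-prediction of death
            "FNR": fn / (fn + tp) if (fn + tp) else float("nan"),  # under-prediction of death
        })
    return pd.DataFrame(rows)

# age_band = pd.cut(external_df["age"], bins=[0, 20, 40, 60, 80, 200],
#                   labels=["0-20", "21-40", "41-60", "61-80", "81+"])
# print(subgroup_error_rates(y_ext, model.predict(X_ext), age_band))
# print(subgroup_error_rates(y_ext, model.predict(X_ext), external_df["sex"]))
```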
3.6. Comparison of the Models. A comparison of the models showed that, after balancing the data, the sensitivity and AUC increased. The accuracy decreased on dataset 2 but increased on dataset 3. Furthermore, models with 60 and 40 features performed better. In general, the C5 model with 60 features outperformed the rest on all evaluation indicators; however, in the external validation, the C5 boosting models with feature sets 1 (17 features) and 2 (32 features) had better external validity. The subpopulation analysis suggests that the C5 boosting model with 32 features has less bias.

3.7. Variable Importance. Figure 4 shows the importance of each variable in the selected model (C5). As indicated, intubation, number of comorbidities, age, gender, respiratory distress, blood oxygen saturation level, ICU admission, cough, unconsciousness, and diagnosis by positive PCR and abnormal CT are considered the most important death predictors by this model.

[Figure 4: Variable importance of the selected model (target: survival). Predictors in decreasing order of importance: intubation, number of comorbidities, age, gender, respiratory distress, oxygen saturation level, ICU admission, cough, unconsciousness, and diagnosis by positive PCR and abnormal CT.]

4. Discussion

In the first stage of the study, the risk factors for death due to COVID-19 were identified using univariate analysis. Then, based on the important features, different machine learning models were developed to predict death. The results showed significant differences between recovered and nonrecovered patients in terms of age, sex, contact with infected people, respiratory distress, convulsion, altered consciousness, paralysis, blood oxygen saturation level, the number of comorbidities, intubation, oxygen therapy, and the need for ICU services.

We found that intubation, number of comorbidities, age, gender, respiratory distress, blood oxygen saturation level, ICU admission, cough, unconsciousness, and diagnosis by positive PCR and abnormal CT are the most important death predictors. Other studies showed that age [17, 18, 23, 27, 28, 43], male gender [43], respiratory disease [16, 17], the number of comorbidities [43], and low oxygen saturation [17, 18, 23, 43] increased death due to COVID-19. Some researchers indicate that high blood pressure, heart disease, cancer, kidney disease [16, 17], diabetes [18], cerebrovascular diseases [28], smoking [18, 23], and asthma [16] increased mortality from COVID-19. However, our model did not consider these factors significant. It is worth mentioning that these risk factors increase the number of comorbidities in a patient, and this factor was considered significant in the C5 model.
We developed various models with different feature sets to predict death from COVID-19. Based on the results, the best performance was related to the C5 decision tree with 32 features. Several other studies have likewise developed machine learning models for predicting death from COVID-19 [16-23, 25-28, 43-45]. Since a variety of variables (demographic, laboratory, radiographic, therapeutic, signs and symptoms, and comorbidities) and datasets are used, it is not easy to compare the studies. For example, some researchers used laboratory data in addition to other variables [17, 23, 28, 43], and one study applied only laboratory variables [45]. In another study, vital signs and imaging results were used to develop models [23]. Nevertheless, the variables used in our study were similar to those of most studies, and a comparison with previous studies showed that the performance of our selected model was better than those models (Table 8). The model developed by Gao et al. [43] has better performance (AUC = 0.976 vs. 0.972); however, that model was developed with a small sample size. In addition, the F-score (F = 0.97) of the model developed by Yan et al. [19] was higher than that of our selected model; however, Barish et al. [46] showed that Yan's model did not perform well in external validation. Khan's model [26] also has a higher F-score than our model. Khan et al. and Gao et al. used unbalanced data, and Barish et al. [46] have shown that models developed on unbalanced data to predict death from COVID-19 may not give accurate results in the real environment.

Table 8: Machine learning models suggested in the literature to predict death from COVID-19.

| Author | Patients, death rate, number of features | Models (accuracy / AUC) |
|---|---|---|
| Muhammad et al. [44] | 1505, NA, 4 | Decision tree (DT) 99.85 / NA; LR 97.49 / NA; SVM 98.85 / NA; Naive Bayes 97.52 / NA; RF 99.60 / NA; KNN 98.06 / NA |
| Pourhomayoun and Shakibi [22] | 307,382, NA, 57 | RF 87.93 / 0.94; ANN 89.98 / 0.93; SVM 89.02 / 0.88; KNN 89.83 / 0.90; LR 87.91 / 0.92; DT 86.87 / 0.93 |
| Li et al. [20] | 2924, 8.8%, different feature sets (83, 152, 5) | Gradient boosting decision tree (83 features) 88.9 / 0.939; LR (152 features) 86.8 / 0.928; LR (5 features) 88.7 / 0.915 |
| Goncalves and Rouco [21] | 827,601, 8.7%, 3 | Adaboost, gradient boosting, and RF NA / 0.919; LR NA / 0.917 |
| An et al. [16] | 8000, 2.2%, 10 | SVM linear 91.9 / 0.962; LASSO 91.1 / 0.963; LASSO (14 days) 86.8 / 0.944; SVM linear (14 days) 87.7 / 0.941; LASSO (30 days) 89.5 / 0.953; SVM linear (30 days) 87.7 / 0.948 |
| Yadaw et al. [18] | 3841, 8.1%, 17 and 3 | XGBoost (17 and 3 features) NA / 0.91 |
| Yan et al. [19] | 375, 35%, 3 | XGBoost F1 = 90 / 0.97 |
| Gao et al. [43] | 2160, 11%, 14 | SVM 95.8 / 0.976; ANN 95.6 / 0.976; Ensemble 95.5 / 0.976; LR 95.4 / 0.974; GBDT 94.8 / 0.953 |
| Chen et al. [28] | 192, 26%, only critically ill patients, 47 features (17 nonlaboratory, 30 laboratory) | SVM linear 93 (47 features), 87.8 (17 features), 85.6 (30 features) / NA |
| Booth et al. [45] | 398, 10.8%, 5 | SVM-RBF 93 / NA |
| Parchure et al. [17] | 567, 17.8%, 55 | RF 65.5 / 85.5 |
| Zhao et al. [23] | 641, 12.8%, 47 | LR NA / 0.82 |
| Das et al. [27] | 3524, 2.1%, 4 | LR 96.5 / 0.83; SVM 97 / 0.825; KNN 92.4 / 0.759; RF 92.4 / 0.787; Gradient boosting 97.1 / 0.787 |
| Chen et al. [25] | 1002 severe and critical cases, 16.1%, 7 | LR NA / 0.903 |
| Khan et al. [26] | 103,888, 5.7%, 15 | Deep neural network F1 = 0.970 / 0.985; RF and XGBoost 0.946 / 0.972; LR and DT 0.945 / 0.972; KNN 0.944 / 0.971 |

Note: NA indicates values not reported in the cited studies. For Yan et al. [19] and Khan et al. [26], the reported value is an F1 score rather than accuracy.

We found that machine learning models perform differently in subpopulations defined by gender and age group. Other studies similarly show that predictive models perform differently across ethnic groups, genders, age groups, and insurance types [41, 42]. Therefore, researchers and clinicians should apply these models to different population groups cautiously, and developing separate models for different patient groups may be necessary.

The strength of our model is its use of demographic data, symptoms, and comorbidities that can be easily collected. Unlike some previous studies, we did not use laboratory, treatment, or imaging data, which can be considered a limitation. However, we assumed that all patients received almost similar treatments. Moreover, applying models developed on treatment data may be difficult because of changes in patients' treatment over time, and models that depend on laboratory and imaging data require considerable time and cost to gather these data in a real clinical environment. A comparison of our study with those that used laboratory and imaging data (Table 8) indicates that our selected model outperforms many of these models. One study also indicated that imaging data did not affect the performance of machine learning models for predicting death from COVID-19 [23]. In addition, the data used in our study were collected from 38 hospitals, which is a strength of the study. A similar study indicated that up to 20% missing data is acceptable for developing machine learning models in COVID-19 studies [18]; the missing rate in our study was under 4%.

Despite these strengths, some limitations should be considered. First, we only analyzed subpopulation bias based on gender and age groups; future studies should consider other variables in this analysis. Furthermore, there are several well-established severity scores such as APACHE and SOFA [41, 42]; researchers are recommended to compare the performance of machine learning models with these scores for predicting death from COVID-19.

5. Conclusions

Different machine learning models were developed to predict the likelihood of death caused by COVID-19. The best prediction model was the C5 decision tree (accuracy = 91.18%, AUC = 0.96, and F-score = 0.916). This model can be used to detect high-risk patients and to improve the use of facilities, equipment, and medical practitioners for patients with COVID-19.

Data Availability

The data used to support the findings of this study are restricted by the Ethics Research Committee of Ahvaz Jundishapur University of Medical Sciences in order to protect patient privacy.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Authors' Contributions

J. Zarei and A. Jamshidnezhad contributed to conceptualization, data curation, and writing (review and editing). M. H. Shoushtari contributed to conceptualization, methodology, and writing (review and editing). A. Hadianfard and M. Cheraghi contributed to conceptualization and writing (review and editing). A. Sheikhtaheri contributed to conceptualization, methodology, data analysis, supervision, writing the original draft, and writing (review and editing). All authors reviewed the final version of the manuscript and approved its submission. This study was conducted based on the Khuzestan COVID-19 registry data, and we would like to thank the Khuzestan COVID-19 registry for providing the data for this study.

Acknowledgments

This study was supported by Ahvaz Jundishapur University of Medical Sciences. The funder had no role in the study design; data collection, analysis, or interpretation; writing of the report; or the decision to submit.

Supplementary Materials

Supplement 1: detailed Tables S1-S14. Supplement 2: Figures S1-S3.

References

[1] World Health Organization, "WHO coronavirus (COVID-19) dashboard," WHO, Geneva, Switzerland, 2022, https://covid19.who.int.
[2] A. C. Darby and J. A. Hiscox, "Covid-19: variants and vaccination," BMJ, vol. 372, p. n771, 2021.
[3] I. Cosic, D. Cosic, and I. Loncarevic, "Analysis of mutated SARS-CoV-2 variants using resonant recognition model," International Journal of Sciences, vol. 10, no. 7, pp. 6-11, 2021.
[4] L. Samaranayake and K. S. Fakhruddin, "SARS-CoV-2 variants and COVID-19: an overview," Dental Update, vol. 48, no. 3, pp. 235-238, 2021.
[5] V. S. Malik, K. Ravindra, S. V. Attri, S. K. Bhadada, and M. Singh, "Higher body mass index is an important risk factor in COVID-19 patients: a systematic review and meta-analysis," Environmental Science and Pollution Research, vol. 27, no. 33, Article ID 42123, 2020.
[6] M. Parohan, S. Yaghoubi, A. Seraji, M. Javanbakht, P. Sarraf, and M. Djalali, "Risk factors for mortality in patients with Coronavirus disease 2019 (COVID-19) infection: a systematic review and meta-analysis of observational studies," The Aging Male, vol. 34, no. 5, pp. 1-9, 2020.
[7] R. H. Li and H. H. Sigurslid, "Predictors of mortality in hospitalized COVID-19 patients: a systematic review and meta-analysis," vol. 92, no. 10, pp. 1875-1883, 2020.
[8] K. Mackey, C. K. Ayers, K. K. Kondo et al., "Racial and ethnic disparities in COVID-19-related infections, hospitalizations, and deaths," Annals of Internal Medicine, vol. 174, no. 3, pp. 362-373, 2021.
[9] J. Li, D. Q. Huang, B. Zou et al., "Epidemiology of COVID-19: a systematic review and meta-analysis of clinical characteristics, risk factors, and outcomes," Journal of Medical Virology, vol. 93, no. 3, pp. 1449-1458, 2021.
[10] W. M. Shaban, A. H. Rabie, A. I. Saleh, and M. A. Abo-Elsoud, "A new COVID-19 patients detection strategy (CPDS) based on hybrid feature selection and enhanced KNN classifier," Knowledge-Based Systems, vol. 205, Article ID 106270, 2020.
[11] A. Sheikhtaheri, A. Orooji, A. Pazouki, and M. Beitollahi, "A clinical decision support system for predicting the early complications of one-anastomosis gastric bypass surgery," Obesity Surgery, vol. 29, no. 7, pp. 2276-2286, 2019.
[12] A. Sheikhtaheri, M. R. Zarkesh, R. Moradi, and F. Kermani, "Prediction of neonatal deaths in NICUs: development and validation of machine learning models," BMC Medical Informatics and Decision Making, vol. 21, no. 1, p. 131, 2021.
[13] H. B. Syeda, M. Syed, K. W. Sexton et al., "Role of machine learning techniques to tackle the COVID-19 crisis: systematic review," JMIR Medical Informatics, vol. 9, no. 1, Article ID e23811, 2021.
[14] P. Pan, Y. Li, Y. Xiao et al., "Prognostic assessment of COVID-19 in the intensive care unit by machine learning methods: model development and validation," Journal of Medical Internet Research, vol. 23, no. 3, Article ID e23128, 2020.
[15] L. Ryan, C. Lam, S. Mataraso et al., "Mortality prediction model for the triage of COVID-19, pneumonia, and mechanically ventilated ICU patients: a retrospective study," Annals of Medicine and Surgery, vol. 59, pp. 207-216, 2020.
[16] C. An, H. Lim, D. W. Kim, J. H. Chang, Y. J. Choi, and S. W. Kim, "Machine learning prediction for mortality of patients diagnosed with COVID-19: a nationwide Korean cohort study," Scientific Reports, vol. 10, no. 1, Article ID 18811, 2020.
[17] P. Parchure, H. Joshi, K. Dharmarajan et al., "Development and validation of a machine learning-based prediction model for near-term in-hospital mortality among patients with COVID-19," BMJ Supportive & Palliative Care, 2020.
[18] A. S. Yadaw, Y.-c. Li, S. Bose, R. Iyengar, S. Bunyavanich, and G. Pandey, "Clinical features of COVID-19 mortality: development and validation of a clinical prediction model," The Lancet Digital Health, vol. 2, no. 10, pp. e516-e525, 2020.
[19] L. Yan, H.-T. Zhang, J. Goncalves et al., "An interpretable mortality prediction model for COVID-19 patients," Nature Machine Intelligence, vol. 2, no. 5, pp. 283-288, 2020.
[20] S. Li, Y. Lin, T. Zhu et al., "Development and external evaluation of predictions models for mortality of COVID-19 patients using machine learning method," Neural Computing & Applications, pp. 1-10, 2021.
[21] C. P. Goncalves and J. Rouco, "Comparing decision tree-based ensemble machine learning models for COVID-19 death probability profiling," Journal of Vaccines & Vaccination, vol. 12, no. 1, Article ID 1000441, 2021.
[22] M. Pourhomayoun and M. Shakibi, "Predicting mortality risk in patients with COVID-19 using machine learning to help medical decision-making," Smart Health, vol. 20, Article ID 100178, 2021.
[23] Z. Zhao, A. Chen, W. Hou et al., "Prediction model and risk scores of ICU admission and mortality in COVID-19," PLoS One, vol. 15, no. 7, Article ID e0236618, 2020.
[24] C. Ieracitano, N. Mammone, M. Versaci et al., "A fuzzy-enhanced deep learning approach for early detection of Covid-19 pneumonia from portable chest X-ray images," Neurocomputing, vol. 481, pp. 202-215, 2022.
[25] B. Chen, H.-Q. Gu, Y. Liu et al., "A model to predict the risk of mortality in severely ill COVID-19 patients," Computational and Structural Biotechnology Journal, vol. 19, pp. 1694-1700, 2021.
[26] I. U. Khan, N. Aslam, M. Aljabri et al., "Computational intelligence-based model for mortality rate prediction in COVID-19 patients," International Journal of Environmental Research and Public Health, vol. 18, no. 12, p. 6429, 2021.
[27] A. K. Das, S. Mishra, and S. Saraswathy Gopalan, "Predicting CoVID-19 community mortality risk using machine learning and development of an online prognostic tool," PeerJ, vol. 8, Article ID e10083, 2020.
[28] Y. Chen, Z. Linli, Y. Lei et al., "Risk factors for mortality in critically ill patients with COVID-19 in Huanggang, China: a single-center multivariate pattern analysis," Journal of Medical Virology, vol. 93, no. 4, pp. 2046-2055, 2020.
[29] M. Ghafari, A. Kadivar, and A. Katzourakis, "Excess deaths associated with the Iranian COVID-19 epidemic: a province-level analysis," International Journal of Infectious Diseases, vol. 107, pp. 101-115, 2021.
[30] J. Zarei, M. Dastoorpoor, A. Jamshidnezhad, M. Cheraghi, and A. Sheikhtaheri, "Regional COVID-19 registry in Khuzestan, Iran: a study protocol and lessons learned from a pilot implementation," Informatics in Medicine Unlocked, vol. 23, Article ID 100520, 2021.
[31] A. Abdoli, "Iran, sanctions, and the COVID-19 crisis," Journal of Medical Economics, vol. 23, no. 12, pp. 1-8, 2020.
[32] A. Murphy, Z. Abdi, I. Harirchi, M. McKee, and E. Ahmadnezhad, "Economic sanctions and Iran's capacity to respond to COVID-19," The Lancet Public Health, vol. 5, no. 5, p. e254, 2020.
[33] J. Han, J. Pei, and M. Kamber, Data Mining: Concepts and Techniques, Elsevier, Amsterdam, Netherlands, 2011.
[34] H. Bhavsar and A. Ganatra, "A comparative study of training algorithms for supervised machine learning," International Journal of Soft Computing and Engineering, vol. 2, no. 4, pp. 2231-2307, 2012.
[35] H. C. Koh and G. Tan, "Data mining applications in healthcare," Journal of Healthcare Information Management, vol. 19, no. 2, p. 65, 2011.
[36] D. Senthilkumar and S. Paulraj, "Prediction of low birth weight infants and its risk factors using data mining techniques," in Proceedings of the International Conference on Industrial Engineering and Operations Management, Dubai, United Arab Emirates, March 2015.
[37] N. Friedman, D. Geiger, and M. Goldszmidt, "Bayesian network classifiers," Machine Learning, vol. 29, no. 2-3, pp. 131-163, 1997.
[38] J. Cheng and R. Greiner, "Comparing Bayesian network classifiers," in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 101-108, Morgan Kaufmann Publishers Inc., Stockholm, Sweden, August 1999.
[39] F. V. Jensen, An Introduction to Bayesian Networks, UCL Press, London, United Kingdom, 1996.
[40] Z. Zhang, L. Chen, P. Xu, and Y. Hong, "Predictive analytics with ensemble modeling in laparoscopic surgery: a technical note," Laparoscopic, Endoscopic and Robotic Surgery, vol. 5, no. 1, pp. 25-34, 2022.
[41] L. Seyyed-Kalantari, H. Zhang, M. B. A. McDermott, I. Y. Chen, and M. Ghassemi, "Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations," Nature Medicine, vol. 27, no. 12, pp. 2176-2182, 2021.
[42] R. Sarkar, C. Martin, H. Mattie, J. W. Gichoya, D. J. Stone, and L. A. Celi, "Performance of intensive care unit severity scoring systems across different ethnicities in the USA: a retrospective observational study," The Lancet Digital Health, vol. 3, no. 4, pp. e241-e249, 2021.
[43] Y. Gao, G. Y. Cai, W. Fang et al., "Machine learning based early warning system enables accurate mortality risk prediction for COVID-19," Nature Communications, vol. 11, no. 1, pp. 5033-5110, 2020.
[44] L. J. Muhammad, M. M. Islam, S. S. Usman, and S. I. Ayon, "Predictive data mining models for novel coronavirus (COVID-19) infected patients' recovery," SN Computer Science, vol. 1, no. 4, p. 206, 2020.
[45] A. L. Booth, E. Abels, and P. McCaffrey, "Development of a prognostic model for mortality in COVID-19 infection using machine learning," Modern Pathology, vol. 34, no. 3, pp. 522-531, 2020.
[46] M. Barish, S. Bolourani, L. F. Lau, S. Shah, and T. P. Zanos, "External validation demonstrates limited clinical utility of the interpretable mortality prediction model for patients with COVID-19," Nature Machine Intelligence, vol. 3, no. 1, pp. 25-27, 2021.

Machine Learning Models to Predict In-Hospital Mortality among Inpatients with COVID-19: Underestimation and Overestimation Bias Analysis in Subgroup Populations

Loading next page...
 
/lp/hindawi-publishing-corporation/machine-learning-models-to-predict-in-hospital-mortality-among-y32ssnSWdS

References (48)

Publisher
Hindawi Publishing Corporation
Copyright
Copyright © 2022 Javad Zarei et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ISSN
2040-2295
eISSN
2040-2309
DOI
10.1155/2022/1644910
Publisher site
See Article on Publisher Site

Abstract

Hindawi Journal of Healthcare Engineering Volume 2022, Article ID 1644910, 13 pages https://doi.org/10.1155/2022/1644910 Research Article Machine Learning Models to Predict In-Hospital Mortality among Inpatients with COVID-19: Underestimation and Overestimation Bias Analysis in Subgroup Populations 1 1 2 Javad Zarei , Amir Jamshidnezhad , Maryam Haddadzadeh Shoushtari , 1 3 4 Ali Mohammad Hadianfard , Maria Cheraghi , and Abbas Sheikhtaheri Department of Health Information Technology, School of Allied Medical Sciences, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran Air Pollution and Respiratory Diseases Research Center, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran Social Determinant of Health Research Center, Department of Public Health, School of Health, Ahvaz Jundishapur University of Medical Sciences, Ahvaz, Iran Department of Health Information Management, School of Health Management and Information Sciences, Iran University of Medical Sciences, Tehran, Iran Correspondence should be addressed to Abbas Sheikhtaheri; sheikhtaheri.a@iums.ac.ir Received 1 February 2022; Revised 17 April 2022; Accepted 22 May 2022; Published 23 June 2022 Academic Editor: Cosimo Ieracitano Copyright © 2022 Javad Zarei et al. �is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Prediction of the death among COVID-19 patients can help healthcare providers manage the patients better. We aimed to develop machine learning models to predict in-hospital death among these patients. We developed di‰erent models using di‰erent feature sets and datasets developed using the data balancing method. We used demographic and clinical data from a multicenter COVID- 19 registry. We extracted 10,657 records for con‘rmed patients with PCR or CTscans, who were hospitalized at least for 24 hours at the end of March 2021. �e death rate was 16.06%. Generally, models with 60 and 40 features performed better. Among the 240 models, the C5 models with 60 and 40 features performed well. �e C5 model with 60 features outperformed the rest based on all evaluation metrics; however, in external validation, C5 with 32 features performed better. �is model had high accuracy (91.18%), F-score (0.916), Area under the Curve (0.96), sensitivity (94.2%), and speci‘city (88%). �e model suggested in this study uses simple and available data and can be applied to predict death among COVID-19 patients. Furthermore, we concluded that machine learning models may perform di‰erently in di‰erent subpopulations in terms of gender and age groups. Since the beginning of the COVID-19 pandemic, one of 1. Introduction the most critical challenges for the healthcare systems has In spite of more than 2 years since the COVID-19 pandemic been to increase the number of patients with severe and performing vaccination in many countries, the disease’s symptoms and the growing demand for hospitalization. In prevalence and mortality have not slowed down, and many developing countries, which do not have su¥cient health- countries are still experiencing high peaks [1]. In addition, care infrastructure, the increase in inpatients has put a lot of multiple mutations in the virus have become a new challenge burden on the healthcare system. 
Moreover, numerous to control the disease, leading to the spread of the disease studies have reported various risk factors such as old age, and increased mortality [2–4]. Until April 16, 2022, more male gender, and underlying medical conditions (such as than 500 million cases of the disease and more than 6 million hypertension, cardiovascular disease, diabetes, COPD, deaths due to COVID-19 have been reported globally, with cancer, and obesity) for the deterioration of COVID-19 more than 7 million cases and 140,000 deaths in Iran [1]. patients [5–9]. 2 Journal of Healthcare Engineering (e use of modern and noninvasive methods to triage inpatients in Khuzestan province, Iran. (is registry collects patientsintospecificandknowncategoriesattheearlystages demographic data, signs and symptoms, patient outcomes, of the disease is beneficial [10]. One of these approaches is PCR and CT results, and comorbidities from 38 hospitals. the use of predictive models based on machine learning (e details of data collection and data quality control were [11, 12]. For example, developing predictive models based published elsewhere [30]. on mortality risk factors can positively prevent mortality We included onlypatients with aconfirmed diagnosis of through controlling acute conditions and planning in in- COVID-19 based on PCR test or CT scan results for this tensive care units [13]. Furthermore, machine learning can modeling study. Furthermore, we included only patients classify patients based on the deteriorating risk and predict who were hospitalized for more than 24 hours. Because the likelihood of death to manage resources optimally outpatients and hospitalized patients with a short stay (less [14, 15]. than 24 hours) had a lot of missing data, we excluded these To date, several studies have been published on the cases from thefinal analysis. We also included patients from application of machine learning to develop diagnostic all age groups. Finally, we extracted data for 10,657 patients. models or predict the death of patients due to COVID-19 (efrequencyofnonsurvivingpatients(untildischarge)was [14–23]. For example, several deep learning models have 1711(16.06%);8946patients(83.94%)weredischargedalive. been reported to diagnose COVID-19 based on images [24]. Figure 1 shows the steps of this study. In a study, researchers developed an enhanced fuzzy-based deeplearningmodeltodifferentiatebetweenCOVID-19and 2.2. Data Preprocessing infectious pneumonia (no-COVID-19) based on portable CXRs and achieved up to 81% accuracy. (eir fuzzy model 2.2.1. Imputing Missing Variables. Because of the data had only three misclassifications on the validation dataset qualitycontrolsintheregistry,thedatabasehadalowrateof [24]. missing data. (e 28 variables had a missing rate below 4% As for death prediction, several studies have also been (Supplement 1, Table S1). In machine learning, data im- published [16, 25–28]. (e results obtained from the studies putation is a standard approach to improve the models’ on machine learning-based predictive methods indicated performance. Different methods such as imputation with that those methods had reliable predictability and could mean, median, or mode are common. We imputed the identify the correlation between intervening variables in missing values with the mean for age and the highest fre- complex and ambiguous conditions caused by COVID-19. quency of values for nonnumerical variables as well [11, 33]. (erefore, they can be used to predict such situations in the future. 
Although those techniques have been tested on some regional risk-factor datasets, the performance of the models can be improved when they are applied to datasets from other countries such as Iran, where the prevalence of COVID-19 and related deaths is high. Iran was one of the first countries to face a widespread outbreak of the disease and has experienced more than four major epidemic waves with high mortality rates [29, 30]. As a result, due to the high prevalence and mortality rate of COVID-19 in Iran and the limitation of healthcare resources [31, 32], it is vital to have a prediction model based on Iranian conditions and local data. Therefore, this study aimed to fit a model for predicting death caused by COVID-19 based on machine learning algorithms. Many previous models are based on laboratory, imaging, or treatment data [16, 25-28]; we instead suggest models based on readily available demographic data, symptoms, and comorbidities. We also conducted a bias analysis of the machine learning models in subgroups of the patient population to show the bias of these models.

2. Materials and Methods

2.1. Population and Data. We extracted data from the Khuzestan COVID-19 registry system belonging to Ahvaz Jundishapur University of Medical Sciences (AJUMS). Since the beginning of the pandemic, this registry has collected data from suspected (based on clinical signs) and confirmed (based on PCR or CT scan results) outpatients and inpatients in Khuzestan province, Iran. The registry collects demographic data, signs and symptoms, patient outcomes, PCR and CT results, and comorbidities from 38 hospitals. The details of data collection and data quality control were published elsewhere [30].

We included only patients with a confirmed diagnosis of COVID-19 based on PCR test or CT scan results for this modeling study. Furthermore, we included only patients who were hospitalized for more than 24 hours; because outpatients and hospitalized patients with a short stay (less than 24 hours) had a lot of missing data, these cases were excluded from the final analysis. We included patients from all age groups. Finally, we extracted data for 10,657 patients. The frequency of nonsurviving patients (until discharge) was 1711 (16.06%); 8946 patients (83.94%) were discharged alive. Figure 1 shows the steps of this study.

Figure 1: Overview of the study steps: data extraction from the registry; exclusion of outpatients, suspected patients, and patients hospitalized for less than 24 hours; missing value imputation; development of train (70%) and test (30%) datasets; creation of four feature sets (17, 32, 40, and 60 features); data balancing (original dataset 1 with 16.06% death, dataset 2 with 36.5% death, dataset 3 with 49.98% death); model development and evaluation; and model evaluation and comparison on the test data.

2.2. Data Preprocessing

2.2.1. Imputing Missing Variables. Because of the data quality controls in the registry, the database had a low rate of missing data: 28 variables had a missing rate below 4% (Supplement 1, Table S1). In machine learning, data imputation is a standard approach to improve model performance, and methods such as imputation with the mean, median, or mode are common. We imputed missing values with the mean for age and with the most frequent value for nonnumerical variables [11, 33].

2.2.2. Features and Feature Selection. The outcome measure of the study is in-hospital mortality until discharge, collected as a binary variable (yes/no). The dataset contains 60 input variables. Age and the number of comorbidities are numerical; the blood oxygen saturation level (PO2) has two categories (below vs. above 93%). We created three dummy variables for the diagnosis method (only positive PCR, only abnormal CT, and both positive PCR and abnormal CT). The other variables have two values: yes or no.

For feature selection, we applied univariate analysis using the Chi-square or Fisher exact test for nonnumerical variables and the Mann-Whitney U test for age and the number of comorbidities (due to their non-normal distribution). We created different feature sets to build the prediction models: one set included all 60 variables (feature set 4); another consisted of the variables that were significant in univariate analysis (P value < 0.05; feature set 2, 32 features); a third included the marginal variables based on univariate analysis (P value < 0.2; feature set 3, 40 features); and for the remaining set we used the feature selection node in IBM SPSS Modeler (feature set 1, 17 features), which identifies important features based on univariate analysis as well as the frequency of missing values and the percentage of records with the same value. Table 1 shows the variables in each of these feature sets.

Table 1: Different feature sets.
Feature set | Method | Number of features | Features
1 | Feature selection node (default setting) | 17 | Age, contact with COVID-19 patients, cough, diabetes, diagnosis only by abnormal CT, diagnosis only by positive PCR, diagnosis by positive PCR and abnormal CT, gender, heart diseases, HTN, ICU admission, intubation, muscle ache, number of comorbidities, oxygen therapy, blood oxygen saturation level, and respiratory distress.
2 | Univariate analysis (P value < 0.05) | 32 | Age, cancer, chronic kidney disease, chronic liver disease, contact (with a probable or confirmed case in the 14 days before the onset of symptoms), convulsion, cough, diabetes, diagnosis only by abnormal CT, diagnosis only by positive PCR, diagnosis by positive PCR and abnormal CT, dialysis, diarrhea, dizziness, drug abuse, gender, headache, heart diseases, HIV/AIDS, HTN, ICU admission, immune diseases, intubation, nervous system diseases, number of comorbidities, other chronic lung diseases, oxygen therapy, paralysis, blood oxygen saturation level, pregnancy, respiratory distress, and unconsciousness.
3 | Univariate analysis (P value < 0.2) | 40 | Feature set 2 plus asthma, chronic hematology diseases, mental disorders, muscle ache, other diseases (comorbidities), drowsiness, gustatory dysfunction, and weakness.
4 | All features | 60 | Feature set 3 plus abdominal pain, autoimmune disease, chest pain, chills, constipation, ocular manifestations, fever, GI bleeding, hemoptysis, nausea, anorexia, other GI signs, paresis, runny nose, skin manifestations, sore throat, olfactory dysfunction, smoking, sweating, and vomiting.
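As a rough illustration of the preprocessing and feature-screening steps described in Sections 2.2.1 and 2.2.2, the sketch below reproduces the same logic (mean imputation for age, mode imputation for categorical variables, and univariate screening with chi-square and Mann-Whitney tests) in Python with pandas and SciPy. The file name and column names are assumptions for illustration, and the code is only a minimal sketch of the idea, not the authors' IBM SPSS Modeler workflow (which also used Fisher's exact test for sparse tables).

import pandas as pd
from scipy.stats import chi2_contingency, mannwhitneyu

# df: one row per patient; 'death' is the binary outcome (1 = died before discharge).
# Column names here are illustrative, not the registry's actual field names.
df = pd.read_csv("covid_registry.csv")

numeric_cols = ["age", "n_comorbidities"]
categorical_cols = [c for c in df.columns if c not in numeric_cols + ["death"]]

# Mean imputation for the numeric variables, mode imputation for categorical variables.
for col in numeric_cols:
    df[col] = df[col].fillna(df[col].mean())
for col in categorical_cols:
    df[col] = df[col].fillna(df[col].mode()[0])

# Univariate screening: chi-square for categorical variables,
# Mann-Whitney U for the two non-normally distributed numeric variables.
p_values = {}
for col in categorical_cols:
    table = pd.crosstab(df[col], df["death"])
    p_values[col] = chi2_contingency(table)[1]
for col in numeric_cols:
    died = df.loc[df["death"] == 1, col]
    survived = df.loc[df["death"] == 0, col]
    p_values[col] = mannwhitneyu(died, survived).pvalue

feature_set_2 = [c for c, p in p_values.items() if p < 0.05]   # "significant" set
feature_set_3 = [c for c, p in p_values.items() if p < 0.2]    # "marginal" set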
2.2.3. Data Balancing. We first developed our models with a variety of machine learning algorithms on the original dataset (dataset 1). These models performed poorly in terms of sensitivity because of the small number of samples in the death class (83.94% surviving vs. 16.06% nonsurviving, ratio = 5.23), so they did not predict death well. There are various methods, such as oversampling the minority class or undersampling the majority class, to solve this problem [11, 12]. We oversampled the death cases to create more balanced datasets: datasets 2 and 3 included 5,133 (36.5%, ratio = 1.74) and 8,938 (49.98%, ratio = 1) nonsurviving patients, respectively. We developed our models with all four feature sets on these three datasets.

2.3. Model Development and Evaluation. We randomly divided the data into training (70%) and testing (30%) sets and developed our models using common machine learning algorithms that are usually reported to perform well in medicine, including Multilayer Perceptron (MLP) neural networks [11, 12, 34], Chi-squared Automatic Interaction Detection (CHAID), C5, and Random Forest (RF) decision trees [11, 12, 33, 34], Support Vector Machine (SVM) with a Radial Basis Function (RBF) kernel [12, 35, 36], and Bayesian networks [12, 37-39].

We first developed models based on the default parameter settings. We developed CHAID decision trees with a maximum depth of five and a minimum of two records in the nodes. Moreover, we implemented the C5 tree with a minimum of two records in the nodes. RF was implemented with a maximum depth of 10, a minimum of five records in the nodes, and 100 trees. The SVM model was implemented with a regularization parameter of 10 and a gamma of 0.1. We additionally developed MLPs with different numbers of neurons (5, 10, 15, and 20) in one and two hidden layers, as well as with the number of neurons suggested by the software. We also implemented the best CHAID, C5, and MLP models with the boosting ensemble method and 10-fold cross-validation. Furthermore, we implemented stack models (combining individual models) [40]. Our analysis showed that models developed on dataset 3 generally performed better; therefore, we developed stack models, based on the best individual models, on this dataset with different feature sets.
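For readers who want to reproduce the general workflow outside SPSS Modeler, the following sketch shows one way to oversample the death class and fit scikit-learn counterparts of some of the algorithms listed above, reusing the hyperparameter values reported in the text (RF with a depth of 10, 100 trees, and five records per leaf; an RBF SVM with C = 10 and gamma = 0.1; a small two-layer MLP). C5 and CHAID have no direct scikit-learn equivalents, so they are not shown. The dataset objects and the exact resampling ratio are assumptions carried over from the previous sketch, not the authors' implementation.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.utils import resample

# df and feature_set_2 come from the preprocessing sketch above.
X, y = df[feature_set_2], df["death"]          # any of the four feature sets could be used
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)

# Oversample the minority (death) class on the training data only,
# here up to roughly a 1:1 ratio as in "dataset 3".
train = pd.concat([X_train, y_train], axis=1)
dead, alive = train[train["death"] == 1], train[train["death"] == 0]
dead_upsampled = resample(dead, replace=True, n_samples=len(alive), random_state=42)
balanced = pd.concat([alive, dead_upsampled])
X_bal, y_bal = balanced.drop(columns="death"), balanced["death"]

models = {
    # min_samples_leaf=5 stands in for "a minimum of five records in the nodes".
    "RF": RandomForestClassifier(n_estimators=100, max_depth=10, min_samples_leaf=5),
    "SVM-RBF": SVC(kernel="rbf", C=10, gamma=0.1, probability=True),
    "MLP": MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=500),
}
fitted = {name: m.fit(X_bal, y_bal) for name, m in models.items()}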
2.4. External Validation. For external validation, we extracted 1734 records from the Khuzestan COVID-19 registry system. These data come from four different hospitals and different timeframes; therefore, they were not used in training or testing the models. This dataset contained 1425 surviving and 309 nonsurviving patients. Inclusion and exclusion criteria were similar to those of the training/testing dataset described in Section 2.1. The best performing models selected from the previous step, as well as the ensemble models, were validated using this dataset.

2.5. Subpopulation Bias Analysis. Previous studies show that predictive models may perform differently in different subpopulations, for example, in different sex or age groups [41, 42]. To assess this effect, we adopted the method suggested by Seyyed-Kalantari et al., who proposed using the false-positive rate (FPR) and false-negative rate (FNR) in subpopulations to assess the underdiagnosis and overdiagnosis of machine learning models [41]. We similarly calculated FNR and FPR to assess the underprediction or overprediction of death in our models. To this end, we used the best performing models in the external evaluation and the external dataset.

2.6. Analysis. We applied IBM SPSS Statistics version 23 for statistical analysis and IBM SPSS Modeler version 18 to develop and evaluate the machine learning models. We evaluated and compared the models using the confusion matrix, accuracy, precision, sensitivity, specificity, F-score, and Area under the Curve (AUC). To select the best performing models, we compared the models obtained from each dataset-feature set combination based on AUC and F-score.

2.7. Ethical Considerations. This study received ethical approval from the Ethics Research Committee of Ahvaz Jundishapur University of Medical Sciences (IR.AJUMS.REC.1400.325).
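To make the evaluation and bias-analysis calculations of Sections 2.5 and 2.6 concrete, the sketch below computes the same test-set metrics (accuracy, sensitivity, specificity, precision, F-score, and AUC) and the per-subgroup false-positive and false-negative rates from a confusion matrix. It assumes the fitted models, test split, and a demographic column from the earlier sketches; it illustrates the calculation only and is not the authors' SPSS output.

import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix, roc_auc_score

def summarize(model, X, y):
    pred = model.predict(X)
    tn, fp, fn, tp = confusion_matrix(y, pred, labels=[0, 1]).ravel()
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),          # recall for the death class
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f_score": 2 * tp / (2 * tp + fp + fn),
        "auc": roc_auc_score(y, model.predict_proba(X)[:, 1]),
    }

def subgroup_error_rates(model, X, y, groups):
    """FPR and FNR per subgroup (e.g. gender or age band), as in the bias analysis."""
    rates = {}
    pred = pd.Series(model.predict(X), index=y.index)
    for g in groups.unique():
        idx = groups == g
        tn, fp, fn, tp = confusion_matrix(y[idx], pred[idx], labels=[0, 1]).ravel()
        rates[g] = {"FPR": fp / (fp + tn) if (fp + tn) else np.nan,
                    "FNR": fn / (fn + tp) if (fn + tp) else np.nan}
    return rates

# Example: overall metrics and gender-wise error rates for the fitted random forest.
print(summarize(fitted["RF"], X_test, y_test))
print(subgroup_error_rates(fitted["RF"], X_test, y_test, df.loc[X_test.index, "gender"]))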
3. Results

3.1. Descriptive Data. We extracted data for 10,657 patients from the Khuzestan COVID-19 registry [30]. The frequency of nonsurviving patients (until discharge) was 1711 (16.06%); 8946 patients (83.94%) were discharged alive. Table 2 shows that death due to COVID-19 was significantly more frequent among men, older patients, and those who had been in contact with infected individuals. In addition, respiratory distress, convulsion, altered consciousness, and paralysis were more common among the nonsurviving patients, whereas cough, headache, diarrhea, and dizziness were less prevalent among them. Furthermore, oxygen saturation status was better among the recovered patients than among the dead. Moreover, the comorbidities and risk factors (excluding pregnancy), as well as intubation, oxygen therapy at the beginning of hospitalization, and ICU admission, were significantly more frequent among the dead.

Table 2: Comparison of surviving and nonsurviving patients. Values are n (%) unless stated otherwise.
Variable | Alive (n = 8946) | Dead (n = 1711) | Total (n = 10657) | P value
Age, mean (SD), years | 54 (18.3) | 65.7 (16.2) | 55.88 (18.46) | <0.0001
Age, median (Q1, Q3) | 56 (42, 67) | 67 (57, 77) | 58 (43, 69) |
Sex, male | 4611 (51.5) | 1010 (59) | 5621 (52.7) | <0.0001
Contact with infected people (yes) | 3169 (35.4) | 706 (41.3) | 3875 (36.4) | <0.0001
Signs and symptoms
Cough (yes) | 5296 (59.2) | 899 (52.5) | 6195 (58.1) | <0.0001
Respiratory distress (yes) | 5021 (56.1) | 1288 (75.3) | 6309 (59.2) | <0.0001
Fever (yes) | 4225 (47.2) | 802 (46.9) | 5027 (47.2) | 0.788
Muscle aches (yes) | 2417 (27) | 426 (24.9) | 2843 (26.7) | 0.069
Chills (yes) | 70 (0.8) | 9 (0.5) | 79 (0.7) | 0.257
Vomiting (yes) | 452 (5.1) | 79 (4.9) | 531 (5) | 0.448
Headache (yes) | 480 (5.4) | 51 (3) | 531 (5) | <0.0001
Chest pain (yes) | 304 (3.4) | 61 (3.6) | 365 (3.4) | 0.728
Diarrhea (yes) | 315 (3.5) | 40 (2.3) | 355 (3.3) | 0.012
Sore throat (yes) | 48 (0.5) | 4 (0.2) | 52 (0.5) | 0.100
Gustatory dysfunction (yes) | 98 (1.1) | 10 (0.6) | 108 (1) | 0.053
Olfactory dysfunction (yes) | 123 (1.4) | 19 (1.1) | 142 (1.3) | 0.382
Abdominal pain (yes) | 203 (2.3) | 31 (1.8) | 234 (2.2) | 0.237
Runny nose (yes) | 8 (0.1) | 0 (0.0) | 8 (0.1) | 0.216
Convulsion (yes) | 42 (0.5) | 19 (1.1) | 61 (0.6) | 0.001
Altered consciousness (yes) | 213 (2.4) | 419 (24.5) | 633 (5.9) | <0.0001
GI bleeding (yes) | 5 (0.1) | 0 (0.0) | 5 (0.0) | 0.417
Skin lesion/rash (yes) | 11 (0.1) | 3 (0.2) | 14 (0.1) | 0.584
Dizziness (yes) | 249 (2.8) | 30 (1.8) | 279 (2.6) | 0.014
Paresis (yes) | 54 (0.6) | 11 (0.6) | 65 (0.6) | 0.848
Paralysis (yes) | 22 (0.2) | 13 (0.8) | 35 (0.3) | 0.001
Weakness (yes) | 350 (3.9) | 80 (4.7) | 430 (4) | 0.142
Sweating (yes) | 11 (0.1) | 2 (0.1) | 13 (0.1) | 0.947
Ocular manifestations (yes) | 3 (0.0) | 0 (0.0) | 3 (0.0) | 0.449
Hemoptysis (yes) | 6 (0.1) | 2 (0.1) | 8 (0.1) | 0.491
Drowsiness (yes) | 3 (0.0) | 2 (0.1) | 5 (0.0) | 0.185
Constipation (yes) | 7 (0.1) | 1 (0.1) | 8 (0.1) | 0.784
Nausea (yes) | 478 (5.3) | 89 (5.2) | 567 (5.3) | 0.811
Anorexia (yes) | 724 (8.1) | 138 (8.1) | 862 (8.1) | 0.969
Other GI symptoms (yes) | 7 (0.1) | 0 (0.0) | 7 (0.1) | 0.247
Blood oxygen saturation level | | | | <0.0001
(i) Less than 93% | 2046 (22.9) | 934 (54.6) | 2980 (28) |
(ii) 93% or more | 6900 (77.1) | 777 (45.4) | 7677 (72) |
Comorbidity
Any comorbidity (yes) | 3314 (37) | 826 (48.3) | 4140 (38.8) | <0.0001
Number of comorbidities | | | | <0.0001
0 | 5632 (63) | 885 (51.7) | 6517 (61.2) |
1 | 1868 (20.9) | 391 (22.9) | 2259 (21.2) |
2 | 946 (10.6) | 275 (16.1) | 1221 (11.5) |
3 | 396 (4.4) | 112 (6.5) | 508 (4.8) |
>3 | 104 (1.1) | 48 (2.8) | 152 (1.5) |
Number of comorbidities, mean (SD) | 0.6 (0.9) | 0.87 (1.1) | 0.65 (0.97) | <0.0001
Hypertension (yes) | 1291 (14.4) | 356 (20.8) | 1647 (15.5) | <0.0001
Heart diseases (yes) | 1102 (12.3) | 294 (17.2) | 1396 (13.1) | <0.0001
Diabetes (yes) | 1577 (17.6) | 376 (22) | 1953 (18.3) | <0.0001
Immunodeficiency diseases (yes) | 32 (0.4) | 13 (0.8) | 45 (0.4) | 0.019
Asthma (yes) | 198 (2.2) | 28 (1.6) | 226 (2.1) | 0.129
Neurological diseases (yes) | 140 (1.6) | 49 (2.9) | 189 (1.8) | <0.0001
Chronic kidney diseases (yes) | 289 (3.2) | 114 (6.7) | 403 (3.8) | <0.0001
Dialysis (yes) | 78 (0.9) | 33 (1.9) | 111 (1) | <0.0001
Other chronic lung diseases (yes) | 136 (1.5) | 44 (2.6) | 180 (1.7) | 0.002
Chronic hematologic diseases (yes) | 74 (0.8) | 20 (1.2) | 94 (0.9) | 0.166
Cancer (yes) | 172 (1.9) | 80 (4.7) | 252 (2.4) | <0.0001
Autoimmune diseases (yes) | 2 (0.0) | 0 (0.0) | 2 (0.0) | 0.536
Chronic liver diseases (yes) | 46 (0.5) | 16 (0.9) | 62 (0.6) | 0.036
HIV/AIDS (yes) | 7 (0.1) | 5 (0.3) | 12 (0.1) | 0.016
Mental disorders (yes) | 26 (0.3) | 2 (0.1) | 28 (0.3) | 0.198
Smoking (yes) | 143 (1.6) | 33 (1.9) | 176 (1.7) | 0.326
Drug abuse (yes) | 54 (0.6) | 21 (1.2) | 75 (0.7) | 0.005
Other comorbidities (yes) | 286 (3.2) | 69 (4) | 355 (3.3) | 0.078
Pregnancy | 63 (0.7) | 2 (0.1) | 65 (0.6) | 0.004
Care and treatment
Intubation (yes) | 308 (3.4) | 962 (56.2) | 1270 (11.9) | <0.0001
ICU care (yes) | 1323 (14.8) | 1088 (63.6) | 2411 (22.6) | <0.0001
Oxygen therapy (yes) | 2921 (32.7) | 682 (39.9) | 3603 (33.8) | <0.0001
Diagnosis method
(i) Only abnormal CT | 3197 (35.7) | 538 (31.4) | 3735 (35) | <0.0001
(ii) Only positive PCR | 1161 (13) | 160 (9.4) | 1321 (12.4) | <0.0001
(iii) Positive PCR and abnormal CT | 4588 (51.3) | 1013 (59.2) | 5601 (52.6) | <0.0001
P values below 0.05 indicate a significant difference.

3.2. The Machine Learning Algorithms and Their Evaluation. The results of the various models with different settings on the three datasets and four feature sets are reported as follows.

3.2.1. The Machine Learning Algorithms on the Original Dataset 1. The details of the performance of the models are given in Supplement 1 (Tables S2-S5). The results showed that the lowest and highest accuracy of the models based on the original dataset 1 were 84.52% (RF with 32 features) and 91.12% (Bayesian network with 32 features), respectively. In addition, the minimum and maximum AUC were 0.757 (C5 with 32 features) and 0.914 (Bayesian network with 32 features), respectively. The sensitivity for predicting death based on the original dataset 1 was low, between 0.484 (MLP with 60 features) and 0.775 (RF with 32 features), which indicates that the sensitivity of models trained on the imbalanced data is not adequate. Table 3 shows the performance of the top 10 models based on the test data of dataset 1. According to the table, the two best models were the Bayesian network and the CHAID tree, both on 32 features. The ROC curves for the best models are presented in Supplementary Figure S1.

Table 3: Top 10 models developed on the original dataset 1.
Model | Setting | Feature set | Accuracy | Sensitivity | Specificity | Precision | F-score | AUC
Bayesian network | Default | 2 | 91.12 | 64.7 | 96.2 | 76.4 | 0.701 | 0.914
CHAID | Default | 2 | 90.76 | 54 | 97.8 | 82.6 | 0.653 | 0.909
MLP | 2.5.5 boosting | 1 | 90.63 | 53.6 | 97.7 | 81.5 | 0.647 | 0.904
MLP | 1.10 boosting | 3 | 90.79 | 54 | 97.8 | 82.3 | 0.652 | 0.903
C5 | Boosting | 2 | 90.7 | 56.4 | 97.3 | 79.9 | 0.662 | 0.901
MLP | 2.10.10 | 2 | 90.55 | 53.4 | 97.7 | 81.5 | 0.646 | 0.901
MLP | 2.5.5 | 1 | 90.31 | 55.4 | 97 | 77.6 | 0.646 | 0.901
RF | Default | 2 | 84.52 | 77.5 | 85.9 | 51.3 | 0.617 | 0.9
MLP | 2.20.20 | 3 | 90.51 | 53.6 | 97.5 | 80.5 | 0.643 | 0.899
Bayesian network | Default | 1 | 90.46 | 55.5 | 97.1 | 78.5 | 0.65 | 0.899
For MLPs, the numbers in the setting column indicate the number of hidden layers and the number of neurons in hidden layers 1 and 2.
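As a quick check on how the F-scores in Table 3 (and the later tables) relate to the other columns: the F-score is the harmonic mean of precision and sensitivity. For the top Bayesian network model, for example, F = 2 x 0.764 x 0.647 / (0.764 + 0.647) ≈ 0.700, which matches the reported 0.701 up to rounding.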
3.2.2. The Machine Learning Algorithms on Dataset 2. The details of the performance of the models based on dataset 2 are given in Supplement 1 (Tables S6-S9). The findings showed that the lowest and highest accuracy were 82.64% (MLP with 60 features) and 87.86% (RF with 60 features), respectively. Moreover, the minimum and maximum AUC were 0.888 (MLP with 60 features) and 0.942 (SVM with 60 features), respectively. The sensitivity for predicting death was between 0.658 (MLP) and 0.861 (CHAID tree with 32 features). The best results for each algorithm based on dataset 2 are shown in Supplementary Figure S2. According to Table 4, the SVM and C5 models had the best performance, on 60 and 40 features, respectively.

Table 4: Top 10 models developed on dataset 2.
Model | Setting | Feature set | Accuracy | Sensitivity | Specificity | Precision | F-score | AUC
SVM | RBF, default | 4 | 87.83 | 83.4 | 90.3 | 82.9 | 0.832 | 0.942
C5 | Boosting | 3 | 87.44 | 81.8 | 90.6 | 82.7 | 0.822 | 0.94
SVM | RBF, default | 3 | 87.59 | 82.7 | 90.3 | 82.4 | 0.826 | 0.938
C5 | Boosting | 4 | 87.88 | 79.9 | 92.4 | 85.5 | 0.826 | 0.938
RF | Default | 4 | 87.86 | 85.7 | 89.1 | 81.5 | 0.836 | 0.931
C5 | Boosting | 2 | 86.68 | 78.5 | 91.5 | 84.3 | 0.813 | 0.927
C5 | Boosting | 1 | 85.99 | 77.2 | 90.8 | 82.2 | 0.797 | 0.926
SVM | RBF, default | 2 | 86.61 | 79 | 91.1 | 83.7 | 0.813 | 0.926
MLP | 1.10 | 3 | 85.38 | 77 | 90 | 80.9 | 0.789 | 0.923
RF | Default | 1 | 85.26 | 85.2 | 85.3 | 76.2 | 0.804 | 0.923
For MLPs, the numbers in the setting column indicate the number of hidden layers and the number of neurons in hidden layers 1 and 2.

3.2.3. The Machine Learning Algorithms on Dataset 3. The details of the performance of the models based on dataset 3 are given in Supplement 1 (Tables S10-S13). The results showed that the lowest and highest accuracy were 81.27% (CHAID tree with 32 features) and 92.77% (C5 with 60 features), respectively. Moreover, the minimum and maximum AUC were 0.899 (CHAID with 32 features) and 0.972 (C5 with 60 features), respectively. The sensitivity for predicting death was between 0.752 (MLP with 60 features) and 0.951 (C5 tree with 60 features). The best results for each algorithm based on dataset 3 are shown in Supplementary Figure S3. According to Table 5, the C5 model had the best performance with different feature sets, and SVM with 60 features was also one of the optimal models.

Table 5: Top 10 models developed on dataset 3.
Model | Setting | Feature set | Accuracy | Sensitivity | Specificity | Precision | F-score | AUC
C5 | Boosting | 4 | 92.77 | 95.1 | 90.5 | 90.8 | 0.929 | 0.972
C5 | Boosting | 3 | 91.74 | 93.6 | 89.8 | 90.5 | 0.92 | 0.965
C5 | Boosting | 2 | 91.18 | 94.2 | 88 | 89.1 | 0.916 | 0.96
SVM | RBF, default | 4 | 90.16 | 92.7 | 87.7 | 88.1 | 0.903 | 0.956
C5 | Boosting | 1 | 89.28 | 91.3 | 87.3 | 87.7 | 0.895 | 0.952
SVM | RBF, default | 3 | 88.81 | 90.5 | 87.1 | 87.9 | 0.892 | 0.944
MLP | 2.15.15 boosting | 3 | 88.59 | 90.2 | 86.9 | 87.7 | 0.889 | 0.94
MLP | 2.12.12 boosting | 4 | 87.61 | 88.5 | 86.8 | 86.8 | 0.876 | 0.938
C5 | Default | 3 | 87.4 | 89.8 | 85 | 86.1 | 0.879 | 0.934
SVM | RBF, default | 2 | 86.34 | 86.6 | 86.1 | 86.6 | 0.866 | 0.932
For MLPs, the numbers in the setting column indicate the number of hidden layers and the number of neurons in hidden layers 1 and 2.

3.3. Ensemble Models. Table 6 indicates that the best ensemble model had 89.13% accuracy and 0.961 AUC. However, a comparison of these models with the corresponding individual models (Table 5) shows that the C5 models perform better than these ensemble models, even though the ensemble models outperform the other individual models.

Table 6: Ensemble models developed on dataset 3.
ID | Included models | Feature set | Accuracy | Sensitivity | Specificity | Precision | F-score | AUC
1 | Table S10 | 1 | 86.10 | 0.799 | 0.924 | 0.914 | 0.853 | 0.954
2 | Table S11 | 2 | 87.39 | 0.859 | 0.889 | 0.888 | 0.873 | 0.954
3 | Table S12 | 3 | 87.26 | 0.831 | 0.915 | 0.908 | 0.867 | 0.954
4 | Table S13 | 4 | 89.13 | 0.864 | 0.919 | 0.916 | 0.890 | 0.961
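The stacked ("ensemble") models above combine individual classifiers into a single predictor. A minimal scikit-learn analogue of that idea is sketched below, stacking the previously configured base learners under a logistic-regression combiner; the choice of base estimators and meta-learner is an assumption for illustration, since the paper's stacking was done in SPSS Modeler with its own model set.

from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# X_bal, y_bal, X_test, y_test, and summarize() come from the earlier sketches.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, max_depth=10, min_samples_leaf=5)),
        ("svm", SVC(kernel="rbf", C=10, gamma=0.1, probability=True)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(10, 10), max_iter=500)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),  # assumed combiner
    cv=5,
)
stack.fit(X_bal, y_bal)
print(summarize(stack, X_test, y_test))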
3.4. External Validation. We evaluated all ensemble models (Table 6) and the top 10 models developed on dataset 3 (Table 5) using an external dataset. As shown in Table 7, the C5 boosting models with feature sets 1 and 2 obtained the best scores.

Table 7: External validation of the models developed on dataset 3.
Model | Setting | Feature set | Accuracy | Sensitivity | Specificity | Precision | F-score | AUC
C5 | Boosting | 1 | 92.56 | 0.955 | 0.919 | 0.720 | 0.821 | 0.974
C5 | Boosting | 2 | 91.81 | 0.964 | 0.908 | 0.695 | 0.808 | 0.98
SVM | RBF, default | 3 | 91.00 | 0.848 | 0.924 | 0.706 | 0.771 | 0.955
Ensemble 2 | — | 2 | 87.77 | 0.861 | 0.881 | 0.611 | 0.715 | 0.954
SVM | RBF, default | 2 | 88.24 | 0.890 | 0.881 | 0.618 | 0.729 | 0.953
Ensemble 1 | — | 1 | 88.75 | 0.819 | 0.902 | 0.645 | 0.722 | 0.949
C5 | Boosting | 3 | 86.51 | 0.935 | 0.850 | 0.575 | 0.712 | 0.948
Ensemble 3 | — | 3 | 88.18 | 0.783 | 0.903 | 0.637 | 0.702 | 0.931
MLP | 2.15.15 boosting | 3 | 87.95 | 0.767 | 0.904 | 0.634 | 0.694 | 0.914
MLP | 2.12.12 boosting | 4 | 87.31 | 0.754 | 0.899 | 0.618 | 0.679 | 0.914
Ensemble 4 | — | 4 | 86.62 | 0.770 | 0.887 | 0.596 | 0.672 | 0.91
C5 | Boosting | 4 | 85.64 | 0.748 | 0.880 | 0.575 | 0.650 | 0.889
C5 | Default | 3 | 85.24 | 0.780 | 0.868 | 0.562 | 0.653 | 0.887
SVM | RBF, default | 4 | 83.79 | 0.725 | 0.862 | 0.533 | 0.615 | 0.868
For MLPs, the numbers in the setting column indicate the number of hidden layers and the number of neurons in hidden layers 1 and 2.

3.5. Subpopulation Bias Analysis. We selected the four best models based on the external validation for the subpopulation bias analysis (Supplement 1, Table S14). Figures 2 and 3 show the FPR and FNR of these models. As the figures indicate, most of these models perform better on female patients than on male patients. Furthermore, the performance of these models decreases in older patients. Regarding FPR, Figure 2 indicates that the SVM and C5 (feature set 2) models have a less biased prediction in terms of gender and age groups. Additionally, Figure 3 shows that C5 (feature set 2) has a less biased prediction in terms of FNR.

Figure 2: Subgroup false-positive rate (FPR) for different models. (a) C5 model on feature set 1. (b) C5 model on feature set 2. (c) SVM model on feature set 3. (d) Ensemble model on feature set 2.

Figure 3: Subgroup false-negative rate (FNR) for different models. (a) C5 model on feature set 1. (b) C5 model on feature set 2. (c) SVM model on feature set 3. (d) Ensemble model on feature set 2.

3.6. Comparison of the Models. A comparison of the models showed that, with the balancing of the data, sensitivity and AUC increased; accuracy decreased on dataset 2 but increased on dataset 3. Furthermore, models with 60 and 40 features generally performed better. Overall, the C5 model with 60 features outperformed the rest based on all evaluation indicators; however, based on the external validation, the C5 boosting models with feature sets 1 (17 features) and 2 (32 features) have better external validity. The subpopulation analysis suggests that the C5 boosting model with 32 features has less bias.

3.7. Variable Importance. Figure 4 shows the importance of each variable in the selected model (C5). As indicated, intubation, number of comorbidities, age, gender, respiratory distress, blood oxygen saturation level, ICU admission, cough, unconsciousness, and diagnosis by positive PCR and abnormal CT are considered the most important death predictors by this model.

Figure 4: Variable importance of the selected model (predictor importance; target: survival). The predictors, from most to least important, are intubation, number of comorbidities, age, gender, respiratory distress, oxygen saturation level, ICU admission, cough, unconsciousness, and diagnosis by positive PCR and abnormal CT.
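Variable importance of the kind shown in Figure 4 can be read directly from most tree ensembles. The sketch below does so with a gradient-boosted tree used here purely as a stand-in for the boosted C5 model (scikit-learn does not ship C5.0), ranking features by impurity-based importance; the estimator and its settings are assumptions for illustration, not the authors' configuration.

import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Stand-in for the boosted C5 tree, trained on the balanced data from the earlier sketch.
gbt = GradientBoostingClassifier(n_estimators=100, max_depth=5)
gbt.fit(X_bal, y_bal)

importance = (
    pd.Series(gbt.feature_importances_, index=X_bal.columns)
    .sort_values(ascending=False)
)
print(importance.head(10))   # top predictors by impurity-based importance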
4. Discussion

In the first stage of the study, the risk factors for death due to COVID-19 were identified using univariate analysis. Then, based on the important features, different machine learning models were developed to predict death. The results showed significant differences between recovered and nonrecovered patients in terms of age, sex, contact with infected people, respiratory distress, convulsion, altered consciousness, paralysis, blood oxygen saturation level, the number of comorbidities, intubation, oxygen therapy, and the need for ICU services.

We found that intubation, number of comorbidities, age, gender, respiratory distress, blood oxygen saturation level, ICU admission, cough, unconsciousness, positive PCR, and abnormal CT are the most important death predictors. Other studies showed that age [17, 18, 23, 27, 28, 43], male gender [43], respiratory disease [16, 17], the number of comorbidities [43], and low oxygen saturation [17, 18, 23, 43] increased death due to COVID-19. Some researchers indicate that high blood pressure, heart disease, cancer, kidney disease [16, 17], diabetes [18], cerebrovascular diseases [28], smoking [18, 23], and asthma [16] increased mortality from COVID-19; however, our model did not consider these factors significant. It is worth mentioning that these risk factors increase the number of comorbidities in a patient, and this factor was considered significant in the C5 model.

We developed various models with different features to predict death from COVID-19. Based on the results, the best performance was obtained by the C5 decision tree with 32 features. Several other studies have likewise tried to develop machine learning models for predicting death from COVID-19 [16-23, 25-28, 43-45]; Table 8 summarizes some of them.

Table 8: Some machine learning models suggested in the literature to predict death from COVID-19.
Author | Patients, death rate, features | Model: accuracy, AUC
Muhammad et al. [44] | 1505, NA, 4 | Decision tree (DT): 99.85, NA; LR: 97.49, NA; SVM: 98.85, NA; Naive Bayes: 97.52, NA; RF: 99.60, NA; KNN: 98.06, NA
Pourhomayoun and Shakibi [22] | 307382, NA, 57 | RF: 87.93, 0.94; ANN: 89.98, 0.93; SVM: 89.02, 0.88; KNN: 89.83, 0.90; LR: 87.91, 0.92; DT: 86.87, 0.93
Li et al. [20] | 2924, 8.8%, different feature sets (83, 152, 5) | Gradient boosting decision tree (83 features): 88.9, 0.939; LR (152 features): 86.8, 0.928; LR (5 features): 88.7, 0.915
Goncalves and Rouco [21] | 827601, 8.7%, 3 | Adaboost, gradient boosting, and RF: NA, 0.919; LR: NA, 0.917
An et al. [16] | 8000, 2.2%, 10 | SVM linear: 91.9, 0.962; LASSO: 91.1, 0.963; LASSO (14 days): 86.8, 0.944; SVM linear (14 days): 87.7, 0.941; LASSO (30 days): 89.5, 0.953; SVM linear (30 days): 87.7, 0.948
Yadaw et al. [18] | 3841, 8.1%, 17 and 3 | XGBoost (17 and 3 features): NA, 0.91
Yan et al. [19] | 375, 35%, 3 | XGBoost: F1 90, 0.97
Gao et al. [43] | 2160, 11%, 14 | SVM: 95.8, 0.976; ANN: 95.6, 0.976; Ensemble: 95.5, 0.976; LR: 95.4, 0.974; GBDT: 94.8, 0.953
Chen et al. [28] | 192 (26%), only critically ill patients, 47 features (17 nonlaboratory, 30 laboratory) | SVM linear: 93 (47 features), 87.8 (17 features), 85.6 (30 features), NA
Booth et al. [45] | 398, 10.8%, 5 | SVM-RBF: 93, NA
Parchure et al. [17] | 567, 17.8%, 55 | RF: 65.5, 85.5
Zhao et al. [23] | 641, 12.8%, 47 | LR: NA, 0.82
Das et al. [27] | 3524, 2.1%, 4 | LR: 96.5, 0.83; SVM: 97, 0.825; KNN: 92.4, 0.759; RF: 92.4, 0.787; Gradient boosting: 97.1, 0.787
Chen et al. [25] | 1002 severe and critical cases, 16.1%, 7 | LR: NA, 0.903
Khan et al. [26] | 103888, 5.7%, 15 | Deep neural network: F1 0.970, 0.985; RF and XGBoost: 0.946, 0.972; LR and DT: 0.945, 0.972; KNN: 0.944, 0.971
NA: not reported; some of these studies did not report the AUC.

Because a variety of variables (demographic, laboratory, radiographic, therapeutic, signs and symptoms, and comorbidities) and datasets are used, it is not easy to compare the studies. For example, some researchers used laboratory data in addition to other variables to develop their models [17, 23, 28, 43], and one study applied only laboratory variables [45]. In another study, vital signs and imaging results were used to develop the models [23]. However, the variables used in our study were similar to those of most studies. Despite this, a comparison of our study with previous studies showed that the performance of our selected model was better than that of those models (Table 8). The model developed by Gao et al. [43] has better performance (AUC 0.976 vs. 0.972); however, it was developed with a small sample size. In addition, the F-score (F = 0.97) of the model developed by Yan et al. [19] was higher than that of our selected model; however, Barish et al. [46] showed that Yan's model did not perform well in external validation. Khan's model [26] also has a higher F-score than our model. Khan et al. and Gao et al. used unbalanced data, and Barish et al. [46] have shown that models developed on unbalanced data to predict death from COVID-19 may not produce accurate results in a real environment.

We found that machine learning models perform differently in subpopulations in terms of gender and age groups. Other studies similarly show that predictive models perform differently in different ethnic groups, genders, and age groups and in patients with different insurance [41, 42]. Therefore, researchers and clinicians should apply these models to different population groups cautiously. Moreover, developing models for different patient groups may be necessary.

The strengths of our model are the use of demographic data, symptoms, and comorbidities that can be easily collected. Unlike some previous studies, we did not use laboratory, treatment, or imaging data, which can be considered a limitation. However, we assumed that all patients received almost similar treatments. Moreover, applying models developed based on treatment data may be difficult because of changes in patients' treatment. Furthermore, models that depend on laboratory and imaging data require considerable time and cost to gather these data for use in a real clinical environment. A comparison of our study with those that used laboratory and imaging data (Table 8) indicates that our selected model outperforms many of these models. One study also indicated that imaging data did not affect the performance of machine learning models to predict death from COVID-19 [23]. In addition, the data used in our study were collected from 38 hospitals, which is a strength of the study. A similar study indicated that up to 20% missing data is acceptable for developing machine learning models in COVID-19 studies [18]; the missing rate in our study was under 4%.

Despite the strengths, some limitations should be considered. Firstly, we only analyzed the subpopulation bias based on gender and age groups; future studies should consider other variables in this analysis. Furthermore, there are several well-established severity models such as APACHE and SOFA [41, 42]; researchers are recommended to compare the performance of machine learning models with these models for predicting deaths from COVID-19.
5. Conclusions

Different machine learning models were developed to predict the likelihood of death caused by COVID-19. The best prediction model was the C5 decision tree (accuracy = 91.18%, AUC = 0.96, and F-score = 0.916). Therefore, this model can be used to detect high-risk patients and improve the use of facilities, equipment, and medical practitioners for patients with COVID-19.

Data Availability

The data used to support the findings of this study are restricted by the Ethics Research Committee of Ahvaz Jundishapur University of Medical Sciences in order to protect patient privacy.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

Authors' Contributions

J. Zarei and A. Jamshidnezhad contributed to conceptualization, data curation, and writing—review and editing. M. H. Shoushtari contributed to conceptualization, methodology, and writing—review and editing. A. Hadianfard and M. Cheraghi contributed to conceptualization and writing—review and editing. A. Sheikhtaheri contributed to conceptualization, methodology, data analysis, supervision, writing—original draft, and writing—review and editing. All authors reviewed the final version of the manuscript and approved it for submission. This study was conducted based on the Khuzestan COVID-19 registry data, and the authors thank the Khuzestan COVID-19 registry for providing the data for this study.

Acknowledgments

This study was supported by Ahvaz Jundishapur University of Medical Sciences. The funder had no role in the study design; data collection, analysis, and interpretation; writing of the report; or the decision to submit.

Supplementary Materials

Supplement 1: detailed Tables S1-S14. Supplement 2: Figures S1-S3. (Supplementary Materials)

References

[1] World Health Organization, "WHO coronavirus (COVID-19) dashboard," WHO, Geneva, Switzerland, 2022, https://covid19.who.int.
[2] A. C. Darby and J. A. Hiscox, "Covid-19: variants and vaccination," BMJ, vol. 372, p. n771, 2021.
[3] I. Cosic, D. Cosic, and I. Loncarevic, "Analysis of mutated SARS-CoV-2 variants using resonant recognition model," International Journal of Sciences, vol. 10, no. 7, pp. 6-11, 2021.
[4] L. Samaranayake and K. S. Fakhruddin, "SARS-CoV-2 variants and COVID-19: an overview," Dental Update, vol. 48, no. 3, pp. 235-238, 2021.
[5] V. S. Malik, K. Ravindra, S. V. Attri, S. K. Bhadada, and M. Singh, "Higher body mass index is an important risk factor in COVID-19 patients: a systematic review and meta-analysis," Environmental Science and Pollution Research, vol. 27, no. 33, Article ID 42123, 2020.
[6] M. Parohan, S. Yaghoubi, A. Seraji, M. Javanbakht, P. Sarraf, and M. Djalali, "Risk factors for mortality in patients with Coronavirus disease 2019 (COVID-19) infection: a systematic review and meta-analysis of observational studies," The Aging Male, vol. 34, no. 5, pp. 1-9, 2020.
[7] R. H. Li and H. H. Sigurslid, "Predictors of mortality in hospitalized COVID-19 patients: a systematic review and meta-analysis," vol. 92, no. 10, pp. 1875-1883, 2020.
[8] K. Mackey, C. K. Ayers, K. K. Kondo et al., "Racial and ethnic disparities in COVID-19-related infections, hospitalizations, and deaths," Annals of Internal Medicine, vol. 174, no. 3, pp. 362-373, 2021.
[9] J. Li, D. Q. Huang, B. Zou et al., "Epidemiology of COVID-19: a systematic review and meta-analysis of clinical characteristics, risk factors, and outcomes," Journal of Medical Virology, vol. 93, no. 3, pp. 1449-1458, 2021.
[10] W. M. Shaban, A. H. Rabie, A. I. Saleh, and M. A. Abo-Elsoud, "A new COVID-19 patients detection strategy (CPDS) based on hybrid feature selection and enhanced KNN classifier," Knowledge-Based Systems, vol. 205, Article ID 106270, 2020.
[11] A. Sheikhtaheri, A. Orooji, A. Pazouki, and M. Beitollahi, "A clinical decision support system for predicting the early complications of one-anastomosis gastric bypass surgery," Obesity Surgery, vol. 29, no. 7, pp. 2276-2286, 2019.
[12] A. Sheikhtaheri, M. R. Zarkesh, R. Moradi, and F. Kermani, "Prediction of neonatal deaths in NICUs: development and validation of machine learning models," BMC Medical Informatics and Decision Making, vol. 21, no. 1, p. 131, 2021.
[13] H. B. Syeda, M. Syed, K. W. Sexton et al., "Role of machine learning techniques to tackle the COVID-19 crisis: systematic review," JMIR Medical Informatics, vol. 9, no. 1, Article ID e23811, 2021.
[14] P. Pan, Y. Li, Y. Xiao et al., "Prognostic assessment of COVID-19 in the intensive care unit by machine learning methods: model development and validation," Journal of Medical Internet Research, vol. 23, no. 3, Article ID e23128, 2020.
[15] L. Ryan, C. Lam, S. Mataraso et al., "Mortality prediction model for the triage of COVID-19, pneumonia, and mechanically ventilated ICU patients: a retrospective study," Annals of Medicine and Surgery, vol. 59, pp. 207-216, 2020.
[16] C. An, H. Lim, D. W. Kim, J. H. Chang, Y. J. Choi, and S. W. Kim, "Machine learning prediction for mortality of patients diagnosed with COVID-19: a nationwide Korean cohort study," Scientific Reports, vol. 10, no. 1, Article ID 18811, 2020.
[17] P. Parchure, H. Joshi, K. Dharmarajan et al., "Development and validation of a machine learning-based prediction model for near-term in-hospital mortality among patients with COVID-19," BMJ Supportive & Palliative Care, 2020.
[18] A. S. Yadaw, Y.-c. Li, S. Bose, R. Iyengar, S. Bunyavanich, and G. Pandey, "Clinical features of COVID-19 mortality: development and validation of a clinical prediction model," The Lancet Digital Health, vol. 2, no. 10, pp. e516-e525, 2020.
[19] L. Yan, H.-T. Zhang, J. Goncalves et al., "An interpretable mortality prediction model for COVID-19 patients," Nature Machine Intelligence, vol. 2, no. 5, pp. 283-288, 2020.
[20] S. Li, Y. Lin, T. Zhu et al., "Development and external evaluation of predictions models for mortality of COVID-19 patients using machine learning method," Neural Computing & Applications, pp. 1-10, 2021.
[21] C. P. Goncalves and J. Rouco, "Comparing decision tree-based ensemble machine learning models for COVID-19 death probability profiling," Journal of Vaccines & Vaccination, vol. 12, no. 1, Article ID 1000441, 2021.
[22] M. Pourhomayoun and M. Shakibi, "Predicting mortality risk in patients with COVID-19 using machine learning to help medical decision-making," Smart Health, vol. 20, Article ID 100178, 2021.
[23] Z. Zhao, A. Chen, W. Hou et al., "Prediction model and risk scores of ICU admission and mortality in COVID-19," PLoS One, vol. 15, no. 7, Article ID e0236618, 2020.
[24] C. Ieracitano, N. Mammone, M. Versaci et al., "A fuzzy-enhanced deep learning approach for early detection of Covid-19 pneumonia from portable chest X-ray images," Neurocomputing, vol. 481, pp. 202-215, 2022.
[25] B. Chen, H.-Q. Gu, Y. Liu et al., "A model to predict the risk of mortality in severely ill COVID-19 patients," Computational and Structural Biotechnology Journal, vol. 19, pp. 1694-1700, 2021.
[26] I. U. Khan, N. Aslam, M. Aljabri et al., "Computational intelligence-based model for mortality rate prediction in COVID-19 patients," International Journal of Environmental Research and Public Health, vol. 18, no. 12, p. 6429, 2021.
[27] A. K. Das, S. Mishra, and S. Saraswathy Gopalan, "Predicting CoVID-19 community mortality risk using machine learning and development of an online prognostic tool," PeerJ, vol. 8, Article ID e10083, 2020.
[28] Y. Chen, Z. Linli, Y. Lei et al., "Risk factors for mortality in critically ill patients with COVID-19 in Huanggang, China: a single-center multivariate pattern analysis," Journal of Medical Virology, vol. 93, no. 4, pp. 2046-2055, 2020.
[29] M. Ghafari, A. Kadivar, and A. Katzourakis, "Excess deaths associated with the Iranian COVID-19 epidemic: a province-level analysis," International Journal of Infectious Diseases, vol. 107, pp. 101-115, 2021.
[30] J. Zarei, M. Dastoorpoor, A. Jamshidnezhad, M. Cheraghi, and A. Sheikhtaheri, "Regional COVID-19 registry in Khuzestan, Iran: a study protocol and lessons learned from a pilot implementation," Informatics in Medicine Unlocked, vol. 23, Article ID 100520, 2021.
[31] A. Abdoli, "Iran, sanctions, and the COVID-19 crisis," Journal of Medical Economics, vol. 23, no. 12, pp. 1-8, 2020.
[32] A. Murphy, Z. Abdi, I. Harirchi, M. McKee, and E. Ahmadnezhad, "Economic sanctions and Iran's capacity to respond to COVID-19," The Lancet Public Health, vol. 5, no. 5, p. e254, 2020.
[33] J. Han, J. Pei, and M. Kamber, Data Mining: Concepts and Techniques, Elsevier, Amsterdam, Netherlands, 2011.
[34] H. Bhavsar and A. Ganatra, "A comparative study of training algorithms for supervised machine learning," International Journal of Soft Computing and Engineering, vol. 2, no. 4, pp. 2231-2307, 2012.
[35] H. C. Koh and G. Tan, "Data mining applications in healthcare," Journal of Healthcare Information Management, vol. 19, no. 2, p. 65, 2011.
[36] D. Senthilkumar and S. Paulraj, "Prediction of low birth weight infants and its risk factors using data mining techniques," in Proceedings of the International Conference on Industrial Engineering and Operations Management, Dubai, United Arab Emirates, March 2015.
[37] N. Friedman, D. Geiger, and M. Goldszmidt, "Bayesian network classifiers," Machine Learning, vol. 29, no. 2-3, pp. 131-163, 1997.
[38] J. Cheng and R. Greiner, "Comparing Bayesian network classifiers," in Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 101-108, Morgan Kaufmann Publishers Inc., Stockholm, Sweden, August 1999.
[39] F. V. Jensen, An Introduction to Bayesian Networks, UCL Press, London, United Kingdom, 1996.
[40] Z. Zhang, L. Chen, P. Xu, and Y. Hong, "Predictive analytics with ensemble modeling in laparoscopic surgery: a technical note," Laparoscopic, Endoscopic and Robotic Surgery, vol. 5, no. 1, pp. 25-34, 2022.
[41] L. Seyyed-Kalantari, H. Zhang, M. B. A. McDermott, I. Y. Chen, and M. Ghassemi, "Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations," Nature Medicine, vol. 27, no. 12, pp. 2176-2182, 2021.
[42] R. Sarkar, C. Martin, H. Mattie, J. W. Gichoya, D. J. Stone, and L. A. Celi, "Performance of intensive care unit severity scoring systems across different ethnicities in the USA: a retrospective observational study," The Lancet Digital Health, vol. 3, no. 4, pp. e241-e249, 2021.
[43] Y. Gao, G. Y. Cai, W. Fang et al., "Machine learning based early warning system enables accurate mortality risk prediction for COVID-19," Nature Communications, vol. 11, no. 1, pp. 5033-5110, 2020.
[44] L. J. Muhammad, M. M. Islam, S. S. Usman, and S. I. Ayon, "Predictive data mining models for novel coronavirus (COVID-19) infected patients' recovery," SN Computer Science, vol. 1, no. 4, p. 206, 2020.
[45] A. L. Booth, E. Abels, and P. McCaffrey, "Development of a prognostic model for mortality in COVID-19 infection using machine learning," Modern Pathology, vol. 34, no. 3, pp. 522-531, 2020.
[46] M. Barish, S. Bolourani, L. F. Lau, S. Shah, and T. P. Zanos, "External validation demonstrates limited clinical utility of the interpretable mortality prediction model for patients with COVID-19," Nature Machine Intelligence, vol. 3, no. 1, pp. 25-27, 2021.
