Neighborhood Optimization for Therapy Decision Support

Current Directions in Biomedical Engineering 2019;5(1):1–4
https://doi.org/10.1515/cdbme-2019-0001

Felix Gräßer*, Hagen Malberg, and Sebastian Zaunseder

*Corresponding author: Felix Gräßer, Institute of Biomedical Engineering, Technical University Dresden, Dresden, Germany, e-mail: felix.graesser@tu-dresden.de
Hagen Malberg, Institute of Biomedical Engineering, Technical University Dresden, Dresden, Germany
Sebastian Zaunseder, Department of Information Technology, University of Applied Sciences and Arts Dortmund, Dortmund, Germany

Open Access. © 2019 Felix Gräßer, Hagen Malberg, Sebastian Zaunseder, published by De Gruyter. This work is licensed under the Creative Commons Attribution 4.0 License.

Abstract: This work targets the development of a neighborhood-based Collaborative Filtering therapy recommender system for clinical decision support. The proposed algorithm estimates the outcomes of pharmaceutical therapy options in order to derive recommendations. Two approaches, namely a Relief-based algorithm and a metric learning approach, are investigated. Both adapt similarity functions to the underlying data in order to determine the neighborhood incorporated into the filtering process. The implemented approaches are evaluated regarding the accuracy of their outcome estimations. The metric learning approach can outperform the Relief-based algorithm. It is, however, inferior regarding explainability of the generated recommendations.

Keywords: Clinical Decision Support System, CDSS, Therapy Recommender System, Neighborhood Optimization.

1 Introduction

Clinical decision support systems (CDSS) are intended to provide assistance for personalized diagnosis and treatment decisions [1]. Such systems are expected to play an increasingly important role in future healthcare. Especially data-driven approaches, which employ data mining and machine learning techniques to exploit the large volume of daily captured and widely unused clinical data, promise to open up new perspectives. In contrast to expert systems, which derive recommendations or suggestions using knowledge stored in rule sets (if-then rules), data-driven approaches are supposed to be capable of extracting knowledge automatically from the available data [1]. However, in order to facilitate a high degree of acceptance among medical practitioners, such approaches are required to provide reliable and interpretable decision support. This work aims at developing a CDSS which supplies the attending physician with individualized and patient-specific treatment recommendations. To that end, this work transfers methodologies from the field of Recommender Systems (RS) research to the CDSS domain. RS are widely applied in other domains such as e-commerce or music and movie streaming services. In those applications, sophisticated and specialized approaches have been developed over recent years to provide a target user with personalized product recommendations [2]. Such methods can be capable of meeting both the stated reliability and interpretability requirements. Specifically, this work deals with finding a similarity function optimized for the data at hand, which is fundamental for the widely used class of neighborhood-based RS algorithms.

2 Materials and Methods

2.1 Psoriasis Data

The CDSS algorithms proposed in this work are evaluated on the basis of a clinical dataset consisting of 1242 consultation representations $X$ from 239 patients suffering from various types of the skin disease psoriasis [3]. Each consultation representation $x$ incorporates the individual therapy history as well as demographic and condition-related data, adding up to a total of 125 attributes. The level of measurement of the present data ranges from binary and nominal qualitative attributes to ordinal and ratio-scaled quantitative attributes. The overall objective of the therapy RS is to predict the numerically decoded outcomes $y$ of 7 systemic pharmaceutical therapy options based on $x$ in order to provide the treating physician with a ranked list of therapies.

2.2 Collaborative Filtering for Therapy Decision Support

Deriving recommendations based on the local neighborhood of a target user is a straightforward and efficient approach denoted as Collaborative Filtering (CF) [2]. CF identifies users with similar taste by comparing purchase histories or product ratings and derives the potentially most preferred products. This approach was transferred to therapy recommendation in our previous work [3]. Here, consultations were compared using representations as introduced above in order to derive treatment recommendations with potentially good outcome. Therefore, the numerically decoded outcomes $y \in [0, 1]$ of potential treatment options, ranging from bad to good response, are estimated for a test patient and consultation $n$. To do so, outcomes observed in the treatment history of the $K$ most similar consultations to $n$ are averaged for each therapy option, as pictured in figure 1. The outcome-consultation matrix $A^{train}$ accumulates the outcomes of all ever applied treatments, i.e. the treatment history for each training consultation. Finally, the Root Mean Squared Error (RMSE) can be computed between the outcome estimate $\hat{y}^{test}$ and the actually observed outcome $y^{test}$ to evaluate the estimation accuracy. Both the local neighbourhood of $n$ included in the estimation and the coefficients for calculating the weighted average of the observed outcomes are defined by a similarity measure $s_k$ for each training consultation $k$. Here, a similarity function $s(x^{test}, x^{train}_k)$ defines $s_k$ for the test and training consultation representations $x^{test}$ and $x^{train}_k$.

Fig. 1: Outcomes $\hat{y}$ of treatment options $t$ are estimated for a test consultation $n$ based on all outcomes observed in the treatment history of the $K$ most similar training data consultations. $A^{train}$ accumulates the numerically encoded outcomes of all previously applied treatment options, i.e. the treatment history of the training consultations. In $A^{train}$, rows represent treatment options and columns training data consultations.
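To make this filtering step concrete, the following is a minimal sketch of the described outcome estimation, not the authors' implementation. It assumes numerically encoded consultation vectors, a precomputed similarity function `sim`, and an outcome-consultation matrix `A_train` in which `NaN` marks treatments that were never applied; all names are illustrative.

```python
import numpy as np

def estimate_outcomes(x_test, X_train, A_train, sim, K=10):
    """Estimate outcomes y_hat for all treatment options of a test
    consultation as the similarity-weighted average of the outcomes
    observed in the treatment histories of the K most similar
    training consultations (cf. figure 1).

    X_train: (n_train, n_attr) training consultation representations
    A_train: (n_treatments, n_train) outcome-consultation matrix,
             np.nan where a treatment was never applied
    sim:     callable sim(x_test, x_k) -> similarity score
    """
    s = np.array([sim(x_test, x_k) for x_k in X_train])
    nn = np.argsort(s)[::-1][:K]               # K most similar consultations
    y_hat = np.full(A_train.shape[0], np.nan)
    for t in range(A_train.shape[0]):
        applied = nn[~np.isnan(A_train[t, nn])]   # neighbors that applied t
        if applied.size > 0:
            # weighted average of the numerically decoded outcomes in [0, 1]
            y_hat[t] = np.average(A_train[t, applied], weights=s[applied])
    return y_hat  # ranking the non-NaN entries yields the therapy list
```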
3 Neighborhood Optimization

3.1 Similarity Assumptions

The similarity function $s(x^{test}, x^{train}_k)$ itself and the impact of the attributes incorporated into the similarity computation determine the computed outcome estimate in the CF setting. In this work we compare two methods that both automatically adapt $s(x^{test}, x^{train}_k)$ to the data at hand in order to find an appropriate neighborhood and averaging coefficients. Both investigated approaches assume a supervised classification problem where each instance is associated with a distinct class. In the present setting this corresponds to a priori assumptions regarding similarity or dissimilarity. Each consultation is characterized by a numeric outcome indicator associated with the applied treatment option and an unknown outcome for all other options which have not been applied (unobserved ground truth). Consequently, assumptions regarding similarity or dissimilarity between a pair of consultations can only be derived from those consultations which applied therapies in common and for which the outcome is known in both cases. Figure 2 assumes a training consultation representation $x^{train}_n$ which is associated with a treatment that showed good response (> 0.5). Thus, neighboring consultations $x^{train}_k$ are labeled as similar to $x^{train}_n$ if the same treatment is present in the outcome-consultation vector $a^{train}$ holding the treatment history of $x^{train}_k$ and if this treatment has also shown good outcome. Conversely, neighboring consultations are labeled as dissimilar to consultation $n$ if the same treatment is present in $a^{train}$ but this treatment has shown bad response (≤ 0.5). For neighboring consultation representations $x^{train}_k$ in which the therapy applied in $n$ was never applied, no information regarding the similarity label is available.

Fig. 2: Neighboring consultation representations $x^{train}_k$ of $x^{train}_n$ with the same treatment applied and the same outcome (> 0.5) are considered similar (grey) and vice versa (white). For consultations associated with a differing treatment, no information about similarity is available ($a^{train}$). LMNN intends to cause the target sample $x^{train}_n$ to be surrounded by samples of the same class while being separated from samples of different classes.
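These labeling rules amount to a lookup in a neighbor's treatment history. Below is a minimal sketch under the assumptions above (the target responded well to its applied treatment, outcomes are encoded in [0, 1], and `NaN` marks treatments never applied); the encoding and function name are illustrative, not taken from the paper.

```python
import numpy as np

def similarity_label(t_n, a_k, threshold=0.5):
    """Label a neighboring consultation relative to a target consultation
    whose applied treatment t_n showed good response (> threshold).

    a_k: treatment-history vector of the neighbor; a_k[t] holds the
         observed outcome of treatment t, or np.nan if never applied.
    Returns +1 (similar), -1 (dissimilar), or None (no label derivable).
    """
    if np.isnan(a_k[t_n]):
        return None                            # never applied: no information
    return 1 if a_k[t_n] > threshold else -1   # good response: similar, else dissimilar

# e.g. similarity_label(t_n=2, a_k=np.array([np.nan, 0.7, 0.4])) -> -1
```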
3.2 Attribute Weighting

We assume that individual attributes are of varying importance concerning the similarity between consultations, or are even entirely irrelevant. The baseline metric for computing similarity is the Gower coefficient, as already successfully applied in [3]. The Gower coefficient differentiates between data types and facilitates attribute weighting when quantifying similarity. A widely and successfully used class of feature weighting and selection algorithms which exploit the concept of similarity are Relief-based algorithms (RBAs) [4]. In this work, the RBA approach is adapted to the aforementioned similarity assumptions. Within an iterative process, a random target consultation representation $x^{train}_i$ is drawn from the training data $X^{train}$ and, based on this sample, each dimension $w^d$ of an attribute weight vector $w$ is updated according to equation 1.

$$w^d = w^d + s^d_{Hits} - s^d_{Misses} \qquad (1)$$

Here, in accordance with figure 2, $s_{Hits}$ is the average similarity between the target $i$ and the $K$ closest consultations which are also labeled as similar according to the definitions from above. $s_{Misses}$ is the average similarity between the target $i$ and the $K$ closest consultations which are labeled as dissimilar. As the attribute weight vector is initialized with 0, attributes whose weights become negative are assumed to be irrelevant or unfavourable and are neglected when computing $s(x^{test}, x^{train}_k)$. The optimal free parameters, namely the neighborhood size $K$ and the number of iterations, are determined with cross validation.
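The update in equation (1) can be sketched as follows; the per-attribute similarity `sim_d` is left abstract (e.g. a data-type dependent, Gower-style term), and the hit/miss sets are assumed to be precomputed from the similarity labels above. This is an illustrative reading of the adapted RBA, not the authors' code.

```python
import numpy as np

def rba_update(w, x_i, X_hits, X_misses, sim_d):
    """One iteration of the adapted Relief-based weight update (eq. 1)
    for a randomly drawn target consultation x_i.

    X_hits:   the K closest consultations labeled similar to x_i
    X_misses: the K closest consultations labeled dissimilar to x_i
    sim_d:    callable sim_d(x_a, x_b, d) -> similarity in attribute d
    """
    for d in range(w.size):
        s_hits = np.mean([sim_d(x_i, x_k, d) for x_k in X_hits])
        s_misses = np.mean([sim_d(x_i, x_k, d) for x_k in X_misses])
        w[d] += s_hits - s_misses        # w^d = w^d + s^d_Hits - s^d_Misses
    return w
# Attributes whose weight drops below 0 are excluded from s(x_test, x_train_k).
```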
3.3 Metric Learning

Metric learning based algorithms assume that not only the importance of the individual attributes but also the multivariate distribution of the data, as well as correlations among attributes, have a crucial impact on the similarity computation and hence on the outcome estimation. Here, the Euclidean distance is employed as the baseline metric to derive similarity between the rescaled consultation representations. The Mahalanobis distance additionally considers the distribution of the data by measuring distance in standard deviations along the principal components of the data when computing the Euclidean distance.

$$d(x^{train}_n, x^{train}_k) = (x^{train}_n - x^{train}_k)^\top \mathbf{M} \, (x^{train}_n - x^{train}_k) \qquad (2)$$

However, instead of employing the inverse covariance matrix as the global transformation $\mathbf{M}$, generalized Mahalanobis metrics can incorporate additional constraints. The Large Margin Nearest Neighbor (LMNN) algorithm proposed by [5] learns such a generalized Mahalanobis metric and is especially intended for neighborhood-based classification algorithms. The overall intention of the LMNN approach is to learn $\mathbf{M}$ such that it causes the target sample $i$ to be surrounded by samples $k$ of the same class while being separated from samples of different classes, as pictured in figure 2. The loss function, which is optimized to learn $\mathbf{M}$, consists of two competing objectives $\epsilon_{pull}$ and $\epsilon_{push}$ whose relative impact is controlled using a meta parameter $\nu$. Firstly, large average distances $d(x^{train}_i, x^{train}_k)$ between $x^{train}_i$ and the $K$ closest consultation representations $x^{train}_k$ labeled as similar, i.e. the target neighbors, are penalized according to equation 3.

$$\epsilon_{pull}(\mathbf{M}) = \sum_{i,k} d(x^{train}_i, x^{train}_k) \qquad (3)$$

Secondly, small distances between $i$ and consultations which are labeled as dissimilar and which invade the perimeter (plus a unit margin) established by the target neighbors are penalized according to equation 4. The hinge loss $[z]_+ = \max(z, 0)$ ensures that only invading consultations contribute to the loss function.

$$\epsilon_{push}(\mathbf{M}) = \sum_{i,k,l} \left[ d(x^{train}_i, x^{train}_k) + 1 - d(x^{train}_i, x^{train}_l) \right]_+ \qquad (4)$$

Analogously to the RBA, free parameters such as the neighborhood size $K$, the impact ratio of the two competing objectives $\nu$, and the learning rate $\mu$ need to be determined.
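A minimal sketch of the two competing objectives (equations 3 and 4) using the Mahalanobis distance of equation (2). How the two terms are combined via $\nu$ follows the standard LMNN formulation and is an assumption here, as are all names; the gradient-based optimization of $\mathbf{M}$ (learning rate $\mu$) and its positive semi-definiteness constraint are omitted.

```python
import numpy as np

def mahalanobis(x_a, x_b, M):
    diff = x_a - x_b
    return diff @ M @ diff                      # eq. (2)

def lmnn_loss(M, X, target_pairs, triples, nu=0.5):
    """epsilon_pull and epsilon_push combined into one loss.

    target_pairs: (i, k) with k among the K similar-labeled target neighbors
    triples:      (i, k, l) with l a dissimilar-labeled sample invading
                  the perimeter of target neighbor k (plus unit margin)
    """
    pull = sum(mahalanobis(X[i], X[k], M)       # eq. (3)
               for i, k in target_pairs)
    push = sum(max(0.0, mahalanobis(X[i], X[k], M) + 1.0
                        - mahalanobis(X[i], X[l], M))    # hinge [z]+, eq. (4)
               for i, k, l in triples)
    return (1.0 - nu) * pull + nu * push        # assumed standard combination
```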
4 Evaluation and Results

4.1 Nested Cross Validation

As the consultations of the individual patients cannot be regarded as independent (i.i.d.), a patient-wise evaluation scheme is applied in this work. Hence, to make the most of the available data and to ideally provide an unbiased estimate of the true generalization error, a nested cross-validation approach is applied for model selection and evaluation. The outer loop (outer cv) implements a leave-one-patient-out cross validation which in each iteration holds out all consultations of one test patient $p$ for evaluation. The inner loop (inner cv) implements a 5-fold cross-validation on the remaining patients' consultations for model selection. To avoid bias due to sample dependencies, consultations from the same patient never enter different folds within the inner loop either. Within this inner loop, the cv performance is calculated for all possible model variants (grid search) and the best performing model parameters are selected. Finally, the RMSE is computed between the predicted and the actually observed outcome for the held-out consultations of $p$, using all the remaining patients' consultations to compute the outcome estimates.
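This evaluation scheme maps directly onto grouped cross-validation utilities. Below is a minimal sketch with scikit-learn, assuming `groups` holds the patient ID of each consultation and `evaluate(...)` returns the RMSE of one model variant; both, like all names here, are illustrative.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, GroupKFold

def nested_cv(X, y, groups, param_grid, evaluate):
    """Outer loop: leave-one-patient-out. Inner loop: patient-wise
    5-fold cv for model selection via grid search.

    evaluate(params, X_tr, y_tr, X_te, y_te) -> RMSE  (illustrative)
    """
    outer_rmse = []
    for tr, te in LeaveOneGroupOut().split(X, y, groups):
        X_tr, y_tr, g_tr = X[tr], y[tr], groups[tr]
        # Inner cv: consultations of one patient never enter different folds.
        inner = GroupKFold(n_splits=5)
        mean_rmse = [np.mean([evaluate(p, X_tr[i], y_tr[i], X_tr[v], y_tr[v])
                              for i, v in inner.split(X_tr, y_tr, g_tr)])
                     for p in param_grid]
        best = param_grid[int(np.argmin(mean_rmse))]
        # Evaluate on the held-out patient, trained on all other patients.
        outer_rmse.append(evaluate(best, X_tr, y_tr, X[te], y[te]))
    return np.mean(outer_rmse), np.std(outer_rmse)
```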
4.2 Outcome Estimate Accuracy

Figure 3 shows the two baseline metrics, the Gower coefficient and the Euclidean distance, as well as the best performing variant of each of the proposed neighborhood optimization approaches. Regarding attribute weighting (RBA), the best performance was obtained with a local neighborhood $K_{RBA} = 15$ and iterating once over each training consultation. Regarding the metric learning approach (LMNN), the best performance was obtained with a local neighborhood size $K_{LMNN} = 10$. Furthermore, the inner cv yields the best results when setting the impact ratio of the two competing objectives to $\nu = 0.5$ and the learning rate to $\mu = 0.001$. For each method, the average inner cv results over all test patients $p$ and over a range of included nearest neighbors $K \in [1, 100]$ are shown. As can be seen, the RMSEs vary among test partitions. When choosing the neighborhood size $K$, it must be considered that the RMSE can only be computed if there is any overlap between the therapies applied in the test consultation and the therapy history of the incorporated neighbors, which is not always given for small $K$. Therefore, a lower boundary for the selected neighborhood size, $K \geq 10$, is defined, which none of the found $K$ fall below. The outer cv results (table 1), however, do not show the same performance. Two opposing phenomena can be observed. Firstly, in comparison with the inner cv, within the outer loop there is a larger training data set from which to select the most similar consultations, which results in overall better performing baseline models. Secondly, as the optimized similarity functions are learned on the entire training folds, the functions are subject to overfitting and the inner cv results may be biased.

Fig. 3: Mean and value range of the cross-validation RMSE between estimated and observed outcome, comparing the two baseline metrics (Gower coefficient and Euclidean distance) and the two optimization strategies (RBA and LMNN). The RMSE is computed over the neighborhood size range $K \in [1, 100]$.

Tab. 1: Mean and standard deviation of the outer cv results, evaluating each test patient $p$ by applying the best performing model parameters determined in the inner cv.

    Method      RMSE
    Gower       0.1379 (0.1069)
    Euclidean   0.1347 (0.1083)
    RBA         0.1336 (0.1112)
    LMNN        0.1410 (0.1209)

5 Conclusion

Generally, the neighborhood-based CF approach for therapy recommendation allows estimating therapy outcomes, which can be utilized to provide decision support. Inspection of this neighbourhood can serve as a basis for explaining and interpreting recommendations. The inner cross validation results show that the estimation of the outcome score varies depending on the method employed. Utilizing data type dependent functions for computing distance or similarity, as done by the Gower coefficient, proves to be beneficial in comparison with the Euclidean distance. This approach can be further improved by assigning appropriate weights to attributes. Learning a transformation matrix $\mathbf{M}$ which, besides only scaling individual attributes, is also capable of rotating the basis of the consultation representations outperforms the attribute weighting approach. However, attribute weighting bears, in contrast to the LMNN algorithm, the additional potential to reveal insights into the determining factors regarding outcome and, like the Gower coefficient, is applicable to representations with missing values, which is a pervasive challenge in the medical domain. Nevertheless, the differences between the inner and outer cv results show that the limited data set, in combination with the applied evaluation strategy, causes the inner evaluation loop to be biased, so it does not provide a reliable indicator for model selection.

References

[1] Berner ES. Clinical Decision Support Systems. Springer International Publishing; 2016.
[2] Ricci F, Rokach L, Shapira B, Kantor P. Recommender Systems Handbook. Springer US; 2011.
[3] Gräßer F, Beckert S, Küster D, Schmitt J, Malberg H, Zaunseder S, et al. Therapy Decision Support Based on Recommender System Methods. Journal of Healthcare Engineering; 2017.
[4] Urbanowicz RJ, Meeker M, La Cava W, Olson RS, Moore J. Relief-Based Feature Selection: Introduction and Review. CoRR; 2017.
[5] Weinberger KQ, Blitzer JC, Saul LK. Distance Metric Learning for Large Margin Classification. Journal of Machine Learning Research; 10: 207–244.
