Multicriteria similarity models for medical diagnostic support algorithms

The paper presents a general procedure model for identifying diagnostic medical patterns based on a multicriteria assessment of similarity. A general similarity detection area is defined, in which a pattern recognition optimization problem is formulated. An example algorithm is presented that supports the determination of the initial medical diagnosis from the identified disease symptoms and risk factors. The algorithm determines the set of diseases from which there is none more probable, together with their ranking.

Keywords: disease pattern; indicators and relations of similarity; pattern recognition; similarity; Tversky similarity model; medical diagnosis.

patient characteristics. Additionally, the patient may have several diseases simultaneously, which may interfere with or even cancel each other out. In such cases, what makes the recognition of medical patterns specific is that similarities between the patient's health condition and several diseases have to be sought. The typical diagnostic process is a multi-stage, sequential process that starts with the physician's interview of the patient during the first visit. The initial medical diagnosis is usually the basis for further diagnostic stages involving additional specialized tests. The order of these tests, and the number and scope of those needed, can be difficult to determine properly. This is a demanding, complex and difficult optimization problem that the doctor has to solve, and it affects the effectiveness, duration and cost of treatment. The general scheme of the diagnosis process can be summarized as follows (Figure 1). Set B of initially diagnosed diseases is the basis for further diagnostics. Its content determines the relevance, time and cost of the entire diagnosis and treatment process.
Figure 1 is, of course, a simplification, because it does not cover special situations such as co-existing diseases or the case in which no disease in set B is diagnosed. The whole diagnostic procedure can be divided into two stages: the initial diagnosis and the specialized diagnosis. Both stages rest on the same idea: a pattern recognition procedure that detects (determines) the similarity between the patient's health condition, described by a set of relevant parameters, and previously defined disease entity patterns. The two procedures differ mainly in the technology used. A key module of all algorithms supporting diagnostic processes is therefore the module that determines similarity. In the natural process of diagnosing, the "degree of similarity" is determined by the doctor mainly through intuition, on the basis of disease symptoms, risk factors, the results of specialized tests and known patterns (descriptions) of disease entities. Intuitive similarity is thus multi-aspect and takes into account subjective importance and diagnostic priorities. The purpose of this paper is to try to define a mathematical model

*Corresponding author: Andrzej Ameljańczyk, Faculty of Cybernetics, Military University of Technology, 2 Kaliskiego Street, 00-908 Warsaw, Poland, E-mail: aameljanczyk@wat.edu.pl

Introduction

The medical diagnosis process is an extremely complex undertaking on which the method of treatment and the patient's final condition depend. In general, it is a pattern recognition problem: from a set of disease entity patterns, determine the disease entity most similar to the observed health condition. What makes the recognition of diagnostic medical patterns specific is mainly that the patient's health condition is, for various reasons, difficult to determine reliably and accurately.
The patient's health condition at a given moment can be described by a number of symptoms and risk factors and by many other medical parameters, whose values can often be determined only after time-consuming and expensive specialized tests performed in medical laboratories. The degree of occurrence of specific symptoms or risk factors is difficult for both the physician and the patient to evaluate, mainly because of subjective perception and individual

[Figure 1: Diagnosis process scheme. Flowchart elements: Patient; Set of ascertained disease symptoms; Set of ascertained risk factors; Mechanism of initial disease diagnosis; Set B of initially diagnosed diseases; List of additional specialized tests; Diagnostic inference mechanism; Verification of initial diagnosis; decision "|B| = 1" (Yes / No); Further actions in the clinical path.]

the most similar elements $X_{\inf}$ and the set $X_{\min}$ of elements from which there is none more similar in the set $X$ [1]. Such a formulation of the task of examining similarity is very general, but hard to use in practice because the "natural similarity relation $R$" is difficult to define. Any practical application of the theory of similarity studies the similarity of objects through the similarity of their models. Defining a similarity model for objects $a, b \in A$ usually amounts to defining similarity characteristics (indicators) [1, 2]. In general, this may be a similarity function $p$ defined as follows:

$$p: A \times A \to \mathbb{R}^N, \quad N \ge 1$$

of multi-aspect (multicriteria) similarity that makes it possible to build an adequate and reliable similarity module for computer algorithms supporting the diagnosis process, and to develop methods for determining its quality characteristics (including reliability). The result of this approach is a diagnostic algorithm that proposes initial diagnoses based on the identified disease symptoms and risk factors.
Results of computer simulations of the diagnosis support procedure are also presented, together with its qualitative characteristics, such as the uniqueness of the obtained diagnosis, its clarity and its credibility. These characteristics carry additional information about the "value" of the obtained diagnosis in terms of its usefulness for further diagnostics.

Multicriteria similarity model – medical pattern recognition tool

The general theory of similarity has an extremely rich literature.

The vector

$$p(a, b) = [p_1(a, b), p_2(a, b), \dots, p_n(a, b), \dots, p_N(a, b)] \in \mathbb{R}^N$$

will be called the multicriteria model (image, rating) of the similarity of object $a$ to object $b$, and $p_n(a, b) \in \mathbb{R}^1$ is the value of the $n$th similarity characteristic (indicator). If $N = 1$, we are dealing with a so-called one-aspect similarity model (the Tversky, Jaccard, Tanimoto and Dice models [1, 3]). Such models have many disadvantages and limitations, hence the most frequently used models are those with $N > 1$ [1]. The precision and reliability of a similarity model depend, of course, on the definition of the function $p$ and generally increase with $N$. The multicriteria similarity space $P$ is the set of similarity images of pairs of objects from $A \times A$:

$$P = p(A \times A) = \{ p(a, b) \mid (a, b) \in A \times A \}.$$

In this space the similarity detection relation $R$ is defined for $y, z \in P$ so that $(y, z) \in R$ if and only if $p^{-1}(y) \times p^{-1}(z) \subset R$. The task of examining similarity on the basis of the similarity model is the pair $(P, R)$ [1]. The procedure of identifying patterns can be defined as follows. Let $A$ be the object space, $X \subset A$ the set of objects (patterns), and $a \in A$ the selected object in space $A$. Using the function $p$ (the multicriteria evaluation of similarity), we can define the function $p_a(x)$ of the similarity of object $a$ to pattern $x \in X$ as follows:

$$p_a(x) = p(a, x)$$
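To make the idea of a vector-valued similarity function concrete, the following is a minimal sketch and not the author's implementation. It assumes that objects are described by sets of features and uses two well-known one-aspect indicators (Jaccard and Dice, both special cases of the Tversky index) as the components $p_1$ and $p_2$; the names `tversky`, `jaccard`, `dice` and `similarity_vector`, as well as the sample feature sets, are hypothetical.

```python
from typing import Callable, Sequence, Set, Tuple

Indicator = Callable[[Set[str], Set[str]], float]


def tversky(a: Set[str], b: Set[str], alpha: float, beta: float) -> float:
    """Tversky index of two feature sets; two empty sets count as identical."""
    common = len(a & b)
    denominator = common + alpha * len(a - b) + beta * len(b - a)
    return common / denominator if denominator else 1.0


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index: Tversky index with alpha = beta = 1."""
    return tversky(a, b, 1.0, 1.0)


def dice(a: Set[str], b: Set[str]) -> float:
    """Dice coefficient: Tversky index with alpha = beta = 0.5."""
    return tversky(a, b, 0.5, 0.5)


def similarity_vector(a: Set[str], b: Set[str],
                      indicators: Sequence[Indicator]) -> Tuple[float, ...]:
    """Multicriteria image p(a, b) = (p_1(a, b), ..., p_N(a, b))."""
    return tuple(indicator(a, b) for indicator in indicators)


# Hypothetical objects described by feature sets.
a = {"fever", "cough", "headache"}
b = {"fever", "cough", "rash", "fatigue"}

p_ab = similarity_vector(a, b, [jaccard, dice])  # N = 2 indicators
print(p_ab)
```

With a single indicator the sketch reduces to a one-aspect model; adding indicators increases $N$ without changing the rest of the code.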
The task of studying the natural similarity between objects of a set $X$ can be defined as a pair $(X, R)$, where $X$ is a non-empty set of objects and $R$ is a similarity relation defined as follows: $(x, y) \in R$ if and only if "$x$ is similar to $y$". The solution of $(X, R)$ is the set with

$x \in X$. The detection (identification) space of patterns for object $a \in A$ is the set

$$Y^a = p_a(X) = \{ y \in \mathbb{R}^N \mid y = p_a(x),\ x \in X \},$$

where

$$p_a(x) = (p_1^a(x), p_2^a(x), \dots, p_n^a(x), \dots, p_N^a(x)) \in \mathbb{R}^N,$$

while $p_n^a(x) = p_n(a, x)$ is the value of the $n$th similarity indicator of object $a$ to pattern $x \in X$. The optimization task connected with the identification of patterns is written as $(X, p_a, R)$, or $(Y^a, R)$ for short, where $Y^a = p_a(X) = \{ y \mid y = p_a(x),\ x \in X \}$. The similarity detection relation $R$ is determined as follows:

$$R = \{ (p_a(x_1), p_a(x_2)) \in \mathbb{R}^N \times \mathbb{R}^N \mid \text{object } a \text{ is more similar to pattern } x_1 \text{ than to pattern } x_2 \}.$$

Solving this task [1], we obtain:

1. $X^R_{\inf}(a) = (p_a)^{-1}(Y^R_{\inf}(a)) = \{ x \in X \mid p_a(x) \in Y^R_{\inf}(a) \}$, the set of patterns to which object $a$ is most similar ($Y^R_{\inf}(a)$ is the set of smallest elements of the set $Y^a$ [1]),
2. $X^R_{\min}(a) = (p_a)^{-1}(Y^R_{\min}(a)) = \{ x \in X \mid p_a(x) \in Y^R_{\min}(a) \}$, the set of patterns from which there is none more similar to $a$ in the set $X$ ($Y^R_{\min}(a)$ is the set of minimal elements of the set $Y^a$ [1]),

disease entities. They are a modification of Jaccard distances (similarities). The computer-supported medical diagnosis process relies on programmed algorithms for diagnostic inference. Such algorithms are built on models of the patient's health condition and models (patterns) of disease entities. The result of running the algorithm is a suggestion (proposal) of consecutive diagnostic steps within the clinical path being followed.
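The difference between the set of smallest elements and the set of minimal elements can be illustrated with a short sketch. It assumes, purely for illustration, that the indicators are distance-like (smaller means more similar) and that the relation $R$ is the componentwise "no worse in every indicator" order; the function names and the sample images are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]


def no_worse(y: Vector, z: Vector) -> bool:
    """Assumed relation R: y is no worse than z in every indicator
    (indicators are treated as distances, so smaller is better)."""
    return all(yi <= zi for yi, zi in zip(y, z))


def smallest_patterns(images: Dict[str, Vector]) -> List[str]:
    """Patterns whose image is no worse than every image (may be empty)."""
    return [x for x, y in images.items()
            if all(no_worse(y, z) for z in images.values())]


def minimal_patterns(images: Dict[str, Vector]) -> List[str]:
    """Patterns for which no different image is no worse in every indicator."""
    return [x for x, y in images.items()
            if not any(no_worse(z, y) and z != y for z in images.values())]


# Hypothetical images p_a(x) of three patterns for a fixed object a.
images = {"x1": (0.2, 0.7), "x2": (0.5, 0.3), "x3": (0.6, 0.8)}

print(smallest_patterns(images))  # no pattern is best in both indicators
print(minimal_patterns(images))   # patterns that cannot be improved upon
```

In this example no single pattern is best in both indicators, so the set of smallest elements is empty while the set of minimal elements contains two patterns. This is the typical situation that motivates working with minimal elements.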
Whatever modeling concept is adopted (e.g., Bayesian networks [4, 5], fuzzy sets [2, 6, 7], rough sets [3], web models or pattern concepts [3]), the general idea of the supporting mechanism is to identify the list of the most likely diagnoses and then to select the optimal set of additional specialized tests. Each disease entity is associated with a set of symptoms, a set of risk factors, and a set of "disease" values of medical parameters obtainable from specialized tests. A typical model (pattern) of a disease entity should therefore include three segments:

- descriptions of the symptoms typical for the given disease [8-10],
- descriptions of the risk factors related to the disease [8, 11],
- a description of the disease range of "values of medical parameters" [9, 11, 12].

Formally, the mathematical model $M(m)$ of the disease entity $m \in \mathcal{M} = \{1, \dots, M\}$ can be presented as follows [2]:

$$M(m) = (S_m, R_m, P_m) \tag{1}$$

where $S_m$ is the set of numbers of the symptoms of disease $m$:

$$S_m = \{ s_1^m, \dots, s_k^m, \dots, s_{K(m)}^m \} \subset S, \quad m \in \mathcal{M}. \tag{2}$$

Set $S$ is the set of numbers of all symptoms of the disease entities incorporated in the repository (of course S ), and $K(m)$ is the number of symptoms of disease entity $m$. $R_m$ is the set of numbers of the risk factors of disease $m$:

$$R_m = \{ r_1^m, \dots, r_l^m, \dots, r_{L(m)}^m \} \subset R, \quad m \in \mathcal{M}. \tag{3}$$

3. $X^a_{\inf R} = (p_a)^{-1}(\inf_R Y^a)$, the set of ideal (utopian, virtual [1, 3]) patterns for object $a$ ($\inf_R Y^a$ is the lower bound of the set $Y^a$).

The above multicriteria similarity model is very general. Depending on the adopted definition of the similarity function $p$ and of the similarity detection relation $R$, we obtain different concepts of similarity as special cases, such as the metric concepts of Tversky and Jaccard similarity and multi-aspect (multicriteria) models with different similarity relations.
Algorithm for determining the initial medical diagnosis based on the two-criteria metric similarity model

An algorithm is presented that determines the preliminary medical diagnosis based on the idea of multi-aspect similarity described in the previous section. Similarity indicators are defined as suitably understood distances of the "patient's health condition" from patterns of

Set $R$ in (3) is the set of numbers of all risk factors of the disease entities incorporated in the repository (R ), and $L(m)$ is the number of risk factors of disease entity $m$. $P_m$ is the set of numbers of the medical parameters of disease entity $m$:

$$P_m = \{ p_1^m, \dots, p_n^m, \dots, p_{N(m)}^m \} \subset P, \quad m \in \mathcal{M}. \tag{4}$$

Set $P$ is the set of numbers of all medical parameters (whose values may be determined during specialized medical tests) of the disease entities incorporated in the repository (P ), and $N(m)$ is the number of medical parameters of disease entity $m$. In identifying each disease entity, the individual disease symptoms, risk factors and values of the relevant medical parameters differ in significance (they have different "gravity") [8-11, 13, 14]. Therefore, let the numbers (defined by experts), denoted here by $\alpha$, $\beta$ and $\gamma$,

$$\alpha(s_k^m) \in [0, 1],\ s_k^m \in S_m; \qquad \beta(r_l^m) \in [0, 1],\ r_l^m \in R_m; \qquad \gamma(p_n^m) \in [0, 1],\ p_n^m \in P_m \tag{5}$$

denote the "degree of importance" of the individual parameters from the areas of symptoms, risk factors and additional tests in the diagnosis of disease entity number $m$.

$\mathcal{M}_o(R) = \{ m \in \mathcal{M} \mid R_o(x) \cap R_m \ne \emptyset \}$. The next step is to determine the overall set of possible diseases for the initial diagnosis. The initial estimate can be the set

$$\mathcal{M}_o = \mathcal{M}_o(S) \cup \mathcal{M}_o(R) \tag{8}$$

or, more radically,

$$\mathcal{M}_o = \mathcal{M}_o(S) \cap \mathcal{M}_o(R). \tag{9}$$

Usually, the first visit to a physician establishes the disease symptoms and the presence of possible disease risk factors [8].
Examples of disease symptoms include swollen lymph nodes, skin lesions, fever, loss of appetite, diarrhea, night sweats, weight loss, dizziness, headaches, abdominal pain or tenderness, bleeding, etc. Risk factors include, for example, old age, smoking, physical inactivity, permanent stress, obesity, alcohol abuse, a family history of a given disease, a fatty diet, a sedentary lifestyle, abdominal obesity, type 2 diabetes, a tendency to depression, etc. Let us assume that in the initial stage patient $x$ was found to have disease symptoms from the set $S_o(x) \subset S$ and risk factors from the set $R_o(x) \subset R$:

$$S_o(x) = \{ s \in S \mid w(x, s) > 0 \}, \qquad R_o(x) = \{ r \in R \mid w(x, r) > 0 \}. \tag{6}$$

Such an approach to establishing the initial diagnosis is risky, however, because risk factors or symptoms of several diseases may be present simultaneously and are difficult to define precisely. Given the data of patient $x \in X$ on the presence of disease symptoms and risk factors, in the form of the numbers $w(x, s)$, $s \in S_o(x)$, and $w(x, r)$, $r \in R_o(x)$, we define the "distance of the patient's health condition" from the patterns of the potential diseases included in the sets $\mathcal{M}_o(S)$ and $\mathcal{M}_o(R)$. This can be done as follows. The model of the current health condition of patient $x \in X$, defined on the basis of the occurring disease symptoms and risk factors, takes the form of a pair:

$$f(x) = [f_S(x), f_R(x)], \quad x \in X,$$

where $f_S(x) = [w(x, s);\ s \in S_o(x)]$ and $f_R(x) = [w(x, r);\ r \in R_o(x)]$.

The symbols $s^*(m)$ and $r^*(m)$ denote the patterns of disease number $m$ in terms of symptoms and risk factors, respectively [2, 15]. The symbol $d_1[f_S(x), s^*(m)]$, $m \in \mathcal{M}_o$, denotes the distance (similarity) of the health condition of patient $x$, as given by the occurring symptoms, from the pattern of disease $m$ defined on the basis of disease symptoms. Analogously, $d_2[f_R(x), r^*(m)]$, $m \in \mathcal{M}_o$, denotes the distance of the health condition of patient $x$, as given by the occurring risk factors, from the pattern of disease $m$ defined on the basis of risk factors.
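The patient model and the first screening step can be sketched in a few lines. This is an illustration under stated assumptions, not the author's code: the function names and all sample data are hypothetical, and the rule for a "suspected" disease (its pattern shares at least one item with the occurring set) is an assumed reading of the membership condition, which did not survive extraction unambiguously.

```python
from typing import Dict, Set, Tuple

Intensity = Dict[str, float]  # item name -> degree w(x, item) in [0, 1]


def observed(w: Intensity) -> Set[str]:
    """Items that actually occur, i.e. those with w(x, item) > 0, as in (6)."""
    return {item for item, degree in w.items() if degree > 0}


def patient_model(w_symptoms: Intensity,
                  w_risks: Intensity) -> Tuple[Intensity, Intensity]:
    """f(x) = [f_S(x), f_R(x)]: intensities restricted to occurring items."""
    f_s = {s: w_symptoms[s] for s in observed(w_symptoms)}
    f_r = {r: w_risks[r] for r in observed(w_risks)}
    return f_s, f_r


def suspected(occurring: Set[str], patterns: Dict[int, Set[str]]) -> Set[int]:
    """Diseases whose pattern shares at least one item with the occurring set
    (assumed membership condition)."""
    return {m for m, items in patterns.items() if occurring & items}


# Hypothetical findings from the first visit.
w_symptoms = {"fever": 0.8, "headache": 0.3, "rash": 0.0}
w_risks = {"smoking": 1.0, "obesity": 0.0}

# Hypothetical disease patterns (symptom and risk factor sets per disease).
symptom_patterns = {1: {"fever", "cough"}, 2: {"rash"}, 3: {"headache", "dizziness"}}
risk_patterns = {1: {"smoking"}, 2: {"obesity", "smoking"}, 3: {"old age"}}

f_s, f_r = patient_model(w_symptoms, w_risks)
by_symptoms = suspected(set(f_s), symptom_patterns)
by_risks = suspected(set(f_r), risk_patterns)

broad_estimate = by_symptoms | by_risks    # union, as in (8)
narrow_estimate = by_symptoms & by_risks   # intersection, as in (9)
print(broad_estimate, narrow_estimate)
```

The intersection discards disease 3 even though one of its symptoms is present, which illustrates why the more radical estimate can be risky.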
Here $w(x, s)$ is the degree of "occurrence severity" of symptom $s \in S$ (determined by the physician on a scale of $[0, 1]$ during the first visit), and similarly $w(x, r)$ is the degree of severity of the presence of risk factor number $r$ in the examined patient (also on a scale of $[0, 1]$). The set $\mathcal{M}_o(S)$ of diseases suggested by the set of occurring symptoms is specified as follows:

$$\mathcal{M}_o(S) = \{ m \in \mathcal{M} \mid S_o(x) \cap S_m \ne \emptyset \}. \tag{7}$$

The set $\mathcal{M}_o(R)$ of diseases associated with the occurring risk factors is determined analogously.

The set $\mathcal{M}(S_o(x))$ of the "most probable" diseases with respect to the disease symptoms is calculated as follows:

$$\mathcal{M}(S_o(x)) = \{ m^* \in \mathcal{M}_o \mid d_1[f_S(x), s^*(m^*)] = \min_{m \in \mathcal{M}_o(S)} d_1[f_S(x), s^*(m)] \}. \tag{10}$$

In turn, the set $\mathcal{M}(R_o(x))$ of the "most probable" diseases with respect to the occurring risk factors is determined as follows:

$$\mathcal{M}(R_o(x)) = \{ m^* \in \mathcal{M}_o \mid d_2[f_R(x), r^*(m^*)] = \min_{m \in \mathcal{M}_o(R)} d_2[f_R(x), r^*(m)] \}.$$

The common part of these sets, $\mathcal{M}^* = \mathcal{M}(S_o(x)) \cap \mathcal{M}(R_o(x))$, is usually an empty set [1]. Multicriteria optimization theory and the theory of spaces with a relation [1, 16] offer an interesting way of determining the set of diseases that are most probable simultaneously from the point of view of the occurring symptoms and of the risk factors. By choosing an appropriate model of "diagnostic preferences", we can define the following problem:

$$(\mathcal{M}_o, d(m), R), \tag{11}$$

where $d(m)$ is the vector function of the distance (similarity) of the patient's health condition from the pattern of disease entity number $m$,

$$d(m) = (d_1(m), d_2(m)), \quad m \in \mathcal{M}_o, \tag{12}$$

and $R$ is the diagnostic preferences model (e.g., the Pareto model [1]). These distances for patient $x$ were defined as follows [2]:

$$d_1(x, m) = 1 - \sum_{s_k^m \in S_o^m(x)} w(x, s_k^m)\, \alpha(s_k^m), \qquad d_2(x, m) = 1 - \sum_{r_l^m \in R_o^m(x)} w(x, r_l^m)\, \beta(r_l^m), \qquad m \in \mathcal{M}, \tag{13}$$

where $\alpha(s_k^m)$ and $\beta(r_l^m)$ denote the degrees of importance from (5).

In practice, the following three diagnostic preference options are most frequently taken into account:

1. disease symptoms and risk factors are equally important (Pareto relation),
2. disease symptoms are more important (hierarchical relation),
3. risk factors are more important (hierarchical relation).

With two criteria (13) and a relatively small set $\mathcal{M}_o$, the above problem is very easy to solve graphically, as illustrated in Figure 2. The image [1] of the set of diseases $\mathcal{M}_o$ in terms of distance from the patient's health condition is the set $Y$ (Figure 2):

$$Y = d(\mathcal{M}_o) = \{ d(m) \in \mathbb{R}^2 \mid m \in \mathcal{M}_o \}. \tag{14}$$

[Figure 2: Determining the set $\mathcal{M}_N^R$ of diseases from which there is none more probable. Images of the numbered diseases and the utopian point $y^*$ plotted in the plane with axes $d_1(m)$ and $d_2(m)$.]

The solution of problem (11) is the so-called Pareto set [1, 16], that is, the set of diseases from the initial estimate $\mathcal{M}_o$ from which there is "none more probable". This set is denoted by the following symbol:

$$\mathcal{M}_N^R = \{ m_0 \in \mathcal{M}_o \mid \text{there is no } m \in \mathcal{M}_o \setminus \{m_0\} \text{ such that } d(m) \le d(m_0) \}. \tag{15}$$

The set $\mathcal{M}_N^R$ is the counter-image [1] of the Pareto set $Y_N^R$:

$$\mathcal{M}_N^R = d^{-1}(Y_N^R) = \{ m \in \mathcal{M}_o \mid d(m) \in Y_N^R \}. \tag{16}$$

In this situation the final choice can be made using the so-called "compromise solution" [16], which typically leads to an unambiguous result. In the above example, the initial estimate of the set of possible diseases is the set $\mathcal{M}_o = \{1, \dots, 18\}$. The set $\mathcal{M}_N^R$ consists of the disease entities numbered $\{4, 6, 8, 9, 14\}$ (the counter-image of the Pareto set $Y_N^R$). The patient is therefore "suspected" of having the diseases with numbers $m \in \mathcal{M}_N^R$. By calculating the distances of the images of these diseases from the image $y^*$ of the "utopian" (virtual) disease, the "most probable" one given the ascertained symptoms and risk factors, we can create a ranking of diseases for further diagnostic actions. The coordinates of the utopian disease, $y^* = (y_1^*, y_2^*)$, are determined as follows:

$$y_1^* = \min_{m \in \mathcal{M}_o} d_1(m), \qquad y_2^* = \min_{m \in \mathcal{M}_o} d_2(m). \tag{17}$$

The "most probable" disease closest to the utopian one, given the observed symptoms and risk factors, is disease entity number 4. In practice, however, the entire Pareto set and the ranking of its elements should be presented to the physician for a final decision.

The developed algorithm for determining the initial diagnosis, together with its most important properties, was tested by simulation using a purpose-built simulator [15]. The simulation application is based on the Microsoft .NET Framework and was implemented in the C# programming language. The development environment used to create the software was Microsoft Visual Studio 2010. The simulator repository is generated dynamically (from parameters specified by the user). It is also possible to use an MSSQL database to collect data on diseases and to use them in the simulation. The software is a windowed application with a Windows Forms GUI. Using the application requires .NET Framework version 3 or higher. The results of individual simulations are displayed using the additional Microsoft Charts library, which allows different types of graphs to be generated dynamically. At the end of a simulation, graphs are generated that show the area of probability detection and, in particular, the set of diagnoses from which there is none more probable. Each disease (specifically, the image of its probability in relation to the patient's health condition) is shown on the graph as a dark point. The diseases from which there is none more probable (the so-called Pareto front) are marked with a bright hue.
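The core of the procedure, that is, computing the distances, selecting the diseases from which there is none more probable, and ranking them by distance from the utopian point, can be sketched as follows. This is a minimal illustration rather than the simulator described above (which was written in C#). It assumes that the sums in (13) run over the items that both occur in the patient and belong to the disease pattern, it uses the standard Pareto dominance test, and it ranks by Euclidean distance, since the paper does not name the metric used for the ranking. All function names and data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def distance(intensity: Dict[str, float], weights: Dict[str, float]) -> float:
    """1 minus the weighted sum of intensities over items that occur in the
    patient and belong to the disease pattern (assumed reading of (13))."""
    return 1.0 - sum(intensity[item] * weight
                     for item, weight in weights.items()
                     if intensity.get(item, 0.0) > 0)


def pareto_set(images: Dict[int, Point]) -> List[int]:
    """Diseases whose image d(m) is not dominated by any other image."""
    def dominates(p: Point, q: Point) -> bool:
        return p[0] <= q[0] and p[1] <= q[1] and p != q

    return [m for m, y in images.items()
            if not any(dominates(z, y) for z in images.values())]


def utopian_point(images: Dict[int, Point]) -> Point:
    """Coordinate-wise minimum of all images, as in (17)."""
    return (min(y[0] for y in images.values()),
            min(y[1] for y in images.values()))


def rank(images: Dict[int, Point], candidates: List[int]) -> List[int]:
    """Candidates ordered by Euclidean distance from the utopian point
    (the choice of metric is an assumption)."""
    y_star = utopian_point(images)
    return sorted(candidates, key=lambda m: math.dist(images[m], y_star))


# Hypothetical symptom intensities and expert weights for one disease.
intensity = {"fever": 0.8, "cough": 0.5}
weights = {"fever": 0.5, "rash": 0.3, "fatigue": 0.2}
d1_example = distance(intensity, weights)

# Hypothetical images d(m) = (d1(m), d2(m)) of four candidate diseases.
images = {1: (0.2, 0.6), 2: (0.55, 0.3), 3: (0.6, 0.7), 4: (0.3, 0.35)}
front = pareto_set(images)
ranking = rank(images, front)
print(front, utopian_point(images), ranking)
```

In this toy example disease 3 is dominated by disease 1 and drops out, while the remaining three diseases form the Pareto set and are ordered by their closeness to the utopian point. A different metric or a hierarchical preference model would change only `rank` or `dominates`.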
The utopian point (ideal point) [1, 16] determined by the algorithm (located in the lower left corner of Graph 1) is the image of the "ideal similarity" of the patient's health condition to a "perfectly matching", "utopian" disease, which unfortunately is not in the repository. The user can define at which stages of the simulation graphs with results are generated (and how many of them). The results can also be saved to a text file. The number (name) of the disease entity represented on the graph by a given point can be obtained by clicking on the point.

[Graph 1: plot with horizontal axis $d_1(m)$ and vertical axis $d_2(m)$.]

The resulting diagram of the similarity space can be supplemented with numerical characteristics of the obtained Pareto image [1].

Conclusion

The procedure presented in the study can be considered an initial diagnostic process that initiates each clinical path. It produces a (relatively small) set of diseases from which there is none more probable. The next stage in the diagnosis process is the selection of the "optimal set" of additional outpatient (clinical) tests, which allows a final diagnostic decision to be made and an optimal therapy to be selected. A very important issue in the modeling process is the choice of the form of the similarity function (distances $d_1$ and $d_2$) and the decision to adopt a suitable preferences model $R$. The specific mathematical formulas defining the so-called "distance functions" result from the adopted modeling concept [2, 3, 5, 17]. For example, in models based on Bayesian networks they are suitable conditional probability distributions [18]. In models based on the theory of fuzzy sets [6, 7, 19, 20] they are membership functions of the set of initial diagnoses, and in models based on patterns they are metrics defined in the so-called area of life [2, 3].
Diagnostic preferences models in specific situations do not have to be based on Pareto-type relations or "lexicography". They can be relations of the pessimist (optimist) model type or so-called "collective preferences relations" in the case of diagnosing within a "medical consultation" [1, 2].

Graph 1: Determining the set $\mathcal{M}_N^R$ of diseases from which there are none "more probable".

The aim of this study was to present a preliminary diagnosis model in a form that allows further studies to draw on the rich and effective set of tools offered by multicriteria optimization theory. The possibility of visualizing the obtained results may be of great practical importance as an additional diagnostic tool for the family doctor. A graphical representation on a computer screen of the set of possible diagnoses (including the set of diagnoses from which there is none more likely), together with the values of quality indicators, allows the physician to easily assess the suitability (reliability) of the acquired diagnoses in later diagnostic procedures. The algorithm also allows diagnosis rankings to be created (a list of diagnoses from the most to the least probable). Each diagnosis on the list is also accompanied by a number indicating its distance from the model of the patient's health condition. Suitability for simulation is a very important property of the algorithm, since it makes it easier to study the quality of the diagnosis process. It also makes it possible to train or test the physician's diagnostic skills. Suitably designed test data allow quick testing of the algorithm's sensitivity to medical errors made when determining symptoms and risk factors, and above all their degree of intensity.
Received November 13, 2012; revised January 9, 2013; accepted January 11, 2013; previously published online February 23, 2013

Bio-Algorithms and Med-Systems, de Gruyter

Bio-Algorithms and Med-Systems, Volume 9 (1) – Mar 1, 2013



Publisher
de Gruyter
Copyright
Copyright © 2013 by the
ISSN
1895-9091
eISSN
1896-530X
DOI
10.1515/bams-2013-0001

Algorithm for determining the initial medical diagnosis based on the two-criteria metric model similarity An algorithm is presented that determines preliminary medical diagnosis based on the idea of multi-aspect similarity described in the previous section of the study. Similarity indicators are defined as properly understood distances of "patient health condition" from patterns of (3) Set R is a set of numbers of all disease entity risk factors incorporated in the repository (R ). L(m) ­ number of disease entity risk factors m . Pm ­ the set of the number of disease entity medical parameters m . m m m P m = p1 ,...,pn ,...,pN(m) P , m M . (4) 4Ameljaczyk: Multicriteria similarity models for medical diagnostic support algorithms Set P is a set of numbers of all medical parameters (whose values may be determined during specialized medical tests) of disease entities incorporated in the repository (P ). N(m) ­ the number of disease entity medical parameters m . The identification of each disease entity of individual disease symptoms, risk factors and the value of the relevant medical parameters have different meanings (they have different "gravity") [8­11, 13, 14]. Therefore let the numbers (defined by experts): m m ( sk )[0,1] , sk S m (R) = {mM|Ro(x)Rm}. The next step will be determining the total set of possible diseases of initial diagnosis. The initial estimate can be set Mo = Mo ( S ) M0 ( R ) or more radically: (8) (S) (R). (9) (r m)[0,1] , r m Rm l l ( p )[0,1] , p P m n m n m (5) means "the degree of importance" of individual parameters from the area of symptoms, risk factors, and additional tests in the diagnosis of disease entity number m . Usually as a result of the first visit to a physician, symptoms are found of disease, as well as the presence of possible disease risk factors [8]. 
Examples of disease symptoms may be, for example, swollen lymph nodes, skin lesions, fever, loss of appetite, diarrhea, night sweats, weight loss, dizziness, headaches, abdominal pain/tenderness, bleeding, etc. Risk factors include, for example: old age, smoking, physical inactivity, permanent stress, obesity, alcohol abuse, family history of a given disease, a fatty diet, sedentary lifestyle, stomach obesity, type 2 diabetes, a tendency to depression, etc. Let us assume that as a result of the initial stage patient x was diagnosed with disease symptoms from set So(x)S and a set of risk factors Ro(x)R S O ( x ) ={sS w ( x , s) >0} RO ( x ) ={r R w ( x ,r )>0} (6) Such an approach in establishing the initial diagnosis, however, is risky because of the potential risk factors or symptoms of several diseases simultaneously and the difficulty of precisely defining them. With patient data xX on the presence of disease symptoms and risk factors in the form of numbers w(x, s), sSo(x), and w(x, r), rRo(x) we define the "distance of the patients health condition" from patterns of potential diseases included in sets o(S) and o(R). This can be done as follows. The model of the current health condition of patient xX, defined on the basis of occurring disease symptoms and risk factors will take the form of a pair: f(x) = [fS(x), fR(x)], xX where: fS(x) = [w(x, s); sSo(x)], fR(x) = [w(x, r); rRo (x)] symbols s*(m) and r*(m) will be indicated as disease patterns number m accordingly from the terms of symptoms and risk d1[fs(x), s*(m)] factors [2, 15]. With symbol d1(fS(x), s*(m)) m we will denote the distance (similarity) of the patient's health status x (resulting from occurring symptoms) from the disease pattern m , defined on the basis of diseases symptoms, and analogously we will denote it with symbol d2[fR(x), r*(m)], m of the patients health condition distance x, (due to the occurrence of risk factors) from the disease pattern. m defined on the basis of risk factors. 
Set M[S0(x)] of the "most probable" disease due to disease symptoms will be calculated as follows: M (S0 (x))= m* M0 | d1 [ fs (x) , s* ( m* )] =minM0 (S) d1 [ fs (x) , s* ( m )] . m (10) thus w(x, s) ­ degree of "occurrence severity" of symptom sS (determined by the physician on a scale of [0, 1], during the first visit), and similarly w(x, r) - the degree of severity of the presence of risk factors number r of the examined patient (also on a scale of [0, 1]). Set of o(S) suggested diseases with a set of occurring symptoms will be specified as follows: (S) = {mM|So(x)Sm}. o (7) In turn, the set (Ro(x)) of "most probable" diseases due to the occurring risk factors will be determined as follows: M (R0 ( x )) = m* M0 | d2 [ fR ( x ) ,r * ( m* )] =minM0 (R) d2 [ fR ( x ) ,r * ( m )] m Similarly set o(R) of diseases associated with occurring risk factors will be determine as follows: The intersection of these sets, however, is mostly empty set [1]. Ameljaczyk: Multicriteria similarity models for medical diagnostic support algorithms5 Thecommon part of these sets * = (So(x)) [Ro (x)] is usually an empty set [1]. An interesting proposal for the determination of the set of diseases, the most probable simultaneously from the point of view of the set of occurring symptoms and risk factors, offers a multicriteria optimization theory and theory of space with relation [1, 16]. By determining the appropriate "diagnostic preferences" model we can define the following problem in the form of d2(m) · 11 · · 6 10 12 · · 5 · · 7 · 8 · 18 · 16 · 3 (M ,d (m) , R) . (11) * y · 17 13 · 15 · 14 · d1(m) Where function d(m) is a vector function of the distance (similarity) of the patient's health condition to the disease entity pattern number m d(m) = (d1(m), d2(m)), m . [2]: d 1 ( x ,m) = 1- d 2 ( x ,m) = 1- smS m(x) k o r mR l (12) R Figure 2Determining set MN diseases from which there is none more probable. 
These distances for patient x were defined as follows R R Set M N is the counter image [1] of the Pareto set YN . 0 0 M R = d -1 (Y R ) = m M o d (m )Y R . N N N w ( x , sm) ( sm), m M k k w ( x ,r m) (r m), mM l l (13) (16) m (x) o R ­ diagnostics preferences model (e.g., Pareto model [1]). In practice, the following three diagnostic preferences options are most frequently taken into account: 1. disease symptoms and risk factors are equally important (Pareto relationship) 2. disease symptoms are more important (hierarchical relationship) 3. risk factors are more important (hierarchical relationship). In the case of two criteria (13) and a relatively "small numerous" set Mo the above problem is very easy to solve graphically. An illustration of such a case is Figure 2. The image [1] of the set of diseases Mo in context of distance from the patient's health condition is set Y (Figure 2). Y = d( The ultimate determining factor may be in this situation the so-called "compromise solution" [16], which typically leads to a clear solution. In the above example, the initial estimate of the set = {1,..., 18} diseases. Set of possible diseases is set o R M N constitutes disease entities numbers {4, 6, 8, 9, 14} R (counter image of the set mM N ). The patient, therefore, has a "suspected occurrence" of diseases with numbers R mM N . When calculating the distance of images of these * diseases from the "utopian" (virtual) image y , the "most probable" (due to the ascertained symptoms and risk factors) disease, we can create a ranking of diseases for further diagnostic actions. * * Utopian diseases coordinates y * =( y1 , y 2 ) are determined as follows: * y1 = mind 1 (m ) , mMo * y 2 = mind 2 (m ) . mMo (17) ) = {d(m)R2|m }. (14) The solution of problem (11) will be the so-called Pareto set [1, 16], which is a set of diseases from the set of initial estimate o, from which there is "none more probable". 
This set will be denoted with the following symbol: R M N = { m0 Mo | does not exist m d(m)d(m0)}. ­{m0}, such that (15) The closest "most probable disease" resulting from the observed symptoms and risk factors is disease entity number 4. Practically, however, the "entire Pareto set" and the ranking of its elements should be presented to the physician for a final decision. The developed algorithm for determining the initial diagnosis as well as its most important properties underwent simulation testing using a designed simulator [15]. The simulation application was written in based on Microsoft .NET Framework. In the implementation process the C# programming language was used. The development environment, which was used to create the software, was 6Ameljaczyk: Multicriteria similarity models for medical diagnostic support algorithms Microsoft Visual Studio 2010. The simulator repository is generated dynamically (by determining appropriate parameters specified by the user). There also exists the possibility of using an MSSQL database to collect data concerning diseases and to use them in the simulation process. The presented software is a windowed application using Windows Forms GUI. In order to use the application it is required to install .NET Framework version 3 or higher. The results of individual simulations are depicted through the use of an additional Microsoft Charts library, allowing dynamic generation of different types of graphs. At the end of the simulation graphs are generated showing the area of probability detection and in particular the set of diagnoses from which there is none more probable. Each disease (specifically the image of its probability in relation to the patient's health condition) is shown on the graph as a dark colored point. "Diseases from which there is none more terminology" (the so-called Pareto front) are marked with a bright hue. 
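The selection of the diseases "from which there is none more probable" and their ranking against the utopian point can be illustrated with a short sketch. The Python code below is not the simulator's C# code. It is a minimal illustration that assumes the Pareto preference model (componentwise comparison of the distance pairs) and a Euclidean distance to the utopian point, which the paper does not specify. The function names and the distance values are hypothetical.

```python
from math import hypot


def pareto_set(d):
    """Return the disease numbers whose image (d1, d2) is not dominated.

    d maps a disease number m to its pair (d1(m), d2(m)).
    m0 is rejected when another disease m has d(m) <= d(m0) componentwise
    and d(m) != d(m0); keeping all diseases with identical images is an
    assumed tie rule.
    """
    def dominates(a, b):
        return a[0] <= b[0] and a[1] <= b[1] and a != b

    return sorted(
        m0 for m0 in d
        if not any(dominates(d[m], d[m0]) for m in d if m != m0)
    )


def utopian_point(d):
    """Coordinates of the utopian image: the minimum of each criterion."""
    return (min(v[0] for v in d.values()), min(v[1] for v in d.values()))


def rank_by_utopian_distance(d, candidates):
    """Order candidates by Euclidean distance to the utopian point
    (assumed metric), breaking ties by disease number."""
    y1, y2 = utopian_point(d)
    return sorted(
        candidates,
        key=lambda m: (hypot(d[m][0] - y1, d[m][1] - y2), m),
    )


# Hypothetical distance pairs (d1, d2) for five diseases.
distances = {
    1: (0.20, 0.70),
    2: (0.35, 0.40),
    3: (0.60, 0.15),
    4: (0.50, 0.50),  # dominated by disease 2
    5: (0.80, 0.90),  # dominated by every other disease
}

front = pareto_set(distances)
ranking = rank_by_utopian_distance(distances, front)
print("Pareto set:", front)
print("Utopian point:", utopian_point(distances))
print("Ranking:", ranking)
```

In this invented data set, diseases 1, 2 and 3 are non-dominated, and disease 2 lies closest to the utopian point (0.2, 0.15), so it heads the ranking that would be shown to the physician together with the whole Pareto set.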
The utopian point determined by the algorithm (ideal point) [1, 16], located in the lower left corner of Graph 1, is the image of an "ideal similarity" of the patient's health condition to a "perfectly matching" "utopian" disease, which unfortunately is not in the repository. The user can define at which stages of the simulation graphs with results are generated, and how many of them. The results can also be saved to a text file. The number (name) of the disease entity represented by a given point on the graph can be obtained by clicking on that point.

[Graph 1: scatter plot with $d_1(m)$ on the horizontal axis and $d_2(m)$ on the vertical axis.]

The obtained diagram of the similarity space can be supplemented with numerical characteristics of the obtained Pareto image [1].

Conclusion

The procedure presented in the study can be considered an initial diagnostic process that initiates each clinical path. It leads to the generation of a (relatively small) set of diseases from which there is none more probable. The next stage in the process of diagnosis is the selection of the "optimal set" of additional outpatient (clinical) tests, which allows a final diagnostic decision to be made and an optimal therapy to be selected. A very important issue in the modeling process is the choice of the form of the similarity function (distances $d_1$ and $d_2$) and the decision to adopt a suitable preferences model $R$. The specific mathematical formulas defining the so-called "distance functions" result from the accepted modeling concepts [2, 3, 5, 17]. For example, in models based on the theory of Bayesian networks they are suitable conditional probability distributions [18]; in models based on the theory of fuzzy sets [6, 7, 19, 20] they are membership functions of the set of initial diagnoses; and in models based on patterns they are metrics defined in the so-called area of life [2, 3].
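To make the role of the distance function more concrete, the following Python sketch shows one possible weighted, Jaccard-like distance between a patient's findings and a disease pattern. It is an assumed reading of the distances used in this paper, not a definitive formula: the distance is taken as 1 minus the sum of "severity times importance" over the findings shared by the patient and the pattern, with importance weights that sum to at most 1 for each disease. The finding names are borrowed from the examples in the text, while all numbers, disease numbers and function names are hypothetical.

```python
def pattern_distance(severity, importance):
    """Distance of a patient's findings from one disease pattern.

    severity:   {finding: w}, the observed degree w in [0, 1] for the patient.
    importance: {finding: weight} for the disease pattern. The weights are
                assumed to be non-negative and to sum to at most 1, so the
                result stays within [0, 1].
    Only findings present in both dictionaries contribute (assumption).
    """
    shared = severity.keys() & importance.keys()
    return 1.0 - sum(severity[f] * importance[f] for f in shared)


# Hypothetical expert-defined importance weights for two diseases.
symptom_importance = {
    1: {"fever": 0.5, "night sweats": 0.3, "weight loss": 0.2},
    2: {"fever": 0.2, "headaches": 0.8},
}
risk_importance = {
    1: {"smoking": 0.6, "obesity": 0.4},
    2: {"permanent stress": 1.0},
}

# Hypothetical severities recorded by the physician at the first visit.
patient_symptoms = {"fever": 1.0, "weight loss": 0.5}
patient_risks = {"smoking": 0.5}

# Two-criteria image of each disease: (symptom distance, risk distance).
d = {
    m: (
        pattern_distance(patient_symptoms, symptom_importance[m]),
        pattern_distance(patient_risks, risk_importance[m]),
    )
    for m in symptom_importance
}
for m, (d1, d2) in sorted(d.items()):
    print(f"disease {m}: d1 = {d1:.2f}, d2 = {d2:.2f}")
```

With these invented values, disease 1 is closer to the patient on both criteria, so it would dominate disease 2 under a Pareto preference model. A different modeling concept (Bayesian, fuzzy, pattern-based) would replace `pattern_distance` while leaving the two-criteria comparison unchanged.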
Diagnostic preferences models in specific situations do not have to be based on Pareto-type relations or "lexicography". They can be relations of the pessimist (optimist) type, or so-called "collective preferences relations" in the case of diagnosing within a "medical consultation" [1, 2]. The aim of this study was to present a preliminary diagnosis model in such a way that further studies can draw on the very rich and effective set of tools that multicriteria optimization theory offers.

Graph 1: Determining the set $M^R_N$ of diseases from which there are none "more probable".

The possibility of visualizing the obtained results may be of great practical importance as an additional diagnostic tool for the family doctor. Graphical representation on a computer screen of the set of possible diagnoses (including the set of diagnoses from which there is none more likely) and of the values of quality indicators allows the physician to easily assess the suitability (reliability) of the acquired diagnoses later in the diagnostic procedure. The algorithm also allows diagnosis rankings to be created (a list of diagnoses from the most probable to the least probable). Each diagnosis on the list is also linked to a number indicating its distance from the model of the patient's health condition. Susceptibility to simulation is a very important property of the algorithm, since it makes it easier to study the quality of the diagnosis process. It also offers the possibility of training or testing the physician's diagnostic skills. Suitably designed test data allow quick testing of the algorithm's sensitivity to medical errors made when determining symptoms and risk factors, and above all their degree of intensity.
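The difference between a Pareto-type preference model and a hierarchical ("lexicographic") one can be shown with a small sketch. The Python code below assumes that "symptoms are more important" means comparing the symptom distance first and using the risk-factor distance only to break ties, which is one common reading of a lexicographic order and is not taken from the paper. The function name and the distance values are hypothetical.

```python
def hierarchical_choice(d, primary=0):
    """Return the diseases that are best under a lexicographic order.

    d maps a disease number to its pair (d1, d2).
    primary=0: compare the symptom distance d1 first, d2 only on ties.
    primary=1: compare the risk-factor distance d2 first, d1 only on ties.
    """
    secondary = 1 - primary
    best = min((v[primary], v[secondary]) for v in d.values())
    return sorted(
        m for m, v in d.items() if (v[primary], v[secondary]) == best
    )


# Hypothetical distance pairs (d1, d2) for four diseases.
distances = {
    1: (0.20, 0.70),
    2: (0.35, 0.40),
    3: (0.60, 0.15),
    4: (0.20, 0.90),  # ties with disease 1 on d1, loses on d2
}

symptoms_first = hierarchical_choice(distances, primary=0)
risks_first = hierarchical_choice(distances, primary=1)
print("Symptoms more important:", symptoms_first)
print("Risk factors more important:", risks_first)
```

Unlike the Pareto relation, which would keep diseases 1, 2 and 3 here as mutually incomparable, each hierarchical option singles out one disease, and the choice depends entirely on which criterion is given priority.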
Received November 13, 2012; revised January 9, 2013; accepted January 11, 2013; previously published online February 23, 2013

Journal

Bio-Algorithms and Med-Systems, de Gruyter

Published: Mar 1, 2013
