Bio-Algorithms and Med-Systems
, Volume 8 – Jan 1, 2012

/lp/de-gruyter/techniques-of-nominal-data-analyses-d1vvFYNLcr

- Publisher
- de Gruyter
- Copyright
- Copyright © 2012 by the
- ISSN
- 1895-9091
- eISSN
- 1896-530X
- DOI
- 10.2478/bams-2012-0016

Several advanced techniques for the statistical analysis of nominal data are discussed to show how interesting associations between the examined variables can be revealed using correspondence analysis, logistic regression and log-linear models. All these techniques are introduced using medical data on patients following a curative diet. The data comprise four nominal variables: breaking the diet, past diseases requiring a curative diet, sex and age of the patient. The analyses were used to find which of these factors influence breaking the diet. The most popular technique in this setting, the chi-squared test of independence, indicates that only age and past illnesses are related to breaking the diet, whereas sex has no relationship with it. However, deeper analysis revealed that this variable cannot be omitted. The more complex statistical methods show in detail the importance of age and sex in breaking the curative diet. The presented methodology can be applied successfully not only in medicine but also to data from other branches of science.

KEYWORDS: logistic regression, generalized linear model, correspondence analysis, log-linear analysis

Introduction

The number of statistical methods available for the analysis of nominal data has increased dramatically in recent years [1], particularly in the biomedical area. The chi-squared test of independence still plays the predominant role in this analysis; however, in many cases its results are not sufficient for researchers. Usually the chi-squared statistic is only the starting point for applying more sophisticated techniques such as correspondence analysis, logistic regression or log-linear models. This work demonstrates the usefulness of these advanced methods using data on disease-related diets.
Many diseases require the patient to follow a proper diet, which is often an essential element of successful therapy. In this situation the diet aims to protect the damaged organ, to help recovery, or to supply constituents lost by the organism during the disease. A proper diet, like healthy eating habits, is very hard to maintain for a long time, not only for ordinary people but for patients as well. Therefore, patients decide to stop the diet altogether or to break it for a while, even if this is highly unfavourable for their health. The consequences of breaking the diet can be very serious, ranging from a simple weakening of the organism to overloading of the whole immune system. The main goal of this article is to determine the factors responsible for breaking the diet with the help of simple (chi-squared test of independence) and more advanced (correspondence analysis, logistic regression and log-linear models) statistical methods. The data set used here is only an illustrative example showing how interesting conclusions can be drawn when statistical analysis is not limited to chi-squared statistics and the interpretation of contingency tables.

Materials and Methods

1. Description of analysed data

The data set used for the calculations in this article is part of a more extensive study of patients with a prescribed curative diet, collected for the analysis of various medical problems. The patients come from one of the clinics of the Collegium Medicum of the Jagiellonian University. For our purposes we selected 1000 patients from this large data set and the following four nominal variables:

1. Resignation: indicates whether the curative diet was broken. It has the value 1 when the person stopped the diet and 0 otherwise.
2. Diseases: indicates whether the patient suffered in the past from a disease requiring a special diet. It has the value 1 if such a disease appeared and 0 otherwise.
3. Sex: the sex of the polled person (M - male, F - female).
4. Age: the patient's age, given by four age groups: below 20, 20 - 40, 40 - 60, above 60 years old.

The original data were created in Excel. A fragment of the data is shown in Figure 1. For the more complicated analyses, cases with identical features were aggregated instead of analysing the original data. The resulting sheet is smaller and contains only 16 groups (Age (4) x Sex (2) x Illnesses (2)).

Resignation: Not, Yes, Not, Yes, Not, Yes, Not, Not, Not, Yes, Not, Not, Not
Illnesses: Yes, Yes, Not, Not, Yes, Yes, Yes, Not, Yes, Yes, Yes, Not, Yes
Sex: F, M, F, M, F, M, F, F, M, F, F, M, M
Age: > 60, > 60, > 60, < 20, < 20, 20 - 40, > 60, 20 - 40, < 20, > 60, > 60

Figure 1. A fragment of the spreadsheet containing the original data.

2. Statistical calculations

All calculations were carried out with the STATISTICA 10 PL package. The graphical form of the results in this paper follows the standard graphics of this program. The vast majority of the figures are screenshots of sheets and graphs obtained in this package.

Results

Analysis with the chi-squared test of independence

We are interested in whether breaking the diet depends on age, sex and illnesses requiring special diets. We start by arranging the data into contingency tables and then analyse these tables with the chi-squared test of independence. Finally, we calculate the statistics available in the STATISTICA 10 PL package (e.g. Cramér's V and phi coefficients), which give information about the strength of association between qualitative variables.

Conclusions from the analysis with the chi-squared test

The chi-squared test results enable a preliminary statistical and substantive evaluation of the collected data:

1. Both the sheet and the graph in Figure 2 show a significant relation between illness (a past disease requiring a diet) and breaking (resignation from) the diet (p < 0.0001, φ = 0.24). Among patients who had earlier suffered from a disease requiring a diet, most did not break the curative diet, whereas for the healthy people this difference is not so great. The calculated odds ratio (OR) of 2.85 shows that the odds of breaking the diet among patients who were healthy are over 2.5 times greater than among persons who were ill before.

[Figure 2 plot: interaction plot, Resignation x Illnesses; vertical axis: counts (100 to 900); horizontal axis: Illnesses (Not, Yes); series: Resignation Not, Resignation Yes]

Figure 2. The sheet of results and the graph of the chi-squared test for the Resignation and Illnesses variables.

2. The sheet and the graph in Figure 3 show a significant relation between the age structure and breaking the diet (p < 0.0001, Cramér's V = 0.22). It seems that the number of patients who do not continue the diet increases with age in the range from 20 to 40 years, after which it falls. Elderly people (above 60 years) break the diet least often. A different pattern is seen for patients who do not break the diet: here the increase with age is more pronounced.

[Figure 3 plot: interaction plot, Resignation x Age; vertical axis: counts; horizontal axis: Age; series: Resignation Not, Resignation Yes]

Figure 3. The sheet of results and the graph of the chi-squared test for the Resignation and Age variables.

3. Unfortunately, the dependence between sex and breaking the diet was not confirmed by the chi-squared statistic (p = 0.40087). This would suggest that the sex variable could be omitted from further analysis. The next parts of this article show whether this assumption is correct.
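The quantities used in this section (the chi-squared statistic, Cramér's V, which equals phi for a 2 x 2 table, and the odds ratio) can be illustrated with a minimal sketch. The counts below are hypothetical and are not the paper's data; the function names are chosen here for illustration only, and the paper's own calculations were made in STATISTICA.

```python
import math

def chi_squared_summary(table):
    """Return (chi2, df, cramers_v) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    n_rows, n_cols = len(table), len(table[0])
    df = (n_rows - 1) * (n_cols - 1)
    # Cramér's V; for a 2 x 2 table this equals the absolute phi coefficient
    cramers_v = math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))
    return chi2, df, cramers_v

def odds_ratio(table):
    """Odds ratio of a 2 x 2 table [[a, b], [c, d]]: (a/b) / (c/d)."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

# Hypothetical counts (NOT the paper's data).
# Rows: Illnesses = Not, Yes; columns: Resignation = Yes, Not.
table = [[120, 180],
         [140, 560]]

chi2, df, v = chi_squared_summary(table)
or_healthy_vs_ill = odds_ratio(table)

# 3.841 is the 5% critical value of the chi-squared distribution with 1 df
significant = chi2 > 3.841

print(f"chi2 = {chi2:.2f}, df = {df}, V = {v:.3f}, OR = {or_healthy_vs_ill:.3f}")
```

In this sketch the odds ratio compares the odds of breaking the diet among the previously healthy (first row) with the odds among the previously ill (second row), which mirrors the direction of the OR reported above.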
Application of the correspondence analysis

The chi-squared statistics discussed above inform about the statistical significance and strength of the connections between qualitative variables. However, they do not describe the character of the connections between the categories of the analysed variables. Correspondence analysis is a descriptive and exploratory technique which provides information about the structure of relationships between the columns and rows of a contingency table. The statistics and graphs obtained with this method allow simple and intuitive conclusions about the connections between the variables' categories. The procedure of correspondence analysis runs in six stages. Most stages are carried out first for the categories of one variable (rows) and then for the categories of the second variable (columns). The most important steps are:

1) determination of row and column profiles,
2) determination of row and column masses as well as average row and column profiles,
3) calculation of the distances between rows (columns) using the chi-squared metric. To analyse the profiles we have to calculate the distances between them with a weighted Euclidean metric called the chi-squared metric. This metric is also connected with the notion of inertia, which plays the same role in correspondence analysis as variance does in statistics. Inertia is a measure of the dispersion of profiles around the corresponding average profiles. If the inertia is close to zero, the differences between the profiles and the average profile are small, which means that the dispersion around the average profile is small. Likewise, a large value of the inertia means a large dispersion around the average profile. The connection between the inertia and the value of the chi-squared statistic means that the smaller the inertia, the smaller the chance of significant relationships between rows and columns in the contingency table,
4) presentation of the row (column) profiles in the space generated by the columns (rows) of the correspondence matrix,
5) reduction of the space to the dimensions which best correspond to the analysed profiles. In this way we define the system of coordinates onto which the points corresponding to consecutive rows and columns will be projected. In practice, we consider two or three singular vectors for both columns and rows. We can then present the information about the resemblance between rows (and columns) on a simple two- or three-dimensional graph. Next we rotate this system in order to maximize the variance explained by the consecutive coordinates of this space,
6) creation of the common graph of row and column profiles with the help of principal coordinates. Once we have determined the appropriate number of dimensions, we can calculate the coordinates of the row and column profiles in the new system of coordinates (principal coordinates). This allows a plot to be created showing the location of the points which represent the rows and columns of the contingency table. This shared graph can be used to find groups (not specified a priori) illustrating relationships between rows and columns.

To make the type and direction of possible connections between features clearer, we create two graphs of row and column profiles. Figure 4 shows the new system of row and column profiles for two pairs of analysed features. The panel on the left contains the points representing the breaking of the diet (Resignation variable) and the presence of an earlier disease requiring a diet (Illnesses variable), whereas the panel on the right shows the points representing the Resignation and Age variables.

[Figure 4 plots: one-dimensional plots of row and column coordinates for dimension 1; vertical axis: coordinate value. Left panel (Resignation, Illnesses): eigenvalue 0.05765 (100.00% of inertia), contribution to chi-squared 92.644. Right panel (Resignation, Age): eigenvalue 0.04828 (100.00% of inertia), contribution to chi-squared 77.578.]

Figure 4. One-dimensional graphs for individual axes (the first axis of the correspondence analysis is marked by a broken line).

The first axis (broken line in Figure 4), which has the greatest contribution to the explanation of the inertia, distinguishes two groups in both graphs of Figure 4. Some divisions are obvious (breaking and not breaking the diet; being ill and healthy). The groups related to age confirm the observations from the chi-squared test of independence: a relatively smaller group of people who break the diet is found in the oldest groups (between 40 and 60 years and above 60 years old). As we can see, this procedure is less useful for small contingency tables, although it is particularly useful for large tables, facilitating their presentation and interpretation. More interesting results are obtained by examining the connections between all analysed qualitative variables. Multiple correspondence analysis makes this possible. It is a natural extension of simple correspondence analysis to problems with more than two variables. The plot of column coordinates is shown in Figure 5.

[Figure 5 plot: two-dimensional plot of column coordinates, dimension 1 x dimension 2; input table (rows x columns): 10 x 10 (Burt table). Dimension 1: eigenvalue 0.42254 (28.17% of inertia); dimension 2: eigenvalue 0.26002 (17.33% of inertia). Plotted points: the categories of Age, Sex, Illnesses and Resignation.]

Figure 5.
Graph of column coordinates for the multidimensional situation.

The first axis (vertical) separates the group of people who break the diet (on the left) from those who do not (points Resignation: Yes and Resignation: Not). Past diseases requiring a special diet reduce the chance of breaking the current diet: the point representing Illnesses: Yes lies very close to the point Resignation: Not, whereas the point Illnesses: Not lies close to the point Resignation: Yes. The points describing the oldest groups lie on the same (right) side of the graph as the points representing people who do not break the diet. However, the point representing the 40 - 60 group is more distant from the point Resignation: Not than the point representing the oldest group. This is probably connected with sex, which seems to be a factor modifying the effects observed earlier. The second axis (horizontal) results mainly from the division by sex. Men seem to be associated with people aged 40 - 60 years: the point representing this age group lies close to the point Sex: M. On the other hand, women are associated with the oldest group (Age: > 60) and the youngest one (Age: < 20). The age group between 20 and 40 years lies halfway along the horizontal axis.

Conclusions from the correspondence analysis

The results obtained with this analysis:
- confirm the conclusions derived from the chi-squared test of independence,
- indicate the appearance of interaction between the analysed variables,
- show that Sex, although not directly related to breaking the diet, seems to influence the relationships between the remaining variables and the variable Resignation.

Model of the logistic regression

More information about the examined phenomenon (i.e. a description of the character and strength of the connections between variables) can be obtained by building models similar to multiple regression models.
An excellent model for analysing the discussed issue is the logistic regression model. In general, it is a mathematical model which describes the influence of several variables X1, X2, ..., Xk on a dichotomous variable Y and can be expressed in the following form:

$$P(Y = 1 \mid x_1, \ldots, x_k) = \frac{\exp\left(\beta_0 + \sum_{i=1}^{k} \beta_i x_i\right)}{1 + \exp\left(\beta_0 + \sum_{i=1}^{k} \beta_i x_i\right)}$$

where:
$\beta_i$ - the regression coefficients for i = 0, ..., k,
$x_i$ - the independent variables (measurable or qualitative) for i = 1, 2, ..., k.

The regression coefficients are estimated with the maximum likelihood method. The statistical significance of the estimated parameters is verified with the t-test or the Wald test, whereas the goodness of fit of the model is assessed with the LR (likelihood ratio) statistic. The -2 log-likelihood statistic was calculated for the studied model and for the model with the constant term only. The following results were obtained:

Figure 6. The sheet of results for the logistic regression model.

The modelling results presented in Figure 6 enable the statistical and substantive evaluation of the built model. The model turned out to fit the data very well (the p-value of the LR statistic is below 0.0001) and has the following form, with the coefficient estimates reported in Figure 6:

$$P(\text{Resignation} = 1) = \frac{\exp(\eta)}{1 + \exp(\eta)}, \qquad \eta = \beta_0 + \beta_1\,\text{Illnesses} + \beta_2\,\text{Sex} + \beta_3\,\text{Age1} + \beta_4\,\text{Age2} + \beta_5\,\text{Age3}$$

Conclusions from the logistic regression analysis

All structural parameters of the model are statistically significantly different from zero. The parameter estimates obtained from the sample show that:

1. Previous illnesses requiring a special diet hinder breaking the current diet. The odds of breaking the diet for persons who were ill are about 2.3 times smaller (OR = 0.435, 1/0.435 = 2.299) than for those who were healthy.
2. Sex is a factor stimulating breaking the diet. The odds of breaking the diet among men are about 1.4 times greater (OR = 1.384) than among women.
3. The variable Age (with four age groups: < 20, 20 - 40, 40 - 60, > 60) was entered into the model as three dummy variables, Age1 (for < 20), Age2 (for 20 - 40) and Age3 (for 40 - 60), representing the respective age groups. The oldest group (> 60) was taken as the reference level. It turned out that:
a. the odds of breaking the diet for patients aged 40 - 60 years are about 1.5 times greater (OR = 1.476) than for elderly people (> 60 years),
b. the odds of breaking the diet for persons aged between 20 and 40 years are about 2.5 times greater (OR = 2.481) than for elderly people (> 60 years),
c. the odds of breaking the diet for the youngest persons (< 20 years) are about 3.3 times greater (OR = 3.285) than for elderly people.

Application of the generalized linear model

One of the problems of using the classical multiple regression model for predicting probabilities is the range of its values. A probability takes values from 0 to 1, whereas the values predicted by a regression model can be any real number. The first step towards overcoming this problem is to consider odds instead of probabilities. The odds are the ratio of the probability that an event will occur to the probability that it will not occur. For a given event A the odds are defined as:

$$S(A) = \frac{p(A)}{p(\bar{A})} = \frac{p(A)}{1 - p(A)}$$

For example, if p(A) = 0.8 then the odds of event A are S(A) = 0.8/(1 - 0.8) = 4. It means that the probability of A occurring is 4 times greater than the probability of it not occurring. We may also say that the odds of A are 4 to 1. The odds take values from 0 to infinity and their logarithm takes any real value. Therefore we can predict the logarithm of the odds instead of probabilities.
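The odds example above, and the way the reported odds ratios combine in a model without interactions, can be checked with a short sketch. The odds ratios are the values quoted in the text; the dictionary keys and function names are illustrative choices, and the intercept of the fitted model is not used because it is not quoted in the text.

```python
import math

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

def log_odds(p):
    """Natural logarithm of the odds; takes any real value."""
    return math.log(odds(p))

def probability_from_log_odds(x):
    """Inverse transformation: maps any real number back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# The example from the text: p(A) = 0.8 gives odds of 4 (4 to 1)
example_odds = odds(0.8)
round_trip = probability_from_log_odds(log_odds(0.8))

# Odds ratios quoted in the text (reference levels: healthy, female, > 60)
reported_or = {
    "Illnesses": 0.435,  # ill in the past vs healthy
    "Sex_M": 1.384,      # men vs women
    "Age1": 3.285,       # < 20 vs > 60
    "Age2": 2.481,       # 20 - 40 vs > 60
    "Age3": 1.476,       # 40 - 60 vs > 60
}

# In a logistic model each coefficient is the logarithm of its odds ratio
coefficients = {name: math.log(value) for name, value in reported_or.items()}

# Without interaction terms, effects add on the log-odds scale, so odds
# ratios multiply: a man under 20 compared with a woman over 60, both
# with the same illness history.
combined_or = math.exp(coefficients["Sex_M"] + coefficients["Age1"])

# Reciprocal form used in the text for the protective effect of past illness
times_smaller = 1.0 / reported_or["Illnesses"]

print(f"odds(0.8) = {example_odds:.3f}")
print(f"combined OR (man < 20 vs woman > 60) = {combined_or:.3f}")
print(f"1 / 0.435 = {times_smaller:.3f}")
```

The multiplication of odds ratios holds only under the assumption of a main-effects model; once interactions are included, as in the later sections, the age effect differs between groups and this shortcut no longer applies.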
It leads to the logit transformation, which has the following form:

$$\operatorname{logit}(p) = \ln\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k$$

Because the logit takes any real value, we can look for a connection with the independent variables in the linear form seen on the right-hand side of the above equation. This leads us to the generalized linear model. We continue the analysis of our example by applying the generalized linear model. We can analyse models such as:

- one-way analysis of variance: $\operatorname{logit}(\pi_i) = \mu + \alpha_i$
- two-way analysis of variance: $\operatorname{logit}(\pi_{ij}) = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$
- simple linear regression: $\operatorname{logit}(\pi_i) = \alpha + \beta x_i$
- multiple regression: $\operatorname{logit}(\pi_i) = \alpha + \beta_1 x_{i1} + \ldots + \beta_k x_{ik}$
- analysis of covariance: $\operatorname{logit}(\pi_i) = \alpha_i + \beta x_i$

It follows that the generalized linear model with the binomial distribution and the logit link function allows the examination of interactions and more advanced models for our data. The coefficients are estimated with the maximum likelihood method. The statistical significance of individual variables is determined by the Wald test, whereas the goodness of fit of the model is assessed with the deviance statistic D, a measure of the disagreement between observed and fitted values. A perfect fit is achieved when D = 0. For large samples, D approximately follows a chi-squared distribution with (n - p) degrees of freedom, where n is the number of groups and p is the number of parameters. The D statistic also makes it possible to compare nested models. However, we have to remember that this statistic can be applied only to grouped data. An alternative is the Pearson χ² statistic, but without the possibility of comparing models. Using the previous results as an inspiration, we carry out a two-way analysis including Age, Illnesses and the interaction between them.
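The deviance D and its use for comparing nested models can be sketched as follows. The grouped counts in the first part are hypothetical and serve only to show that D = 0 for a perfect fit. The deviances and degrees of freedom in the second part are the values listed in Figure 8 for the models W + P + C, WC + W + P + C and WP + WC + W + P + C. The helper `chi2_sf` is written here for illustration using closed-form expressions for integer degrees of freedom; it is not a function of any statistical package.

```python
import math

def binomial_deviance(successes, totals, fitted_probs):
    """Deviance of grouped binomial data against fitted probabilities."""
    d = 0.0
    for y, n, p in zip(successes, totals, fitted_probs):
        mu = n * p  # fitted number of successes in the group
        if y > 0:
            d += y * math.log(y / mu)
        if y < n:
            d += (n - y) * math.log((n - y) / (n - mu))
    return 2.0 * d

def chi2_sf(x, df):
    """Upper-tail probability of the chi-squared distribution, integer df."""
    y = x / 2.0
    if df % 2 == 0:
        term = math.exp(-y)
        total = term
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return total
    total = math.erfc(math.sqrt(y))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y) * y ** (i - 0.5) / math.gamma(i + 0.5)
    return total

# Part 1: hypothetical grouped data (NOT the paper's data)
successes = [30, 45, 20, 10]
totals = [60, 120, 80, 70]
saturated = [y / n for y, n in zip(successes, totals)]
pooled = [sum(successes) / sum(totals)] * len(totals)
d_saturated = binomial_deviance(successes, totals, saturated)  # perfect fit
d_constant = binomial_deviance(successes, totals, pooled)      # constant only

# Part 2: nested model comparison with (deviance, df) taken from Figure 8
main_effects = (29.9, 10)           # W + P + C
with_age_illness = (12.4, 7)        # WC + W + P + C
with_age_sex_too = (5.9, 4)         # WP + WC + W + P + C

def compare(smaller, larger):
    """Return (deviance drop, df difference, p-value) for nested models."""
    drop = smaller[0] - larger[0]
    ddf = smaller[1] - larger[1]
    return drop, ddf, chi2_sf(drop, ddf)

drop_wc, ddf_wc, p_wc = compare(main_effects, with_age_illness)
drop_wp, ddf_wp, p_wp = compare(with_age_illness, with_age_sex_too)

print(f"adding WC: drop = {drop_wc:.1f}, df = {ddf_wc}, p = {p_wc:.4f}")
print(f"adding WP: drop = {drop_wp:.1f}, df = {ddf_wp}, p = {p_wp:.4f}")
```

The difference between the deviances of two nested models is referred to a chi-squared distribution whose degrees of freedom equal the number of added parameters, which is the comparison the text describes.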
The sheet of results shown in Figure 7 concerns the two-way model of the form $\mu + \alpha_i + \beta_j + (\alpha\beta)_{ij}$, where:
$\mu$ - the logit of the probability of breaking the diet (variable Resignation) for persons aged above 60 years who did not suffer in the past from a disease requiring a special diet (variable Illnesses); this is the reference level,
$\alpha_i$ - the net effect of the age groups in comparison with the group above 60 years (> 60) within the same category of the variable Illnesses,
$\beta_j$ - the net effect of those who "were ill in the past" with reference to those who "were healthy".

Figure 7. Sheet of the results for the two-way analysis (Age and Illnesses).

Additionally, the odds ratios were calculated. On the basis of the results we may state that the changes in the odds of breaking the diet with age observed earlier depend on past illnesses. For people who were ill in the past, the odds ratios differ only slightly between age groups. For people who were healthy, however, the odds ratio increases drastically as age decreases, reaching OR = 5.84 in the youngest group. It means that the odds of breaking the diet among persons < 20 years are almost six times greater than among persons above 60 years. As we can see, including the interaction sheds new light on the discussed issue. However, this model is not the best, because it does not take into account the effect of sex. Many models were applied to the described data: one-factor and multifactor analyses and covariance models. It is impossible to present the results of all analyses in detail; as an example, only the two-way analysis was discussed above. The results of the remaining analyses are gathered in the table below (see Figure 8). For convenience, statistically significant deviances are marked in bold.

Multifactor models

The table shown in Figure 8 characterizes the fit of the considered multifactor models.
To make this table more readable, the following symbols were introduced:
- the variable Age: letter W and the corresponding symbol in the model,
- the variable Sex: letter P and the corresponding symbol in the model,
- the variable Illnesses: letter C and the corresponding symbol in the model.

| Model | Deviance | Degrees of freedom |
|---|---|---|
| W + P | 79.42 | 11 |
| W + C | 35.9 | 11 |
| P + C | 72.8 | 13 |
| WP + W + P | 71 | 8 |
| WC + W + C | 20.1 | 8 |
| PC + P + C | 67.5 | 12 |
| W + P + C | 29.9 | 10 |
| WP + W + P + C | 23.1 | 7 |
| WC + W + P + C | 12.4 | 7 |
| PC + W + P + C | 23 | 9 |
| WP + WC + W + P + C | 5.9 | 4 |
| WP + PC + W + P + C | 13.7 | 6 |
| WC + PC + W + P + C | 9.7 | 6 |
| WP + WC + PC + W + P + C | 2.2 | 3 |

Figure 8. Selected multifactor models.

It turned out that the best and simplest model is the two-factor model including the interaction between age and past illnesses (WC + W + P + C). A graphical interpretation is shown in Figure 9.

[Figure 9 plot: logit (vertical axis, -2.0 to 0.8) against Age (horizontal axis); four curves labelled MN, KN, MT, KT; legend: Illnesses: Not / Sex: F, Illnesses: Not / Sex: M, Illnesses: Yes / Sex: F, Illnesses: Yes / Sex: M]

Figure 9. Logit graphs including the interaction between the Age and Illnesses variables.

The graph shows four logit curves of breaking the diet in relation to age for the groups determined by sex and past illnesses. The curves are marked with the capital letters M and K for sex (men and women) and T and N for past illness (ill and not ill). The lowest curve (red), marked KT, represents women who were ill earlier and shows a simple fall in breaking the diet with increasing age. The next curve, marked MT, concerns men who were ill earlier. This curve is parallel to the previous one because the effect of sex is additive with age. The constant difference between these two curves corresponds to the rise in OR when moving from the odds for women to the odds for men.
The third curve, marked KN, concerns women who did not suffer from an illness requiring a special diet. The distance between this curve and the first one describes the effect of a past disease requiring a diet. This effect decreases sharply with age. The largest OR is equal to 5.8 and compares the youngest group with the oldest one (reference group). The fourth curve, marked MN, represents men who were not ill in the past. The distance between this curve and the previous one is the effect of sex. This effect is the same regardless of whether persons were ill or not, and it also decreases with age. Does this mean that we can stop searching for the appropriate model? Not necessarily. Our model does not take into account (as the correspondence analysis suggested) the interaction with Sex. Log-linear analysis helps us decide whether it is necessary.

Application of the log-linear models

The associations of interest could be modelled quite differently. Statisticians have developed a special analysis of multidimensional contingency tables which makes it possible to test the statistical significance of the contributions of the different factors in the table and of their interactions. Finding the appropriate model for multi-way contingency tables (larger than two-way) is often a difficult task. Log-linear analysis offers many tools that help in such a search, which is why we also analyse the collected data with it. The most important results of the log-linear analysis are gathered in Figures 10 and 11.

Figure 10. Sheet containing the results of all interactions.

The first sheet (Figure 10) suggests that we should analyse models which take into account not only "resignation" but also interactions of at most two of the remaining factors. The other sheet presents the results of the tests for all marginal and partial associations.
We can see which two-dimensional and three-dimensional relations are essential.

Figure 11. Sheet of results of the tests of marginal and partial associations.

The partial association indicates whether the given interaction is essential when all other effects of the same order are already included in the model. Consider, for example, Effect 13 (see Figure 11). This effect represents the association, or interaction, between factor 1 - Illnesses and factor 3 - Resignation. When we remove this effect from the model containing all two-dimensional relations, the difference in the chi-squared value (maximum likelihood method) amounts to 50.6 with 1 degree of freedom (df = 1). This value is statistically significant (p < 0.0001). The fit of the model therefore becomes significantly worse when we eliminate this two-dimensional interaction, so we leave it in. The test of the marginal association of effect 13 refers to the difference between the model without any two-dimensional interactions and the model which contains interaction 13 (and no other two-dimensional interactions). As we can see, the fit of the model does improve when we add the association between factor 1 - Illnesses and factor 3 - Resignation (chi-squared statistic = 91.08, df = 1, p < 0.0001).

Selection of effects to the model

First of all, we are interested in factors associated with 3 - Resignation. From the sheet containing the test results for all marginal and partial associations we conclude that the following effects should be included in the model: 13, 23, 134 (marked in bold in the table in Figure 11). Effect 123 (italics), the relation between 1 - Illnesses, 2 - Sex and 3 - Resignation, is not statistically significant when estimated together with all other associations; it can be explained by other effects. That is why we do not include it in the model now. The most problematic effects are those connected with age: 34, 234 (crossed out).
They are not significant on their own, but in the presence of other effects of the same order they become important. Such a variable is called a confounder (or confounding variable) and should be included in the model. So effect 234, i.e. the association between 2 - Sex, 3 - Resignation and 4 - Age, is included in the model. This shows that the best model is a two-factor model with the interaction between age and sex. We continue the analysis using the generalized linear model. We analyse the model $\mu + \alpha_i + \beta_j + \gamma_k + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik}$, where:
$\mu$ - the logit of the probability of breaking the diet for persons above 60 years who were not ill in the past (reference level),
$\alpha_i$ - the net effect of the age groups compared with the group "> 60" within the same category of "being ill",
$\beta_j$ - the net effect of "people being ill" with reference to those who were not ill,
$\gamma_k$ - the net effect of sex.

A graphical interpretation is shown in Figure 12.

[Figure 12 plot: scatterplot of the linear predictor against Age, categorized by Sex and Illnesses; vertical axis from -2.0 to 1.5; curves labelled MN, KN, MT, KT; legend: Sex: F / Illnesses Not, Sex: F / Illnesses Yes, Sex: M / Illnesses Not, Sex: M / Illnesses Yes]

Figure 12. Logit graph for the multifactor analysis (Age, Sex and Illnesses).

This plot (Figure 12) shows the fitted values derived from the more complex model. For women who were ill in the past, breaking the diet increases slightly up to 35 years and then falls, whereas for men it is still rising in this period. The fall is stronger for women who did not earlier suffer from an illness requiring a special diet (KN curve) than for women who were ill in the past. For men, however, there is a constant fall in breaking the diet with age, and for men who did not suffer in the past from an illness requiring a special diet this fall is very sudden (MN curve). The curves for women and men no longer run parallel (compare Figure 9) because the effect of sex was included in the interaction.
The effect of the interaction between age and sex is more visible for the youngest group (< 20) than for the older ones. It also seems to be greater for people who suffered from illnesses requiring a special diet. In conclusion, including sex in the interaction fully explains the existing associations.

Conclusions

The methods introduced in this article and their results were shown in order to illustrate different techniques of nominal data analysis. The following remarks can be made as a summary:

Substantive conclusions connected with the analysed data
- Age is a significant factor in breaking the curative diet. Its influence is manifested by a noticeable fall in breaking the diet with increasing age. This fall depends on sex and past illnesses.
- For people who did not suffer from diseases requiring a diet, this fall is sharp, starting from an odds ratio of almost six for the youngest group (< 20).
- For patients who were ill earlier, the changes depend on sex. For women, breaking the diet first increases and then decreases. For men, breaking the diet falls steadily with increasing age.
- For men, breaking the diet visibly rises, especially among the youngest patients.

General conclusions
- For problems in which different associations appear between the examined variables, it is worthwhile to apply different modelling techniques.
- It turned out that the traditional chi-squared test of independence is not enough, and the most interesting relationships were obtained with the help of more sophisticated analyses than the simple interpretation of contingency tables.
- The analysis was applied to medical data, but it seems that it can be used successfully for problems originating from other fields of empirical research.
Acknowledgement

The author would like to thank the Neurology and Cardiology Clinic for the possibility of using the medical data in the analyses described in this article.

Bio-Algorithms and Med-Systems – de Gruyter

**Published: ** Jan 1, 2012
