Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

A Bayesian Framework for Functional Mapping through Joint Modeling of Longitudinal and Time-to-Event Data

A Bayesian Framework for Functional Mapping through Joint Modeling of Longitudinal and... Hindawi Publishing Corporation International Journal of Plant Genomics Volume 2012, Article ID 680634, 12 pages doi:10.1155/2012/680634 Research Article A Bayesian Framework for Functional Mapping through Joint Modeling of Longitudinal and Time-to-Event Data 1, 2, 3 2 4, 5 4, 5 2, 3 Kiranmoy Das, Runze Li, Zhongwen Huang, Junyi Gai, and Rongling Wu Department of Statistics, Temple University, Philadelphia, PA 19122, USA Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA Center for Statistical Genetics, The Pennsylvania State University, Hershey, PA 17033, USA Department of Agronomy, Henan Institute of Science and Technology, Xinxiang 453003, China National Center for Soybean Improvement, National Key Laboratory of Crop Genetics and Germplasm Enhancement, Soybean Research Institute, Nanjing Agricultural University, Nanjing 210095, China Correspondence should be addressed to Kiranmoy Das, kiranmoy.das@temple.edu Received 20 January 2012; Revised 13 March 2012; Accepted 19 March 2012 Academic Editor: Pierre Sourdille Copyright © 2012 Kiranmoy Das et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The most powerful and comprehensive approach of study in modern biology is to understand the whole process of development and all events of importance to development which occur in the process. As a consequence, joint modeling of developmental processes and events has become one of the most demanding tasks in statistical research. Here, we propose a joint modeling framework for functional mapping of specific quantitative trait loci (QTLs) which controls developmental processes and the timing of development and their causal correlation over time. The joint model contains two submodels, one for a developmental process, known as a longitudinal trait, and the other for a developmental event, known as the time to event, which are connected through a QTL mapping framework. A nonparametric approach is used to model the mean and covariance function of the longitudinal trait while the traditional Cox proportional hazard (PH) model is used to model the event time. The joint model is applied to map QTLs that control whole-plant vegetative biomass growth and time to first flower in soybeans. Results show that this model should be broadly useful for detecting genes controlling physiological and pathological processes and other events of interest in biomedicine. 1. Introduction of how a cell, tissue, or organ grows and develops across the time-space scale. To study biology, a classic approach is dimension reduction A statistical dynamic model, called functional mapping, in which a biological phenomenon or process is dissected is one of the products of such developments [1, 2]. The merit into several discrete features over time and space. Most ef- of functional mapping lies in its biological relevance to study forts in the past decades have been made to understand the tempo-spatial pattern of change for the trait and further biological details of individual features and then use knowl- predict the physiological or pathological status of trait edge from each feature to draw an inference about biology phenotype. Functional mapping has proven to be powerful as a whole. There has been increasing recognition of the for elucidating the dynamic genetic architecture of complex limitation of this approach because it fails to detect a rule that phenotypic traits by identifying when specific genes (known governs the transition from one feature to next, thus leading as quantitative trait loci or QTLs) involved turn on and to a significant loss of information behind the development turn off and how long they are expressed in a time course. of a biological trait. More recently, tremendous devel- opments in statistics and computer science have enabled With the advent of new automatic techniques that collect scientists to model and compute the dynamic behavior of a dynamic data in a cost-effective way, functional mapping biological phenomenon and construct a comprehensive view can be anticipated to play an increasingly important role 2 International Journal of Plant Genomics in shedding light on the genetic control mechanisms of prevent or accelerate the outcome by genetic approaches. complex traits or diseases. Our model is constructed with a Bayesian paradigm and The statistical foundation of functional mapping is lon- model parameters are estimated by the MCMC algorithm. gitudinal data analysis or functional data analysis. There has Local polynomials are used to model the mean trajectory and been a considerable body of literature on statistical modeling generalized-linear-model- (GLM-) based approach is used to of time-varying mean and covariance structure using various model the covariance matrix. The model is validated using a parametric, nonparametric, and semiparametric methods real example in which whole-plant biomass as a longitudinal [3–7]. A joint mean-covariance model was proposed by Pou- trait measured at a series of discrete time points and the time rahmadi [8, 9], which shows some advantages over modeling to first flower as a time-to-event are jointly modeled through the mean and covariance separately. Since the publication functional mapping. The statistical properties of the model of the pioneering work by Laird and Ware [10], random applied to estimate QTL temporal effects in this example and effects model have been extensively used for longitudinal its practical usefulness are investigated by simulation studies. data analysis [11]. All these statistical approaches have been incorporated into functional mapping [12, 13], aiming to 2. Joint Modeling Framework provide the most parsimonious estimates of QTL effects for a given data set. A Bayesian algorithm for functional mapping 2.1. Model for the Longitudinal Trait. Genetic mapping has been proposed recently by Liu and Wu [14]. should be based on a segregating population, such as the The complexity of biology lies in the fact that no bio- backcross, F , or recombinant inbred lines (RILS), initiated logical trait is isolated, rather every trait is affected by other with two inbred lines each carrying an alternative allele. traits through genes and environmental factors. For example, An RIL population is generated by self-crossing the hybrids when a plant grows into a particular stage, reproductive of the two inbred lines continuously for 7-8 successive behavior, such as flowering, starts to emerge as one of the generations, which leads to two homozygous genotypes for important events in plant development. The time to first alternative alleles at each locus. Methods for other designs flower is highly associated with the amount of vegetative can be derived similarly. Suppose a backcross has n progeny growth, depending on the environment where the plant is which is genotyped to construct a linkage map, aiming at grown. Likewise, the time to recurrence of prostate cancer in locating putative QTLs that trigger significant effects on a humans is related with dynamic changes of prostate specific longitudinal trait and its associated event. For each progeny, antigen level. How to jointly model longitudinal and time- the trait is measured repeatedly at T different time points and to-event data within functional mapping has become an a time-to-event is also recorded. At a specific time point t, important issue for studying the common genetic basis of the phenotypic value of the trait for progeny i affected by a these processes and predicting events based on longitudinal putative QTL can be expressed by a linear model as follows: traits. Simultaneous modeling of longitudinal traits and time y (t) = x u (t) + r (t) + e (t), to events has been an active area in biostatistics during the i ij j i i (1) j=1 past twenty years. A linear random effects model and EM estimation approach are proposed by Henderson et al. [15] where x is an indicator variable for a possible QTL genotype ij for joint modeling. Guo and Carlin [16] made a comparative of progeny i and defined as 1 if a particular QTL genotype study between separate models and a joint model, showing j is indicated and 0 otherwise (j = 1for QTLgenotype that a joint model is more powerful when there is a strong QQ and 2 for genotype qq), u (t) is the mean phenotypic correlation between the trait and the event. Wang and Taylor value of QTL genotype j for progeny i at time t, r (t) is the [17] developed a Bayesian method and MCMC algorithm subject specific random effect, and e (t) is the residual error for joint modeling of longitudinal and event time data assumed to follow a normal distribution with mean zero and and applied their algorithm on AIDS data. A review article covariance matrix Σ. by Tsiatis and Davidian [18] nicely summarizes the recent The central theme of functional mapping is to model developments for such joint modeling. the mean and covariance structures for the longitudinal trait By simply estimating the correlation between longitu- efficiently. Here, we model the mean vector by polynomial dinal traits and event time, Lin and Wu [19] developed a function and the covariance matrix by an approach that first model that connects these two aspects within func- guarantees the positive definiteness of the estimated covari- tional mapping. However, they developed a likelihood- ance matrix. Without loss of generality, assume the response based framework where the covariance structure for the vector for progeny i, y = (y (1),... , y (T)), has mean 0 and i i i longitudinal trait was modelled by the known AR(1), and covariance matrix Σ. The response at time t, y (t), can be model parameters were estimated using maximum likeli- predicted by its predecessors as the follows: hood estimation. Taking advantage of event models, such as semiparametric Cox proportional hazard model, Weibull t−1 model, accelerated failure time (AFT) model, we here pro- y (t) = φ y (t ) +  (t), (2) i t,t i i pose a sophisticated model for joint modeling of longitudinal t =1 traitand time to eventtolocatethe QTLs whichcontrol the event via a dynamic trait. The detection of those QTLs where φ is the corresponding regression coefficient,  (t) t,t i that are common to these types of traits may help to is the prediction error for progeny i with mean = 0, International Journal of Plant Genomics 3 and σ (t) is its variance. Assuming that  (t)’s are uncor- Assume the vectors of subject specific random effects related (Pourahmadi [8]), we get cov( ) = D, a diagonal θ follow m-variate normal distribution with mean 0 and i i 2 2 matrix with σ (t) being the tth diagonal element, where covariance matrix σ I and they are independent of the = ( (1),... , (T)) is the vector of prediction errors. residual errors. Note that under this assumption, y |θ will i i i i i (r) T (m) T Hence, the matrix representation of the above autoregression follow MVN(X β + X θ , Σ), and the marginal distribu- i i i becomes (r) (m) (m)T tion of y will be MVN(X β , Σ + σ X X ). i j i i = My,(3) i i 2.2. Model for the Event Time. We use s to denote the event where M is a lower triangular matrix with 1’s in diagonal ele- time of progeny i. Since in the current situation, the event ments and −φ  in the (t, t )th position. The above equation t,t time is recorded for all progeny; no progeny is censored. simply gives Assuming a Cox proportional hazard model for this event, T T we get for progeny i, cov( ) = M cov y M = MΣM = D, (4) i i λ (t) = λ (t) exp γμ (t) , (11) i 0 ij which is related to the modified Cholesky decomposition of Σ [20]. where λ (t) denotes the baseline hazard at time t, μ (t) 0 ij Equation (4) will be considered as the basis for modeling is the mean longitudinal trait at time t for given θ when the covariance structure, since this guarantees the estimated progeny i is of QTL genotype j and the regression coefficient covariance matrix to be positive definite. Following Pourah- γ represents the effect of the trait on the event time. The madi [8, 9], we model the mean vector, unconstrained var- survival function for progeny i can be expressed in terms of iance parameters log σ (t), and dependence parameter φ , t,t t the hazard function as s (t) = exp [− λ (u)du]. i i using a polynomial function of a particular order, expressed The longitudinal model described above is linked to the as hazard model by γ.If γ = 0, then the event is independent 2 r of the trait, and hence we should better fit separate models u (t) = β + β t + β t +··· + β t,(5) j j0 j1 j2 jr for the trait and the event. However, when γ is different 2 m r (t) = θ + θ t + θ t +··· + θ t,(6) from zero, a joint model performs better than the separate i i0 i1 i2 im models [17]. For simplicity, the baseline hazard is assumed 2 2 g log σ = η + η t + η t +··· + η t,(7) t 0 1 2 g to be a step function, λ (t) = λ over a partition of the 0 0k observed time scale [0, max(s )] into K (possibly evenly 2 h φ  = δ + δ (t − t ) + δ (t − t ) +··· + δ (t − t ) , λ λ λ λ t,t 0 1 2 h spaced) intervals, (t = 0, t , t ,... , t ). The value of K is 0 1 2 K (8) (t = 1, 2,... , t − 1). usually not too large, possibly smaller than 10. The optimal (r, m, g, h) is determined from the informa- 2.3. Likelihood for the Joint Model. Since the QTL genotype tion criteria (AIC/BIC). We note that different genotypes of a progeny is unknown, we use a mixture model to describe are assumed to have the same covariance structure but the likelihood of the progeny in terms of its possible under- different means. Note that the above method of modeling lying QTL genotypes [21]. The joint likelihood of unknown the covariance structure for a longitudinal response is more parameters Θ given the longitudinal trait y = (y ) and i=1 robust than the traditional first-order autoregressive (AR(1)) n event time s = (s ) for all n progeny can be expressed as i=1 or compound symmetry (CS) structure since real data might ⎛ ⎞ not show a parametric dependence structure. We refer to the n 2 ⎝ ⎠ L Θ | y, s = ω π y , s | Q = j, θ proposed approach as GLM-based approach to estimate the j|i i i i i i=1 j=1 covariance matrix. ⎛ ⎧ n 2 Denote β = (β , β ,... , β )for QTLgenotype j and ⎨ j0 j1 jr = ω f y | Q = j, θ θ = (θ , θ ,... , θ )for subject i. Then, the conditional j|i i i i i i0 i1 im i=1 j=1 mean function for progeny i carrying QTL genotype j (j = 1 ⎫ ⎞ or 2) for the given subject specific random effect (θ )can be s ⎬ i i × λ (s ) exp − λ (u)du , expressed as i i i (r) T (m) T (12) μ = X β + X θ , (9) i i i ij j where π(·) denotes the joint density of the longitudinal trait where and event time; f (.)denotes amultivariatenormalwithQTL ⎡ ⎤ 2 r 1 t t ··· t i1 genotype-specific mean μ modeled as (9)and covariance i1 i1 ij ⎢ 2 r ⎥ 1 t t ··· t ⎢ i2 ⎥ i2 i2 matrix Σ; hazard function λ (s ) is modeled as (11); and ω i i j|i ⎢ ⎥ . . . . ⎢ . ⎥ . . . . . denotes the conditional probability of QTL genotype j given ⎢ ⎥ . . . . (r) ⎢ ⎥ X = . (10) that the marker information of projeny i and Q is the QTL i ⎢ 2 r ⎥ i 1 t t ··· t iτ ⎢ iτ iτ ⎥ genotype for the i-th subject. ⎢ ⎥ . . . . ⎢ ⎥ . . . . . ⎣ . ⎦ . . . . The QTL genotype is inferred from marker genotypes of 2 r the linkage map. Let M = (M ,... , M ) be the m-marker 1 t t ··· t i i1 im iT iT iT 4 International Journal of Plant Genomics genotypes for progeny i, D the position of the putative QTL Assuming that priors for different genotypes are inde- measured by its distance from the very first marker of an pendent, we can express the above posterior distribution as ordered linkage group, and D the distances between marker k 2 π β, σ , γ, D, η, δ, λ | y, s 1and k. Assume that the QTL is located between marker k 2 ∗ ∝ π y, s | β, σ , γ, D , η, δ, λ and k +1. Then, the conditional probability of QTL genotype 0 ⎡ ⎤ (16) j given the genotype of these two markers that flank the QTL 2 ∗ ⎣ ⎦ is expressed as × π β π σ π γ π(D )π η π(δ)π(λ ). j=1 ω = Prob Q = j | D , M , M , D , D . (13) j|i i ik i(k+1) k k+1 The full conditional distributions for the model param- eters, as derived in the Appendix, are used to estimate the Note that, given the QTL locations D , D ,and D ,one k k+1 parameters using the MCMC algorithm. Note that the full can compute d , the distance of the QTL from marker k and 1 conditional distribution for β is expressed as a product of a d , the distance of the QTL from marker k +1 [22]. Using normal distribution term which comes from the longitudinal the Haldane map function, one can compute recombination trait and two other terms from the hazard model. To update fractions between marker k and QTL (r ), between QTL and β , therefore, we use a Metropolis-Hastings (MH) algorithm marker k +1 (r ), and between markers k and k +1 (r)as with a normal proposal density since it is a part of its poste- follows: rior distribution. We also note that in the full conditionals of η and δ, normal distribution coming from the longitudinal 1 1 −2d −2d 1 2 part of the data is the main determinant. Hence for η and r = 1 − e , r = 1 − e , 1 2 2 2 δ, we consider normal proposals with the current value of (14) −2d the parameter as the mean and covariance matrix as Σ and r = 1 − e , η Σ , respectively. Selection of a good proposal density for γ is a bit tricky and we follow the recommendation given by where d = D − D is the distance between marker k k+1 k Wang and Taylor [17]. By evaluating several choices for a and k + 1. Wu et al. [22] provide a procedure for deriving good proposal, we consider a normal distribution with mean the conditional probabilities of QTL genotypes given marker as the current state of the parameter and a suitable standard interval genotypes for the backcross, F ,and RILpopula- deviation in such a way that the proposed density gets well tions, respectively. mixed with the target distribution (acceptance rate between Unknown parameters Θ in likelihood (12) contain QTL 0.25 to 0.40). Because of conjugacy, we can directly simulate genotype-specific parameters β , σ , the parameters that from the full conditional of λ ’s. 0k model the variance structure and dependence structure η = ∗ The parameter D which specifies the location of the (η , η ,... , η )and δ = (δ , δ ,... , δ ), as shown in models 0 1 g 0 1 h QTL is updated following the idea of Satagopan et al. [23] (7)and (8), respectively, the effect of the longitudinal trait by using the MH algorithm. A new value of D ,which we on the event γ,QTL position D and the baseline hazards ∗new ∗ denote by D , is generated from Uniform (max(0, D − λ . 0k ψ), min(D + ψ, D )), where ψ is the tuning parameter. ∗ ∗new Denote this proposed distribution by q(D , D ). The 2.4. Posterior Distribution and Sampling Procedure. We proposed value will be considered as the new value of the derive a Bayesian approach for estimating the unknown pa- chain with probability rameters. This will first need to specify the prior distributions ∗ ∗new for the parameters and, given the data and the priors, derive α(D , D ) the posterior distribution over all the unknown parameters. ∗new ∗new ∗ π D | y, s, β, γ, η, δ, λ q(D , D ) For β ,weplace amultivariatenormalprior with zero = min 1, . ∗ ∗ ∗new π D | y, s, β, γ, η, δ, λ q(D , D ) mean and covariance matrix Σ . An inverse gamma prior (17) with parameters (α ,α ) is considered for σ . We consider 1 2 ∗new ∗new ∗new a uniform prior on γ and uniform (0, D ) prior for the We note that π(D | ...) ∝ π(Q | D , M)π(D ) = ∗ n ∗new ∗new ∗ parameter D . Independent Gamma (a, b) priors are taken π(Q | D , M )π(D ) and, similarly, π(D | i i i=1 ∗ ∗ for λ ,for k = 1,... , K. Priors for η and δ are taken as 0k ...) = π(Q | D , M )π(D ). i i i=1 MVN(0,Σ )and MVN(0,Σ ), respectively. η δ Because of the independence among n progeny, Q is With the above priors and likelihood function, we have updated by updating for each Q separately. For each progeny the joint posterior distribution for the parameters. In this i, the full conditional density is in the form of a multinomial case, it is quite straightforward to get the full conditional pos- with the following cell probabilities: terior distributions. Assume that the priors are independent π Q = j π y , s | Q = j i i i i for different parameters. Thus, we get the posterior density Π Q = j | y , s = . i i i 2 ∗ π Q = j π y , s | Q = j of β, σ , γ, D , η, δ, λ as i i i i j =1 (18) 2 ∗ 2 ∗ π β, σ , γ, D , η, δ, λ | y, s ∝ π y, s | β, σ , γ, D , η, δ, λ 0 0 We can sample the QTL genotype directly from this full 2 ∗ × π β π σ π γ π(D )π η π(δ)π(λ ). conditional density at each cycle. Details of the estimation procedure can be found in Satagopan et al. [23]. (15) International Journal of Plant Genomics 5 45 [17]. Uniform prior was taken for D .For η and δ,we considered multivariate normal priors with zero means and diagonal covariance matrices with diagonal elements 30 and 35 20, respectively. To investigate the effect of prior distributions on the estimation method, we did a sensitivity analysis. Considering different sets of priors, we fitted our model 25 many times and it turned out the estimation is almost insensitive to the choice of priors. Hence the choice of our priors (even with huge variances for β, η,and δ)doesnot affect much the estimation of the model parameters. We fitted our joint model as described in Section 2,by MCMC sampling. We ran chains 120,000 times. To remove the effect of the starting values, we excluded first 20,000 burn-in iterations. With the remaining 100,000 iterations, we estimated the posterior distributions and the parameters were estimated by the posterior mean and also calculated the Time sample standard deviations for the posterior densities. For Figure 1: Whole-plant biomass growth trajectories for 184 soybean the MH algorithm, the acceptance rates were 0.31, 0.26, 0.25, RILs. The time to first flower is indicated by a vertical line on 2 and 0.30 for σ , γ, η,and δ,respectively. each biomass growth curve. The black curve is the mean growth Since our model is complex, we perform several standard trajectory. diagnostic tests to assess the convergence of the Markov chains. First, we use the method proposed by Brooks and Gelman [25]. Considering five different chains with different starting points and discarding the burn-in iterations, we 3. Application computed multivariate potential scale reduction factors 3.1. Material and Analysis. The new model was applied to (MPSRF) to assess the convergence of the chains. Starting analyze a real data set for QTL mapping in soybeans. The points for the model parameters were drawn from the respective priors. The computed values of this statistic get mapping population contains 184 RILs derived from two cultivars, Nannong 1138-2 and Kefeng no. 1. A genetic stabilized near 1 after 60,000 iterations for our model linkage map of this population was first established by parameters, which indicates convergence of the chains. Zhang et al. [24] with 452 makers including RFLP, SSR, Second, we perform Geweke test which compares the EST distributed among the 21 linkage group. This map earlier part of the markov chain to the later part for assessing was recently updated by adding some new SSR makers and convergence. After deleting the burn-in iterations, from the dumping some unreliable markers. The new map contains remaining 100,000 iterations, we take out two subsequences; 834 molecular makers covering a length of 2,308 cM in 24 the first 50,000 and the last 50,000 iterations. Also consistent spectral density estimates at zero frequency are calculated to linkage groups, with an average genetic distance of 2.85 cM between adjacent markers. Those markers with missing compute the z-scores. The calculated Pvalues for our model information were excluded from the analysis, leading to a parameters are above 0.18, which indicates a small absolute z-scores assessing the convergence of our chains. A detailed total of 780 markers involved in our analysis. The plants and their parents were grown in a sample discussion of this method can be found in Geweke [26]. lattice design with two replicates at Jiangpu Soybean Exper- Finally, we perform the Heidelberger and Welch test as iment Station, Nanjing Agricultural University, China. After proposed by Heidelberger and Welch [27]. This test has two 20 days of seedling emergence, plant biomass (in gms.) were parts: a stationary test and a half-width test. Our chains after deletion of the burn-in iterations, pass the stationary test. measured once every 5–10 days until most plants stopped height growth. A total of 8 measurements were taken for the To assess whether the number of iterations is adequate to biomass and the time to get the first flower in that growing estimate the parameters accurately, we calculate relative half- width (RHW). We consider the default α value (0.05) and season was also recorded for each plant. Figure 1 shows the raw data, both the trait (biomass) and the event time for 184 predetermined tolerance value is taken as 0.1. For all our plants. model parameters, the calculated RHW is in between 0.045 and 0.081. This indicates that we have enough iterations to Prior distributions for the model parameters were taken as follows. For genotype specific fixed effect β ,amultivariate estimate the model parameters with 95% confidence under normal prior was used with zero mean and a diagonal tolerance of 0.1. covariance matrix with all diagonal elements 100. For σ , we took IG(3, 1) prior which has mean = 0.5withsmall 3.2. Results. Figure 1 is the growth trajectories of whole- variance (0.25). A uniform prior U(−3.0,−0.1) was taken plant biomass over time for all RILs, in which the times for γ following Wang and Taylor [17]. Observed time scale to first flower for each RIL are also indicated. There are for the event time data was partitioned into 5 parts; that great variability found for these two traits among RILs. is, we took K = 5 and considered independent gamma In the exploratory data analysis, we computed BIC values (0.04, 1.0) priors for λ ,... , λ , following Wang and Taylor for different orders for the triplet (r, m, g, h), showing that 01 05 Biomass 6 International Journal of Plant Genomics 0.15 Table 1: BIC values for selecting the optimum (r, m, g, h)tomodel 1 23 4 5 6 the mean-covariance structures for whole-plant biomass growth 0.1 trajectories and the time to first flower in soybeans. 0.05 (r, m, g, h) BIC 0.15 7 8 9 10 11 12 (3,3,3,2) 7.18 0.1 (3,2,3,2) 4.15 0.05 (3,3,2,2) 4.26 0.15 (3,2,2,2) 3.91 13 14 15 16 17 18 0.1 (2,2,2,2) 2.03 0.05 (2,3,2,2) 2.87 (2,2,3,2) 3.01 0.15 19 20 21 22 23 24 (2,3,3,2) 4.26 0.1 (3,3,3,3) 4.66 0.05 (3,2,3,3) 3.17 Genetic distance (cm) (3,3,2,3) 3.93 (3,2,2,3) 4.05 Figure 2: Marginal posterior plot for QTL locations over 24 linkage (2,2,2,3) 3.34 groups. Marker locations are indicated by ticks on the x-axis. (2,3,2,3) 5.16 (2,2,3,3) 4.88 (2,3,3,3) 3.16 rep rep rep Θ be the set of all model parameters and D = (y , s ) be the replicated data. Then given the data D = (y, s), the rep rep posterior predictive distribution of D is given by p(D | the BIC value is smallest for the order (2, 2, 2, 2), that rep D) = p(D | Θ)p(Θ | D)dΘ. is, second order polynomials can best fit the the mean, One can simulate from the posterior predictive distribu- variance, and dependence structures (Table 1). By scanning tion using the following two steps. First from the posterior for the existence of QTLs over the genetic linkage map, we distributions of the model parameters, simulate m values obtained the posterior distribution of the model parameters (vectors) of Θ.Nextfor each valueof Θ,simulateavalue and estimate marginal posterior distributions of the QTL rep ∗ (vector) D from the likelihood. The m values (vectors) of locations (D ) for all 24 linkage groups (Figure 2). rep D drawn in this way will essentially come from posterior We observed posterior peaks in linkage groups 1, 4, 15, rep predictive distribution p(D | D). 19, 20, 21, and 23. To draw inference about the existence of a We simulate 100 draws (m=100) from the posterior putative QTL in each of these groups, we computed the Bayes predictive distribution and then apply proposed joint anal- Factor (BF), defined as ysis and estimate the model parameters using MCMC as described earlier. We compute BF for each of those 7 linkage P(Y, S | κ = 0) BF = , (19) groups by considering the problem of testing the existence of P(Y, S | κ = 1) no QTL (null) versus the existence of one QTL (alternative). where κ denotes the number of QTLs in that particular Table 3 shows the average BF (with estimated SE) for each group. Following Jeffrey’s scale, the BFS with value smaller linkage group and the existence of QTL in groups 1, 20, and than 1 gives strong evidence against the null hypothesis and 23 is quite evident. higher than 10 gives enough evidence for the null hypothesis. Heritability (broad-sense) for the traits is estimated from Note that in this case, we are testing the existence of no the data, as the proportion of phenotypic variance attribu- QTL (null) versus the existence of one QTL (alternative) table to genetic variance. The estimated heritability in our for each linkage group. Here, no QTL in a group means case is 32.6%. Also we compute the percentage of variance β = β , for that group. So, in order to compute BF we explained by three identified QTLs. It turns out the QTLs 1 2 run our MCMC twice, first under the null and then under identified in linkage groups 1, 20, and 23 explain 6.8%, the alternative. The calculated BF for the above 7 linkage 14.3% and 11.4% of the total variance, respectively. groups were 0.2781, 11.274, 13.493, 10.610, 0.5913, 11.475, We show the marginal posterior plots with 95% credible and 0.7953, respectively, implying the existence of QTL in intervals for the parameter γ in Figure 3. Note that for all groups 1, 20, and 23. Table 2 provides genotype-specific three groups, the estimates and the confidence intervals are mean parameters for the QTLs located on linkage groups in the negative part which indicates a negative relationship 1, 20, and 23, along with their 95 percent credible intervals between the trait and the event time. Biologically this is (C.I.). sensible since the plants with higher body mass will take less Since the nature of our model is complex and our esti- time to have the first flower compared to the plants with mation is based on MCMC, we perform posterior predictive lower body masses. check for the aforementioned 7 linkage groups. We simulate Figure 4 illustrates genotypic differences in whole-plant observations to get the posterior predictive distribution. Let biomass trajectory and the time to first flower for the three Density Density Density Density International Journal of Plant Genomics 7 Table 2: Estimates of the parameters that describe genotype-specific biomass growth trajectories and QTL locations on linkage groups 1, 20, and 23, with 95% credible intervals. Group 1 Group 20 Group 23 Parameter estimate C.I. estimate C.I. estimate C.I. β −0.4762 (−0.5081, −0.4442) −1.1524 (−1.1641, −1.1405) 0.4190 (0.3897, 0.4484) β 3.3214 (3.3023, 3.3404) 2.6829 (2.6463, 2.7194) 0.5971 (0.5566, 0.6377) β 0.1548 (0.1363, 0.1731) 0.2295 (0.2020, 0.2570) 0.5438 (0.5176, 0.5701) β −5.4762 (−5.4788, −5.4734) −3.1004 (−3.0231, −2.9768) −2.3571 (−2.3701, −2.3442) β 8.4464 (8.4381, 8.4546) 5.3250 (5.3159, 5.3340) 4.4660 (4.4294, 4.5028) β −0.4702 (−0.5011, −0.4392) −0.0250 (−0.0618, 0.0118) 0.0410 (0.0390, 0.0432) Marker Interval Sat-356–B30T GNE097b–A199H LC4-4T–Sat-280 D 30.810 (29.1934, 31.7150) 49.600 (48.7515, 50.0726) 39.472 (38.8143, 40.1863) 1.2 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 −− 3 21− 0 −− 3 2.5 −2 −1.5 −1 −0.5 (a) (b) 2.5 1.5 0.5 2 1.5 1 0.5 −− − − (c) Figure 3: Marginal posterior plot for γ for linkage groups 1 (a), 20 (b), and 23 (c). Posterior density Posterior density Posterior density 8 International Journal of Plant Genomics Table 3: Posterior predictive check for 7 linkage groups. Linkage group BF (from actual data) BF with SE (from posterior predictive distribution) 1 0.2781 0.2659 (0.15) 4 11.274 12.086 (1.76) 15 13.493 12.962 (2.17) 19 10.610 11.138 (1.56) 20 0.5913 0.6281 (0.73) 21 11.475 12.183 (1.18) 23 0.7953 0.7682 (0.89) qq QQ 25 qq 25 QQ 10 10 T T T 2 1 2 T 1 2 3 4 5678 12345678 Time Time (a) (b) QQ qq T T Time (c) Figure 4: Whole-plant biomass growth trajectories and times to first flower (T and T )for twodifferent genotypes at each of the QTLs 1 2 detected on linkage groups 1 (a), 20 (b), and 23 (c). Genotypes QQ inherit two alleles from parent Nannong 1138-2, whereas genotype qq inherits two alleles from parent Kefeng no. 1. QTLs detected. At the QTL on linkage group 1, the allele (Q) alters its direction of genetic effect. Affected by the first two inherited from parent Nannong 1138-2 leads to increased QTLs, poor vegetative biomass growth in plants stimulates biomass growth and earlier flowering than the allele (q)from early flowering (Figures 4(a) and 4(b)). Yet, the QTL on parent Kefeng no. 1. The inverse pattern is observed for the linkage group 23 makes the fast-growing genotype to flower QTL on linkage group 20. The QTL on linkage group 23 earlier than the slow-growing genotype (Figure 4(c)). Biomass Biomass Biomass International Journal of Plant Genomics 9 Table 4: Simulation results for genotypic-mean parameters and QTL locations under different covariance structures, AR(1)-, CS- and GLM- based approach. AR(1) CS GLM-based approach Parameter Actual value Estimate MCSE Estimate MCSE Estimate MCSE β −1.1524 −1.1872 0.0871 −1.0982 0.1094 −1.1667 0.0361 β 2.6829 2.5391 0.0502 2.6103 0.0495 2.7001 0.0103 β 0.2295 0.2288 0.0805 0.2301 0.0302 0.2291 0.0113 β −3.1004 −3.1204 0.0307 −3.093 0.1025 −3.1255 0.1011 β 5.3250 5.3140 0.0291 5.2998 0.1130 5.3433 0.0405 β −0.0250 −0.0241 0.1302 −0.0257 0.1035 −0.0244 0.0603 D 43.00 39.32 2.3561 40.61 1.5694 42.48 1.1572 4. Simulation Study linkage groups 1, 20, and 23, we simulate data under the “null” model. As mentioned earlier, under the “null” model, We performed simulation studies to study the statistical β = β . For group 1, we consider the null model β = 1 2 1 properties of the joint model. We assumed an RIL design β = (−0.4762, 3, 3214, 0.1548), for group 20 it is β = 2 1 of 200 progeny and simulated 11 evenly spaced markers β = (−1.1524, 2.6829, 0.2295), and for group 23, our null on a linkage group of length 100 cM. A QTL is located at model is given as β = β = (0.4190, 0.5971, 0.5438). The 1 2 43 cM from the very first marker of the linkage group. To computed BFS for these three groups are 34.55, 46.79, and reflect a practical problem, we used parameter estimates 41.86, respectively, suggesting strong evidence for the null. of the soybean QTL detected in linkage group 20 as true values to simulate the data, allowing the covariance structure. 5. Discussion Time-dependent phenotypic values were assumed to follow a multivariate normal distribution and the event times were Tools to reveal the secret of life should reflect the dynamic taken the same as the soybean data. To make a comparison, nature of life. More recently, a series of statistical models we analyzed the simulated data using our nonparametric have been developed to map quantitative trait loci (QTLs) GLM-based covariance structure and the traditional AR(1) that control the dynamic process of a complex trait [1, 2, and CS covariance structures. 22, 28]. These so-called functional mapping models integrate The prior distributions for the model parameters were mathematical aspects of biological processes into a statistical taken in the same way as discussed in Section 3.1. A uniform framework derived to map complex trait QTLs and have prior on (0,100) was considered for D . For each situation, proved to be useful for detecting and identifying genes and we ran Markov chains 120,000 iterations and initial 20,000 genetic interactions involved in quantitative genetic variation burn-in iterations were discarded. Model parameters were for plant height, plant rooting ability, and animal body mass. estimated from the posterior distributions on the basis Functional mapping is also flexible to incorporate complex of remaining 100,000 iterations. The computed BF was biological phenomena, such as genotype-environment inter- 0.649 giving a strong evidence against the null hypothesis. actions and allometric scaling providing powerful means for Table 4 shows the means of the Bayesian estimates of model addressing biological questions of fundamental importance. parameters with their respective Monte Carlo standard In this paper, we develop a new version of functional errors, (MCSE). It can be seen that the estimates are quite mapping that can map QTLs for developmental events close to the actual values with a reasonably small standard affected by organismic growth trajectories in time. This errors which justifies the accuracy and precision of our version is benefited from recent statistical developments for estimation procedure. However, our GLM-based approach joint modeling of longitudinal traits and event time [16– provides better estimation of parameters than AR(1) and CS- 18, 29]. In the presence of strong correlation between a based approaches. longitudinal trait and event, a joint model performs better Figure 5 elucidates the marginal posterior plot for the than submodels separately for a single trait. In the joint QTL location under three different covariance structures. modeling framework for these two types of traits, we applied It is found that both AR(1)- and CS-based models provide a GLM-based approach to model the covariance structure the peaks at wrong locations, whereas GLM-based nonpara- and local polynomials for the mean curves. Bayesian esti- metric covariance structure locates QTL more accurately in mation method using the MCMC algorithm was used since which case the length of the credible interval is narrower than it is computationally much simpler than a likelihood-based those obtained from the former two structures. This provides approach. Simulation results show the effectiveness of GLM- numerical evidence that the proposed GLM-based model has based covariance model compared to traditional parametric better precision of QTL localization. compound symmetry or autoregressive structure. We perform further simulation studies to assess the Our joint model, embedded within functional mapping, reliability of BF in our data application. For each of the promotes the study of testing how QTLs pleiotropically 10 International Journal of Plant Genomics 0.3 0.3 0.25 0.25 0.2 0.2 0.15 0.15 0.1 0.1 0.05 0.05 39.3 40.6 34 36 38 40 42 44 46 36 38 40 42 44 46 48 QTL position QTL position (a) (b) 0.4 0.3 0.2 0.1 42.5 38 40 42 44 46 QTL position (c) Figure 5: The Bayesian estimate of QTL location (indicated by dash vertical lines) from simulation studies under different covariance structures, AR(1) (a), CS (b), and nonparametric (c). The true QTL location is at 43 cM from the very first marker of the linkage group. affect different biological processes and how one trait is longitudinal traits and events. Second, from a dynamic sys- predicted by other traits through genetic information. The tems perspective, we need to model dynamic correlations application of the new model to soybean mapping data among multiple longitudinal traits and multiple events. does not only validate its usefulness and utilization, but Third, with the availability of efficient genotyping tech- also gains new insight into the genetic and developmental niques, our model should accommodate a high-dimension regulation of trait correlations in plants. There is no doubt model selection scheme to identify significant genetic vari- that the new model can be modified to study the genetic ants from a flood of marker data. associations between HIV dynamics and the time to death as well as prostate specific antigen change and the time to recurrence of prostate. However, there is much room for Appendix modifying this model. First, to clearly describe our idea, Denote β = (β : j = 1, 2; j = j), S(t) = exp(− λ(u)du), we assume one QTL at a time for trait control. Epistatic −j j / 2 r 2 m interactions between multiple QTLs may play an important h(t) = (1, t, t ,... , t ), and g(t) = (1, t, t ,... , t ). Let m role in trait development as well as in correlations between be the number of progeny that carries QTL genotype j.We Posterior density Posterior density Posterior density International Journal of Plant Genomics 11 derive the full conditional distributions for the unknown Similarly, the full conditional distributions for the other parameters as parameters can be derived as follows: ⎡ ⎛ ⎞ ⎤ π β | y, s, σ , γ, D, η, δ, λ , β T T j −j ⎣ ⎝ ⎠ ⎦ π γ |· ∝ exp γ h (s )β + g (s )θ i i i i j=1 i=1 ∝ π β π y, s | β, σ , γ, D, η, δ, λ ⎡ ⎤ ∝ π β π s | y, β, σ , γ, D, η, δ, λ ⎣ ⎦ × S (s ) . i i 2 i=1 × π y | β, σ , γ, D, η, δ, λ ⎡ ⎤ ⎡ n mj −(1/2) ⎣ ⎦ 1 1 π η |· ∝ det (Σ) f T (r) T −1 ∝ exp − Σ β − y − X β j β j i j i=1 2 2 i=1 ⎡ ⎤ ⎡ ⎛ ⎞ ⎤ 2 j 1 1 (r) −1 × exp − η Σ η − y − X β (r) T T −1 i η i j ⎦ ⎣ ⎝ ⎠ ⎦ ×Σ y − X β × exp γ h (s )β i i i 2 2 j j j=1 i=1 i=1 ⎡ ⎤ m T (m) T (r) T (m) T −1 ×− X θ Σ y − X β − X θ . ⎣ ⎦ i i i i i ( ) × S s i i i=1 ⎡ ⎤ T −(1/2) 1 1 ⎣ ⎦ T (r) T −1 π(δ |·) ∝ det (Σ) = exp − β Σ β − y − X β j β j j 2 2 i=1 i=1 ⎤ ⎡ ⎛ ⎞ ⎤ 2 j 1 1 (r) T −1 (r) T T −1 × exp − δ Σ δ − y − X β ⎦ ⎣ ⎝ ⎠ ⎦ i δ i ( ) j ×Σ y − X β × exp γ h s β i i i i j j 2 2 j=1 i=1 i=1 ⎡ ⎤ (m) (r) (m) T −1 T T ×− X θ Σ y − X β − X θ . ⎣ ⎦ i i i i i i × S (s ) j i i i=1 ⎡ ⎤ mj −(1/2) (m) (m)T 2 2 T (r)T (r) T ⎣ ⎦ −1 −1 π σ |· ∝ det Σ + σ X X ∝ exp β Σ β + β X Σ X β i i j β j j i i j i=1 i=1 ⎤ ⎡ ⎛ ⎞ ⎤ m m α j j −α −1 2 × σ exp − (r) T T T −1 ⎦ ⎣ ⎝ ⎠ ⎦ −2 y Σ X β exp γ h (s )β σ i i i i j j i=1 i=1 2 j ⎡ ⎤ 1 (r) T j × exp − y − X β ⎣ ⎦ j=1 i=1 ( ) × S s i i i=1 −1 ⎡ ⎛ ⎞ ⎤ (m) (m)T (r) m m ⎦ j j × Σ + σ X X y − X β . i i i j (r)T (r) (r) −1 −1 T T −1 T ⎣ ⎝ ⎠ ⎦ =exp β Σ + X Σ X β −2 y Σ X β i i i i j β j j $ % i=1 i=1 −1 ⎡ ⎛ ⎞ ⎤ ⎡ ⎤ π(λ |·) ∝ Gamma d +1, φ π(λ ) 0k k 0k 0k m m j j & ' −1 ⎣ ⎝ ⎠ ⎦ ⎣ ⎦ × exp γ h (s )β S (s ) . 1 i i i i = Gamma a + d +1, φ + , k λ 0k i=1 i=1 (A.1) (A.4) Hence, we have λ λ where d is the number of events in the interval (t , t ]and k−1 k $ % π β | . :MVN β , M , V t $ % j β β  k T T ⎡ ⎛ ⎞ ⎤ ⎡ ⎤ φ = exp γ h(t)β + g(t)θ dt 0k j i m m j j λ (A.2) λ k−1 T i:s ≥t ⎣ ⎝ ⎠ ⎦ ⎣ ⎦ k ( ) ( ) × exp γ h s β S s , i i i i j (A.5) i $ % i=1 i=1 T T ( ) ( ) + exp γ h t β + g t θ dt. j i λ λ k−1 i:t <s ≤t where k−1 k ⎡ ⎤ ⎛ ⎞ ⎛ ⎞ −1 m m j j Acknowledgments ⎢ ⎥ (r)T (r) (r)T −1 −1 −1 ⎝ ⎠ ⎝ ⎠ M = ⎣ Σ + X Σ X X Σ y ⎦ , β i β i i i i=1 i=1 This work is partially supported by NSF/IOS-0923975, ⎛ ⎞ −1 Changjiang Scholars Award, “Thousand-Person Plan” (r)T (r) −1 −1 ⎝ ⎠ Award, the China National Key Basic Research Program V = Σ + X Σ X . β i i (2006CB1017, 2009CB1184, 2010CB125906), the China i=1 National Hightech R&D Program (2006AA100104), the (A.3) 12 International Journal of Plant Genomics Natural Science Foundation of China (30671266), and the [17] Y. Wang and J. M. G. Taylor, “Jointly modeling longitudinal and event time data with application to acquired immunode- China MOE 111 Project (B08025). R.Li was supported by an ficiency syndrome,” Journal of the American Statistical Associa- NIDA Grant P50-DA10075 and an NNSF of China Grant tion, vol. 96, no. 455, pp. 895–905, 2001. 11028103. The content is solely the responsibility of the [18] A. A. Tsiatis and M. Davidian, “Joint modeling of longitudinal authors and does not necessarily represent the official views and time-to-event data: an overview,” Statistica Sinica, vol. 14, of the NIDA, or the NIH. no. 3, pp. 809–834, 2004. [19] M. Lin and R. Wu, “A joint model for nonparametric functional mapping of longitudinal trajectory and time-to- References event,” BMC Bioinformatics, vol. 7, article 138, 2006. [1] C. X. Ma, G. Casella, and R. Wu, “Functional mapping of [20] H. J. Newton, IMESLAB: A Time Series Analysis Laboratory,T. quantitative trait loci underlying the character process: a Wadsworth and Brooks/Coles, Pacific Grove, Calif, USA, 1988. theoretical framework,” Genetics, vol. 161, no. 4, pp. 1751– [21] E. S. Lander and S. Botstein, “Mapping mendelian factors 1762, 2002. underlying quantitative traits using RFLP linkage maps,” [2] R. Wu and M. Lin, “Opinion: functional mapping—how to Genetics, vol. 121, no. 1, p. 185, 1989. map and study the genetic architecture of dynamic complex [22] R. Wu, W. Hou, Y. Cui et al., “Modeling the genetic architec- traits,” Nature Reviews Genetics, vol. 7, no. 3, pp. 229–237, ture of complex traits with molecular markers,” Recent Patents on Nanotechnology, vol. 1, no. 1, pp. 41–49, 2007. [3] J. Fan and R. Li, “New estimation and model selection [23] J. M. Satagopan, B. S. Yandell, M. A. Newton, and T. C. procedures for semiparametric modeling in longitudinal data Osborn, “A Bayesian approach to detect quantitative trait loci analysis,” Journal of the American Statistical Association, vol. using Markov chain Monte Carlo,” Genetics, vol. 144, no. 2, pp. 99, no. 467, pp. 710–723, 2004. 805–816, 1996. [4] J. Fan, T. Huang, and R. Li, “Analysis of longitudinal data with [24] W. K. Zhang, Y. J. Wang, G. Z. Luo et al., “QTL mapping semiparametric estimation of covariance function,” Journal of of ten agronomic traits on the soybean (Glycine max L. the American Statistical Association, vol. 102, no. 478, pp. 632– Merr.) genetic map and their association with EST markers,” 641, 2007. Theoretical and Applied Genetics, vol. 108, no. 6, pp. 1131– [5] J. O. Ramsay and B. W. Silverman, Functional Data Analysis, 1139, 2004. Springer, New York, NY, USA, 2nd edition, 2005. [25] S. P. Brooks and A. Gelman, “General methods for monitoring [6] F.Yao,H.G.Mul ¨ ler, and J. L. Wang, “Functional data analysis convergence of iterative simulations,” Journal of Computa- for sparse longitudinal data,” Journal of the American Statistical tional and Graphical Statistics, vol. 7, no. 4, pp. 434–455, 1998. Association, vol. 100, no. 470, pp. 577–590, 2005. [26] J. Geweke, “Evaluating the accuracy of sampling-based ap- [7] F. Yao, H. G. Muller, and J. L. Wang, “Functional linear proaches to calculating posterior moments,” in Bayesian Sta- regression analysis for longitudinal data,” Annals of Statistics, tistics, vol. 4, Clarendon Press, Oxford, UK, 1992. vol. 33, no. 6, pp. 2873–2903, 2005. [27] P. Heidelberger andP.D.Welch,“Simulation runlengthcon- [8] M. Pourahmadi, “Joint mean-covariance models with appli- trol in the presence of an initial transient,” Operations Re- cations to longitudinal data: unconstrained parameterisation,” search, vol. 31, no. 6, pp. 1109–1144, 1983. Biometrika, vol. 86, no. 3, pp. 677–690, 1999. [28] R. Wu, C. X. Ma, M. Lin, Z. Wang, and G. Casella, “Functional [9] M. Pourahmadi, “Maximum likelihood estimation of gener- mapping of quantitative trait loci underlying growth trajecto- alised linear models for multivariate normal covariance ma- ries using a transform-both-sides logistic model,” Biometrics, trix,” Biometrika, vol. 87, no. 2, pp. 425–435, 2000. vol. 60, no. 3, pp. 729–738, 2004. [10] N. M. Laird and J. H. Ware, “Random-effects models for [29] D. Rizopoulos, “Dynamic predictions and prospective accu- longitudinal data,” Biometrics, vol. 38, no. 4, pp. 963–974, racy in joint models for longitudinal and time-to-event data,” Biometrics, vol. 67, no. 3, pp. 819–829, 2011. [11] M. Davidian and D. M. Giltinan, “Nonlinear models for repeated measurement data: an overview and update,” Journal of Agricultural, Biological, and Environmental Statistics, vol. 8, no. 4, pp. 387–419, 2003. [12] J. S. Yap, J. Fan, and R. Wu, “Nonparametric modeling of longitudinal covariance structure in functional mapping of quantitative trait loci,” Biometrics, vol. 65, no. 4, pp. 1068– 1077, 2009. [13] K. Das, J. Li, Z. Wang et al., “A dynamic model for genome- wide association studies,” Human Genetics, vol. 129, no. 6, pp. 629–639, 2011. [14] T. Liu and R. L. Wu, “A bayesian algorithm for functional map- ping of dynamic complex traits,” Algorithms, vol. 2, pp. 667– 691, 2009. [15] R. Henderson, P. Diggle, and A. Dobson, “Joint modelling of longitudinal measurements and event time data,” Biostatistics, vol. 1, pp. 465–480, 2000. [16] X. Guo and B. P. Carlin, “Separate and joint modeling of longitudinal and event time data using standard computer packages,” American Statistician, vol. 58, no. 1, pp. 16–24, 2004. International Journal of Peptides Advances in International Journal of BioMed Stem Cells Virolog y Research International International Genomics Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 Journal of Nucleic Acids International Journal of Zoology Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 Submit your manuscripts at http://www.hindawi.com The Scientific Journal of Signal Transduction World Journal Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 International Journal of Advances in Genetics Anatomy Biochemistry Research International Research International Microbiology Research International Bioinformatics Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 Enzyme Journal of International Journal of Molecular Biology Archaea Research Evolutionary Biology International Marine Biology Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png International Journal of Plant Genomics Hindawi Publishing Corporation

A Bayesian Framework for Functional Mapping through Joint Modeling of Longitudinal and Time-to-Event Data

Loading next page...
 
/lp/hindawi-publishing-corporation/a-bayesian-framework-for-functional-mapping-through-joint-modeling-of-V4EDLOibGL
Publisher
Hindawi Publishing Corporation
Copyright
Copyright © 2012 Kiranmoy Das et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ISSN
1687-5370
DOI
10.1155/2012/680634
Publisher site
See Article on Publisher Site

Abstract

Hindawi Publishing Corporation International Journal of Plant Genomics Volume 2012, Article ID 680634, 12 pages doi:10.1155/2012/680634 Research Article A Bayesian Framework for Functional Mapping through Joint Modeling of Longitudinal and Time-to-Event Data 1, 2, 3 2 4, 5 4, 5 2, 3 Kiranmoy Das, Runze Li, Zhongwen Huang, Junyi Gai, and Rongling Wu Department of Statistics, Temple University, Philadelphia, PA 19122, USA Department of Statistics, The Pennsylvania State University, University Park, PA 16802, USA Center for Statistical Genetics, The Pennsylvania State University, Hershey, PA 17033, USA Department of Agronomy, Henan Institute of Science and Technology, Xinxiang 453003, China National Center for Soybean Improvement, National Key Laboratory of Crop Genetics and Germplasm Enhancement, Soybean Research Institute, Nanjing Agricultural University, Nanjing 210095, China Correspondence should be addressed to Kiranmoy Das, kiranmoy.das@temple.edu Received 20 January 2012; Revised 13 March 2012; Accepted 19 March 2012 Academic Editor: Pierre Sourdille Copyright © 2012 Kiranmoy Das et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The most powerful and comprehensive approach of study in modern biology is to understand the whole process of development and all events of importance to development which occur in the process. As a consequence, joint modeling of developmental processes and events has become one of the most demanding tasks in statistical research. Here, we propose a joint modeling framework for functional mapping of specific quantitative trait loci (QTLs) which controls developmental processes and the timing of development and their causal correlation over time. The joint model contains two submodels, one for a developmental process, known as a longitudinal trait, and the other for a developmental event, known as the time to event, which are connected through a QTL mapping framework. A nonparametric approach is used to model the mean and covariance function of the longitudinal trait while the traditional Cox proportional hazard (PH) model is used to model the event time. The joint model is applied to map QTLs that control whole-plant vegetative biomass growth and time to first flower in soybeans. Results show that this model should be broadly useful for detecting genes controlling physiological and pathological processes and other events of interest in biomedicine. 1. Introduction of how a cell, tissue, or organ grows and develops across the time-space scale. To study biology, a classic approach is dimension reduction A statistical dynamic model, called functional mapping, in which a biological phenomenon or process is dissected is one of the products of such developments [1, 2]. The merit into several discrete features over time and space. Most ef- of functional mapping lies in its biological relevance to study forts in the past decades have been made to understand the tempo-spatial pattern of change for the trait and further biological details of individual features and then use knowl- predict the physiological or pathological status of trait edge from each feature to draw an inference about biology phenotype. Functional mapping has proven to be powerful as a whole. There has been increasing recognition of the for elucidating the dynamic genetic architecture of complex limitation of this approach because it fails to detect a rule that phenotypic traits by identifying when specific genes (known governs the transition from one feature to next, thus leading as quantitative trait loci or QTLs) involved turn on and to a significant loss of information behind the development turn off and how long they are expressed in a time course. of a biological trait. More recently, tremendous devel- opments in statistics and computer science have enabled With the advent of new automatic techniques that collect scientists to model and compute the dynamic behavior of a dynamic data in a cost-effective way, functional mapping biological phenomenon and construct a comprehensive view can be anticipated to play an increasingly important role 2 International Journal of Plant Genomics in shedding light on the genetic control mechanisms of prevent or accelerate the outcome by genetic approaches. complex traits or diseases. Our model is constructed with a Bayesian paradigm and The statistical foundation of functional mapping is lon- model parameters are estimated by the MCMC algorithm. gitudinal data analysis or functional data analysis. There has Local polynomials are used to model the mean trajectory and been a considerable body of literature on statistical modeling generalized-linear-model- (GLM-) based approach is used to of time-varying mean and covariance structure using various model the covariance matrix. The model is validated using a parametric, nonparametric, and semiparametric methods real example in which whole-plant biomass as a longitudinal [3–7]. A joint mean-covariance model was proposed by Pou- trait measured at a series of discrete time points and the time rahmadi [8, 9], which shows some advantages over modeling to first flower as a time-to-event are jointly modeled through the mean and covariance separately. Since the publication functional mapping. The statistical properties of the model of the pioneering work by Laird and Ware [10], random applied to estimate QTL temporal effects in this example and effects model have been extensively used for longitudinal its practical usefulness are investigated by simulation studies. data analysis [11]. All these statistical approaches have been incorporated into functional mapping [12, 13], aiming to 2. Joint Modeling Framework provide the most parsimonious estimates of QTL effects for a given data set. A Bayesian algorithm for functional mapping 2.1. Model for the Longitudinal Trait. Genetic mapping has been proposed recently by Liu and Wu [14]. should be based on a segregating population, such as the The complexity of biology lies in the fact that no bio- backcross, F , or recombinant inbred lines (RILS), initiated logical trait is isolated, rather every trait is affected by other with two inbred lines each carrying an alternative allele. traits through genes and environmental factors. For example, An RIL population is generated by self-crossing the hybrids when a plant grows into a particular stage, reproductive of the two inbred lines continuously for 7-8 successive behavior, such as flowering, starts to emerge as one of the generations, which leads to two homozygous genotypes for important events in plant development. The time to first alternative alleles at each locus. Methods for other designs flower is highly associated with the amount of vegetative can be derived similarly. Suppose a backcross has n progeny growth, depending on the environment where the plant is which is genotyped to construct a linkage map, aiming at grown. Likewise, the time to recurrence of prostate cancer in locating putative QTLs that trigger significant effects on a humans is related with dynamic changes of prostate specific longitudinal trait and its associated event. For each progeny, antigen level. How to jointly model longitudinal and time- the trait is measured repeatedly at T different time points and to-event data within functional mapping has become an a time-to-event is also recorded. At a specific time point t, important issue for studying the common genetic basis of the phenotypic value of the trait for progeny i affected by a these processes and predicting events based on longitudinal putative QTL can be expressed by a linear model as follows: traits. Simultaneous modeling of longitudinal traits and time y (t) = x u (t) + r (t) + e (t), to events has been an active area in biostatistics during the i ij j i i (1) j=1 past twenty years. A linear random effects model and EM estimation approach are proposed by Henderson et al. [15] where x is an indicator variable for a possible QTL genotype ij for joint modeling. Guo and Carlin [16] made a comparative of progeny i and defined as 1 if a particular QTL genotype study between separate models and a joint model, showing j is indicated and 0 otherwise (j = 1for QTLgenotype that a joint model is more powerful when there is a strong QQ and 2 for genotype qq), u (t) is the mean phenotypic correlation between the trait and the event. Wang and Taylor value of QTL genotype j for progeny i at time t, r (t) is the [17] developed a Bayesian method and MCMC algorithm subject specific random effect, and e (t) is the residual error for joint modeling of longitudinal and event time data assumed to follow a normal distribution with mean zero and and applied their algorithm on AIDS data. A review article covariance matrix Σ. by Tsiatis and Davidian [18] nicely summarizes the recent The central theme of functional mapping is to model developments for such joint modeling. the mean and covariance structures for the longitudinal trait By simply estimating the correlation between longitu- efficiently. Here, we model the mean vector by polynomial dinal traits and event time, Lin and Wu [19] developed a function and the covariance matrix by an approach that first model that connects these two aspects within func- guarantees the positive definiteness of the estimated covari- tional mapping. However, they developed a likelihood- ance matrix. Without loss of generality, assume the response based framework where the covariance structure for the vector for progeny i, y = (y (1),... , y (T)), has mean 0 and i i i longitudinal trait was modelled by the known AR(1), and covariance matrix Σ. The response at time t, y (t), can be model parameters were estimated using maximum likeli- predicted by its predecessors as the follows: hood estimation. Taking advantage of event models, such as semiparametric Cox proportional hazard model, Weibull t−1 model, accelerated failure time (AFT) model, we here pro- y (t) = φ y (t ) +  (t), (2) i t,t i i pose a sophisticated model for joint modeling of longitudinal t =1 traitand time to eventtolocatethe QTLs whichcontrol the event via a dynamic trait. The detection of those QTLs where φ is the corresponding regression coefficient,  (t) t,t i that are common to these types of traits may help to is the prediction error for progeny i with mean = 0, International Journal of Plant Genomics 3 and σ (t) is its variance. Assuming that  (t)’s are uncor- Assume the vectors of subject specific random effects related (Pourahmadi [8]), we get cov( ) = D, a diagonal θ follow m-variate normal distribution with mean 0 and i i 2 2 matrix with σ (t) being the tth diagonal element, where covariance matrix σ I and they are independent of the = ( (1),... , (T)) is the vector of prediction errors. residual errors. Note that under this assumption, y |θ will i i i i i (r) T (m) T Hence, the matrix representation of the above autoregression follow MVN(X β + X θ , Σ), and the marginal distribu- i i i becomes (r) (m) (m)T tion of y will be MVN(X β , Σ + σ X X ). i j i i = My,(3) i i 2.2. Model for the Event Time. We use s to denote the event where M is a lower triangular matrix with 1’s in diagonal ele- time of progeny i. Since in the current situation, the event ments and −φ  in the (t, t )th position. The above equation t,t time is recorded for all progeny; no progeny is censored. simply gives Assuming a Cox proportional hazard model for this event, T T we get for progeny i, cov( ) = M cov y M = MΣM = D, (4) i i λ (t) = λ (t) exp γμ (t) , (11) i 0 ij which is related to the modified Cholesky decomposition of Σ [20]. where λ (t) denotes the baseline hazard at time t, μ (t) 0 ij Equation (4) will be considered as the basis for modeling is the mean longitudinal trait at time t for given θ when the covariance structure, since this guarantees the estimated progeny i is of QTL genotype j and the regression coefficient covariance matrix to be positive definite. Following Pourah- γ represents the effect of the trait on the event time. The madi [8, 9], we model the mean vector, unconstrained var- survival function for progeny i can be expressed in terms of iance parameters log σ (t), and dependence parameter φ , t,t t the hazard function as s (t) = exp [− λ (u)du]. i i using a polynomial function of a particular order, expressed The longitudinal model described above is linked to the as hazard model by γ.If γ = 0, then the event is independent 2 r of the trait, and hence we should better fit separate models u (t) = β + β t + β t +··· + β t,(5) j j0 j1 j2 jr for the trait and the event. However, when γ is different 2 m r (t) = θ + θ t + θ t +··· + θ t,(6) from zero, a joint model performs better than the separate i i0 i1 i2 im models [17]. For simplicity, the baseline hazard is assumed 2 2 g log σ = η + η t + η t +··· + η t,(7) t 0 1 2 g to be a step function, λ (t) = λ over a partition of the 0 0k observed time scale [0, max(s )] into K (possibly evenly 2 h φ  = δ + δ (t − t ) + δ (t − t ) +··· + δ (t − t ) , λ λ λ λ t,t 0 1 2 h spaced) intervals, (t = 0, t , t ,... , t ). The value of K is 0 1 2 K (8) (t = 1, 2,... , t − 1). usually not too large, possibly smaller than 10. The optimal (r, m, g, h) is determined from the informa- 2.3. Likelihood for the Joint Model. Since the QTL genotype tion criteria (AIC/BIC). We note that different genotypes of a progeny is unknown, we use a mixture model to describe are assumed to have the same covariance structure but the likelihood of the progeny in terms of its possible under- different means. Note that the above method of modeling lying QTL genotypes [21]. The joint likelihood of unknown the covariance structure for a longitudinal response is more parameters Θ given the longitudinal trait y = (y ) and i=1 robust than the traditional first-order autoregressive (AR(1)) n event time s = (s ) for all n progeny can be expressed as i=1 or compound symmetry (CS) structure since real data might ⎛ ⎞ not show a parametric dependence structure. We refer to the n 2 ⎝ ⎠ L Θ | y, s = ω π y , s | Q = j, θ proposed approach as GLM-based approach to estimate the j|i i i i i i=1 j=1 covariance matrix. ⎛ ⎧ n 2 Denote β = (β , β ,... , β )for QTLgenotype j and ⎨ j0 j1 jr = ω f y | Q = j, θ θ = (θ , θ ,... , θ )for subject i. Then, the conditional j|i i i i i i0 i1 im i=1 j=1 mean function for progeny i carrying QTL genotype j (j = 1 ⎫ ⎞ or 2) for the given subject specific random effect (θ )can be s ⎬ i i × λ (s ) exp − λ (u)du , expressed as i i i (r) T (m) T (12) μ = X β + X θ , (9) i i i ij j where π(·) denotes the joint density of the longitudinal trait where and event time; f (.)denotes amultivariatenormalwithQTL ⎡ ⎤ 2 r 1 t t ··· t i1 genotype-specific mean μ modeled as (9)and covariance i1 i1 ij ⎢ 2 r ⎥ 1 t t ··· t ⎢ i2 ⎥ i2 i2 matrix Σ; hazard function λ (s ) is modeled as (11); and ω i i j|i ⎢ ⎥ . . . . ⎢ . ⎥ . . . . . denotes the conditional probability of QTL genotype j given ⎢ ⎥ . . . . (r) ⎢ ⎥ X = . (10) that the marker information of projeny i and Q is the QTL i ⎢ 2 r ⎥ i 1 t t ··· t iτ ⎢ iτ iτ ⎥ genotype for the i-th subject. ⎢ ⎥ . . . . ⎢ ⎥ . . . . . ⎣ . ⎦ . . . . The QTL genotype is inferred from marker genotypes of 2 r the linkage map. Let M = (M ,... , M ) be the m-marker 1 t t ··· t i i1 im iT iT iT 4 International Journal of Plant Genomics genotypes for progeny i, D the position of the putative QTL Assuming that priors for different genotypes are inde- measured by its distance from the very first marker of an pendent, we can express the above posterior distribution as ordered linkage group, and D the distances between marker k 2 π β, σ , γ, D, η, δ, λ | y, s 1and k. Assume that the QTL is located between marker k 2 ∗ ∝ π y, s | β, σ , γ, D , η, δ, λ and k +1. Then, the conditional probability of QTL genotype 0 ⎡ ⎤ (16) j given the genotype of these two markers that flank the QTL 2 ∗ ⎣ ⎦ is expressed as × π β π σ π γ π(D )π η π(δ)π(λ ). j=1 ω = Prob Q = j | D , M , M , D , D . (13) j|i i ik i(k+1) k k+1 The full conditional distributions for the model param- eters, as derived in the Appendix, are used to estimate the Note that, given the QTL locations D , D ,and D ,one k k+1 parameters using the MCMC algorithm. Note that the full can compute d , the distance of the QTL from marker k and 1 conditional distribution for β is expressed as a product of a d , the distance of the QTL from marker k +1 [22]. Using normal distribution term which comes from the longitudinal the Haldane map function, one can compute recombination trait and two other terms from the hazard model. To update fractions between marker k and QTL (r ), between QTL and β , therefore, we use a Metropolis-Hastings (MH) algorithm marker k +1 (r ), and between markers k and k +1 (r)as with a normal proposal density since it is a part of its poste- follows: rior distribution. We also note that in the full conditionals of η and δ, normal distribution coming from the longitudinal 1 1 −2d −2d 1 2 part of the data is the main determinant. Hence for η and r = 1 − e , r = 1 − e , 1 2 2 2 δ, we consider normal proposals with the current value of (14) −2d the parameter as the mean and covariance matrix as Σ and r = 1 − e , η Σ , respectively. Selection of a good proposal density for γ is a bit tricky and we follow the recommendation given by where d = D − D is the distance between marker k k+1 k Wang and Taylor [17]. By evaluating several choices for a and k + 1. Wu et al. [22] provide a procedure for deriving good proposal, we consider a normal distribution with mean the conditional probabilities of QTL genotypes given marker as the current state of the parameter and a suitable standard interval genotypes for the backcross, F ,and RILpopula- deviation in such a way that the proposed density gets well tions, respectively. mixed with the target distribution (acceptance rate between Unknown parameters Θ in likelihood (12) contain QTL 0.25 to 0.40). Because of conjugacy, we can directly simulate genotype-specific parameters β , σ , the parameters that from the full conditional of λ ’s. 0k model the variance structure and dependence structure η = ∗ The parameter D which specifies the location of the (η , η ,... , η )and δ = (δ , δ ,... , δ ), as shown in models 0 1 g 0 1 h QTL is updated following the idea of Satagopan et al. [23] (7)and (8), respectively, the effect of the longitudinal trait by using the MH algorithm. A new value of D ,which we on the event γ,QTL position D and the baseline hazards ∗new ∗ denote by D , is generated from Uniform (max(0, D − λ . 0k ψ), min(D + ψ, D )), where ψ is the tuning parameter. ∗ ∗new Denote this proposed distribution by q(D , D ). The 2.4. Posterior Distribution and Sampling Procedure. We proposed value will be considered as the new value of the derive a Bayesian approach for estimating the unknown pa- chain with probability rameters. This will first need to specify the prior distributions ∗ ∗new for the parameters and, given the data and the priors, derive α(D , D ) the posterior distribution over all the unknown parameters. ∗new ∗new ∗ π D | y, s, β, γ, η, δ, λ q(D , D ) For β ,weplace amultivariatenormalprior with zero = min 1, . ∗ ∗ ∗new π D | y, s, β, γ, η, δ, λ q(D , D ) mean and covariance matrix Σ . An inverse gamma prior (17) with parameters (α ,α ) is considered for σ . We consider 1 2 ∗new ∗new ∗new a uniform prior on γ and uniform (0, D ) prior for the We note that π(D | ...) ∝ π(Q | D , M)π(D ) = ∗ n ∗new ∗new ∗ parameter D . Independent Gamma (a, b) priors are taken π(Q | D , M )π(D ) and, similarly, π(D | i i i=1 ∗ ∗ for λ ,for k = 1,... , K. Priors for η and δ are taken as 0k ...) = π(Q | D , M )π(D ). i i i=1 MVN(0,Σ )and MVN(0,Σ ), respectively. η δ Because of the independence among n progeny, Q is With the above priors and likelihood function, we have updated by updating for each Q separately. For each progeny the joint posterior distribution for the parameters. In this i, the full conditional density is in the form of a multinomial case, it is quite straightforward to get the full conditional pos- with the following cell probabilities: terior distributions. Assume that the priors are independent π Q = j π y , s | Q = j i i i i for different parameters. Thus, we get the posterior density Π Q = j | y , s = . i i i 2 ∗ π Q = j π y , s | Q = j of β, σ , γ, D , η, δ, λ as i i i i j =1 (18) 2 ∗ 2 ∗ π β, σ , γ, D , η, δ, λ | y, s ∝ π y, s | β, σ , γ, D , η, δ, λ 0 0 We can sample the QTL genotype directly from this full 2 ∗ × π β π σ π γ π(D )π η π(δ)π(λ ). conditional density at each cycle. Details of the estimation procedure can be found in Satagopan et al. [23]. (15) International Journal of Plant Genomics 5 45 [17]. Uniform prior was taken for D .For η and δ,we considered multivariate normal priors with zero means and diagonal covariance matrices with diagonal elements 30 and 35 20, respectively. To investigate the effect of prior distributions on the estimation method, we did a sensitivity analysis. Considering different sets of priors, we fitted our model 25 many times and it turned out the estimation is almost insensitive to the choice of priors. Hence the choice of our priors (even with huge variances for β, η,and δ)doesnot affect much the estimation of the model parameters. We fitted our joint model as described in Section 2,by MCMC sampling. We ran chains 120,000 times. To remove the effect of the starting values, we excluded first 20,000 burn-in iterations. With the remaining 100,000 iterations, we estimated the posterior distributions and the parameters were estimated by the posterior mean and also calculated the Time sample standard deviations for the posterior densities. For Figure 1: Whole-plant biomass growth trajectories for 184 soybean the MH algorithm, the acceptance rates were 0.31, 0.26, 0.25, RILs. The time to first flower is indicated by a vertical line on 2 and 0.30 for σ , γ, η,and δ,respectively. each biomass growth curve. The black curve is the mean growth Since our model is complex, we perform several standard trajectory. diagnostic tests to assess the convergence of the Markov chains. First, we use the method proposed by Brooks and Gelman [25]. Considering five different chains with different starting points and discarding the burn-in iterations, we 3. Application computed multivariate potential scale reduction factors 3.1. Material and Analysis. The new model was applied to (MPSRF) to assess the convergence of the chains. Starting analyze a real data set for QTL mapping in soybeans. The points for the model parameters were drawn from the respective priors. The computed values of this statistic get mapping population contains 184 RILs derived from two cultivars, Nannong 1138-2 and Kefeng no. 1. A genetic stabilized near 1 after 60,000 iterations for our model linkage map of this population was first established by parameters, which indicates convergence of the chains. Zhang et al. [24] with 452 makers including RFLP, SSR, Second, we perform Geweke test which compares the EST distributed among the 21 linkage group. This map earlier part of the markov chain to the later part for assessing was recently updated by adding some new SSR makers and convergence. After deleting the burn-in iterations, from the dumping some unreliable markers. The new map contains remaining 100,000 iterations, we take out two subsequences; 834 molecular makers covering a length of 2,308 cM in 24 the first 50,000 and the last 50,000 iterations. Also consistent spectral density estimates at zero frequency are calculated to linkage groups, with an average genetic distance of 2.85 cM between adjacent markers. Those markers with missing compute the z-scores. The calculated Pvalues for our model information were excluded from the analysis, leading to a parameters are above 0.18, which indicates a small absolute z-scores assessing the convergence of our chains. A detailed total of 780 markers involved in our analysis. The plants and their parents were grown in a sample discussion of this method can be found in Geweke [26]. lattice design with two replicates at Jiangpu Soybean Exper- Finally, we perform the Heidelberger and Welch test as iment Station, Nanjing Agricultural University, China. After proposed by Heidelberger and Welch [27]. This test has two 20 days of seedling emergence, plant biomass (in gms.) were parts: a stationary test and a half-width test. Our chains after deletion of the burn-in iterations, pass the stationary test. measured once every 5–10 days until most plants stopped height growth. A total of 8 measurements were taken for the To assess whether the number of iterations is adequate to biomass and the time to get the first flower in that growing estimate the parameters accurately, we calculate relative half- width (RHW). We consider the default α value (0.05) and season was also recorded for each plant. Figure 1 shows the raw data, both the trait (biomass) and the event time for 184 predetermined tolerance value is taken as 0.1. For all our plants. model parameters, the calculated RHW is in between 0.045 and 0.081. This indicates that we have enough iterations to Prior distributions for the model parameters were taken as follows. For genotype specific fixed effect β ,amultivariate estimate the model parameters with 95% confidence under normal prior was used with zero mean and a diagonal tolerance of 0.1. covariance matrix with all diagonal elements 100. For σ , we took IG(3, 1) prior which has mean = 0.5withsmall 3.2. Results. Figure 1 is the growth trajectories of whole- variance (0.25). A uniform prior U(−3.0,−0.1) was taken plant biomass over time for all RILs, in which the times for γ following Wang and Taylor [17]. Observed time scale to first flower for each RIL are also indicated. There are for the event time data was partitioned into 5 parts; that great variability found for these two traits among RILs. is, we took K = 5 and considered independent gamma In the exploratory data analysis, we computed BIC values (0.04, 1.0) priors for λ ,... , λ , following Wang and Taylor for different orders for the triplet (r, m, g, h), showing that 01 05 Biomass 6 International Journal of Plant Genomics 0.15 Table 1: BIC values for selecting the optimum (r, m, g, h)tomodel 1 23 4 5 6 the mean-covariance structures for whole-plant biomass growth 0.1 trajectories and the time to first flower in soybeans. 0.05 (r, m, g, h) BIC 0.15 7 8 9 10 11 12 (3,3,3,2) 7.18 0.1 (3,2,3,2) 4.15 0.05 (3,3,2,2) 4.26 0.15 (3,2,2,2) 3.91 13 14 15 16 17 18 0.1 (2,2,2,2) 2.03 0.05 (2,3,2,2) 2.87 (2,2,3,2) 3.01 0.15 19 20 21 22 23 24 (2,3,3,2) 4.26 0.1 (3,3,3,3) 4.66 0.05 (3,2,3,3) 3.17 Genetic distance (cm) (3,3,2,3) 3.93 (3,2,2,3) 4.05 Figure 2: Marginal posterior plot for QTL locations over 24 linkage (2,2,2,3) 3.34 groups. Marker locations are indicated by ticks on the x-axis. (2,3,2,3) 5.16 (2,2,3,3) 4.88 (2,3,3,3) 3.16 rep rep rep Θ be the set of all model parameters and D = (y , s ) be the replicated data. Then given the data D = (y, s), the rep rep posterior predictive distribution of D is given by p(D | the BIC value is smallest for the order (2, 2, 2, 2), that rep D) = p(D | Θ)p(Θ | D)dΘ. is, second order polynomials can best fit the the mean, One can simulate from the posterior predictive distribu- variance, and dependence structures (Table 1). By scanning tion using the following two steps. First from the posterior for the existence of QTLs over the genetic linkage map, we distributions of the model parameters, simulate m values obtained the posterior distribution of the model parameters (vectors) of Θ.Nextfor each valueof Θ,simulateavalue and estimate marginal posterior distributions of the QTL rep ∗ (vector) D from the likelihood. The m values (vectors) of locations (D ) for all 24 linkage groups (Figure 2). rep D drawn in this way will essentially come from posterior We observed posterior peaks in linkage groups 1, 4, 15, rep predictive distribution p(D | D). 19, 20, 21, and 23. To draw inference about the existence of a We simulate 100 draws (m=100) from the posterior putative QTL in each of these groups, we computed the Bayes predictive distribution and then apply proposed joint anal- Factor (BF), defined as ysis and estimate the model parameters using MCMC as described earlier. We compute BF for each of those 7 linkage P(Y, S | κ = 0) BF = , (19) groups by considering the problem of testing the existence of P(Y, S | κ = 1) no QTL (null) versus the existence of one QTL (alternative). where κ denotes the number of QTLs in that particular Table 3 shows the average BF (with estimated SE) for each group. Following Jeffrey’s scale, the BFS with value smaller linkage group and the existence of QTL in groups 1, 20, and than 1 gives strong evidence against the null hypothesis and 23 is quite evident. higher than 10 gives enough evidence for the null hypothesis. Heritability (broad-sense) for the traits is estimated from Note that in this case, we are testing the existence of no the data, as the proportion of phenotypic variance attribu- QTL (null) versus the existence of one QTL (alternative) table to genetic variance. The estimated heritability in our for each linkage group. Here, no QTL in a group means case is 32.6%. Also we compute the percentage of variance β = β , for that group. So, in order to compute BF we explained by three identified QTLs. It turns out the QTLs 1 2 run our MCMC twice, first under the null and then under identified in linkage groups 1, 20, and 23 explain 6.8%, the alternative. The calculated BF for the above 7 linkage 14.3% and 11.4% of the total variance, respectively. groups were 0.2781, 11.274, 13.493, 10.610, 0.5913, 11.475, We show the marginal posterior plots with 95% credible and 0.7953, respectively, implying the existence of QTL in intervals for the parameter γ in Figure 3. Note that for all groups 1, 20, and 23. Table 2 provides genotype-specific three groups, the estimates and the confidence intervals are mean parameters for the QTLs located on linkage groups in the negative part which indicates a negative relationship 1, 20, and 23, along with their 95 percent credible intervals between the trait and the event time. Biologically this is (C.I.). sensible since the plants with higher body mass will take less Since the nature of our model is complex and our esti- time to have the first flower compared to the plants with mation is based on MCMC, we perform posterior predictive lower body masses. check for the aforementioned 7 linkage groups. We simulate Figure 4 illustrates genotypic differences in whole-plant observations to get the posterior predictive distribution. Let biomass trajectory and the time to first flower for the three Density Density Density Density International Journal of Plant Genomics 7 Table 2: Estimates of the parameters that describe genotype-specific biomass growth trajectories and QTL locations on linkage groups 1, 20, and 23, with 95% credible intervals. Group 1 Group 20 Group 23 Parameter estimate C.I. estimate C.I. estimate C.I. β −0.4762 (−0.5081, −0.4442) −1.1524 (−1.1641, −1.1405) 0.4190 (0.3897, 0.4484) β 3.3214 (3.3023, 3.3404) 2.6829 (2.6463, 2.7194) 0.5971 (0.5566, 0.6377) β 0.1548 (0.1363, 0.1731) 0.2295 (0.2020, 0.2570) 0.5438 (0.5176, 0.5701) β −5.4762 (−5.4788, −5.4734) −3.1004 (−3.0231, −2.9768) −2.3571 (−2.3701, −2.3442) β 8.4464 (8.4381, 8.4546) 5.3250 (5.3159, 5.3340) 4.4660 (4.4294, 4.5028) β −0.4702 (−0.5011, −0.4392) −0.0250 (−0.0618, 0.0118) 0.0410 (0.0390, 0.0432) Marker Interval Sat-356–B30T GNE097b–A199H LC4-4T–Sat-280 D 30.810 (29.1934, 31.7150) 49.600 (48.7515, 50.0726) 39.472 (38.8143, 40.1863) 1.2 0.8 0.8 0.6 0.6 0.4 0.4 0.2 0.2 −− 3 21− 0 −− 3 2.5 −2 −1.5 −1 −0.5 (a) (b) 2.5 1.5 0.5 2 1.5 1 0.5 −− − − (c) Figure 3: Marginal posterior plot for γ for linkage groups 1 (a), 20 (b), and 23 (c). Posterior density Posterior density Posterior density 8 International Journal of Plant Genomics Table 3: Posterior predictive check for 7 linkage groups. Linkage group BF (from actual data) BF with SE (from posterior predictive distribution) 1 0.2781 0.2659 (0.15) 4 11.274 12.086 (1.76) 15 13.493 12.962 (2.17) 19 10.610 11.138 (1.56) 20 0.5913 0.6281 (0.73) 21 11.475 12.183 (1.18) 23 0.7953 0.7682 (0.89) qq QQ 25 qq 25 QQ 10 10 T T T 2 1 2 T 1 2 3 4 5678 12345678 Time Time (a) (b) QQ qq T T Time (c) Figure 4: Whole-plant biomass growth trajectories and times to first flower (T and T )for twodifferent genotypes at each of the QTLs 1 2 detected on linkage groups 1 (a), 20 (b), and 23 (c). Genotypes QQ inherit two alleles from parent Nannong 1138-2, whereas genotype qq inherits two alleles from parent Kefeng no. 1. QTLs detected. At the QTL on linkage group 1, the allele (Q) alters its direction of genetic effect. Affected by the first two inherited from parent Nannong 1138-2 leads to increased QTLs, poor vegetative biomass growth in plants stimulates biomass growth and earlier flowering than the allele (q)from early flowering (Figures 4(a) and 4(b)). Yet, the QTL on parent Kefeng no. 1. The inverse pattern is observed for the linkage group 23 makes the fast-growing genotype to flower QTL on linkage group 20. The QTL on linkage group 23 earlier than the slow-growing genotype (Figure 4(c)). Biomass Biomass Biomass International Journal of Plant Genomics 9 Table 4: Simulation results for genotypic-mean parameters and QTL locations under different covariance structures, AR(1)-, CS- and GLM- based approach. AR(1) CS GLM-based approach Parameter Actual value Estimate MCSE Estimate MCSE Estimate MCSE β −1.1524 −1.1872 0.0871 −1.0982 0.1094 −1.1667 0.0361 β 2.6829 2.5391 0.0502 2.6103 0.0495 2.7001 0.0103 β 0.2295 0.2288 0.0805 0.2301 0.0302 0.2291 0.0113 β −3.1004 −3.1204 0.0307 −3.093 0.1025 −3.1255 0.1011 β 5.3250 5.3140 0.0291 5.2998 0.1130 5.3433 0.0405 β −0.0250 −0.0241 0.1302 −0.0257 0.1035 −0.0244 0.0603 D 43.00 39.32 2.3561 40.61 1.5694 42.48 1.1572 4. Simulation Study linkage groups 1, 20, and 23, we simulate data under the “null” model. As mentioned earlier, under the “null” model, We performed simulation studies to study the statistical β = β . For group 1, we consider the null model β = 1 2 1 properties of the joint model. We assumed an RIL design β = (−0.4762, 3, 3214, 0.1548), for group 20 it is β = 2 1 of 200 progeny and simulated 11 evenly spaced markers β = (−1.1524, 2.6829, 0.2295), and for group 23, our null on a linkage group of length 100 cM. A QTL is located at model is given as β = β = (0.4190, 0.5971, 0.5438). The 1 2 43 cM from the very first marker of the linkage group. To computed BFS for these three groups are 34.55, 46.79, and reflect a practical problem, we used parameter estimates 41.86, respectively, suggesting strong evidence for the null. of the soybean QTL detected in linkage group 20 as true values to simulate the data, allowing the covariance structure. 5. Discussion Time-dependent phenotypic values were assumed to follow a multivariate normal distribution and the event times were Tools to reveal the secret of life should reflect the dynamic taken the same as the soybean data. To make a comparison, nature of life. More recently, a series of statistical models we analyzed the simulated data using our nonparametric have been developed to map quantitative trait loci (QTLs) GLM-based covariance structure and the traditional AR(1) that control the dynamic process of a complex trait [1, 2, and CS covariance structures. 22, 28]. These so-called functional mapping models integrate The prior distributions for the model parameters were mathematical aspects of biological processes into a statistical taken in the same way as discussed in Section 3.1. A uniform framework derived to map complex trait QTLs and have prior on (0,100) was considered for D . For each situation, proved to be useful for detecting and identifying genes and we ran Markov chains 120,000 iterations and initial 20,000 genetic interactions involved in quantitative genetic variation burn-in iterations were discarded. Model parameters were for plant height, plant rooting ability, and animal body mass. estimated from the posterior distributions on the basis Functional mapping is also flexible to incorporate complex of remaining 100,000 iterations. The computed BF was biological phenomena, such as genotype-environment inter- 0.649 giving a strong evidence against the null hypothesis. actions and allometric scaling providing powerful means for Table 4 shows the means of the Bayesian estimates of model addressing biological questions of fundamental importance. parameters with their respective Monte Carlo standard In this paper, we develop a new version of functional errors, (MCSE). It can be seen that the estimates are quite mapping that can map QTLs for developmental events close to the actual values with a reasonably small standard affected by organismic growth trajectories in time. This errors which justifies the accuracy and precision of our version is benefited from recent statistical developments for estimation procedure. However, our GLM-based approach joint modeling of longitudinal traits and event time [16– provides better estimation of parameters than AR(1) and CS- 18, 29]. In the presence of strong correlation between a based approaches. longitudinal trait and event, a joint model performs better Figure 5 elucidates the marginal posterior plot for the than submodels separately for a single trait. In the joint QTL location under three different covariance structures. modeling framework for these two types of traits, we applied It is found that both AR(1)- and CS-based models provide a GLM-based approach to model the covariance structure the peaks at wrong locations, whereas GLM-based nonpara- and local polynomials for the mean curves. Bayesian esti- metric covariance structure locates QTL more accurately in mation method using the MCMC algorithm was used since which case the length of the credible interval is narrower than it is computationally much simpler than a likelihood-based those obtained from the former two structures. This provides approach. Simulation results show the effectiveness of GLM- numerical evidence that the proposed GLM-based model has based covariance model compared to traditional parametric better precision of QTL localization. compound symmetry or autoregressive structure. We perform further simulation studies to assess the Our joint model, embedded within functional mapping, reliability of BF in our data application. For each of the promotes the study of testing how QTLs pleiotropically 10 International Journal of Plant Genomics 0.3 0.3 0.25 0.25 0.2 0.2 0.15 0.15 0.1 0.1 0.05 0.05 39.3 40.6 34 36 38 40 42 44 46 36 38 40 42 44 46 48 QTL position QTL position (a) (b) 0.4 0.3 0.2 0.1 42.5 38 40 42 44 46 QTL position (c) Figure 5: The Bayesian estimate of QTL location (indicated by dash vertical lines) from simulation studies under different covariance structures, AR(1) (a), CS (b), and nonparametric (c). The true QTL location is at 43 cM from the very first marker of the linkage group. affect different biological processes and how one trait is longitudinal traits and events. Second, from a dynamic sys- predicted by other traits through genetic information. The tems perspective, we need to model dynamic correlations application of the new model to soybean mapping data among multiple longitudinal traits and multiple events. does not only validate its usefulness and utilization, but Third, with the availability of efficient genotyping tech- also gains new insight into the genetic and developmental niques, our model should accommodate a high-dimension regulation of trait correlations in plants. There is no doubt model selection scheme to identify significant genetic vari- that the new model can be modified to study the genetic ants from a flood of marker data. associations between HIV dynamics and the time to death as well as prostate specific antigen change and the time to recurrence of prostate. However, there is much room for Appendix modifying this model. First, to clearly describe our idea, Denote β = (β : j = 1, 2; j = j), S(t) = exp(− λ(u)du), we assume one QTL at a time for trait control. Epistatic −j j / 2 r 2 m interactions between multiple QTLs may play an important h(t) = (1, t, t ,... , t ), and g(t) = (1, t, t ,... , t ). Let m role in trait development as well as in correlations between be the number of progeny that carries QTL genotype j.We Posterior density Posterior density Posterior density International Journal of Plant Genomics 11 derive the full conditional distributions for the unknown Similarly, the full conditional distributions for the other parameters as parameters can be derived as follows: ⎡ ⎛ ⎞ ⎤ π β | y, s, σ , γ, D, η, δ, λ , β T T j −j ⎣ ⎝ ⎠ ⎦ π γ |· ∝ exp γ h (s )β + g (s )θ i i i i j=1 i=1 ∝ π β π y, s | β, σ , γ, D, η, δ, λ ⎡ ⎤ ∝ π β π s | y, β, σ , γ, D, η, δ, λ ⎣ ⎦ × S (s ) . i i 2 i=1 × π y | β, σ , γ, D, η, δ, λ ⎡ ⎤ ⎡ n mj −(1/2) ⎣ ⎦ 1 1 π η |· ∝ det (Σ) f T (r) T −1 ∝ exp − Σ β − y − X β j β j i j i=1 2 2 i=1 ⎡ ⎤ ⎡ ⎛ ⎞ ⎤ 2 j 1 1 (r) −1 × exp − η Σ η − y − X β (r) T T −1 i η i j ⎦ ⎣ ⎝ ⎠ ⎦ ×Σ y − X β × exp γ h (s )β i i i 2 2 j j j=1 i=1 i=1 ⎡ ⎤ m T (m) T (r) T (m) T −1 ×− X θ Σ y − X β − X θ . ⎣ ⎦ i i i i i ( ) × S s i i i=1 ⎡ ⎤ T −(1/2) 1 1 ⎣ ⎦ T (r) T −1 π(δ |·) ∝ det (Σ) = exp − β Σ β − y − X β j β j j 2 2 i=1 i=1 ⎤ ⎡ ⎛ ⎞ ⎤ 2 j 1 1 (r) T −1 (r) T T −1 × exp − δ Σ δ − y − X β ⎦ ⎣ ⎝ ⎠ ⎦ i δ i ( ) j ×Σ y − X β × exp γ h s β i i i i j j 2 2 j=1 i=1 i=1 ⎡ ⎤ (m) (r) (m) T −1 T T ×− X θ Σ y − X β − X θ . ⎣ ⎦ i i i i i i × S (s ) j i i i=1 ⎡ ⎤ mj −(1/2) (m) (m)T 2 2 T (r)T (r) T ⎣ ⎦ −1 −1 π σ |· ∝ det Σ + σ X X ∝ exp β Σ β + β X Σ X β i i j β j j i i j i=1 i=1 ⎤ ⎡ ⎛ ⎞ ⎤ m m α j j −α −1 2 × σ exp − (r) T T T −1 ⎦ ⎣ ⎝ ⎠ ⎦ −2 y Σ X β exp γ h (s )β σ i i i i j j i=1 i=1 2 j ⎡ ⎤ 1 (r) T j × exp − y − X β ⎣ ⎦ j=1 i=1 ( ) × S s i i i=1 −1 ⎡ ⎛ ⎞ ⎤ (m) (m)T (r) m m ⎦ j j × Σ + σ X X y − X β . i i i j (r)T (r) (r) −1 −1 T T −1 T ⎣ ⎝ ⎠ ⎦ =exp β Σ + X Σ X β −2 y Σ X β i i i i j β j j $ % i=1 i=1 −1 ⎡ ⎛ ⎞ ⎤ ⎡ ⎤ π(λ |·) ∝ Gamma d +1, φ π(λ ) 0k k 0k 0k m m j j & ' −1 ⎣ ⎝ ⎠ ⎦ ⎣ ⎦ × exp γ h (s )β S (s ) . 1 i i i i = Gamma a + d +1, φ + , k λ 0k i=1 i=1 (A.1) (A.4) Hence, we have λ λ where d is the number of events in the interval (t , t ]and k−1 k $ % π β | . :MVN β , M , V t $ % j β β  k T T ⎡ ⎛ ⎞ ⎤ ⎡ ⎤ φ = exp γ h(t)β + g(t)θ dt 0k j i m m j j λ (A.2) λ k−1 T i:s ≥t ⎣ ⎝ ⎠ ⎦ ⎣ ⎦ k ( ) ( ) × exp γ h s β S s , i i i i j (A.5) i $ % i=1 i=1 T T ( ) ( ) + exp γ h t β + g t θ dt. j i λ λ k−1 i:t <s ≤t where k−1 k ⎡ ⎤ ⎛ ⎞ ⎛ ⎞ −1 m m j j Acknowledgments ⎢ ⎥ (r)T (r) (r)T −1 −1 −1 ⎝ ⎠ ⎝ ⎠ M = ⎣ Σ + X Σ X X Σ y ⎦ , β i β i i i i=1 i=1 This work is partially supported by NSF/IOS-0923975, ⎛ ⎞ −1 Changjiang Scholars Award, “Thousand-Person Plan” (r)T (r) −1 −1 ⎝ ⎠ Award, the China National Key Basic Research Program V = Σ + X Σ X . β i i (2006CB1017, 2009CB1184, 2010CB125906), the China i=1 National Hightech R&D Program (2006AA100104), the (A.3) 12 International Journal of Plant Genomics Natural Science Foundation of China (30671266), and the [17] Y. Wang and J. M. G. Taylor, “Jointly modeling longitudinal and event time data with application to acquired immunode- China MOE 111 Project (B08025). R.Li was supported by an ficiency syndrome,” Journal of the American Statistical Associa- NIDA Grant P50-DA10075 and an NNSF of China Grant tion, vol. 96, no. 455, pp. 895–905, 2001. 11028103. The content is solely the responsibility of the [18] A. A. Tsiatis and M. Davidian, “Joint modeling of longitudinal authors and does not necessarily represent the official views and time-to-event data: an overview,” Statistica Sinica, vol. 14, of the NIDA, or the NIH. no. 3, pp. 809–834, 2004. [19] M. Lin and R. Wu, “A joint model for nonparametric functional mapping of longitudinal trajectory and time-to- References event,” BMC Bioinformatics, vol. 7, article 138, 2006. [1] C. X. Ma, G. Casella, and R. Wu, “Functional mapping of [20] H. J. Newton, IMESLAB: A Time Series Analysis Laboratory,T. quantitative trait loci underlying the character process: a Wadsworth and Brooks/Coles, Pacific Grove, Calif, USA, 1988. theoretical framework,” Genetics, vol. 161, no. 4, pp. 1751– [21] E. S. Lander and S. Botstein, “Mapping mendelian factors 1762, 2002. underlying quantitative traits using RFLP linkage maps,” [2] R. Wu and M. Lin, “Opinion: functional mapping—how to Genetics, vol. 121, no. 1, p. 185, 1989. map and study the genetic architecture of dynamic complex [22] R. Wu, W. Hou, Y. Cui et al., “Modeling the genetic architec- traits,” Nature Reviews Genetics, vol. 7, no. 3, pp. 229–237, ture of complex traits with molecular markers,” Recent Patents on Nanotechnology, vol. 1, no. 1, pp. 41–49, 2007. [3] J. Fan and R. Li, “New estimation and model selection [23] J. M. Satagopan, B. S. Yandell, M. A. Newton, and T. C. procedures for semiparametric modeling in longitudinal data Osborn, “A Bayesian approach to detect quantitative trait loci analysis,” Journal of the American Statistical Association, vol. using Markov chain Monte Carlo,” Genetics, vol. 144, no. 2, pp. 99, no. 467, pp. 710–723, 2004. 805–816, 1996. [4] J. Fan, T. Huang, and R. Li, “Analysis of longitudinal data with [24] W. K. Zhang, Y. J. Wang, G. Z. Luo et al., “QTL mapping semiparametric estimation of covariance function,” Journal of of ten agronomic traits on the soybean (Glycine max L. the American Statistical Association, vol. 102, no. 478, pp. 632– Merr.) genetic map and their association with EST markers,” 641, 2007. Theoretical and Applied Genetics, vol. 108, no. 6, pp. 1131– [5] J. O. Ramsay and B. W. Silverman, Functional Data Analysis, 1139, 2004. Springer, New York, NY, USA, 2nd edition, 2005. [25] S. P. Brooks and A. Gelman, “General methods for monitoring [6] F.Yao,H.G.Mul ¨ ler, and J. L. Wang, “Functional data analysis convergence of iterative simulations,” Journal of Computa- for sparse longitudinal data,” Journal of the American Statistical tional and Graphical Statistics, vol. 7, no. 4, pp. 434–455, 1998. Association, vol. 100, no. 470, pp. 577–590, 2005. [26] J. Geweke, “Evaluating the accuracy of sampling-based ap- [7] F. Yao, H. G. Muller, and J. L. Wang, “Functional linear proaches to calculating posterior moments,” in Bayesian Sta- regression analysis for longitudinal data,” Annals of Statistics, tistics, vol. 4, Clarendon Press, Oxford, UK, 1992. vol. 33, no. 6, pp. 2873–2903, 2005. [27] P. Heidelberger andP.D.Welch,“Simulation runlengthcon- [8] M. Pourahmadi, “Joint mean-covariance models with appli- trol in the presence of an initial transient,” Operations Re- cations to longitudinal data: unconstrained parameterisation,” search, vol. 31, no. 6, pp. 1109–1144, 1983. Biometrika, vol. 86, no. 3, pp. 677–690, 1999. [28] R. Wu, C. X. Ma, M. Lin, Z. Wang, and G. Casella, “Functional [9] M. Pourahmadi, “Maximum likelihood estimation of gener- mapping of quantitative trait loci underlying growth trajecto- alised linear models for multivariate normal covariance ma- ries using a transform-both-sides logistic model,” Biometrics, trix,” Biometrika, vol. 87, no. 2, pp. 425–435, 2000. vol. 60, no. 3, pp. 729–738, 2004. [10] N. M. Laird and J. H. Ware, “Random-effects models for [29] D. Rizopoulos, “Dynamic predictions and prospective accu- longitudinal data,” Biometrics, vol. 38, no. 4, pp. 963–974, racy in joint models for longitudinal and time-to-event data,” Biometrics, vol. 67, no. 3, pp. 819–829, 2011. [11] M. Davidian and D. M. Giltinan, “Nonlinear models for repeated measurement data: an overview and update,” Journal of Agricultural, Biological, and Environmental Statistics, vol. 8, no. 4, pp. 387–419, 2003. [12] J. S. Yap, J. Fan, and R. Wu, “Nonparametric modeling of longitudinal covariance structure in functional mapping of quantitative trait loci,” Biometrics, vol. 65, no. 4, pp. 1068– 1077, 2009. [13] K. Das, J. Li, Z. Wang et al., “A dynamic model for genome- wide association studies,” Human Genetics, vol. 129, no. 6, pp. 629–639, 2011. [14] T. Liu and R. L. Wu, “A bayesian algorithm for functional map- ping of dynamic complex traits,” Algorithms, vol. 2, pp. 667– 691, 2009. [15] R. Henderson, P. Diggle, and A. Dobson, “Joint modelling of longitudinal measurements and event time data,” Biostatistics, vol. 1, pp. 465–480, 2000. [16] X. Guo and B. P. Carlin, “Separate and joint modeling of longitudinal and event time data using standard computer packages,” American Statistician, vol. 58, no. 1, pp. 16–24, 2004. International Journal of Peptides Advances in International Journal of BioMed Stem Cells Virolog y Research International International Genomics Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 Journal of Nucleic Acids International Journal of Zoology Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 Submit your manuscripts at http://www.hindawi.com The Scientific Journal of Signal Transduction World Journal Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 International Journal of Advances in Genetics Anatomy Biochemistry Research International Research International Microbiology Research International Bioinformatics Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 Enzyme Journal of International Journal of Molecular Biology Archaea Research Evolutionary Biology International Marine Biology Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation Hindawi Publishing Corporation http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014 http://www.hindawi.com Volume 2014

Journal

International Journal of Plant GenomicsHindawi Publishing Corporation

Published: May 22, 2012

References