Statistical Analysis of Efficient Unbalanced Factorial Designs for Two-Color Microarray Experiments

International Journal of Plant Genomics, Volume 2008 (2008), Article ID 584360, 16 pages. doi:10.1155/2008/584360

Review Article

Robert J. Tempelman
Department of Animal Science, College of Agriculture and Natural Resources, Michigan State University, East Lansing, MI 48824-1225, USA

Received 2 November 2007; Revised 22 January 2008; Accepted 25 April 2008

Academic Editor: Chunguang Du

Copyright © 2008 Robert J. Tempelman. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Experimental designs that efficiently embed a fixed effects treatment structure within a random effects design structure typically require a mixed-model approach to data analysis. Although mixed-model software tailored for the analysis of two-color microarray data is increasingly available, much of this software is not capable of correctly analyzing the elaborate incomplete block designs that are increasingly being proposed and used for factorial treatment structures. That is, optimized designs are generally unbalanced with respect to various treatment comparisons, and different specifications of experimental variability are often required for different treatment factors. This paper uses a publicly available microarray dataset, based upon an efficient experimental design, to demonstrate a proper mixed-model analysis of a typical unbalanced factorial design characterized by incomplete blocks and hierarchical levels of variability.

1. Introduction

The choice and optimization of experimental designs for two-color microarrays have been receiving increasing attention [1-13]. Interest has been particularly directed towards optimizing experiments that involve a factorial design construction [7, 9, 14] in order to study the joint effects of several factors such as, for example, genotypes, pathogens, and herbicides. It is well known by plant scientists that factorial designs are more efficient than one-factor-at-a-time studies and allow the investigation of potentially interesting interactions between two or more factors. For example, investigators may study how herbicide effects (i.e., mean differences) depend upon plant genotypes or times after application. Two-color systems such as spotted cDNA or long oligonucleotide microarrays involve hybridization of two different mRNA samples to the same microarray, each of the two samples being labeled with a different dye (e.g., Cy3 or Cy5; Alexa555 or Alexa647).
These microarrays, also simply referred to as arrays or slides, typically contain thousands of probes with a few (≤4) spots per probe, and most often just one spot per probe. Each probe specifically hybridizes to a matching mRNA transcript of interest within each sample. After hybridization, microarray images are scanned at two different wavelengths as appropriate for each dye, thereby providing two different fluorescence intensity measurements for each probe. Upon further preprocessing or normalization [15], these dye-specific intensities for each probe are believed to reflect the relative mRNA abundance for the corresponding transcript within the respectively labeled samples. The normalized intensities, or the Cy3/Cy5 ratio thereof, for each spot are typically logarithmically transformed to render data that are approximately normally distributed. An increasingly unifying and indisputable message is that the heavily used common reference design is statistically inefficient [1, 9, 10, 12, 13]. Here, the same common reference sample or pool is reused as one of the two samples on every microarray, the other sample deriving from a treatment group of interest. Hence, inferences on differential expression are based only on indirect connections across arrays, as samples from different treatments of interest are never directly connected or hybridized together on the same microarray. In contrast, most of the alternatively proposed efficient designs are incomplete block designs, the most popular being various adaptations of the loop design as first proposed for microarrays by Kerr and Churchill [16]. In these designs, direct connections or hybridizations are typically reserved for the most important treatment comparisons, with inference on the remaining comparisons generally being at least as efficient as that based on the common reference design. The intent of this review is to reemphasize the use of mixed models as the foundation for statistical analysis of efficient factorial designs for microarrays. Mixed model analysis for microarray data was first proposed by Wolfinger et al. [17]. However, this and other previous expositions on the use of mixed model analysis for microarray data have been primarily directed towards the analysis of completely balanced designs [18, 19], whereas many recently proposed designs for microarray studies are unbalanced, having, for example, different standard errors for the various pairwise comparisons between treatment groups [10, 13]. We will review various aspects of mixed model analysis for unbalanced designs, including a demonstration on publicly available data from a recent plant genomics study [20].

2. The Connection between Mixed Models and Efficient Designs

Efficient experimental designs are typically constructed such that their factors can be broadly partitioned into two categories: treatment structure factors and design structure factors [21]. The treatment structure naturally includes the factors of greatest interest, for example, herbicides, genotypes, tissues, and so forth, whose effects are deemed to be fixed. In other words, the levels of these fixed effects factors are specifically chosen by the investigator such that mean comparisons between such levels, for example, different treatments, are of primary interest. These factors also include any factor whose levels are consistently reused across different experiments, such as dye labels (e.g., Cy3 versus Cy5) for two-color microarrays.
On the other hand, the design structure primarily includes random effects factors, whereby the levels of each such factor are considered to be randomly chosen from a conceptually infinite set of such levels [22]. For example, the specific arrays used for a microarray study are considered to be a random sample from a large, perhaps hypothetically infinite, population of arrays; similar claims would be made regarding biological replicates, for example, plants, pools thereof, or even field plots, depending upon the experimental design [14]. Within each random effects factor, the effects are typically specified to be normally, independently, and identically distributed (NIID), with variability in effects formally quantified by a variance component (VC). These design structure or random effects factors are typically further partitioned into two subcategories: blocking factors and experimental error factors. In two-color microarray experiments, arrays typically serve as blocking factors since treatments can be directly compared within arrays, although this is not true for the common reference design as previously noted. Blocking represents a longstanding and efficient experimental design strategy for improving the precision of inference on treatment comparisons. Experimental error factors, such as plants or pooled samples thereof within treatments, often need to be included as random effects in order to properly specify true experimental replication at the biological level rather than merely at the measurement or technical level. Such specifications are particularly required when multiple aliquots are derived from the same biological specimen for use on multiple arrays [20, 23] or when probes for each gene transcript are spotted more than once on each array. Of course, plants may also alternatively serve as blocking factors in some designs if different tissues are compared within plants. Currently, there is much software available for microarray data analysis, some of which is suited only for studies having a treatment structure but no pure design structure. Common examples include the analysis of data generated from single channel systems (e.g., Affymetrix) or of log ratios generated from common reference designs. When no random effects are specified, other than the residuals, the corresponding statistical models are then simply fixed-effects models. Ordinary least squares (OLS) inference is then typically used to infer upon the treatment effects in these studies. OLS is appropriate if the assumption is valid that there is only one composite residual source of variability such that the residuals unique to each observation are NIID. Conversely, statistical analysis of efficient two-color experiments having a fully integrated treatment and design structure needs to account for both fixed and random effects, as is typical of a mixed effects model, more often simply referred to as a mixed model. Generalized least squares (GLS) analysis, also referred to as mixed-model analysis, has been recognized as optimal in terms of minimizing the variance of estimates for inference on treatment comparisons. This is true not only for efficient microarray designs [10, 17, 19, 24] but also for general plant science and agronomy research [25-27], including recent applications in plant genomics research [20, 23, 28]. Some of the more recently popular microarray data analysis software has some mixed model analysis capabilities [29, 30].
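To make the distinction between these two modeling approaches concrete, the following minimal SAS sketch contrasts a fixed-effects-only specification, for which PROC MIXED reduces to OLS, with a mixed-model specification in which the design structure factors are declared as random. The dataset and variable names here (mydata, trt, dye, array_id, rep) are purely hypothetical placeholders, not taken from any particular study.

  /* Hypothetical example: 'mydata' holds one gene's normalized log intensities (ly)
     with treatment structure (trt, dye) and design structure (array_id, rep) factors. */
  proc mixed data=mydata;
    class trt dye array_id rep;
    model ly = trt dye;            /* fixed-effects-only model: equivalent to OLS,
                                      single residual source of variability          */
  run;

  proc mixed data=mydata;
    class trt dye array_id rep;
    model ly = trt dye;            /* treatment structure: fixed effects             */
    random array_id rep(trt);      /* design structure: arrays (blocking) and
                                      biological replicates within treatment random  */
  run;

The only difference between the two calls is the random statement; with it, inference on the fixed treatment effects accounts for the blocking and experimental error structure rather than lumping all sources of variability into a single residual.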
Recall that some designs may be characterized by different levels of variability, thereby requiring particular care in order to properly separate biological from technical replication, for example. Hence, it is imperative for the data analyst to know how to correctly construct the hypothesis test statistics, including the determination or, in some cases, the estimation of the appropriate degrees of freedom. Although some of these issues have been discussed for balanced designs by Rosa et al. [19], they have not generally been carefully addressed for the analysis of microarray data generated from unbalanced designs. Optimally constructed experimental designs are often unbalanced with respect to inference on all pairwise treatment comparisons, such that even greater care for statistical inference is required than in completely balanced designs. For example, Wit et al. [13] proposed a method for optimizing two-color microarray designs to compare any number of treatment groups. Suppose that 9 different treatment groups are to be compared. Using the methods and software developed by Wit et al. [13], the recommended interwoven loop design that is optimized for A-optimality (lowest average squared standard error for a particular arrangement of treatment comparisons) is provided in Figure 1. Although Figure 1 appears to be visually symmetric with respect to the treatment labels, including that all treatment groups are dye balanced, not all treatment groups are directly hybridized against each other. Hence, inferences on all pairwise comparisons between treatment groups will not be equally precise. For example, the standard errors for the inference on treatments R2 versus R8 or R8 versus R24 will not be the same as those for treatments R8 versus S24 or R8 versus M2, due to the differences in the number and/or degree of direct and indirect connections for these two sets of comparisons in Figure 1.

Figure 1: Optimized interwoven loop design for 9 treatments using the R package SMIDA (Wit et al., 2005). Each circle represents a different treatment group. Each arrow represents a single array hybridization, with the circle at the base of the arrow representing the Cy3 labeled sample and the tail end representing the Cy5 labeled sample.

Even for some balanced factorial designs, where the standard errors for comparing mean differences for levels of a certain factor are the same for all pairwise comparisons, the experimental error structure can vary substantially for different factors. That is, substantial care is required in deriving the correct test statistics, particularly with split plot arrangements [14]. Of course, even when a completely balanced design is intended, data editing procedures that delete poor quality spots for certain genes would naturally result in unbalanced designs.

3. Case Study

3.1. Design

Zou et al. [20] present an experiment where three different inoculate treatments were applied to soybean (Glycine max) plants 14 days after planting. The three different inoculates included a bacterial inoculation carrying the avirulence gene avrB, thereby conferring resistance (R), a bacterial inoculation without avrB, thereby conferring susceptibility (S), and a control group whereby the inoculate simply contained an MgCl2 solution (M). Unifoliate leaves from three to four plants were drawn and pooled for each treatment at each of three different times postinoculation: 2, 8, and 24 hours. Hence, the treatment structure comprised a factorial, that is, 3 inoculates × 3 times, for a total of 9 groups.
A 10th group involving a fourth null inoculate with leaves harvested at 2 hours postinoculation, N2, was additionally studied by Zou et al. [20]. The complete dataset of gene expression data for all 27,684 genes represented on the set of three microarray platforms used by Zou et al. [20] is available as accession number GSE2961 from the NCBI Gene Expression Omnibus (GEO) repository (http://www.ncbi.nlm.nih.gov/geo/). The vast majority of the corresponding probes were spotted only once per array or slide for each platform. A graphical depiction of the 13 hybridizations that superimposes the design structure upon one replicate of the factorial treatment structure, plus the additional 14th hybridization involving the 10th group N2, is illustrated in Figure 2. Note that at least two aliquots per pooled sample are used, each aliquot being labeled with a different dye such that each replicate pool is used in at least two different hybridizations or arrays with opposite dye assignments. In other words, this design is characterized by technical replication such that it is imperative to explicitly model samples within inoculate by time combination as the biological replicates, that is, as a set of random effects for modeling experimental error. Failing to do so would confuse pseudoreplication with true replication in the statistical analysis, as each of the two or more aliquots per pool would then be incorrectly counted as two or more different experimental replicates. The design in Figure 2 was replicated twice by Zou et al. [20], the second replication being of the exact same dye assignment and hybridization orientation as the first, for a total of 28 hybridizations. Hence, there were 20 samples (pools of leaves) utilized in the experiment, 2 per each of the 9 inoculate by time treatment groups plus 2 samples for the N2 control.

Figure 2: Experimental design for one replicate from Zou et al. (2005). Treatments included a full factorial of inoculate and time effects plus a 10th null control group at time 2 (N2). Samples are indicated by circles, with letters indicating inoculate assignment: bacteria resistant (R), bacteria susceptible (S), and MgCl2 (M) control inoculate, and numbers indicating time (2, 8, or 24 hours) after inoculation. Each arrow represents a single array hybridization, with the circle at the base of the arrow representing the Cy3 labeled sample and the tail end representing the Cy5 labeled sample. Solid arrows refer to the A-loop design of Landgrebe et al. (2006).

We arbitrarily consider gene expression measurements for just one particular gene based on the GEO submission from Zou et al. [20]: ID_REF #30, located in the metarow-metacolumn-row-column location 1-1-2-14 of each array from microarray platform GPL1013, one of three different platforms used by Zou et al. [20] and further described in GEO. The statistical analysis of any of the remaining 27,683 genes that were spotted once on each slide across the three different platforms would be exactly the same as that for ID_REF #30, at least for those genes where no observations would be edited out for poor data quality. We use the normalized Cy3 and Cy5 data, provided as fields S532N and S635N in accession number GSE2961 for ID_REF #30 from GEO. Hence, for the 28 hybridizations considered for the two replications of Figure 2, there were 56 fluorescence intensities (28 Cy3 and 28 Cy5) for each gene. The 56 fluorescence intensities for ID_REF #30, as retrieved from GSE2961 in GEO, are reproduced in Table 4.
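For concreteness, a SAS data step along the following lines could be used to assemble the analysis file of intensities (y) and log (base 2) intensities (ly), as in Table 4, from the normalized channel fields. Apart from the S532N and S635N field names mentioned above, the input dataset and factor variable names here are hypothetical and would need to match however the GEO records were actually downloaded and organized.

  /* Hypothetical sketch: create one record per channel (two per hybridization) for a
     single gene, carrying the design factors needed for the mixed model that follows. */
  data aloop;
    set gse2961_idref30;                 /* assumed: one row per array for ID_REF #30    */
    array_id = _n_;                      /* array (slide) identifier                     */
    dye = 'Cy3'; y = S532N; ly = log2(y);
    inoculate = cy3_inoculate; time = cy3_time; rep = cy3_rep;  /* assumed factor fields */
    output;
    dye = 'Cy5'; y = S635N; ly = log2(y);
    inoculate = cy5_inoculate; time = cy5_time; rep = cy5_rep;
    output;
  run;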
3.2. Statistical Model

For the purposes of this review, we concentrate our attention just on the subdesign characterized by the solid arrows in Figure 2 that connect the three primary inoculates (R, S, and M) together within each of the 3 different times (2, 8, and 24 hours). The remaining dashed lines in Figure 2 involve either the 10th group (N2) or connect adjacent times (2 with 8 and 8 with 24) within each of two inoculates (R and S); note that no hybridizations connecting any of the three times within inoculate M were provided with GSE2961 on GEO. Labeling inoculate type as Factor A and time after inoculation as Factor B, the resulting subdesign is an example of the "A-loop" design presented by Landgrebe et al. [9] as illustrated in their Figure 2(B), albeit for a factorial treatment structure in their case. In other words, the only direct connections between the 9 treatment groups within arrays involve comparisons of levels of Factor A within levels of Factor B. Using the log intensities as the response variables for further statistical analysis, an appropriate linear mixed model to specify for this A-loop design would be as follows:

$y_{ijklm} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \delta_k + r_{l(ij)} + a_{m(j)} + e_{ijklm}$,  (1)

where $y_{ijklm}$ is the log fluorescence intensity pertaining to the $l$th biological replicate assigned to the $i$th inoculate and $j$th time, labeled with the $k$th dye ($k$ = 1 or 2), and hybridized to array $m$ within the $j$th time. Here, $\mu$ is the overall mean, $\alpha_i$ is the effect of the $i$th inoculate, $\beta_j$ is the effect of the $j$th time, $(\alpha\beta)_{ij}$ is the interaction effect between the $i$th inoculate and $j$th time, and $\delta_k$ is the effect of the $k$th dye, all of which are defined to be fixed effects. The design structure component of (1) is defined by the random effects $r_{l(ij)}$ for the $l$th pool or biological replicate ($l$ = 1, 2) within the $ij$th inoculate-time combination, $a_{m(j)}$ for the $m$th array or slide within the $j$th time, and the residual $e_{ijklm}$ unique to the same subscript identifiers as those for $y_{ijklm}$. The typical distributional assumptions in mixed models are such that each of the three sets of random effects is NIID with its own VC; that is, $r_{l(ij)} \sim NIID(0, \sigma^2_{rep})$, $a_{m(j)} \sim NIID(0, \sigma^2_{array})$, and $e_{ijklm} \sim NIID(0, \sigma^2_{e})$. As clearly demonstrated by Dobbin et al. [31] and based on our experiences, dye effects should be modeled in (1), even after using global normalization procedures such as loess [15], as gene-specific dye effects are common. Nevertheless, one would not normally anticipate interaction effects between dye and other treatment factors (e.g., inoculate or time), and hence these effects are not specified in (1). It should be somewhat apparent from the A-loop design of Figure 2 why the nesting or hierarchical specifications are specified as such for the random effects. For example, although each pool or replicate is labeled twice, once with each dye, each pool is still part of or nested within the same inoculate by time combination such that samples or replicates are specified to be nested within inoculate by time. Similarly, arrays are nested within times since each array is associated with only one particular level of time; that is, different times are never directly compared or connected within arrays. Hence, one should intuitively recognize from Figure 2 that there would be greater precision for inferring upon inoculate effects than for time effects using the A-loop design. That is, the variability due to arrays is completely confounded with time differences such that it partly defines the experimental unit or replicate for time.
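A minimal PROC MIXED specification corresponding to model (1) might look like the following sketch; it is written to be consistent with the model just described (and with the hypothetical variable names used earlier), not as a reproduction of the actual code in Figure 3.

  proc mixed data=aloop;
    class inoculate time dye rep array_id;
    /* fixed treatment structure: inoculate, time, their interaction, and dye      */
    model ly = inoculate time inoculate*time dye;
    /* random design structure: biological replicates (pools) nested within the
       inoculate by time combinations, and arrays nested within times              */
    random rep(inoculate*time) array_id(time);
  run;

Declaring rep(inoculate*time) and array_id(time) as random is what allows the analysis to distinguish biological from technical replication and to recover inter-block (between-array) information for the time comparisons.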
3.3. Classical ANOVA

The complex nature of the different levels of replication in this A-loop design is further confirmed in the classical analysis of variance or ANOVA [21] for this design in Table 1. However, as demonstrated later, classical ANOVA is not necessarily equivalent to a more optimal GLS or mixed model analysis [32]; in fact, estimates of treatment effects based on classical ANOVA are simply equivalent to OLS estimates where all factors are treated as fixed. Nevertheless, the classical ANOVA table, when extended to include expected mean squares (EMS), is instructive in terms of identifying different levels of replication and hence experimental error.

Table 1: Classical ANOVA of log intensities for the duplicated A-loop design component of Figure 2 for any particular gene using (1).

Classical ANOVA is based on equating sums of squares (SS), also called quadratic forms, to their expectations; typically this involves equating mean squares (MS), being SS divided by their degrees of freedom (ν), to their EMS. For completely balanced designs, there is generally one universal manner in which these quadratic forms, and hence the ANOVA table, are constructed [19, 22]. However, for unbalanced designs, such as all of or even just the A-loop component of Figure 2, there are a number of ways of constructing different quadratic forms and hence different ways of constructing ANOVA tables for the same set of data [21, 32]. The most common ANOVA strategy is based on the use of type III quadratic forms, as in Table 1, whereby the SS for each factor is adjusted for every other factor in the model. More details on type III and alternative ANOVA quadratic forms for unbalanced data can be found in Milliken and Johnson [21] and Searle [33]. Table 1 conceptually illustrates the basic components of an ANOVA table; again, for every term, say X, in a statistical model like (1), there is a sum of squares SS(X), degrees of freedom ν(X), mean square MS(X) = SS(X)/ν(X), and expected mean square EMS(X). Generally, ANOVA tests on fixed effects are of greatest interest; for example, inoculate, time, and the inoculate by time interaction. The correct F ratio test statistic for any fixed effects term in the ANOVA table is constructed such that its MS and a denominator MS have the same EMS if the null hypothesis is true, that is, if there are truly no effects for that particular term. In statistical parlance, no effects for a term, whether that pertains to the main effects of a factor or the interaction effects between two or more factors, is synonymous with its corresponding noncentrality parameter being equal to zero; that is, there is no signal due to that model term [32]. Consider, for example, the test for the main effects of inoculate, denoted as Factor A in Table 1. If the main effects of inoculate are nonexistent, that is, there are no overall or marginal mean differences between any of the inoculates, then this noncentrality parameter is zero. It should be clearly noted that, under this null hypothesis, the EMS for inoculate matches the EMS for replicate within inoculate by time, denoted as rep(inoculate*time) in Table 1. In other words, rep(inoculate*time) is said to be the denominator or error term for the main effects of inoculate such that rep(inoculate*time) defines the experimental unit or the biological replicate for inoculate effects. Hence, the correct F statistic for testing inoculate effects, as demonstrated from Table 1, is based on the degrees of freedom for inoculate in the numerator and for rep(inoculate*time) in the denominator.
It should also be observed that this same error term or experimental unit would be specified as the denominator MS term for the ANOVA F-test on the inoculate by time interaction effects, denoted as inoculate*time in Table 1. That is, when the corresponding noncentrality parameter is zero, both inoculate*time and rep(inoculate*time) share the same EMS, such that the correct F statistic for testing this interaction is based on the degrees of freedom for inoculate*time in the numerator and for rep(inoculate*time) in the denominator. It was previously noted from the A-loop design of Figure 2 that inference on the main effects of time (Factor B) should be less precise than that for the main effects of inoculate. In other words, the size of the experimental unit should be larger for time effects since arrays are nested within levels of time whereas levels of inoculate treatments are directly compared within arrays. This is further demonstrated in Table 1 by the EMS for time, which additionally involves the array(time) variance component, being larger than that for inoculate effects under the corresponding true null hypotheses of no main effects for either factor. In fact, the experimental error term for time is a composite of both rep(inoculate*time) and array(time), such that marginal mean comparisons between the three times, 2, 8, and 24 hours, will be affected by more noise than marginal mean comparisons between the three inoculates, which were directly and indirectly connected within arrays. Note that under the null hypothesis of no time effects, there is no one other MS that shares the same EMS that would allow one to readily construct an ANOVA F-statistic for the main effects of time. Satterthwaite [34] provided a solution to this problem by proposing the "synthesis" of a denominator MS, call it $MS_{syn}$, as a linear combination of $q$ random effects MS:

$MS_{syn} = \sum_{t=1}^{q} c_t MS_t$,  (2)

where the $c_t$ are known coefficients chosen such that $MS_{syn}$ has the same expectation as that for a certain model term having mean square $MS_{term}$ under the null hypothesis of no effects for that term. Then $F = MS_{term}/MS_{syn}$ is approximately distributed as a random variable from a central F distribution with $\nu_{term}$ numerator and $\hat{\nu}_{syn}$ denominator degrees of freedom, where

$\hat{\nu}_{syn} = \dfrac{\left(\sum_{t=1}^{q} c_t MS_t\right)^2}{\sum_{t=1}^{q} \left(c_t MS_t\right)^2/\nu_t}$,  (3)

with $\nu_t$ denoting the degrees of freedom for $MS_t$. In our example, consider the synthesized $MS_{syn}$ as being a linear combination of the MS for rep(inoculate*time), array(time), and residual. With reference to (2), $MS_{syn}$ is then a linear function of $q$ = 3 different MS, with coefficients $c_1$, $c_2$, and $c_3$ determined from the EMS for these three terms as provided in Table 1, such that the expectation of $MS_{syn}$ matches the EMS for time in Table 1 under the null hypothesis of no time effects. Hence, a suitable F statistic for inferring upon the main effects of time would be the ratio of the MS for time to $MS_{syn}$. To help further illustrate these concepts, let us conduct the ANOVA on the data generated from the A-loop design of Figure 2 for ID_REF #30 from Zou et al. [20], that is, using data from arrays 1-9 and 15-23 as provided in Table 4. The classical ANOVA table using the method=type3 option of the popular mixed-model software SAS PROC MIXED [35] for that particular gene is provided in Table 2; SAS code for all statistical analyses presented in this paper is provided in Figure 3 and also available for download, along with the data in Table 4, from http://www.msu.edu/~tempelma/ijpg2008.sas. As noted previously, the correct denominator MS term for testing the main effects of inoculate is replicate within inoculate by time. Hence, the corresponding F statistic is the ratio of the MS for inoculate to the MS for rep(inoculate*time), which, with the numerator and denominator degrees of freedom reported in Table 2, leads to a P-value of 0.1172.
Similarly, for the inoculate*time interaction, the appropriate F-test statistic is the ratio of the MS for inoculate*time to the MS for rep(inoculate*time), which, with the corresponding numerator and denominator degrees of freedom, leads to a P-value of 0.3435. Even without considering the control of false discovery rates (FDR), which involves the joint control of type I errors with respect to the remaining 27,683 genes, it seems apparent that neither the main effects of inoculate nor the interaction between inoculate and time would be statistically significant for gene ID_REF #30.

Table 2: Classical ANOVA of log intensities for the duplicated A-loop design component of Figure 2 on ID_REF #30 from Zou et al. (2005) using output from SAS PROC MIXED (code in Figure 3).

Figure 3: SAS code for classical ANOVA and EGLS inference. Comments describing the purpose of each step are provided immediately after the corresponding code between /* and */, as in a regular SAS program. EGLS based on REML would simply involve substituting method=reml for method=type3 in the third line of the code.

The synthesized denominator MS for time effects is obtained from (2), and the estimated degrees of freedom for this synthesized MS follow from (3). Hence, the appropriate F-test statistic for the main effects of time, with its numerator and estimated denominator degrees of freedom, leads to a P-value of 0.0683, as also reported in the SAS output provided in Table 2.

3.4. Mixed Model Analysis

Although the classical ANOVA table is indeed instructive in terms of illustrating the different levels of variability and experimental error, it is not the optimal statistical analysis method for a mixed effects model, especially when the design is unbalanced. A mixed-model or GLS analysis more efficiently uses information on the design structure (i.e., random effects) for inferring upon the fixed treatment structure effects [27, 32]. Unfortunately, GLS, in spite of its optimality properties, is generally not attainable with real data because the VC (e.g., $\sigma^2_{rep}$, $\sigma^2_{array}$, and $\sigma^2_{e}$) must be known. Hence, the VC must generally be estimated from the data at hand. There are a number of different methods available for estimating VC in mixed models [22]. The classical ANOVA method is based on equating MS to their EMS in the ANOVA table. For example, using the bottom row of Table 1, the EMS of the residual MS is simply $\sigma^2_{e}$, so that, using the numerical results for ID_REF #30 from Table 2, the type III ANOVA estimate of $\sigma^2_{e}$ is simply the residual MS itself. Now work up one row further in Table 1 to the term array(time); equating the corresponding MS from Table 2 to its EMS, and substituting the estimate of $\sigma^2_{e}$ just obtained, gives the ANOVA estimate of $\sigma^2_{array}$. Finally, work up one more (i.e., third to last) row in both tables; equating the MS for rep(inoculate*time) from Table 2 to its EMS, again substituting the previously obtained estimates, leads to the ANOVA estimate of $\sigma^2_{rep}$. For ID_REF #30, array variability is thereby estimated to be roughly four times larger than the biological variability, which, in turn, is estimated to be somewhat larger than the residual variability. Recall that with unbalanced designs, quadratic forms are not unique such that ANOVA estimators of VC will not be unique either. Nevertheless, type III quadratic forms are most commonly chosen as then the SS for each term is adjusted for all other terms, as previously noted. Although ANOVA estimates of VC are unbiased, they are neither efficient nor optimal in terms of having minimum standard error [25]. Restricted maximum likelihood (REML) is a generally more preferred method of VC estimation [22, 36, 37] and is believed to have more desirable properties. Nevertheless, the corresponding REML estimates of $\sigma^2_{rep}$, $\sigma^2_{array}$, and $\sigma^2_{e}$ for ID_REF #30 are in some qualitative agreement with the previously provided ANOVA estimates.
Once the VC are estimated, they are substituted for the true unknown VC to provide the "estimated" GLS or EGLS estimates of the fixed effects. It is important to note that typically EGLS = GLS for balanced designs, such that knowledge of the VC is somewhat irrelevant for point estimation of treatment effects. However, the same is generally not true for unbalanced designs, such as either the A-loop design derived from Figure 2 or even the interwoven loop design from Figure 1. Hence, different methods of VC estimation could lead to different EGLS estimates of treatment effects, as we demonstrate later. Suppose that it was of interest to compare the mean responses of various inoculate by time group combinations in the duplicated A-loop design example. Based on the effects defined in the statistical model for this design in (1), the true mean response for the $i$th inoculate at the $j$th time, averaged across the two dye effects ($\delta_1$ and $\delta_2$), can be written as

$\mu_{ij.} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \tfrac{1}{2}(\delta_1 + \delta_2)$.  (6)

If the levels are, say, ordered alphanumerically, the mean difference between two inoculates, say $i$ and $i'$, at time $j$ = 1 (2 hours) is specified as $k_1 = \mu_{i1.} - \mu_{i'1.}$. Using (6), this difference written as a function of the model effects is then $k_1 = \alpha_i - \alpha_{i'} + (\alpha\beta)_{i1} - (\alpha\beta)_{i'1}$. Similarly, the mean difference between time $j$ = 1 (2 hours) and time $j$ = 2 (8 hours) for inoculate $i$ could be derived as $k_2 = \mu_{i1.} - \mu_{i2.} = \beta_1 - \beta_2 + (\alpha\beta)_{i1} - (\alpha\beta)_{i2}$. Note that these two comparisons or contrasts can be more elegantly written using matrix algebra notation. A better understanding of contrasts is useful to help determine the correct standard errors and statistics used to test these contrasts, including how to write the corresponding SAS code. Hence, a matrix algebra approach to hypothesis testing on contrasts is provided in the Appendix, which complements the SAS code provided in Figure 3. For now, however, we simply use the "hat" notation in referring to the EGLS estimates of these two contrasts as $\hat{k}_1$ and $\hat{k}_2$, respectively. As we already intuitively noted from the A-loop design of Figure 2, inference on $k_1$ should be much more precise than that for $k_2$ since inoculates are compared within arrays whereas times are not. This distinction should then be reflected in a larger standard error for $\hat{k}_2$ than for $\hat{k}_1$. Indeed, using the REML estimates of VC for EGLS inference, this is exactly what is observed for ID_REF #30. However, these standard errors are actually slightly understated since they do not take into account the uncertainty of the VC estimates, as discussed by Kackar and Harville [38]. Kenward and Roger [39] derived a procedure to take this uncertainty into account, which is part of the SAS PROC MIXED implementation using the option ddfm=kr [35] as indicated in Figure 3. Invoking this option raises the two standard errors accordingly, albeit very slightly. Now, the denominator degrees of freedom for inference on these two contrasts should also differ given that the nature of experimental error variability somewhat differs for inoculate comparisons as opposed to time comparisons, as noted previously from Figure 2. However, with EGLS, there are no SS and hence no corresponding MS or EMS expressions for each main effects or interaction term in the model, such that determining the correct test statistic and degrees of freedom is somewhat less obvious than with the previously described classical ANOVA approach [32]. Giesbrecht and Burns [40] introduced a procedure for estimating the denominator degrees of freedom for EGLS inference which, again, is invoked with the ddfm=kr option of SAS PROC MIXED.
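The relevant portion of such an analysis might be specified along the following lines; this is a sketch in the spirit of Figure 3 (which uses explicit estimate statements for $k_1$ and $k_2$) rather than the actual code, and the lsmeans request is simply one convenient way to obtain all cell-mean differences, including these two contrasts, without hand-coding contrast coefficients.

  proc mixed data=aloop method=reml;
    class inoculate time dye rep array_id;
    model ly = inoculate time inoculate*time dye / ddfm=kr;
    random rep(inoculate*time) array_id(time);
    /* all pairwise inoculate-by-time cell mean differences, with Kenward-Roger
       adjusted standard errors and estimated denominator degrees of freedom    */
    lsmeans inoculate*time / diff;
  run;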
Using this option along with REML estimation of the VC for the analysis of ID_REF #30, the estimated denominator degrees of freedom for the two contrasts differ accordingly (5.28 versus 17.0). Contrasts are also used in EGLS to provide ANOVA-like F tests for the overall importance of various fixed effects; more details based on the specification of contrast matrices to test these effects are provided in the Appendix. For example, denote the marginal or overall mean of inoculate $i$, averaged across the 3 times and 2 dyes, as $\mu_{i..}$. The 2 numerator degrees of freedom hypothesis test for the main effects of inoculates can be written as a combination of two complementary contrasts, (A1) $\mu_{1..} - \mu_{3..} = 0$ and (A2) $\mu_{2..} - \mu_{3..} = 0$; that is, if both contrasts are 0, then obviously $\mu_{1..} - \mu_{2..} = 0$ is also true such that $\mu_{1..} = \mu_{2..} = \mu_{3..}$ is then true. Similarly, let us suppose that one wished to test the main effects of times (Factor B). Then, it could be readily demonstrated that the corresponding hypothesis test can also be written as a combination of complementary contrasts: (B1) $\mu_{.1.} - \mu_{.3.} = 0$ and (B2) $\mu_{.2.} - \mu_{.3.} = 0$, where $\mu_{.j.}$ denotes the marginal mean for the $j$th level of Factor B, that is, the $j$th time. If both component hypotheses (B1) and (B2) are true, then $\mu_{.1.} = \mu_{.2.} = \mu_{.3.}$ is also true, thereby defining the composite 2 numerator degrees of freedom hypothesis test for the main effects of Factor B. Now, the interaction between inoculate and time is a 4 numerator degrees of freedom test, as previously noted from Tables 1 and 2, suggesting that there are 4 complementary contrasts that jointly test for the interaction of the two factors. Of course, it is also well known that the interaction degrees of freedom is typically the product of the main effects degrees of freedom for the two factors considered. Two of the four degrees of freedom for the interaction involve testing whether or not the mean difference between inoculates 1 and 3 is the same within time 1 as it is within time 3, that is, (AB1) $(\mu_{11.} - \mu_{31.}) - (\mu_{13.} - \mu_{33.}) = 0$, and whether or not the mean difference between inoculates 2 and 3 is the same within time 1 as it is within time 3, that is, (AB2) $(\mu_{21.} - \mu_{31.}) - (\mu_{23.} - \mu_{33.}) = 0$. If both hypotheses (AB1) and (AB2) are true, then it should be apparent that $(\mu_{11.} - \mu_{21.}) - (\mu_{13.} - \mu_{23.}) = 0$ is also true; that is, the mean difference between inoculates 1 and 2 is the same within time 1 as it is within time 3. The remaining two degrees of freedom for the interaction involve testing whether or not the mean difference between inoculates 1 and 3 is the same within time 2 as it is within time 3, that is, (AB3) $(\mu_{12.} - \mu_{32.}) - (\mu_{13.} - \mu_{33.}) = 0$, and whether or not the mean difference between inoculates 2 and 3 is the same within time 2 as it is within time 3, that is, (AB4) $(\mu_{22.} - \mu_{32.}) - (\mu_{23.} - \mu_{33.}) = 0$. If both hypotheses (AB3) and (AB4) are true, then $(\mu_{12.} - \mu_{22.}) - (\mu_{13.} - \mu_{23.}) = 0$ is also true. Hence, contrasts AB1, AB2, AB3, and AB4 completely define the four components or numerator degrees of freedom for the interaction between Factors A and B. That is, the test for determining whether or not the mean differences between all levels of A are the same within each level of B, and vice versa, can be fully characterized by these four complementary contrasts. The EGLS F statistics used for testing the overall importance of these main effects or interactions are approximately distributed as F-random variables with the numerator degrees of freedom defined by the number of complementary components or contrasts as previously described; refer to the Appendix and elsewhere [27, 32, 35] for more details.
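Written compactly in terms of the marginal means just defined, the two composite main-effect null hypotheses are simply a matrix restatement of (A1)-(A2) and (B1)-(B2):

$$
H_{0A}:\;
\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} \mu_{1..} \\ \mu_{2..} \\ \mu_{3..} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix},
\qquad
H_{0B}:\;
\begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix} \mu_{.1.} \\ \mu_{.2.} \\ \mu_{.3.} \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix},
$$

with the two rows of each coefficient matrix corresponding to contrasts (A1)/(A2) and (B1)/(B2), respectively; the analogous 4-row statement for the interaction collects (AB1)-(AB4).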
Now, the denominator degrees of freedom for each contrast are dependent upon the design and can be determined from the classical ANOVA as in Table 1 or estimated by a multivariate extension of the Satterthwaite-based procedure proposed by Fai and Cornelius [41]; again, this option is available as ddfm=kr using SAS PROC MIXED (Figure 3). Unfortunately, much available software used for mixed model analysis of microarray data does not carefully take into consideration that various fixed effects terms of interest may have different denominator degrees of freedom when constructing F test statistics. In fact, a typical strategy of such software is to assume that the residual degrees of freedom is the appropriate denominator degrees of freedom for all tests. This strategy is denoted as the "residual" method for determining denominator degrees of freedom by Spilke et al. [36], who demonstrated using simulation work that the use of the residual method can substantially inflate the type I error rate for EGLS inference on fixed effects; in other words, the number of false-positive results, or genes incorrectly declared to be differentially expressed between treatments, would be unduly increased. Spilke et al. [36] further demonstrated that use of the Kenward-Roger method for degrees of freedom estimation and control for uncertainty on the VC provided the best control of the nominal confidence interval coverage and type I error probabilities.

3.5. Impact of Method of Variance Component Estimation on EGLS

It was previously noted that the estimated standard errors for EGLS inference on the two contrasts $k_1$ and $k_2$ were based on REML estimates of the variance components for ID_REF #30. If the VC estimates are instead computed using type III ANOVA, then these estimated standard errors differ accordingly. What perhaps is even more disconcerting is that the EGLS estimates $\hat{k}_1$ and $\hat{k}_2$ themselves also differ between the two analyses; that is, the point estimates based on REML estimates of the VC differ from those based on ANOVA estimates. The overall EGLS tests for ID_REF #30 for testing the main effects of inoculate, time, and their interaction, as based on the previously characterized complementary contrasts, are provided separately for ANOVA versus REML estimates of VC in Table 3; this output is generated as type III tests using the SAS code provided in Figure 3. From here, it should be clearly noted that conclusions upon the overall importance of various fixed effects terms in (1) as derived from EGLS inference subtly depend upon the method of VC estimation; for example, the EGLS P-values in Table 3 tend to be several points smaller using ANOVA compared to REML; furthermore, note the differences in the estimated denominator degrees of freedom between the two sets. Naturally, this raises the question as to which method of VC estimation should be used.

Table 3: EGLS inference on the overall importance of fixed effects for ID_REF #30 based on REML versus ANOVA (type III quadratic forms) estimation of variance components, using output from SAS PROC MIXED (code in Figure 3).

Table 4: Dataset for ID_REF #30 for all hybridizations (14 arrays/loop × 2 loops) of Figure 2, with two replicates per each of the 10 inoculate by time groups; fluorescence intensities provided as y, log (base 2) intensities provided as ly.

In completely balanced designs, ANOVA and REML lead to identical estimates of VC and identical EGLS inference, provided that all ANOVA estimates of VC are positive.
ANOVA estimates of VC that are negative are generally constrained by REML to be zero, thereby causing a "ripple" effect on the REML estimates of the other VC and subsequently on EGLS inference [42]. As noted previously, REML does tend to outperform most other methods for various properties of VC estimation [37]. Furthermore, there is evidence that EGLS based on ANOVA leads to poorer control of the type I error rate for inference on fixed effects compared to EGLS based on REML in unbalanced data structures [36]. However, Stroup and Littell [42] concluded that EGLS using REML may sometimes lead to inference on fixed effects that is too conservative (i.e., actual error rates less than the nominal type I error rate), again due to the nonnegative REML restrictions on the VC estimates and the associated ripple effects. This issue warrants further study given that it has implications for the control of FDR, which is most commonly used to control the rate of type I errors in microarray studies [43]. Estimation of FDR inherently depends upon the distribution of P-values for treatment effects across all genes, such that even mild perturbations of this distribution have potential bias implications for control of false-positive rates.

4. Other Issues at the Design-Analysis Interface

4.1. Log Ratio versus Log Intensity Modeling

Recent work on the optimization and comparison of various efficient microarray designs has been based on the assumption of OLS inference; that is, no random sources of variability other than residuals are considered [2, 8, 9, 13]. While this observation may seem counterintuitive given the arguments laid out in this review for the need for (E)GLS to analyze efficient designs, it is important to note at least a couple of things. First, virtually all of the work on design optimization has been based on the assumption that a sample or pool is used only once; the corresponding interwoven loop designs in such cases [13] have been referred to as classical loop designs [10, 19]. However, sometimes two or more aliquots from each sample are used in separate hybridizations [20, 23], such as in the A-loop design example used in this review; the corresponding designs are connected loop designs [10, 19] that require the specification of random biological replicate effects separate from residual effects, as previously noted. Secondly, almost all of the design optimization work has been based on the use of Cy3/Cy5 log ratios as the response variables rather than the dye-specific log intensities used in this review. This data reduction, that is, from two fluorescence intensities to one ratio per spot on an array, certainly eliminates array as a factor to specify in a linear model. However, the use of log ratios can severely limit the estimability and inference efficiency of certain comparisons. Suppose that instead of using the 36 log intensities from the duplicated A-loop design from arrays 1-9 and 15-23 of Table 4, we used the derived 18 Cy3/Cy5 log ratios as the response variables. For example, the two corresponding log2 Cy3 and Cy5 fluorescence intensities for array 1 from Table 4 are 13.9946 and 14.3312. The Cy3/Cy5 log ratio is then the difference between these two values, corresponding to a fold change of roughly 1.26 between the two channels. Using log ratios as their response variables, Landgrebe et al. [9] concluded that it was impossible to infer upon the main effects of Factor B (e.g., time) in the A-loop design.
However, as we demonstrated earlier, it is possible to infer upon these effects using ANOVA or EGLS analysis of the log intensities. Jin et al. [18] similarly illustrate the utility of log intensity analysis in a split plot design, an analysis that would not otherwise have been possible using log ratios. Milliken et al. [14] provide much more extensive mixed modeling details on the utility of log intensity analysis in nested or split-plot microarray designs similar to the A-loop design. The relative efficiency of some designs may also depend upon the relative magnitude of biological to technical variation [10, 44]; sometimes it is only possible to separately estimate these two sources of variability using log intensities rather than log ratios, thereby requiring the use of (E)GLS rather than OLS. In fact, analysis of log intensities using a mixed effects model appears to be not only more flexible than log-ratio modeling but also statistically more efficient in recovering more of the information in the data [1, 45]. That is, as also noted by Milliken et al. [14], treatment effects are more efficiently estimated by combining intraarray and interarray information in a mixed model analysis when an incomplete block design is used and arrays are explicitly included as random effects by analyzing log intensities rather than log ratios.

4.2. Choosing between Efficient Experimental Designs Using Mixed Models

There are a number of different criteria that might be used to choose between different designs for two-color microarrays. We have already noted that the interwoven loop design in Figure 1 is A-optimal for pairwise comparisons between 9 treatment groups. A-optimality has been criticized for microarray studies because it chooses designs with improved efficiency for certain contrasts at the expense of other, perhaps more relevant, contrasts and further depends upon the parameterization of the linear model [1, 6, 9]; other commonly considered types of optimality criteria are possible and are further discussed by Wit et al. [13] and Landgrebe et al. [9]. At any rate, it is possible to modify A-optimality to explicitly take into account a particular set of scientific questions [13]; furthermore, optimization with respect to one criterion will generally be nearly optimal for others. For one particular type of optimality criterion, Landgrebe et al. [9] demonstrated that the A-loop design has the best relative efficiency compared to other designs for inference on the main effects of Factor A and the interaction effects between A and B, although the main effects of Factor B could not be inferred upon using an analysis of log ratios, as previously noted. How does the A-loop design of Figure 2 generally compare to the interwoven loop design of Figure 1 if a factorial treatment structure is imposed on the 9 treatments, as implied by the same labels as used in Figure 2? Suppose that Figure 1 is a connected interwoven loop design [10] in the sense that the outer loop of Figure 1 (dashed arrows) connects one biological replicate for each of the 9 groups whereas the inner loop of Figure 1 (solid arrows) connects a second biological replicate for each of the 9 groups. Then this design would consume 18 biological replicates and 18 arrays, thereby providing a fair comparison with the duplicated A-loop design of Figure 2. Recall that Figure 1 is A-optimized for pairwise comparisons between all 9 groups.
It is not quite clear what implications this might have for statistical efficiency for the constituent main effects of Factor A (2 degrees of freedom), Factor B (2 degrees of freedom), and the effects of their interaction (4 degrees of freedom); note, incidentally, that these degrees of freedom sum to 8, as required for 9 groups. As duly noted by Altman and Hua [1], pairwise comparisons between all 9 groups may not be as important as various main effects or interaction contrasts with a factorial treatment structure arrangement. Although, as noted earlier, Figure 1 is symmetric with respect to the treatment labels, the classical ANOVA table for this interwoven loop design would be even more complicated (not shown) than that presented for the A-loop design, since there is not one single denominator MS that would serve as the experimental error term for inoculate, time, or inoculate by time effects! One should perhaps compare two alternative experimental designs having the same factorial treatment structure, but a different design structure, for contrasts of highest priority, choosing the design for which such contrasts have the smaller standard errors. Let us consider the following comparisons: $\mu_{1..} - \mu_{3..}$, $\mu_{.1.} - \mu_{.3.}$, and $(\mu_{11.} - \mu_{31.}) - (\mu_{13.} - \mu_{33.})$, that is, respectively, the overall mean difference between inoculates 1 and 3, the overall mean difference between times 1 and 3, and the interaction component pertaining to the difference between inoculates 1 and 3 within time 1 versus that same difference within time 3. Recall that these contrasts were components of the EGLS tests on the two sets of main effects and the interaction and were previously labeled as (A1), (B1), and (AB1), respectively. Now, the comparison of efficient designs for the relative precision of various contrasts will generally depend upon the relative magnitude of the random effects VC, as noted recently by Hedayat et al. [44] and for various microarray design comparisons [10]. Suppose the "true" values of the three variance components were 0.03, 0.06, and 0.25, comparable to either set of estimates provided previously for ID_REF #30 from Zou et al. [20]. The linear mixed model for analyzing data generated from Figure 1 would be identical to that in (1) except that arrays would no longer be specified as being nested within times. Computing the standard errors of the three contrasts above under these assumed variance components for both designs reveals that, whereas the optimized design in Figure 1 based on Wit et al. [13] provides a substantial improvement for the estimation of overall mean time differences, the A-loop design is indeed more efficient for inferring upon the main effects of inoculate and the interaction between inoculate and time. Hence, the choice between the two designs would reflect a matter of priority for inference on the various main effects and their interactions. It should be carefully noted, as demonstrated by Tempelman [10], that designs leading to lower standard errors for certain comparisons do not necessarily translate to greater statistical power, as the denominator degrees of freedom for various tests may be substantially different between the two designs.
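In practice, such design comparisons can be carried out before any data are collected by laying each candidate design out as an "exemplary" dataset (any placeholder response values will do) and asking the mixed-model software for the standard errors of the contrasts of interest with the variance components held fixed at assumed values. The following PROC MIXED sketch illustrates the idea for the A-loop layout; the dataset name is hypothetical, the assignment of 0.03, 0.06, and 0.25 to particular variance components is an assumption made only for illustration, and the order of values in the parms statement is assumed to follow the order of the effects in the random statement followed by the residual (something worth verifying against the Covariance Parameter Estimates output).

  /* Hypothetical sketch: 'aloop_layout' holds the design factors for the 36 channel
     observations of the duplicated A-loop design plus a constant dummy response ly. */
  proc mixed data=aloop_layout noprofile;
    class inoculate time dye rep array_id;
    model ly = inoculate time inoculate*time dye;
    random rep(inoculate*time) array_id(time);
    /* hold the variance components at assumed values: rep, array, residual (assumed order) */
    parms (0.06) (0.25) (0.03) / hold=1,2,3;
    /* the standard errors of these differences depend only on the design and the
       assumed variance components, not on the dummy response values               */
    lsmeans inoculate time inoculate*time / diff;
  run;

Repeating the same exercise with the layout of the competing design (here, the connected interwoven loop of Figure 1, with arrays no longer nested within times) and comparing the resulting standard errors for the priority contrasts gives exactly the kind of design comparison described above.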
4.3. Unbalanced Designs and Shrinkage Estimation

Shrinkage or empirical Bayes (EB) estimation is known to improve statistical power for inference on differential gene expression between treatments in microarray experiments [46]. Shrinkage-based estimation is based on the well-established hierarchical modeling concept that more reliable inferences on gene-specific treatment differences can be attained by borrowing information across all genes [47, 48]. Typically, such strategies have involved improving the estimation of standard errors of gene-specific treatment differences by "shrinking" gene-specific variances towards an overall mean or other measure of central tendency. However, most shrinkage estimation procedures have been developed for fixed effects models, that is, for simple experimental designs having a treatment structure but no or very limited design structure, or even treating all design structure factors as fixed [30]. Currently popular shrinkage estimation procedures [49-51] are certainly appropriate for many designs based on one-color Affymetrix systems or for common reference designs. Other proposed shrinkage procedures have facilitated extensions to very special cases of nested designs [47], including some based on rather strong modeling assumptions such as a constant correlation of within-array replicate spots across all genes [52] or a design structure facilitating the use of permutation testing [29]. However, virtually none of the procedures proposed thus far are well adapted to handle unbalanced designs such as the A-loop design, where different sizes of experimental units need to be specified for different treatment factors; hence, investigators should proceed with caution when using shrinkage estimation for unbalanced mixed-model designs.

5. Conclusions

We have provided an overview of the use of mixed linear model analysis for data from unbalanced microarray designs, given that most efficient incomplete block designs for microarrays are unbalanced with respect to various comparisons. We strongly believe that much mixed-model software currently available for the analysis of microarrays does not adequately address the proper determination of error terms and/or denominator degrees of freedom for various tests. This would be particularly true if we had chosen to analyze all of the data for ID_REF #30 in Table 4 from Zou et al. [20] based on all of the hybridizations depicted in Figure 2. Even then, the sizes of the standard errors and estimated degrees of freedom would still be seen to be somewhat different for estimating the main effects of times compared to estimating the main effects of inoculates, given the lower degree of within-array connectivity between the various levels of time, as illustrated in Figure 2. If inferences on the various comparisons of interest are not conducted correctly in defining a list of differentially expressed genes, all subsequent microarray analyses (e.g., FDR estimation, gene clustering, gene class analysis, etc.) are rendered futile. We believe that it is useful to choose proven mixed-model software (e.g., SAS) to properly conduct these tests and, if necessary, to work with an experienced statistician in order to do so. We have concentrated our attention on the analysis of a particular gene. It is, nevertheless, straightforward to use SAS to serially conduct mixed-model analyses for all genes on a microarray [53]; furthermore, SAS JMP GENOMICS (http://www.jmp.com/software/genomics/) provides an even more powerful user interface to the mixed model analysis of microarray data.

Appendix: Matrix Representation of the Mixed Model Analysis of the A-Loop Design of Zou et al.
Any mixed model, including that specified in (1), can be written in matrix algebra form:

$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e}$.  (A.1)

Here $\mathbf{y}$ is the vector of all data, $\boldsymbol{\beta}$ is the vector of all fixed effects (e.g., inoculate, time, dye, and inoculate by time interaction effects), $\mathbf{u}$ is the vector of all random effects (e.g., arrays and sample within inoculate by time effects), and $\mathbf{e}$ is the vector of random residual effects. Furthermore, $\mathbf{X}$ and $\mathbf{Z}$ are corresponding incidence matrices that specify the treatment and design structure of the experiment, thereby linking the treatment and design effects, $\boldsymbol{\beta}$ and $\mathbf{u}$, respectively, to $\mathbf{y}$. Note that $\mathbf{y}$ has a dimension of 36 × 1 for the duplicated A-loop design of Zou et al. [20]. Now $\boldsymbol{\beta}$ and $\mathbf{u}$ can be further partitioned into the effects as specified in (1); for our example,

$\boldsymbol{\beta} = \left[\mu,\; \alpha_1, \alpha_2, \alpha_3,\; \beta_1, \beta_2, \beta_3,\; (\alpha\beta)_{11}, (\alpha\beta)_{12}, \ldots, (\alpha\beta)_{33},\; \delta_1, \delta_2\right]'$,  (A.2)

such that $\boldsymbol{\beta}$ is an 18 × 1 vector of fixed effects; that is, there are 18 elements in (A.2). Furthermore, $\mathbf{u}$ can be similarly partitioned into a vector of random effects, $\mathbf{u}_{rep}$, for replicates within inoculate by time and another vector of random effects, $\mathbf{u}_{array}$, for arrays within time; that is, there are a total of 18 biological replicates and 18 arrays in the study, each characterized by a random effect. Note that it is a coincidence that the row dimensions of $\boldsymbol{\beta}$, $\mathbf{u}_{rep}$, and $\mathbf{u}_{array}$ are all 18 for this particular example design. Again, the distributional assumptions on the random and residual effects are specified the same as in the paper but now written in matrix algebra notation: $\mathbf{u}_{rep} \sim N(\mathbf{0}, \sigma^2_{rep}\mathbf{I}_{18})$, $\mathbf{u}_{array} \sim N(\mathbf{0}, \sigma^2_{array}\mathbf{I}_{18})$, and $\mathbf{e} \sim N(\mathbf{0}, \sigma^2_{e}\mathbf{I}_{36})$, with $\mathbf{0}$ denoting a vector of zeros and $\mathbf{I}_{n}$ denoting an identity matrix of dimension $n \times n$. Reasonably assuming that $\mathbf{u}_{rep}$ and $\mathbf{u}_{array}$ are pairwise independent of each other (i.e., biological sample effects are not influenced by array effects and vice versa), the variance-covariance matrix $\mathbf{G}$ of $\mathbf{u}$ is a diagonal matrix with the first 18 diagonal elements being $\sigma^2_{rep}$ and the remaining 18 diagonal elements being $\sigma^2_{array}$. The GLS estimator, $\hat{\boldsymbol{\beta}}$, of $\boldsymbol{\beta}$ can be written [22, 32] as

$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X}\right)^{-}\mathbf{X}'\mathbf{V}^{-1}\mathbf{y}$,  (A.3)

with its variance-covariance matrix defined by

$\operatorname{var}\left(\hat{\boldsymbol{\beta}}\right) = \left(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X}\right)^{-}$,  (A.4)

where $\mathbf{V} = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \sigma^2_{e}\mathbf{I}_{36}$ is the variance-covariance matrix of $\mathbf{y}$ and $\left(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X}\right)^{-}$ denotes a generalized inverse of $\mathbf{X}'\mathbf{V}^{-1}\mathbf{X}$. Once the VC are estimated, they are substituted for the true unknown VC in $\mathbf{V}$ to produce $\hat{\mathbf{V}}$, which is then used to provide the "estimated" GLS or EGLS, $\hat{\boldsymbol{\beta}}_{EGLS}$, of $\boldsymbol{\beta}$:

$\hat{\boldsymbol{\beta}}_{EGLS} = \left(\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X}\right)^{-}\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{y}$.  (A.5)

As noted in the text, typically $\hat{\boldsymbol{\beta}}_{EGLS} = \hat{\boldsymbol{\beta}}$ (i.e., EGLS = GLS) for balanced designs but not necessarily for unbalanced designs, such as those depicted in Figures 1 or 2. It was previously noted in the paper that the mean difference between inoculates $i$ and $i'$ at time 1, $k_1$, could be written as a function of the model effects in (1) as $\alpha_i - \alpha_{i'} + (\alpha\beta)_{i1} - (\alpha\beta)_{i'1}$. Similarly, the mean difference between time 1 and time 2 for inoculate $i$, $k_2$, could be written as $\beta_1 - \beta_2 + (\alpha\beta)_{i1} - (\alpha\beta)_{i2}$. These two contrasts can be written in matrix notation as

$k_1 = \mathbf{k}_1'\boldsymbol{\beta}$  (A.6)

and

$k_2 = \mathbf{k}_2'\boldsymbol{\beta}$,  (A.7)

respectively, where $\mathbf{k}_1$ and $\mathbf{k}_2$ are contrast vectors whose coefficients align in order with the elements of $\boldsymbol{\beta}$ in (A.2). For example, the nonzero coefficients of $\mathbf{k}_1$ in (A.6) occur in the positions of (A.2) corresponding to the inoculate main effects and the inoculate by time interaction effects involved in the contrast; when these coefficients are multiplied, in the same order, with the corresponding elements of $\boldsymbol{\beta}$ in (A.2), one gets $\alpha_i - \alpha_{i'} + (\alpha\beta)_{i1} - (\alpha\beta)_{i'1}$, which is indeed $k_1$ as specified previously. The reader should be able to make a similar observation for $\mathbf{k}_2$ by considering how the nonzero elements of (A.7) align in position with the elements of $\boldsymbol{\beta}$ in (A.2) to produce $k_2$. In Figure 3, SAS PROC MIXED is used to provide the estimates, standard errors, and test statistics for these two contrasts. That is, note how all of the elements from (A.6) and (A.7) are completely reproduced in the estimate statements as "k1 contrast" and "k2 contrast," respectively, in Figure 3.
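To make the matrix operations in (A.1)-(A.5) concrete, the following PROC IML sketch evaluates the GLS formulas on a deliberately tiny, hypothetical toy layout (4 observations, 2 fixed treatment levels, 2 random blocks) with the variance components taken as known. It is intended only to illustrate the algebra, not to reproduce the A-loop analysis, and all numbers in it are made up for illustration.

  proc iml;
    /* toy data: 4 log intensities (hypothetical values)              */
    y = {10.1, 10.6, 11.8, 12.1};
    /* X: overall mean plus indicator columns for 2 treatment levels  */
    X = {1 1 0,
         1 0 1,
         1 1 0,
         1 0 1};
    /* Z: incidence matrix for 2 random block effects                 */
    Z = {1 0,
         1 0,
         0 1,
         0 1};
    sig2_block = 0.06;                 /* assumed block VC             */
    sig2_e     = 0.25;                 /* assumed residual VC          */
    G = sig2_block * I(2);
    R = sig2_e * I(4);
    V = Z*G*Z` + R;                    /* var(y) used in (A.3)-(A.4)   */
    Vinv = inv(V);
    C = ginv(X`*Vinv*X);               /* (X'V^{-1}X)^- = var(beta_hat), as in (A.4) */
    beta_gls = C*X`*Vinv*y;            /* GLS estimator, as in (A.3)                  */
    k   = {0, 1, -1};                  /* contrast: treatment 1 minus treatment 2     */
    est = k`*beta_gls;
    se  = sqrt(k`*C*k);
    print beta_gls est se;
  quit;

Replacing the known variance components with estimates yields the EGLS quantities of (A.5); PROC MIXED performs exactly this substitution internally.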
Now, when the VC are known, these two contrasts can be estimated by their GLS values, $k_1'\hat{\beta}$ and $k_2'\hat{\beta}$. Furthermore, using (A.4), the true standard errors of these two estimates can be determined as $\sqrt{k_1'(X'V^{-1}X)^{-}k_1}$ and $\sqrt{k_2'(X'V^{-1}X)^{-}k_2}$, respectively. However, as previously noted, the VC are generally not known but must be estimated from the data, such that the two contrasts are typically estimated using $k_1'\hat{\beta}_{EGLS}$ and $k_2'\hat{\beta}_{EGLS}$, with approximate standard errors determined by $\sqrt{k_1'(X'\hat{V}^{-1}X)^{-}k_1}$ and $\sqrt{k_2'(X'\hat{V}^{-1}X)^{-}k_2}$. Using the REML estimates of VC as provided in the paper, the code from Figure 3 can be executed to provide these two EGLS estimates and their standard errors for ID_REF #30 by simply changing method=type3 to method=reml and by deleting ddfm=kr; as anticipated from the design, the standard error for the time contrast exceeds that for the inoculate contrast. However, these standard errors are actually slightly understated since they do not take into account the uncertainty of the estimated VC, that is, of $\hat{V}$ as an estimate of $V$, as discussed by Kackar and Harville [38]. Kenward and Roger [39] derive a procedure to take this uncertainty into account, which is part of the SAS PROC MIXED implementation using the ddfm=kr option [35] as specified in Figure 3. Invoking this option raises the two standard errors accordingly, albeit very slightly. Furthermore, the ddfm=kr option invokes the procedure of Giesbrecht and Burns [40] to estimate the denominator degrees of freedom for EGLS inference. Using this option and REML, the estimated degrees of freedom for the two contrasts are 5.28 and 17.0, as would be noted from executing the SAS code in Figure 3. The corresponding SAS output will furthermore include the t-test statistics for the two contrasts. These statistics, when compared to Student t distributions with their respective estimated degrees of freedom, 5.28 and 17.0, lead to P-values of 0.66 and 0.37, respectively; that is, there is no evidence that either contrast is statistically significant.
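Continuing the same invented toy example (again a Python sketch rather than the paper's SAS analysis; the contrast, the variance-component values, and the placeholder denominator degrees of freedom are all made up, since the Giesbrecht-Burns/Kenward-Roger estimation of the degrees of freedom is not reproduced here), a single contrast $k'\beta$ is estimated and tested as follows.

```python
import numpy as np
from scipy import stats

# Rebuild the invented toy design so this block runs on its own.
rng = np.random.default_rng(1)
trt = np.tile([0, 1, 2], 4)
n = trt.size
X = np.column_stack([np.ones(n)] + [(trt == level).astype(float) for level in range(3)])
Z = np.kron(np.eye(4), np.ones((3, 1)))
y = rng.normal(size=n)
V = Z @ (0.5 * np.eye(4)) @ Z.T + 0.2 * np.eye(n)     # assumed variance components

Vinv = np.linalg.inv(V)
C = np.linalg.pinv(X.T @ Vinv @ X)                    # (X'V^-1 X)^-
beta_hat = C @ X.T @ Vinv @ y                         # GLS/EGLS estimate of beta

k = np.array([0.0, 1.0, -1.0, 0.0])                   # contrast: first treatment level minus second
estimate = k @ beta_hat                               # k' beta_hat
se = np.sqrt(k @ C @ k)                               # sqrt(k' (X'V^-1 X)^- k)
t_stat = estimate / se
den_df = 6.0                                          # placeholder only; must be estimated in practice
p_value = 2 * stats.t.sf(abs(t_stat), den_df)         # two-sided t test on the contrast
print(round(estimate, 3), round(se, 3), round(t_stat, 3), round(p_value, 3))
```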
Contrast matrices on $\beta$ can be used to derive ANOVA-like F tests for the overall importance of various fixed effects using EGLS. Recall from the paper that the test for the main effects of inoculates can be written as a joint test of two contrasts among the marginal inoculate means $\mu_{i\cdot\cdot} = \frac{1}{3}\sum_{j} \mu_{ij\cdot}$, with $\mu_{ij\cdot}$ defined as in (6). These two contrasts, labeled as (A1) and (A2) in the paper, can be jointly written together as a linear function of the elements of $\beta$ in (A.2), say $K_{A}\beta$, where $K_{A}$ is the $2 \times 18$ contrast matrix given in (A.8). For example, the first row of $K_{A}$ specifies the coefficients for testing (A1) as a function of the elements of $\beta$ using (6). In other words, matching up, in order, the first row of $K_{A}$ in (A.8) with the elements of $\beta$ in (A.2), the corresponding contrast can be rewritten directly in terms of the fixed effects of (1). Similarly, the second row of $K_{A}$ in (A.8) specifies the contrast coefficients for (A2) as a function of the elements of $\beta$. Recall that the main effects of times (Factor B) involve a joint test of two contrasts among the marginal time means $\mu_{\cdot j\cdot} = \frac{1}{3}\sum_{i} \mu_{ij\cdot}$, labeled as (B1) and (B2) in the paper. In terms of the elements of $\beta$ in (A.2), these two contrasts are jointly specified as $K_{B}\beta$, with $K_{B}$ given in (A.9). That is, (A.9) is another $2 \times 18$ contrast matrix, just like (A.8), where the two rows of $K_{B}$ specify the coefficients for the contrasts (B1) and (B2), respectively, as a function of the elements of $\beta$ in (6). Recall that the interaction between the effects of inoculates and times was a 4 numerator degrees of freedom test based on jointly testing four complementary and independent contrasts, suggesting that there are four rows that determine the corresponding contrast matrix. The complete interaction test can then be written as $K_{AB}\beta$, where $K_{AB}$ is the $4 \times 18$ contrast matrix given in (A.10). Note that the 4 rows in (A.10) specify contrast coefficients on the model effects for each of the 4 constituent component hypotheses, (AB1), (AB2), (AB3), and (AB4), as defined in the paper, when aligned with the coefficients of $\beta$ in (A.2). As a sidenote, the somewhat uninteresting contrast for dye effects could be written using a contrast vector (not shown) in order to test the overall mean difference between the two dyes. The EGLS test statistic for testing the overall importance of any fixed effects term, say $H_0$: $K\beta = 0$, is specified as $F = (K\hat{\beta}_{EGLS})'\left[K(X'\hat{V}^{-1}X)^{-}K'\right]^{-1}(K\hat{\beta}_{EGLS})/\mathrm{rank}(K)$. Here $F$ is distributed approximately as an F random variable under $H_0$, with the numerator degrees of freedom being defined by the number of rows of the contrast matrix [27, 32, 35]. The denominator degrees of freedom for each test is dependent upon the design and can be determined based on classical ANOVA as in Table 1, or estimated using a multivariate extension of the Satterthwaite-based procedure of Giesbrecht and Burns [40] as proposed by Fai and Cornelius [41]; again, this option is available as ddfm=kr using SAS PROC MIXED (Figure 3). The corresponding EGLS ANOVA output for ID_REF #30, based on either ANOVA or REML estimation of VC, is provided in Table 3.
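The Wald-type F statistic for a multirow contrast matrix $K$ can be sketched in the same way (illustrative Python again, with the same invented toy quantities; the denominator degrees of freedom is left as a placeholder rather than estimated by the Fai-Cornelius/Kenward-Roger procedure that ddfm=kr invokes).

```python
import numpy as np
from scipy import stats

# Rebuild the invented toy design so this block runs on its own.
rng = np.random.default_rng(1)
trt = np.tile([0, 1, 2], 4)
n = trt.size
X = np.column_stack([np.ones(n)] + [(trt == level).astype(float) for level in range(3)])
Z = np.kron(np.eye(4), np.ones((3, 1)))
y = rng.normal(size=n)
V = Z @ (0.5 * np.eye(4)) @ Z.T + 0.2 * np.eye(n)

Vinv = np.linalg.inv(V)
C = np.linalg.pinv(X.T @ Vinv @ X)                    # (X'V^-1 X)^-
beta_hat = C @ X.T @ Vinv @ y

# Two complementary contrasts jointly testing the overall treatment effect.
K = np.array([[0.0, 1.0, 0.0, -1.0],                  # first treatment level vs third
              [0.0, 0.0, 1.0, -1.0]])                 # second treatment level vs third
Kb = K @ beta_hat
num_df = np.linalg.matrix_rank(K)                     # numerator df = number of independent rows of K
F = (Kb @ np.linalg.inv(K @ C @ K.T) @ Kb) / num_df   # (Kb)'[K (X'V^-1 X)^- K']^{-1}(Kb) / rank(K)
den_df = 6.0                                          # placeholder only; estimated by ddfm=kr in SAS
p_value = stats.f.sf(F, num_df, den_df)               # approximate F test of H0: K beta = 0
print(round(float(F), 3), round(float(p_value), 3))
```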
Acknowledgment

Support from the Michigan Agricultural Experiment Station (Project MICL 1822) is gratefully acknowledged.

References

1. N. S. Altman and J. Hua, "Extending the loop design for two-channel microarray experiments," Genetical Research, vol. 88, no. 3, pp. 153–163, 2006.
2. F. Bretz, J. Landgrebe, and E. Brunner, "Design and analysis of two-color microarray experiments using linear models," Methods of Information in Medicine, vol. 44, no. 3, pp. 423–430, 2005.
3. J. S. S. Bueno Filho, S. G. Gilmour, and G. J. M. Rosa, "Design of microarray experiments for genetical genomics studies," Genetics, vol. 174, no. 2, pp. 945–957, 2006.
4. F.-S. Chai, C.-T. Liao, and S.-F. Tsai, "Statistical designs for two-color spotted microarray experiments," Biometrical Journal, vol. 49, no. 2, pp. 259–271, 2007.
5. K. Dobbin, J. H. Shih, and R. Simon, "Questions and answers on design of dual-label microarrays for identifying differentially expressed genes," Journal of the National Cancer Institute, vol. 95, no. 18, pp. 1362–1369, 2003.
6. G. F. V. Glonek and P. J. Solomon, "Factorial and time course designs for cDNA microarray experiments," Biostatistics, vol. 5, no. 1, pp. 89–111, 2004.
7. S. Gupta, "Balanced factorial designs for cDNA microarray experiments," Communications in Statistics: Theory and Methods, vol. 35, no. 8, pp. 1469–1476, 2006.
8. K. F. Kerr, "Efficient 2^k factorial designs for blocks of size 2 with microarray applications," Journal of Quality Technology, vol. 38, no. 4, pp. 309–318, 2006.
9. J. Landgrebe, F. Bretz, and E. Brunner, "Efficient design and analysis of two colour factorial microarray experiments," Computational Statistics & Data Analysis, vol. 50, no. 2, pp. 499–517, 2006.
10. R. J. Tempelman, "Assessing statistical precision, power, and robustness of alternative experimental designs for two color microarray platforms based on mixed effects models," Veterinary Immunology and Immunopathology, vol. 105, no. 3-4, pp. 175–186, 2005.
11. S.-F. Tsai, C.-T. Liao, and F.-S. Chai, "Statistical designs for two-color microarray experiments involving technical replication," Computational Statistics & Data Analysis, vol. 51, no. 3, pp. 2078–2090, 2006.
12. V. Vinciotti, R. Khanin, D. D'Alimonte, et al., "An experimental evaluation of a loop versus a reference design for two-channel microarrays," Bioinformatics, vol. 21, no. 4, pp. 492–501, 2005.
13. E. Wit, A. Nobile, and R. Khanin, "Near-optimal designs for dual channel microarray studies," Journal of the Royal Statistical Society: Series C, vol. 54, no. 5, pp. 817–830, 2005.
14. G. A. Milliken, K. A. Garrett, and S. E. Travers, "Experimental design for two-color microarrays applied in a pre-existing split-plot experiment," Statistical Applications in Genetics and Molecular Biology, vol. 6, no. 1, article 20, 2007.
15. Y. H. Yang, S. Dudoit, P. Luu, et al., "Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation," Nucleic Acids Research, vol. 30, no. 4, pp. 1–10, 2002.
16. M. K. Kerr and G. A. Churchill, "Statistical design and the analysis of gene expression microarray data," Genetical Research, vol. 77, no. 2, pp. 123–128, 2001.
17. R. D. Wolfinger, G. Gibson, E. D. Wolfinger, et al., "Assessing gene significance from cDNA microarray expression data via mixed models," Journal of Computational Biology, vol. 8, no. 6, pp. 625–637, 2001.
18. W. Jin, R. M. Riley, R. D. Wolfinger, K. P. White, G. Passador-Gurgell, and G. Gibson, "The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster," Nature Genetics, vol. 29, no. 4, pp. 389–395, 2001.
19. G. J. M. Rosa, J. P. Steibel, and R. J. Tempelman, "Reassessing design and analysis of two-colour microarray experiments using mixed effects models," Comparative and Functional Genomics, vol. 6, no. 3, pp. 123–131, 2005.
20. J. Zou, S. Rodriguez-Zas, M. Aldea, et al., "Expression profiling soybean response to Pseudomonas syringae reveals new defense-related genes and rapid HR-specific downregulation of photosynthesis," Molecular Plant-Microbe Interactions, vol. 18, no. 11, pp. 1161–1174, 2005.
21. G. A. Milliken and D. E. Johnson, Analysis of Messy Data, Volume I: Designed Experiments, Wadsworth, Belmont, Calif, USA, 1984.
22. S. R. Searle, G. Casella, and C. E. McCulloch, Variance Components, John Wiley & Sons, New York, NY, USA, 1992.
23. M. Vuylsteke, F. van Eeuwijk, P. Van Hummelen, M. Kuiper, and M. Zabeau, "Genetic analysis of variation in gene expression in Arabidopsis thaliana," Genetics, vol. 171, no. 3, pp. 1267–1275, 2005.
24. X. Cui and G. A. Churchill, "How many mice and how many arrays? Replication in mouse cDNA microarray experiments," in Methods of Microarray Data Analysis III, S. M. Lin and K. F. Johnson, Eds., pp. 139–154, Kluwer Academic Publishers, Norwell, Mass, USA, 2003.
25. H. P. Piepho, A. Büchse, and K. Emrich, "A hitchhiker's guide to mixed models for randomized experiments," Journal of Agronomy and Crop Science, vol. 189, no. 5, pp. 310–322, 2003.
26. H. P. Piepho, A. Büchse, and C. Richter, "A mixed modelling approach for randomized experiments with repeated measures," Journal of Agronomy and Crop Science, vol. 190, no. 4, pp. 230–247, 2004.
27. J. Spilke, H. P. Piepho, and X. Hu, "Analysis of unbalanced data by mixed linear models using the MIXED procedure of the SAS system," Journal of Agronomy and Crop Science, vol. 191, no. 1, pp. 47–54, 2005.
28. D. Nettleton, "A discussion of statistical methods for design and analysis of microarray experiments for plant scientists," The Plant Cell, vol. 18, no. 9, pp. 2112–2121, 2006.
29. X. Cui, J. T. G. Hwang, J. Qiu, N. J. Blades, and G. A. Churchill, "Improved statistical tests for differential gene expression by shrinking variance components estimates," Biostatistics, vol. 6, no. 1, pp. 59–75, 2005.
30. G. K. Smyth, "Linear models and empirical Bayes methods for assessing differential expression in microarray experiments," Statistical Applications in Genetics and Molecular Biology, vol. 3, no. 1, article 3, 2004.
31. K. K. Dobbin, E. S. Kawasaki, D. W. Petersen, and R. M. Simon, "Characterizing dye bias in microarray experiments," Bioinformatics, vol. 21, no. 10, pp. 2430–2437, 2005.
32. R. C. Littell, "Analysis of unbalanced mixed model data: a case study comparison of ANOVA versus REML/GLS," Journal of Agricultural, Biological, and Environmental Statistics, vol. 7, no. 4, pp. 472–490, 2002.
33. S. R. Searle, Linear Models for Unbalanced Data, John Wiley & Sons, New York, NY, USA, 1987.
34. F. E. Satterthwaite, "An approximate distribution of estimates of variance components," Biometrics Bulletin, vol. 2, no. 6, pp. 110–114, 1946.
35. R. C. Littell, G. A. Milliken, W. W. Stroup, R. D. Wolfinger, and O. Schabenberger, SAS for Mixed Models, SAS Institute, Cary, NC, USA, 2nd edition, 2006.
36. J. Spilke, H.-P. Piepho, and X. Hu, "A simulation study on tests of hypotheses and confidence intervals for fixed effects in mixed models for blocked experiments with missing data," Journal of Agricultural, Biological, and Environmental Statistics, vol. 10, no. 3, pp. 374–389, 2005.
37. W. H. Swallow and J. F. Monahan, "Monte Carlo comparison of ANOVA, MIVQUE, REML, and ML estimators of variance components," Technometrics, vol. 26, no. 1, pp. 47–57, 1984.
38. R. N. Kackar and D. A. Harville, "Approximations for standard errors of estimators of fixed and random effects in mixed linear models," Journal of the American Statistical Association, vol. 79, no. 388, pp. 853–862, 1984.
39. M. G. Kenward and J. H. Roger, "Small sample inference for fixed effects from restricted maximum likelihood," Biometrics, vol. 53, no. 3, pp. 983–997, 1997.
40. F. G. Giesbrecht and J. C. Burns, "Two-stage analysis based on a mixed model: large-sample asymptotic theory and small-sample simulation results," Biometrics, vol. 41, no. 2, pp. 477–486, 1985.
41. A. H.-T. Fai and P. L. Cornelius, "Approximate F-tests of multiple degree of freedom hypotheses in generalized least squares analyses of unbalanced split-plot experiments," Journal of Statistical Computation and Simulation, vol. 54, no. 4, pp. 363–378, 1996.
42. W. W. Stroup and R. C. Littell, "Impact of variance component estimates on fixed effect inference in unbalanced linear mixed models," in Proceedings of the 14th Annual Kansas State University Conference on Applied Statistics in Agriculture, pp. 32–48, Manhattan, Kan, USA, April 2002.
43. J. D. Storey and R. Tibshirani, "Statistical significance for genomewide studies," Proceedings of the National Academy of Sciences of the United States of America, vol. 100, no. 16, pp. 9440–9445, 2003.
44. A. S. Hedayat, J. Stufken, and M. Yang, "Optimal and efficient crossover designs when subject effects are random," Journal of the American Statistical Association, vol. 101, no. 475, pp. 1031–1038, 2006.
45. J. P. Steibel, Improving experimental design and inference for transcription profiling experiments, thesis, Department of Animal Science, Michigan State University, East Lansing, Mich, USA, 2007.
46. D. B. Allison, X. Cui, G. P. Page, and M. Sabripour, "Microarray data analysis: from disarray to consolidation and consensus," Nature Reviews Genetics, vol. 7, no. 1, pp. 55–65, 2006.
47. I. Lönnstedt and T. Speed, "Replicated microarray data," Statistica Sinica, vol. 12, no. 1, pp. 31–46, 2002.
48. G. K. Robinson, "That BLUP is a good thing: the estimation of random effects," Statistical Science, vol. 6, no. 1, pp. 15–51, 1991.
49. P. Baldi and A. D. Long, "A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes," Bioinformatics, vol. 17, no. 6, pp. 509–519, 2001.
50. V. G. Tusher, R. Tibshirani, and G. Chu, "Significance analysis of microarrays applied to the ionizing radiation response," Proceedings of the National Academy of Sciences of the United States of America, vol. 98, no. 9, pp. 5116–5121, 2001.
51. G. W. Wright and R. M. Simon, "A random variance model for detection of differential gene expression in small microarray experiments," Bioinformatics, vol. 19, no. 18, pp. 2448–2455, 2003.
52. G. K. Smyth, J. Michaud, and H. S. Scott, "Use of within-array replicate spots for assessing differential expression in microarray experiments," Bioinformatics, vol. 21, no. 9, pp. 2067–2075, 2005.
53. G. Gibson and R. D. Wolfinger, "Gene expression profiling using mixed models," in Genetic Analysis of Complex Traits Using SAS, A. M. Saxton, Ed., pp. 251–279, SAS Users Press, Cary, NC, USA, 2004.

These microarrays, also simply referred to as arrays or slides, generally contain thousands of probes with generally a few (≤4) spots per probe, and most often just one spot per probe. Each probe specifically hybridizes to a matching mRNA transcript of interest within each sample. After hybridization, microarray images are scanned at two different wavelengths as appropriate for each dye, thereby providing two different fluorescence intensity measurements for each probe. Upon further preprocessing or normalization [ 15 ], these dye-specific intensities for each probe are believed to reflect the relative mRNA abundance for the corresponding transcript within the respectively labeled samples. The normalized intensities, or the Cy3/Cy5 ratio thereof, for each spot are typically logarithmically transformed to render data that is generally characterized to be approximately normally distributed. An increasingly unifying and indisputable message is that the heavily used common reference design is statistically inefficient [ 1 , 9 , 10 , 12 , 13 ]. Here, the same common reference sample or pool is reused as one of the two samples on every microarray, the other sample deriving from a treatment group of interest. Hence, inferences on differential expression are based only on indirect connections across arrays as samples from different treatments of interest are never directly connected or hybridized together on the same microarray. In contrast, most of the alternatively proposed efficient designs are incomplete block designs, the most popular being various deviations of the loop design as first proposed for microarrays by Kerr and Churchill [ 16 ]. In these designs, direct connections or hybridizations are typically reserved for the most important treatment comparisons with inference on other comparisons being generally as efficient as any based on the common reference design. The intent of this review is to reemphasize the use of mixed models as the foundation for statistical analysis of efficient factorial designs for microarrays. Mixed model analysis for microarray data was first proposed by Wolfinger et al. [ 17 ]. However, this and other previous expositions on the use of mixed model analysis for microarray data have been primarily directed towards the analysis of completely balanced designs [ 18 , 19 ] whereas many recently proposed designs for microarray studies are unbalanced with respect to, for example, different standard errors on all pairwise comparisons between treatment groups [ 10 , 13 ]. We will review various aspects of mixed model analysis for unbalanced designs, including a demonstration on publicly available data from a recent plant genomics study [ 20 ]. 2. The Connection between Mixed Models and Efficient Designs Efficient experimental designs are typically constructed such that their factors can be broadly partitioned into two categories: treatment structure factors and design structure factors [ 21 ]. The treatment structure naturally includes the factors of greatest interest; for example, herbicides, genotypes, tissues, and so forth, whose effects are deemed to be fixed. In other words, the levels of these fixed effects factors are specifically chosen by the investigator such that mean comparisons between such levels, for example, different treatments, are of primary interest. These factors also include any of whose levels are consistently reused over different experiments, such as dye labels, for example, Cy3 versus Cy5, for two-color microarrays. 
On the other hand, the design structure primarily includes random effects factors, whereby the levels of each such factor are considered to be randomly chosen from a conceptually infinite set of such levels [ 22 ]. For example, the specific arrays used for a microarray study are considered to be a random sample from a large, perhaps hypothetically infinite, population of arrays; similar claims would be made regarding biological replicates, for example, plants, pools thereof, or even field plots as dependent upon the experimental design [ 14 ]. Within each random-effects factor, the effects are typically specified to be normally, independently, and identically distributed (NIID) with variability in effects formally quantified by a variance component (VC). These design structure or random effects factors are typically further partitioned into two subcategories: blocking factors and experimental error factors. In two-color microarray experiments, arrays are typically blocking factors as treatments can be directly compared within arrays, although this is not true for the common reference design as previously noted. Blocking represents a longstanding and efficient experimental design strategy for improving precision of inference on treatment comparisons. Experimental error factors, such as plants or pooled samples thereof within treatments, are often necessary to be included as random effects in order to properly specify true experimental replication at the biological level rather than merely at the measurement or technical level. Such specifications are particularly required when multiple aliquots are derived from the same biological specimen for use in multiple arrays [ 20 , 23 ] or when probes for each gene transcript are spotted more than once on each array. Of course, plants may also alternatively serve as blocking factors in some designs if different tissues are compared within plants. Currently, there is much software available for microarray data analysis, some of which is only suited for studies having only a treatment structure but no pure design structure. Common examples include the analysis of data generated from single channel systems (e.g., Affymetrix) or of log ratios generated from common reference designs. When no random effects are specified, other than the residuals, the corresponding statistical models are then simply fixed-effects models. Ordinary least squares (OLS) inference is then typically used to infer upon the treatment effects in these studies. OLS is appropriate if the assumption is valid that there is only one composite residual source of variability such that the residuals unique to each observation are NIID. Conversely, statistical analysis of efficient two-color experiments having a fully integrated treatment and design structure needs to account for fixed and random effects as typical of a mixed effects model, more often simply referred to as a mixed model. Generalized least squares (GLS) analysis, also referred to as mixed-model analysis, has been recognized as optimal in terms of minimizing variance of estimates for inference on treatment comparisons. This is true not only for efficient microarray designs [ 10 , 17 , 19 , 24 ] but even for general plant science and agronomy research [ 25 – 27 ], including recent applications in plant genomics research [ 20 , 23 , 28 ]. Some of the more recently popular microarray data analysis software has some mixed model analysis capabilities [ 29 , 30 ]. 
Recall that some designs may be characterized by different levels of variability thereby requiring particular care in order to properly separate biological from technical replication, for example. Hence, it is imperative for the data analyst to know how to correctly construct the hypothesis test statistics, including the determination or, in some cases, the estimation of the appropriate degrees of freedom. Although, some of these issues have been discussed for balanced designs by Rosa et al. [ 19 ], they have not generally been carefully addressed for the analysis of microarray data generated from unbalanced designs. Optimally constructed experimental designs are often unbalanced with respect to inference on all pairwise treatment comparisons, such that even greater care for statistical inference is required than in completely balanced designs. For example, Wit et al. [ 13 ] proposed a method for optimizing two-color microarray designs to compare any number of treatment groups. Suppose that 9 different treatment groups are to be compared. Using the methods and software developed by Wit et al. [ 13 ], the recommended interwoven loop design that is optimized for A-optimality (lowest average squared standard errors for a particular arrangement of treatment comparisons) is provided in Figure 1 . Although Figure 1 appears to be visually symmetric with respect to the treatment labels, including that all treatment groups are dye balanced, not all treatment groups are directly hybridized against each other. Hence, inferences on all pairwise comparisons between treatment groups will not be equally precise. For example, the standard errors for the inference on treatments R2 versus R8 or R8 versus R24 will not be the same as that for treatments R8 versus S24 or R8 versus M2 due to the differences in the number and/or degree of direct and indirect connections for these two sets of comparisons in Figure 1 . Figure 1: Optimized interwoven loop design for 9 treatments using R package SMIDA (Wit et al., 2005). Each circle represents a different treatment group. Each arrow represents a single array hybridization with circle base representing the Cy3 labeled sample and tail end representing the Cy5 labeled sample. Even for some balanced factorial designs, where the standard errors for comparing mean differences for levels of a certain factor are the same for all pairwise comparisons, the experimental error structure can vary substantially for different factors. That is, substantial care is required in deriving the correct test statistics, particularly with split plot arrangements [ 14 ]. Of course, even when a completely balanced design is intended, data editing procedures that delete poor quality spots for certain genes would naturally result in unbalanced designs. 3. Case Study 3.1. Design Zou et al. [ 20 ] present an experiment where three different inoculate treatments were applied to soybean ( Glycine max. ) plants 14 days after planting. The three different inoculates included bacteria inoculation along with the avirulence gene avrB thereby conferring resistance (R), bacteria inoculation without avrB thereby conferring susceptibility (S), and a control group whereby the inoculate simply contained an MgCl 2 solution (M). Unfoliated leaves from three to four plants were drawn and pooled for each treatment at each of three different times after postinoculation; 2, 8, and 24 hours. Hence, the treatment structure was comprised of a factorial, that is, 3 inoculates ×3 times, for a total of 9 groups. 
A 10th group involving a fourth null inoculate with leaves harvested at 2 hours postinoculation, N2, was additionally studied by Zou et al. [ 20 ]. The complete dataset on gene expression data for all 27 684 genes represented on a set of three microarray platforms as used by Zou et al. [ 20 ] is available as accession number GSE 2961 from the NCBI gene expression omnibus (GEO) repository ( http://www.ncbi.nlm.nih.gov/geo/ ). The vast majority of the corresponding probes were spotted only once per array or slide for each platform. A graphical depiction of the 13 hybridizations that superimposes the design structure upon one replicate of the factorial treatment structure plus the additional 14th hybridization involving the 10th group N2 is illustrated in Figure 2 . Note that at least two aliquots per each pooled sample are used, each aliquot being labeled with different dyes such that each replicate pool is used in at least two different hybridizations or arrays with opposite dye assignments. In other words, this design is characterized by technical replication such that it is imperative to explicitly model samples within inoculate by time combination as the biological replicates, that is, a set of random effects for modeling experimental error. Failing to do so would confuse pseudoreplication with true replication in the statistical analysis as each of the 2+ aliquots per each pool would then be incorrectly counted as 2+ different experimental replicates. The design in Figure 2 was replicated twice by Zou et al. [ 20 ], the second replication being of the exact same dye assignment and hybridization orientation as the first, for a total of 28 hybridizations. Hence, there were 20 samples (pools of leaves) utilized in the experiment, 2 per each of the 9 inoculate by time treatment groups plus 2 samples for the N2 control. Figure 2: Experimental design for one replicate from Zou et al. (2005). Treatments included a full factorial of inoculate and time effects plus a 10th null control group at time 2 (N2). Samples indicated by circles with letters indicating inoculate assignment: bacteria resistant (R), a bacteria susceptible (S), and MgCl 2 (M) control inoculate and numbers indicating time (2, 8, or 24 hours) after inoculation. Each arrow represents a single array hybridization with circle base representing the Cy3 labeled sample and tail end representing the Cy5 labeled sample. Solid arrows refer to the A-loop design of Landgrebe et al. (2006). We arbitrarily consider gene expression measurements for just one particular gene based on the GEO submission from Zou et al. [ 20 ]: ID_REF #30 located in the metarow-metacolumn-row-column location 1-1-2-14 of each array from microarray platform GPL 1013, one of three different platforms used by Zou et al. [ 20 ] and further described in GEO. The statistical analysis of any of the remaining 27 683 genes that were spotted once on each slide across the three different platforms would be exactly the same as that for ID_REF #30, at least for those genes where no observations would be edited out for poor data quality. We use the normalized Cy3 and Cy5 data, provided as fields S532N and S635N in accession number GSE 2961 for ID_REF #30 from GEO. Hence, for the 28 hybridizations considered for two replications of Figure 2 , there were 56 fluorescence intensities (28 Cy3 and 28 Cy5) for each gene. The 56 fluorescence intensities for ID_REF #30, as retrieved from GSE 2961 in GEO, are reproduced in Table 4 . 3.2. 
Statistical Model For the purposes of this review, we concentrate our attention just on the subdesign characterized by the solid arrows in Figure 2 that connect the three primary inoculates (R,S, and M) together within each of the 3 different times (2, 8, and 24 hours). The remaining dashed lines in Figure 2 involve either the 10th group (N2) or connect adjacent times (2 with 8 and 8 with 24) within each of two inoculates (R and S); note that no hybridizations connecting any of the three times within inoculate M were provided with GSE 2961 on GEO. Labeling inoculate type as Factor A and time after inoculation as Factor B, the resulting subdesign is an example of the “A-loop” design presented by Landgrebe et al. [ 9 ] as illustrated in their Figure 2 (B), albeit for a factorial treatment structure in their case. In other words, the only direct connections between the 9 treatment groups within arrays involve comparisons of levels of Factor A within levels of Factor B. Using the log intensities as the response variables for further statistical analysis, an appropriate linear mixed model to specify for this A-loop design would be as follows: where is the log fluorescence intensity pertaining to the l th biological replicate assigned to the th inoculate and th time labeled with the k th dye ( k = 1 or 2), and hybridized to array within the th time. Here, μ is the overall mean, is the effect of the i th inoculate, is the effect of the j th time, is the interaction effect between the i th inoculate and j th time, and is the effect of the k th dye, all of which are defined to be fixed effects. The design structure component of ( 1 ) is defined by the random effects of for the l th pool or biological replicate ( l = 1, 2) within the ij th inoculate-time combination, for the m th array or slide within the j th time, and the residual unique to the same subscript identifiers as that for . The typical distributional assumptions in mixed models are such that each of the three sets of random effects are NIID with their own VC; that is, , , and . As clearly demonstrated by Dobbin et al. [ 31 ] and based on our experiences, dye effects should be modeled in ( 1 ), even after using global normalization procedures such as loess [ 15 ], as gene-specific dye effects are common. Nevertheless, one would not normally anticipate interaction effects between dye and other treatment factors (e.g., inoculate or time), and hence these effects are not specified in ( 1 ). It should be somewhat apparent from the A-loop design of Figure 2 why the nesting or hierarchical specifications are specified as such for the random effects. For example, although each pool or replicate is labeled twice, once with each dye, each pool is still part of or nested within the same inoculate by time combination such that samples or replicates are specified to be nested within inoculate by time. Similarly, arrays are nested within times since each array is associated with only one particular level of time; that is, different times are never directly compared or connected within arrays. Hence, one should intuitively recognize from Figure 2 that there would be greater precision for inferring upon inoculate effects than for time effects using the A-loop design. That is, the variability due to arrays is completely confounded with time differences such that it partly defines the experimental unit or replicate for time. 3.3. 
Classical ANOVA The complex nature of different levels of replication in the A-loop this design is further confirmed in the classical analysis of variance or ANOVA [ 21 ] for this design in Table 1 . However, as demonstrated later, classical ANOVA is not necessarily equivalent to a more optimal GLS or mixed model analysis [ 32 ]; in fact, estimates of treatment effects based on classical ANOVA are simply equivalent to OLS estimates where all factors are treated as fixed. Nevertheless, the classical ANOVA table, when extended to include expected mean squares (EMS), is instructive in terms of identifying different levels of replication and hence experimental error. Table 1: Classical ANOVA of log intensities for duplicated A-loop design component of Figure 2 for any particular gene using ( 1 ). Classical ANOVA is based on equating sums of squares (SS), also called quadratic forms, to their expectations; typically this involves equating mean squares (MS), being SS divided by their degrees of freedom (ν), to their EMS. For completely balanced designs, there is generally one universal manner in which these quadratic forms, and hence the ANOVA table, are constructed [ 19 , 22 ]. However, for unbalanced designs, such as all of or even just the A-loop component of Figure 2 , there are a number of ways of constructing different quadratic forms and hence different ways of constructing ANOVA tables for the same set of data [ 21 , 32 ]. The most common ANOVA strategy is based on the use of type III quadratic forms as in Table 1 whereby the SS for each factor is adjusted for every other factor in the model. More details on type III and alternative ANOVA quadratic forms for unbalanced data can be found in Milliken and Johnson [ 21 ] and Searle [ 33 ]. Table 1 conceptually illustrates the basic components of an ANOVA table; again, for every term, say X, in a statistical model like ( 1 ), there is a sum of squares , degrees of freedom , mean square , and expected mean square . Generally, ANOVA tests on fixed effects are of greatest interest; for example, inoculate, time, and inoculate by time interaction. The correct F ratio test statistic for any fixed effects term in the ANOVA table is constructed such that its MS and a denominator MS have the same EMS if the null hypothesis is true; that is, that there are truly no effects for that particular term. In statistical parlance, no effects for a term , whether that pertains to the main effects of a factor or the interaction effects between two or more factors, is synonymous with its corresponding noncentrality parameter being equal to zero; that is, there is no signal due to that model term [ 32 ]. Consider, for example, the test for the main effects of inoculate denoted as Factor A in Table 1 . If the main effects of inoculate are nonexistent, that is, there are no overall or marginal mean differences between any of the inoculates, then . It should be clearly noted that when , the EMS for inoculate matches with the EMS for replicate within inoculate and time, denoted as rep(inoculate * time) in Table 1 . In other words, rep(inoculate * time) is said to be the denominator or error term for the main effects of inoculate such that rep(inoculate * time) defines the experimental unit or the biological replicate for inoculate effects. Hence, the correct F statistic for testing inoculate effects, as demonstrated from Table 1 , is based on numerator and denominator degrees of freedom. 
It should also be observed that this same error term or experimental unit would be specified as the denominator MS term for the ANOVA -test on inoculate by time interaction effects, denoted as inoculate * time in Table 1 . That is, when the corresponding noncentrality parameter , both inoculate * time and rep(inoculate * time) share the same EMS such that the correct F statistic for testing this interaction is based on numerator and denominator degrees of freedom. It was previously noted from the A-loop design of Figure 2 that inference on the main effects of time (Factor B) should be less precise than that for the main effects of inoculate. In other words, the size of the experimental unit should be larger for time effects since arrays are nested within levels of time whereas levels of inoculate treatments are directly compared within arrays. This is further demonstrated in Table 1 by the EMS for time with , being larger than that for inoculate effects with , under the corresponding true null hypotheses of no main effects for either factor. In fact, the experimental error term for time is composite of both rep(inoculate * time) and arrays(time) such that marginal mean comparisons between the three times, 2, 8, and 24 hours, will be affected by more noise than marginal mean comparisons between the three inoculates which were directly and indirectly connected within arrays. Note that under the null hypothesis of no time effects , there is no one other MS that shares the same EMS that would allow one to readily construct an ANOVA F -statistic for the main effects of time. Satterthwaite [ 34 ] provided a solution to this problem by proposing the “synthesis” of a denominator MS , call it , as being a linear combination of q random effects MS: where are known coefficients such that has the same expectation as that for a certain model term having mean square under the null hypothesis . Then is approximately distributed as a random variable from a central F distribution with numerator and denominator degrees of freedom, where with denoting . In our example, consider the synthesized as being a linear combination of the MS for rep(inoculate * time), array(time), and residual. With reference to ( 2 ), is then a linear function of different MS with , , and . Using the EMS for these three MS provided from Table 1 as , , and , respectively, it should be readily seen that the expectation of is then That is, shares the same EMS as that for time in Table 1 when . Hence, a suitable F statistic for inferring upon the main effects of time would be . To help further illustrate these concepts, let us conduct the ANOVA on the data generated from the A-loop design of Figure 2 for ID_REF #30 from Zou et al. [ 20 ]; that is, using data from arrays 1–9 and 15–23 as provided in Table 4 . The classical ANOVA table using the method=type3 option of the popular mixed-model software SAS PROC MIXED [ 35 ] for that particular gene is provided in Table 2 ; SAS code for all statistical analysis presented in this paper is provided in Figure 3 and also available for download, along with the data in Table 4 , from http://www.msu.edu/~tempelma/ijpg2008.sas . As noted previously, the correct denominator MS term for testing the main effects of inoculate is replicate within inoculate by time. Hence, the corresponding F statistic = = , with numerator and denominator degrees of freedom leading to a P -value of 0.1172. 
Similarly, for the inoculate * time interaction, the appropriate F-test statistic is = , with numerator and denominator degrees of freedom leading to a P -value of 0.3435. Even without considering the control of false discovery rates (FDRs) that involve the joint control of type I errors with respect to the remaining 27 683 genes, it seems apparent that neither the main effects of inoculate nor the interaction between inoculate and time would be statistically significant for gene ID_REF #30. Table 2: Classical ANOVA of log intensities for duplicated A-loop design component of Figure 2 on ID_REF #30 from Zou et al. (2005) using output from SAS PROC MIXED (code in Figure 3 ). Figure 3: SAS code for classical ANOVA and EGLS inference. Comments describing purpose immediately provided after corresponding code between / * and * / as with a regular SAS program. EGLS based on REML would simply involve substituting method = reml for method = type3 in the third line of the code. The synthesized denominator for time effects is . The estimated degrees of freedom for this synthesized MS using ( 3 ) is then Hence, the main effects of time, appropriate F -test statistic is = , with numerator and denominator degrees of freedom leading to a P -value of 0.0683 as also reported in the SAS output provided in Table 2 . 3.4. Mixed Model Analysis Although the classical ANOVA table is indeed instructive in terms of illustrating the different levels of variability and experimental error, it is not the optimal statistical analysis method for a mixed effects model, especially when the design is unbalanced. A mixed-model or GLS analysis more efficiently uses information on the design structure (i.e., random effects) for inferring upon the fixed treatment structure effects [ 27 , 32 ]. Unfortunately, GLS, in spite of its optimality properties, is generally not attainable with real data because the VC (e.g., , , and ) must be known. Hence, the VC must generally be estimated from the data at hand. There are a number of different methods that are available for estimating VC in mixed models [ 22 ]. The classical ANOVA method is based on equating MS to their EMS in the ANOVA table. For example, using the bottom row of Table 1 , the EMS of MSE is . So then using the numerical results for ID_REF #30 from Table 2 , the type III ANOVA estimate of is simply . Now work up one row further in Table 1 to the term array(time). Equating from the same corresponding row in Table 2 to its EMS of using gives . Finally, work up one more (i.e., third to last) row in both tables. Equating from Table 2 to its EMS of using leads to . So array variability is estimated to be roughly four times larger than the biological variability which, in turn, is estimated to be somewhat larger than residual variability for ID_REF #30. Recall that with unbalanced designs, quadratic forms are not unique such that ANOVA estimators of VC will not be unique either. Nevertheless, type III quadratic forms are most commonly chosen as then the SS for each term is adjusted for all other terms, as previously noted. Although ANOVA estimates of VC are unbiased, they are not efficient nor optimal in terms of estimates having minimum standard error [ 25 ]. Restricted maximum likelihood (REML) is a generally more preferred method of VC estimation [ 22 , 36 , 37 ] and is believed to have more desirable properties. Nevertheless, the corresponding REML estimates , and for ID_REF #30 are in some qualitative agreement with the previously provided ANOVA estimates. 
Once the VCs are estimated, they are substituted for the true unknown VCs to provide the “estimated” GLS or EGLS of the fixed effects. It is important to note that typically EGLS = GLS for balanced designs, such that knowledge of VC is somewhat irrelevant for point estimation of treatment effects. However, the same is generally not true for unbalanced designs, such as either the A-loop design derived from Figure 2 or even the interwoven loop design from Figure 1 . Hence, different methods of VC estimation could lead to different EGLS estimates of treatment effects as we demonstrate later. Suppose that it was of interest to compare the various mean responses of various inoculate by time group combinations in the duplicated A-loop design example. Based on the effects defined in the statistical model for this design in ( 1 ), the true mean response for the i th inoculate at the j th time averaged across the two dye effects ( and ) can be written as If the levels are, say, ordered alphanumerically, the mean difference between inoculate and at time (2 hours) is specified as . Using ( 6 ), this difference written as a function of the model effects is then = . Similarly, the mean difference between time (2 hours) and time (8 hours) for inoculate could be derived as Note that these two comparisons or contrasts can be more elegantly written using matrix algebra notation. A better understanding of contrasts is useful to help determine the correct standard errors and statistics used to test these contrasts, including how to write the corresponding SAS code. Hence, a matrix algebra approach to hypothesis testing on contrasts is provided in Appendix 5 that complements the SAS code provided in Figure 3 . For now, however, we simply use the “hat” notation in referring to the EGLS estimates of these two contrasts as and respectively. As we already intuitively noted from the A-loop design of Figure 2 , inference on should be much more precise than that for since inoculates are compared within arrays whereas times are not. This distinction should then be reflected in a larger standard error for than for . Indeed, using the REML estimates of VC for EGLS inference, this is demonstrated by whereas for ID_REF #30. However, these standard errors are actually slightly understated since they do not take into account the uncertainty of the VC estimates as discussed by Kackar and Harville [ 38 ]. Kenward and Roger [ 39 ] derive a procedure to take this uncertainty into account which is part of the SAS PROC MIXED implementation using the option ddfm=kr [ 35 ] as indicated in Figure 3 . Invoking this option raises the two standard errors accordingly, albeit very slightly, to and . Now, the denominator degrees of freedom for inference on these two contrasts should also differ given that the nature of experimental error variability somewhat differs for inoculate comparisons as opposed to time comparisons as noted previously from Figure 2 . However, with EGLS, there are no SS and hence no corresponding MS or EMS expression for each main effects or interaction term in the model, such that determining the correct test statistic and degrees of freedom is somewhat less obvious than with the previously described classical ANOVA approach [ 32 ]. Giesbrecht and Burns [ 40 ] introduced a procedure for estimating the denominator degrees of freedom for EGLS inference which, again, is invoked with the ddfm=kr option of SAS PROC MIXED. 
Using this option along with REML estimation of VC for the analysis of ID_REF #30, the estimated degrees of freedom for is 5.28 whereas that for is 17.0. Contrasts are also used in EGLS to provide ANOVA-like F tests for the overall importance of various fixed effects; more details based on the specification of contrast matrices to test these effects are provided in Appendix 5 . For example, denote the marginal or overall mean of inoculate i averaged across the 3 times and 2 dyes as . The numerator degrees of freedom hypothesis test for the main effects of inoculates can be written as a combination of two complementary contrasts (A1) and (A2) ; that is, if both contrasts are 0, then obviously is also true such that then is true. Similarly, let us suppose that one wished to test the main effects of times (Factor B). Then, it could be readily demonstrated that the corresponding hypothesis test can also be written as a combination of complementary contrasts: (B1) and (B2) , where denotes the marginal mean for the j th level of Factor B; that is, the j th time. If both component hypotheses (B1) and (B2) are true, then is also true thereby defining the composite numerator degrees of freedom hypothesis test for the main effects of Factor B. Now the interaction between inoculate and time is a numerator degrees of freedom test as previously noted from Tables 1 and 2 , suggesting that there are 4 complementary contrasts that jointly test for the interaction of the two factors. Of course, it is also well known that the interaction degrees of freedom is typically the product of the main effects degrees of freedom for the two factors considered. Two of the four degrees of freedom for the interaction involve testing whether or not the mean difference between inoculates 1 and 3 is the same within time 1 as it is within time 3, that is, (AB1) , and whether or not the mean difference between inoculates 2 and 3 is the same within time 1 as it is within time 3; that is, (AB2) . If both hypotheses (AB1) and (AB2) are true then it should be apparent that is also true; that is, the mean difference between inoculates 1 and 2 is the same within time 1 as it is within time 3. The remaining two degrees of freedom for the interaction involve testing whether or not the mean difference between inoculates 1 and 3 is the same within time 2 as it is within time 3; that is, (AB3) , and whether or not the mean difference between inoculates 2 and 3 is the same within time 2 as it is within time 3; that is, (AB4) . If both hypotheses (AB3) and (AB4) are true then is also true. Hence, contrasts AB1, AB2, AB3, and AB4 completely define the four components or numerator degrees of freedom for the interaction between Factors A and B. That is, the test for determining whether or not the mean differences between all levels of A are the same within each level of B, and vice versa, can be fully characterized by these four complementary contrasts. The EGLS statistics used for testing the overall importance of these main effects or interactions are approximately distributed as F -random variables with the numerator degrees of freedom defined by the number of complementary components or contrasts as previously described; refer to Appendix 5 and elsewhere [ 27 , 32 , 35 ] for more details. 
Now, the denominator degrees of freedom for each contrast are dependent upon the design and can be determined based on that using classical ANOVA as in Table 1 or by a multivariate extension of the Satterthwaite-based procedure proposed by Fai and Cornelius [ 41 ]; again this option is available as ddfm=kr using SAS PROC MIXED (Figure 3 ). Unfortunately, much available software used for mixed model analysis of microarray data does not carefully take into consideration that various fixed effects terms of interest may have different denominator degrees of freedom when constructing F test statistics. In fact, a typical strategy of such software is to assume that (i.e., the residual degrees of freedom) is the denominator degrees of freedom for all tests. This strategy is denoted as the “residual” method for determining denominator degrees of freedom by Spilke et al. [ 36 ] who demonstrated using simulation work that the use of the residual method can substantially inflate type I error rate for EGLS inference on fixed effects; in other words, the number of false-positive results or genes incorrectly declared to be differentially expressed between treatments would be unduly increased. Spilke et al. [ 36 ] further demonstrated that use of the Kenward-Rogers’ method for degrees of freedom estimation and control for uncertainty on VC provided best control of the nominal confidence interval coverage and type I error probabilities. 3.5. Impact of Method of Variance Component Estimation on EGLS It was previously noted that the estimated standard errors for EGLS on two contrasts and were and , respectively, when REML was used to estimate the variance components for ID_REF #30. If the VC estimates are computed using type III ANOVA, then these estimated standard errors would differ accordingly; that is, and , respectively. What perhaps is even more disconcerting is that the estimates of and also differ between the two EGLS inferences; for example, using REML, and whereas using ANOVA and . The overall EGLS tests for ID_REF #30 for testing the main effects of inoculate, time and their interaction as based on the previously characterized complementary contrasts are provided separately for ANOVA versus REML estimates of VC in Table 3 ; this output is generated as type III tests using the SAS code provided in Figure 3 . From here, it should be clearly noted that conclusions upon the overall importance of various fixed effects terms in ( 1 ) as derived from EGLS inference subtly depend upon the method of VC estimation; for example, the EGLS P -values in Table 3 tend to be several points smaller using ANOVA compared to REML; furthermore, note the differences in the estimated denominator degrees of freedom between the two sets. Naturally, this begs the question as to which method of VC estimation should be used? Table 3: EGLS inference on overall importance of fixed effects for ID_REF #30 based on REML versus ANOVA (type III quadratic forms) for estimation of variance components using output from SAS PROC MIXED (code in Figure 3 ). Table 4: Dataset for ID_REF #30 for all hybridizations (14 arrays/loop x2 loops) in Figure 1 for each of two replicates per 10 inoculate by time groups, fluorescence intensities provided as y, log(base 2) intensities provided as ly. In completely balanced designs, ANOVA and REML lead to identical estimates of VC and identical EGLS inference, provided that all ANOVA estimates of VC are positive. 
VC that ANOVA would estimate as negative are generally constrained to zero by REML, thereby causing a “ripple” effect on the REML estimates of other VC and subsequently on EGLS inference [42]. As noted previously, REML does tend to outperform most other methods for various properties of VC estimation [37]. Furthermore, there is evidence that EGLS based on ANOVA leads to poorer control of the type I error rate for inference on fixed effects compared to EGLS based on REML in unbalanced data structures [36]. However, Stroup and Littell [42] concluded that EGLS using REML may sometimes lead to inference on fixed effects that is too conservative (i.e., actual error rates less than the nominal type I error rate), again due to the nonnegative REML restrictions on the VC estimates and the associated ripple effects. This issue warrants further study given its implications for control of the false discovery rate (FDR), which is most commonly used to control the rate of type I errors in microarray studies [43]. Estimation of FDR inherently depends upon the distribution of P-values for treatment effects across all genes, such that even mild perturbations of this distribution can bias the control of false-positive rates.

4. Other Issues for the Design-Analysis Interface

4.1. Log Ratio versus Log Intensity Modeling

Recent work on the optimization and comparison of various efficient microarray designs has been based on the assumption of OLS inference; that is, no random sources of variability other than residuals are considered [2, 8, 9, 13]. While this may seem counterintuitive given the arguments laid out in this review for the need for (E)GLS to analyze efficient designs, it is important to note at least a couple of things. First, virtually all of the work on design optimization has been based on the assumption that a sample or pool is used only once; the corresponding interwoven loop designs in such cases [13] have been referred to as classical loop designs [10, 19]. However, sometimes two or more aliquots from each sample are used in separate hybridizations [20, 23], as in the A-loop design example used in this review; the corresponding designs are connected loop designs [10, 19] that require the specification of random biological replicate effects separate from residual effects, as previously noted. Secondly, almost all of the design optimization work has been based on the use of Cy3/Cy5 log ratios as the response variables rather than the dye-specific log intensities used in this review. This data reduction, that is, from two fluorescence intensities to one ratio per spot on an array, certainly eliminates array as a factor to specify in a linear model. However, the use of log ratios can severely limit estimability and inference efficiency for certain comparisons. Suppose that instead of using the 36 log intensities from the duplicated A-loop design from arrays 1–9 and 15–23 of Table 4, we used the 18 derived Cy3/Cy5 log ratios as the response variables. For example, the two corresponding log (base 2) Cy3 and Cy5 fluorescence intensities for array 1 from Table 4 are 13.9946 and 14.3312. The Cy3/Cy5 log ratio is then the difference between these two values, corresponding to an approximately 1.26-fold difference between the two channels. Using log ratios as their response variables, Landgrebe et al. [9] concluded that it was impossible to infer upon the main effects of Factor B (e.g., time) in the A-loop design.
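The estimability problem with log ratios in the A-loop design can be seen directly from the model algebra sketched below; the effect symbols used here (T for time, I for inoculate, IT for their interaction, D for dye, a for array, r for biological replicate, e for residual) are labels chosen for this sketch rather than the paper's own notation for model (1).

```latex
% Log ratio for an array within time j that hybridizes inoculate i' (Cy5) against inoculate i (Cy3):
\begin{aligned}
\log_2\frac{\mathrm{Cy5}}{\mathrm{Cy3}}
  &= \bigl(T_j + I_{i'} + IT_{i'j} + D_{\mathrm{Cy5}} + a + r_{i'} + e_{i'}\bigr)
   - \bigl(T_j + I_{i} + IT_{ij} + D_{\mathrm{Cy3}} + a + r_{i} + e_{i}\bigr)\\
  &= (I_{i'} - I_{i}) + (IT_{i'j} - IT_{ij}) + (D_{\mathrm{Cy5}} - D_{\mathrm{Cy3}})
   + (r_{i'} - r_{i}) + (e_{i'} - e_{i}).
\end{aligned}
```

Because both samples on an array share the same time level in this design (arrays are nested within times), the array effect and the time main effect cancel from every log ratio, which is consistent with the conclusion of Landgrebe et al. [9] that time main effects cannot be inferred from the ratios alone.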
However, as we demonstrated earlier, it is possible to infer upon these effects using ANOVA or EGLS analysis of the log intensities. Jin et al. [18] similarly illustrate the utility of log intensity analysis in a split-plot design that would not otherwise have been possible using log ratios. Milliken et al. [14] provide much more extensive mixed modeling details on the utility of log intensity analysis in nested or split-plot microarray designs similar to the A-loop design. The relative efficiency of some designs may also depend upon the relative magnitude of biological to technical variation [10, 44]; sometimes it is only possible to separately estimate these two sources of variability using log intensities rather than log ratios, thereby requiring the use of (E)GLS rather than OLS. In fact, analysis of log intensities using mixed effects models appears to be not only more flexible than log-ratio modeling but also statistically more efficient in recovering data information [1, 45]. That is, as also noted by Milliken et al. [14], treatment effects are more efficiently estimated by combining intraarray and interarray information in a mixed model analysis when an incomplete block design is used and arrays are explicitly included as random effects by analyzing log intensities rather than log ratios.

4.2. Choosing between Efficient Experimental Designs Using Mixed Models

There are a number of different criteria that might be used to choose between different designs for two-color microarrays. We have already noted that the interwoven loop design in Figure 1 is A-optimal for pairwise comparisons between 9 treatment groups. A-optimality has been criticized for microarray studies because it chooses designs with improved efficiency for certain contrasts at the expense of other, perhaps more relevant, contrasts and further depends upon the parameterization of the linear model [1, 6, 9]; other commonly considered types of optimality criteria are possible and are further discussed by Wit et al. [13] and Landgrebe et al. [9]. At any rate, it is possible, to some extent, to modify A-optimality to explicitly take into account a particular set of scientific questions [13]; furthermore, optimization with respect to one criterion will generally be nearly optimal for others. For one particular type of optimality criterion, Landgrebe et al. [9] demonstrated that the A-loop design has the best relative efficiency compared to other designs for inference on the main effects of Factor A and the interaction effects between A and B, although the main effects of Factor B could not be inferred upon using an analysis of log ratios, as previously noted. How does the A-loop design of Figure 2 generally compare to the interwoven loop design of Figure 1 if a factorial treatment structure is imposed on the 9 treatments, as implied by the same labels as used in Figure 2? Suppose that Figure 1 is a connected interwoven loop design [10] in the sense that the outer loop of Figure 1 (dashed arrows) connects one biological replicate for each of the 9 groups whereas the inner loop of Figure 1 (solid arrows) connects a second biological replicate for each of the 9 groups. Then this design would consume 18 biological replicates and 18 arrays, thereby providing a fair comparison with the duplicated A-loop design of Figure 2. Recall that Figure 1 is A-optimized for pairwise comparisons between all 9 groups.
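For reference, A-optimality for pairwise comparisons is conventionally understood as choosing the design that minimizes the average variance of all estimated pairwise differences among the t = 9 treatment group means; this standard textbook formulation (not a formula quoted from the cited papers) is

```latex
\min_{\text{design}}\;\frac{2}{t(t-1)}\sum_{a<b}\operatorname{Var}\!\bigl(\hat{\tau}_a-\hat{\tau}_b\bigr),
\qquad t=9,
```

which makes explicit why a design can be A-optimal overall and yet not be the most precise choice for the particular main-effect or interaction contrasts that an investigator cares about most.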
It is not quite clear what implications this might have for statistical efficiency for the constituent main effects of inoculate (2 degrees of freedom), time (2 degrees of freedom), and the effects of their interaction (4 degrees of freedom); note, incidentally, that these degrees of freedom sum to 8, as required for 9 groups. As duly noted by Altman and Hua [1], pairwise comparisons between all 9 groups may not be as important as various main effect or interaction contrasts under a factorial treatment structure arrangement. Although, as noted earlier, Figure 1 is symmetric with respect to the treatment labels, the classical ANOVA table for this interwoven loop design would be even more complicated (not shown) than that presented for the A-loop design, since there is not one single denominator MS that would serve as the experimental error term for inoculate, time, or inoculate by time effects. One should perhaps compare two alternative experimental designs having the same factorial treatment structure, but a different design structure, for the contrasts of highest priority, choosing the design for which such contrasts have the smaller standard errors. Let us consider the following comparisons: $\mu_{1..}-\mu_{3..}$, $\mu_{.1.}-\mu_{.3.}$, and $(\mu_{11.}-\mu_{31.})-(\mu_{13.}-\mu_{33.})$, that is, respectively, the overall mean difference between inoculates 1 and 3, the overall mean difference between times 1 and 3, and the interaction component pertaining to the difference between inoculates 1 and 3 within time 1 versus that same difference within time 3. Recall that these contrasts were components of the EGLS tests on the two sets of main effects and the interaction and were previously labeled as (A1), (B1), and (AB1), respectively. Now, the comparison of efficient designs for the relative precision of various contrasts will generally depend upon the relative magnitude of the random effects VC, as noted recently by Hedayat et al. [44] and for various microarray design comparisons [10]. Suppose the “true” variance components were 0.03, 0.06, and 0.25, comparable to either set of estimates provided previously for ID_REF #30 from Zou et al. [20]. The linear mixed model for analyzing data generated from Figure 1 would be identical to that in (1) except that arrays would no longer be specified as being nested within times. Comparing the standard errors of the three contrasts under these assumed VC for the interwoven loop design of Figure 1 and for the A-loop subdesign of Figure 2, the optimized design in Figure 1 based on Wit et al. [13] provides a substantial improvement for the estimation of overall mean time differences, whereas the A-loop design is indeed more efficient for inferring upon the main effects of inoculate and the interaction between inoculate and time. Hence, the choice between the two designs would reflect a matter of priority for inference on the various main effects and their interactions. It should be carefully noted, as demonstrated by Tempelman [10], that designs leading to lower standard errors for certain comparisons do not necessarily translate into greater statistical power, as the denominator degrees of freedom for various tests may be substantially different between the two designs.

4.3. Unbalanced Designs and Shrinkage Estimation

Shrinkage or empirical Bayes (EB) estimation is known to improve statistical power for inference on differential gene expression between treatments in microarray experiments [46].
Shrinkage-based estimation is based on the well-established hierarchical modeling concept that more reliable inferences on gene-specific treatment differences can be attained by borrowing information across all genes [47, 48]. Typically, such strategies have involved improving the estimation of standard errors of gene-specific treatment differences by “shrinking” gene-specific variances towards an overall mean or other measure of central tendency. However, most shrinkage estimation procedures have been developed for fixed effects models, that is, for simple experimental designs having a treatment structure but no or very limited design structure, or even treating all design structure factors as fixed [30]. Currently popular shrinkage estimation procedures [49–51] are certainly appropriate for many designs based on one-color Affymetrix systems or for common reference designs. Other proposed shrinkage procedures have facilitated extensions to very special cases of nested designs [47], including some based on rather strong modeling assumptions, such as a constant correlation of within-array replicate spots across all genes [52] or a design structure facilitating the use of permutation testing [29]. However, virtually none of the procedures proposed thus far are well adapted to handle unbalanced designs such as the A-loop design, where different sizes of experimental units need to be specified for different treatment factors; hence investigators should proceed with caution when using shrinkage estimation for unbalanced mixed-model designs.

5. Conclusions

We have provided an overview of the use of mixed linear model analysis for unbalanced microarray designs, given that most efficient incomplete block designs for microarrays are unbalanced with respect to various comparisons. We strongly believe that much of the mixed-model software currently available for the analysis of microarrays does not adequately address the proper determination of error terms and/or denominator degrees of freedom for various tests. This would be particularly true if we had chosen to analyze all of the data for ID_REF #30 in Table 4 from Zou et al. [20], based on all of the hybridizations depicted in Figure 2. Even then, the size of the standard errors and the estimated degrees of freedom would still be seen to be somewhat different for estimating the main effects of times compared to estimating the main effects of inoculates, given the lower degree of within-array connectivity between the various levels of time, as illustrated in Figure 2. If inferences on the various comparisons of interest are not conducted correctly in defining a list of differentially expressed genes, all subsequent microarray analyses (e.g., FDR estimation, gene clustering, gene class analysis) are rendered futile. We believe that it is useful to choose proven mixed-model software (e.g., SAS) to properly conduct these tests and, if necessary, to work with an experienced statistician in order to do so. We have concentrated our attention on the analysis of a particular gene. It is, nevertheless, straightforward to use SAS to serially conduct mixed-model analyses for all genes on a microarray [53]; furthermore, SAS JMP GENOMICS ( http://www.jmp.com/software/genomics/ ) provides an even more powerful user interface to the mixed model analysis of microarray data.

Appendix C. Matrix Representation of the Mixed Model Analysis of the A-Loop Design of Zou et al.
Any mixed model, including that specified in (1), can be written in matrix algebra form as $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e}$. Here $\mathbf{y}$ is the vector of all data, $\boldsymbol{\beta}$ is the vector of all fixed effects (e.g., inoculate, time, dye, and inoculate by time interaction effects), $\mathbf{u}$ is the vector of all random effects (e.g., array and sample within inoculate by time effects), and $\mathbf{e}$ is the vector of random residual effects. Furthermore, $\mathbf{X}$ and $\mathbf{Z}$ are corresponding incidence matrices that specify the treatment and design structure of the experiment, thereby linking the treatment and design effects, $\boldsymbol{\beta}$ and $\mathbf{u}$, respectively, to $\mathbf{y}$. Note that $\mathbf{y}$ has a dimension of 36 × 1 for the duplicated A-loop design of Zou et al. [20]. Now $\boldsymbol{\beta}$ and $\mathbf{u}$ can be further partitioned into the effects as specified in (1); for our example, $\boldsymbol{\beta}$ is a vector of fixed effects with the 18 elements written out in (A.2). Furthermore, $\mathbf{u}$ can be similarly partitioned into a vector of random effects for replicates within inoculate by time and another vector of random effects for arrays within time; that is, there are a total of 18 biological replicates and 18 arrays in the study, each characterized by a random effect. Note that it is coincidence that the dimensions of $\boldsymbol{\beta}$ and of each of these two random effects vectors are all 18 for this particular example design. Again, the distributional assumptions on the random and residual effects are specified the same as in the paper but are now written in matrix notation: the replicate effects, the array effects, and the residuals each follow multivariate normal distributions with null mean vectors and variance-covariance matrices equal to identity matrices of the appropriate dimension multiplied by the corresponding variance components. Reasonably assuming that the replicate and array effects are pairwise independent of each other (i.e., biological sample effects are not influenced by array effects and vice versa), the variance-covariance matrix $\mathbf{G}$ of $\mathbf{u}$ is a diagonal matrix with the first 18 diagonal elements equal to the replicate variance component and the remaining 18 diagonal elements equal to the array variance component. The GLS estimator of $\boldsymbol{\beta}$ can be written [22, 32] as $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-}\mathbf{X}'\mathbf{V}^{-1}\mathbf{y}$, with its variance-covariance matrix defined by $(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-}$, where $\mathbf{V} = \mathrm{Var}(\mathbf{y})$ and the superscript minus denotes a generalized inverse. Once the VC are estimated, they are substituted for the true unknown VC in $\mathbf{V}$ to produce $\hat{\mathbf{V}}$, which is then used to provide the “estimated” GLS, or EGLS, of $\boldsymbol{\beta}$: $\hat{\boldsymbol{\beta}}_{\mathrm{EGLS}} = (\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X})^{-}\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{y}$. As noted in the text, EGLS and GLS typically coincide for balanced designs but not necessarily for unbalanced designs, such as those depicted in Figures 1 or 2. It was previously noted in the paper that the mean difference between two inoculates at a given time can be written as a function of the model effects in (1), as can the mean difference between two times for a given inoculate. These two contrasts can be written in matrix notation as $\mathbf{k}_1'\boldsymbol{\beta}$ and $\mathbf{k}_2'\boldsymbol{\beta}$, respectively, where $\mathbf{k}_1$ and $\mathbf{k}_2$ are contrast vectors whose coefficients align in order with the elements of $\boldsymbol{\beta}$ in (A.2). For example, note from (A.6) that the nonzero coefficients of $\mathbf{k}_1$ (each ±1) occur within the 2nd, 3rd, 8th, and 11th positions. When these coefficients are multiplied, in the same order, with the 2nd, 3rd, 8th, and 11th elements of $\boldsymbol{\beta}$ provided in (A.2), one recovers $\mathbf{k}_1'\boldsymbol{\beta}$ exactly as the contrast specified previously. The reader should be able to make a similar observation for $\mathbf{k}_2$ in considering how the nonzero elements of (A.7) align in position with the elements of $\boldsymbol{\beta}$ in (A.2) to produce $\mathbf{k}_2'\boldsymbol{\beta}$. In Figure 3, SAS PROC MIXED is used to provide the estimates, standard errors, and test statistics for these two contrasts. That is, note how all of the elements from (A.6) and (A.7) are completely reproduced in the estimate statements as “k1 contrast” and “k2 contrast,” respectively, in Figure 3.
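It may help to collect the quantities just described in display form; the variance component labels used below ($\sigma^2_r$ for biological replicates, $\sigma^2_a$ for arrays, and $\sigma^2_e$ for residuals) are symbols chosen here for readability rather than the paper's own notation.

```latex
\begin{aligned}
\mathbf{y} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{u} + \mathbf{e},
 \qquad \mathbf{y}: 36\times 1,\quad \boldsymbol{\beta}: 18\times 1,\quad
 \mathbf{u} = \begin{bmatrix}\mathbf{u}_r\\ \mathbf{u}_a\end{bmatrix}: 36\times 1,\\
\mathbf{u}_r &\sim N(\mathbf{0},\,\sigma^2_r\mathbf{I}_{18}),\qquad
\mathbf{u}_a \sim N(\mathbf{0},\,\sigma^2_a\mathbf{I}_{18}),\qquad
\mathbf{e} \sim N(\mathbf{0},\,\sigma^2_e\mathbf{I}_{36}),\\
\mathbf{G} &= \operatorname{Var}(\mathbf{u}) =
 \begin{bmatrix}\sigma^2_r\mathbf{I}_{18} & \mathbf{0}\\ \mathbf{0} & \sigma^2_a\mathbf{I}_{18}\end{bmatrix},
 \qquad
 \mathbf{V} = \operatorname{Var}(\mathbf{y}) = \mathbf{Z}\mathbf{G}\mathbf{Z}' + \sigma^2_e\mathbf{I}_{36},\\
\hat{\boldsymbol{\beta}} &= (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-}\mathbf{X}'\mathbf{V}^{-1}\mathbf{y},
 \qquad \operatorname{Var}(\hat{\boldsymbol{\beta}}) = (\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-},
 \qquad
 \hat{\boldsymbol{\beta}}_{\mathrm{EGLS}} = (\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X})^{-}\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{y}.
\end{aligned}
```

Here $\hat{\mathbf{V}}$ is simply $\mathbf{V}$ evaluated at the estimated (REML or ANOVA) variance components.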
Now, when the VC are known, these two contrasts can be estimated by their GLS estimates, $\mathbf{k}_1'\hat{\boldsymbol{\beta}}$ and $\mathbf{k}_2'\hat{\boldsymbol{\beta}}$. Furthermore, using (A.4), the true standard errors of these two estimates can be determined as $\sqrt{\mathbf{k}_1'(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-}\mathbf{k}_1}$ and $\sqrt{\mathbf{k}_2'(\mathbf{X}'\mathbf{V}^{-1}\mathbf{X})^{-}\mathbf{k}_2}$, respectively. However, as previously noted, the VC are generally not known but must be estimated from the data, such that the two contrasts are typically estimated by their EGLS counterparts, with approximate standard errors obtained by substituting $\hat{\mathbf{V}}$ for $\mathbf{V}$ in these expressions. Using the REML estimates of VC as provided in the paper, the corresponding estimates and approximate standard errors for ID_REF #30 can be obtained from the code in Figure 3 by simply changing method=type3 to method=reml and by deleting ddfm=kr. However, these standard errors are slightly understated since they do not take into account the uncertainty of the VC estimates, that is, of $\hat{\mathbf{V}}$ as an estimate of $\mathbf{V}$, as discussed by Kackar and Harville [38]. Kenward and Roger [39] derived a procedure to take this uncertainty into account, which is part of the SAS PROC MIXED implementation using the ddfm=kr option [35], as specified in Figure 3. Invoking this option raises the two standard errors accordingly, albeit very slightly. Furthermore, the ddfm=kr option invokes the procedure of Giesbrecht and Burns [40] to estimate the denominator degrees of freedom for EGLS inference. Using this option and REML, the estimated degrees of freedom for $\mathbf{k}_1'\boldsymbol{\beta}$ is 5.28 whereas that for $\mathbf{k}_2'\boldsymbol{\beta}$ is 17.0, as would be noted from executing the SAS code in Figure 3. The corresponding SAS output will furthermore include the t-test statistics for the two contrasts. These statistics, when compared to Student t distributions with their respective estimated degrees of freedom, 5.28 and 17.0, lead to P-values of 0.66 and 0.37, respectively; that is, there is no evidence that either contrast is statistically significant. Contrast matrices on $\boldsymbol{\beta}$ can be used to derive ANOVA-like F tests for the overall importance of various fixed effects using EGLS. Recall from the paper that the test for the main effects of inoculates can be written as a joint function of the contrasts $\mu_{1..}-\mu_{3..}$ and $\mu_{2..}-\mu_{3..}$, where $\mu_{i..} = \frac{1}{3}\sum_{j=1}^{3}\mu_{ij.}$ with $\mu_{ij.}$ defined as in (6). These two contrasts, labeled as (A1) and (A2) in the paper, can be jointly written together as a linear function of the elements of $\boldsymbol{\beta}$ in (A.2) using the 2 × 18 contrast matrix displayed in (A.8). For example, the first row of (A.8) specifies the coefficients for testing (A1) as a function of the elements of $\boldsymbol{\beta}$ using (6). In other words, matching up, in order, the first row of (A.8) with the elements of $\boldsymbol{\beta}$ in (A.2), the corresponding contrast can be rewritten as (A1). Similarly, the second row of (A.8) specifies the contrast coefficients for (A2) as a function of the elements of $\boldsymbol{\beta}$. Recall that the main effects of times (Factor B) involve a joint test of the contrasts $\mu_{.1.}-\mu_{.3.}$ and $\mu_{.2.}-\mu_{.3.}$, labeled as (B1) and (B2) in the paper, where $\mu_{.j.} = \frac{1}{3}\sum_{i=1}^{3}\mu_{ij.}$. In terms of the elements of $\boldsymbol{\beta}$ in (A.2), these two contrasts are jointly specified by the contrast matrix in (A.9); that is, (A.9) is another 2 × 18 contrast matrix, just like (A.8), whose two rows specify the coefficients for the contrasts (B1) and (B2), respectively, as functions of the elements of $\boldsymbol{\beta}$ in (6). Recall that the interaction between the effects of inoculates and times was a four numerator degrees of freedom test based on jointly testing four complementary and independent contrasts, suggesting that there are four rows in the corresponding contrast matrix.
The complete interaction contrast can then be written using the 4 × 18 contrast matrix displayed in (A.10). Note that the 4 rows in (A.10) specify contrast coefficients on the model effects for each of the 4 constituent component hypotheses, (AB1), (AB2), (AB3), and (AB4), as defined in the paper, when aligned with the elements of $\boldsymbol{\beta}$ in (A.2). As a side note, the somewhat uninteresting contrast for dye effects could be written using a contrast vector (not shown) in order to test the overall mean difference between the two dyes. The EGLS test statistic for testing the overall importance of any fixed effects term is specified as a Wald-type quadratic form in the corresponding estimated contrasts, $F = (\mathbf{K}\hat{\boldsymbol{\beta}})'[\mathbf{K}(\mathbf{X}'\hat{\mathbf{V}}^{-1}\mathbf{X})^{-}\mathbf{K}']^{-1}(\mathbf{K}\hat{\boldsymbol{\beta}})/\mathrm{rank}(\mathbf{K})$ for contrast matrix $\mathbf{K}$. Under the corresponding null hypothesis, this statistic is approximately distributed as an F random variable with the numerator degrees of freedom defined by the number of rows of the contrast matrix [27, 32, 35]. The denominator degrees of freedom for each contrast are dependent upon the design and can be determined either from the classical ANOVA as in Table 1 or by a multivariate extension of the Satterthwaite-based procedure from Giesbrecht and Burns [40], as proposed by Fai and Cornelius [41]; again, this option is available as ddfm=kr using SAS PROC MIXED (Figure 3). The corresponding EGLS ANOVA output for ID_REF #30, based on either ANOVA or REML estimation of VC, is provided in Table 3.

Acknowledgment

Support from the Michigan Agricultural Experiment Station (Project MICL 1822) is gratefully acknowledged.

References

N. S. Altman and J. Hua, “ Extending the loop design for two-channel microarray experiments ,” Genetical Research , vol. 88, no. 3, pp. 153–163, 2006. F. Bretz, J. Landgrebe, and E. Brunner, “Design and analysis of two-color microarray experiments using linear models,” Methods of Information in Medicine , vol. 44, no. 3, pp. 423–430, 2005. J. S. S. Bueno Filho, S. G. Gilmour, and G. J. M. Rosa, “ Design of microarray experiments for genetical genomics studies ,” Genetics , vol. 174, no. 2, pp. 945–957, 2006. F.-S. Chai, C.-T. Liao, and S.-F. Tsai, “ Statistical designs for two-color spotted microarray experiments ,” Biometrical Journal , vol. 49, no. 2, pp. 259–271, 2007. K. Dobbin, J. H. Shih, and R. Simon, “ Questions and answers on design of dual-label microarrays for identifying differentially expressed genes ,” Journal of the National Cancer Institute , vol. 95, no. 18, pp. 1362–1369, 2003. G. F. V. Glonek and P. J. Solomon, “ Factorial and time course designs for cDNA microarray experiments ,” Biostatistics , vol. 5, no. 1, pp. 89–111, 2004. S. Gupta, “ Balanced factorial designs for cDNA microarray experiments ,” Communications in Statistics: Theory and Methods , vol. 35, no. 8, pp. 1469–1476, 2006. K. F. Kerr, “Efficient 2^k factorial designs for blocks of size 2 with microarray applications,” Journal of Quality Technology , vol. 38, no. 4, pp. 309–318, 2006. J. Landgrebe, F. Bretz, and E. Brunner, “ Efficient design and analysis of two colour factorial microarray experiments ,” Computational Statistics & Data Analysis , vol. 50, no. 2, pp. 499–517, 2006. R. J. Tempelman, “ Assessing statistical precision, power, and robustness of alternative experimental designs for two color microarray platforms based on mixed effects models ,” Veterinary Immunology and Immunopathology , vol. 105, no. 3-4, pp. 175–186, 2005. S.-F. Tsai, C.-T. Liao, and F.-S. Chai, “ Statistical designs for two-color microarray experiments involving technical replication ,” Computational Statistics & Data Analysis , vol. 51, no. 3, pp. 2078–2090, 2006. V. Vinciotti, R. Khanin, D.
D'Alimonte, et al., “ An experimental evaluation of a loop versus a reference design for two-channel microarrays ,” Bioinformatics , vol. 21, no. 4, pp. 492–501, 2005. E. Wit, A. Nobile, and R. Khanin, “ Near-optimal designs for dual channel microarray studies ,” Journal of the Royal Statistical Society: Series C , vol. 54, no. 5, pp. 817–830, 2005. G. A. Milliken, K. A. Garrett, and S. E. Travers, “ Experimental design for two-color microarrays applied in a pre-existing split-plot experiment ,” Statistical Applications in Genetics and Molecular Biology , vol. 6, no. 1, article 20, 2007. Y. H. Yang, S. Dudoit, P. Luu, et al., “ Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation ,” Nucleic Acids Research , vol. 30, no. 4, pp. 1–10, 2002. M. K. Kerr and G. A. Churchill, “ Statistical design and the analysis of gene expression microarray data ,” Genetical Research , vol. 77, no. 2, pp. 123–128, 2001. R. D. Wolfinger, G. Gibson, E. D. Wolfinger, et al., “ Assessing gene significance from cDNA microarray expression data via mixed models ,” Journal of Computational Biology , vol. 8, no. 6, pp. 625–637, 2001. W. Jin, R. M. Riley, R. D. Wolfinger, K. P. White, G. Passador-Gurgell, and G. Gibson, “The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster ,” Nature Genetics , vol. 29, no. 4, pp. 389–395, 2001. G. J. M. Rosa, J. P. Steibel, and R. J. Tempelman, “ Reassessing design and analysis of two-colour microarray experiments using mixed effects models ,” Comparative and Functional Genomics , vol. 6, no. 3, pp. 123–131, 2005. J. Zou, S. Rodriguez-Zas, M. Aldea, et al., “Expression profiling soybean response to Pseudomonas syringae reveals new defense-related genes and rapid HR-specific downregulation of photosynthesis,” Molecular Plant-Microbe Interactions , vol. 18, no. 11, pp. 1161–1174, 2005. G. A. Milliken and D. E. Johnson, Analysis of Messy Data, Volume I: Designed Experiments , Wadsworth, Belmont, Calif, USA, 1984. S. R. Searle, G. Casella, and C. E. McCulloch, Variance Components , John Wiley & Sons, New York, NY, USA, 1992. M. Vuylsteke, F. van Eeuwijk, P. Van Hummelen, M. Kuiper, and M. Zabeau, “Genetic analysis of variation in gene expression in Arabidopsis thaliana ,” Genetics , vol. 171, no. 3, pp. 1267–1275, 2005. X. Cui and G. A. Churchill, “How many mice and how many arrays? Replication in mouse cDNA microarray experiments,” in Methods of Microarray Data Analysis III , S. M. Lin and K. F. Johnson, Eds., pp. 139–154, Kluwer Academic Publishers, Norwell, Mass, USA, 2003. H. P. Piepho, A. Büchse, and K. Emrich, “ A hitchhiker's guide to mixed models for randomized experiments ,” Journal of Agronomy and Crop Science , vol. 189, no. 5, pp. 310–322, 2003. H. P. Piepho, A. Büchse, and C. Richter, “ A mixed modelling approach for randomized experiments with repeated measures ,” Journal of Agronomy and Crop Science , vol. 190, no. 4, pp. 230–247, 2004. J. Spilke, H. P. Piepho, and X. Hu, “ Analysis of unbalanced data by mixed linear models using the MIXED procedure of the SAS system ,” Journal of Agronomy and Crop Science , vol. 191, no. 1, pp. 47–54, 2005. D. Nettleton, “ A discussion of statistical methods for design and analysis of microarray experiments for plant scientists ,” The Plant Cell , vol. 18, no. 9, pp. 2112–2121, 2006. X. Cui, J. T. G. Hwang, J. Qiu, N. J. Blades, and G. A. 
Churchill, “ Improved statistical tests for differential gene expression by shrinking variance components estimates ,” Biostatistics , vol. 6, no. 1, pp. 59–75, 2005. G. K. Smyth, “ Linear models and empirical Bayes methods for assessing differential expression in microarray experiments ,” Statistical Applications in Genetics and Molecular Biology , vol. 3, no. 1, article 3, 2004. K. K. Dobbin, E. S. Kawasaki, D. W. Petersen, and R. M. Simon, “ Characterizing dye bias in microarray experiments ,” Bioinformatics , vol. 21, no. 10, pp. 2430–2437, 2005. R. C. Littell, “ Analysis of unbalanced mixed model data: a case study comparison of ANOVA versus REML/GLS ,” Journal of Agricultural, Biological, and Environmental Statistics , vol. 7, no. 4, pp. 472–490, 2002. S. R. Searle, Linear Models for Unbalanced Data , John Wiley & Sons, New York, NY, USA, 1987. F. E. Satterthwaite, “ An approximate distribution of estimates of variance components ,” Biometrics Bulletin , vol. 2, no. 6, pp. 110–114, 1946. R. C. Littell, G. A. Milliken, W. W. Stroup, R. D. Wolfinger, and O. Schabenberger, SAS for Mixed Models , SAS Institute, Cary, NC, USA, 2nd edition, 2006. J. Spilke, H.-P. Piepho, and X. Hu, “ A simulation study on tests of hypotheses and confidence intervals for fixed effects in mixed models for blocked experiments with missing data ,” Journal of Agricultural, Biological, and Environmental Statistics , vol. 10, no. 3, pp. 374–389, 2005. W. H. Swallow and J. F. Monahan, “ Monte Carlo comparison of ANOVA, MIVQUE, REML, and ML estimators of variance components ,” Technometrics , vol. 26, no. 1, pp. 47–57, 1984. R. N. Kackar and D. A. Harville, “ Approximations for standard errors of estimators of fixed and random effects in mixed linear models ,” Journal of the American Statistical Association , vol. 79, no. 388, pp. 853–862, 1984. M. G. Kenward and J. H. Roger, “ Small sample inference for fixed effects from restricted maximum likelihood ,” Biometrics , vol. 53, no. 3, pp. 983–997, 1997. F. G. Giesbrecht and J. C. Burns, “ Two-stage analysis based on a mixed model: large-sample asymptotic theory and small-sample simulation results ,” Biometrics , vol. 41, no. 2, pp. 477–486, 1985. A. H.-T. Fai and P. L. Cornelius, “ Approximate F-tests of multiple degree of freedom hypotheses in generalized least squares analyses of unbalanced split-plot experiments ,” Journal of Statistical Computation and Simulation , vol. 54, no. 4, pp. 363–378, 1996. W. W. Stroup and R. C. Littell, “Impact of variance component estimates on fixed effect inference in unbalanced linear mixed models,” in Proceedings of the 14th Annual Kansas State University Conference on Applied Statistics in Agriculture , pp. 32–48, Manhattan, Kan, USA, April 2002. J. D. Storey and R. Tibshirani, “ Statistical significance for genomewide studies ,” Proceedings of the National Academy of Sciences of the United States of America , vol. 100, no. 16, pp. 9440–9445, 2003. A. S. Hedayat, J. Stufken, and M. Yang, “ Optimal and efficient crossover designs when subject effects are random ,” Journal of the American Statistical Association , vol. 101, no. 475, pp. 1031–1038, 2006. J. P. Steibel, Improving experimental design and inference for transcription profiling experiments , thesis, Department of Animal Science, Michigan State University, East Lansing, Mich, USA, 2007. D. B. Allison, X. Cui, G. P. Page, and M. Sabripour, “ Microarray data analysis: from disarray to consolidation and consensus ,” Nature Reviews Genetics , vol. 7, no. 1, pp. 55–65, 2006.
I. Lönnstedt and T. Speed, “Replicated microarray data,” Statistica Sinica , vol. 12, no. 1, pp. 31–46, 2002. G. K. Robinson, “ That BLUP is a good thing: the estimation of random effects ,” Statistical Science , vol. 6, no. 1, pp. 15–51, 1991. P. Baldi and A. D. Long, “A Bayesian framework for the analysis of microarray expression data: regularized t -test and statistical inferences of gene changes,” Bioinformatics , vol. 17, no. 6, pp. 509–519, 2001. V. G. Tusher, R. Tibshirani, and G. Chu, “ Significance analysis of microarrays applied to the ionizing radiation response ,” Proceedings of the National Academy of Sciences of the United States of America , vol. 98, no. 9, pp. 5116–5121, 2001. G. W. Wright and R. M. Simon, “ A random variance model for detection of differential gene expression in small microarray experiments ,” Bioinformatics , vol. 19, no. 18, pp. 2448–2455, 2003. G. K. Smyth, J. Michaud, and H. S. Scott, “ Use of within-array replicate spots for assessing differential expression in microarray experiments ,” Bioinformatics , vol. 21, no. 9, pp. 2067–2075, 2005. G. Gibson and R. D. Wolfinger, “Gene expression profiling using mixed models,” in Genetic Analysis of Complex Traits Using SAS , A. M. Saxton, Ed., pp. 251–279, SAS Users Press, Cary, NC, USA, 2004.
