Hindawi Publishing Corporation
International Journal of Plant Genomics
Volume 2008, Article ID 817210, 4 pages
doi:10.1155/2008/817210

Research Article

An Empirical Bayesian Method for Detecting Differentially Expressed Genes Using EST Data

Na You, Junmei Liu, and Chang Xuan Mao

Department of Statistics, University of California, Riverside, CA 92521, USA

Correspondence should be addressed to Chang Xuan Mao, cmao@stat.ucr.edu

Received 18 September 2007; Accepted 22 December 2007

Recommended by Xinping Cui

Detection of differentially expressed genes from expressed sequence tag (EST) data has received much attention. An empirical Bayesian method is introduced in which gene expression patterns are estimated and used to define detection statistics. Significantly differentially expressed genes can then be declared from the detection statistics. A simulation study evaluates the performance of the proposed method. Two real applications are studied.

Copyright © 2008 Na You et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. INTRODUCTION

It is important to detect differentially expressed genes, for example, when exploring the key genes related to certain diseases. As EST sequencing technology develops, a large number of EST databases from a variety of tissues have become available. Enormous EST collections provide opportunities to quantify gene expression levels [1]. Efficient statistical methods are in great demand.

Several methods have been proposed to detect significantly differentially expressed (SDE) genes from EST data [2]. Fisher's exact test was used by the Cancer Genome Anatomy Project [3]. Audic and Claverie [4] developed a Bayesian method. The GT statistic [5] and the R statistic [6] were proposed for multilibrary comparison. In each method, gene-specific detection statistics quantify differences in gene expression levels, and SDE genes are declared by their rankings.

An empirical Bayesian method is proposed to detect SDE genes. The relative gene expression abundances are estimated in each library, and a new detection statistic is derived for each gene. In Section 2, simulation experiments suggest that the proposed method outperforms the existing methods. Real applications are also studied in Section 2. Statistical methods are described in Section 3. The possibility of extending the method to multiple libraries is indicated in Section 4.

2. RESULTS

Let $(\pi_{11}, \pi_{12}, \ldots, \pi_{1c})$ and $(\pi_{21}, \pi_{22}, \ldots, \pi_{2c})$ be the gene expression patterns in two libraries, where $\pi_{ji}$ is the relative abundance of gene $i$ in library $j$. The absolute difference between relative abundances is $D_i = |\pi_{1i} - \pi_{2i}|$. Given a sample of ESTs from library $j$, an empirical Bayes estimator $\hat{\pi}_{ji}$ for $\pi_{ji}$ is defined in Section 3. Given gene $i$ seen in both samples, define $\hat{D}_i = |\hat{\pi}_{1i} - \hat{\pi}_{2i}|$. Given gene $i$ seen in only one sample, for example, sample 2, define $\hat{D}_i = |\hat{\pi}_{1i} - \hat{\pi}_{2i}|$ if $\hat{\pi}_{1i} < \hat{\pi}_{2i}$ and $\hat{D}_i = 0$ otherwise, which is conservative in the sense that $\hat{D}_i$ possibly underestimates $D_i$. Gene $i$ is declared to be SDE if $\hat{D}_i$ is relatively large.

2.1. Simulation

In a simulation experiment, EST frequencies are generated from a multinomial distribution with sample size $s_j$ and probability vector $(\pi_{j1}, \pi_{j2}, \ldots, \pi_{jc})$, where $c = 1000$, $\pi_{ji} = \lambda_{ji} / \sum_{k=1}^{c} \lambda_{jk}$, $(\lambda_{11}, \lambda_{12}, \ldots, \lambda_{1c})$ are drawn from $G_1$, $(\lambda_{21}, \lambda_{22}, \ldots, \lambda_{2c})$ are drawn from $G_2$, and $G_1$ and $G_2$ are two distributions over $(0, \infty)$. The proposed method, Fisher's exact test, the $\chi^2$ test, the AC statistic, and the R statistic are studied. Given a cutoff point $\tau$, the efficiency of a statistical method is measured by $p_\tau$, the expected percentage of the true first $\tau$ SDE genes being correctly declared as the first $\tau$ SDE genes. The average of the estimated $p_\tau$ is calculated from 500 replications.
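The detection rule and the simulation design described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the plug-in estimates `x / s` stand in for the empirical Bayes estimates of Section 3, the two mixing distributions are picked only for illustration, and setting the statistic to zero for genes unseen in both samples is an assumption the text does not spell out.

```python
import numpy as np


def detection_stat(pi1_hat, pi2_hat, x1, x2):
    """Conservative detection statistic from estimated relative abundances."""
    d = np.abs(pi1_hat - pi2_hat)
    only1 = (x1 > 0) & (x2 == 0)   # gene seen only in sample 1
    only2 = (x1 == 0) & (x2 > 0)   # gene seen only in sample 2
    # Keep the difference only when the unseen side has the smaller estimate.
    d[only2 & (pi1_hat >= pi2_hat)] = 0.0
    d[only1 & (pi2_hat >= pi1_hat)] = 0.0
    # Assumption: a gene unseen in both samples is never declared SDE.
    d[(x1 == 0) & (x2 == 0)] = 0.0
    return d


def top_tau_overlap(true_d, stat, tau):
    """Fraction of the true top-tau genes that are also top-tau by `stat`."""
    true_top = set(np.argsort(-true_d, kind="stable")[:tau].tolist())
    est_top = set(np.argsort(-stat, kind="stable")[:tau].tolist())
    return len(true_top & est_top) / tau


def simulate_library(lam, s, rng):
    """Multinomial EST counts with probabilities pi = lam / sum(lam)."""
    pi = lam / lam.sum()
    return pi, rng.multinomial(s, pi)


def estimate_efficiency(c=1000, s1=2000, s2=2000, tau=20, reps=50, seed=0):
    """Monte Carlo average of the top-tau overlap over `reps` replications."""
    rng = np.random.default_rng(seed)
    overlaps = []
    for _ in range(reps):
        lam1 = rng.uniform(0.0, 10.0, size=c)            # illustrative G1
        p = np.clip(rng.beta(2.0, 1.0, size=c), 1e-12, 1 - 1e-12)
        lam2 = p / (1.0 - p)                             # illustrative G2
        pi1, x1 = simulate_library(lam1, s1, rng)
        pi2, x2 = simulate_library(lam2, s2, rng)
        # Plug-in estimates used here as a stand-in for the Section 3 estimator.
        stat = detection_stat(x1 / s1, x2 / s2, x1, x2)
        overlaps.append(top_tau_overlap(np.abs(pi1 - pi2), stat, tau))
    return float(np.mean(overlaps))
```

Calling `estimate_efficiency(tau=20)` returns a Monte Carlo estimate of $p_\tau$ for the stand-in statistic; swapping in another estimator only requires changing the line that builds `stat`.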
In the first four experiments, $s_1 = s_2 = 2000$ and the results are presented in Figure 1. Note that $G_1 = U(0, 10)$, $\mathrm{Beta}(2, 5)$, $0.2\delta(2) + 0.4\delta(5) + 0.2\delta(10)$, $\mathrm{Gamma}(3, 0.1)$ and $G_2 = \mathrm{Beta}(2, 1)$, $\mathrm{Beta}(2, 5)$, $\mathrm{Beta}(2, 2)$, $\mathrm{Beta}(2, 2)$, respectively, where $U(a, b)$ is the uniform distribution on $(a, b)$, $\delta(a)$ is degenerate at $a$, $\mathrm{Beta}(a, b)$ is transformed from the beta distribution with shape parameters $a$ and $b$ by $\lambda = p/(1 - p)$ for $p \in (0, 1)$, and $\mathrm{Gamma}(a, b)$ is the gamma distribution with shape $a$ and scale $b$. For each cutoff point $\tau = 10, 20, \ldots, 100$, $p_\tau$ is calculated. The proposed method clearly performs better than the others.

Figure 1: Simulation results of Fisher's exact test (◦), $\chi^2$ test (Δ), AC statistic (+), R statistic (×), and the proposed statistic (•) in detecting SDE genes using two EST samples of the same size. [Four panels (a)-(d); horizontal axis $\tau$ from 20 to 100; vertical axis ticks from 0.3 to 0.9.]

In the second four experiments, $(s_1, s_2) = (2000, 4000)$, $(4000, 2000)$, $(2000, 4000)$, and $(4000, 2000)$, respectively, and the results are presented in Figure 2. Note that $G_1 = \mathrm{Gamma}(3, 0.1)$ and $G_2 = \mathrm{Beta}(2, 2)$ in Figures 2(a) and 2(b), and $G_1 = U(0, 10)$ and $G_2 = \mathrm{Beta}(2, 1)$ in Figures 2(c) and 2(d). The proposed method is usually the best one among all methods studied.

Figure 2: Simulation results of Fisher's exact test (◦), $\chi^2$ test (Δ), AC statistic (+), R statistic (×), and the proposed statistic (•) in detecting SDE genes using two EST samples of different sizes. [Four panels (a)-(d); horizontal axis $\tau$ from 20 to 100; vertical axis ticks from 0.3 to 0.9.]

2.2. Real applications

One example concerns the Chinese spring wheat drought stressed leaf cDNA library (7235) and root cDNA library (#ASP), available at the TIGR gene indexes database (downloaded at http://www.tigr.org/tdb/tgi, 01/06/2006). The two EST samples contain 790 and 1306 sequenced ESTs, respectively. After removing the 103 and 194 unannotated ESTs, the annotated ESTs are clustered into 465 and 804 groups, with each group associated with a unique gene. Only those well-annotated ESTs are used. The first 20 SDE genes by the proposed method are listed in Table 1, among which 7, 7, 7, and 7 genes are in the set of the first 20 SDE genes by Fisher's exact test, $\chi^2$ test, AC statistic, and R statistic, respectively.

Another example concerns pinus gene expression level comparison in the root gravitropism April 2003 test library (#FH3) and the root control 2 (late) library (#FH4), also from TIGR, in which 2513 and 1132 ESTs associated with 1211 and 605 genes are well annotated and clustered. Table 2 lists the first 20 SDE genes by the proposed method, among which 4, 4, 5, and 3 genes are in the set of the first 20 SDE genes by Fisher's exact test, $\chi^2$ test, AC statistic, and R statistic, respectively.

Table 1: The first 20 SDE genes in wheat leaf and root libraries by the proposed method ($x_{1i}$: the EST number of gene $i$ from the leaf library; $x_{2i}$: that from the root library; 0/1: absence/presence in the set of the first 20 SDE genes).

Gene      x_1i  x_2i  1000·D̂_i  Fisher  χ²  AC  R
TC24953   19    0     27.10      1       1   1   1
TC23443   8     2     7.88       1       1   1   1
TC23215   1     8     4.87       0       0   0   0
TC26419   1     8     4.87       0       0   0   0
TC26431   1     8     4.87       0       0   0   0
TC24980   5     0     3.40       1       1   1   1
TC23786   0     6     2.62       0       0   0   0
TC26436   0     6     2.62       0       0   0   0
TC24819   0     6     2.62       0       0   0   0
TC26455   7     12    1.85       0       0   0   0
TC23314   1     5     1.59       0       0   0   0
TC24981   1     5     1.59       0       0   0   0
TC24795   0     5     1.57       0       0   0   0
TC24804   0     5     1.57       0       0   0   0
TC26553   0     5     1.57       0       0   0   0
TC26356   4     1     1.37       0       0   0   0
TC23560   4     0     1.37       1       1   1   1
TC24669   4     0     1.37       1       1   1   1
TC24679   4     0     1.37       1       1   1   1
TC26379   4     0     1.37       1       1   1   1

Table 2: The first 20 SDE genes in #FH3 and #FH4 by the proposed method ($x_{1i}$: the EST number of gene $i$ from #FH3; $x_{2i}$: that from #FH4; 0/1: absence/presence in the set of the first 20 SDE genes).

Gene      x_1i  x_2i  1000·D̂_i  Fisher  χ²  AC  R
TC40351   4     9     7.62       1       1   1   1
TC40355   6     10    6.25       1       1   1   0
TC51779   19    2     5.12       0       0   1   0
TC40566   7     7     4.03       0       0   0   0
TC51682   7     7     4.03       0       0   0   0
TC46290   14    2     3.79       0       0   0   0
TC46372   15    5     3.70       0       0   0   0
TC40768   13    3     3.40       0       0   0   0
TC40912   13    4     3.36       0       0   0   0
TC51995   12    3     2.94       0       0   0   0
TC40420   12    5     2.56       0       0   0   0
TC51708   18    12    2.44       0       0   0   0
TC40405   11    4     2.43       0       0   0   0
TC40361   0     6     2.34       1       1   1   1
TC46426   0     6     2.34       1       1   1   1
TC46276   19    12    2.12       0       0   0   0
TC40388   9     1     1.82       0       0   0   0
TC40647   9     2     1.82       0       0   0   0
TC40350   9     3     1.81       0       0   0   0
TC40731   8     2     1.63       0       0   0   0

3. METHODS

Suppose that there are $c$ genes in a library. Let $x_i$ be the number of ESTs from gene $i$, a Poisson variable with mean $\lambda_i$. Given a prior distribution $G$ on the $\lambda_i$, the posterior mean of $\lambda_i$ is $E(\lambda_i \mid x_i) = (x_i + 1) h_G(x_i + 1) / h_G(x_i)$, where $h_G(x) = \int (\lambda^x / x!)\, e^{-\lambda}\, dG(\lambda)$ is a Poisson mixture. A gene is observed if and only if $x_i \ge 1$. Conditioning on $x_i \ge 1$, $x_i$ follows a zero-truncated Poisson mixture $h_G(x) / (1 - h_G(0))$ or, equivalently, a mixture $f_Q(x)$ of truncated Poisson distributions, where

$$ f_Q(x) = \frac{h_G(x)}{1 - h_G(0)} = \int \frac{\lambda^x}{x!\,(e^{\lambda} - 1)}\, dQ(\lambda), \qquad dQ(\lambda) = \frac{(1 - e^{-\lambda})\, dG(\lambda)}{\int (1 - e^{-\eta})\, dG(\eta)}. \tag{1} $$

Let $\theta(Q) = h_G(0) / (1 - h_G(0))$ be the odds that a gene is unseen. Write $E(\lambda_i \mid x_i) = f_Q(1) / \theta(Q)$ if $x_i = 0$ and $E(\lambda_i \mid x_i) = (x_i + 1) f_Q(x_i + 1) / f_Q(x_i)$ otherwise.

Let $n_x$ denote the number of genes with exactly $x$ ESTs in the sample. The nonparametric maximum likelihood estimator $\hat{Q}$ for $Q$ is

$$ \hat{Q} = \arg\max_{Q} \sum_{x \ge 1} n_x \log f_Q(x), \tag{2} $$

whose calculation is discussed in [7]. It is difficult to estimate $\theta(Q)$ well [8]. There are lower bound estimators, for example, $\hat{\theta}(Q) = n_1 (n_1 - 1) / \{2 n (n_2 + 1)\}$ [9], where $n = \sum_{x \ge 1} n_x$ is the number of observed expressed genes. An empirical Bayes estimator for $\lambda_i$ is

$$ \hat{\lambda}_i = \hat{E}(\lambda_i \mid x_i) = \begin{cases} \dfrac{f_{\hat{Q}}(1)}{\hat{\theta}(Q)}, & x_i = 0, \\[2ex] \dfrac{(x_i + 1)\, f_{\hat{Q}}(x_i + 1)}{f_{\hat{Q}}(x_i)}, & x_i \ge 1. \end{cases} \tag{3} $$

As the relative abundance $\pi_i$ satisfies $\pi_i = \lambda_i / \sum_{k=1}^{c} \lambda_k$, let $\hat{\pi}_i = \hat{\lambda}_i / \hat{s}$, where

$$ \hat{s} = \sum_{k=1}^{\hat{c}} \hat{\lambda}_k = n f_{\hat{Q}}(1) + \sum_{x \ge 1} \frac{n_x (x + 1)\, f_{\hat{Q}}(x + 1)}{f_{\hat{Q}}(x)}, \qquad \hat{c} = n \{1 + \hat{\theta}(Q)\}. \tag{4} $$

4. DISCUSSION

A new statistical method is proposed to compare the gene expression patterns in two cDNA libraries. It can be extended to multilibrary comparison, for example, by considering all pairwise comparisons among multiple libraries [3].

REFERENCES

[1] S. Mekhedov, O. M. de Ilárduya, and J. Ohlrogge, “Towards a functional catalog of the plant genome. A survey of genes for lipid biosynthesis,” Plant Physiology, vol. 122, no. 2, pp. 389–402, 2000.
[2] C. Romualdi, S. Bortoluzzi, and G. A. Danieli, “Detecting differentially expressed genes in multiple tag sampling experiments: comparative evaluation of statistical tests,” Human Molecular Genetics, vol. 10, no. 19, pp. 2133–2141, 2001.
[3] C. O’Brien, “Cancer genome anatomy project launched,” Molecular Medicine Today, vol. 3, no. 3, p. 94, 1997.
[4] S. Audic and J.-M. Claverie, “The significance of digital gene expression profiles,” Genome Research, vol. 7, no. 10, pp. 986–995, 1997.
[5] L. D. Greller and F. L. Tobin, “Detecting selective expression of genes and proteins,” Genome Research, vol. 9, no. 3, pp. 282–296, 1999.
[6] D. J. Stekel, Y. Git, and F. Falciani, “The comparison of gene expression from multiple cDNA libraries,” Genome Research, vol. 10, no. 12, pp. 2055–2061, 2000.
[7] C. X. Mao, “Inference of the number of species geometric lower bounds,” Journal of American Statistical Association, vol. 101, no. 476, pp. 1663–1670, 2006.
[8] C. X. Mao and B. G. Lindsay, “Estimating the number of classes,” Annals of Statistics, vol. 35, no. 2, pp. 917–930, 2007.
[9] A. Chao, “Nonparametric estimation of the number of classes in a population,” Scandinavian Journal of Statistics, vol. 11, no. 4, pp. 265–270, 1984.
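As a rough companion to Section 3, the estimator in (3) and (4) can be sketched for a single library as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the mixing distribution is fitted by a plain EM update of weights on a fixed, user-supplied grid of $\lambda$ values, which merely approximates the nonparametric maximum likelihood estimator in (2); the paper defers the actual computation to [7]. The odds of being unseen use the lower bound $n_1(n_1 - 1)/\{2n(n_2 + 1)\}$ quoted in Section 3.

```python
import math

import numpy as np

_lgamma = np.vectorize(math.lgamma, otypes=[float])


def trunc_pois(x, lam):
    """Zero-truncated Poisson pmf matrix: rows index x, columns index lam."""
    x = np.asarray(x, dtype=float)[:, None]
    lam = np.asarray(lam, dtype=float)[None, :]
    return np.exp(x * np.log(lam) - _lgamma(x + 1.0) - np.log(np.expm1(lam)))


def fit_mixing_weights(counts, grid, n_iter=500):
    """EM for mixture weights on a FIXED grid (an approximation, see lead-in)."""
    xs, n_x = np.unique(counts, return_counts=True)
    like = trunc_pois(xs, grid)
    w = np.full(len(grid), 1.0 / len(grid))
    for _ in range(n_iter):
        post = like * w
        post /= post.sum(axis=1, keepdims=True)   # posterior over grid points
        w = (n_x[:, None] * post).sum(axis=0) / n_x.sum()
    return w


def f_q(x, grid, w):
    """Mixture of truncated Poissons evaluated at each entry of x."""
    return trunc_pois(np.atleast_1d(x), grid) @ w


def empirical_bayes(counts, grid):
    """Estimated relative abundances for seen genes and for one unseen gene."""
    counts = np.asarray(counts)
    counts = counts[counts >= 1]                  # only observed genes
    n = counts.size
    n1 = int(np.sum(counts == 1))
    n2 = int(np.sum(counts == 2))
    theta = n1 * (n1 - 1) / (2.0 * n * (n2 + 1))  # lower-bound odds of unseen
    if theta <= 0:
        raise ValueError("this sketch needs at least two singleton genes")
    w = fit_mixing_weights(counts, grid)
    f1 = f_q(1, grid, w)[0]
    lam_seen = (counts + 1) * f_q(counts + 1, grid, w) / f_q(counts, grid, w)
    lam_unseen = f1 / theta                       # x = 0 case of (3)
    s_hat = n * f1 + lam_seen.sum()               # total in (4)
    return {
        "pi_seen": lam_seen / s_hat,
        "pi_unseen": lam_unseen / s_hat,
        "theta": theta,
        "c_hat": n * (1.0 + theta),
        "s_hat": s_hat,
    }
```

For example, `empirical_bayes(np.array([1, 1, 1, 2, 1, 3, 2, 5, 1, 4]), np.linspace(0.1, 10.0, 50))` returns abundance estimates whose total over the seen genes plus the estimated $n\hat{\theta}$ unseen genes is one by construction. A finer or data-driven grid would change the fitted weights, so the grid should be treated as a tuning choice of this sketch rather than part of the method.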
International Journal of Plant Genomics – Hindawi Publishing Corporation
Published: Mar 13, 2008