Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Predictive performances of animal models using different multibreed relationship matrices in systems with rotational crossbreeding

Predictive performances of animal models using different multibreed relationship matrices in... Background: In livestock breeding, selection for some traits can be improved with direct selection for crossbred performance. However, genetic analyses with phenotypes from crossbred animals require methods for multibreed relationship matrices; especially when some animals are rotationally crossbred. Multiple methods for multibreed relationship matrices exist, but there is a lack of knowledge on how these methods compare for prediction of breed- ing values with phenotypes from rotationally crossbred animals. Therefore, the objective of this study was to compare models that use different multibreed relationship matrices in terms of ability to predict accurate and unbiased breed- ing values with phenotypes from two-way rotationally crossbred animals. Methods: We compared four methods for multibreed relationship matrices: numerator relationship matrices (NRM), García-Cortés and Toro’s partial relationship matrices (GT ), Strandén and Mäntysaari’s approximation to the GT method (SM), and one NRM with metafounders (MF). The methods were compared using simulated data. We simulated two phenotypes; one with and one without dominance effects. Only crossbred animals were phenotyped and only pure - bred animals were genotyped. Results: The MF and GT methods were the most accurate and least biased methods for prediction of breeding values in rotationally crossbred animals. Without genomic information, all methods were almost equally accurate for prediction of breeding values in purebred animals; however, with genomic information, the MF and GT methods were the most accurate. The GT, MF, and SM methods were the least biased methods for prediction of breeding values in purebred animals. Conclusions: For prediction of breeding values with phenotypes from rotationally crossbred animals, models using the MF method or the GT method were generally more accurate and less biased than models using the SM method or the NRM method. Crossbred performance is often indirectly selected Background for through selection for purebred performance. This Several livestock production systems use crossbred ani- is valid if the genetic correlation between the crossbred mals at the commercial level. In these systems, the phe- and purebred performances is strong [1]. However, the notypic performance of crossbred animals should be the genetic correlation between crossbred and purebred per- primary objective of the breeding goal. formances is only moderate for many traits [2]. For such traits, it may be a solution to directly select for crossbred *Correspondence: bgp@lf.dk performance. Breeding & Genetics, Danish Agriculture and Food Council, Axelborg, Multiple crossbreeding procedures exist [3]. The most Axeltorv 3, Copenhagen W, 1609 Copenhagen, Denmark notable procedures for modern pork and beef systems Full list of author information is available at the end of the article © The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. The Creative Commons Public Domain Dedication waiver (http:// creat iveco mmons. org/ publi cdoma in/ zero/1. 0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 2 of 17 are the two-way terminal, three-way terminal, and two- The objective of this study was to compare methods for way rotational crossbreeding procedures. Among these relationship matrices in terms of ability to predict accu- crossbreeding systems, genetic analysis is most compli- rate and unbiased breeding values with phenotypes from cated with phenotypes from rotationally crossbred ani- rotationally crossbred animals. We compared the NRM mals [4–6]. Nevertheless, rotationally crossbred animals as used by Poulsen et  al. [15], the partial relationship comprise a possible source of both additional and novel matrices by García-Cortés and Toro [9], the approximate phenotypes. Therefore, in the following, we will focus on partial relationship matrices by Strandén and Mäntysaari genetic analyses with phenotypes from rotationally cross- [12], and the relationship matrix with metafounders by bred animals. Legarra et al. [14]. Phenotypes from rotationally crossbred animals are We hypothesized that the methods by García-Cortés often subject to more variable genetic effects than phe - and Toro [9] and Legarra et al. [14] were the most accu- notypes from purebred animals and F1 animals [4, 7]. rate and least biased methods because they are the only Mating animals from different populations often leads to methods which fully comply with the theory [4]. offspring with a high degree of heterozygosity. For domi - nance effects, the increase in heterozygosity results in a Methods favorable change in the phenotypic mean and increased The prediction accuracies and prediction biases of the dominance variance in subsequent generations [7]. For models with different relationship matrices were investi - additive genetic effects, the increase in heterozygosity gated through a simulation study. The simulation design increases the additive genetic variance in following gen- represents a two-way-rotational crossbreeding system erations [4]. All the aforementioned changes are relative [3]. In this section, we first present how the populations to the average of the genetic parameters in the consti- were simulated. This includes the description of their tuting purebred populations. Animal breeding focuses population structure, genomic architecture, genetic mainly on additive genetic effects, which are modelled effects, and phenotypes. Then, we present how we pre - using additive genetic relationship matrices. Since the dicted breeding values with phenotypes from rotationally usual numerator relationship matrix (NRM) [8] can not crossbred animals using statistical models with different correctly model additive genetic effects in rotationally relationship matrices. Lastly, we present how we evalu- crossbred animals [4], specialized additive relationship ated and compared the statistical models with different matrices are needed. relationship matrices. In the following, we refer to the Specialized additive genetic relationship matrices for statistical models with different relationship matrices as crossbred animals exist [4, 9–11]. These relationship methods. matrices decompose the additive genetic relationships into a breed-specific term for each breed and a segrega - Simulation tion term for each pair of breeds. The partial relation - General ship matrices for the breed-specific terms are analogous A two-way-rotational crossbreeding system and genomic to NRM-based matrices and they refer to the additive architecture were simulated with the QMSim software genetic variances in the purebred base populations. [16]. For all populations, generations did not overlap, Meanwhile, the partial relationship matrices for the the numbers of males and females were equal, sires and segregation terms model the increased additive genetic dams were chosen at random (no selection), mating was variances in crossbred animals. Both types of partial rela- random and sampled without replacement, and the litter tionship matrices have later been approximated [12] and size was 6. We simulated 100 replicates. The population the theory for the partial relationship matrices for breed- structures are shown in Fig. 1. specific terms has been extended to incorporate genomic information [13]. The additive relationship matrix with Historical population metafounders was proposed by Legarra et  al. [14], and The first generation in the historical population consisted it is an alternative to the partial relationship matrices of 3000 animals. The population size was constant for mentioned above. In theory, the relationship matrix with 1000 generations, and over the following 200 generations metafounders simultaneously models both breed-specific the population size decreased linearly to 2800 animals at and segregation terms with one additive genetic effect the end. [14]. There is a need to investigate how models with these relationship matrices compare for prediction of accu- Purebred populations rate and unbiased breeding values with phenotypes from We created two purebred populations. Each purebred rotationally crossbred animals. population was founded by 25 males and 25 females from the last generation in the historical population. The P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 3 of 17 Fig. 1 General population structures. Colors: Types of information made available for prediction. Grey: No information. Blue: Pedigree information. Red: Pedigree information and phenotypes. Green: Pedigree information and genomic information sampling of founders was random and independent for when marker genotypes were pooled across the pure- the two purebred populations. The purebred populations bred populations. Similarly, 1223 of the 1750 QTL segre- were kept separate for 39 generations. At each genera- gated in generation 32 of either purebred population and tion, 25 randomly selected sires were mated with 25 ran- 1394 QTL segregated when QTL genotypes were pooled domly selected dams; i.e., the effective population sizes across the purebred populations. were approximately 50 [17], and not all animals produced offspring. In the following, the two purebred populations Genetic effects are referred to as Population A and Population B. We simulated both additive and dominant QTL effects. Additive and dominant QTL effects were identical across Crossbred population populations. The first crossbred generation was founded by mating The additive genetic animal effects were solely based on 75 males from Population A and 75 females from Popu- additive QTL effects. The absolute additive QTL effects lation B. These animals were drawn from generation 32 were drawn from a gamma-distribution with the stand- of their respective population. The first generation in ard parameters in QMSim [16]. Additive QTL effects the crossbred population is referred to as generation 33. were scaled by QMSim such that the additive genetic ani- For generations 34 to 39, crossbred animals were created mal variance was 0.2 after the historical population [16]. by mating 75 males from one of the purebred popula- The dominant QTL effects, were simulated as described tions with 150 females from the crossbred population; by Wellmann and Bennewitz [18]: i.e., for these generations, each purebred sire was mated d = h ◦ |β − β |, (1) 1 2 with two crossbred dams. Sires were from Population A in odd-numbered generations and Population B in even- where d is a vector of dominant QTL effects; numbered generations. In the following, the crossbred 1 1 h ∼ N ( 1, I) is a vector of dominance degrees; ◦ is the 2 10 population is referred to as Population C. Hadamard product; β is a vector of additive QTL effects of the first QTL-allele; and β is a vector of additive QTL Genomic architecture effects of the second QTL-allele. Dominant genetic ani - The genome consisted of five 100-cM chromosomes. mal effects, d, were calculated as the sum of dominant Each chromosome contained 3500 markers and 350 QTL effects where the animal was heterozygous. Domi - quantitative trait loci (QTL). Marker positions, QTL nant genetic animal effects were scaled such that the positions, and allele frequencies were randomly and uni- dominant genetic animal variance was 0.1 in Population formly distributed. Marker and QTL genotypes were ini- C. On average, 8% of the loci showed overdominance; tialized in the first generation of the historical population. 45% showed partial dominance that was greater than On average, 12,104 of the 17,500 markers segregated half the allele substitution effect; and 46% showed partial with a minor allele frequency (MAF) higher than 0.01 in dominance that was less than half the allele substitution generation 32 of either purebred population. Meanwhile, effect. 13,769 markers segregated with a MAF higher than 0.01 Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 4 of 17 Phenotypes 1 1 + a , i = j sd a = ij (2) We defined two phenotypes: a phenotype without domi - 1 (a + a ), otherwise, is id nance effects, y = a + e , and a phenotype with domi- nance effects, y = a + d + e , where a is a vector of AD where i and j denote animals, a is the pedigree-based ij additive genetic animal effects, d is a vector of domi - covariance between the additive genetic effects of ani - nant genetic animal effects, e is a vector of environmen - mals i and j, s is the sire of j , and d is the dam of j [8]. tal effects, and e ∼ N (0, 0.8I) . Note that y and y A AD The pedigrees for the two relationship matrices were have different narrow-sense heritabilities: h = 0.2 and different. The pedigree for Population A included animals h = 0.2/1.1. from both Population A and Population C, and the pedi- AD gree for Population B included animals from both Pop- Information used for prediction ulation B and Population C. To create the pedigree for We used the information such as to represent a system Population A, animals from Population B were removed where only crossbred animals were phenotyped and only from the pedigree and vice versa. the purebred animals were genotyped (Fig.  1). It is com- We used two genomic relationship matrices for the NRM NRM mon practice to not genotype crossbred animals because NRM method; G and G . Preliminary genomic A B VanRaden VanRaden it is more important to genotype selection candidates relationship matrices, G and G , were A B than phenotyped animals [19]. calculated using VanRaden’s first method [22], geno - More specifically, pedigree information was kept only types from purebred animals in generations 35 to 39, for animals born in generations 32 through 39. Animals and marker allele frequencies in the respective pure- VanRaden born before generation 32 were regarded as unknown. bred base-populations. When calculating G VanRaden Marker information was only available for purebred ani- and G , a marker was included if its minor mals born in generations 35 through 39. Phenotypes were allele frequency was higher than 0.01 in its respec- only available for crossbred animals born in generations tive purebred base-population. The positive definite - 33 through 39. ness of genomic relationship matrices was ensured by using the weighted average between VanRaden’s Prediction first method and the sub-matrix of genotyped ani - General mals from its respective pedigree-based relationship NRM NRM VanRaden We compared four methods for multibreed relationship matrix: G = 0.05{A } + 0.95G , where X X X NRM matrices, i.e., the NRM [8]; García-Cortés and Toro (GT) X ∈{A, B} denotes the population and {A } is the [9]; Strandén and Mäntysaari (SM) [12]; and Legarra et al. sub-matrix of genotyped animals from the respective (MF) [14]. All four methods can be extended to include pedigree-based relationship matrix. The genomic rela - NRM NRM genomic information using the single-step procedure tionship matrices, G and G , were scaled and A B [13, 20, 21]. For each method, we describe the theory, centered such that their average diagonal and off-diago - pedigree(s), incorporation of genomic information, the nal elements were equal to those of the sub-matrices of statistical model, and calculation of predicted breeding genotyped animals from their respective pedigree-based values. In Appendix 1, each method is showcased with a relationship matrices. We calculated combined relation- small example pedigree. We highly recommend readers ship matrices for genotyped and non-genotyped animals who are unfamiliar with the methods to view Appendix 1 [20, 21] because some animals were not genotyped. The after reading their respective sections in Methods. combined relationship matrices for genotyped and non- NRM NRM genotyped animals were H and H for animals A B The NRM method with genetic contributions from Populations A and B, This method can be used for multibreed analyses in mul - respectively. tiple ways. We use the NRM method such that we have The statistical model for the NRM method was: one relationship matrix per breed. This allows us to par - y = Xb + Z a + Z a + e, A A B B (3) tition the breeding values of crossbred animals into one term per purebred population. Furthermore, it allows where y is a vector of phenotypes; b is a vector of param- the additive genetic variances to differ between purebred eters for the general mean, pedigree-derived breed pro- populations. In this study, the NRM method required portion, and pedigree-derived heterosis; a is a vector of two relationship matrices; one for terms contributed additive genetic effects from Population A; a is a vector from Population A, and one for terms contributed from of additive genetic effects from Population B; e is a vector population B. of residuals; and X , Z , and Z are design matrices. A B The recursive algorithm for each of the NRM matrices The three vectors with random effects (a , a , and e ) A B is: were assumed to be distributed as: P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 5 of 17       NRM 2 A B A B 1 A σ 2 f f + f f + a , i = j a 0 sd A A A s s d d 2 a = ij (7) NRM 2       a ∼ N 0 , 0A σ , (a + a ), otherwise, is id B A e 0 0 0Iσ where i , j , s , and d are as for the algorithm for the NRM (4) method (Eq.  2); a is the pedigree-based covariance ij NRM where A is a relationship matrix for additive genetic between the additive genetic segregation effects of ani - NRM effects from Population A; A is a relationship matrix A B mals i and j; f and f are the proportions of genetic 2 s s for additive genetic effects from Population B; σ is the material from Population A and Population B, respec- additive genetic variance in Population A; σ is the addi- A B tively, in the sire of animal j ; and f and f are the d d tive genetic variance in Population B; 0s are vectors or proportions of genetic material from Population A and matrices of zeros; I is an identity matrix; and σ is the Population B, respectively, in the dam of animal j . B oth residual variance. For prediction with genomic infor- GT diagonal and off-diagonal elements in A can only be NRM NRM NRM AB mation, A was replaced with H and A was A A B non-zero for descendants of crossbred animals. NRM replaced with H . The pedigree for the GT method included all the ani - The vector of predicted breeding values for the NRM mals, purebred and crossbred, in generations 32 through method was:     We used two genomic relationship matrices for the GT {ˆa } 0 A P GT GT NRM     method; G and G . Generally, the single-step proce- ebv = 0 + {ˆa } , B P (5) A B dure for the GT method requires that marker alleles are {ˆa } {ˆa } A C B C phased and traced such that their breed of origin can where a ˆ and a ˆ are the vectors of predicted addi- be determined [13]; however, tracing the breed of ori- A B tive genetic effects in the statistical model for the NRM gin of alleles was not required in this study because we method (Eq.  4); subscript P denotes that the sub-vector only used genotypes from purebred animals. Therefore, GT GT only contains predicted effects from purebred animals; G and G were the same as the genomic relationship A B GT NRM subscript C denotes that the sub-vector only contains matrices for the NRM method; i.e., G = G and A A GT NRM predicted effects from crossbred animals; and 0s are vec - G = G . The single-step procedure [20, 21] was B B tors of zeros. used for the partial relationship matrices for breed-spe- cific terms. The combined partial relationship matrices The GT method for breed-specific terms for genotyped and non-gen - GT GT This method partitions the additive genetic relationship otyped animals were H and H for animals with A B into several partial relationship matrices [9]: one for each genetic contributions from Populations A and B, respec- breed (partial relationship matrices for breed-specific tively. The partial relationship matrix for the segregation GT GT terms; A and A in our study), and one for each pair term did not include genomic information. A B of breeds (partial relationship matrices for segregation The statistical model for the GT method was: GT terms; A in our study). The partial relationship matrix AB y = Xb + Z a + Z a + Z a + e, A A B B AB AB (8) for segregation terms captures the increase in additive genetic variance in crossbred animals [4, 9]. where y, b, and X are as described for the statistical GT The recursive algorithm for calculating A is [9]: model for the NRM method (Eq.  4); a is a vector of breed-specific partial additive genetic effects from Popu - f + a , i = j sd a = (6) lation A; a is a vector of breed-specific partial additive ij B (a + a ), otherwise, is id genetic effects from Population B; a is a vector of addi- AB tive genetic segregation effects between Populations A where i , j , s , and d are as described for the algorithm and B; e is a vector of residuals; and Z , Z , and Z are A B AB for the NRM method (Eq.  2); a is the pedigree-based ij design matrices. covariance between the breed-specific partial additive A The four vectors of random effects (a , a , a and e ) A B AB genetic effects of animals i and j; and f is the propor- were assumed to be distributed as: tion of genetic material from Population A in animal i . GT    The sub-matrix of A for purebred animals is identical     GT 2 A σ a 0 A A NRM A A to its analogous sub-matrix of A . GT 2    0A σ  a  0 B A GT   B    ∼ N   , , The recursive algorithm for A is:   GT 2  a 0 AB AB 0 0A σ AB A AB e 0 2 0 0 0Iσ (9) Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 6 of 17 GT NRM NRM where A is a partial relationship matrix for the are identical to submatrices from A and A for A A B GT breed-specific term from Population A; A is a par- purebreds animals, respectively. tial relationship matrix for the breed-specific term from The SM method is equivalent to random- GT Population B; A is a partial relationship matrix for the regressions of additive genetic effects on AB segregation term between Populations A and B; σ is F , F , and F , respectively [12], because A B AB 2 NRM T NRM T the additive genetic variance in Population A; σ is the F a ∼ N 0, F A F , F a ∼ N 0, F A F , A A A B B B A A A B B 2 NRM T additive genetic variance in Population B; σ is the seg- and F a ∼ N 0, F A F . In this study, we AB AB AB A AB AB AB regation variance between Populations A and B; 0s are apply the SM method through random-regression. vectors or matrices of zeros; I is an identity matrix; and Three pedigrees were constructed for the SM method: σ is the residual variance. For prediction with genomic one for each purebred population, which are identical to GT GT GT information, A was replaced with H and A was those for the NRM method, and the third is for the par- A A B GT replaced with H . tial relationship matrix for the segregation term between The vector of predicted breeding values for the GT Populations A and B. The partial relationship matrix for method was: the segregation term between Populations A and B was calculated with a pedigree from which all purebred and       {ˆa } 0 0 A P F1 animals had been removed. 0 {ˆa } 0       GT B P ebv = + + , We did not use the same pedigree for segregation       (10) {ˆa } {ˆa } 0 A C:F 1 B C:F 1 effects as described by the SM method [12]. They used {ˆa } {ˆa } aˆ A C:R B C:R AB the full pedigree to construct an additive genetic relation- where a ˆ , a ˆ , a ˆ are the vectors of predicted partial ship matrix on which they applied random regression. A B AB additive genetic effects in the statistical model for the GT However, using the full pedigree may promote discre- method (Eq.  8); subscript P denotes that the sub-vector prancies between the GT and SM methods. According only contains the predicted effects from purebred ani - to the GT method, segregation effects are independent mals; subscript C:F1 denotes that the sub-vector only among all offspring from F1 animals and their magni - contains the predicted effects from F1 crossbred animals; tude only depend on the breed proportions of parental subscript C:R denotes that the sub-vector only contains animals. For the SM method, a deep pedigree for seg- the predicted effects from rotationally crossbred animals; regation effects would increase the likelihood of both and 0s are vectors of zeros. non-zero inbreeding coefficients in offspring from F1 animals and covariance between offspring from F1 ani - The SM method mals. Therefore, the compliance between the GT and SM This method is an approximation of the GT method and methods should be greater if purebred and F1 animals it partitions the additive genetic variance in the same are removed from the pedigree for segregation effects, as way. done in this study. The relationship matrices for the SM method are cal - The genomic relationship matrices for this method culated as: were the same as for both the NRM and GT methods. As for the GT method, we calculated combined relationship GT SM NRM A ≈A = F A F , (11) A A A A A matrices for genotyped and non-genotyped animals for the breed-specific terms but not for the segregation term. GT SM NRM The statistical model for the SM method was: A ≈A = F A F , B B (12) B B B y = Xb + Z F a + Z F a + Z F a + e, A A A B B B AB AB AB GT SM NRM (14) A ≈A = F A F , (13) AB AB AB AB AB where F , F , and, F are as defined for the calculation A B AB where F and F are diagonal matrices with square roots A B of partial relationship matrices with the SM method of breed proportions for populations A and B, respec- (Eqs.  11, 12, 13); and the remaining components are NRM tively; A is a NRM-based relationship matrix repre- AB the same as in the statistical model for the GT method senting at least all the descendants of crossbred animals; (Eq. 8). Note that the additive genetic vectors now consist and F is a diagonal matrix with square roots of the AB of regression coefficients. A B A B “ 2 f f + f f ” term from Eq. 7. As for the GT method, s s d d The four vectors of random effects (a , a , a and e ) A B AB SM SM the sub-matrices of A and A for purebred animals A B were assumed to be distributed as: P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 7 of 17        NRM 2 A σ a 0 A A A A NRM 2     a  0 0A σ   B A  B (15) ∼ N , ,     NRM 2    a 0 AB 0 0A σ AB A AB e 0 2 0 0 0Iσ 2 2 2 2 σ ∗ where σ , σ , σ , σ , 0, and I are as in the statistical γ A A A e A A B AB Ŵ = = 8 , (17) NRM NRM 2 γ γ ∗ ∗ model for the GT method (Eq.  8); A and A are σ σ AB B ∗ p p A B A B p as in the statistical model for the NRM method (Eq.  4); NRM and A is the usual numerator relationship matrix where γ is the metafounder relationship for Population AB based on the pedigree without purebred and F1 animals. A; γ is the metafounder relationship for Population B; NRM For prediction with genomic information, A was γ is the metafounder relationship between Populations AB NRM NRM NRM 2 replaced with H and A was replaced with H . A and B; σ is the  variance of marker allele frequencies A B B The vector of predicted breeding values for the SM in Population A; σ ∗ is the variance of marker allele fre- method was: ∗ ∗ quencies in Population B; σ is a  covariance between p p A B       ∗ marker allele frequencies in Populations A and B; p and {ˆa } 0 0 A P p are marker-allele frequencies in the base populations 0 {ˆa } 0       SM B P B ebv = + + ,       (16) {ˆa } {ˆa } 0 of Populations A and B, respectively; and the asterisk A C:F 1 B C:F 1 ∗ ∗ {ˆa } {ˆa } aˆ superscripts in p and p denote that allele annotations A C:R B C:R AB A B ∗ ∗ 1 were randomized such that E(p ) = E(p ) = . A B 2 where a ˆ , a ˆ , a ˆ are the vectors of predicted partial A B AB In this study, metafounder relationships were calcu- additive genetic effects in the statistical model for the SM lated with estimated marker allele frequencies in genera- method (Eq.  14); and 0, subscript P, subscript C:F1, and tion 32. We estimated marker allele frequencies as subscript C:R are as defined for the GT method (Eq. 10). proposed by Gengler et al. [24] and genotypes from pure- bred animals in generations 35 to 39. Marker allele fre- The MF method quencies were estimated independently for each This method is conceptually different from the other purebred population. Finally, metafounder relationships methods. The other methods model populations as sep - were calculated from markers that have a minor allele arate entities while the MF method models populations frequency higher than 0.01 when averaged across the as sub-populations derived from a common ancestral purebred base-populations. The average metafounder population. In practice, this is done by identifying each relationship matrix across replicates, Ŵ , wa s : sub-population through a metafounder, calculating an γ¯ 0.80 Ŵ = = . additive genetic relationship matrix, Ŵ , between meta- γ¯ γ¯ 0.38 0.80 AB B founders, and then incorporating this information into The recursive algorithm for the MF method is: 1 + γ , i = j ∧ i ∈ m A A  1 1 + γ , i = j ∧ i ∈ m B B  γ , i �= j ∧{i, j}⊂ m A A (18) a = γ , i �= j ∧{i, j}⊂ m ij B B γ , i �= j ∧ [(i ∈ m ∧ j ∈ m ) ∨ (i ∈ m ∧ j ∈ m )]  AB A B B A 1 + a , i = j ∧ i �∈ {m ,m } sd A B  2 (a + a ), otherwise, is id where i , j , s , and are as in the recursive algorithms for one shared additive genetic relationship matrix for all the NRM and GT methods (Eqs.  4 and 8); a is as in the ij populations, A(Ŵ) . In theory, this method should simul- recursive algorithm for the NRM method (Eq.  4); γ , γ , A B taneously account for both the breed-specific terms and and γ are the metafounder relationships (Eq.  17); m AB A the segregation term [14]. is a vector of base animals from Population A; m is a The metafounder relationships can be calculated in vector of base animals from Population B; ∧ is the logi- several ways [14, 23]. We used the method proposed by cal “and”; and ∨ is the logical “or”. Please note that the Garcia-Baccino et al. [23]: last two elements of the recursive algorithm for the MF Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 8 of 17 method are the same as in the algorithm for the NRM The vector of predicted breeding values for the MF method. In other words, the only differences between method was: the NRM method and the MF method are that base ani- MF (21) ebv = aˆ , mals are related and their inbreeding coefficient can be greater than zero. These differences then carry over into where a ˆ is the vector of predicted additive genetic effects the additive genetic relationships for animals which are in the statistical model for the MF method (Eq. 19). not in the base population. The pedigree for the MF method included all the ani - Variance components mals, purebred and crossbred, in generations 32 to 39. We estimated variance components for each method The MF method uses one genomic relationship matrix and its respective statistical model (Eqs. 4, 8, 14, and 19). MF across all populations; G . A preliminary genomic rela- Variance components were only estimated with pedigree VanRaden tionship matrix, G , was calculated using Van- information. Breeding values were predicted with these Raden’s first method [22], and genotypes from purebred estimated variance components regardless of whether animals in generations 35 to 39; however, we scaled and breeding values were predicted with or without genomic centered the genomic relationship matrix with allele fre- information. quencies of 0.5. Markers were included in the genomic The estimated variance components for the phenotype relationship matrix if their minor allele frequency was without dominance effects are in Table  1. For presenta- higher than 0.01 when pooling genotypes from the pure- tion only, the estimated additive genetic variance from bred base-populations. The positive definiteness of the the MF method was transformed using the estimated genomic relationship matrix was ensured by using the metafounder relationships, Ŵ , such that the parametriza- VanRaden weighted average of G and the sub-matrix of tion was the same as for the GT method [14]: genotyped animals from the pedigree-based relationship MF VanRaden matrix: G = 0.05{A(Ŵ)} + 0.95G , where 2 2 σ = σ 1 − γ A A A MF {A(Ŵ)} is the sub-matrix of genotyped animals from the 22 2 pedigree-based relationship matrix. The genomic rela - 2 2 MF , (22) σ = σ 1 − γ tionship matrix, G , was not scaled and centered such A A B MF that its average diagonal and off-diagonal elements were MF 2 2 equal to those of {A(Ŵ)} because G and {A(Ŵ)} 22 22 σ = σ (γ + γ − 2γ ) A B AB A A AB MF MF 8 are comparable when G and Ŵ are calculated with the same set of markers. We calculated a combined rela- 2 2 2 where , , and are the partial additive genetic σ σ σ A A A A B AB tionship matrix for genotyped and non-genotyped ani- variance components; σ is the estimated additive MF mals [14, 20, 21], H(Ŵ) , because some animals were not genetic variance in the ancestral population (Eq. 20); and genotyped. γ , γ , and γ are metafounder relationships (Eq. 8). A B AB The statistical model for the MF method was: We calculated true partial additive genetic variance components and used them as reference for the magni- y = Xb + Za + e, (19) tude of the estimated variance components in Table  1. The true partial additive genetic variance components where y, b, and X are as described for the NRM method (Eq. 4); a is a vector of additive genetic effects; e is a vec- tor of residuals; and Z is a design matrix. The two vectors of random effects (a and e ) were dis- Table 1 Means and standard deviations of variance tributed as: components across replicates 2 2 2 2 2 Method σ σ σ σ A(Ŵ)σ e a 0 A A A A B AB MF ∼ N , , (20) e 0 0Iσ a e True 0.15 ± 0.01 0.15 ± 0.01 0.023 ± 0.003 0.80 GT 0.15 ± 0.04 0.14 ± 0.04 0.035 ± 0.039 0.80 ± 0.02 where A( ) is the additive relationship matrix with MF 0.15 ± 0.02 0.15 ± 0.02 0.026 ± 0.005 0.80 ± 0.02 metafounders [14]; Ŵ is the additive relationship matrix SM 0.25 ± 0.07 0.19 ± 0.06 0.297 ± 0.052 0.61 ± 0.04 between metafounders; σ is the additive genetic vari- MF NRM 0.23 ± 0.05 0.22 ± 0.05 0.51 ± 0.07 ance in the ancestral population; 0s are vectors or matri- 2 2 σ : Additive genetic variance for Population A ces of zeros; I is an identity matrix; and σ is the residual AA σ : Additive genetic variance for Population B variance. The additive genetic relationship matrix, A(Ŵ) , A σ : Additive genetic segregation variance between populations A and B was replaced with H(Ŵ) when breeding values were pre- AB σ : Residual variance dicted with genomic prediction. The true residual variance was constant across replicates P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 9 of 17 were calculated with the parametrization of the GT methods used to compare the methods. We stratified the method and the phenotype without dominance effects. comparison according to population. The true partial additive genetic variance for breed- specific effects from Population A was calculated as: True breeding values The true breeding value depends on whether the pheno - qtl 2 type includes only an additive genetic term, or both addi- σ = 2 p 1 − p β − β , (23) i,A i,A i,1 i,2 tive genetic and dominant genetic terms. i=1 The true breeding values with only an additive genetic where σ is the true partial additive genetic variance for term were calculated as: breed-specific effects from Population A; n is the num- qtl tbv = Q(β − β ) + 2Jβ , A (25) 1 2 2 ber of QTL; p is the allele frequency at QTL i in base i,A animals from Population A; β is the additive genetic i,1 where Q is a QTL genotype matrix with allelic loads of effect of the first QTL allele at QTL i; and β is the addi- i,2 the first allele; β is a vector of additive genetic effects tive genetic effect of the second QTL allele at QTL i. The of the first QTL allele; β is a vector of additive genetic partial additive genetic variance for breed-specific effects effects of the second QTL allele; and J is a matrix of 1s from Population B was calculated in the same way. with dimensions equal those of Q. The true partial additive genetic variance for segrega - In the presence of dominance, the true breeding value tion effects between Populations A and B was calculated of an animal depends on its ability to promote both addi- as [4]: tive and dominance genetic effects in its offspring [28, n 29]. Therefore, the true breeding value now depends on qtl the genotypes of the mate. True breeding values with a σ =2 p 1 − p β − β i,F 1 i,F 1 i,1 i,2 AB dominance term can be calculated with allele frequen- i=1 (24) cies from the population of mating candidates [28, 29]. 2 2 − σ + σ , A A The true breeding values with both an additive term and A B a dominance term were calculated as: where β , β , and n are as in Eq.  23; σ is the true i,1 i,2 qtl AB X Q partial additive genetic variance for segregation effects tbv = tbv + (Q − J) (1 − 2p ) ◦ d , (26) A X AD between Populations A and B; p is a vector of QTL i,F 1 allele frequencies in generation 33 of Population C; σ is where X ∈{A, B, C} denotes the population to which the true partial additive genetic variance for breed-spe- the possible mating candidates belong; p is a vector of cific effects from Population A; and σ is the true partial QTL allele frequencies in population X; d is a vector additive genetic variance for breed-specific effects from of dominant QTL effects; ◦ is the Hadamard product; 1 Population B. is a vector of ones; and tbv , Q, β , β , and J are as for 1 2 true breeding values with only an additive genetic term Software for analysis and prediction (Eq. 25). Most data-handling was carried out in the R-software [25]. The relationship matrices for the GT and MF meth - Accuracy and bias ods were calculated using the RcppArmadillo R-package We evaluated the methods according to their prediction [26]. The relationship matrices for the NRM and SM accuracy and prediction bias. We used two measures for methods were calculated using the DMU software [27]. the prediction bias [30, 31]: level bias and dispersion bias. Variance components were estimated using the AI-ReML The prediction accuracy was defined as Pearson’s corre - algorithm in the DMU software package [27]. Additive lation between true breeding values and predicted breed- genetic effects were predicted using the best linear unbi - ing values: ased prediction (BLUP) method and the Preconditioned Accuracy = ρ(tbv, ebv), (27) Conjugate Gradient algorithm implemented in DMU software [27]. where ρ(.) is the Pearson correlation function; tbv is a vector of true breeding values; and ebv is a vector of pre- Comparison of the methods dicted breeding values. General The level bias was calculated as: We compared how well the methods predicted accurate and unbiased breeding values in animals from generation μ = ebv − tbv + tbv , (28) bias base 39. In the following, we describe how we calculated true breeding values, accuracies, and biases, and the statistical Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 10 of 17 groups within which validation parameters were com- where μ is the level bias; ebv is the mean predicted bias pared; and n = 8 is the number of unique combinations breeding value in validation animals; tbv is the mean true between methods and use of genomic information. breeding values in validation animals; and tbv is the base mean true breeding value in base animals. The correction Expected pattern in results for tbv was required because the true breeding values, base The accuracies and biases are expected to differ between in contrast to predicted breeding values, differed from populations A, B, and C. For animals in generation 39 zero in the base populations. There is no level bias when of Population A, halfsibs are the closest relatives with μ is equal to 0. bias phenotypes. For animals in generation 39 of Population The dispersion bias was calculated as: B, cousins are the closest relatives with phenotypes. For cov(tbv, ebv) animals in generation 39 of Population C, own perfor- b = , bias (29) var(ebv) mance is available for all animals. Therefore, we expect that prediction is most accurate in Population C, less where b is the dispersion bias; cov() is the empirical bias accurate in Population A, and least accurate in Popula- covariance; ebv is a vector of predicted breeding values; tion B. tbv is a vector of true breeding values; and var() is the empirical variance. There is no dispersion bias when b bias Results is equal to 1. Prediction accuracy Generally, the GT and MF methods were as accurate or Statistical analysis of accuracy, level bias, and dispersion bias more accurate than the SM and NRM methods (Table 2). The accuracies and biases were compared across meth - Use of genomic information always increased the predic- ods, use of genomic information, and replicates; but tion accuracy (Table 2). not across populations and definition of true breeding For Population A, the methods were equally accurate values. for prediction of breeding values without genomic infor- We used non-parametric tests because accuracies and mation (median: 0.37–0.41, Table 2). When breeding val- biases were heteroscedastic across methods and not nor- ues were predicted with genomic information, the MF mally distributed. and GT methods were the most accurate (median: 0.59– We investigated whether a method was more accurate 0.65, Table 2). The SM method was generally as accurate or biased than others using paired Wilcoxon signed rank as the MF and GT methods (median: 0.58–0.65, Table 2), tests. We used paired tests to compare the methods to while the NRM method was always the least accurate remove the variation caused by the stochastic simulation; (median: 0.56–0.63). i.e., the methods were paired within replicates. Further- For Population B, the methods were equally accu- more, we investigated whether the methods were biased rate for prediction of breeding values without genomic using the one-sample Wilcoxon signed rank tests. The information (median: 0.29–0.35, Table  2). When breed- null hypotheses for these tests were that the level biases ing values were predicted with genomic information, were equal to 0 and that the dispersion biases were equal the MF method and the GT method were the most to 1. accurate for prediction of any definition of true breed - We used the Bonferroni-correction of p-values to con- ing value (median: 0.50–0.55, Table  2) while the SM trol for multiple testing: α = α/n = 0.05/1000 , bon tests and NRM methods were the least accurate (median: where α is the significance level and n is the tests 0.48–0.51). number of statistical tests. Among the 1000 tests, For Population C, the GT and MF methods were the n × n × n × (n − 1)/2 = 840 were com- p g m m most accurate (median: 0.61–0.63, Table  2). The least parisons between validation parameters and accurate methods were the SM method (median: 0.57) (n − 1) × n × n = 160 were tests for whether vali- p g m and the NRM method (median: 0.48–0.49), respectively dation parameters differed from expected values, where (Table 2). n = 3 is the number of validation parameters (accuracy, level bias, and dispersion bias); n = 10 is the number of g P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 11 of 17 Table 2 Median prediction accuracy across replicates Table 3 Median level bias across replicates Population × y y Population y y A AD A AD method × method Purebred F1 Rotation Purebred F1 Rotation Population A (Purebred) Population A (Purebred) c c c c a a a a GT 0.41 (0.09) 0.37 (0.11) 0.40 (0.11) 0.39 (0.12) GT −0.02 (0.08) 0.29 (0.09) −0.03 (0.08) 0.12 (0.08) c c c c a a a a MF 0.41 (0.09) 0.37 (0.11) 0.40 (0.11) 0.39 (0.12) MF −0.02 (0.08) 0.28 (0.09) −0.03 (0.08) 0.12 (0.08) c c c c a a a a NRM 0.39 (0.09) 0.37 (0.09) 0.40 (0.11) 0.37 (0.11) NRM −0.01 (0.08) 0.29 (0.10) −0.04 (0.09) 0.12 (0.09) c c c c a a a a SM 0.41 (0.09) 0.37 (0.10) 0.41 (0.09) 0.40 (0.12) SM −0.03 (0.08) 0.28 (0.09) −0.03 (0.09) 0.11 (0.09) a a a a a a a a ssGT 0.65 (0.06) 0.59 (0.08) 0.60 (0.08) 0.60 (0.08) ssGT −0.01 (0.07) 0.28(0.07) −0.01 (0.07) 0.13 (0.07) a a a a a a a a ssMF 0.65 (0.06) 0.59 (0.08) 0.60 (0.07) 0.60 (0.08) ssMF −0.01 (0.08) 0.28 (0.07) −0.04 (0.07) 0.13 (0.08) a a a a b b b b ssNRM ssNRM 0.00 (0.08) 0.29 (0.08) −0.02 (0.09) 0.14 (0.09) 0.63 (0.05) 0.56 (0.07) 0.58 (0.06) 0.58 (0.07) a a a a a a ab b ssSM 0.00 (0.07) 0.29 (0.08) −0.03 (0.08) 0.13(0.08) ssSM 0.58 (0.08) 0.60 (0.08) 0.65 (0.04) 0.60 (0.07) Population B (Purebred) Population B (Purebred) a a a a c c c c GT 0.00 (0.07) 0.28 (0.09) −0.03 (0.08) 0.04 (0.07) GT 0.34 (0.12) 0.31 (0.11) 0.34 (0.12) 0.34 (0.12) a a a a c c c c MF 0.00 (0.07) 0.28 (0.10) −0.02 (0.08) 0.04 (0.07) MF 0.33 (0.12) 0.31 (0.12) 0.34 (0.12) 0.34 (0.12) a a a a c c c c NRM 0.01 (0.08) 0.28 (0.10) 0.03 (0.09) 0.05 (0.08) NRM 0.33 (0.11) 0.30 (0.11) 0.31 (0.13) 0.31 (0.13) a a a a c c c c SM 0.01 (0.07) 0.29 (0.08) −0.03 (0.08) 0.04 (0.07) SM 0.35 (0.11) 0.29 (0.11) 0.31 (0.11) 0.32 (0.11) a a a a a a a a ssGT −0.01 (0.06) 0.28 (0.08) −0.03 (0.07) 0.04 (0.06) ssGT 0.53 (0.10) 0.50 (0.08) 0.54 (0.10) 0.53 (0.09) a a a a a a a a ssMF 0.01 (0.06) 0.28 (0.08) −0.01 (0.07) 0.05 (0.07) ssMF 0.55 (0.09) 0.51 (0.07) 0.54 (0.08) 0.54 (0.08) a a a a b b b b ssNRM 0.01 (0.07) 0.28 (0.08) −0.02 (0.08) 0.04 (0.08) ssNRM 0.50 (0.11) 0.50 (0.09) 0.50 (0.09) 0.51 (0.08) a a a a ssSM 0.01 (0.06) 0.28 (0.08) −0.02 (0.07) 0.04 (0.07) b b b b ssSM 0.51 (0.11) 0.48 (0.09) 0.49 (0.08) 0.50 (0.08) Population C (Crossbred) Population C (Crossbred) a a GT 0.00 (0.05) 0.20 (0.07) b b GT 0.62 (0.03) 0.61 (0.03) a a MF 0.00 (0.06) 0.20 (0.07) b b MF 0.62 (0.03) 0.62 (0.03) a a NRM 0.00 (0.06) 0.19 (0.07) NRM 0.48 (0.02) 0.48 (0.02) a a SM 0.00 (0.06) 0.20 (0.06) d d SM a a 0.57 (0.02) 0.57 (0.02) ssGT 0.00 (0.06) 0.20 (0.07) a a a a ssGT 0.63 (0.03) 0.62 (0.03) ssMF 0.00 (0.06) 0.20 (0.07) a a ssMF 0.63 (0.03) 0.62 (0.03) a a ssNRM 0.00 (0.06) 0.21 (0.06) e e ssNRM 0.48 (0.02) 0.49 (0.03) a a ssSM 0.00 (0.06) 0.20 (0.06) c c ssSM 0.57 (0.02) 0.57 (0.02) Median absolute deviations from medians are in parentheses Median absolute deviations are in parentheses Level Bias: Difference between change in predicted and true breeding values relative to in the base population Accuracy: Pearson’s correlation between true breeding values and predicted breeding values Bold: Medians in bold differ significantly from zero Superscripts: Different superscripts denote that medians are significantly Superscripts: Different superscripts denote that medians are significantly different different Superscripts are comparable within combinations of Population and column Superscripts are comparable within combinations of Population and column ss-prefix: Relationship matrices include genomic information ss-prefix: Relationship matrices include genomic information y : A phenotype with additive genetic effects A y : A phenotype with additive genetic effects y : A phenotype with both additive and dominant genetic effects AD y : A phenotype with both additive and dominant genetic effects AD Purebred: True breeding value is for production of purebred animals Purebred: True breeding value is for production of purebred animals F1: True breeding value is for production of F1-animals F1: True breeding value is for production of F1-animals Rotation: True breeding value is for mating with rotationally crossbred animals Rotation: True breeding value is for mating with rotationally crossbred animals Level bias Dispersion bias The level biases were not statistically significantly differ - In general, the dispersion biases for the GT, MF, and SM ent from 0 for the phenotype without a dominant genetic methods were not statistically significantly different from term (Table  3). For the phenotype with a dominant 1 (Table 4). genetic term, the level biases were statistically signifi - cantly different from 0 for mating an animal with another animal from the same population. Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 12 of 17 Table 4 Median dispersion bias across replicates genetic term (median: 0.78–0.94). The dispersion biases for the NRM method were always statistically signifi - Population y y A AD cantly different from 1. × method Purebred F1 Rotation For Population B, the dispersion biases for the GT and MF methods were not statistically significantly different Population A (Purebred) ab ab ab ab from 1 (median: 0.91–1.12). The dispersion biases for GT 0.98 (0.22) 0.85 (0.22) 0.89 (0.28) 0.86 (0.27) the SM method were only statistically significantly dif - ab b ab ab MF 0.96 (0.22) 0.84 (0.20) 0.88 (0.24) 0.85 (0.24) ferent from 1 when breeding values were predicted with c c c c NRM 0.74 (0.17) 0.62 (0.17) 0.65 (0.15) 0.65 (0.18) genomic information and the phenotype did not include ab b b b SM 0.94 (0.24) 0.78 (0.25) 0.84 (0.26) 0.79 (0.25) a dominant genetic term (median: 1.20). The dispersion a a a a ssGT 1.01 (0.14) 0.90 (0.15) 0.96 (0.14) 0.95 (0.14) biases for the NRM method was statistically significantly ab ab ab ab ssMF 1.00 (0.11) 0.88 (0.12) 0.93 (0.12) 0.90 (0.11) different from 1 in almost all cases (median: 0.67–0.82). c c c c ssNRM 0.76 (0.10) 0.67 (0.11) 0.69 (0.10) 0.66 (0.10) For Population C, the dispersion biases for the GT and b b b b ssSM 0.94 (0.15) 0.84 (0.18) 0.86 (0.15) 0.84 (0.17) MF methods were not statistically significantly different Population B (Purebred) from 1 when the phenotype did not include a dominant a a a bc GT 1.02 (0.44) 1.01 (0.43) 1.01 (0.41) 1.12 (0.40) genetic term (median: 0.99–1.02). The dispersion biases a a a bc MF 0.98 (0.39) 1.02 (0.42) 0.96 (0.44) 1.10 (0.37) for the SM and NRM methods were always statistically d b b b NRM significantly different from 1. 0.82 (0.30) 0.72 (0.33) 0.74 (0.34) 0.73 (0.33) a a a a SM 1.26 (0.57) 0.94 (0.47) 1.15 (0.51) 1.13 (0.50) c a a a Discussion ssGT 1.09 (0.16) 0.98 (0.20) 0.99 (0.17) 0.96 (0.16) c a a a As hypothesized, the GT and MF methods were generally ssMF 1.04 (0.17) 0.91 (0.19) 0.93 (0.16) 0.93 (0.15) d b b b the most accurate and least biased methods for predic- ssNRM 0.80 (0.16) 0.67 (0.15) 0.68 (0.14) 0.67 (0.12) a a a tion of breeding values with phenotypes from rotation- ab ssSM 1.02 (0.27) 0.98 (0.23) 0.97 (0.22) 1.20 (0.30) ally crossbred animals. The SM method was almost as Population C (Crossbred) accurate as the GT and MF methods but was also more a a GT 1.00 (0.06) 0.92 (0.06) biased. The NRM method was the least accurate and a a MF 1.01 (0.06) 0.93 (0.05) most biased of the methods. e e NRM 0.40 (0.04) 0.34 (0.03) c c SM 0.54 (0.05) 0.46 (0.04) The GT and MF methods a a ssGT 0.99 (0.07) 0.92 (0.06) We found that the GT and MF methods performed simi- a a ssMF 1.00 (0.07) 0.93 (0.05) larly for prediction of breeding values with phenotypes d d ssNRM 0.41 (0.04) 0.34 (0.03) from rotationally crossbred animals. This is in accord - b b ssSM 0.55 (0.05) 0.46 (0.04) ance with the fact that the MF method, in theory, can Median absolute deviations from medians are in parentheses account for both breed-specific terms and segregation Dispersion Bias: Linear regression coefficient of true breeding values onto terms from GT method [14]. More specifically, the GT predicted breeding values and MF methods are equivalent when the tranforma- Bold: Medians in bold differ significantly from zero tions of Eq.  22 yield the estimated variance components Superscripts: Different superscripts denote that medians are significantly from the GT method. However, this relies on the accu- different rate estimation of the metafounder relationships which Superscripts are comparable within combinations of Population and column ss-prefix: Relationship matrices include genomic information has some degree of estimation error. Fortunately for the y : A phenotype with additive genetic effects A MF method, it is the relative sizes of γ , γ , and γ which A B AB y : A phenotype with both additive and dominant genetic effects AD determine the relative sizes of the partial additive genetic 2 2 2 Purebred: True breeding value is for production of purebred animals parameters, σ , σ , and σ (Eq. 22). As long as Eq. 22 A A A A B AB F1: True breeding value is for production of F1-animals holds true, changes to the metafounder relationships are Rotation: True breeding value is for mating with rotationally crossbred animals accounted for through changes to the estimated additive genetic variance in the ancestral population, σ . MF One major advantage of the MF method is that genomic For Population A, the dispersion biases for the GT and information can readily be included in the additive rela- MF methods were not statistically significantly different tionship matrix with metafounders using the single-step from 1 for the phenotype without a dominant genetic procedure [14], regardless of the genetic composition of term (median: 0.94–1.01). The dispersion biases for SM the animals in the relationship matrix. On the contrary, method were not statistically significantly different from the single-step procedure has only been developed for 1 when breeding values were predicted without genomic the partial relationship matrices for breed-specific terms information or the phenotype was without a dominant P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 13 of 17 from the GT method; i.e., a combined partial relation- the NRM method and the pedigree tracing breed-spe- ship matrix for both genotyped and non-genotyped ani- cific genetic effects from Population A. Meanwhile, the mals for segregation terms has not been developed [6]. covariance for partial additive breed-specific effects from This may make the MF method more appropriate than Population B between the same animals was the same for the GT method when rotationally crossbred animals are the GT and SM methods: genotyped. 1 3 3 1 3 SM NRM:B NRM:B B B Based on this study, it is not possible to conclude a = f f a + a = [1 + 1] = , ij s i j d 4 4 4 4 8 whether the GT or MF method is better for the analysis 1 1 1 3 GT B B of these specific populations. a = f + f = 1 + = , ij s d 4 4 2 8 (31) The SM method where superscript NRM:B denotes that the covariance This method was generally as accurate and unbiased as was calculated with the NRM method and the pedigree the GT and MF methods for prediction in purebred ani- that traces breed-specific genetic effects from Population mals but less accurate and more biased for prediction in B; f is the breed proportion from Population B; and the rotationally crossbred animals (Tables 2 and 4). The inac - other terms are as for Eq. 30. curacy and bias of the SM method may be caused by its It is simple to see that the GT and SM methods do inability to properly separate the phenotype into its com- not always produce identical relationships. However, it ponents (Table 1). is challenging to explain how discreprancies between The SM method is only an approximation to the the GT and SM methods across the three partial addi- GT method and discreprancies between the two are tive relationship matrices affect the partitioning of ran - expected. For example, for the GT method and disregard- dom effects. Nevertheless, according to our study, the ing inbreeding, the covariance between siblings depends SM method seems to be a good approximation of the GT on the diagonal elements of their shared parents (Eq.  7). method when the aim is to predict breeding values in Meanwhile, for the SM method, the covariance between purebred animals. siblings depends on the product between their own regression covariates for partial additive genetic effects The NRM method (Eqs.  11, 12, 13). Consequently, the SM method is a bet- This method has the most inaccurate assumptions for ter approximation to the GT method between animals additive genetic effects among the methods investigated. where the weighted average of diagonal and off-diagonal In rotationally crossbred animals between divergent elements of common ancestors is equal to the product purebred populations, the model does not fit the data between the animals’ regression covariates for partial if the partial additive genetic variances due to breed- additive genetic effects and their additive genetic covari - specific effects are not proportional to breed propor - ance according to the NRM method. tions [10], and the segregation variance is not modelled In a rotational crossbreeding system, breed proportions [4]. Therefore, it was expected that this method was the differ across generations. Consequently, the weighted least accurate and most biased among those investigated average of diagonal and off-diagonal elements of com - (Tables 2, 3, 4). mon ancestors can differ from the product between the The NRM method is a common approach for multi- animals’ regression covariates for partial additive genetic breed analyses. The main argument for the NRM method effects and their additive genetic covariance according is that it is commonly implemented into softwares for to the NRM method. For example, in this study and dis- genetic evaluations. However, we argue that the GT, regarding inbreeding, the covariance of partial additive MF, and SM methods either are accessible or can easily breed-specific effects from Population A between full become accessible. Currently, the GT or MF methods sibs i and j from generation 34 and Population C was not may not be implemented in softwares for genetic evalua- the same for the GT and SM methods: tions, but both random regression and the NRM method 1 1 1 1 1 are. The combination of random regression and the NRM SM A A NRM:A NRM:A a = f f a + a = [0 + 1] = , ij i j s d 4 4 4 4 16 method enables the use of the SM method which, in this 1 1 1 1 GT A A study, was more accurate and less biased than the NRM a = f + f = 0 + = , ij s d 4 4 2 8 method (Tables  2, 3, 4). In the future, the GT and MF (30) methods should become accessible through  their imple- where subscripts i, j, s, and d denote animals; f is the mentation into commonly used softwares for genetic breed proportion from Population A; and superscript evaluations. The implementation of both the GT and NRM:A denotes that the covariance was calculated with MF methods is simple as the algorithms for directly Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 14 of 17 computing their inverse covariance matrices are very A simulation design with less diverged purebred popu- similar to the algorithm for the NRM method [9, 14]. lations would most likely yield the same ranking of the Consequently, the time required for implementing the methods but with less absolute differences between their GT and MF methods should be greatly reduced as a large prediction accuracies. proportion of program code from the NRM method can In this study, only genetic drift caused changes in be reused. All things considered, we do not recommend allele frequencies. In practice, allele frequencies are also the NRM method for genetic analyses with phenotypes affected by selection. Simulating selection would most of rotationally crossbred animals, because its alternatives likely also change the results. However, we have no rea- are more accurate, less biased, and easily accessible. son to believe that selection would change the ranking between the methods, because all the methods theoreti- Simulation design cally can account for selection, and because their mecha- Results from simulation studies are most relevant when nism for doing so is the same [8, 9, 12, 14]. the simulated populations are representative of real pop- ulations. Populations can be described with several Genotypes from crossbred animals parameters, however, the divergence between the popu- It is simpler to incorporate genomic information from lations is a key argument for the relevance of multibreed crossbred animals into some methods than into others. relationship matrices [4]. The magnitude of divergence For the MF, SM, and NRM methods, genomic informa- between two populations can be represented by the ratio tion on crossbreds can be incorporated as for purebred between the segregation variance and the additive genetic animals. For the GT method, it becomes necessary to 2 1 2 1 2 2 trace the breed of origin of alleles to construct genomic variance in F2 animals: σ / σ + σ + σ ; which, AB 2 A 2 B AB relationship matrices for breed-specific terms [13]. in turn, can be calculated using the metafounder rela- Furthermore, to our knowledge, it is not known how tionships (Eq.  22). Using this measure, the average mag- genomic information should be incorporated into par- nitude of divergence between Populations A and B is 15% tial relationship matrices for segregation terms. Although based on the metafounder relationships (Table 1). Mean- it is simple to incorporate genomic information for the while, this measure for the magnitude of divergence is MF, SM, and NRM methods, it is not known whether the 16% between DanBred Landrace and DanBred Yorkshire resulting relationship matrices correctly represent the pigs [32], 15% between Hereford and Zebu cattle [33], additive genetic covariance between animals. In particu- and on average 11% (min: 3%, max: 25%) between sub- lar, this is the case for the SM method and our applica- populations of Manech Tête Rousse sheep [34]. There - tion of the NRM method, as they are approximations. fore, the magnitude of divergence between populations A Although relevant, it was outside the scope of this study and B is representative of the divergence between real to compare the methods in a scenario with genomic populations. information from crossbred animals. It would have been reasonable to compare the meth- ods with a different simulation design, which would most Synthetic breeds likely give a different result. However, the purebred popu - This study was on genetic analyses with rotationally cross - lations need to have diverged from each other; otherwise bred animals, but our results may also apply to other segregation effects would be small. We ensured that the genetic analyses of mixed populations. For example, some purebred populations had diverged by simulating sepa- breeding companies create synthetic breeds. In practice, rate population bottlenecks in the two populations, and not a shared population bottleneck; by only sampling 50 animals (0.2% of the historical population) when found- Table 5 Example pedigree ing the purebred populations; by keeping the effec - tive population sizes small in the purebred populations Id Sire Dam Breed ( N ≈ 50 animals); and by isolating the purebred popu- 1 0 0 A lations for 32 generations prior to the pedigreed genera- 2 0 0 B tions. In a scenario where the purebred populations had 3 0 0 A only slightly diverged from each other, segregation effects 4 0 0 B would be small and the additive genetic variances would 5 1 2 – be the same in the purebred populations. This would 6 3 4 – diminish the argument for partial additive relationship 7 5 6 – matrices for the breed-specific terms and the segrega - 8 5 6 – tion term. In other words, it would be better to regard the 9 1 7 – two purebred populations as one purebred population. P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 15 of 17 Table 6 Breed-specific relationship matrix for the GT-method with the sample pedigree GT 1 3 5 6 7 8 9 1 1.00 3 1.00 5 0.50 0.50 6 0.50 0.50 7 0.25 0.25 0.25 0.25 0.50 8 0.25 0.25 0.25 0.25 0.25 0.50 9 0.63 0.13 0.38 0.13 0.38 0.25 0.88 Upper triangle and zeroes are omitted Table 7 Breed-specific relationship matrix for the NRM-method with the sample pedigree NRM 1 3 5 6 7 8 9 1 1.00 3 1.00 5 0.50 1.00 6 0.50 1.00 7 0.25 0.25 0.50 0.50 1.00 8 0.25 0.25 0.50 0.50 0.50 1.00 9 0.63 0.13 0.50 0.25 0.63 0.38 1.13 Upper triangle and zeroes are omitted Table 8 Breed-specific relationship matrix for the SM-method with the sample pedigree SM 1 3 5 6 7 8 9 1 1.00 3 1.00 5 0.35 0.50 6 0.35 0.50 7 0.18 0.18 0.25 0.25 0.50 8 0.18 0.18 0.25 0.25 0.25 0.50 9 0.54 0.11 0.31 0.15 0.38 0.23 0.84 Upper triangle and zeroes are omitted Table 9 Relationship matrix for the segregation term with Table 10 Relationship matrix for the segregation term with sample pedigree and the GT-method sample pedigree and the SM-method GT SM 7 8 9 7 8 9 A A AB AB 7 1.00 7 1.00 8 1.00 8 0.50 1.00 9 0.50 0.50 9 0.44 0.27 0.56 Upper triangle and zeroes are omitted Upper triangle and zeroes are omitted Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 16 of 17 Table 11 Relationship matrix with sample pedigree and the MF-method MF 1 2 3 4 5 6 7 8 9 1 1.33 2 0.10 1.25 3 0.66 0.10 1.33 4 0.10 0.50 0.10 1.25 5 0.72 0.68 0.38 0.30 1.05 6 0.38 0.30 0.72 0.68 0.34 1.05 7 0.55 0.49 0.55 0.49 0.70 0.70 1.17 8 0.55 0.49 0.55 0.49 0.70 0.70 0.70 1.17 9 0.94 0.29 0.60 0.29 0.71 0.54 0.86 0.62 1.27 Upper triangle is omitted. The matrix was calculated with: γ = 0.66 , γ = 0.1 , and γ = 0.50 AB B synthetic breeds are crossbred populations and they are Appendix subject to the same mechanisms as other crossbred popula- Appendix 1: Multibreed relationship matrices with a small tions. The only difference between a rotationally crossbred example pedigree population and a synthetic breed is that sires are not nec- The differences between the NRM, GT, and SM methods essarily purebred for synthetic breeds. Similar to the rota- are easier to understand through examples. This example tionally crossbred populations, the complex distributions of is based on a pedigree with both purebred animals, F1 genetic effects may complicate accurate and unbiased pre - animals, F2 animals, and a F2-backcross animal (Table 5). diction of breeding values in synthetic breeds. Our results We use the GT method as reference because it is theo- may assist with the choice of method for the relationship retically correct. matrix used in genetic analysis of synthetic breeds. The methods yield different additive relationship matri - ces for the term from breed A (Tables 6, 7, 8). The NRM Solving BLUP equation systems method calculates the correct relationships for purebred The choice between methods may also be impacted by animals; but is erroneous after it encounters crossbred their computational requirements. For all the relationship animals. The diagonal elements for crossbred animals are matrices that were studied here, the inverse can be directly not scaled according to their breed proportions, and this computed [8, 9, 14]. However, the resulting equation sys- error affects both diagonal and off-diagonal elements for tems differ in dimensions and sparseness. Using the GT, descendants of the crossbred animals (Tables  6 and  7). SM, or NRM method results in a larger equation system The SM method yields the same diagonal elements as the than with the MF method; especially with large numbers GT method in the absence of inbreeding (Table  8). The of breeds and crossbred animals. Meanwhile, the MF off-diagonal elements between F1 and F2 crossbred ani - method contains more non-zero elements than the other mals are also correct. The off-diagonal elements between methods; and using the MF method with the single-step purebred animals and crossbred animals are erroneous, procedure may require the inversion of one large genomic and so is the off-diagonal element for the F2-backcross relationship matrix rather than the inversion of smaller (animal 9; Table 8). genomic relationship matrices as with the other meth - The methods also yield different partial additive rela - ods. Comparison of computational demands between the tionship matrices for the segregation term (Tables  9 and methods was outside the scope of this study but it could be 10). The SM method calculates non-zero off-diagonal ele - relevant when computer hardware is a limiting factor. ments for related animals where the off-diagonal element is zero for the GT method. Furthermore, the off-diagonal Conclusion elements between animals 7 and 9 are erroneous as is the In the scenarios that  we investigated, models using the diagonal element for animal 9. additive relationship matrix with metafounders [14] or The relationship matrix from the MF method is not the partial relationship matrices by García-Cortés and directly comparable to those from the other methods Toro [9] were generally more accurate and less biased (Table  11) although it is theoretically equal to the GT than those using the partial relationship matrices by method [14]. Strandén and Mäntysaari [12] or the usual numerator relationship matrix [8]. P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 17 of 17 Acknowledgements 13. Christensen OF, Legarra A, Lund MS, Su G. Genetic evaluation for three- The authors thank M. Henryon and A.C. Sørensen for discussions on the simu- way crossbreeding. Genet Sel Evol. 2015;47:98. lation design. Furthermore, we thank A. Sampathkumar for assistance with 14. Legarra A, Christensen OF, Vitezica ZG, Aguilar I, Misztal I. Ancestral reducing the computational demands of the study. relationships using metafounders: finite ancestral populations and across population relationships. Genetics. 2015;200:455–68. Authors’ contributions 15. Poulsen BG, Nielsen B, Ostersen T, Christensen OF. Genetic associations BGP simulated the data, analyzed the data, and wrote the manuscript. OFC, between stayability and longevity in commercial crossbred sows, and TO, and BN supervised and assisted at all stages of the study, including the stayability in multiplier sows. J Anim Sci. 2020;98:skaa183. writing of the manuscript. All authors read and approved the final manuscript. 16. Sargolzaei M, Schenkel F. Qmsim: a large-scale genome simulator for livestock. Bioinformatics. 2009;25:680–1. Funding 17. Falconer DS, Mackay T. Eec ff tive population size. Introduction to quantita- This work is partly funded by the Innovation Fund Denmark (IFD) under File tive genetics. Harlow: Prentice Hall; 1996. p. 65–72. No. 9065-00070B. IFD has had no role in the design of the study, collection 18. Wellmann R, Bennewitz J. The contribution of dominance to the under- data, data analysis, interpretation of data, and in writing the manuscript. standing of quantitative genetic variation. Genet Res. 2011;93:139–54. 19. Henryon M, Berg P, Ostersen T, Nielsen B, Sørensen AC. Most of the Availability of data and materials benefits from genomic selection can be realized by genotyping a small The datasets used and analysed during the current study are available from proportion of available selection candidates. J Anim Sci. 2012;90:4681–9. the corresponding author on reasonable request. 20. Aguilar I, Misztal I, Johnson DL, Legarra A, Tsuruta S, Lawlor TJ. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of holstein final score. J Dairy Sci. Declarations 2010;93:743–52. 21. Christensen OF, Lund MS. Genomic prediction when some animals are Ethics approval and consent to participate not genotyped. Genet Sel Evol. 2010;42:2. Not applicable. 22. Vanraden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23. Consent for publication 23. Garcia-Baccino CA, Legarra A, Christensen OF, Misztal I, Pocrnic I, Vitezica Not applicable. ZG, et al. Metafounders are related to fst fixation indices and reduce bias in single-step genomic evaluations. Genet Sel Evol. 2017;49:34. Competing interests 24. Gengler N, Mayeres P, Szydlowski M. A simple method to approximate The authors declare that they have no competing interests. gene content in large pedigree populations: application to the myostatin gene in dual-purpose belgian blue cattle. Animal. 2007;1:21–8. Author details 25. R Core Team. R: a language and environment for statistical computing. R Breeding & Genetics, Danish Agriculture and Food Council, Axelborg, Axeltorv Foundation for Statistical Computing. 2020. https:// www.R- proje ct. org/. 3, Copenhagen W, 1609 Copenhagen, Denmark. Center for Quantita- 26. Eddelbuettel D, Sanderson C. Rcpparmadillo: accelerating r with tive Genetics and Genomics, Aarhus University, Blichers Allé 20, 8830 Tjele, high-performance c++ linear algebra. Comput Stat Data Anal. Denmark. 2014;71:1054–63. 27. Madsen P, Jensen J, Labouriau R, Christensen OF, Sahana G. Dmu—a Received: 26 April 2021 Accepted: 28 February 2022 package for analyzing multivariate mixed models in quantitative genetics and genomics. In: Proceedings of the 10th World Congress of Genetics Applied to Livestock Production: 17–22 August 2014; Vancouver. 2014; pp. 18–22. 28. Falconer DS, Mackay T. Average effect. Introduction to quantitative References genetics. Harlow: Prentice Hall; 1996. p. 112–4. 1. Falconer DS, Mackay T. Correlated response to selection. Introduction to 29. Falconer DS, Mackay T. Breeding values. Introduction to quantitative quantitative genetics. Harlow: Prentice Hall; 1996. p. 317–21. genetics. Harlow: Prentice Hall; 1996. p. 114–6. 2. Wientjes YCJ, Calus MPL. Board invited review: the purebred-crossbred 30. Legarra A, Reverter A. Semi-parametric estimates of population accuracy correlation in pigs: a review of theory, estimates, and implications. J Anim and bias of predictions of breeding values and future phenotypes using Sci. 2017;95:3467–78. the lr method. Genet Sel Evol. 2018;50:53. 3. Oldenbroek K, Waaij LVD. The different crossbreeding systems and their 31. Legarra A, Reverter A. Correction to: semi-parametric estimates of popu- applicability. Textbook animal breeding: animal breeding and genetics lation accuracy and bias of predictions of breeding values and future for BSc students. Wageningen: Centre for Genetic Resources and Animal phenotypes using the lr method. Genet Sel Evol. 2019;51:69. Breeding and Genomics; 2014. p. 236–41. 32. Xiang T, Christensen OF, Legarra A. Technical note: genomic evaluation 4. Lo LL, Fernando RL, Grossman M. Covariance between relatives in multi- for crossbred performance in a single-step approach with metafounders. breed populations: additive model. Theor Appl Genet. 1993;87:423–30. J Anim Sci. 2017;95:1472–80. 5. Wei M, van der Werf JHJ. Maximizing genetic response in crossbreds using 33. Junqueira VS, Lopes PS, Lourenco D, Silva FF, Cardoso FF. Applying the both purebred and crossbred information. Anim Sci. 1994;59:401–13. metafounders approach for genomic evaluation in a multibreed beef 6. Christensen OF, Madsen P, Nielsen B, Su G. Genomic evaluation of both cattle population. Front Genet. 2020;11:fgene.2020.556399. purebred and crossbred performances. Genet Sel Evol. 2014;46:23. 34. Macedo FL, Christensen OF, Astruc JM, Aguilar I, Masuda Y. Bias and accu- 7. Falconer DS, Mackay T. Genetic components of variance. Introduction to racy of dairy sheep evaluations using blup and ssgblup with metafound- quantitative genetics. Harlow: Prentice Hall; 1996. p. 125–31. ers and unknown parent groups. Genet Sel Evol. 2020;52:47. 8. Mrode RA. Genetic covariance between relatives. Linear models for the prediction of animal breeding values. Wallingford: CABI; 2014. p. 22–33. Publisher’s Note 9. García-Cortés LA, Toro MA. Multibreed analysis by splitting the breeding Springer Nature remains neutral with regard to jurisdictional claims in pub- values. Genet Sel Evol. 2006;38:601–15. lished maps and institutional affiliations. 10. Elzo MA. Recursive procedures to compute the inverse of the multiple trait additive genetic covariance matrix in inbred and noninbred multi- breed populations. J Anim Sci. 1990;68:1215–28. 11. Cantet R, Fernando R. Prediction of breeding values with additive animal models for crosses from 2 populations. Genet Sel Evol. 1995;27:323–34. 12. Strandén I, Mäntysaari EA. Use of random regression model as an alterna- tive for multibreed relationship matrix. J Anim Breed Genet. 2013;130:4–9. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Genetics Selection Evolution Springer Journals

Predictive performances of animal models using different multibreed relationship matrices in systems with rotational crossbreeding

Loading next page...
 
/lp/springer-journals/predictive-performances-of-animal-models-using-different-multibreed-nV00ZHOy0l

References (43)

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2022
eISSN
1297-9686
DOI
10.1186/s12711-022-00714-w
Publisher site
See Article on Publisher Site

Abstract

Background: In livestock breeding, selection for some traits can be improved with direct selection for crossbred performance. However, genetic analyses with phenotypes from crossbred animals require methods for multibreed relationship matrices; especially when some animals are rotationally crossbred. Multiple methods for multibreed relationship matrices exist, but there is a lack of knowledge on how these methods compare for prediction of breed- ing values with phenotypes from rotationally crossbred animals. Therefore, the objective of this study was to compare models that use different multibreed relationship matrices in terms of ability to predict accurate and unbiased breed- ing values with phenotypes from two-way rotationally crossbred animals. Methods: We compared four methods for multibreed relationship matrices: numerator relationship matrices (NRM), García-Cortés and Toro’s partial relationship matrices (GT ), Strandén and Mäntysaari’s approximation to the GT method (SM), and one NRM with metafounders (MF). The methods were compared using simulated data. We simulated two phenotypes; one with and one without dominance effects. Only crossbred animals were phenotyped and only pure - bred animals were genotyped. Results: The MF and GT methods were the most accurate and least biased methods for prediction of breeding values in rotationally crossbred animals. Without genomic information, all methods were almost equally accurate for prediction of breeding values in purebred animals; however, with genomic information, the MF and GT methods were the most accurate. The GT, MF, and SM methods were the least biased methods for prediction of breeding values in purebred animals. Conclusions: For prediction of breeding values with phenotypes from rotationally crossbred animals, models using the MF method or the GT method were generally more accurate and less biased than models using the SM method or the NRM method. Crossbred performance is often indirectly selected Background for through selection for purebred performance. This Several livestock production systems use crossbred ani- is valid if the genetic correlation between the crossbred mals at the commercial level. In these systems, the phe- and purebred performances is strong [1]. However, the notypic performance of crossbred animals should be the genetic correlation between crossbred and purebred per- primary objective of the breeding goal. formances is only moderate for many traits [2]. For such traits, it may be a solution to directly select for crossbred *Correspondence: bgp@lf.dk performance. Breeding & Genetics, Danish Agriculture and Food Council, Axelborg, Multiple crossbreeding procedures exist [3]. The most Axeltorv 3, Copenhagen W, 1609 Copenhagen, Denmark notable procedures for modern pork and beef systems Full list of author information is available at the end of the article © The Author(s) 2022. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/. The Creative Commons Public Domain Dedication waiver (http:// creat iveco mmons. org/ publi cdoma in/ zero/1. 0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 2 of 17 are the two-way terminal, three-way terminal, and two- The objective of this study was to compare methods for way rotational crossbreeding procedures. Among these relationship matrices in terms of ability to predict accu- crossbreeding systems, genetic analysis is most compli- rate and unbiased breeding values with phenotypes from cated with phenotypes from rotationally crossbred ani- rotationally crossbred animals. We compared the NRM mals [4–6]. Nevertheless, rotationally crossbred animals as used by Poulsen et  al. [15], the partial relationship comprise a possible source of both additional and novel matrices by García-Cortés and Toro [9], the approximate phenotypes. Therefore, in the following, we will focus on partial relationship matrices by Strandén and Mäntysaari genetic analyses with phenotypes from rotationally cross- [12], and the relationship matrix with metafounders by bred animals. Legarra et al. [14]. Phenotypes from rotationally crossbred animals are We hypothesized that the methods by García-Cortés often subject to more variable genetic effects than phe - and Toro [9] and Legarra et al. [14] were the most accu- notypes from purebred animals and F1 animals [4, 7]. rate and least biased methods because they are the only Mating animals from different populations often leads to methods which fully comply with the theory [4]. offspring with a high degree of heterozygosity. For domi - nance effects, the increase in heterozygosity results in a Methods favorable change in the phenotypic mean and increased The prediction accuracies and prediction biases of the dominance variance in subsequent generations [7]. For models with different relationship matrices were investi - additive genetic effects, the increase in heterozygosity gated through a simulation study. The simulation design increases the additive genetic variance in following gen- represents a two-way-rotational crossbreeding system erations [4]. All the aforementioned changes are relative [3]. In this section, we first present how the populations to the average of the genetic parameters in the consti- were simulated. This includes the description of their tuting purebred populations. Animal breeding focuses population structure, genomic architecture, genetic mainly on additive genetic effects, which are modelled effects, and phenotypes. Then, we present how we pre - using additive genetic relationship matrices. Since the dicted breeding values with phenotypes from rotationally usual numerator relationship matrix (NRM) [8] can not crossbred animals using statistical models with different correctly model additive genetic effects in rotationally relationship matrices. Lastly, we present how we evalu- crossbred animals [4], specialized additive relationship ated and compared the statistical models with different matrices are needed. relationship matrices. In the following, we refer to the Specialized additive genetic relationship matrices for statistical models with different relationship matrices as crossbred animals exist [4, 9–11]. These relationship methods. matrices decompose the additive genetic relationships into a breed-specific term for each breed and a segrega - Simulation tion term for each pair of breeds. The partial relation - General ship matrices for the breed-specific terms are analogous A two-way-rotational crossbreeding system and genomic to NRM-based matrices and they refer to the additive architecture were simulated with the QMSim software genetic variances in the purebred base populations. [16]. For all populations, generations did not overlap, Meanwhile, the partial relationship matrices for the the numbers of males and females were equal, sires and segregation terms model the increased additive genetic dams were chosen at random (no selection), mating was variances in crossbred animals. Both types of partial rela- random and sampled without replacement, and the litter tionship matrices have later been approximated [12] and size was 6. We simulated 100 replicates. The population the theory for the partial relationship matrices for breed- structures are shown in Fig. 1. specific terms has been extended to incorporate genomic information [13]. The additive relationship matrix with Historical population metafounders was proposed by Legarra et  al. [14], and The first generation in the historical population consisted it is an alternative to the partial relationship matrices of 3000 animals. The population size was constant for mentioned above. In theory, the relationship matrix with 1000 generations, and over the following 200 generations metafounders simultaneously models both breed-specific the population size decreased linearly to 2800 animals at and segregation terms with one additive genetic effect the end. [14]. There is a need to investigate how models with these relationship matrices compare for prediction of accu- Purebred populations rate and unbiased breeding values with phenotypes from We created two purebred populations. Each purebred rotationally crossbred animals. population was founded by 25 males and 25 females from the last generation in the historical population. The P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 3 of 17 Fig. 1 General population structures. Colors: Types of information made available for prediction. Grey: No information. Blue: Pedigree information. Red: Pedigree information and phenotypes. Green: Pedigree information and genomic information sampling of founders was random and independent for when marker genotypes were pooled across the pure- the two purebred populations. The purebred populations bred populations. Similarly, 1223 of the 1750 QTL segre- were kept separate for 39 generations. At each genera- gated in generation 32 of either purebred population and tion, 25 randomly selected sires were mated with 25 ran- 1394 QTL segregated when QTL genotypes were pooled domly selected dams; i.e., the effective population sizes across the purebred populations. were approximately 50 [17], and not all animals produced offspring. In the following, the two purebred populations Genetic effects are referred to as Population A and Population B. We simulated both additive and dominant QTL effects. Additive and dominant QTL effects were identical across Crossbred population populations. The first crossbred generation was founded by mating The additive genetic animal effects were solely based on 75 males from Population A and 75 females from Popu- additive QTL effects. The absolute additive QTL effects lation B. These animals were drawn from generation 32 were drawn from a gamma-distribution with the stand- of their respective population. The first generation in ard parameters in QMSim [16]. Additive QTL effects the crossbred population is referred to as generation 33. were scaled by QMSim such that the additive genetic ani- For generations 34 to 39, crossbred animals were created mal variance was 0.2 after the historical population [16]. by mating 75 males from one of the purebred popula- The dominant QTL effects, were simulated as described tions with 150 females from the crossbred population; by Wellmann and Bennewitz [18]: i.e., for these generations, each purebred sire was mated d = h ◦ |β − β |, (1) 1 2 with two crossbred dams. Sires were from Population A in odd-numbered generations and Population B in even- where d is a vector of dominant QTL effects; numbered generations. In the following, the crossbred 1 1 h ∼ N ( 1, I) is a vector of dominance degrees; ◦ is the 2 10 population is referred to as Population C. Hadamard product; β is a vector of additive QTL effects of the first QTL-allele; and β is a vector of additive QTL Genomic architecture effects of the second QTL-allele. Dominant genetic ani - The genome consisted of five 100-cM chromosomes. mal effects, d, were calculated as the sum of dominant Each chromosome contained 3500 markers and 350 QTL effects where the animal was heterozygous. Domi - quantitative trait loci (QTL). Marker positions, QTL nant genetic animal effects were scaled such that the positions, and allele frequencies were randomly and uni- dominant genetic animal variance was 0.1 in Population formly distributed. Marker and QTL genotypes were ini- C. On average, 8% of the loci showed overdominance; tialized in the first generation of the historical population. 45% showed partial dominance that was greater than On average, 12,104 of the 17,500 markers segregated half the allele substitution effect; and 46% showed partial with a minor allele frequency (MAF) higher than 0.01 in dominance that was less than half the allele substitution generation 32 of either purebred population. Meanwhile, effect. 13,769 markers segregated with a MAF higher than 0.01 Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 4 of 17 Phenotypes 1 1 + a , i = j sd a = ij (2) We defined two phenotypes: a phenotype without domi - 1 (a + a ), otherwise, is id nance effects, y = a + e , and a phenotype with domi- nance effects, y = a + d + e , where a is a vector of AD where i and j denote animals, a is the pedigree-based ij additive genetic animal effects, d is a vector of domi - covariance between the additive genetic effects of ani - nant genetic animal effects, e is a vector of environmen - mals i and j, s is the sire of j , and d is the dam of j [8]. tal effects, and e ∼ N (0, 0.8I) . Note that y and y A AD The pedigrees for the two relationship matrices were have different narrow-sense heritabilities: h = 0.2 and different. The pedigree for Population A included animals h = 0.2/1.1. from both Population A and Population C, and the pedi- AD gree for Population B included animals from both Pop- Information used for prediction ulation B and Population C. To create the pedigree for We used the information such as to represent a system Population A, animals from Population B were removed where only crossbred animals were phenotyped and only from the pedigree and vice versa. the purebred animals were genotyped (Fig.  1). It is com- We used two genomic relationship matrices for the NRM NRM mon practice to not genotype crossbred animals because NRM method; G and G . Preliminary genomic A B VanRaden VanRaden it is more important to genotype selection candidates relationship matrices, G and G , were A B than phenotyped animals [19]. calculated using VanRaden’s first method [22], geno - More specifically, pedigree information was kept only types from purebred animals in generations 35 to 39, for animals born in generations 32 through 39. Animals and marker allele frequencies in the respective pure- VanRaden born before generation 32 were regarded as unknown. bred base-populations. When calculating G VanRaden Marker information was only available for purebred ani- and G , a marker was included if its minor mals born in generations 35 through 39. Phenotypes were allele frequency was higher than 0.01 in its respec- only available for crossbred animals born in generations tive purebred base-population. The positive definite - 33 through 39. ness of genomic relationship matrices was ensured by using the weighted average between VanRaden’s Prediction first method and the sub-matrix of genotyped ani - General mals from its respective pedigree-based relationship NRM NRM VanRaden We compared four methods for multibreed relationship matrix: G = 0.05{A } + 0.95G , where X X X NRM matrices, i.e., the NRM [8]; García-Cortés and Toro (GT) X ∈{A, B} denotes the population and {A } is the [9]; Strandén and Mäntysaari (SM) [12]; and Legarra et al. sub-matrix of genotyped animals from the respective (MF) [14]. All four methods can be extended to include pedigree-based relationship matrix. The genomic rela - NRM NRM genomic information using the single-step procedure tionship matrices, G and G , were scaled and A B [13, 20, 21]. For each method, we describe the theory, centered such that their average diagonal and off-diago - pedigree(s), incorporation of genomic information, the nal elements were equal to those of the sub-matrices of statistical model, and calculation of predicted breeding genotyped animals from their respective pedigree-based values. In Appendix 1, each method is showcased with a relationship matrices. We calculated combined relation- small example pedigree. We highly recommend readers ship matrices for genotyped and non-genotyped animals who are unfamiliar with the methods to view Appendix 1 [20, 21] because some animals were not genotyped. The after reading their respective sections in Methods. combined relationship matrices for genotyped and non- NRM NRM genotyped animals were H and H for animals A B The NRM method with genetic contributions from Populations A and B, This method can be used for multibreed analyses in mul - respectively. tiple ways. We use the NRM method such that we have The statistical model for the NRM method was: one relationship matrix per breed. This allows us to par - y = Xb + Z a + Z a + e, A A B B (3) tition the breeding values of crossbred animals into one term per purebred population. Furthermore, it allows where y is a vector of phenotypes; b is a vector of param- the additive genetic variances to differ between purebred eters for the general mean, pedigree-derived breed pro- populations. In this study, the NRM method required portion, and pedigree-derived heterosis; a is a vector of two relationship matrices; one for terms contributed additive genetic effects from Population A; a is a vector from Population A, and one for terms contributed from of additive genetic effects from Population B; e is a vector population B. of residuals; and X , Z , and Z are design matrices. A B The recursive algorithm for each of the NRM matrices The three vectors with random effects (a , a , and e ) A B is: were assumed to be distributed as: P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 5 of 17       NRM 2 A B A B 1 A σ 2 f f + f f + a , i = j a 0 sd A A A s s d d 2 a = ij (7) NRM 2       a ∼ N 0 , 0A σ , (a + a ), otherwise, is id B A e 0 0 0Iσ where i , j , s , and d are as for the algorithm for the NRM (4) method (Eq.  2); a is the pedigree-based covariance ij NRM where A is a relationship matrix for additive genetic between the additive genetic segregation effects of ani - NRM effects from Population A; A is a relationship matrix A B mals i and j; f and f are the proportions of genetic 2 s s for additive genetic effects from Population B; σ is the material from Population A and Population B, respec- additive genetic variance in Population A; σ is the addi- A B tively, in the sire of animal j ; and f and f are the d d tive genetic variance in Population B; 0s are vectors or proportions of genetic material from Population A and matrices of zeros; I is an identity matrix; and σ is the Population B, respectively, in the dam of animal j . B oth residual variance. For prediction with genomic infor- GT diagonal and off-diagonal elements in A can only be NRM NRM NRM AB mation, A was replaced with H and A was A A B non-zero for descendants of crossbred animals. NRM replaced with H . The pedigree for the GT method included all the ani - The vector of predicted breeding values for the NRM mals, purebred and crossbred, in generations 32 through method was:     We used two genomic relationship matrices for the GT {ˆa } 0 A P GT GT NRM     method; G and G . Generally, the single-step proce- ebv = 0 + {ˆa } , B P (5) A B dure for the GT method requires that marker alleles are {ˆa } {ˆa } A C B C phased and traced such that their breed of origin can where a ˆ and a ˆ are the vectors of predicted addi- be determined [13]; however, tracing the breed of ori- A B tive genetic effects in the statistical model for the NRM gin of alleles was not required in this study because we method (Eq.  4); subscript P denotes that the sub-vector only used genotypes from purebred animals. Therefore, GT GT only contains predicted effects from purebred animals; G and G were the same as the genomic relationship A B GT NRM subscript C denotes that the sub-vector only contains matrices for the NRM method; i.e., G = G and A A GT NRM predicted effects from crossbred animals; and 0s are vec - G = G . The single-step procedure [20, 21] was B B tors of zeros. used for the partial relationship matrices for breed-spe- cific terms. The combined partial relationship matrices The GT method for breed-specific terms for genotyped and non-gen - GT GT This method partitions the additive genetic relationship otyped animals were H and H for animals with A B into several partial relationship matrices [9]: one for each genetic contributions from Populations A and B, respec- breed (partial relationship matrices for breed-specific tively. The partial relationship matrix for the segregation GT GT terms; A and A in our study), and one for each pair term did not include genomic information. A B of breeds (partial relationship matrices for segregation The statistical model for the GT method was: GT terms; A in our study). The partial relationship matrix AB y = Xb + Z a + Z a + Z a + e, A A B B AB AB (8) for segregation terms captures the increase in additive genetic variance in crossbred animals [4, 9]. where y, b, and X are as described for the statistical GT The recursive algorithm for calculating A is [9]: model for the NRM method (Eq.  4); a is a vector of breed-specific partial additive genetic effects from Popu - f + a , i = j sd a = (6) lation A; a is a vector of breed-specific partial additive ij B (a + a ), otherwise, is id genetic effects from Population B; a is a vector of addi- AB tive genetic segregation effects between Populations A where i , j , s , and d are as described for the algorithm and B; e is a vector of residuals; and Z , Z , and Z are A B AB for the NRM method (Eq.  2); a is the pedigree-based ij design matrices. covariance between the breed-specific partial additive A The four vectors of random effects (a , a , a and e ) A B AB genetic effects of animals i and j; and f is the propor- were assumed to be distributed as: tion of genetic material from Population A in animal i . GT    The sub-matrix of A for purebred animals is identical     GT 2 A σ a 0 A A NRM A A to its analogous sub-matrix of A . GT 2    0A σ  a  0 B A GT   B    ∼ N   , , The recursive algorithm for A is:   GT 2  a 0 AB AB 0 0A σ AB A AB e 0 2 0 0 0Iσ (9) Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 6 of 17 GT NRM NRM where A is a partial relationship matrix for the are identical to submatrices from A and A for A A B GT breed-specific term from Population A; A is a par- purebreds animals, respectively. tial relationship matrix for the breed-specific term from The SM method is equivalent to random- GT Population B; A is a partial relationship matrix for the regressions of additive genetic effects on AB segregation term between Populations A and B; σ is F , F , and F , respectively [12], because A B AB 2 NRM T NRM T the additive genetic variance in Population A; σ is the F a ∼ N 0, F A F , F a ∼ N 0, F A F , A A A B B B A A A B B 2 NRM T additive genetic variance in Population B; σ is the seg- and F a ∼ N 0, F A F . In this study, we AB AB AB A AB AB AB regation variance between Populations A and B; 0s are apply the SM method through random-regression. vectors or matrices of zeros; I is an identity matrix; and Three pedigrees were constructed for the SM method: σ is the residual variance. For prediction with genomic one for each purebred population, which are identical to GT GT GT information, A was replaced with H and A was those for the NRM method, and the third is for the par- A A B GT replaced with H . tial relationship matrix for the segregation term between The vector of predicted breeding values for the GT Populations A and B. The partial relationship matrix for method was: the segregation term between Populations A and B was calculated with a pedigree from which all purebred and       {ˆa } 0 0 A P F1 animals had been removed. 0 {ˆa } 0       GT B P ebv = + + , We did not use the same pedigree for segregation       (10) {ˆa } {ˆa } 0 A C:F 1 B C:F 1 effects as described by the SM method [12]. They used {ˆa } {ˆa } aˆ A C:R B C:R AB the full pedigree to construct an additive genetic relation- where a ˆ , a ˆ , a ˆ are the vectors of predicted partial ship matrix on which they applied random regression. A B AB additive genetic effects in the statistical model for the GT However, using the full pedigree may promote discre- method (Eq.  8); subscript P denotes that the sub-vector prancies between the GT and SM methods. According only contains the predicted effects from purebred ani - to the GT method, segregation effects are independent mals; subscript C:F1 denotes that the sub-vector only among all offspring from F1 animals and their magni - contains the predicted effects from F1 crossbred animals; tude only depend on the breed proportions of parental subscript C:R denotes that the sub-vector only contains animals. For the SM method, a deep pedigree for seg- the predicted effects from rotationally crossbred animals; regation effects would increase the likelihood of both and 0s are vectors of zeros. non-zero inbreeding coefficients in offspring from F1 animals and covariance between offspring from F1 ani - The SM method mals. Therefore, the compliance between the GT and SM This method is an approximation of the GT method and methods should be greater if purebred and F1 animals it partitions the additive genetic variance in the same are removed from the pedigree for segregation effects, as way. done in this study. The relationship matrices for the SM method are cal - The genomic relationship matrices for this method culated as: were the same as for both the NRM and GT methods. As for the GT method, we calculated combined relationship GT SM NRM A ≈A = F A F , (11) A A A A A matrices for genotyped and non-genotyped animals for the breed-specific terms but not for the segregation term. GT SM NRM The statistical model for the SM method was: A ≈A = F A F , B B (12) B B B y = Xb + Z F a + Z F a + Z F a + e, A A A B B B AB AB AB GT SM NRM (14) A ≈A = F A F , (13) AB AB AB AB AB where F , F , and, F are as defined for the calculation A B AB where F and F are diagonal matrices with square roots A B of partial relationship matrices with the SM method of breed proportions for populations A and B, respec- (Eqs.  11, 12, 13); and the remaining components are NRM tively; A is a NRM-based relationship matrix repre- AB the same as in the statistical model for the GT method senting at least all the descendants of crossbred animals; (Eq. 8). Note that the additive genetic vectors now consist and F is a diagonal matrix with square roots of the AB of regression coefficients. A B A B “ 2 f f + f f ” term from Eq. 7. As for the GT method, s s d d The four vectors of random effects (a , a , a and e ) A B AB SM SM the sub-matrices of A and A for purebred animals A B were assumed to be distributed as: P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 7 of 17        NRM 2 A σ a 0 A A A A NRM 2     a  0 0A σ   B A  B (15) ∼ N , ,     NRM 2    a 0 AB 0 0A σ AB A AB e 0 2 0 0 0Iσ 2 2 2 2 σ ∗ where σ , σ , σ , σ , 0, and I are as in the statistical γ A A A e A A B AB Ŵ = = 8 , (17) NRM NRM 2 γ γ ∗ ∗ model for the GT method (Eq.  8); A and A are σ σ AB B ∗ p p A B A B p as in the statistical model for the NRM method (Eq.  4); NRM and A is the usual numerator relationship matrix where γ is the metafounder relationship for Population AB based on the pedigree without purebred and F1 animals. A; γ is the metafounder relationship for Population B; NRM For prediction with genomic information, A was γ is the metafounder relationship between Populations AB NRM NRM NRM 2 replaced with H and A was replaced with H . A and B; σ is the  variance of marker allele frequencies A B B The vector of predicted breeding values for the SM in Population A; σ ∗ is the variance of marker allele fre- method was: ∗ ∗ quencies in Population B; σ is a  covariance between p p A B       ∗ marker allele frequencies in Populations A and B; p and {ˆa } 0 0 A P p are marker-allele frequencies in the base populations 0 {ˆa } 0       SM B P B ebv = + + ,       (16) {ˆa } {ˆa } 0 of Populations A and B, respectively; and the asterisk A C:F 1 B C:F 1 ∗ ∗ {ˆa } {ˆa } aˆ superscripts in p and p denote that allele annotations A C:R B C:R AB A B ∗ ∗ 1 were randomized such that E(p ) = E(p ) = . A B 2 where a ˆ , a ˆ , a ˆ are the vectors of predicted partial A B AB In this study, metafounder relationships were calcu- additive genetic effects in the statistical model for the SM lated with estimated marker allele frequencies in genera- method (Eq.  14); and 0, subscript P, subscript C:F1, and tion 32. We estimated marker allele frequencies as subscript C:R are as defined for the GT method (Eq. 10). proposed by Gengler et al. [24] and genotypes from pure- bred animals in generations 35 to 39. Marker allele fre- The MF method quencies were estimated independently for each This method is conceptually different from the other purebred population. Finally, metafounder relationships methods. The other methods model populations as sep - were calculated from markers that have a minor allele arate entities while the MF method models populations frequency higher than 0.01 when averaged across the as sub-populations derived from a common ancestral purebred base-populations. The average metafounder population. In practice, this is done by identifying each relationship matrix across replicates, Ŵ , wa s : sub-population through a metafounder, calculating an γ¯ 0.80 Ŵ = = . additive genetic relationship matrix, Ŵ , between meta- γ¯ γ¯ 0.38 0.80 AB B founders, and then incorporating this information into The recursive algorithm for the MF method is: 1 + γ , i = j ∧ i ∈ m A A  1 1 + γ , i = j ∧ i ∈ m B B  γ , i �= j ∧{i, j}⊂ m A A (18) a = γ , i �= j ∧{i, j}⊂ m ij B B γ , i �= j ∧ [(i ∈ m ∧ j ∈ m ) ∨ (i ∈ m ∧ j ∈ m )]  AB A B B A 1 + a , i = j ∧ i �∈ {m ,m } sd A B  2 (a + a ), otherwise, is id where i , j , s , and are as in the recursive algorithms for one shared additive genetic relationship matrix for all the NRM and GT methods (Eqs.  4 and 8); a is as in the ij populations, A(Ŵ) . In theory, this method should simul- recursive algorithm for the NRM method (Eq.  4); γ , γ , A B taneously account for both the breed-specific terms and and γ are the metafounder relationships (Eq.  17); m AB A the segregation term [14]. is a vector of base animals from Population A; m is a The metafounder relationships can be calculated in vector of base animals from Population B; ∧ is the logi- several ways [14, 23]. We used the method proposed by cal “and”; and ∨ is the logical “or”. Please note that the Garcia-Baccino et al. [23]: last two elements of the recursive algorithm for the MF Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 8 of 17 method are the same as in the algorithm for the NRM The vector of predicted breeding values for the MF method. In other words, the only differences between method was: the NRM method and the MF method are that base ani- MF (21) ebv = aˆ , mals are related and their inbreeding coefficient can be greater than zero. These differences then carry over into where a ˆ is the vector of predicted additive genetic effects the additive genetic relationships for animals which are in the statistical model for the MF method (Eq. 19). not in the base population. The pedigree for the MF method included all the ani - Variance components mals, purebred and crossbred, in generations 32 to 39. We estimated variance components for each method The MF method uses one genomic relationship matrix and its respective statistical model (Eqs. 4, 8, 14, and 19). MF across all populations; G . A preliminary genomic rela- Variance components were only estimated with pedigree VanRaden tionship matrix, G , was calculated using Van- information. Breeding values were predicted with these Raden’s first method [22], and genotypes from purebred estimated variance components regardless of whether animals in generations 35 to 39; however, we scaled and breeding values were predicted with or without genomic centered the genomic relationship matrix with allele fre- information. quencies of 0.5. Markers were included in the genomic The estimated variance components for the phenotype relationship matrix if their minor allele frequency was without dominance effects are in Table  1. For presenta- higher than 0.01 when pooling genotypes from the pure- tion only, the estimated additive genetic variance from bred base-populations. The positive definiteness of the the MF method was transformed using the estimated genomic relationship matrix was ensured by using the metafounder relationships, Ŵ , such that the parametriza- VanRaden weighted average of G and the sub-matrix of tion was the same as for the GT method [14]: genotyped animals from the pedigree-based relationship MF VanRaden matrix: G = 0.05{A(Ŵ)} + 0.95G , where 2 2 σ = σ 1 − γ A A A MF {A(Ŵ)} is the sub-matrix of genotyped animals from the 22 2 pedigree-based relationship matrix. The genomic rela - 2 2 MF , (22) σ = σ 1 − γ tionship matrix, G , was not scaled and centered such A A B MF that its average diagonal and off-diagonal elements were MF 2 2 equal to those of {A(Ŵ)} because G and {A(Ŵ)} 22 22 σ = σ (γ + γ − 2γ ) A B AB A A AB MF MF 8 are comparable when G and Ŵ are calculated with the same set of markers. We calculated a combined rela- 2 2 2 where , , and are the partial additive genetic σ σ σ A A A A B AB tionship matrix for genotyped and non-genotyped ani- variance components; σ is the estimated additive MF mals [14, 20, 21], H(Ŵ) , because some animals were not genetic variance in the ancestral population (Eq. 20); and genotyped. γ , γ , and γ are metafounder relationships (Eq. 8). A B AB The statistical model for the MF method was: We calculated true partial additive genetic variance components and used them as reference for the magni- y = Xb + Za + e, (19) tude of the estimated variance components in Table  1. The true partial additive genetic variance components where y, b, and X are as described for the NRM method (Eq. 4); a is a vector of additive genetic effects; e is a vec- tor of residuals; and Z is a design matrix. The two vectors of random effects (a and e ) were dis- Table 1 Means and standard deviations of variance tributed as: components across replicates 2 2 2 2 2 Method σ σ σ σ A(Ŵ)σ e a 0 A A A A B AB MF ∼ N , , (20) e 0 0Iσ a e True 0.15 ± 0.01 0.15 ± 0.01 0.023 ± 0.003 0.80 GT 0.15 ± 0.04 0.14 ± 0.04 0.035 ± 0.039 0.80 ± 0.02 where A( ) is the additive relationship matrix with MF 0.15 ± 0.02 0.15 ± 0.02 0.026 ± 0.005 0.80 ± 0.02 metafounders [14]; Ŵ is the additive relationship matrix SM 0.25 ± 0.07 0.19 ± 0.06 0.297 ± 0.052 0.61 ± 0.04 between metafounders; σ is the additive genetic vari- MF NRM 0.23 ± 0.05 0.22 ± 0.05 0.51 ± 0.07 ance in the ancestral population; 0s are vectors or matri- 2 2 σ : Additive genetic variance for Population A ces of zeros; I is an identity matrix; and σ is the residual AA σ : Additive genetic variance for Population B variance. The additive genetic relationship matrix, A(Ŵ) , A σ : Additive genetic segregation variance between populations A and B was replaced with H(Ŵ) when breeding values were pre- AB σ : Residual variance dicted with genomic prediction. The true residual variance was constant across replicates P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 9 of 17 were calculated with the parametrization of the GT methods used to compare the methods. We stratified the method and the phenotype without dominance effects. comparison according to population. The true partial additive genetic variance for breed- specific effects from Population A was calculated as: True breeding values The true breeding value depends on whether the pheno - qtl 2 type includes only an additive genetic term, or both addi- σ = 2 p 1 − p β − β , (23) i,A i,A i,1 i,2 tive genetic and dominant genetic terms. i=1 The true breeding values with only an additive genetic where σ is the true partial additive genetic variance for term were calculated as: breed-specific effects from Population A; n is the num- qtl tbv = Q(β − β ) + 2Jβ , A (25) 1 2 2 ber of QTL; p is the allele frequency at QTL i in base i,A animals from Population A; β is the additive genetic i,1 where Q is a QTL genotype matrix with allelic loads of effect of the first QTL allele at QTL i; and β is the addi- i,2 the first allele; β is a vector of additive genetic effects tive genetic effect of the second QTL allele at QTL i. The of the first QTL allele; β is a vector of additive genetic partial additive genetic variance for breed-specific effects effects of the second QTL allele; and J is a matrix of 1s from Population B was calculated in the same way. with dimensions equal those of Q. The true partial additive genetic variance for segrega - In the presence of dominance, the true breeding value tion effects between Populations A and B was calculated of an animal depends on its ability to promote both addi- as [4]: tive and dominance genetic effects in its offspring [28, n 29]. Therefore, the true breeding value now depends on qtl the genotypes of the mate. True breeding values with a σ =2 p 1 − p β − β i,F 1 i,F 1 i,1 i,2 AB dominance term can be calculated with allele frequen- i=1 (24) cies from the population of mating candidates [28, 29]. 2 2 − σ + σ , A A The true breeding values with both an additive term and A B a dominance term were calculated as: where β , β , and n are as in Eq.  23; σ is the true i,1 i,2 qtl AB X Q partial additive genetic variance for segregation effects tbv = tbv + (Q − J) (1 − 2p ) ◦ d , (26) A X AD between Populations A and B; p is a vector of QTL i,F 1 allele frequencies in generation 33 of Population C; σ is where X ∈{A, B, C} denotes the population to which the true partial additive genetic variance for breed-spe- the possible mating candidates belong; p is a vector of cific effects from Population A; and σ is the true partial QTL allele frequencies in population X; d is a vector additive genetic variance for breed-specific effects from of dominant QTL effects; ◦ is the Hadamard product; 1 Population B. is a vector of ones; and tbv , Q, β , β , and J are as for 1 2 true breeding values with only an additive genetic term Software for analysis and prediction (Eq. 25). Most data-handling was carried out in the R-software [25]. The relationship matrices for the GT and MF meth - Accuracy and bias ods were calculated using the RcppArmadillo R-package We evaluated the methods according to their prediction [26]. The relationship matrices for the NRM and SM accuracy and prediction bias. We used two measures for methods were calculated using the DMU software [27]. the prediction bias [30, 31]: level bias and dispersion bias. Variance components were estimated using the AI-ReML The prediction accuracy was defined as Pearson’s corre - algorithm in the DMU software package [27]. Additive lation between true breeding values and predicted breed- genetic effects were predicted using the best linear unbi - ing values: ased prediction (BLUP) method and the Preconditioned Accuracy = ρ(tbv, ebv), (27) Conjugate Gradient algorithm implemented in DMU software [27]. where ρ(.) is the Pearson correlation function; tbv is a vector of true breeding values; and ebv is a vector of pre- Comparison of the methods dicted breeding values. General The level bias was calculated as: We compared how well the methods predicted accurate and unbiased breeding values in animals from generation μ = ebv − tbv + tbv , (28) bias base 39. In the following, we describe how we calculated true breeding values, accuracies, and biases, and the statistical Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 10 of 17 groups within which validation parameters were com- where μ is the level bias; ebv is the mean predicted bias pared; and n = 8 is the number of unique combinations breeding value in validation animals; tbv is the mean true between methods and use of genomic information. breeding values in validation animals; and tbv is the base mean true breeding value in base animals. The correction Expected pattern in results for tbv was required because the true breeding values, base The accuracies and biases are expected to differ between in contrast to predicted breeding values, differed from populations A, B, and C. For animals in generation 39 zero in the base populations. There is no level bias when of Population A, halfsibs are the closest relatives with μ is equal to 0. bias phenotypes. For animals in generation 39 of Population The dispersion bias was calculated as: B, cousins are the closest relatives with phenotypes. For cov(tbv, ebv) animals in generation 39 of Population C, own perfor- b = , bias (29) var(ebv) mance is available for all animals. Therefore, we expect that prediction is most accurate in Population C, less where b is the dispersion bias; cov() is the empirical bias accurate in Population A, and least accurate in Popula- covariance; ebv is a vector of predicted breeding values; tion B. tbv is a vector of true breeding values; and var() is the empirical variance. There is no dispersion bias when b bias Results is equal to 1. Prediction accuracy Generally, the GT and MF methods were as accurate or Statistical analysis of accuracy, level bias, and dispersion bias more accurate than the SM and NRM methods (Table 2). The accuracies and biases were compared across meth - Use of genomic information always increased the predic- ods, use of genomic information, and replicates; but tion accuracy (Table 2). not across populations and definition of true breeding For Population A, the methods were equally accurate values. for prediction of breeding values without genomic infor- We used non-parametric tests because accuracies and mation (median: 0.37–0.41, Table 2). When breeding val- biases were heteroscedastic across methods and not nor- ues were predicted with genomic information, the MF mally distributed. and GT methods were the most accurate (median: 0.59– We investigated whether a method was more accurate 0.65, Table 2). The SM method was generally as accurate or biased than others using paired Wilcoxon signed rank as the MF and GT methods (median: 0.58–0.65, Table 2), tests. We used paired tests to compare the methods to while the NRM method was always the least accurate remove the variation caused by the stochastic simulation; (median: 0.56–0.63). i.e., the methods were paired within replicates. Further- For Population B, the methods were equally accu- more, we investigated whether the methods were biased rate for prediction of breeding values without genomic using the one-sample Wilcoxon signed rank tests. The information (median: 0.29–0.35, Table  2). When breed- null hypotheses for these tests were that the level biases ing values were predicted with genomic information, were equal to 0 and that the dispersion biases were equal the MF method and the GT method were the most to 1. accurate for prediction of any definition of true breed - We used the Bonferroni-correction of p-values to con- ing value (median: 0.50–0.55, Table  2) while the SM trol for multiple testing: α = α/n = 0.05/1000 , bon tests and NRM methods were the least accurate (median: where α is the significance level and n is the tests 0.48–0.51). number of statistical tests. Among the 1000 tests, For Population C, the GT and MF methods were the n × n × n × (n − 1)/2 = 840 were com- p g m m most accurate (median: 0.61–0.63, Table  2). The least parisons between validation parameters and accurate methods were the SM method (median: 0.57) (n − 1) × n × n = 160 were tests for whether vali- p g m and the NRM method (median: 0.48–0.49), respectively dation parameters differed from expected values, where (Table 2). n = 3 is the number of validation parameters (accuracy, level bias, and dispersion bias); n = 10 is the number of g P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 11 of 17 Table 2 Median prediction accuracy across replicates Table 3 Median level bias across replicates Population × y y Population y y A AD A AD method × method Purebred F1 Rotation Purebred F1 Rotation Population A (Purebred) Population A (Purebred) c c c c a a a a GT 0.41 (0.09) 0.37 (0.11) 0.40 (0.11) 0.39 (0.12) GT −0.02 (0.08) 0.29 (0.09) −0.03 (0.08) 0.12 (0.08) c c c c a a a a MF 0.41 (0.09) 0.37 (0.11) 0.40 (0.11) 0.39 (0.12) MF −0.02 (0.08) 0.28 (0.09) −0.03 (0.08) 0.12 (0.08) c c c c a a a a NRM 0.39 (0.09) 0.37 (0.09) 0.40 (0.11) 0.37 (0.11) NRM −0.01 (0.08) 0.29 (0.10) −0.04 (0.09) 0.12 (0.09) c c c c a a a a SM 0.41 (0.09) 0.37 (0.10) 0.41 (0.09) 0.40 (0.12) SM −0.03 (0.08) 0.28 (0.09) −0.03 (0.09) 0.11 (0.09) a a a a a a a a ssGT 0.65 (0.06) 0.59 (0.08) 0.60 (0.08) 0.60 (0.08) ssGT −0.01 (0.07) 0.28(0.07) −0.01 (0.07) 0.13 (0.07) a a a a a a a a ssMF 0.65 (0.06) 0.59 (0.08) 0.60 (0.07) 0.60 (0.08) ssMF −0.01 (0.08) 0.28 (0.07) −0.04 (0.07) 0.13 (0.08) a a a a b b b b ssNRM ssNRM 0.00 (0.08) 0.29 (0.08) −0.02 (0.09) 0.14 (0.09) 0.63 (0.05) 0.56 (0.07) 0.58 (0.06) 0.58 (0.07) a a a a a a ab b ssSM 0.00 (0.07) 0.29 (0.08) −0.03 (0.08) 0.13(0.08) ssSM 0.58 (0.08) 0.60 (0.08) 0.65 (0.04) 0.60 (0.07) Population B (Purebred) Population B (Purebred) a a a a c c c c GT 0.00 (0.07) 0.28 (0.09) −0.03 (0.08) 0.04 (0.07) GT 0.34 (0.12) 0.31 (0.11) 0.34 (0.12) 0.34 (0.12) a a a a c c c c MF 0.00 (0.07) 0.28 (0.10) −0.02 (0.08) 0.04 (0.07) MF 0.33 (0.12) 0.31 (0.12) 0.34 (0.12) 0.34 (0.12) a a a a c c c c NRM 0.01 (0.08) 0.28 (0.10) 0.03 (0.09) 0.05 (0.08) NRM 0.33 (0.11) 0.30 (0.11) 0.31 (0.13) 0.31 (0.13) a a a a c c c c SM 0.01 (0.07) 0.29 (0.08) −0.03 (0.08) 0.04 (0.07) SM 0.35 (0.11) 0.29 (0.11) 0.31 (0.11) 0.32 (0.11) a a a a a a a a ssGT −0.01 (0.06) 0.28 (0.08) −0.03 (0.07) 0.04 (0.06) ssGT 0.53 (0.10) 0.50 (0.08) 0.54 (0.10) 0.53 (0.09) a a a a a a a a ssMF 0.01 (0.06) 0.28 (0.08) −0.01 (0.07) 0.05 (0.07) ssMF 0.55 (0.09) 0.51 (0.07) 0.54 (0.08) 0.54 (0.08) a a a a b b b b ssNRM 0.01 (0.07) 0.28 (0.08) −0.02 (0.08) 0.04 (0.08) ssNRM 0.50 (0.11) 0.50 (0.09) 0.50 (0.09) 0.51 (0.08) a a a a ssSM 0.01 (0.06) 0.28 (0.08) −0.02 (0.07) 0.04 (0.07) b b b b ssSM 0.51 (0.11) 0.48 (0.09) 0.49 (0.08) 0.50 (0.08) Population C (Crossbred) Population C (Crossbred) a a GT 0.00 (0.05) 0.20 (0.07) b b GT 0.62 (0.03) 0.61 (0.03) a a MF 0.00 (0.06) 0.20 (0.07) b b MF 0.62 (0.03) 0.62 (0.03) a a NRM 0.00 (0.06) 0.19 (0.07) NRM 0.48 (0.02) 0.48 (0.02) a a SM 0.00 (0.06) 0.20 (0.06) d d SM a a 0.57 (0.02) 0.57 (0.02) ssGT 0.00 (0.06) 0.20 (0.07) a a a a ssGT 0.63 (0.03) 0.62 (0.03) ssMF 0.00 (0.06) 0.20 (0.07) a a ssMF 0.63 (0.03) 0.62 (0.03) a a ssNRM 0.00 (0.06) 0.21 (0.06) e e ssNRM 0.48 (0.02) 0.49 (0.03) a a ssSM 0.00 (0.06) 0.20 (0.06) c c ssSM 0.57 (0.02) 0.57 (0.02) Median absolute deviations from medians are in parentheses Median absolute deviations are in parentheses Level Bias: Difference between change in predicted and true breeding values relative to in the base population Accuracy: Pearson’s correlation between true breeding values and predicted breeding values Bold: Medians in bold differ significantly from zero Superscripts: Different superscripts denote that medians are significantly Superscripts: Different superscripts denote that medians are significantly different different Superscripts are comparable within combinations of Population and column Superscripts are comparable within combinations of Population and column ss-prefix: Relationship matrices include genomic information ss-prefix: Relationship matrices include genomic information y : A phenotype with additive genetic effects A y : A phenotype with additive genetic effects y : A phenotype with both additive and dominant genetic effects AD y : A phenotype with both additive and dominant genetic effects AD Purebred: True breeding value is for production of purebred animals Purebred: True breeding value is for production of purebred animals F1: True breeding value is for production of F1-animals F1: True breeding value is for production of F1-animals Rotation: True breeding value is for mating with rotationally crossbred animals Rotation: True breeding value is for mating with rotationally crossbred animals Level bias Dispersion bias The level biases were not statistically significantly differ - In general, the dispersion biases for the GT, MF, and SM ent from 0 for the phenotype without a dominant genetic methods were not statistically significantly different from term (Table  3). For the phenotype with a dominant 1 (Table 4). genetic term, the level biases were statistically signifi - cantly different from 0 for mating an animal with another animal from the same population. Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 12 of 17 Table 4 Median dispersion bias across replicates genetic term (median: 0.78–0.94). The dispersion biases for the NRM method were always statistically signifi - Population y y A AD cantly different from 1. × method Purebred F1 Rotation For Population B, the dispersion biases for the GT and MF methods were not statistically significantly different Population A (Purebred) ab ab ab ab from 1 (median: 0.91–1.12). The dispersion biases for GT 0.98 (0.22) 0.85 (0.22) 0.89 (0.28) 0.86 (0.27) the SM method were only statistically significantly dif - ab b ab ab MF 0.96 (0.22) 0.84 (0.20) 0.88 (0.24) 0.85 (0.24) ferent from 1 when breeding values were predicted with c c c c NRM 0.74 (0.17) 0.62 (0.17) 0.65 (0.15) 0.65 (0.18) genomic information and the phenotype did not include ab b b b SM 0.94 (0.24) 0.78 (0.25) 0.84 (0.26) 0.79 (0.25) a dominant genetic term (median: 1.20). The dispersion a a a a ssGT 1.01 (0.14) 0.90 (0.15) 0.96 (0.14) 0.95 (0.14) biases for the NRM method was statistically significantly ab ab ab ab ssMF 1.00 (0.11) 0.88 (0.12) 0.93 (0.12) 0.90 (0.11) different from 1 in almost all cases (median: 0.67–0.82). c c c c ssNRM 0.76 (0.10) 0.67 (0.11) 0.69 (0.10) 0.66 (0.10) For Population C, the dispersion biases for the GT and b b b b ssSM 0.94 (0.15) 0.84 (0.18) 0.86 (0.15) 0.84 (0.17) MF methods were not statistically significantly different Population B (Purebred) from 1 when the phenotype did not include a dominant a a a bc GT 1.02 (0.44) 1.01 (0.43) 1.01 (0.41) 1.12 (0.40) genetic term (median: 0.99–1.02). The dispersion biases a a a bc MF 0.98 (0.39) 1.02 (0.42) 0.96 (0.44) 1.10 (0.37) for the SM and NRM methods were always statistically d b b b NRM significantly different from 1. 0.82 (0.30) 0.72 (0.33) 0.74 (0.34) 0.73 (0.33) a a a a SM 1.26 (0.57) 0.94 (0.47) 1.15 (0.51) 1.13 (0.50) c a a a Discussion ssGT 1.09 (0.16) 0.98 (0.20) 0.99 (0.17) 0.96 (0.16) c a a a As hypothesized, the GT and MF methods were generally ssMF 1.04 (0.17) 0.91 (0.19) 0.93 (0.16) 0.93 (0.15) d b b b the most accurate and least biased methods for predic- ssNRM 0.80 (0.16) 0.67 (0.15) 0.68 (0.14) 0.67 (0.12) a a a tion of breeding values with phenotypes from rotation- ab ssSM 1.02 (0.27) 0.98 (0.23) 0.97 (0.22) 1.20 (0.30) ally crossbred animals. The SM method was almost as Population C (Crossbred) accurate as the GT and MF methods but was also more a a GT 1.00 (0.06) 0.92 (0.06) biased. The NRM method was the least accurate and a a MF 1.01 (0.06) 0.93 (0.05) most biased of the methods. e e NRM 0.40 (0.04) 0.34 (0.03) c c SM 0.54 (0.05) 0.46 (0.04) The GT and MF methods a a ssGT 0.99 (0.07) 0.92 (0.06) We found that the GT and MF methods performed simi- a a ssMF 1.00 (0.07) 0.93 (0.05) larly for prediction of breeding values with phenotypes d d ssNRM 0.41 (0.04) 0.34 (0.03) from rotationally crossbred animals. This is in accord - b b ssSM 0.55 (0.05) 0.46 (0.04) ance with the fact that the MF method, in theory, can Median absolute deviations from medians are in parentheses account for both breed-specific terms and segregation Dispersion Bias: Linear regression coefficient of true breeding values onto terms from GT method [14]. More specifically, the GT predicted breeding values and MF methods are equivalent when the tranforma- Bold: Medians in bold differ significantly from zero tions of Eq.  22 yield the estimated variance components Superscripts: Different superscripts denote that medians are significantly from the GT method. However, this relies on the accu- different rate estimation of the metafounder relationships which Superscripts are comparable within combinations of Population and column ss-prefix: Relationship matrices include genomic information has some degree of estimation error. Fortunately for the y : A phenotype with additive genetic effects A MF method, it is the relative sizes of γ , γ , and γ which A B AB y : A phenotype with both additive and dominant genetic effects AD determine the relative sizes of the partial additive genetic 2 2 2 Purebred: True breeding value is for production of purebred animals parameters, σ , σ , and σ (Eq. 22). As long as Eq. 22 A A A A B AB F1: True breeding value is for production of F1-animals holds true, changes to the metafounder relationships are Rotation: True breeding value is for mating with rotationally crossbred animals accounted for through changes to the estimated additive genetic variance in the ancestral population, σ . MF One major advantage of the MF method is that genomic For Population A, the dispersion biases for the GT and information can readily be included in the additive rela- MF methods were not statistically significantly different tionship matrix with metafounders using the single-step from 1 for the phenotype without a dominant genetic procedure [14], regardless of the genetic composition of term (median: 0.94–1.01). The dispersion biases for SM the animals in the relationship matrix. On the contrary, method were not statistically significantly different from the single-step procedure has only been developed for 1 when breeding values were predicted without genomic the partial relationship matrices for breed-specific terms information or the phenotype was without a dominant P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 13 of 17 from the GT method; i.e., a combined partial relation- the NRM method and the pedigree tracing breed-spe- ship matrix for both genotyped and non-genotyped ani- cific genetic effects from Population A. Meanwhile, the mals for segregation terms has not been developed [6]. covariance for partial additive breed-specific effects from This may make the MF method more appropriate than Population B between the same animals was the same for the GT method when rotationally crossbred animals are the GT and SM methods: genotyped. 1 3 3 1 3 SM NRM:B NRM:B B B Based on this study, it is not possible to conclude a = f f a + a = [1 + 1] = , ij s i j d 4 4 4 4 8 whether the GT or MF method is better for the analysis 1 1 1 3 GT B B of these specific populations. a = f + f = 1 + = , ij s d 4 4 2 8 (31) The SM method where superscript NRM:B denotes that the covariance This method was generally as accurate and unbiased as was calculated with the NRM method and the pedigree the GT and MF methods for prediction in purebred ani- that traces breed-specific genetic effects from Population mals but less accurate and more biased for prediction in B; f is the breed proportion from Population B; and the rotationally crossbred animals (Tables 2 and 4). The inac - other terms are as for Eq. 30. curacy and bias of the SM method may be caused by its It is simple to see that the GT and SM methods do inability to properly separate the phenotype into its com- not always produce identical relationships. However, it ponents (Table 1). is challenging to explain how discreprancies between The SM method is only an approximation to the the GT and SM methods across the three partial addi- GT method and discreprancies between the two are tive relationship matrices affect the partitioning of ran - expected. For example, for the GT method and disregard- dom effects. Nevertheless, according to our study, the ing inbreeding, the covariance between siblings depends SM method seems to be a good approximation of the GT on the diagonal elements of their shared parents (Eq.  7). method when the aim is to predict breeding values in Meanwhile, for the SM method, the covariance between purebred animals. siblings depends on the product between their own regression covariates for partial additive genetic effects The NRM method (Eqs.  11, 12, 13). Consequently, the SM method is a bet- This method has the most inaccurate assumptions for ter approximation to the GT method between animals additive genetic effects among the methods investigated. where the weighted average of diagonal and off-diagonal In rotationally crossbred animals between divergent elements of common ancestors is equal to the product purebred populations, the model does not fit the data between the animals’ regression covariates for partial if the partial additive genetic variances due to breed- additive genetic effects and their additive genetic covari - specific effects are not proportional to breed propor - ance according to the NRM method. tions [10], and the segregation variance is not modelled In a rotational crossbreeding system, breed proportions [4]. Therefore, it was expected that this method was the differ across generations. Consequently, the weighted least accurate and most biased among those investigated average of diagonal and off-diagonal elements of com - (Tables 2, 3, 4). mon ancestors can differ from the product between the The NRM method is a common approach for multi- animals’ regression covariates for partial additive genetic breed analyses. The main argument for the NRM method effects and their additive genetic covariance according is that it is commonly implemented into softwares for to the NRM method. For example, in this study and dis- genetic evaluations. However, we argue that the GT, regarding inbreeding, the covariance of partial additive MF, and SM methods either are accessible or can easily breed-specific effects from Population A between full become accessible. Currently, the GT or MF methods sibs i and j from generation 34 and Population C was not may not be implemented in softwares for genetic evalua- the same for the GT and SM methods: tions, but both random regression and the NRM method 1 1 1 1 1 are. The combination of random regression and the NRM SM A A NRM:A NRM:A a = f f a + a = [0 + 1] = , ij i j s d 4 4 4 4 16 method enables the use of the SM method which, in this 1 1 1 1 GT A A study, was more accurate and less biased than the NRM a = f + f = 0 + = , ij s d 4 4 2 8 method (Tables  2, 3, 4). In the future, the GT and MF (30) methods should become accessible through  their imple- where subscripts i, j, s, and d denote animals; f is the mentation into commonly used softwares for genetic breed proportion from Population A; and superscript evaluations. The implementation of both the GT and NRM:A denotes that the covariance was calculated with MF methods is simple as the algorithms for directly Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 14 of 17 computing their inverse covariance matrices are very A simulation design with less diverged purebred popu- similar to the algorithm for the NRM method [9, 14]. lations would most likely yield the same ranking of the Consequently, the time required for implementing the methods but with less absolute differences between their GT and MF methods should be greatly reduced as a large prediction accuracies. proportion of program code from the NRM method can In this study, only genetic drift caused changes in be reused. All things considered, we do not recommend allele frequencies. In practice, allele frequencies are also the NRM method for genetic analyses with phenotypes affected by selection. Simulating selection would most of rotationally crossbred animals, because its alternatives likely also change the results. However, we have no rea- are more accurate, less biased, and easily accessible. son to believe that selection would change the ranking between the methods, because all the methods theoreti- Simulation design cally can account for selection, and because their mecha- Results from simulation studies are most relevant when nism for doing so is the same [8, 9, 12, 14]. the simulated populations are representative of real pop- ulations. Populations can be described with several Genotypes from crossbred animals parameters, however, the divergence between the popu- It is simpler to incorporate genomic information from lations is a key argument for the relevance of multibreed crossbred animals into some methods than into others. relationship matrices [4]. The magnitude of divergence For the MF, SM, and NRM methods, genomic informa- between two populations can be represented by the ratio tion on crossbreds can be incorporated as for purebred between the segregation variance and the additive genetic animals. For the GT method, it becomes necessary to 2 1 2 1 2 2 trace the breed of origin of alleles to construct genomic variance in F2 animals: σ / σ + σ + σ ; which, AB 2 A 2 B AB relationship matrices for breed-specific terms [13]. in turn, can be calculated using the metafounder rela- Furthermore, to our knowledge, it is not known how tionships (Eq.  22). Using this measure, the average mag- genomic information should be incorporated into par- nitude of divergence between Populations A and B is 15% tial relationship matrices for segregation terms. Although based on the metafounder relationships (Table 1). Mean- it is simple to incorporate genomic information for the while, this measure for the magnitude of divergence is MF, SM, and NRM methods, it is not known whether the 16% between DanBred Landrace and DanBred Yorkshire resulting relationship matrices correctly represent the pigs [32], 15% between Hereford and Zebu cattle [33], additive genetic covariance between animals. In particu- and on average 11% (min: 3%, max: 25%) between sub- lar, this is the case for the SM method and our applica- populations of Manech Tête Rousse sheep [34]. There - tion of the NRM method, as they are approximations. fore, the magnitude of divergence between populations A Although relevant, it was outside the scope of this study and B is representative of the divergence between real to compare the methods in a scenario with genomic populations. information from crossbred animals. It would have been reasonable to compare the meth- ods with a different simulation design, which would most Synthetic breeds likely give a different result. However, the purebred popu - This study was on genetic analyses with rotationally cross - lations need to have diverged from each other; otherwise bred animals, but our results may also apply to other segregation effects would be small. We ensured that the genetic analyses of mixed populations. For example, some purebred populations had diverged by simulating sepa- breeding companies create synthetic breeds. In practice, rate population bottlenecks in the two populations, and not a shared population bottleneck; by only sampling 50 animals (0.2% of the historical population) when found- Table 5 Example pedigree ing the purebred populations; by keeping the effec - tive population sizes small in the purebred populations Id Sire Dam Breed ( N ≈ 50 animals); and by isolating the purebred popu- 1 0 0 A lations for 32 generations prior to the pedigreed genera- 2 0 0 B tions. In a scenario where the purebred populations had 3 0 0 A only slightly diverged from each other, segregation effects 4 0 0 B would be small and the additive genetic variances would 5 1 2 – be the same in the purebred populations. This would 6 3 4 – diminish the argument for partial additive relationship 7 5 6 – matrices for the breed-specific terms and the segrega - 8 5 6 – tion term. In other words, it would be better to regard the 9 1 7 – two purebred populations as one purebred population. P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 15 of 17 Table 6 Breed-specific relationship matrix for the GT-method with the sample pedigree GT 1 3 5 6 7 8 9 1 1.00 3 1.00 5 0.50 0.50 6 0.50 0.50 7 0.25 0.25 0.25 0.25 0.50 8 0.25 0.25 0.25 0.25 0.25 0.50 9 0.63 0.13 0.38 0.13 0.38 0.25 0.88 Upper triangle and zeroes are omitted Table 7 Breed-specific relationship matrix for the NRM-method with the sample pedigree NRM 1 3 5 6 7 8 9 1 1.00 3 1.00 5 0.50 1.00 6 0.50 1.00 7 0.25 0.25 0.50 0.50 1.00 8 0.25 0.25 0.50 0.50 0.50 1.00 9 0.63 0.13 0.50 0.25 0.63 0.38 1.13 Upper triangle and zeroes are omitted Table 8 Breed-specific relationship matrix for the SM-method with the sample pedigree SM 1 3 5 6 7 8 9 1 1.00 3 1.00 5 0.35 0.50 6 0.35 0.50 7 0.18 0.18 0.25 0.25 0.50 8 0.18 0.18 0.25 0.25 0.25 0.50 9 0.54 0.11 0.31 0.15 0.38 0.23 0.84 Upper triangle and zeroes are omitted Table 9 Relationship matrix for the segregation term with Table 10 Relationship matrix for the segregation term with sample pedigree and the GT-method sample pedigree and the SM-method GT SM 7 8 9 7 8 9 A A AB AB 7 1.00 7 1.00 8 1.00 8 0.50 1.00 9 0.50 0.50 9 0.44 0.27 0.56 Upper triangle and zeroes are omitted Upper triangle and zeroes are omitted Poulsen et al. Genetics Selection Evolution (2022) 54:25 Page 16 of 17 Table 11 Relationship matrix with sample pedigree and the MF-method MF 1 2 3 4 5 6 7 8 9 1 1.33 2 0.10 1.25 3 0.66 0.10 1.33 4 0.10 0.50 0.10 1.25 5 0.72 0.68 0.38 0.30 1.05 6 0.38 0.30 0.72 0.68 0.34 1.05 7 0.55 0.49 0.55 0.49 0.70 0.70 1.17 8 0.55 0.49 0.55 0.49 0.70 0.70 0.70 1.17 9 0.94 0.29 0.60 0.29 0.71 0.54 0.86 0.62 1.27 Upper triangle is omitted. The matrix was calculated with: γ = 0.66 , γ = 0.1 , and γ = 0.50 AB B synthetic breeds are crossbred populations and they are Appendix subject to the same mechanisms as other crossbred popula- Appendix 1: Multibreed relationship matrices with a small tions. The only difference between a rotationally crossbred example pedigree population and a synthetic breed is that sires are not nec- The differences between the NRM, GT, and SM methods essarily purebred for synthetic breeds. Similar to the rota- are easier to understand through examples. This example tionally crossbred populations, the complex distributions of is based on a pedigree with both purebred animals, F1 genetic effects may complicate accurate and unbiased pre - animals, F2 animals, and a F2-backcross animal (Table 5). diction of breeding values in synthetic breeds. Our results We use the GT method as reference because it is theo- may assist with the choice of method for the relationship retically correct. matrix used in genetic analysis of synthetic breeds. The methods yield different additive relationship matri - ces for the term from breed A (Tables 6, 7, 8). The NRM Solving BLUP equation systems method calculates the correct relationships for purebred The choice between methods may also be impacted by animals; but is erroneous after it encounters crossbred their computational requirements. For all the relationship animals. The diagonal elements for crossbred animals are matrices that were studied here, the inverse can be directly not scaled according to their breed proportions, and this computed [8, 9, 14]. However, the resulting equation sys- error affects both diagonal and off-diagonal elements for tems differ in dimensions and sparseness. Using the GT, descendants of the crossbred animals (Tables  6 and  7). SM, or NRM method results in a larger equation system The SM method yields the same diagonal elements as the than with the MF method; especially with large numbers GT method in the absence of inbreeding (Table  8). The of breeds and crossbred animals. Meanwhile, the MF off-diagonal elements between F1 and F2 crossbred ani - method contains more non-zero elements than the other mals are also correct. The off-diagonal elements between methods; and using the MF method with the single-step purebred animals and crossbred animals are erroneous, procedure may require the inversion of one large genomic and so is the off-diagonal element for the F2-backcross relationship matrix rather than the inversion of smaller (animal 9; Table 8). genomic relationship matrices as with the other meth - The methods also yield different partial additive rela - ods. Comparison of computational demands between the tionship matrices for the segregation term (Tables  9 and methods was outside the scope of this study but it could be 10). The SM method calculates non-zero off-diagonal ele - relevant when computer hardware is a limiting factor. ments for related animals where the off-diagonal element is zero for the GT method. Furthermore, the off-diagonal Conclusion elements between animals 7 and 9 are erroneous as is the In the scenarios that  we investigated, models using the diagonal element for animal 9. additive relationship matrix with metafounders [14] or The relationship matrix from the MF method is not the partial relationship matrices by García-Cortés and directly comparable to those from the other methods Toro [9] were generally more accurate and less biased (Table  11) although it is theoretically equal to the GT than those using the partial relationship matrices by method [14]. Strandén and Mäntysaari [12] or the usual numerator relationship matrix [8]. P oulsen et al. Genetics Selection Evolution (2022) 54:25 Page 17 of 17 Acknowledgements 13. Christensen OF, Legarra A, Lund MS, Su G. Genetic evaluation for three- The authors thank M. Henryon and A.C. Sørensen for discussions on the simu- way crossbreeding. Genet Sel Evol. 2015;47:98. lation design. Furthermore, we thank A. Sampathkumar for assistance with 14. Legarra A, Christensen OF, Vitezica ZG, Aguilar I, Misztal I. Ancestral reducing the computational demands of the study. relationships using metafounders: finite ancestral populations and across population relationships. Genetics. 2015;200:455–68. Authors’ contributions 15. Poulsen BG, Nielsen B, Ostersen T, Christensen OF. Genetic associations BGP simulated the data, analyzed the data, and wrote the manuscript. OFC, between stayability and longevity in commercial crossbred sows, and TO, and BN supervised and assisted at all stages of the study, including the stayability in multiplier sows. J Anim Sci. 2020;98:skaa183. writing of the manuscript. All authors read and approved the final manuscript. 16. Sargolzaei M, Schenkel F. Qmsim: a large-scale genome simulator for livestock. Bioinformatics. 2009;25:680–1. Funding 17. Falconer DS, Mackay T. Eec ff tive population size. Introduction to quantita- This work is partly funded by the Innovation Fund Denmark (IFD) under File tive genetics. Harlow: Prentice Hall; 1996. p. 65–72. No. 9065-00070B. IFD has had no role in the design of the study, collection 18. Wellmann R, Bennewitz J. The contribution of dominance to the under- data, data analysis, interpretation of data, and in writing the manuscript. standing of quantitative genetic variation. Genet Res. 2011;93:139–54. 19. Henryon M, Berg P, Ostersen T, Nielsen B, Sørensen AC. Most of the Availability of data and materials benefits from genomic selection can be realized by genotyping a small The datasets used and analysed during the current study are available from proportion of available selection candidates. J Anim Sci. 2012;90:4681–9. the corresponding author on reasonable request. 20. Aguilar I, Misztal I, Johnson DL, Legarra A, Tsuruta S, Lawlor TJ. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of holstein final score. J Dairy Sci. Declarations 2010;93:743–52. 21. Christensen OF, Lund MS. Genomic prediction when some animals are Ethics approval and consent to participate not genotyped. Genet Sel Evol. 2010;42:2. Not applicable. 22. Vanraden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91:4414–23. Consent for publication 23. Garcia-Baccino CA, Legarra A, Christensen OF, Misztal I, Pocrnic I, Vitezica Not applicable. ZG, et al. Metafounders are related to fst fixation indices and reduce bias in single-step genomic evaluations. Genet Sel Evol. 2017;49:34. Competing interests 24. Gengler N, Mayeres P, Szydlowski M. A simple method to approximate The authors declare that they have no competing interests. gene content in large pedigree populations: application to the myostatin gene in dual-purpose belgian blue cattle. Animal. 2007;1:21–8. Author details 25. R Core Team. R: a language and environment for statistical computing. R Breeding & Genetics, Danish Agriculture and Food Council, Axelborg, Axeltorv Foundation for Statistical Computing. 2020. https:// www.R- proje ct. org/. 3, Copenhagen W, 1609 Copenhagen, Denmark. Center for Quantita- 26. Eddelbuettel D, Sanderson C. Rcpparmadillo: accelerating r with tive Genetics and Genomics, Aarhus University, Blichers Allé 20, 8830 Tjele, high-performance c++ linear algebra. Comput Stat Data Anal. Denmark. 2014;71:1054–63. 27. Madsen P, Jensen J, Labouriau R, Christensen OF, Sahana G. Dmu—a Received: 26 April 2021 Accepted: 28 February 2022 package for analyzing multivariate mixed models in quantitative genetics and genomics. In: Proceedings of the 10th World Congress of Genetics Applied to Livestock Production: 17–22 August 2014; Vancouver. 2014; pp. 18–22. 28. Falconer DS, Mackay T. Average effect. Introduction to quantitative References genetics. Harlow: Prentice Hall; 1996. p. 112–4. 1. Falconer DS, Mackay T. Correlated response to selection. Introduction to 29. Falconer DS, Mackay T. Breeding values. Introduction to quantitative quantitative genetics. Harlow: Prentice Hall; 1996. p. 317–21. genetics. Harlow: Prentice Hall; 1996. p. 114–6. 2. Wientjes YCJ, Calus MPL. Board invited review: the purebred-crossbred 30. Legarra A, Reverter A. Semi-parametric estimates of population accuracy correlation in pigs: a review of theory, estimates, and implications. J Anim and bias of predictions of breeding values and future phenotypes using Sci. 2017;95:3467–78. the lr method. Genet Sel Evol. 2018;50:53. 3. Oldenbroek K, Waaij LVD. The different crossbreeding systems and their 31. Legarra A, Reverter A. Correction to: semi-parametric estimates of popu- applicability. Textbook animal breeding: animal breeding and genetics lation accuracy and bias of predictions of breeding values and future for BSc students. Wageningen: Centre for Genetic Resources and Animal phenotypes using the lr method. Genet Sel Evol. 2019;51:69. Breeding and Genomics; 2014. p. 236–41. 32. Xiang T, Christensen OF, Legarra A. Technical note: genomic evaluation 4. Lo LL, Fernando RL, Grossman M. Covariance between relatives in multi- for crossbred performance in a single-step approach with metafounders. breed populations: additive model. Theor Appl Genet. 1993;87:423–30. J Anim Sci. 2017;95:1472–80. 5. Wei M, van der Werf JHJ. Maximizing genetic response in crossbreds using 33. Junqueira VS, Lopes PS, Lourenco D, Silva FF, Cardoso FF. Applying the both purebred and crossbred information. Anim Sci. 1994;59:401–13. metafounders approach for genomic evaluation in a multibreed beef 6. Christensen OF, Madsen P, Nielsen B, Su G. Genomic evaluation of both cattle population. Front Genet. 2020;11:fgene.2020.556399. purebred and crossbred performances. Genet Sel Evol. 2014;46:23. 34. Macedo FL, Christensen OF, Astruc JM, Aguilar I, Masuda Y. Bias and accu- 7. Falconer DS, Mackay T. Genetic components of variance. Introduction to racy of dairy sheep evaluations using blup and ssgblup with metafound- quantitative genetics. Harlow: Prentice Hall; 1996. p. 125–31. ers and unknown parent groups. Genet Sel Evol. 2020;52:47. 8. Mrode RA. Genetic covariance between relatives. Linear models for the prediction of animal breeding values. Wallingford: CABI; 2014. p. 22–33. Publisher’s Note 9. García-Cortés LA, Toro MA. Multibreed analysis by splitting the breeding Springer Nature remains neutral with regard to jurisdictional claims in pub- values. Genet Sel Evol. 2006;38:601–15. lished maps and institutional affiliations. 10. Elzo MA. Recursive procedures to compute the inverse of the multiple trait additive genetic covariance matrix in inbred and noninbred multi- breed populations. J Anim Sci. 1990;68:1215–28. 11. Cantet R, Fernando R. Prediction of breeding values with additive animal models for crosses from 2 populations. Genet Sel Evol. 1995;27:323–34. 12. Strandén I, Mäntysaari EA. Use of random regression model as an alterna- tive for multibreed relationship matrix. J Anim Breed Genet. 2013;130:4–9.

Journal

Genetics Selection EvolutionSpringer Journals

Published: Apr 6, 2022

There are no references for this article.