Insertion or deletion polymorphism (InDel) is one of the main types of genetic variation in plant genomes. However, few studies have examined InDels across the whole genome in Populus. In this study, we investigated genome-wide InDels in Populus deltoides and Populus simonii, and InDel segregation in their F1 hybrid population, using restriction-site associated DNA sequencing (RAD-seq) data. A total of 119,066 InDels were identified in P. deltoides and P. simonii relative to the reference genome of Populus trichocarpa, including 58,532 InDels unique to P. deltoides, 54,469 unique to P. simonii, and 6,065 common to both. The distribution patterns of these InDels along chromosomes were largely similar in the two species, but the average InDel density was slightly higher in P. deltoides than in P. simonii. GO annotation and enrichment analysis of the genes harboring InDels showed similar patterns in the two poplar species. Interestingly, the proportion of common InDels located within genes (~46%) was higher than the corresponding proportion for all InDels in P. deltoides (~35%) or in P. simonii (~34%), possibly indicating that these InDels are more conserved between poplar species. Moreover, analysis of InDel segregation showed that a large number of Mendelian InDels could be selected for genetic mapping in the F1 hybrid population. This RAD-seq analysis provides genome-wide insights into the InDel distributions in P. deltoides and P. simonii and their segregation patterns in the progeny, offering valuable genomic variation information for genetic and evolutionary studies in Populus.
Keywords Populus · Restriction-site associated DNA sequencing · InDels · F1 hybrid population · Mendelian segregation

Communicated by Yann-Rong Lin.

Chunfa Tong
Co-Innovation Center for Sustainable Forestry in Southern China, College of Forestry, Nanjing Forestry University, 210037 Nanjing, China

Tropical Plant Biology (2022) 15:171–180

Introduction

The genus Populus comprises approximately 30 tree species, naturally distributed in the Northern Hemisphere (Strauss 1994). Some species not only have many attractive biological characteristics but also possess great economic and ecological value (Tong et al. 2016). They are generally diploid plants (2n = 38), and their genome size, approximately 480 Mb, is close to that of rice and about 4 times that of Arabidopsis thaliana. Because of their small genome, fast growth, long lifespan, and ease of asexual and seed reproduction, poplars have become a model system among forest trees (Woolbright et al. 2008; Zhang et al. 2019). With advances in sequencing technologies, the genomes of several Populus species have been successively sequenced, including P. trichocarpa (Tuskan et al. 2006), P. pruinosa (Yang et al. 2017), P. alba (Liu et al. 2019b), P. euphratica (Zhang et al. 2020), P. simonii (Wu et al. 2020) and P. deltoides (Bai et al. 2021). These genomic resources provide a fundamental basis for identifying genetic variations and developing molecular markers between or within Populus species.

Next-generation sequencing (NGS) technology allows large amounts of short-read data to be obtained across many plant individuals in a fast and cost-effective way (Song et al. 2015). With available bioinformatics tools, such as BWA (Li and Durbin 2009) and SAMtools (Li et al. 2009), a large number of single nucleotide polymorphisms (SNPs) can be identified and genotyped from the sequencing data of each individual (Liu et al. 2013). SNPs are the most abundant variations in DNA sequences found in most organisms (Ganal et al. 2009; Hu et al. 2014; McCouch et al. 2010). Another form of DNA variation, a polymorphism in the length of a DNA sequence caused by the insertion or deletion of one or more nucleotides (InDel) at a given site in the genome (Weber et al. 2002), has received relatively little attention compared with the widely studied SNPs (Liu et al. 2013). However, InDels represent the second most abundant form of genetic variation in humans and plants (Pena and Pena 2012; Song et al. 2015), and they offer the advantages of a multiallelic nature, codominant inheritance and extensive genome coverage (Das et al. 2015). Compared with SNPs, InDels can be easily identified based on their size (Song et al. 2015). Indeed, InDels have become powerful molecular markers for species diagnostics (Yamaki et al. 2013), evolutionary studies (Weber et al. 2002), genetic linkage map construction (Song et al. 2015; Li et al. 2015), and marker-assisted selection (MAS) breeding (Liu et al. 2013). Recently, Zhu et al. (2018) performed restriction-site associated DNA sequencing (RAD-seq) to identify SNP and InDel markers for constructing a high-density SNP linkage map in Vitis. Meanwhile, Kizil et al. (2020) used double digested restriction site-associated DNA sequencing (ddRAD-seq) data to develop InDel markers for 95 sesame cultivars.

Although the mechanism of InDel formation remains elusive, various studies have investigated the distribution of InDels, their associated sites or regions in the genome, and their impact on proteins. Tian et al. (2008) indicated that the occurrence of InDels was generally associated with proximal nucleotide divergence. Other studies have shown that InDels have a greater impact on protein structure and function than SNPs (Ramakrishna et al. 2018). It is well known that InDels can change protein conformation and lead to major trait differences in mitochondrial genes (Lin et al. 2017). Therefore, InDels can be used to develop phylogenetic markers. InDel markers not only have high stability and accuracy but are also easy to amplify by polymerase chain reaction (PCR), so they have proven convenient and effective in molecular breeding (Feng et al. 2020; Weber et al. 2002). Two independent InDels of the same length are extremely unlikely to occur at the same genomic location, so shared InDels are considered to have the same origin, which avoids difficulties in subsequent analysis caused by complexity and specificity (Shedlock and Okada 2000).

In our previous studies, an F1 hybrid population derived from a cross between a female P. deltoides and a male P. simonii was established and sequenced with RAD-seq technology (Tong et al. 2016; Mousavi et al. 2016). Many high-quality (HQ) SNPs were extracted across the population to construct high-density genetic linkage maps and then to perform quantitative trait locus (QTL) analysis of growth traits, without considering InDels. In this study, we used the RAD-seq data to investigate the distribution of InDels in P. deltoides and P. simonii, and we analyzed the segregation patterns of these InDels in progeny from the F1 hybrid population. The results improve our understanding of the characteristics of InDels in P. deltoides and P. simonii, and the polymorphic InDel markers that follow Mendelian segregation provide a valuable resource for constructing InDel genetic maps, conducting MAS breeding, and locating QTLs.

Results

Mapping reads to the reference genome

A total of 915.5 Gb of RAD-seq data containing 3,159,482,930 paired-end (PE) reads were obtained from P. deltoides, P. simonii and 47 progeny in the F1 hybrid population (Table 1). After quality control with the NGS QC Toolkit (Patel and Jain 2012), we obtained 846.9 Gb of HQ read data. The female parent P. deltoides yielded 5.90 Gb (32,457,232 reads) of HQ reads, whereas the male parent P. simonii retained 12.2 Gb (68,406,849 reads). A total of 828.8 Gb of HQ reads were obtained from the 47 progeny. With the short-read mapping program BWA, 4.92 Gb (27,101,240 reads) of the HQ reads from the female parent and 9.96 Gb (55,888,722 reads) from the male parent were properly mapped to the reference genome of P. trichocarpa. In the progeny, a total of 687.0 Gb (2,340,191,288 reads) of HQ reads were mapped to the reference genome. The mapped HQ reads of the female and male parents reached 13-fold and 25-fold effective genome coverage, respectively; for the progeny, the effective coverage depth ranged from 15- to 47-fold. Table 1 summarizes the raw, HQ, and mapped data of the two parents as well as the average for the 47 progeny. More detailed information for each sample is presented in Table S1.

Table 1 Summary of the RAD-seq data for P. deltoides, P. simonii and their progeny (progeny values are averages); the number of bases (Gb) is given in brackets
Sample | Sample number | Raw reads | HQ reads | Mapped reads
P. deltoides | 1 | 34,569,761 (6.31) | 32,457,232 (5.90) | 27,101,240 (4.92)
P. simonii | 1 | 71,183,852 (12.71) | 68,406,849 (12.20) | 55,888,722 (9.96)
Progeny | 47 | 64,972,964 (19.08) | 60,603,042 (17.63) | 49,791,304 (14.61)

Distribution of InDels in P. deltoides and P. simonii

A total of 64,597 InDels were obtained in P. deltoides, including 32,862 (50.9%) insertions and 31,735 (49.1%) deletions, while in P. simonii 60,534 InDels were detected, of which 29,828 (49.3%) were insertions and 30,706 (50.7%) were deletions (Table 2). Here, insertions and deletions refer to InDels in which one of the alleles carries an insertion or a deletion, respectively, relative to the reference genome. Table 2 lists the number of InDels detected on each of the 19 chromosomes.

Table 2 Summary of the number and frequency of InDels on each chromosome in P. deltoides (female) and P. simonii (male)
Chr. | Length (Mb) | Female InDels (No.) | Male InDels (No.) | Female frequency (InDels/Mb) | Male frequency (InDels/Mb) | Female unique InDels (No.) | Male unique InDels (No.) | Common InDels (No.)
1 | 49.8 | 8232 | 7775 | 165 | 156 | 7460 | 7003 | 772
2 | 25.3 | 4531 | 4180 | 179 | 165 | 4131 | 3780 | 400
3 | 21.7 | 3811 | 3510 | 176 | 162 | 3418 | 3117 | 393
4 | 24.2 | 3833 | 3515 | 158 | 145 | 3474 | 3156 | 359
5 | 25.0 | 4104 | 4003 | 164 | 160 | 3712 | 3611 | 392
6 | 27.6 | 4972 | 4498 | 180 | 163 | 4516 | 4042 | 456
7 | 15.6 | 2564 | 2392 | 164 | 153 | 2325 | 2153 | 239
8 | 19.2 | 3558 | 3423 | 185 | 178 | 3197 | 3062 | 361
9 | 13.0 | 2496 | 2262 | 192 | 174 | 2248 | 2014 | 248
10 | 22.8 | 3715 | 3693 | 163 | 162 | 3362 | 3340 | 353
11 | 19.3 | 2654 | 2515 | 138 | 130 | 2422 | 2283 | 232
12 | 15.6 | 2404 | 2186 | 154 | 140 | 2207 | 1989 | 197
13 | 15.7 | 2554 | 2346 | 163 | 149 | 2305 | 2097 | 249
14 | 17.8 | 3031 | 2726 | 172 | 153 | 2724 | 2419 | 307
15 | 15.3 | 2656 | 2442 | 174 | 160 | 2427 | 2213 | 229
16 | 14.7 | 2415 | 2300 | 164 | 156 | 2182 | 2067 | 233
17 | 15.2 | 2272 | 2274 | 149 | 150 | 2068 | 2070 | 204
18 | 16.3 | 2591 | 2339 | 159 | 143 | 2349 | 2097 | 242
19 | 15.7 | 2204 | 2155 | 140 | 137 | 2005 | 1956 | 199
Total | 389.8 | 64,597 | 60,534 | 166 | 155 | 58,532 | 54,469 | 6065

Fig. 1 Distribution of the InDel length in P. deltoides and P. simonii.

As expected, the highest number of InDels was detected on chromosome 1 in both species because it is the longest chromosome. In contrast, chromosome 19 possessed the lowest number of InDels in each species. Overall, the number of InDels on chromosomes was largely proportional to chromosome length, with a correlation coefficient of 0.891 for P. deltoides and 0.897 for P. simonii. Furthermore, we calculated the number of InDels per 1 Mb on each chromosome. The average was 166 InDels per Mb in P. deltoides, ranging from 138 on chromosome 11 to 192 on chromosome 9.
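The insertion/deletion labels, the per-Mb frequencies and the common/unique split in Table 2 all come down to simple arithmetic on allele lengths, chromosome lengths and positions. The Python sketch below illustrates this under stated assumptions: each InDel is represented as a biallelic VCF-style REF/ALT pair, shared InDels are matched on (chromosome, position) alone, the helper names are hypothetical, and the toy alleles and positions are invented for illustration; only the counts and lengths are copied from Table 2.

```python
"""Sketch: label InDels relative to the reference and compute per-Mb frequencies.

Assumptions (illustrative, not the authors' code): each InDel is a biallelic
VCF-style REF/ALT pair; chromosome lengths are in Mb; helper names are hypothetical.
"""
from collections import Counter


def indel_type(ref: str, alt: str) -> str:
    """Return 'insertion' if ALT is longer than REF, 'deletion' if shorter."""
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    raise ValueError("REF and ALT have the same length, so this is not an InDel")


def signed_length(ref: str, alt: str) -> int:
    """Signed InDel length: positive for insertions, negative for deletions."""
    return len(alt) - len(ref)


def frequency_per_mb(count: int, length_mb: float) -> int:
    """InDels per Mb, rounded to the nearest integer as reported in Table 2."""
    return round(count / length_mb)


def split_by_position(sites_a: set, sites_b: set) -> tuple[set, set, set]:
    """Split two sets of (chromosome, position) keys into unique-to-A, unique-to-B, common."""
    common = sites_a & sites_b
    return sites_a - common, sites_b - common, common


# Hypothetical toy alleles, for illustration only.
toy_alleles = [("A", "AT"), ("GTT", "G"), ("C", "CAAA"), ("TA", "T")]
type_counts = Counter(indel_type(ref, alt) for ref, alt in toy_alleles)
signed = [signed_length(ref, alt) for ref, alt in toy_alleles]

# (InDel count, chromosome length in Mb) for P. deltoides, copied from Table 2.
female_chroms = {1: (8232, 49.8), 9: (2496, 13.0), 11: (2654, 19.3)}
female_freq = {c: frequency_per_mb(n, mb) for c, (n, mb) in female_chroms.items()}
genome_freq = frequency_per_mb(64_597, 389.8)

# Distinct sites across both parents = female + male - common (Table 2 totals).
distinct_sites = 64_597 + 60_534 - 6_065

# Hypothetical toy positions showing how shared InDels are identified by locus.
uniq_f, uniq_m, shared = split_by_position({("chr1", 100), ("chr1", 250)},
                                           {("chr1", 250), ("chr2", 40)})

if __name__ == "__main__":
    print(type_counts, signed)
    print(female_freq, genome_freq, distinct_sites)
    print(sorted(uniq_f), sorted(uniq_m), sorted(shared))
```

Running the sketch recovers the rounded Table 2 frequencies for chromosomes 1, 9 and 11 (165, 192 and 138 InDels/Mb), the genome-wide average of 166 InDels/Mb, and the 119,066 distinct variant sites obtained when the 6,065 common InDels are counted once.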
In P. simonii, the average InDel density was 155 per Mb, ranging from 130 on chromosome 11 to 178 on chromosome 8.

Considering positions, we found 6,065 (~5%) InDels at the same loci in the genomes of the two species. There were 58,532 and 54,469 unique InDels in the two species, accounting for 49% and 46% of all InDels identified, respectively (Table 2). We observed that the number of unique InDels in each species was essentially proportional to chromosome length, with a correlation coefficient of 0.897 for P. deltoides and 0.900 for P. simonii. Furthermore, chromosome 1 harbored the largest number of unique InDels, with 7,460 in P. deltoides and 7,003 in P. simonii, as well as 772 common InDels. In contrast, chromosome 19 contained the fewest, with 2,005 unique InDels in P. deltoides, 1,956 unique InDels in P. simonii, and 199 common InDels. Except for chromosome 17, every chromosome carried more InDels in P. deltoides than in P. simonii.

The distribution of InDel length was also investigated in the two species (Fig. 1; the x-axis represents InDel length, with negative numbers indicating deletions and positive numbers indicating insertions, and the y-axis represents the number of InDels of each length). In P. deltoides, single-nucleotide InDels were the most common type, followed by dinucleotide InDels, and these two types together accounted for 46.3% of all InDels. Among all InDels, 90.3% were 10 bp or shorter, 5.8% were between 11 and 15 bp, and 3.9% were longer than 15 bp. In P. simonii, single-nucleotide and dinucleotide InDels accounted for 45.3% of all InDels, with 89.9% being 10 bp or shorter, 5.9% between 11 and 15 bp, and 4.2% longer than 15 bp (Table 3). The number of InDels generally decreased with increasing length, although 4-bp InDels were slightly more frequent than 3-bp InDels in both species.

Table 3 Distribution of the InDel length in P. deltoides and P. simonii
Length (bp) | P. deltoides InDels (No.) | P. deltoides percentage (%) | P. simonii InDels (No.) | P. simonii percentage (%)
1 | 19,445 | 30.1 | 17,984 | 29.7
2 | 10,441 | 16.2 | 9441 | 15.6
3 | 7825 | 12.1 | 7160 | 11.8
4 | 8452 | 13.1 | 7836 | 13.0
5 ~ 7 | 7571 | 11.7 | 7258 | 12.0
8 ~ 10 | 4588 | 7.1 | 4746 | 7.8
11 ~ 15 | 3731 | 5.8 | 3545 | 5.9
> 15 | 2544 | 3.9 | 2564 | 4.2
Total | 64,597 | 100 | 60,534 | 100

Fig. 2 Circular representation of the distribution of InDels in P. deltoides and P. simonii along the 19 chromosomes. (A) The 19 chromosomes, shown in different colors. (B) The number of InDels in 1-Mb sliding windows along each chromosome for P. deltoides. (C) The number of InDels in 1-Mb sliding windows along each chromosome for P. simonii.

Figure 2 shows the distribution of InDel numbers in 1-Mb sliding windows along each chromosome for the two species. InDels were unevenly distributed along individual chromosomes. In P. deltoides, the number of InDels per 1 Mb ranged from 11 to 274 (Table S2); the highest density was found on chromosome 8 and the lowest on chromosome 6. In P. simonii, the number of InDels per 1 Mb ranged from 4 to 258, with the highest density on chromosome 15 and the lowest on chromosome 8 (Table S3). Moreover, we found 102 high-density regions (> 200 InDels per Mb) and 15 low-density regions (< 50 InDels per Mb) in P. deltoides. Similarly, 59 high-density and 13 low-density regions were found in P. simonii (Table S4). Most chromosomes in both parents contained a mixture of high-density and low-density InDel regions that were randomly distributed along the chromosomes. In both species, 8 chromosomes contained no low-density InDel regions. In addition, in P. simonii, chromosome 11 had no high-density regions, and chromosome 12 had neither high-density nor low-density regions.

Functional annotation of InDels

The annotation of the P. trichocarpa reference genome v4.0 was used to determine the distribution of InDels among distinct genomic regions. According to the gene structure of the reference genome, InDels occurred more frequently in intergenic regions than in genic regions. In P. deltoides, 64.91% (41,931) of InDels were located in intergenic regions, whereas 35.09% (22,666) were located in genic regions, of which 13,864 were in introns, 7,092 in UTRs and 1,710 in CDS regions (Table 4). In P. simonii, 65.68% (39,759) of InDels were located in intergenic regions and only 34.32% (20,775) in genic regions, of which 12,729 were in introns, 6,394 in UTRs and 1,652 in CDS regions. We also found 2,790 common InDels located in genic regions (CDS, UTR and intron), accounting for 46.00% of all common InDels. A total of 37,861 unique InDels from the two species were located in genic regions, of which P. deltoides and P. simonii contained 19,876 and 17,985, accounting for 34% and 33% of their unique InDels, respectively. Among these 37,861 unique InDels, 11,828 (P. deltoides: 6,263; P. simonii: 5,565) were located in UTRs, 23,233 (P. deltoides: 12,184; P. simonii: 11,049) in introns, and 2,800 (P. deltoides: 1,429; P. simonii: 1,371) in CDS regions. Interestingly, the proportion of common InDels located within genes (~46%) was higher than the corresponding proportion for all InDels in P. deltoides (~35%) or in P. simonii (~34%), possibly indicating that these InDels are more conserved between poplar species.

Table 4 Location distribution of the InDels in P. deltoides and P. simonii (percentages in brackets)
Location | InDels in P. deltoides | InDels in P. simonii | Unique InDels in P. deltoides | Unique InDels in P. simonii | Common InDels
CDS | 1710 (2.65) | 1652 (2.73) | 1429 (2.44) | 1371 (2.52) | 281 (4.63)
UTR | 7092 (10.98) | 6394 (10.56) | 6263 (10.70) | 5565 (10.22) | 829 (13.67)
Intron | 13,864 (21.46) | 12,729 (21.03) | 12,184 (20.82) | 11,049 (20.28) | 1680 (27.70)
Intergenic | 41,931 (64.91) | 39,759 (65.68) | 38,656 (66.04) | 36,484 (66.98) | 3275 (54.00)
Total | 64,597 (100) | 60,534 (100) | 58,532 (100) | 54,469 (100) | 6065 (100)

It is also of interest to examine the number of InDels in the CDS region per gene, because these InDels may have accumulated during evolution. We identified 1,710 and 1,652 InDels in the CDS regions of P. deltoides and P. simonii, respectively. In P. deltoides, InDels in the CDS region were distributed in 1,543 different genes, with 507 genes harboring two or more InDels. In P. simonii, they were distributed in 1,450 different genes, with 510 genes harboring two or more InDels (Table 5). On average, 1.11 InDels per gene were detected in P. deltoides and 1.14 in P. simonii.

Table 5 Distribution of the number of InDels per gene in the CDS regions
Number of InDels | Number of genes in P. deltoides | Number of genes in P. simonii
1 | 1036 | 940
2 | 301 | 289
3 | 92 | 93
4 | 54 | 59
5 | 10 | 20
6 | 13 | 13
7 | 5 | 5
8 | 8 | 11
9 | 5 | 2
≥ 10 | 19 | 18
Total | 1543 | 1450

Fig. 3 Gene Ontology (GO) functional annotations of genes containing InDels within the CDS region for P. simonii (A) and P. deltoides (B).

InDels in the CDS region can give rise to two types of variants, frameshift (FS) and nonframeshift (NFS) (Lin et al. 2017). There were more NFS InDels than FS InDels in the CDS region (Table S5): we detected 570 and 545 FS InDels in P. deltoides and P. simonii, respectively. FS InDels are usually considered more deleterious. It is possible that some instances of multiple InDels in the same gene could serve as compensatory InDels.
Such compensatory InDels could restore the translation frame, resulting in less deleterious mutations (Liu et al. 2015). InDels that occur in functionally important regions of genes (typically CDS regions) can affect gene function through frameshifts and structural changes in proteins (Zhang et al. 2016). To better understand the potential functions of InDels within genes, GO term enrichment analysis was performed for genes containing InDels within the CDS region. These genes were classified into three categories: biological processes, cellular components, and molecular functions. The genes in the two species exhibited similar classification patterns (Fig. 3, in which the horizontal axis indicates the GO classification types and the vertical axis the number of annotated protein-coding genes). In the biological process category, the genes were related to 22 biological processes; the three most overrepresented GO terms were cellular process, metabolic process and single-organism process, suggesting that these genes are involved in a broad range of physiological functions. In the cellular component category, the genes were classified into 14 categories in P. deltoides and 17 in P. simonii, with cell, cell part and organelle the most abundant GO terms. In the molecular function category, the genes were classified into 15 categories; binding and catalytic activity were the most enriched, while other functions accounted for only a small fraction.

Analysis of InDel segregation in the progeny

At each of the InDel sites identified in the two species (Table 2), we called genotypes across the two parents and 47 progeny and performed a chi-square test for Mendelian segregation ratios. If a parental genotype, or at least 20% of the progeny genotypes, could not be called at an InDel site, the site was removed from the dataset. Consequently, 11,141 InDel sites were found to segregate in the progeny following Mendelian ratios of 1:1 or 1:2:1 (p ≥ 0.01; Tables S6–S8). The segregation types of these InDels were ab×aa, aa×ab and ab×ab, with 5,811, 5,294 and 36 sites, respectively, where the first two letters represent the genotype of the female parent and the last two the genotype of the male parent (Table 6).

Table 6 Mendelian segregation patterns of InDel markers
Female genotype | Male genotype | Expected segregation ratio | Number of InDels
ab | aa | 1:1 | 5811
aa | ab | 1:1 | 5294
ab | ab | 1:2:1 | 36
Total | | | 11,141

Discussion

In this study, by mapping the HQ RAD-seq reads to the reference genome of P. trichocarpa, we found 64,597 and 60,534 InDels in the female P. deltoides and the male P. simonii, respectively. Because 6,065 InDels were common to the two species, these InDels amounted to a total of 119,066 distinct variant sites (Table 2). The number of InDels in P. simonii was slightly lower than that in P. deltoides, which suggests that the heterozygosity of P. simonii was slightly lower than that of P. deltoides. Consistently, the number of SNPs in P. simonii was also slightly lower than that in P. deltoides; these SNPs were called from the same RAD-seq data and reference genome in our previous study (Tong et al. 2016), yielding a total of 836,895 SNP sites in the same two parents. The total number of SNPs found in the two species was thus about seven times the number of InDels (836,895 vs. 119,066). That SNPs far outnumber InDels is expected, because SNPs are the most abundant genomic variants in most species (Ganal et al. 2009; McCouch et al. 2010; Hu et al. 2014). For example, Liu et al. (2019a) identified 7,511,731 SNPs and 255,218 InDels between two tea cultivars, Camellia sinensis var. sinensis and Camellia sinensis var. assamica, i.e., about 29 times as many SNPs as InDels. The average densities of InDels in P. deltoides and P. simonii were approximately 166 and 155 InDels/Mb, respectively, higher than that of pepper (71 InDels/Mb; Qin et al. 2014) but much lower than those of cucumber (916 InDels/Mb; Qi et al. 2013) and tomato (1,448 InDels/Mb; Lin et al. 2014). Such large differences may be attributable to the distinct genome composition or structure of different plant species (Liu et al. 2019a). We also found that the average InDel density on chromosome 19 in both P. deltoides and P. simonii was slightly lower than that on every other chromosome except chromosome 11. This may be related to the fact that chromosome 19 is considered to be responsible for sex determination through a ZW system in Populus (Yin et al. 2008).

It is important to understand the positions of genetic variations in the genome. In P. deltoides, the majority of InDels (64.91%) were located in intergenic regions, which may be related to weaker pressure from natural selection and/or domestication in these regions (Barreiro et al. 2008), while the rest (22,666) were found in genic regions, of which only 1,710 were within CDS regions (Table 4). A similar pattern was observed in P. simonii. The small number of InDels in CDS regions could be explained by the fact that CDS regions account for only a small part of the whole genome in Populus and are more conserved than other regions (Liu et al. 2019a). InDels occurring in CDS regions often have a greater impact on genes. FS InDels change the reading frame of the coding sequence starting from the site of the insertion or deletion, producing different protein sequences or premature termination of translation (Lin et al. 2017). These effects are generally considered deleterious, so such InDels may be removed from the population through purifying selection (de la Chaux et al. 2007; Taylor et al. 2004). However, multiple InDels in the same gene may also restore the reading frame of the coding sequence and thereby ameliorate the deleterious effects of FS InDels.

Since Populus species are outbred forest trees with a long generation time and high heterozygosity, it is almost impossible to obtain the kinds of genetic mapping populations available for inbred lines, such as traditional BC and F2 populations (Wu et al. 2000; Zhang et al. 2009). Instead, an F1 hybrid population derived from a cross between two individuals is usually used for linkage mapping in outbred species, especially forest trees (Grattapaglia and Sederoff 1994; Tong et al. 2020). Maliepaard et al. (1997) summarized that molecular markers in such an F1 population may segregate in various types, such as ab×aa, aa×ab, and ab×ab. Subsequently, great efforts have been made to develop statistical methods for genetic linkage mapping with markers of these different segregation types (Wu et al. 2002; Tong et al. 2010). In our previous studies (Tong et al. 2016, 2020; Mousavi et al. 2016), we used only SNPs to construct the linkage maps of P. deltoides and P. simonii. However, InDels are also attractive molecular markers for genetic linkage mapping (Song et al. 2015; Li et al. 2015). We therefore investigated the segregation patterns at all detected InDel sites across the 47 progeny in the current study. Only 9.36% (11,141) of all InDel loci were found to follow Mendelian segregation ratios (p ≥ 0.01). The other InDel loci were excluded because they either had too many missing genotypes (≥ 10) or showed distorted segregation (p < 0.01) in the progeny. This result was similar to the situation for SNP calling in our previous study (Tong et al. 2016), in which 836,895 SNPs were identified but only 2,545 (0.30%) were used for linkage mapping because of uncalled genotypes or distorted segregation. As expected, and as with the SNPs (Tong et al. 2016), the majority of Mendelian InDels segregated in a 1:1 ratio, with the segregation types ab×aa and aa×ab (Table 6). This can be attributed to the fact that the two parents belong to different Populus species and each is highly heterozygous. Our results demonstrate that abundant InDels can be selected as Mendelian markers from RAD-seq data for genetic linkage mapping in an F1 hybrid population. Unlike the traditional PCR-based method (Song et al. 2015; Li et al. 2015), each sample in the mapping population was genotyped by mapping its short reads to a reference genome sequence. However, for such a purpose, the sample size should be increased well beyond that of the current study to allow more precise linkage analysis.

Materials and methods

Plant materials and sequence data

The plant materials came from an F1 hybrid population of P. deltoides and P. simonii generated from 2009 to 2011 (Tong et al. 2016). The female P. deltoides 'I-69' was selected from Siyang Forest Farm (SFF), Jiangsu Province, China, while the male P. simonii 'L-3' was collected from forestland in Luoning County, Henan Province, China. Approximately 500 progeny were planted at the Xiashu Forest Farm of Nanjing Forestry University, Jurong County, Jiangsu Province, China. In previous studies, we digested genomic DNA with the restriction enzyme EcoRI and performed RAD sequencing of the two parents and 418 of their progeny (Tong et al. 2016; Mousavi et al. 2016). The PE read data for each individual are available under the SRR accession numbers listed in Tong et al. (2020). Because most individuals had low genome coverage, we chose the two parents and the 47 progeny with the highest coverage for identifying InDel markers and performing the segregation analysis in this study. The accession numbers for these selected individuals are also listed in Table S1.

Identification of InDels

InDel genotypes were called for each individual as follows.
(1) The "mem" command of BWA (Burrows–Wheeler Aligner; Li and Durbin 2009) was used to align the RAD-seq PE reads to the reference genome of P. trichocarpa (v4.0; http://www.phytozome.net), generating a SAM (Sequence Alignment/Map) format file for each sample. We chose P. trichocarpa as the reference because it was the first genome sequenced in Populus and has been updated many times since its publication. An in silico EcoRI digestion of this reference genome identified a total of 110,418 enzyme sites, of which 35,990 are within 18,660 genes.
(2) Each SAM file was converted to a BAM file, which was then sorted and indexed with SAMtools (Li et al. 2009).
(3) From the BAM files, BCF files were generated with BCFtools.
(4) Two parental VCF (Variant Call Format; Danecek et al. 2011) files were generated from the BCF files with the command "bcftools call -m -v".
(5) The InDel sites were extracted from the parental VCF files and saved as a site list file for calling InDel genotypes.
(6) For each individual, a VCF file was generated from its BCF file and the site list file generated above.
(7) The InDel genotype of each individual included in the article’s Creative Commons licence and your intended at each site was extracted from its VCF file such that the use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright read depth of an allele was at least 3 and the genotype qual- holder. To view a copy of this licence, visit http://creativecommons. ity was greater than 30. org/licenses/by/4.0/. Location and functional annotation of InDels References The InDel sites were determined according to the reference genome of P. trichocarpa (http://www.phytozome.net). Bai SJ, Wu HN, Zhang JP, Pan ZL, Zhao W et al (2021) Genome The InDels on chromosomes were annotated as genic or assembly of Salicaceae Populus deltoides (Eastern Cottonwood) I-69 based on nanopore sequencing and Hi-C technologies. J intergenic. The genic InDels were classified as CDS, UTR, Hered 112(3):303–310 and intron according to their localization. Next, the genes Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L (2008) containing InDels were annotated by first aligning their Natural selection has driven population differentiation in modern coding sequences (CDSs) to the nonredundant protein data- humans. Nat Genet 40(3):340–345 Danecek P, Auton A, Abecasis G, Albers CA, Banks E et al (2011) base (NR) with BLAST and then mapping the blast hits to The variant call format and VCFtools. Bioinformatics Gene Ontology (GO) terms with Blast2GO (https://www. 27(15):2156–2158 blast2go.com). Das S, Upadhyaya HD, Srivastava R, Bajaj D, Gowda CLL et al (2015) Genome-wide insertion-deletion (InDel) marker discov- ery and genotyping for genomics-assisted breeding applications InDel segregation analysis in chickpea. 
InDel segregation analysis

At each InDel site, the genotypes of the two parents and the 47 progeny were tabulated to analyze segregation in the population. A chi-square test was then performed to check whether each InDel followed a Mendelian segregation ratio, such as 1:1 or 1:2:1, in the progeny. An InDel site was considered to follow Mendelian segregation if its p value was greater than 0.01 and fewer than 10 (about 20%) of the progeny had missing genotypes.
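The segregation test described above can be sketched as follows. This is a hypothetical helper, not the authors' implementation. Genotypes are assumed to be coded as the strings "aa", "ab" and "bb", with missing progeny genotypes as None. Only the segregation types mentioned in the text are handled: ab×aa and aa×ab (1:1) and ab×ab (1:2:1). Progeny genotypes that are impossible under the parental cross are treated as a failed test, which is also an assumption. Because only 1 or 2 degrees of freedom occur, the chi-square upper-tail probability is computed from its exact closed forms with the standard library rather than SciPy.

```python
import math
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple

# Expected progeny ratios keyed by (first parent, second parent) genotype; hypothetical coding.
EXPECTED_RATIOS: Dict[Tuple[str, str], Dict[str, int]] = {
    ("ab", "aa"): {"aa": 1, "ab": 1},           # 1:1
    ("aa", "ab"): {"aa": 1, "ab": 1},           # 1:1
    ("ab", "ab"): {"aa": 1, "ab": 2, "bb": 1},  # 1:2:1
}


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail chi-square probability; exact closed forms for df = 1 and df = 2."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or 2 arise for 1:1 and 1:2:1 ratios")


def is_mendelian(parents: Tuple[str, str], progeny: Sequence[Optional[str]],
                 alpha: float = 0.01, max_missing: int = 10) -> bool:
    """True if fewer than `max_missing` progeny genotypes are missing and the
    chi-square goodness-of-fit p value against the expected ratio exceeds `alpha`."""
    ratio = EXPECTED_RATIOS.get(parents)
    if ratio is None:
        return False  # segregation type not handled by this sketch
    if sum(g is None for g in progeny) >= max_missing:
        return False
    counts = Counter(g for g in progeny if g is not None)
    if any(g not in ratio for g in counts):
        return False  # genotype impossible under this cross (assumption: reject site)
    n = sum(counts.values())
    if n == 0:
        return False
    total = sum(ratio.values())
    chi2 = 0.0
    for genotype, weight in ratio.items():
        expected = n * weight / total
        chi2 += (counts[genotype] - expected) ** 2 / expected
    return chi2_sf(chi2, len(ratio) - 1) > alpha
```

Under this sketch, an ab×aa site with 24 ab : 23 aa progeny passes easily (χ² ≈ 0.02), whereas a 40 : 7 split is rejected at the 0.01 threshold.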
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s12042-022-09312-y.

Author Contributions Conceptualization, C.T. and Z.P.; methodology, Z.P. and Z.L.; formal analysis, Z.P., Z.L. and J.Z.; investigation, Z.P., Z.L., J.Z., S.B. and W.Z.; writing-original draft preparation, Z.P.; writing-review and editing, C.T.; supervision, C.T.; project administration, C.T.; funding acquisition, C.T. All authors have read and agreed to the published version of the manuscript.

Funding This research was funded by the National Natural Science Foundation of China, grant number 31870654, and the Priority Academic Program Development of Jiangsu Higher Education Institutions.

Declarations

Conflict of interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Bai SJ, Wu HN, Zhang JP, Pan ZL, Zhao W et al (2021) Genome assembly of Salicaceae Populus deltoides (Eastern Cottonwood) I-69 based on nanopore sequencing and Hi-C technologies. J Hered 112(3):303–310
Barreiro LB, Laval G, Quach H, Patin E, Quintana-Murci L (2008) Natural selection has driven population differentiation in modern humans. Nat Genet 40(3):340–345
Danecek P, Auton A, Abecasis G, Albers CA, Banks E et al (2011) The variant call format and VCFtools. Bioinformatics 27(15):2156–2158
Das S, Upadhyaya HD, Srivastava R, Bajaj D, Gowda CLL et al (2015) Genome-wide insertion-deletion (InDel) marker discovery and genotyping for genomics-assisted breeding applications in chickpea. DNA Res 22(5):377–386
de la Chaux N, Messer PW, Arndt PF (2007) DNA indels in coding regions reveal selective constraints on protein evolution in the human lineage. BMC Evol Biol 7:191
Feng JJ, Zhu HY, Zhang M, Zhang XX, Guo LP et al (2020) Development and utilization of an InDel marker linked to the fertility restorer genes of CMS-D8 and CMS-D2 in cotton. Mol Biol Rep 47(2):1275–1282
Ganal MW, Altmann T, Roder MS (2009) SNP identification in crop plants. Curr Opin Plant Biol 12(2):211–217
Grattapaglia D, Sederoff R (1994) Genetic linkage maps of Eucalyptus grandis and Eucalyptus urophylla using a pseudo-testcross: mapping strategy and RAPD markers. Genetics 137:1121–1137
Hu YY, Mao BG, Peng Y, Sun YD, Pan YL et al (2014) Deep re-sequencing of a widely used maintainer line of hybrid rice for discovery of DNA polymorphisms and evaluation of genetic diversity. Mol Genet Genomics 289(3):303–315
Kizil S, Basak M, Guden B, Tosun HS, Uzun B et al (2020) Genome-wide discovery of InDel markers in sesame (Sesamum indicum L.) using ddRADSeq. Plants 9:1262
Li H, Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25:1754–1760
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J et al (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25:2078–2079
Li W, Cheng J, Wu Z, Qin C, Tan S et al (2015) An InDel-based linkage map of hot pepper (Capsicum annuum). Mol Breed 35(1):32
Lin T, Zhu GT, Zhang JH, Xu XY, Yu QH et al (2014) Genomic analyses provide insights into the history of tomato breeding. Nat Genet 46(11):1220–1226
Lin MX, Whitmire S, Chen J, Farrel A, Shi XH et al (2017) Effects of short indels on protein structure and function in human genomes. Sci Rep 7:9313
Liu B, Wang Y, Zhai W, Deng J, Wang H et al (2013) Development of InDel markers for Brassica rapa based on whole-genome re-sequencing. Theor Appl Genet 126(1):231–239
Liu MM, Watson LT, Zhang LQ (2015) Predicting the combined effect of multiple genetic variants. Hum Genomics 9(1):18
Liu SR, An YL, Tong W, Qin XJ, Samarina L et al (2019a) Characterization of genome-wide genetic variations between two varieties of tea plant (Camellia sinensis) and development of InDel markers for genetic research. BMC Genomics 20(1):935
Liu YJ, Wang XR, Zeng QY (2019b) De novo assembly of white poplar genome and genetic diversity of white poplar population in Irtysh River basin in China. Sci China-Life Sci 62(5):609–618
McCouch SR, Zhao KY, Wright M, Tung CW, Ebana K et al (2010) Development of genome-wide SNP assays for rice. Breed Sci 60(5):524–535
Mousavi M, Tong C, Liu F, Tao S, Wu J et al (2016) De novo SNP discovery and genetic linkage mapping in poplar using restriction site associated DNA and whole-genome sequencing technologies. BMC Genomics 17:656
Patel RK, Jain M (2012) NGS QC Toolkit: A toolkit for quality control of next generation sequencing data. PLoS ONE 7(2):e30619
Pena HB, Pena SDJ (2012) Automated Genotyping of a Highly Informative Panel of 40 Short Insertion-Deletion Polymorphisms Resolved in Polyacrylamide Gels for Forensic Identification and Kinship Analysis. Transfus Med Hemotherapy 39(3):211–216
Qi JJ, Liu X, Shen D, Miao H, Xie BY et al (2013) A genomic variation map provides insights into the genetic basis of cucumber domestication and diversity. Nat Genet 45(12):1510–1515
Qin C, Yu CS, Shen YO, Fang XD, Chen L et al (2014) Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. Proc Natl Acad Sci USA 111(14):5135–5140
Ramakrishna G, Kaur P, Nigam D, Chaduvula PK, Yadav S et al (2018) Genome-wide identification and characterization of InDels and SNPs in Glycine max and Glycine soja for contrasting seed permeability traits. BMC Plant Biol 18:141
Shedlock AM, Okada N (2000) SINE insertions: powerful tools for molecular systematics. BioEssays 22(2):148–160
Song X, Wei H, Cheng W, Yang S, Zhao Y et al (2015) Development of INDEL markers for genetic mapping based on whole genome resequencing in soybean. G3 5(12):2793–2799
Strauss BSH (1994) Floral phenology and morphology of black cottonwood, Populus trichocarpa (Salicaceae). Am J Bot 81(5):562–567
Taylor MS, Ponting CP, Copley RR (2004) Occurrence and consequences of coding sequence insertions and deletions in mammalian genomes. Genome Res 14(4):555–566
Tian D, Wang Q, Zhang P, Araki H, Yang S et al (2008) Single-nucleotide mutation rate increases close to insertions/deletions in eukaryotes. Nature 455(7209):105–108
Tong C, Zhang B, Shi J (2010) A hidden Markov model approach to multilocus linkage analysis in a full-sib family. Tree Genet Genomes 6:651–662
Tong C, Li H, Wang Y, Li X, Ou J et al (2016) Construction of high-density linkage maps of Populus deltoides × P. simonii using restriction-site associated DNA sequencing. PLoS ONE 11(3):e0150692
Tong C, Yao D, Wu H, Chen Y, Yang W et al (2020) High-quality SNP linkage maps improved QTL mapping and genome assembly in Populus. J Hered 111(6):515–530
Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I et al (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313:1596–1604
Weber JL, David D, Heil J, Fan Y, Zhao CF et al (2002) Human diallelic insertion/deletion polymorphisms. Am J Hum Genet 71(4):854–862
Woolbright SA, DiFazio S, Yin T, Martinsen GD, Zhang X et al (2008) A dense linkage map of hybrid cottonwood (Populus fremontii × P. angustifolia) contributes to long-term ecological research and comparison mapping in a model forest tree. Heredity 100:59–70
Wu RL, Han YF, Hu JJ, Fang JJ, Li L et al (2000) An integrated genetic map of Populus deltoides based on amplified fragment length polymorphisms. Theor Appl Genet 100:1249–1256
Wu RL, Ma CX, Painter I, Zeng ZB (2002) Simultaneous maximum likelihood estimation of linkage and linkage phases in outcrossing species. Theor Popul Biol 61:349–363
Wu HN, Yao D, Chen YH, Yang WG, Zhao W et al (2020) De novo genome assembly of Populus simonii further supports that Populus simonii and Populus trichocarpa belong to different sections. Genes Genomes Genetics 10(2):455–466
Yamaki S, Ohyanagi H, Yamasaki M, Eiguchi M, Miyabayashi T et al (2013) Development of INDEL markers to discriminate all genome types rapidly in the genus Oryza. Breed Sci 63(3):246–254
Yang W, Wang K, Zhang J, Ma J, Liu J et al (2017) The draft genome sequence of a desert tree Populus pruinosa. Gigascience 6(9):1–7
Yin T, DiFazio SP, Gunter LE, Zhang X, Sewell MM et al (2008) Genome structure and emerging evidence of an incipient sex chromosome in Populus. Genome Res 18(3):422–430
Zhang B, Tong CF, Yin TM, Zhang XY, Zhuge QQ et al (2009) Detection of quantitative trait loci influencing growth trajectories of adventitious roots in Populus using functional mapping. Tree Genet Genomes 5:539–552
Zhang JZ, Liu SR, Hu CG (2016) Identifying the genome-wide genetic variation between precocious trifoliate orange and its wild type and developing new markers for genetics research. DNA Res 23(4):403–414
Zhang BY, Zhu WX, Diao S, Wu XJ, Lu JQ et al (2019) The poplar pangenome provides insights into the evolutionary history of the genus. Commun Biol 2:215
Zhang ZY, Chen Y, Zhang JL, Ma XZ, Li YL et al (2020) Improved genome assembly provides new insights into genome evolution in a desert poplar (Populus euphratica). Mol Ecol Resour 20(3):781–794
Zhu JC, Guo YS, Su K, Liu ZD, Ren ZH et al (2018) Construction of a highly saturated Genetic Map for Vitis by Next-generation Restriction Site-associated DNA Sequencing. BMC Plant Biol 18(1):347

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Tropical Plant Biology – Springer Journals
Published: Jun 1, 2022
Keywords: Populus; Restriction-site associated DNA sequencing; InDels; F1 hybrid population; Mendelian segregation