Chromosome Visualization Tool: A Whole Genome Viewer

Hindawi Publishing Corporation
International Journal of Plant Genomics
Volume 2011, Article ID 373875, 4 pages
doi:10.1155/2011/373875

Methodology Report

Ethalinda K. S. Cannon (1) and Steven B. Cannon (2)

(1) Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011, USA
(2) United States Department of Agriculture-Agricultural Research Service, Corn Insects and Crop Genetics Research Unit, Ames, IA 50011, USA

Correspondence should be addressed to Ethalinda K. S. Cannon, ekcannon@iastate.edu

Received 14 September 2011; Accepted 4 November 2011

Academic Editor: Pierre Sourdille

Copyright © 2011 E. K. S. Cannon and S. B. Cannon. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

CViT (chromosome visualization tool) is a Perl utility for quickly generating images of features on a whole genome at once. It reads GFF3-formatted data representing chromosomes (linkage groups or pseudomolecules) and sets of features on those chromosomes. It can display features on any chromosomal unit system, including genetic (centimorgan), cytological (centiMcClintock), and DNA unit (base-pair) coordinates. CViT has been used to track sequencing progress (status of genome sequencing, location and number of gaps), to visualize BLAST hits on a whole genome view, to associate maps with one another, to locate regions of repeat densities, to display syntenic regions, and to visualize centromeres and knobs on chromosomes.

1. Introduction

Visualizing features on a whole genome (all chromosomes together) can be informative for many reasons: for identifying genome-wide patterns such as gene or repeat densities, for viewing internal duplications or synteny, for assessing clustering of genes or repeats or other features, for comparing chromosomal structures such as centromeres and pericentromeric regions, or for looking for associations between different types of genomic features. Several very capable genome browsers enable visualization of single chromosomes or regions, but few visualization tools have been developed for whole-genome-at-a-time views. We present CViT (chromosome visualization tool), for viewing a wide range of genomic features on an arbitrary set of linear regions, typically all of the chromosomes or linkage groups for a genome.

CViT is a set of Perl scripts that generate a PNG (portable network graphics) image of features on chromosomes. It can be executed as a standalone Unix command line utility or wrapped in a web page for either static display or as part of an interactive online tool. The characteristics of the output images are highly configurable. A package containing the CViT code itself along with documentation, examples, supporting scripts, and sample web implementations can be freely downloaded from SourceForge at http://sourceforge.net/projects/cvit/.

CViT was initially developed to support the Medicago truncatula sequencing project [1], where it was used to display the assembled bacterial artificial chromosomes (BACs) and the status of the sequencing for each BAC. CViT was also wrapped in web pages to create interactive tools: to display BLAST [2] hits on the whole genome (see http://www.medicagohapmap.org/advanced_search_page.php?seq) and to search where BACs of interest are anchored on the pseudomolecules. For other projects, it has also been used to display genetic maps and contig assemblies from related species [3] and to correlate genetic and cytogenetic maps (http://planthub.gdcb.iastate.edu/lawrencelab/Morgan2McClintock/Version3.0/) [4]. It has been integrated into model organism databases at the Medicago genome sequencing and HapMap projects (http://medicago.org/ and http://medicagohapmap.org/) [5], at MaizeGDB (http://maizegdb.org/) [6], at the PrOject Portal for corn (POPcorn: http://popcorn.maizegdb.org/) [7], and at the legume-family clade database, the Legume Information System (LIS: http://comparative-legumes.org/) [8]. It has also been used to generate analyses and images for publication on genome structure and evolution [9–11].

Several other whole genome viewers exist, but each has a different set of capabilities than CViT. One example of a whole genome viewer is Flash GViewer (http://gmod.org/wiki/Flash_GViewer), which was developed by the GMOD project (including the GBrowse browser). GViewer is entirely web based (being implemented in Flash), whereas CViT can be used either as a standalone command line utility or embedded in a web page. GViewer also uses an XML format for its data, while CViT uses GFF-formatted data to make it compatible with GBrowse and other browsers. Being implemented in Flash, GViewer slows down when hundreds of features are displayed. CViT can usually display several thousand features, as the features are placed on a raster image and then handed off to the web client. GViewer is an interactive tool, while CViT requires additional web programming to be made interactive in a web page; on the other hand, GViewer does not produce images for publication, which is a key use for CViT.

CViT also shares some of the capabilities of another whole genome viewer, the CIRCOS genome data visualizer [12], with the primary exception being the linear chromosome layout in CViT and circular layout in CIRCOS. The circular layout in CIRCOS facilitates display of relationships between chromosomes, through arcs that travel within the circle that is circumscribed by the ring of chromosomes. CViT is also capable of displaying within- or between-genome synteny relationships, as shown in Figure 1, but may have a greater strength in displaying other kinds of features, where the linear chromosome layout allows separation of features across whatever linear scale has been selected. In this respect, CViT bears some resemblance to genome browsers such as GBrowse [13], the Ensembl Genome Browser [14], IGB (the Integrated Genome Browser), Artemis [15], and the UCSC genome browser [16], although those browsers are designed primarily for close, interactive examination of single chromosomes at various scales rather than large scale patterns. CViT may therefore often be useful for providing a genome-wide overview, with the task of close, interactive examination of single chromosomal regions left to one of the genome browsers above. This has been done, for example, in the implementation at LIS, where BLAST hits displayed in CViT link to 100 kb GBrowse windows around the hit in the browser for the corresponding genome.

The GFF3 data format (http://www.sequenceontology.org/gff3.shtml), referred to from here on as just “GFF,” was selected because of its ease of use and effective representation of genetic and genomic information and also because it enables sharing data with GBrowse [13] and other GFF-capable browsers. As much as possible, we used GBrowse data and configuration conventions to enable display of the same data in both CViT and GBrowse.

Add-on scripts are provided in the download package to aid in preparation of GFF files, for example, for conversion from BLAST output to GFF. In addition, some web implementations are provided which can be used as is or modified to fit specific needs or to serve as examples of how CViT could be used as part of a larger online resource.

2. Implementation

CViT consists of a package of Perl scripts along with a set of add-on scripts and some basic web implementations written in PHP. It requires libgd (http://bitbucket.org/pierrejoye/gd-libgd) and the GD Perl packages GD and GD::Arrow (http://search.cpan.org/dist/GD/). It has been tested on several Linux, Unix, and Apple OS X platforms and is expected to be operational on any Unix variant that can run Perl and libgd.

The input data, in GFF format, must at a minimum contain information about at least one chromosome. All features are related to the chromosome(s) by the value in the first (seqid) column. The coordinates of each feature must lie within the start and end coordinates of its chromosome. Features can be named with the “name” attribute in the last (attribute) column, with the name optionally displayed on the image, and grouped together with the “class” attribute, with each class of features displayed in a different color.

The term “chromosome” here refers to the backbone used to display features. It could in fact be a linkage group, pseudomolecule, BAC, contig, gene, or any stretch of DNA or genetic sequence upon which features can be placed. Similarly, the coordinate system can be based on any unit of measure, such as base pairs, centimorgans, centiMcClintocks [4], or microns.

A “feature” can be just about anything that can be associated with “chromosome” coordinates, for example, centromeres, markers, BACs, BLAST hits, repetitive elements, or gene loci. Feature densities (such as for genes or repetitive elements) can also be displayed using the histogram glyph.

CViT output includes three files: a PNG image displaying the chromosomes and features, a legend image describing the feature glyphs, and a file of feature names and coordinates where they are located on the image. The coordinate file can be used to create interactive web pages, for example, to create an HTML image map to enable clicking on features to get more information about them or to link out to other online resources.

Manipulation of the output image is enabled by a simple but extensive configuration file. This file enables control of almost all aspects of the output image without touching the code, including selecting fonts for labels. Two freely available TrueType fonts are included in the download package and any TrueType font can be added. Colors, transparency, sizes of the glyph, location of glyphs relative to the chromosome, location of their text labels, and the appearance of the chromosomes themselves and their spacing are all under configuration control.

Add-on scripts include blastp_to_gff.pl, which generates a GFF file for CViT if provided a GFF file of peptides and either a tabular BLAST output file or a two-column hash of query IDs and peptide IDs, and clusterHSP.pl, which collapses adjacent BLAST HSPs that occur within a sliding window (often appropriate for a peptide or cDNA query against pseudomolecule sequences). The web implementations provided with the package include a simple “CViT-BLAST” implementation which can be used as is or modified to fit specific needs. There is also a web utility named “CViT-web” which provides a web interface for generating CViT images. For some users this may be easier than modifying the configuration file.
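The input rules described in the Implementation section (at least one chromosome record, features tied to a chromosome through the seqid column, feature coordinates inside the chromosome bounds, and optional “name” and “class” attributes) can be illustrated with a short pre-flight check. The following is a minimal sketch in Python, not part of the CViT package, which is written in Perl. It assumes that backbone records are marked by the value "chromosome" in the GFF type column, that attribute keys are matched case-insensitively, and that a feature without a “class” attribute falls back to its type; the function names and the sample records are hypothetical.

```python
from collections import defaultdict


def parse_attributes(column):
    """Split a GFF attribute column ('key=value;key=value') into a dict with lowercase keys."""
    attrs = {}
    for pair in column.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip().lower()] = value.strip()
    return attrs


def read_gff(lines, backbone_type="chromosome"):
    """Return (chromosomes, features) from tab-delimited GFF lines.

    Assumption: rows whose type column equals `backbone_type` define the backbones.
    Coordinates are read as floats so that non-base-pair units (e.g. centimorgans) also work.
    """
    chromosomes = {}
    features = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 tab-delimited columns, got {len(cols)}: {line!r}")
        seqid, ftype = cols[0], cols[2]
        start, end = float(cols[3]), float(cols[4])
        attrs = parse_attributes(cols[8])
        if ftype == backbone_type:
            chromosomes[seqid] = (start, end)
        else:
            features.append({
                "seqid": seqid,
                "start": start,
                "end": end,
                "name": attrs.get("name", ""),
                "class": attrs.get("class", ftype),  # fallback is an assumption
            })
    return chromosomes, features


def check_features(chromosomes, features):
    """Group valid feature names by class and collect messages for invalid features."""
    if not chromosomes:
        raise ValueError("the input must describe at least one chromosome")
    by_class = defaultdict(list)
    problems = []
    for feat in features:
        bounds = chromosomes.get(feat["seqid"])
        if bounds is None:
            problems.append(f"{feat['name']}: unknown chromosome {feat['seqid']}")
        elif not (bounds[0] <= feat["start"] <= feat["end"] <= bounds[1]):
            problems.append(f"{feat['name']}: coordinates fall outside {feat['seqid']}")
        else:
            by_class[feat["class"]].append(feat["name"])
    return dict(by_class), problems


# Hypothetical sample records; the last marker extends past the end of Chr1.
SAMPLE = [
    "##gff-version 3",
    "Chr1\tdemo\tchromosome\t1\t1000\t.\t.\t.\tID=Chr1",
    "Chr1\tdemo\tcentromere\t400\t450\t.\t.\t.\tName=cen1;class=centromere",
    "Chr1\tdemo\tmarker\t120\t120\t.\t.\t.\tName=m1;class=marker",
    "Chr1\tdemo\tmarker\t990\t1010\t.\t.\t.\tName=m2;class=marker",
]

if __name__ == "__main__":
    chroms, feats = read_gff(SAMPLE)
    groups, problems = check_features(chroms, feats)
    print(groups)
    print(problems)
```

Running a check of this kind before drawing would flag features such as the hypothetical marker m2, whose end coordinate exceeds the length of its chromosome. The grouping by class mirrors the way each class of features is drawn in its own color.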
[Figure 1 image: soybean chromosomes Gm01 to Gm20 drawn on a base-pair (bp) scale.]

Figure 1: Duplicated segments within the soybean (Glycine max) genome. Colored blocks to the left of each chromosome show regions of correspondence with chromosomes of the same color. For example, the light blue blocks at the top of Gm09 correspond with regions on the light blue Gm15, and vice versa. These correspondences are remnants after the Glycine genome duplication. Locations of centromeric repeats are shown as black rectangles over the chromosomes. Regions lacking internal correspondences (generally near chromosome centers) mark the approximate locations of the gene-poor pericentromeres. This figure is modified from the Legume Information System, where sequence-based searches can be made against the Glycine max, Medicago truncatula, and Lotus japonicus genomes, with CViT images displaying the sequence homologies and the synteny relationships among these genomes.

[Figure 2 image: maize chromosomes Chr1 to Chr10 drawn on a base-pair (bp) scale.]

Figure 2: Gene density on the 10 chromosomes of Zea mays. Gene density is shown on the Zea mays inbred line B73 RefGen v2 genome assembly [17]. Probable locations of the centromeres are displayed as black bars positioned over the chromosomes [18]. The density of the filtered gene set gene calls is displayed as green bars to the right of the chromosomes, with bar length indicating the number of genes per 400 kbp.

3. Biological Examples

Examples of two online instances of CViT are the soybean, Medicago truncatula, and Lotus japonicus genomes at LIS (http://comparative-legumes.org/) and the maize genome at MaizeGDB (http://maizegdb.org/).

The display of the soybean genome (Figure 1) illustrates the use of CViT for showing internal synteny (chromosomal correspondences) from a whole genome duplication that is estimated to have occurred in the Glycine genus between ~5 and 13 Mya [19, 20]. The implementation at LIS also shows correspondences between the three sequenced genomes (in addition to the correspondences within the duplicated soybean genome). The LIS implementation also allows a sequence search with a multi-FASTA file. Sequence matches (BLAST hits) are color coded and link out to the genome browser for the target genome, with browser views centered around each hit.

The display of the maize genome (Figure 2) illustrates the use of CViT for showing gene density, based on gene models generated as part of the Zea mays inbred B73 genome sequencing project [17]. The MaizeGDB use of CViT also includes a BLAST utility that displays color-coded hits on the reference genome with links to GBrowse for closer investigation.

Acknowledgments

This work was supported in part by Grants from the National Science Foundation Plant Genome Research Program DBI-0321460 to Nevin Young (for initial development by S. B. Cannon and E. K. S. Cannon), DBI-0743804 to Carolyn Lawrence and DBI-1027527 to Patrick Schnable. The authors thank Carolyn Lawrence for critical reading of the paper, Shelley Wang and Atif Ahmed for contributions to earlier versions of the software, Reka Keleman for integrating CViT in the Morgan2McClintock translator, and Benjamin Mulaosmanovic for contributions to accessory scripts and to the LIS implementation of CViT.

References

[1] N. D. Young, S. B. Cannon, S. Sato et al., “Sequencing the genespaces of Medicago truncatula and Lotus japonicus,” Plant Physiology, vol. 137, no. 4, pp. 1174–1181, 2005.
[2] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
[3] R. W. Innes, C. Ameline-Torregrosa, T. Ashfield et al., “Differential accumulation of retroelements and diversification of NB-LRR disease resistance genes in duplicated regions following polyploidy in the ancestor of soybean,” Plant Physiology, vol. 148, no. 4, pp. 1740–1759, 2008.
[4] C. J. Lawrence, T. E. Seigfried, H. W. Bass, and L. K. Anderson, “Predicting chromosomal locations of genetically mapped loci in maize using the Morgan2McClintock Translator,” Genetics, vol. 172, no. 3, pp. 2007–2009, 2006.
[5] S. B. Cannon, L. Sterck, S. Rombauts et al., “Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 40, pp. 14959–14964, 2006.
[6] C. J. Lawrence, L. C. Harper, M. L. Schaeffer et al., “MaizeGDB: the maize model organism database for basic, translational, and applied research,” International Journal of Plant Genomics, vol. 2008, Article ID 496957, 2008.
[7] E. K. S. Cannon, S. M. Birkett, B. M. Braun et al., “POPcorn: an online resource providing access to distributed and diverse maize project data,” submitted to International Journal of Plant Genomics.
[8] S. B. Cannon, G. D. May, and S. A. Jackson, “Three sequenced legume genomes and many crop species: rich opportunities for translational genomics,” Plant Physiology, vol. 151, no. 3, pp. 970–977, 2009.
[9] C. Ameline-Torregrosa, B. B. Wang, M. S. O’Bleness et al., “Identification and characterization of nucleotide-binding site-leucine-rich repeat genes in the model plant Medicago truncatula,” Plant Physiology, vol. 146, no. 1, pp. 5–21, 2008.
[10] D. J. Bertioli, M. C. Moretzsohn, L. H. Madsen et al., “An analysis of synteny of Arachis with Lotus and Medicago sheds new light on the structure, stability and evolution of legume genomes,” BMC Genomics, vol. 10, article 45, 2009.
[11] S. B. Cannon and R. C. Shoemaker, “Evolutionary and comparative analyses of the soybean genome,” Breeding Science. In press.
[12] M. Krzywinski, J. Schein, I. Birol et al., “Circos: an information aesthetic for comparative genomics,” Genome Research, vol. 19, no. 9, pp. 1639–1645, 2009.
[13] L. D. Stein, C. Mungall, S. Shu et al., “The generic genome browser: a building block for a model organism system database,” Genome Research, vol. 12, no. 10, pp. 1599–1610, 2002.
[14] P. Flicek, B. L. Aken, B. Ballester et al., “Ensembl’s 10th year,” Nucleic Acids Research, vol. 38, supplement 1, pp. D557–D562.
[15] J. W. Nicol, G. A. Helt, S. G. Blanchard Jr., A. Raja, and A. E. Loraine, “The integrated genome browser: free software for distribution and exploration of genome-scale datasets,” Bioinformatics, vol. 25, no. 20, pp. 2730–2731, 2009.
[16] J. Zhu, J. Z. Sanborn, S. Benz et al., “The UCSC cancer genomics browser,” Nature Methods, vol. 6, no. 4, pp. 239–240, 2009.
[17] P. S. Schnable, D. Ware, R. S. Fulton et al., “The B73 maize genome: complexity, diversity, and dynamics,” Science, vol. 326, no. 5956, pp. 1112–1115, 2009.
[18] T. K. Wolfgruber, A. Sharma, K. L. Schneider et al., “Maize centromere structure and evolution: sequence analysis of centromeres 2 and 5 reveals dynamic loci shaped primarily by retrotransposons,” PLoS Genetics, vol. 5, no. 11, Article ID e1000743, 2009.
[19] J. J. Doyle and A. N. Egan, “Dating the origins of polyploidy events,” New Phytologist, vol. 186, no. 1, pp. 73–85, 2010.
[20] J. Schmutz, S. B. Cannon, J. Schlueter et al., “Genome sequence of the palaeopolyploid soybean,” Nature, vol. 463, no. 7278, pp. 178–183, 2010.
Publisher
Hindawi Publishing Corporation
Copyright
Copyright © 2011 Ethalinda K. S. Cannon and Steven B. Cannon. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ISSN
1687-5370
DOI
10.1155/2011/373875
Publisher site
See Article on Publisher Site

Abstract

Hindawi Publishing Corporation International Journal of Plant Genomics Volume 2011, Article ID 373875, 4 pages doi:10.1155/2011/373875 Methodology Report 1 2 Ethalinda K. S. Cannon and Steven B. Cannon Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, IA 50011, USA United States Department of Agriculture-Agricultural Research Service, Corn Insects and Crop Genetics Research Unit, Ames, IA 50011, USA Correspondence should be addressed to Ethalinda K. S. Cannon, ekcannon@iastate.edu Received 14 September 2011; Accepted 4 November 2011 Academic Editor: Pierre Sourdille Copyright © 2011 E. K. S. Cannon and S. B. Cannon. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. CViT (chromosome visualization tool) is a Perl utility for quickly generating images of features on a whole genome at once. It reads GFF3-formated data representing chromosomes (linkage groups or pseudomolecules) and sets of features on those chromosomes. It can display features on any chromosomal unit system, including genetic (centimorgan), cytological (centiMcClintock), and DNA unit (base-pair) coordinates. CViT has been used to track sequencing progress (status of genome sequencing, location and number of gaps), to visualize BLAST hits on a whole genome view, to associate maps with one another, to locate regions of repeat densities to display syntenic regions, and to visualize centromeres and knobs on chromosomes. 1. Introduction freely downloaded from SourceForge at http://sourceforge .net/projects/cvit/. 
Visualizing features on a whole genome (all chromosomes CViT was initially developed to support the Medicago together) can be informative for many reasons: for identify- truncatula sequencing project [1] where it was used to dis- ing genome-wide patterns such as gene or repeat densities, play the assembled bacterial artificial chromosomes (BACs) for viewing internal duplications or synteny, for assessing and the status of the sequencing for each BAC. CViT was clustering of genes or repeats or other features, for compar- also wrapped in web pages to create interactive tools: to dis- ing chromosomal structures such as centromeres and peri- centromeric regions, or for looking for associations between play BLAST [2] hits on the whole genome (see http://www different types of genomic features. Several very capable ge- .medicagohapmap.org/advanced search page.php?seq)and nome browsers enable visualization of single chromosomes to search where BACs of interest are anchored on the pseu- or regions, but few visualization tools have been developed domolecules. For other projects, it has also been used to dis- for whole-genome-at-a-time views. We present CViT (chro- play genetic maps and contig assemblies from related species [3] and to correlate genetic and cytogenetic maps (http:// mosome visualization tool), for viewing a wide range of ge- nomic features on an arbitrary set of linear regions— planthub.gdcb.iastate.edu/lawrencelab/Morgan2McClintock/ typically, all of the chromosomes or linkage groups for a ge- Version3.0/)[4]. It has been integrated into model organism nome. databases at the Medicago genome sequencing and HapMap CViT is a set of Perl scripts that generate a PNG (portable projects (http://medicago.org/ and http://medicagohapmap network graphics) image of features on chromosomes. 
It can .org/)[5], at MaizeGDB (http://maizegdb.org/)[6], at the be executed as a standalone Unix command line utility or PrOject Portal for corn (POPcorn: http://popcorn wrapped in a web page for either static display or as part of .maizegdb.org/)[7], and at the legume-family clade database, the Legume Information System (LIS: http://comparative- an interactive online tool. The characteristics of the output images are highly configurable. A package containing the legumes.org/)[8]. It has also been used to generate analyses CViT code itself along with documentation, examples, sup- and images for publication on genome structure and evolu- tion [9–11]. porting scripts, and sample web implementations can be 2 International Journal of Plant Genomics Several other whole genome viewers exist, but each has a in PHP. It requires libgd (http://bitbucket.org/pierrejoye/gd- different set of capabilities than CViT. One example of a libgd) and the GD Perl packages GD and GD::Arrow whole genome viewer is Flash GViewer (http://gmod.org/ (http://search.cpan.org/dist/GD/). It has been tested on sev- wiki/Flash GViewer), which was developed by the GMOD eral Linux, Unix, and Apple OS X platforms and is expected project (including the GBrowse browser). GViewer is entirely to be operational on any Unix variant that can run Perl and web based (being implemented in Flash), whereas CViT can libgd. be used either as a standalone command line utility or em- The input data, in GFF format, must at a minimum con- bedded in a web page. GViewer also uses an XML format tain information about at least one chromosome. All features for its data, while CViT uses GFF-formated data to make it are related to the chromosome(s) by the value in the first compatible with GBrowse and other browsers. Being imple- (seqid) column. The coordinates of each feature must lie mented in Flash, GViewer slows down when many (100 s) within the start and end coordinates of its chromosome. Fea- of features are displayed. 
CViT can usually display several tures can be named with the “name” attribute in the last (at- thousand features, as the features are placed on a raster image tribute) column, with the name optionally displayed on the and then handed off to the web client. GViewer is an interac- image, and grouped together with the “class” attribute, with tive tool, while CViT requires additional web programming each class of features displayed in a different color. to be made interactive in a web page; on the other hand, The term “chromosome” here refers to the backbone GViewer does not produce images for publication, which is used to display features. It could in fact be a linkage group, a key use for CViT. CViT also shares some of the capabilities pseudomolecule, BAC, contig, gene, or any stretch of DNA or of another whole genome viewer, the CIRCOS genome data genetic sequence upon which features can be placed. Similar- visualizer [12], with the primary exception being the linear ly, the coordinate system can be based on any unit of mea- chromosome layout in CViT and circular layout in CIRCOS. The circular layout in CIRCOS facilitates display of relation- sure, such as base pairs, centimorgans, centiMcClintocks [4], ships between chromosomes, through arcs that travel within or microns. the circle that is circumscribed by the ring of chromosomes. A “feature” can be just about anything that can be asso- CViT is also capable of displaying within- or between- ciated with “chromosome” coordinates—for example, cen- genome synteny relationships, as shown in Figure 1,but may tromeres, markers, BACs, BLAST hits, repetitive elements, or have a greater strength in displaying other kinds of features, gene loci. Feature densities (such as for genes or repetitive where the linear chromosome layout allows separation of elements) can also be displayed using the histogram glyph. features across whatever linear scale has been selected. 
In this CViT output includes three files: a PNG image displaying respect, CViT bears some resemblance to genome browsers the chromosomes and features, a legend image describing the such as GBrowse [13], the Ensembl Genome Browser [14], feature glyphs, and a file of feature names and coordinates IGB (the Integrated Genome Browser), Artemis [15], and where they are located on the image. The coordinate file can the UCSC genome browser [16]—although those browsers be used to create interactive web pages—for example, to are designed primarily for close, interactive examination of create an HTML image map to enable clicking on features single chromosomes at various scales rather than large scale to getmoreinformation aboutthemortolink outtoother patterns. CViT may therefore often be useful for providing online resources. a genome-wide overview, with the task of close, interactive Manipulation of the output image is enabled by a simple examination of single chromosomal regions left to one of the but extensive configuration file. This file enables control of genome browsers above. This has been done, for example, in almost all aspects of the output image without touching the the implementation at LIS, where BLAST hits displayed in code, including selecting fonts for labels. Two freely available CViT link to 100 kb GBrowse windows around the hit in the True Type fonts are included in the download package and browser for the corresponding genome. any True Type Font can be added. 
Colors, transparency, sizes The GFF3 data format (http://www.sequenceontology .org/gff3.shtml), referred to from here on as just “GFF,” was of the glyph, location of glyphs relative to the chromosome, selected because of its ease of use and effective representation location of their text labels, and the appearance of the chro- mosomes themselves and their spacing are all under config- of genetic and genomic information and also because it ena- bles sharing data with GBrowse [13] and other GFF-capable uration control. browsers. As much as possible, we used GBrowse data and Add-on scripts include blastp to gff.pl, which generates a configuration conventions to enable display of the same data GFF file for CViT if provided a GFF file of peptides and either in both CViT and GBrowse. a tabular BLAST output file or a two-column hash of query Add-on scripts are provided in the download package to IDs and peptide IDs, and clusterHSP.pl, which collapses adja- aid in preparation of GFF files—for example, for conversion cent BLAST HSPs that occur within a sliding window (often from BLAST output to GFF. In addition, some web imple- appropriate for a peptide or cDNA query against pseudo- mentations are provided which can be used as is or modified molecule sequences). The web implementations provided to fit specific needs or to serve as examples of how CViT with the package include a simple “CViT-BLAST” imple- could be used as part of a larger online resource. mentation which can be used as is or modified to fit specific needs. There is also a web utility named “CViT-web” which 2. Implementation provides a web interface for generating CViT images. For some users this may be easier than modifying the configu- CViT consists of a package of Perl scripts along with a set of add-on scripts and some basic web implementations written ration file. 
Figure 1: Duplicated segments within the soybean (Glycine max) genome. Colored blocks to the left of each chromosome show regions of correspondence with chromosomes of the same color. For example, the light blue blocks at the top of Gm09 correspond with regions on the light blue Gm15, and vice versa. These correspondences are remnants after the Glycine genome duplication. Locations of centromeric repeats are shown as black rectangles over the chromosomes. Regions lacking internal correspondences (generally near chromosome centers) mark the approximate locations of the gene-poor pericentromeres. This figure is modified from the Legume Information System, where sequence-based searches can be made against the Glycine max, Medicago truncatula, and Lotus japonicus genomes, with CViT images displaying the sequence homologies and the synteny relationships among these genomes.

Figure 2: Gene density on the 10 chromosomes of Zea mays. Gene density is shown on the Zea mays inbred line B73 RefGen v2 genome assembly [17]. Probable locations of the centromeres are displayed as black bars positioned over the chromosomes [18]. The density of the filtered gene set gene calls is displayed as green bars to the right of the chromosomes, with bar length indicating the number of genes per 400 kbp.

3. Biological Examples

Examples of two online instances of CViT are the soybean, Medicago truncatula, and Lotus japonicus genomes at LIS (http://comparative-legumes.org/) and the maize genome at MaizeGDB (http://maizegdb.org/).

The display of the soybean genome (Figure 1) illustrates the use of CViT for showing internal synteny (chromosomal correspondences) from a whole genome duplication that is estimated to have occurred in the Glycine genus between ∼5 and 13 Mya [19, 20]. The implementation at LIS also shows correspondences between the three sequenced genomes (in addition to the correspondences within the duplicated soybean genome). The LIS implementation also allows a sequence search with a multi-FASTA file. Sequence matches (BLAST hits) are color coded and link out to the genome browser for the target genome, with browser views centered around each hit.

The display of the maize genome (Figure 2) illustrates the use of CViT for showing gene density, based on gene models generated as part of the Zea mays inbred B73 genome sequencing project [17]. The MaizeGDB use of CViT also includes a BLAST utility that displays color-coded hits on the reference genome with links to GBrowse for closer investigation.

Acknowledgments

This work was supported in part by Grants from the National Science Foundation Plant Genome Research Program DBI-0321460 to Nevin Young (for initial development by S. B. Cannon and E. K. S. Cannon), DBI-0743804 to Carolyn Lawrence and DBI-1027527 to Patrick Schnable. The authors thank Carolyn Lawrence for critical reading of the paper, Shelley Wang and Atif Ahmed for contributions to earlier versions of the software, Reka Keleman for integrating CViT in the Morgan2McClintock translator, and Benjamin Mulaosmanovic for contributions to accessory scripts and to the LIS implementation of CViT.

References

[1] N. D. Young, S. B. Cannon, S. Sato et al., “Sequencing the genespaces of Medicago truncatula and Lotus japonicus,” Plant Physiology, vol. 137, no. 4, pp. 1174–1181, 2005.
[2] S. F. Altschul, T. L. Madden, A. A. Schaffer et al., “Gapped BLAST and PSI-BLAST: a new generation of protein database search programs,” Nucleic Acids Research, vol. 25, no. 17, pp. 3389–3402, 1997.
[3] R. W. Innes, C. Ameline-Torregrosa, T. Ashfield et al., “Differential accumulation of retroelements and diversification of NB-LRR disease resistance genes in duplicated regions following polyploidy in the ancestor of soybean,” Plant Physiology, vol. 148, no. 4, pp. 1740–1759, 2008.
[4] C. J. Lawrence, T. E. Seigfried, H. W. Bass, and L. K. Anderson, “Predicting chromosomal locations of genetically mapped loci in maize using the Morgan2McClintock Translator,” Genetics, vol. 172, no. 3, pp. 2007–2009, 2006.
[5] S. B. Cannon, L. Sterck, S. Rombauts et al., “Legume genome evolution viewed through the Medicago truncatula and Lotus japonicus genomes,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 40, pp. 14959–14964, 2006.
[6] C. J. Lawrence, L. C. Harper, M. L. Schaeffer et al., “MaizeGDB: the maize model organism database for basic, translational, and applied research,” International Journal of Plant Genomics, vol. 2008, Article ID 496957, 2008.
[7] E. K. S. Cannon, S. M. Birkett, B. M. Braun et al., “POPcorn: an online resource providing access to distributed and diverse maize project data,” submitted to International Journal of Plant Genomics.
[8] S. B. Cannon, G. D. May, and S. A. Jackson, “Three sequenced legume genomes and many crop species: rich opportunities for translational genomics,” Plant Physiology, vol. 151, no. 3, pp. 970–977, 2009.
[9] C. Ameline-Torregrosa, B. B. Wang, M. S. O’Bleness et al., “Identification and characterization of nucleotide-binding site-leucine-rich repeat genes in the model plant Medicago truncatula,” Plant Physiology, vol. 146, no. 1, pp. 5–21, 2008.
[10] D. J. Bertioli, M. C. Moretzsohn, L. H. Madsen et al., “An analysis of synteny of Arachis with Lotus and Medicago sheds new light on the structure, stability and evolution of legume genomes,” BMC Genomics, vol. 10, article 45, 2009.
[11] S. B. Cannon and R. C. Shoemaker, “Evolutionary and comparative analyses of the soybean genome,” Breeding Science. In press.
[12] M. Krzywinski, J. Schein, I. Birol et al., “Circos: an information aesthetic for comparative genomics,” Genome Research, vol. 19, no. 9, pp. 1639–1645, 2009.
[13] L. D. Stein, C. Mungall, S. Shu et al., “The generic genome browser: a building block for a model organism system database,” Genome Research, vol. 12, no. 10, pp. 1599–1610, 2002.
[14] P. Flicek, B. L. Aken, B. Ballester et al., “Ensembl’s 10th year,” Nucleic Acids Research, vol. 38, supplement 1, pp. D557–D562.
[15] J. W. Nicol, G. A. Helt, S. G. Blanchard Jr., A. Raja, and A. E. Loraine, “The integrated genome browser: free software for distribution and exploration of genome-scale datasets,” Bioinformatics, vol. 25, no. 20, pp. 2730–2731, 2009.
[16] J. Zhu, J. Z. Sanborn, S. Benz et al., “The UCSC cancer genomics browser,” Nature Methods, vol. 6, no. 4, pp. 239–240, 2009.
[17] P. S. Schnable, D. Ware, R. S. Fulton et al., “The B73 maize genome: complexity, diversity, and dynamics,” Science, vol. 326, no. 5956, pp. 1112–1115, 2009.
[18] T. K. Wolfgruber, A. Sharma, K. L. Schneider et al., “Maize centromere structure and evolution: sequence analysis of centromeres 2 and 5 reveals dynamic loci shaped primarily by retrotransposons,” PLoS Genetics, vol. 5, no. 11, Article ID e1000743, 2009.
[19] J. J. Doyle and A. N. Egan, “Dating the origins of polyploidy events,” New Phytologist, vol. 186, no. 1, pp. 73–85, 2010.
[20] J. Schmutz, S. B. Cannon, J. Schlueter et al., “Genome sequence of the palaeopolyploid soybean,” Nature, vol. 463, no. 7278, pp. 178–183, 2010.

Journal

International Journal of Plant Genomics, Hindawi Publishing Corporation

Published: Dec 19, 2011

References