Modified S-transform as a tool to identify secondary structure elements in RNA

Introduction

Signal processing is used across many disciplines to process information carried by signals and to extract what is hidden in them. Signal processing algorithms are based on mathematical concepts that detect these signals [1], which may be represented in different physical or symbolic formats. Signal processing is applicable in any area of engineering and science; areas in which it has been applied successfully include bioinformatics, genomics, proteomics and forensic sciences. Genomic signal processing is a specific application, defined as the analysis and processing of genomic signals for gaining biological knowledge and the translation of that knowledge into systems-based applications. In the case of DNA, different types of periodicities have been reported in the literature [2], and these are summarized in Table 1.

Table 1: List of periodicities identified in the DNA.

Periodicity | Biological structure
3 | Exonic regions repeats
5–6 | Telomeric/subtelomeric repeats
10–11 | DNA bendability (helical repeat structure)
48–50 | Centromeric repeats
68 | Beta-satellite DNA
102 | Nucleosomal structure in eukaryotes
105–106 | Isochores (low G+C content)
~135 | Dimeric Alu repeats structure
~165 | A-rich homopolymeric DNA sequence in Alu repeats
~171 | α Satellite DNA
~300 | Alu repeats
~680 | DNA bend sites

For example, periodicity 3 is related to exonic repeats, periodicities 5 and 6 are related to the telomeric/subtelomeric repeats, and periodicity 10 is related to DNA bendability.

MicroRNAs (miRNAs) are small, single-stranded non-coding RNAs about 16–22 nucleotides in length, which play an important role in gene regulation by targeting specific messenger RNAs (mRNAs) for cleavage or translational repression [3].
miRNAs are also involved in many important biological processes, affecting the stability and translation of mRNAs and negatively regulating gene expression in post-transcriptional processes [3, 4]. Abnormal expression of miRNAs has been associated with various diseases, including cancer, making them interesting therapeutic targets [5]. Upregulation and/or downregulation of miRNAs relative to their normal expression occurs in various diseases [6]. To design small molecules that can modulate miRNA function, identification of their secondary structural elements is important [4]. In this article, a Stockwell-transform-based algorithm, the modified S-transform (MST), has been applied to detect the periodicity present in miRNA, and the identified periodicities have been correlated with the miRNA secondary structures.

RNA secondary structure

RNA is a ribonucleic acid polymer made of the nucleotides (bases) A, C, G and U, where uracil (U) is chemically similar to thymine (T) in DNA. RNA is generally a single-stranded molecule [6]. The nucleotides A/U and C/G can form hydrogen bonds and are commonly known as complementary base pairs. Sometimes, the bases G and U can also form pairs [6]. If a given RNA sequence contains complementary segments, these segments can form consecutive base pairs that help the RNA to fold onto itself. Such folding results in a two-dimensional structure known as the RNA secondary structure, which further folds onto itself to form a tertiary structure [6].

Identification of the secondary structures is crucial for any RNA-based study, as it gives insight into the function of the RNA. The different secondary structures are:

Stem-loop

The stem-loop is an essential part of RNA secondary structure.
It can guide RNA folding, determine interactions in a ribozyme, protect mRNA from degradation, serve as a recognition motif for RNA binding proteins or act as a substrate for enzymatic reactions [7] (Figure 1A).

Figure 1: Secondary structure elements of RNA. (A) A schematic representation of RNA with secondary structure elements highlighted. (B) Schematic representation of a pseudoknot in RNA. Adapted from [8].

Bulges and internal loops

Bulges and internal loops form when two double-helical tracts are separated on either one strand (bulge) or both strands (internal loop) by one or more unpaired nucleotides. Internal loops containing equal numbers of bases on each strand are symmetric, whereas they are asymmetric when the number of bases is different. The presence of an internal loop or bulge reduces thermodynamic stability compared to a perfect double helix, but unpaired nucleotides are more readily accessible to protein or nucleic acid ligands, which often recognize such sites [7] (Figure 1A).

Pseudoknots

When complementary sequences consisting of a hairpin or internal loop and a single-stranded region interact with each other by Watson-Crick base pairing, a pseudoknot is formed. There can be two alternative hairpin formations when a pseudoknot forms between a hairpin loop and a complementary single-stranded region. The formation of a pseudoknot creates an extended helical region through helical stacking of the hairpin double-helical stem and the newly formed loop-loop interaction helix [8, 9] (Figure 1B).

The RNA secondary structure is essentially governed by the base pairing of nucleotides. Different computational methods have been proposed to determine the ‘optimal base pairing’ of RNA in an efficient manner. Such algorithms are typically called RNA folding algorithms [8, 9]. However, little work has applied signal processing methods to RNA [10].
Although many RNA secondary structure prediction algorithms are available, this article presents a transform-based method for detecting secondary structure. One such method that can be used in the detection of RNA secondary structure is the MST.

Materials and methods

S-transform

The S-transform of a signal x(t) is defined as [11]

$$S(t,f) = \int_{-\infty}^{\infty} x(\tau)\, G(t-\tau)\, e^{-j2\pi f\tau}\, d\tau, \tag{1}$$

where

$$G(t-\tau) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(t-\tau)^2/2\sigma^2}$$

is the Gaussian window function, σ is the standard deviation, τ is the time, t is the central position of the window and f is the frequency. The advantage of the S-transform over other signal processing methods is that the standard deviation σ is a function of frequency, $\sigma = \frac{1}{f}$, so the length of the Gaussian window varies with the frequency. Substituting σ = 1/f, the equation above becomes

$$S(t,f) = \int_{-\infty}^{\infty} x(\tau)\, \frac{f}{\sqrt{2\pi}}\, e^{-(t-\tau)^2 f^2/2}\, e^{-j2\pi f\tau}\, d\tau. \tag{2}$$

The S-transform can therefore be viewed as a short-time Fourier transform with a variable window. However, the S-transform has some limitations. Specifically, it provides progressive time-frequency resolution but suffers from poor energy concentration in the time-frequency plane, because the width of the Gaussian window is inversely proportional to frequency.
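As a small worked illustration of how the window scales, the relation σ = 1/f can be evaluated for a few periods. This sketch assumes that frequency is measured in cycles per nucleotide, so that the period is p = 1/f; the full width at half maximum (FWHM) of a Gaussian is a standard result and is not stated in the article.

```latex
\sigma(f) = \frac{1}{f} = p, \qquad
\mathrm{FWHM} = 2\sqrt{2\ln 2}\,\sigma \approx 2.355\,p
\\[4pt]
p = 3:\ \ \sigma = 3,\ \ \mathrm{FWHM} \approx 7.1\ \text{nucleotides}
\\
p = 10:\ \ \sigma = 10,\ \ \mathrm{FWHM} \approx 23.5\ \text{nucleotides}
```

Under these assumptions the window spans only a few repeats of whatever period is being analyzed, which is the fixed trade-off between time and frequency resolution that a control parameter on the window length is meant to adjust.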
In the next section, the energy concentration in the time-frequency plane is improved by selecting an appropriate control parameter γ for the length of the Gaussian window, which reduces spectral leakage.

Modified S-transform

The MST is a modified version of the S-transform [12] in which a parameter γ controls the window length at each frequency in order to improve both the time and the frequency resolution. In the MST, the window length is small for higher frequencies and large for lower frequencies [12]. Therefore, the MST can capture all the frequencies. To obtain an appropriate window length for detecting each period (period = 1/frequency), a frequency-dependent control parameter γ was introduced and chosen using simulation studies [10]:

$$\gamma(f) = 15f + 0.8, \tag{3}$$

where γ is the frequency-dependent control parameter for the window length. The MST is then defined as

$$S(t,f) = \int_{-\infty}^{\infty} x(\tau)\, \frac{f}{\gamma(f)\sqrt{2\pi}}\, e^{-(t-\tau)^2 f^2/2(\gamma(f))^2}\, e^{-j2\pi f\tau}\, d\tau. \tag{4}$$

High confidence data

The miRNA sequences used in the study were obtained from the miRBase v21 database [13]. These sequences, totaling 1996, are labeled as ‘high confidence’, as they are of high quality and were obtained from RNA deep sequencing experiments. The secondary structures of these high-confidence sequences were predicted with the Mfold server [14, 15]. The core algorithm predicts a minimum free energy for folding, as well as minimum free energies for foldings that must contain any particular base pair. Optimal and sub-optimal secondary structures are thus predicted on the basis of the minimum free energy [15].

In this study, the MST algorithm was applied to detect periodicities between 2 and 11 in miRNA sequences, and the predicted secondary structures were analyzed.
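The definition in Eqs. (3) and (4) can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the article does not say how a nucleotide sequence is mapped to a numerical signal, so binary indicator signals for A, C, G and U (a common choice in genomic signal processing) are assumed here, the mean of each indicator is removed, and the integral is replaced by a direct sum over the finite sequence. All function names are hypothetical.

```python
import cmath
import math


def gamma(f):
    """Frequency-dependent control parameter from Eq. (3)."""
    return 15.0 * f + 0.8


def indicator(seq, base):
    """Binary indicator signal: 1.0 where `base` occurs (assumed mapping)."""
    return [1.0 if ch == base else 0.0 for ch in seq.upper()]


def mst_power(x, period):
    """|S(t, f)|^2 at f = 1/period for every position t.

    Direct summation of Eq. (4) over the finite signal `x`.
    """
    n = len(x)
    f = 1.0 / period
    sigma = gamma(f) / f  # effective standard deviation of the window
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # equals f / (gamma * sqrt(2 pi))
    # x(tau) * exp(-j 2 pi f tau), computed once per frequency
    carrier = [x[tau] * cmath.exp(-2j * math.pi * f * tau) for tau in range(n)]
    power = []
    for t in range(n):
        s = sum(
            norm * math.exp(-((t - tau) ** 2) / (2.0 * sigma ** 2)) * carrier[tau]
            for tau in range(n)
        )
        power.append(abs(s) ** 2)
    return power


def periodicity_map(seq, periods=range(2, 12)):
    """Sum the MST power over the four base indicators for each period.

    Returns a dict {period: [power at each position]}.
    """
    result = {}
    for p in periods:
        total = [0.0] * len(seq)
        for base in "ACGU":
            x = indicator(seq, base)
            mean = sum(x) / len(x)
            x = [v - mean for v in x]  # remove the DC offset (assumption)
            for t, v in enumerate(mst_power(x, p)):
                total[t] += v
        result[p] = total
    return result


def dominant_period(pmap, position):
    """Period with the largest power at one nucleotide position."""
    return max(pmap, key=lambda p: pmap[p][position])


if __name__ == "__main__":
    toy = "AAAAACCCCC" * 8  # synthetic 80-nt repeat, not a real miRNA
    pmap = periodicity_map(toy)
    print(dominant_period(pmap, len(toy) // 2))
```

The toy input is a synthetic repeat with a 10-nucleotide unit, chosen only to exercise the functions. The dictionary returned by `periodicity_map` has one row per period and one value per position, which corresponds to the layout of a time-periodicity plot; a real analysis would substitute the authors' numerical mapping and thresholds, which are not given in the text.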
The methodology is shown in Figure 2. Specifically, we checked for any sequences that exhibited periodicities 10 and 11, as the number of nucleotides in one turn of B-DNA is 10.5 [16]. These selected sequences were analyzed in detail, and the results are explained in the next section.

Figure 2: Flowchart of methodology for identification of RNA secondary structures.

Results and discussion

Periodicities 10 and 11 have previously been reported to be related to helical function in DNA [17], because the average length of a helical turn in B-DNA is about 10.5 nucleotides [16]. Extrapolating from this, we hypothesized that periodicities 10 and 11 may also be associated with helix formation in RNA.

A total of 1996 high-confidence sequences were obtained from the miRBase database. The MST-based algorithm was run on these high-confidence/high-quality miRNA sequences for periodicities from 2 to 11, and the time-periodicity plots were obtained. The periodicities found in this dataset are summarized in a histogram (Figure 3).

Figure 3: Histogram indicating the periodicity identified in the 1996 sequences. The number at the top of each bar indicates the number of sequences that exhibit the particular periodicity.

There were 166 sequences that showed a periodicity of 10 and 160 sequences that showed a periodicity of 11, along with other periodicities (2–9). To check whether periodicities 10 and 11 are indeed involved in helix formation (i.e. secondary structure formation of the stem region) in RNA, we selected only the sequences in which periodicities 10 and 11 dominate. Seventeen sequences showed only periodicities 10 and/or 11, and their time-periodicity plots are shown in Figure 4.

Figure 4: Time-periodicity plot for 17 sequences. Red and dark blue, high- and low-power spectrum densities, respectively.
Vertical axis (bottom to top), periodicities 2–11; horizontal axis, position of the nucleotides.

The time-periodicity plot of the 17 sequences (Figure 4) shows that periodicities 10 and 11 predominate in these sequences, because a high-power spectrum density is associated with the dominant periodicity in a sequence. In 10 sequences (MI0000136, MI0000140, MI0000388, MI0000393, MI0000650, MI0000694, MI0003135, MI0010522, MI0010523 and MI0017546), the high-power spectrum density extends over the entire sequence length, whereas in the remaining seven sequences it is localized either at the center or at the ends of the sequence.

To verify the type of secondary structures present in these 17 sequences, Mfold was run to predict the optimal secondary structures. The Mfold results for these 17 sequences are shown in Figure 5. The Mfold results for the remaining 1979 sequences and the corresponding periodicities obtained from the MST are tabulated in the Supplementary Material.

Figure 5: Mfold results for sequences containing only periodicities 10 and 11.

The Mfold results show that sequences with only periodicities 10 and 11 predominantly form stem-type secondary structures. These stem regions are helical in nature, and it is highly likely that the MST algorithm identified the periodicity associated with helix formation in RNA.

For periodicities 10 and 11, consensus sequences were identified, which supports the applicability of the MST to RNA sequences, in this case miRNA sequences.
The repeating patterns of periodicities 10 and 11 for the 17 selected sequences are given in Table 2.

Table 2: List of consensus sequences for periodicities 10 and 11.

miRNA | Periodicity 10 | Periodicity 11
MI0000045 | − | ACAUCCGUCXU
MI0000136 | UCACUGUGCU | −
MI0000140 | CUAUAUAGUU | CUGAAUUAXUA
MI0000388 | − | GUXCUCGUUCU
MI0000393 | AAUAXUUCUG | UGUGCAXCGAC
MI0000650 | GCCUGGGAGU | CUGAXGXCUGU
MI0000694 | UCCUGGGUAU | UXGXGCGGUGG
MI0003135 | − | UAXUAUAUCAX
MI0010522 | UAUXCUUUUA | −
MI0010523 | UAUXAGUUGU | −
MI0013585 | − | AGXUAUAUCAX
MI0016284 | − | UUGAACUUGAG
MI0016286 | UXAUAAACUU | −
MI0017546 | UAUUAAAXUU | −
MI0017561 | − | AUUAUUAUUAG
MI0017735 | CCXAUGCGCG | UXAUCUXGCGG
MI0022215 | ACUGGGGGAX | AAXAGCAGUCG

The results shown in Figures 4 and 5 indicate that sequences having periodicities 10 and 11 are associated with the ‘stem’ secondary structure. To confirm this, we also located the repeated patterns of periodicities 10 and 11 using the MST-based algorithm and the residues of each secondary structure element in the miRNA using the Mfold software. A comparative summary of the ‘stem’ residues and the locations of the repeated patterns of periodicities 10 and 11 is given in Table 3.

Table 3: MST results validation with Mfold results.

Accession ID | Periodicity (nucleotide positions) | Mfold stem | Mfold bulge | Mfold loop
MI0000045 | Period 11 (1–25,65–100) | 4–8,91–95,11–35,65–89,38–45,56–63 | 9,10,90,36,37,64 | 46–55
MI0000136 | Period 10 (1–60) | 1–7,57–63,10–12,52–54,16–20,44–48,23–26,38–41 | − | 27–27
MI0000140 | Period 10 (1–60), period 11 (1–60) | 1–2,63–64,5–10,55–60,13,52,16–20,45–49,24–25,40–41,28–38 | 26,39 | 29–37
MI0000388 | Period 11 (1–80) | 1–3,81–83,6,78,9–11,73–75,15,69,18–22,61–65,25,59,28–36,48–56 | 23,60 | 37–47
MI0000393 | Period 10 (1–70), period 11 (1–70) | 1,69,4–12,58–66,15–17,53–55,19–22,47–50,26–29,41–44 | 8,52 | 30–40
MI0000650 | Period 10 (1–70), period 11 (1–70) | 1,68,5–7,62–64,9–13,55–59,16–22,46–52,25–26,42–43,29,40 | 8,61,27,41 | 30–39
MI0000694 | Period 10 (1–70), period 11 (1–70) | 1,69,5–7,63–65,9–13,56–60,16–22,47–53,25–25,43–44,29–30,40–41 | 8,62,27,42 | 31–39
MI0003135 | Period 11 (1–80) | 2–6,77–81,9–17,66–74,20,64,22–27,56–61,31–34,49–52,37,47 | 18,65,21,63,35,48 | 38–46
MI0010522 | Period 10 (1–80) | 16–25,72–81,28–34,63–69,37–42,56–61,44,53 | 35,62,43,55 | 45–52
MI0010523 | Period 10 (1–80) | 16–25,72–81,28–34,63–69,37–42,56–61,44,53 | 35,62,43,55 | 45–52
MI0013585 | Period 11 (1–80,100–110) | 5–105,8–9,101–102,11–15,91–95,18–22,84–88,25–29,77–81,32–35,71–74,39–43,63–67,46,60 | 10,100 | 47–59
MI0016284 | Period 11 (20–80) | 6,94,10–16,84–90,20–31,69–80,35–45,55–65 | − | 46–54
MI0016286 | Period 10 (1–25,85–100) | 6–10,86–90,13–23,73–83,26–29,67–70,32–37,60–65,41–44,51–54 | 30,66 | 45–50
MI0017546 | Period 10 (1–70) | 1–6,58–63,12–23,39–50,26–28,35–37 | 24,38 | 29–34
MI0017561 | Period 11 (35–80) | 6–8,88–90,11,86,14–26,70–82,29–32,64–67,36,60,39–42,53–56 | 9,87 | 43–52
MI0017735 | Period 10 (1–50), period 11 (1–50) | 6–8,92–94,11,89,16–18,83–85,21–26,75–80,29–34,67–72,37–38,64–65,42–43,60–61,46–47,56–55 | 35,66 | 48–54
MI0022215 | Period 10 (1–30,60–90), period 11 (1–30,65–95) | 1–5,82–86,8–27,61–80,51–58,30–33,47–50,36–37,43–44 | 6,81 | 52–57,38–42

Conclusion

miRNAs are involved in many important biological processes, affecting the stability and translation of mRNAs and negatively regulating gene expression in post-transcriptional processes [18]. In this article, we have applied the MST to detect the periodicity present in miRNA sequences and correlated the detected periodicities with the secondary structures of miRNA predicted via the Mfold software. The results obtained from the Mfold web server were correlated with the high-power spectrum density of periodicities 10 and 11. From the results, we conclude that the regions exhibiting a high-power spectrum density for periodicities 10 and 11 (shown as red in Figure 4) coincide with the regions where Mfold predicts a stem. In other words, the high-power spectrum density appears in regions that tend to form a helical stem secondary structure.
Thus, this algorithm can be applied for the study of the secondary structures of miRNAs.

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Employment or leadership: None declared.
Honorarium: None declared.
Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.

References

1. Moura J. What is signal processing? [President’s message]. IEEE Signal Process Mag 2009;26:6.
2. Damasevicius R. Complexity estimation of genetic sequences using information-theoretic and frequency analysis methods. Informatica 2010;21:13–30.
3. Liu B, Fang L, Liu F, Wang X, Chen J, Chou KC. Identification of real microRNA precursors with a pseudo structure status composition approach. PLoS One 2015;10:e0121501.
4. Liu B, Childs-Disney JL, Znosko BM, Wang D, Fallahi M, Gallo SM, et al. Analysis of secondary structural elements in human microRNA hairpin precursors. BMC Bioinform 2016;17:112.
5. Ardenkani AM, Naeini MM. The role of microRNAs in human diseases. Avicenna J Med Biotechnol 2010;2:161–79.
6. Yoon BJ, Vaidyanathan RP. Computational identification and analysis of noncoding RNAs – unearthing the buried treasures in the genome. IEEE Signal Process Mag 2007;24:64–74.
7. Svoboda P, Cara AD. Hairpin RNA: a secondary structure of primary importance. Cell Mol Life Sci 2006;63:901–8.
8. RNA structure (molecular biology). Available at: http://what-when-how.com/molecular-biology/rna-structure-molecular-biology/. Accessed: 18 Jun 2017.
9. Moss WN. Computational prediction of RNA secondary structure. In: Lorsch J, editor. Methods in Enzymology: RNA. San Diego, USA: Elsevier, 2013:3–65.
10. Stockwell RG, Mansinha L, Lowe RP. Localization of the complex spectrum: the S transform. IEEE Trans Signal Process 1996;44:998–1001.
11. Borkar PS, Mahajan AR. Different RNA secondary structure prediction methods. In: Electronic Systems, Signal Processing and Computing Technologies (ICESC), 2014 International Conference, 9 Jan 2014. Nagpur, India: IEEE, 2014:228–30.
12. Sharma SD, Saxena R, Sharma SN. Short tandem repeats detection in DNA sequences using modified S-transform. Int J Adv Eng Technol 2015;8:233–45.
13. Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucl Acids Res 2013;42:D68–73.
14. The Mfold web server. Available at: http://unafold.rna.albany.edu/?q=mfold. Accessed: 18 Jun 2017.
15. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucl Acids Res 2003;31:3406–15.
16. Trifonov EN, Sussman JL. The pitch of chromatin DNA is reflected in its nucleotide sequence. Proc Natl Acad Sci USA 1980;77:3816–20.
17. Mrazek J. Comparative analysis of sequence periodicity among prokaryotic genomes points to differences in nucleoid structure and a relationship to gene expression. J Bacteriol 2010;192:3763–72.
18. Li Z, Rana TM. Therapeutic targeting of microRNAs: current status and future challenges. Nat Rev Drug Discov 2014;13:622–38.

Supplemental Material: The online version of this article offers supplementary material (https://doi.org/10.1515/bams-2017-0023).

Bio-Algorithms and Med-Systems, de Gruyter


Publisher
de Gruyter
Copyright
©2017 Walter de Gruyter GmbH, Berlin/Boston
ISSN
1896-530X
eISSN
1896-530X
DOI
10.1515/bams-2017-0023

Abstract

IntroductionSignal processing is a technology frequently used in various disciplines, where information can be processed and implemented. Signal processing algorithms are based on the mathematical concepts that detect these signals [1]. These signals are broadly designated in different physical or symbolic formats. Signal processing is applicable in any area of engineering and sciences to reveal the hidden information. Some of the areas where signal processing methods have been applied successfully are bioinformatics, genomics, proteomics, forensic sciences, etc. Genomic signal processing is a specific application of the signal processing method defined as the analysis and processing of genomic signals for gaining biological knowledge and the translation of that knowledge into systems-based applications. In the case of DNA, different types of periodicities have been reported in the literature [2], and these are summarized in Table 1.Table 1:List of periodicities identified in the DNA.PeriodicityBiological structure3Exonic regions repeats5–6Telomeric/subtelomeric repeats10–11DNA bendability (helical repeat structure)48–50Centromeric repeats68Beta-satellite DNA102Nucleosomal structure in eukaryotes105–106Isochores (low G+C content)~135Dimeric Alu repeats structure~165A rich homopolymeric DNA sequence in Alu repeats~171α Satellite DNA~300Alu repeats~680DNA bend sitesFor example, periodicity 3 is related to exonic repeats, periodicities 5 and 6 are related to the telomeric/subtelomeric repeats, and periodicity 10 is related to DNA bendability. MicroRNAs (miRNAs) are small, single-strand non-coding RNAs about 16–22 nucleotides in length, which play an important role in gene regulation by targeting specific messenger RNAs (mRNAs) for cleavage or translational repression [3]. miRNAs are also involved in many important biological processes, affecting the stability and translation of mRNAs and negatively regulating gene expression in post-transcriptional processes [3, 4]. 
The normal expression of miRNAs has been associated with various diseases, including cancer, making them interesting therapeutic targets [5]. Also, upregulation and/or downregulation of miRNAs occur in various diseases in comparison to their normal expression [6] to design small molecules that can modulate the miRNA’s function, and identification of their secondary structural elements is important [4]. In this article, the Stockwell-transform-based algorithm, i.e. the modified S-transform (MST) has been applied to detect the periodicity present in miRNA, and we have correlated the identified periodicities with miRNA’s secondary structures.RNA secondary structureThe RNA is a ribonucleic acid polymer made of nucleotides or bases, A, C, G and U, where uracil (U) is chemically similar to thymine (T) in the DNA. RNA is generally a single-stranded molecule [6]. The nucleotides A/U and C/G can form hydrogen bonds, which are commonly known as complementary base pairs. Sometimes, the bases G and U can also form pairs [6]. In a given RNA sequence, if a complementary segment exists, then these segments can form consecutive base pairs that help the RNA to fold onto itself. Such folding results in a two-dimensional structure known as RNA secondary structure, which further folds onto itself to form a tertiary structure [6].Identification of the secondary structures is crucial for any RNA-based study, as it gives insight about its function. The different secondary structures are:Stem-loopIt is an essential part of the RNA secondary structure of RNA. It can guide RNA folding, determine interactions in a ribozyme, protect mRNA from degradation, serve as a recognition motif for RNA binding proteins or act as a substrate for enzymatic reactions [7] (Figure 1A).Figure 1:Secondary structure elements of RNA.(A) A schematic representation of RNA with secondary structure elements higlighted. (B) Schematic representation of a Pseudoknot in RNA. 
Adapted from [8].Bulges and internal loopsBulges and internal loops form when two double-helical tracts are separated on either one (bulge) or both strands (internal loops) by one or more unpaired nucleotides. Internal loops containing equal numbers of bases on each strand are symmetric, whereas they are asymmetric when the number of bases is different. The presence of an internal loop or bulge reduces thermodynamic stability, when compared to a perfect double helix, but unpaired nucleotides are more readily accessible to protein or nucleic acid ligands, which often recognize such sites [7] (Figure 1A).PseudoknotsWhen complementary sequences consisting of a hairpin or internal loop and a single-stranded region interact with each other by Watson-Crick base pairing, a pseudoknot is formed. There can be two alternative hairpin formations, when a pseudoknot forms between a hairpin loop and a complementary single-stranded region. The formation of a pseudoknot creates an extended helical region through helical stacking of the hairpin double-helical stem and the newly formed loop-loop interaction helix [8, 9] (Figure 1B).The RNA secondary structure is essentially governed by the base paring of nucleotides. Different computational methods have been proposed to determine the ‘optimal base pairing’ of RNA in an efficient manner. Such algorithms are typically called RNA folding algorithm [8, 9]. However, not much has been done using signal processing methods for RNA-based signal processing [10]. Although there are many RNA secondary structure prediction algorithms/methods available, in this article, a transform-based method has been presented for detecting secondary structure. 
One such method that can be used in detection of RNA secondary structure is the MST.Materials and methodsS-transformThe S-transform of signal x(t) is defined as [11](1)S(t,f)=∫−∞∞x(τ)G(t−τ)e−j2πfτdτ,$$S(t,f) = \mathop \smallint \limits_{ - \infty }^\infty x(\tau )G(t - \tau ){e^{ - j2\pi f\tau }}d\tau ,$$where G(t−τ)=(1σ2π)e−(t−τ)2/2σ2$G(t - \tau ) = \left( {\frac{1}{{\sigma \sqrt {2\pi } }}} \right){e^{ - {{(t - \tau )}^2}/2{\sigma ^2}}}$is the Gaussian window function, σ is the standard deviation, τ is the time, t is the central position of the window and f is the frequency. The advantage of the S-transform over other signal processing methods is that the standard deviation σ is a function of frequency (σ=1f ),$\left( {\sigma = \frac{1}{f}} \right),$so the length of the Gaussian window varies with respect to the frequency and hence the equation above becomes(2)S(t,f)=∫−∞∞x(τ)(f2π)e−(t−τ)2/2σ2e−j2πfτdτ$$S(t,f) = \mathop \smallint \limits_{ - \infty }^\infty x(\tau )\left( {\frac{f}{{\sqrt {2\pi } }}} \right){e^{ - {{(t - \tau )}^2}/2{\sigma ^2}}}{e^{ - j2\pi f\tau }}d\tau $$The S-transform is also known as short-time Fourier transforms with variable window. However, the S-transform has some limitations. Specifically, it provides the progressive time frequency resolution but has the problem of poor energy concentration in the time frequency plane because of Gaussian window is inversely proportional to frequency. In the next section, the energy concentration in the time frequency plane has improved by selecting an appropriate control parameter γ for the window length of the Gaussian window to reduce the spectral leakages.Modified S-transformThe MST is a modified version of the S-transform method [12], where it has an appropriate value of the parameter γ to control the window length for each specific frequency to improve the time and frequency resolution as well. 
In the MST for higher frequencies, the window length is small, and for lower frequencies, the window length is large [12]. Therefore, the MST can capture all the frequencies. Thus, to get the appropriate window length to detect the corresponding period (period=1/frequency), we have to introduce the frequency-dependent control parameter γ using simulation studies [10], which is given in the following:(3)γ(f)=15f+0.8,$$\gamma (f) = 15f + 0.8,$$where γ is a frequency-dependent control parameter for window length and MST is defined as(4)S(t,f)=∫−∞∞x(τ)(f(γ(f)2π)e−(t−τ)2f2/2(γ(f))2e−j2πfτdτ.$$S(t,f) = \mathop \smallint \limits_{ - \infty }^\infty x(\tau )\left( {\frac{f}{{(\gamma (f)\sqrt {2\pi } }}} \right){e^{ - {{(t - \tau )}^2}{f^2}/2{{(\gamma (f))}^2}}}{e^{ - j2\pi f\tau }}d\tau .$$High confidence dataThe miRNA sequences used in the study is obtained from the miRBase v21 database [13]. These sequences, totaling 1996, are labeled as ‘high confidence’, as they are of high quality and obtained from RNA deep sequencing experiments. The secondary structures of these high-confidence sequences are predicted from the Mfold server [14, 15]. The core algorithm predicts a minimum free energy for folding that must contain any particular base pair. In this, optimal and sub-optimal secondary structures are predicted based on the minimum free energy [15].In this study, the MST algorithm was applied to detect periodicities between 2 and 11 in miRNA sequences and analyze the predicted secondary structure. The methodology is shown in Figure 2. Specifically, we checked for any sequences that exhibited periodicities 10 and 11, as the number of nucleotides in a turn of β-DNA is 10.5 [16]. These selected sequences were analyzed in detail. 
A detailed explanation of the results is presented in the next section.

Figure 2: Flowchart of methodology for identification of RNA secondary structures.

Results and discussion

Periodicities 10 and 11 have been previously reported to be related to helical structure in DNA [17]. The reason is that the average length of a helical turn in B-DNA is about 10.5 nucleotides [16]. Extrapolating this, we hypothesized that periodicities 10 and 11 may also be responsible for helix formation in RNA.

A total of 1996 high-confidence sequences were obtained from the miRBase database. The MST-based algorithm was run on these high-confidence/high-quality miRNA sequences for periodicities from 2 to 11, and the time-periodicity plot was obtained. The summary of the various periodicities found in this dataset is represented by a histogram (Figure 3).

Figure 3: Histogram indicating the periodicity identified in the 1996 sequences. The number at the top of each bar indicates the number of sequences that exhibit the particular periodicity.

There were 166 sequences that showed a periodicity of 10 and 160 sequences that showed a periodicity of 11, along with other periodicities, i.e. 2–9. To check whether periodicities 10 and 11 are indeed involved in helical formation (i.e. secondary structure formation of the stem region) in RNA, we selected only the sequences in which periodicities 10 and 11 dominate. Seventeen sequences showed only periodicities 10 and/or 11, and their time-periodicity plots are shown in Figure 4.

Figure 4: Time-periodicity plot for 17 sequences. Red and dark blue indicate high- and low-power spectrum densities, respectively. Vertical axis (bottom to top), periodicities 2–11; horizontal axis, position of the nucleotides.

The time-periodicity plot of the 17 sequences (Figure 4) shows that periodicities 10 and 11 predominate in these sequences, because a high-power spectrum density is associated with the dominant periodicity in a sequence.
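The counting and selection steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes that a set of detected periodicities is already available for each sequence (how a periodicity is declared "detected" is not specified in the text), and the identifiers in the toy input are hypothetical rather than taken from the study.

```python
from collections import Counter


def periodicity_histogram(detected):
    """Count, for each periodicity, how many sequences exhibit it."""
    counts = Counter()
    for periods in detected.values():
        counts.update(set(periods))  # each sequence counts once per periodicity
    return dict(sorted(counts.items()))


def only_helical(detected, helical=(10, 11)):
    """Return IDs whose detected periodicities are a non-empty subset of `helical`."""
    allowed = set(helical)
    return sorted(
        seq_id
        for seq_id, periods in detected.items()
        if periods and set(periods) <= allowed
    )


# Hypothetical toy input, not data from the study.
toy = {
    "seq_a": {3, 10, 11},
    "seq_b": {10},
    "seq_c": {10, 11},
    "seq_d": {2, 5},
}
print(periodicity_histogram(toy))
print(only_helical(toy))
```

In this toy input, `seq_a` shows periodicities 10 and 11 but is excluded because it also shows periodicity 3, mirroring the distinction between sequences that exhibit periodicities 10 or 11 among others and the smaller subset that exhibits only those two.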
In 10 sequences (MI0000136, MI0000140, MI0000388, MI0000393, MI0000650, MI0000694, MI0003135, MI0010522, MI0010523 and MI0017546), the high-power spectrum density extends over the entire sequence length, whereas in the remaining seven sequences it is localized either at the center or at the ends of the sequence.

To verify the type of secondary structures present in these 17 sequences, Mfold was run to predict the optimal secondary structures. The Mfold results for these 17 sequences are shown in Figure 5. The Mfold results for the remaining 1979 sequences and their corresponding periodicities obtained from the MST are tabulated in the Supplementary Material.

Figure 5: Mfold results for sequences containing only periodicities 10 and 11.

The Mfold results show that the sequences with periodicities of only 10 and 11 predominantly form stem-type secondary structures. These stem regions are helical in nature, and it is highly likely that the MST algorithm successfully identified the periodicity associated with helix formation in RNA.

For periodicities 10 and 11, consensus sequences were identified, which supports the applicability of the MST to RNA sequences, in this case miRNA sequences.
The repeating patterns of periodicities 10 and 11 for the 17 selected sequences are given in Table 2.

Table 2: List of consensus sequences for periodicities 10 and 11.

| miRNA | Periodicity 10 | Periodicity 11 |
|---|---|---|
| MI0000045 | − | ACAUCCGUCXU |
| MI0000136 | UCACUGUGCU | − |
| MI0000140 | CUAUAUAGUU | CUGAAUUAXUA |
| MI0000388 | − | GUXCUCGUUCU |
| MI0000393 | AAUAXUUCUG | UGUGCAXCGAC |
| MI0000650 | GCCUGGGAGU | CUGAXGXCUGU |
| MI0000694 | UCCUGGGUAU | UXGXGCGGUGG |
| MI0003135 | − | UAXUAUAUCAX |
| MI0010522 | UAUXCUUUUA | − |
| MI0010523 | UAUXAGUUGU | − |
| MI0013585 | − | AGXUAUAUCAX |
| MI0016284 | − | UUGAACUUGAG |
| MI0016286 | UXAUAAACUU | − |
| MI0017546 | UAUUAAAXUU | − |
| MI0017561 | − | AUUAUUAUUAG |
| MI0017735 | CCXAUGCGCG | UXAUCUXGCGG |
| MI0022215 | ACUGGGGGAX | AAXAGCAGUCG |

The results shown in Figures 4 and 5 validate that sequences having periodicities ‘10’ and ‘11’ are associated with the secondary structure ‘stem’. To confirm these results, we also located the repeated patterns of periodicities 10 and 11 using the MST-based algorithm and the residues of each secondary structure element in the miRNA using the Mfold software. A comparative summary of the ‘stem’ residues and the locations of the repeated patterns of periodicities 10 and 11 is given in Table 3.

Table 3: MST results validation with Mfold results.

| Accession ID | Periodicity (nucleotide positions) | Mfold: stem | Mfold: bulge | Mfold: loop |
|---|---|---|---|---|
| MI0000045 | Period 11 (1–25, 65–100) | 4–8, 91–95, 11–35, 65–89, 38–45, 56–63 | 9, 10, 90, 36, 37, 64 | 46–55 |
| MI0000136 | Period 10 (1–60) | 1–7, 57–63, 10–12, 52–54, 16–20, 44–48, 23–26, 38–41 | − | 27–27 |
| MI0000140 | Period 10 (1–60), period 11 (1–60) | 1–2, 63–64, 5–10, 55–60, 13, 52, 16–20, 45–49, 24–25, 40–41, 28–38 | 26, 39 | 29–37 |
| MI0000388 | Period 11 (1–80) | 1–3, 81–83, 6, 78, 9–11, 73–75, 15, 69, 18–22, 61–65, 25, 59, 28–36, 48–56 | 23, 60 | 37–47 |
| MI0000393 | Period 10 (1–70), period 11 (1–70) | 1, 69, 4–12, 58–66, 15–17, 53–55, 19–22, 47–50, 26–29, 41–44 | 8, 52 | 30–40 |
| MI0000650 | Period 10 (1–70), period 11 (1–70) | 1, 68, 5–7, 62–64, 9–13, 55–59, 16–22, 46–52, 25–26, 42–43, 29, 40 | 8, 61, 27, 41 | 30–39 |
| MI0000694 | Period 10 (1–70), period 11 (1–70) | 1, 69, 5–7, 63–65, 9–13, 56–60, 16–22, 47–53, 25–25, 43–44, 29–30, 40–41 | 8, 62, 27, 42 | 31–39 |
| MI0003135 | Period 11 (1–80) | 2–6, 77–81, 9–17, 66–74, 20, 64, 22–27, 56–61, 31–34, 49–52, 37, 47 | 18, 65, 21, 63, 35, 48 | 38–46 |
| MI0010522 | Period 10 (1–80) | 16–25, 72–81, 28–34, 63–69, 37–42, 56–61, 44, 53 | 35, 62, 43, 55 | 45–52 |
| MI0010523 | Period 10 (1–80) | 16–25, 72–81, 28–34, 63–69, 37–42, 56–61, 44, 53 | 35, 62, 43, 55 | 45–52 |
| MI0013585 | Period 11 (1–80, 100–110) | 5–105, 8–9, 101–102, 11–15, 91–95, 18–22, 84–88, 25–29, 77–81, 32–35, 71–74, 39–43, 63–67, 46, 60 | 10, 100 | 47–59 |
| MI0016284 | Period 11 (20–80) | 6, 94, 10–16, 84–90, 20–31, 69–80, 35–45, 55–65 | − | 46–54 |
| MI0016286 | Period 10 (1–25, 85–100) | 6–10, 86–90, 13–23, 73–83, 26–29, 67–70, 32–37, 60–65, 41–44, 51–54 | 30, 66 | 45–50 |
| MI0017546 | Period 10 (1–70) | 1–6, 58–63, 12–23, 39–50, 26–28, 35–37 | 24, 38 | 29–34 |
| MI0017561 | Period 11 (35–80) | 6–8, 88–90, 11, 86, 14–26, 70–82, 29–32, 64–67, 36, 60, 39–42, 53–56 | 9, 87 | 43–52 |
| MI0017735 | Period 10 (1–50), period 11 (1–50) | 6–8, 92–94, 11, 89, 16–18, 83–85, 21–26, 75–80, 29–34, 67–72, 37–38, 64–65, 42–43, 60–61, 46–47, 56–55 | 35, 66 | 48–54 |
| MI0022215 | Period 10 (1–30, 60–90), period 11 (1–30, 65–95) | 1–5, 82–86, 8–27, 61–80, 51–58, 30–33, 47–50, 36–37, 43–44 | 6, 81 | 52–57, 38–42 |

Conclusion

miRNAs are involved in many important biological processes, affecting the stability and translation of mRNAs and negatively regulating gene expression at the post-transcriptional level [18]. In this article, we have successfully applied the MST to detect the periodicities present in miRNA sequences and correlated these periodicities with the secondary structures of the miRNAs predicted by the Mfold software. The results obtained from the Mfold web server were correlated with the high-power spectrum density of periodicities 10 and 11. From the results, we conclude that, in sequences exhibiting periodicities 10 and 11, the regions with a high-power spectrum density (shown in red in Figure 4) coincide with the regions where Mfold predicts a stem. In other words, the high-power spectrum density appears in regions that tend to form a helical stem secondary structure.
Thus, this algorithm can be applied for the study of the secondary structures of miRNAs.

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

Research funding: None declared.

Employment or leadership: None declared.

Honorarium: None declared.

Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.

References

1. Moura J. What is signal processing? [President’s message]. IEEE Signal Process Mag 2009;26:6.
2. Damasevicius R. Complexity estimation of genetic sequences using information-theoretic and frequency analysis methods. Informatica 2010;21:13–30.
3. Liu B, Fang L, Liu F, Wang X, Chen J, Chou KC. Identification of real microRNA precursors with a pseudo structure status composition approach. PLoS One 2015;10:e0121501.
4. Liu B, Childs-Disney JL, Znosko BM, Wang D, Fallahi M, Gallo SM, et al. Analysis of secondary structural elements in human microRNA hairpin precursors. BMC Bioinform 2016;17:112.
5. Ardenkani AM, Naeini MM. The role of microRNAs in human diseases. Avicenna J Med Biotechnol 2010;2:161–79.
6. Yoon BJ, Vaidyanathan RP. Computational identification and analysis of noncoding RNAs – unearthing the buried treasures in the genome. IEEE Signal Process Mag 2007;24:64–74.
7. Svoboda P, Cara AD. Hairpin RNA: a secondary structure of primary importance. Cell Mol Life Sci 2006;63:901–8.
8. RNA structure (molecular biology). Available at: http://what-when-how.com/molecular-biology/rna-structure-molecular-biology/. Accessed: 18 Jun 2017.
9. Moss WN. Computational prediction of RNA secondary structure. In: Lorsch J, editor. Methods in Enzymology: RNA. San Diego, USA: Elsevier, 2013:3–65.
10. Stockwell RG, Mansinha L, Lowe RP. Localization of the complex spectrum: the S transform. IEEE Trans Signal Process 1996;44:998–1001.
11. Borkar PS, Mahajan AR. Different RNA secondary structure prediction methods. In: Electronic Systems, Signal Processing and Computing Technologies (ICESC), 2014 International Conference, 9 Jan 2014. Nagpur, India: IEEE, 2014:228–30.
12. Sharma SD, Saxena R, Sharma SN. Short tandem repeats detection in DNA sequences using modified S-transform. Int J Adv Eng Technol 2015;8:233–45.
13. Kozomara A, Griffiths-Jones S. miRBase: annotating high confidence microRNAs using deep sequencing data. Nucl Acids Res 2013;42:D68–73.
14. The Mfold web server. Available at: http://unafold.rna.albany.edu/?q=mfold. Accessed: 18 Jun 2017.
15. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucl Acids Res 2003;31:3406–15.
16. Trifonov EN, Sussman JL. The pitch of chromatin DNA is reflected in its nucleotide sequence. Proc Natl Acad Sci USA 1980;77:3816–20.
17. Mrazek J. Comparative analysis of sequence periodicity among prokaryotic genomes points to differences in nucleoid structure and a relationship to gene expression. J Bacteriol 2010;192:3763–72.
18. Li Z, Rana TM. Therapeutic targeting of microRNAs: current status and future challenges. Nat Rev Drug Discov 2014;13:622–38.

Supplemental Material: The online version of this article offers supplementary material (https://doi.org/10.1515/bams-2017-0023).

Journal

Bio-Algorithms and Med-Systems, de Gruyter

Published: Dec 20, 2017
