
Tandem repeats detection in DNA sequences using Kaiser window based adaptive S-transform

Introduction
A genomic sequence comprises four characters, A, T, C, and G, representing the four nucleotides adenine, thymine, cytosine, and guanine, respectively. In certain DNA regions, particular character patterns repeat periodically. These repeats can be tandem or dispersed: tandem repeats (TRs) consist of two or more contiguous copies of a nucleotide pattern, whereas dispersed repeats consist of two or more non-adjacent copies of a pattern [1]. Identifying repeats in DNA is important because of their biological functions and their association with diseases [2]. Satellites, minisatellites, and microsatellites are the three major subclasses of TRs: satellite repeats are long tandem arrays longer than 100 base pairs (bp), minisatellites are TRs of short units of about 9–80 bp, and microsatellites are highly repetitive sequences with units of 1–6 bp [3]. A number of deterministic and stochastic algorithms for identifying TRs have been reported [4]. Tandem Repeats Finder (TRF), developed by Benson, is a popular probabilistic method [5]. In the deterministic category, signal processing (SP) based methods have been reported; mapping the nucleotide bases into the numeric domain allows SP tools to be applied to TR detection [6]. Recently, an SP-based method using the adaptive S-transform (AST) with a Gaussian window has been reported for identifying microsatellites [3]. The AST algorithm addressed the limitations of other SP-based methods such as the DFT-based spectral repeat finder [7], the short-time periodicity transform [8], exactly periodic subspace decomposition [9], the quaternion periodicity transform [10], and the autoregressive model [1].
It has also been established in [3] that the AST algorithm outperforms TRF, parametric spectral estimation (PSE), empirical mode wavelet decomposition [11], and Fireμsat2 [12]. However, AST is restricted to microsatellites and is neither efficient nor accurate at identifying TRs with periods greater than six. In this work, this limitation is addressed by replacing the Gaussian window with a Kaiser window and by modifying the existing AST algorithm. The S-transform and the proposed algorithm are described in the Materials and methods section, and the test sequences are introduced in the Results section. Performance is evaluated by simulation, together with a comparison against other methods, in the Comparative simulation study section. Finally, the paper is concluded in the Conclusions section.

Materials and methods
S-transform using Kaiser window
The S-transform is a hybrid of the short-time Fourier transform (STFT) and the wavelet transform (WT) [13]. With its fixed window width, the STFT cannot detect and resolve low frequencies well and suffers from poor time resolution at high frequencies. The wavelet transform addresses this issue, but it produces time-scale plots that are unsuitable for intuitive visual analysis. In contrast to the fixed-length window of the STFT, the S-transform employs a frequency-dependent window function, which yields sharp time localization at high frequencies and high frequency resolution at low frequencies. The S-transform is therefore suitable for analyzing short-duration signals. Because it uses a Fourier kernel, the S-transform also provides the phase information referenced to the time origin, which is missing in the WT. Hence, the S-transform is useful for time-frequency analysis of non-stationary signals, as it combines the advantages of the STFT and the WT.
However, for noisy signals its performance suffers from poor energy concentration and large noise amplitudes at high frequencies. Methods for enhancing the energy concentration have been reported in [14]. Sejdic et al. [15] used a frequency-dependent Kaiser window in the S-transform to reduce the leakage of signal components and improve the concentration measure (CM). Time-frequency resolution is improved by optimizing the energy distribution in the time-frequency plane, that is, by minimizing the smearing of signal components in both the time and frequency domains. The time-frequency concentration of the S-transform is enhanced by the frequency-dependent Kaiser window of equation (1):

$$W_k(t,f) = \frac{I_0\left(\alpha(f)\sqrt{1 - t^2}\right)}{I_0\left(\alpha(f)\right)} \tag{1}$$

where I0(·) is the zeroth-order modified Bessel function of the first kind and α(f) is a frequency-dependent shape parameter. Because α(f) depends on frequency, it can be tuned at each TR period to maximize the corresponding CM, which quantifies the energy concentration of a time-frequency distribution. The S-transform with the frequency-dependent Kaiser window is obtained from equation (2):

$$S_\alpha(m,n) = \sum_{l=0}^{N-1} x(l)\,\frac{I_0\left(\alpha\sqrt{1-(m-l)^2}\right)}{I_0(\alpha)}\exp\left(\frac{-j2\pi nl}{N}\right) \tag{2}$$

The CM of this time-frequency representation is obtained from equation (3):

$$\mathrm{CM}(n) = \sum_{m=0}^{N-1}\left(\overline{S_\alpha(m,n)}\right)^r \tag{3}$$

where

$$\overline{S_\alpha(m,n)} = \frac{S_\alpha(m,n)}{\sum_{m'=0}^{N-1}\left|S_\alpha(m',n)\right|}$$

and r = 1.1, following [14].
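A minimal standard-library Python sketch of equations (1) and (3) follows. It samples the Kaiser window at evenly spaced normalized times t in [−1, 1] (this time normalization is our assumption, since the text leaves it implicit), computes I0 from its power series, and evaluates the CM with r = 1.1 on the magnitudes of one time-frequency column (we assume magnitudes are intended in the numerator of equation (3), matching the |·| in its denominator). The names `bessel_i0`, `kaiser_window`, and `concentration_measure` are illustrative and do not come from the paper.

```python
import math


def bessel_i0(x: float, terms: int = 60) -> float:
    """Zeroth-order modified Bessel function of the first kind,
    from its power series I0(x) = sum_k ((x/2)^k / k!)^2."""
    total, term, half_sq = 1.0, 1.0, (x / 2.0) ** 2
    for k in range(1, terms):
        term *= half_sq / (k * k)   # ratio of consecutive series terms
        total += term
    return total


def kaiser_window(alpha: float, length: int) -> list[float]:
    """Eq. (1) sampled at `length` evenly spaced normalised times t in [-1, 1]
    (assumed normalisation; the paper leaves it implicit)."""
    if length < 2:
        raise ValueError("length must be at least 2")
    norm = bessel_i0(alpha)
    window = []
    for l in range(length):
        t = 2.0 * l / (length - 1) - 1.0
        window.append(bessel_i0(alpha * math.sqrt(max(0.0, 1.0 - t * t))) / norm)
    return window


def concentration_measure(column, r: float = 1.1) -> float:
    """Eq. (3) for one frequency index: sum over time of the normalised
    magnitudes raised to the power r (magnitudes assumed in the numerator)."""
    mags = [abs(v) for v in column]
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum((m / total) ** r for m in mags)


# Larger alpha gives smaller tail values, i.e. a narrower window in time.
for alpha in (1.0, 4.0, 8.0):
    w = kaiser_window(alpha, 101)
    print(f"alpha={alpha}: edge={w[0]:.4f}, centre={w[50]:.4f}")
```

A column with all of its energy in one bin, such as `[0, 0, 1, 0]`, gives `concentration_measure(...) == 1.0`, while a flat column gives a smaller value, so maximizing equation (3) favors concentrated representations. Increasing α lowers the window's tail values, which is the time-domain narrowing that the text attributes to Figure 1.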
While determining the optimal value of α that maximizes the CM, two requirements should be met:
(a) The time-frequency tiling of the Kaiser window based S-transform (KWST) should be similar to that of the Gaussian window based S-transform (GWST).
(b) The chosen values of α should produce auto terms [15] whose widths in the time and frequency domains are similar to those obtained with the Gaussian window.

The behavior of the Kaiser window for various values of α is shown in Figure 1: as α increases, the window becomes narrower in the time domain. Therefore, for a frequency-dependent Kaiser window to provide good frequency resolution at low frequencies and sharp time resolution at high frequencies, α must vary in proportion to frequency, i.e.

$$\alpha(f) = \beta f \tag{4}$$

where β is a constant.

Figure 1: Behavior of the Kaiser window with parameter α.

In [15], the linear variation

$$\alpha(f) = \pi f \tag{5}$$

has been proposed. This choice yields auto terms with almost the same widths in the time and frequency domains as the Gaussian window, while providing a narrower main lobe than the Gaussian window. Since the Kaiser window enhances the energy concentration of signals compared with the GWST, it has been used in this work. To further improve the energy concentration of the S-transform, an adaptive algorithm evaluates α automatically instead of fixing it with the linear relationship of equation (5).

Adaptive S-transform algorithm with Kaiser window
The steps of the Kaiser window based AST (KWAST) are listed below. Starting with a period of 2, these steps are repeated up to a period of 15 to obtain the optimized value of α at each period of interest.

(i) Mapping of the DNA character sequence into a numeric sequence
A complex mapping scheme has been used to convert the character sequence into a numeric sequence because it reduces the computational complexity by 75% [6].
The following mapping performs this conversion: A ↔ 1 + i, T ↔ 1 − i, G ↔ −1 − i, C ↔ −1 + i, where i is the imaginary unit.

(ii) S-transform computation
To capture small as well as large periods, a Kaiser window of length 100 has been used. To provide initial values for the steepest-descent optimization, the S-transform is computed for α = 1 and α = 2. The S-transform with the Kaiser window is implemented in the frequency domain using equations (6)–(8) [16]:

$$S\left(pT, \frac{n}{NT}\right) = \sum_{m=0}^{N-1} X\left(\frac{m+n}{NT}\right) G(m)\exp\left(\frac{i2\pi mp}{N}\right) \tag{6}$$

$$G(m) = \mathrm{FFT}\left\{\frac{I_0\left(\alpha\sqrt{1-(m-l)^2}\right)}{I_0(\alpha)}\right\} \tag{7}$$

where G(m) is the Fourier transform of the Kaiser window, and X(n/NT), the Fourier transform of x(l), is given by

$$X\left(\frac{n}{NT}\right) = \frac{1}{N}\sum_{l=0}^{N-1} x(lT)\exp\left(\frac{-i2\pi nl}{N}\right) \tag{8}$$

(iii) Calculation of the concentration measure
The CM is calculated from the S-transform results of step (ii) using equation (3).

(iv) Optimization of α to maximize the CM
The optimized value of α is determined with the least mean square (LMS) algorithm [17], using the iterative relation of equation (9):

$$\alpha(k) = \alpha(k-1) + 20\,\frac{\mathrm{CM}(k-1) - \mathrm{CM}(k-2)}{\alpha(k-1) - \alpha(k-2)} \tag{9}$$

At each period, the value of α that maximizes the CM is obtained with equation (9). The iteration starts at k = 3, with initial values α(k − 1) = α(2) = 2 and α(k − 2) = α(1) = 1; the corresponding CM(2) and CM(1) are computed with equation (3) using α = 2 and α = 1, respectively.
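The initialization just described (steps (i)–(iii) evaluated at α = 1 and α = 2) can be sketched in standard-library Python as follows. Assumptions, all ours: a period p is analyzed at frequency index n = N/p (rounded); the length-100 Kaiser window is zero-padded to N samples and circularly centered on index 0; equations (6)–(8) are evaluated with a direct O(N²) DFT instead of an FFT, for readability only; and the CM uses magnitudes. All function names are hypothetical.

```python
import cmath
import math

# Step (i): complex mapping from the text (A=1+i, T=1-i, G=-1-i, C=-1+i).
COMPLEX_MAP = {"A": 1 + 1j, "T": 1 - 1j, "G": -1 - 1j, "C": -1 + 1j}


def map_sequence(seq):
    return [COMPLEX_MAP[base] for base in seq.upper()]


def bessel_i0(x, terms=60):
    """Modified Bessel function I0 from its power series sum_k ((x/2)^k / k!)^2."""
    total, term, half_sq = 1.0, 1.0, (x / 2.0) ** 2
    for k in range(1, terms):
        term *= half_sq / (k * k)
        total += term
    return total


def centred_kaiser(alpha, length, n_total):
    """Eq. (1) sampled at `length` points of t in [-1, 1], zero-padded to
    n_total samples and circularly centred on index 0 (assumed layout)."""
    if not 2 <= length <= n_total:
        raise ValueError("window length must be between 2 and the sequence length")
    w = [0.0] * n_total
    norm = bessel_i0(alpha)
    for l in range(length):
        t = 2.0 * l / (length - 1) - 1.0
        w[(l - length // 2) % n_total] = bessel_i0(alpha * math.sqrt(max(0.0, 1.0 - t * t))) / norm
    return w


def dft(x, scale=1.0):
    """Direct O(N^2) DFT; an FFT would be used in practice."""
    n = len(x)
    return [scale * sum(x[l] * cmath.exp(-2j * math.pi * k * l / n) for l in range(n))
            for k in range(n)]


def s_transform_voice(x, alpha, n, win_len=100):
    """Step (ii), Eqs. (6)-(8): S(p, n) for every time index p at frequency index n."""
    N = len(x)
    X = dft(x, scale=1.0 / N)                             # Eq. (8), with the 1/N factor
    G = dft(centred_kaiser(alpha, min(win_len, N), N))    # Eq. (7): DFT of the window
    shifted = [X[(m + n) % N] * G[m] for m in range(N)]   # X((m + n)/NT) G(m)
    return [sum(shifted[m] * cmath.exp(2j * math.pi * m * p / N) for m in range(N))
            for p in range(N)]                            # Eq. (6)


def concentration_measure(column, r=1.1):
    """Step (iii), Eq. (3) summed over time, using magnitudes (our assumption)."""
    mags = [abs(v) for v in column]
    total = sum(mags)
    return 0.0 if total == 0 else sum((m / total) ** r for m in mags)
```

For a period-3 repeat such as `x = map_sequence("ACG" * 40)` (N = 120), the two initial CM values would be `concentration_measure(s_transform_voice(x, a, 40))` for `a` in `(1.0, 2.0)`, where 40 = N/3 under the assumed period-to-frequency mapping. The frequency-domain route of equation (6) gives the same values as evaluating equation (2) directly with the same circularly indexed window; it is preferred in practice because the DFTs can be replaced by FFTs.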
After this initialization, k is incremented by 1 at every iteration until it reaches its final value of 100. At every iteration, α is updated from the previous values of α and CM using equation (9), and the CM for the new α is recomputed with equation (3). After 100 iterations, the value of α that gives the maximum CM is recorded as the optimum α for that period. Using these steps, the optimized α has been determined for periods 2–15 for the sequence AE001381; the variation of α with period for the Kaiser window is shown in Figure 2. For comparison, the variation of the standard deviation of the Gaussian window with period, obtained with the AST algorithm for the same sequence, is shown in Figure 3. To capture large periods with a Gaussian window, its standard deviation should increase proportionately. However, Figure 3 shows that the Gaussian window fails to adapt its standard deviation in this way, which explains the poor performance of the Gaussian window based AST (GWAST) at large periods. In contrast, the Kaiser window shape parameter α in Figure 2 becomes smaller to capture large periods, in line with equation (4).

Figure 2: Variation of optimal α values with period for the Kaiser window.
Figure 3: Variation of optimum standard deviations with period for the Gaussian window.

As a result, KWAST performs better than GWAST at higher periods without compromising its ability to detect microsatellites. The time-frequency representation obtained with KWAST for AE001381 using the optimal α values is shown in Figure 4. This sequence has been selected for demonstration because it contains short as well as long TRs. The final detections are obtained by subjecting the time-frequency results of Figure 4 to screening, pre-processing, and verification phases, which are identical to those described for the AST in [3].
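The search of step (iv) can be sketched as below. Equation (9) is, in effect, a gradient-ascent step on the CM with step size 20, in which the derivative is replaced by the finite difference of the last two iterates. `cm_of_alpha` is a hypothetical callable that would run steps (ii)–(iii) for the current period and return the CM; the early stop when two successive α values coincide is our own guard against division by zero, not part of the described method.

```python
from typing import Callable


def optimise_alpha(cm_of_alpha: Callable[[float], float],
                   alpha1: float = 1.0, alpha2: float = 2.0,
                   step: float = 20.0, k_final: int = 100) -> tuple[float, float]:
    """Iterate Eq. (9) from k = 3 to k_final and return (best_alpha, best_cm)."""
    alphas = [alpha1, alpha2]                       # alpha(1), alpha(2)
    cms = [cm_of_alpha(alpha1), cm_of_alpha(alpha2)]  # CM(1), CM(2) via Eq. (3)
    for _ in range(3, k_final + 1):
        d_alpha = alphas[-1] - alphas[-2]
        if d_alpha == 0.0:
            break                                   # finite difference undefined
        # Eq. (9): move alpha along the finite-difference slope of CM.
        new_alpha = alphas[-1] + step * (cms[-1] - cms[-2]) / d_alpha
        alphas.append(new_alpha)
        cms.append(cm_of_alpha(new_alpha))
    best = max(range(len(cms)), key=cms.__getitem__)  # alpha with the maximum CM
    return alphas[best], cms[best]
```

With a smooth toy objective peaking at α = 5, for example `optimise_alpha(lambda a: 1 - 0.001 * (a - 5) ** 2)`, the iterates climb from the initial values 1 and 2 toward 5. How quickly this happens on real CM curves depends on their scale, because the fixed step size 20 multiplies raw CM differences.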
In the screening phase, the spectrum values at each base position are averaged, and the average is multiplied by a constant of 1.5 (determined experimentally) to set the threshold for that position. If the maximum spectrum value at a base position exceeds this threshold, it is assigned a value of 1 and the remaining values at that position are 0; otherwise, all spectrum values at that position are set to 0. The result is a binary time-frequency representation in which at most one period is retained per base position, so the screening phase also handles occurrences of multiple periods at the same base position.

Figure 4: Time-frequency representation for AE001381 using KWAST.

The candidate repeats that survive screening are then pre-processed to further reduce false detections, by removing repeats that do not satisfy the condition of at least two contiguous copies required for a valid TR. Finally, the verification phase eliminates false positives by retaining only repeats within a permissible mismatch ratio of 0.6, the same value used in earlier reported methods [1, 3]. The results after the screening, pre-processing, and verification phases are shown in Figures 5–7, respectively.

Figure 5: Binary time-frequency representation using KWAST for AE001381 after screening.
Figure 6: Binary time-frequency representation using KWAST for AE001381 after pre-processing.
Figure 7: Binary time-frequency representation using KWAST for AE001381 after verification.

Results
For the comparative study, the genomic sequences with accession numbers X64775, AC004848, M65145, and AE001381 have been selected [18]. Since these sequences contain repeats with periods of up to 15, they have been analyzed for periods 2–15 only, using TRF, PSE, GWAST, and KWAST.
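The screening, pre-processing, and verification phases described under Materials and methods can be sketched as follows. Assumptions, all ours: the time-frequency map is given as one list of spectrum values per base position, with a matching list of candidate periods; a run of consecutive positions assigned the same period p counts as at least two contiguous copies when it spans at least 2p bases; and the mismatch ratio is taken as the fraction of bases after the first copy that differ from the corresponding base of the first copy. The phase implementations in [3] may differ in these details, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def screen(tf: Sequence[Sequence[float]], periods: Sequence[int],
           factor: float = 1.5) -> list[Optional[int]]:
    """Keep only the peak period at each base position if it exceeds factor * mean."""
    detected: list[Optional[int]] = []
    for column in tf:                                  # spectrum values at one base position
        mean = sum(column) / len(column)
        peak = max(range(len(column)), key=column.__getitem__)
        detected.append(periods[peak] if column[peak] > factor * mean else None)
    return detected


def preprocess(detected: Sequence[Optional[int]], min_copies: int = 2) -> list[tuple[int, int, int]]:
    """Group equal consecutive periods into runs (start, end, p); drop runs shorter
    than min_copies * p bases, i.e. fewer than two contiguous copies."""
    runs, start = [], 0
    for i in range(1, len(detected) + 1):
        if i == len(detected) or detected[i] != detected[start]:
            p = detected[start]
            if p is not None and i - start >= min_copies * p:
                runs.append((start, i, p))
            start = i
    return runs


def mismatch_ratio(seq: str, start: int, end: int, p: int) -> float:
    """Fraction of bases after the first copy that differ from the first copy."""
    first = seq[start:start + p]
    compared = mismatches = 0
    for pos in range(start + p, end):
        compared += 1
        mismatches += seq[pos] != first[(pos - start) % p]
    return mismatches / compared if compared else 1.0


def verify(seq: str, runs, max_ratio: float = 0.6):
    """Retain runs whose mismatch ratio is within the permissible limit."""
    return [run for run in runs if mismatch_ratio(seq, *run) <= max_ratio]
```

For example, if the first six positions of `"ACGACGTTTTTT"` screen to period 3 and the rest fall below threshold, pre-processing keeps the run `(0, 6, 3)` and verification retains it because its second copy matches the first exactly.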
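For validating approximate tandem repeats (ATRs), a mismatch ratio of 0.6 [1] has been used for the PSE, GWAST, and KWAST methods.

Comparative simulation study
The comparative results are shown in Table 1. To give a clearer picture of the relative performance of the four methods, the total detections at each period are plotted against period in Figure 8.

Table 1: Comparative results. Entries are detected copies in each sequence; p is the period and "–" denotes no detection.

Method | p | X64775 | AC004848 | M65145 | AE001381 | Total at period | Total at all periods
TRF | 2 | – | – | 18 | 297 | 315 | 491
 | 3 | 14.7 | – | – | 20 | 34.7 |
 | 4 | – | 10.3 | – | 35 | 45.3 |
 | 5 | – | – | – | 17 | 17 |
 | 6 | – | – | – | 32 | 32 |
 | 7 | – | – | – | – | 0 |
 | 8 | – | – | – | 18 | 18 |
 | 9 | – | – | – | – | 0 |
 | 10 | – | – | – | 4 | 4 |
 | 11 | – | – | – | 3 | 3 |
 | 12 | – | – | – | 6 | 6 |
 | 13 | – | – | – | 3 | 3 |
 | 14 | – | – | – | 5 | 5 |
 | 15 | – | – | – | 8 | 8 |
PSE | 2 | – | 20 | 18 | 297 | 335 | 540
 | 3 | 24.7 | – | – | 20 | 44.7 |
 | 4 | – | 10.3 | – | 35 | 45.3 |
 | 5 | – | – | – | 17 | 17 |
 | 6 | – | – | – | 32 | 32 |
 | 7 | – | 7 | – | – | 7 |
 | 8 | – | – | – | 18 | 18 |
 | 9 | – | – | – | – | 0 |
 | 10 | – | – | – | 4 | 4 |
 | 11 | – | – | 4.5 | 11 | 15.5 |
 | 12 | – | – | – | 6 | 6 |
 | 13 | – | – | – | 3 | 3 |
 | 14 | – | – | – | 5 | 5 |
 | 15 | – | – | – | 8 | 8 |
GWAST | 2 | 4 | 20 | 15 | 349 | 388 | 754
 | 3 | 48 | 37 | – | 4 | 89 |
 | 4 | – | 10 | 19 | 192 | 221 |
 | 5 | – | 21 | 13 | 5 | 39 |
 | 6 | – | – | 6 | – | 6 |
 | 7 | – | 5 | – | – | 5 |
 | 8 | – | – | – | – | 0 |
 | 9 | – | – | 4 | – | 4 |
 | 10 | – | – | – | – | 0 |
 | 11 | – | – | – | – | 0 |
 | 12 | – | – | 2 | – | 2 |
 | 13 | – | – | – | – | 0 |
 | 14 | – | – | – | – | 0 |
 | 15 | – | – | – | – | 0 |
KWAST | 2 | – | 44 | 44 | 639 | 727 | 1721
 | 3 | 64 | 14 | 15 | 553 | 646 |
 | 4 | – | 2 | 15 | 115 | 132 |
 | 5 | – | 9 | – | 33 | 42 |
 | 6 | – | – | 2 | 18 | 20 |
 | 7 | – | 4 | 7 | 18 | 29 |
 | 8 | – | 14 | – | 58 | 72 |
 | 9 | – | – | – | 14 | 14 |
 | 10 | – | – | – | 2 | 2 |
 | 11 | – | 4 | 6 | 8 | 18 |
 | 12 | – | 2 | 3 | 4 | 9 |
 | 13 | – | – | – | 2 | 2 |
 | 14 | – | – | 3 | 0 | 3 |
 | 15 | – | – | – | 5 | 5 |

Figure 8: Comparative TR detection performance.

The following inferences can be drawn from these results:
(a) GWAST and KWAST perform better in detecting microsatellites, and except at period 4, KWAST outperforms GWAST for microsatellites.
(b) GWAST performs poorly in detecting ATRs with periods greater than 6, whereas KWAST performance for large repeats is almost on par with TRF and PSE.
(c) The total number of detections with KWAST (1721) is substantially larger than with the other methods (491 for TRF, 540 for PSE, and 754 for GWAST).

Conclusions
In this work, an algorithm for identifying TRs in DNA sequences has been proposed. The method employs a frequency-dependent Kaiser window in the S-transform to enhance the CM, and the optimum window shape parameter at each period is determined with the LMS algorithm. The proposed method detected more microsatellites without compromising the detection of larger periods.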
Thus, the proposed algorithm can be used for detecting microsatellites as well as minisatellites, with better performance than other reported signal processing based methods.

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Employment or leadership: None declared.
Honorarium: None declared.
Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.

References
1. Zhou H, Du L, Yan H. Detection of tandem repeats in DNA sequences based on parametric spectral estimation. IEEE Trans Inf Technol Biomed 2009;13:747–55. doi:10.1109/TITB.2008.920626
2. Mitas M. Trinucleotide repeats associated with human disease. Nucleic Acids Res 1997;25:2245–54. doi:10.1093/nar/25.12.2245
3. Sharma SD, Saxena R, Sharma SN. Identification of microsatellites in DNA using adaptive S-transform. IEEE J Biomed Health Inform 2015;19:1097–105. doi:10.1109/JBHI.2014.2330901
4. Grover A, Aishwarya V, Sharma PC. Searching microsatellites in DNA sequences: approaches used and tools developed. Physiol Mol Biol Plants 2012;18:11–9. doi:10.1007/s12298-011-0098-y
5. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 1999;27:573–80. doi:10.1093/nar/27.2.573
6. Sharma SD, Shakya K, Sharma SN. Evaluation of DNA mapping schemes for exon detection. In: Computer, Communication and Electrical Technology (ICCCET), International Conference. IEEE, Tamilnadu, India, Mar 18, 2011:71–4.
7. Sharma D, Issac B, Raghava GP, Ramaswamy R. Spectral repeat finder (SRF): identification of repetitive sequences using Fourier transformation. Bioinformatics 2004;20:1405–12. doi:10.1093/bioinformatics/bth103
8. Buchner M, Janjarasjitt S. Detection and visualization of tandem repeats in DNA sequences. IEEE Trans Signal Process 2003;51:2280–7. doi:10.1109/TSP.2003.815396
9. Ravi G, Divya S, Ankush M, Kuldip S. A novel signal processing measure to identify exact and inexact tandem repeat patterns in DNA sequences. EURASIP J Bioinform Syst Biol 2007;2007:43596.
10. Brodzik AK. Quaternionic periodicity transform: an algebraic solution to the tandem repeat detection problem. Bioinformatics 2007;23:694–700. doi:10.1093/bioinformatics/btl674
11. Jiang R, Yan H. Detection and 2-dimensional display of short tandem repeats based on signal decomposition. Int J Data Min Bioinform 2011;5:661–90. doi:10.1504/IJDMB.2011.045416
12. de Ridder C, Kourie DG, Watson BW, Fourie TR, Reyneke PV. Fine-tuning the search for microsatellites. J Discrete Algorithms 2013;20:21–37. doi:10.1016/j.jda.2012.12.007
13. Stockwell RG, Mansinha L, Lowe RP. Localization of the complex spectrum: the S transform. IEEE Trans Signal Process 1996;44:998–1001. doi:10.1109/78.492555
14. Pei SC, Wang PW. Energy concentration enhancement using window width optimization in S transform. In: Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference. IEEE, Dallas, TX, USA, Mar 14, 2010:4106–9.
15. Sejdic E, Djurovic I, Jiang J. S-transform with frequency dependent Kaiser window. In: Acoustics, Speech and Signal Processing, ICASSP 2007, IEEE International Conference. IEEE, Honolulu, HI, USA, Apr 15, 2007;3:III-1165–68.
16. Chilukuri MV, Dash PK. Multiresolution S-transform-based fuzzy recognition system for power quality events. IEEE Trans Power Del 2004;19:323–30. doi:10.1109/TPWRD.2003.820180
17. Stanković L. A measure of some time-frequency distributions concentration. Sign Proc 2001;81:621–31. doi:10.1016/S0165-1684(00)00236-X
18. National Center for Biotechnology Information. Available at: www.ncbi.nlm.nih.gov. Accessed: Mar 2015.

Bio-Algorithms and Med-Systems, de Gruyter


Publisher: de Gruyter
Copyright: ©2017 Walter de Gruyter GmbH, Berlin/Boston
ISSN: 1896-530X
eISSN: 1896-530X
DOI: 10.1515/bams-2017-0014
Publisher site
See Article on Publisher Site

Abstract

IntroductionGenomic data sequence comprises four characters A, T, C and G representing the four nucleotides adenine, thymine, cytosine, and guanine, respectively. In certain DNA regions periodic repetition of particular character patterns is observed. These repeats can be in tandem or dispersed. Tandem repeats (TRs) have two or more contiguous copies of the nucleotide patterns in a DNA sequence, whereas dispersed repeats consist of two or more non-adjacent copies of a nucleotide pattern [1]. The identification of repeats in the DNA plays an important role because of their biological functionality and association with diseases [2]. Satellites, minisatellites, and microsatellites are three major subclasses of the TRs. Satellite repeats are long tandem arrays of size greater than 100 base pairs (bp), minisatellites are TRs of short units of about 9–80 bp, while microsatellites are highly repetitive sequences of 1–6 bp [3]. Numbers of deterministic and stochastic algorithms for the identification of TRs have been reported [4]. Tandem repeat finder (TRF) developed by Benson is a popular probabilistic method [5]. In deterministic category, signal processing (SP) based methods have been reported for the identification of TRs. Mapping of nucleotide bases into numeric domain allows the application of SP tools in TR detections [6]. Recently, a SP based method that uses adaptive S-transform (AST) has been reported for the identification of the microsatellites [3]. Gaussian window has been employed in AST algorithm. AST algorithm addressed the limitations of the other SP based methods like DFT-based spectral repeat finder [7], short time periodicity transform [8], exactly periodic subspace decomposition [9], quaternion periodicity transform [10], and auto-regressive model [1]. Also, it has been established in [3] that the performance of AST algorithm is superior to TRF, parametric spectral estimation (PSE), empirical mode wavelet decomposition [11], and Fireμsat2 [12]. 
However, AST applicability is restricted to only microsatellites and is not efficient and accurate in identifying TRs with periods greater than six. In this work this limitation of the AST algorithm has been addressed by using Kaiser window in place of Gaussian window and by making certain changes in the existing AST algorithm. S-transform is described in Materials and methods section. In Results section the proposed algorithm has been discussed. Simulation studies for the performance evaluation along with a comparative study with other methods are carried out in Comparative simulation study section. Finally, the paper is concluded in the Conclusions section.Materials and methodsS-transform using Kaiser windowThe S-transform is a hybrid of the short-time Fourier transform (STFT) and the wavelet transform (WT) [13]. STFT with its fixed window width is not able to detect and resolve low frequencies and suffers from poor time resolution at high frequencies. Wavelet transform addresses this issue, but it produces time-scale plots that are unsuitable for intuitive visual analysis. In contrast to the window of fixed length in STFT, S-transform employs frequency-dependent window function. Use of frequency-dependent window function in S-transform leads to sharp time localization at high frequencies and high frequency resolution at low frequencies. Therefore, S-transform can be employed for the analysis of signals with short duration. The phase information referenced to the time origin, which is missing in the WT, is also provided by the S-transform since it uses a Fourier kernel. Hence, S-transform is useful for time-frequency analysis of non-stationary signals as it enjoys the advantages of both STFT and WT. However, its performance suffers from poor energy concentration and large noise amplitudes at high frequencies for a noisy signal. Methods for energy concentration enhancement have been reported in [14]. Sejdic et al. 
[15] have used a frequency-dependent Kaiser window for the S-transform computation to provide diminished leakage of the signal components with improved concentration measure (CM). Time-frequency resolution has been improvised by optimizing the energy distribution in the time-frequency plane by minimizing the smearing of signal components in both time and frequency domains. Time-frequency concentration of the S-transform has been enhanced by a frequency-dependent Kaiser window given by equation (1):(1)Wk(t,f)=I0(α(f)1−t2)I0(α(f))$${W_k}(t,f) = \frac{{{I_0}\left( {\alpha (f)\sqrt {1 - {t^2}} } \right)}}{{{I_0}(\alpha (f))}}$$where I0(·) is the zeroth-order Bessel function of the first kind, and α(f) is a frequency-dependent parameter. Since window shape parameter, α(f), is frequency dependent, it can be tuned at each TR period to maximize the corresponding CM, as CM quantitatively evaluates the energy concentration of a time-frequency distribution. S-transform using frequency-dependent Kaiser window is obtained using equation (2).(2)Sα(m,n)=∑l=0N−1x(l)I0(α1−(m−l)2/I0(α))exp(−j2πnlN)$${S_\alpha }(m,n) = \sum\nolimits_{l = 0}^{N - 1} {x(l){I_0}\left( {\alpha \sqrt {1 - {{(m - l)}^2}} /{I_0}(\alpha )} \right)\exp \left( {\frac{{ - j2\pi nl}}{N}} \right)} $$CM for this time-frequency representation is obtained using equation (3):(3)CM(n)=∑m=0N−1(Sα(m,n)¯)r$${\text{CM(}}n{\text{)}} = \sum\nolimits_{m = 0}^{N - 1} {{{(\overline {{S_\alpha }(m,n)} )}^r}} $$where Sα(m,n)¯=(Sα(m,n))/(∑m=0N−1|Sα(m,n)|)$\overline {{S_\alpha }(m,n)} = ({S_\alpha }(m,n))/\left( {\sum\nolimits_{m = 0}^{N - 1} {|{S_\alpha }(m,n)|} } \right)$and the value of r is 1.1 which follows from [14]. 
While determining the optimal value of α to maximize the CM, two requirements should be met:Time-frequency tiling of Kaiser window based S-transform (KWST) should be similar to the tiling of the Gaussian window based S-transform (GWST).Chosen values of parameter α should produce auto terms [15] with similar widths in the time and frequency domains as obtained with the Gaussian window.The behavior of the Kaiser window for various values of the parameter α is displayed in Figure 1. As the value of α increases, the window becomes narrower in the time domain. Therefore, for a frequency-dependent Kaiser window to provide good frequency resolution at low frequencies and sharp time resolution at higher frequencies, the variation of α must be proportional to the frequency, i.e.Figure 1:Behavior of Kaiser window with parameter α.a(f)=βf$a(f) = \beta f$(4)where β is a constant. In [15], a linear variation of the parameter α(f) has been proposed asa(f)=πf$a(f) = \pi f$(5)This condition provides auto terms with almost the same widths in the time and frequency domain as the Gaussian window while providing narrower main lobe than Gaussian window. As Kaiser window enhances the energy concentration of the signals in comparison to the GWST, it has been used in this work.Further, to gain additional improvements in the energy concentration of the S-transform, an adaptive algorithm has been used for the automatic evaluation of the parameter α in place of deciding value of α using a linear relationship given by equation (5).Adaptive S-transform algorithm with Kaiser windowFollowing are the steps involved in Kaiser window based AST (KWAST). Starting with a period of 2, these steps are repeated up to a period of 15 to obtain optimized values of α at each period of interest.Mapping of DNA character sequence into a numeric sequenceComplex mapping scheme has been used to convert the character sequence into a numeric sequence because it reduces the computational complexity by 75% [6]. 
The following mapping performs this conversionA↔1+i, T↔1−i, G↔−1−i,$A \leftrightarrow 1 + i,{\text{ }}T \leftrightarrow 1 - i,{\text{ }}G \leftrightarrow - 1 - i,$C↔–1+i, where i stands for imaginary unit.S-transform computationIn this work in order to capture small as well as large periods, a Kaiser window with a length of 100 has been used. To provide the initial values to the steepest descend optimization algorithm, S-transform has been determined for α=1 and α=2. The implementation of S-transform with Kaiser window has been carried out in frequency domain using equations (6–8) [16].(6)S(pT,nNT)=∑m=0N−1X((m+n)NT)G(m)exp(i2πmpN)$$S\left( {pT,\frac{n}{{NT}}} \right) = \sum\nolimits_{m = 0}^{N - 1} {X\left( {\frac{{(m + n)}}{{NT}}} \right)G(m){\text{exp}}\left( {\frac{{i2\pi mp}}{N}} \right)} $$(7)G(m)=FFT(I0α1−(m−l)2)/I0(α)$$G(m) = FFT\left( {{I_0}\alpha \sqrt {1 - {{(m - l)}^2}} } \right)/{I_0}(\alpha )$$G(m) is the Fourier transform of the Kaiser window, and X(n) the Fourier transform of x(l) is given by(8)X(n/NT)=1N∑l=0N−1x(lT)exp(−(i2πnl)/N)$$X(n/NT) = \frac{1}{N}\sum\nolimits_{l = 0}^{N - 1} {x(lT){\text{exp(}} - (i2\pi nl)/N{\text{)}}} $$Calculation of concentration measureCM is calculated for the S-transform results of step (ii) using equation (3).Optimize α for maximizing CMTo determine the optimized value of α, least mean square (LMS) [17] algorithm has been used. The iterative relation used for this purpose is given by equation (9).α(k)=α(k−1)+20[CM(k−1)−CM(k−2)]/ [α(k−1)−α(k−2)]$\alpha (k) = \alpha (k - 1) + 20[{\text{CM(}}k - 1{\text{)}} - {\text{CM(}}k - 2{\text{)}}]/[\alpha (k - 1) - \alpha (k - 2)]$(9)At each period the value of α that maximizes the CM is obtained by iterative relation (9). Iterative computation starts with k=3, and the initial value for α(k–1)=α(2) is assigned as 2 and α(k–2)=α(1) is assigned 1. Corresponding CM(2) and CM(1) values are then calculated using equation (3) with α=2 and 1, respectively. 
After this initialization, k is incremented by 1 at every iteration until it reaches its final value of 100; at every iteration, α is updated from its previous values using equation (9) and the corresponding CM is recalculated. When the iterations end, the value of α that provides the maximum CM is recorded as the optimum value of α for that period. Using these steps, the optimized α has been determined for periods 2–15 for the sequence AE001381, and its variation with period for the Kaiser window is shown in Figure 2. The variation of the standard deviation of the Gaussian window with period, obtained with the AST algorithm for the same sequence, is shown in Figure 3. To capture large periods with a Gaussian window, its standard deviation should increase proportionately. However, Figure 3 shows that the Gaussian window fails to achieve this adaptation of its standard deviation, which explains the poor performance of the Gaussian window based AST (GWAST) at large periods. In contrast, the Kaiser window shape parameter α in Figure 2 becomes smaller for large periods. Since a large period corresponds to a low frequency, this is in line with the relationship given by (4).

Figure 2: Variation of optimal α values with period for Kaiser window.
Figure 3: Variation of optimum standard deviations with period for Gaussian window.

As a result, KWAST performs better than GWAST at higher periods without compromising its ability to detect microsatellites. The time-frequency representation obtained using KWAST for AE001381 with the optimal α values is shown in Figure 4. This sequence has been selected for demonstration as it comprises short as well as long TRs. Final detection results are obtained by subjecting the time-frequency results of Figure 4 to screening, pre-processing, and verification phases. The methods for these phases are identical to those described in [3] for the AST.
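Steps (i)–(iii) for a single period can be sketched in plain Python as below. This is a minimal, unoptimised sketch under stated assumptions, not the authors' implementation: the window is a standard Kaiser window with 2h+1 taps (h = 50 gives 101 taps, close to the length of 100 used here) centred circularly on sample 0, which is our reading of equation (7); and `concentration_measure` is an assumed stand-in for equation (3), an energy-normalised ℓ1 measure of the kind discussed in [14, 17], since that equation is not reproduced in this section. All function names are hypothetical.

```python
import cmath
import math

# Step (i): complex mapping A->1+i, T->1-i, G->-1-i, C->-1+i.
MAPPING = {"A": 1 + 1j, "T": 1 - 1j, "G": -1 - 1j, "C": -1 + 1j}


def map_dna(seq):
    """Convert a DNA string into the complex numeric sequence x(l)."""
    return [MAPPING[base] for base in seq.upper()]


def bessel_i0(x, terms=60):
    """Zeroth-order modified Bessel function I0, summed from its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kaiser_centered(half_width, alpha, n):
    """Kaiser window with 2*half_width + 1 taps, centred on sample 0 of an
    n-sample circular buffer (offset -d is stored at index n - d)."""
    if 2 * half_width + 1 > n:
        raise ValueError("window longer than the sequence")
    w = [0.0] * n
    norm = bessel_i0(alpha)
    for d in range(-half_width, half_width + 1):
        r = d / half_width
        w[d % n] = bessel_i0(alpha * math.sqrt(1.0 - r * r)) / norm
    return w


def dft(x, sign=-1):
    """Direct O(N^2) DFT; sign=+1 gives the unnormalised inverse kernel."""
    n = len(x)
    return [sum(x[l] * cmath.exp(sign * 2j * math.pi * k * l / n) for l in range(n))
            for k in range(n)]


def s_transform_voice(x, voice, half_width, alpha):
    """S(j, voice) for every time index j, following equations (6)-(8)."""
    n = len(x)
    X = [v / n for v in dft(x)]                       # equation (8)
    G = dft(kaiser_centered(half_width, alpha, n))    # equation (7), assumed centring
    product = [X[(m + voice) % n] * G[m] for m in range(n)]
    return dft(product, sign=+1)                      # equation (6): sum over m


def concentration_measure(rows):
    """ASSUMED stand-in for equation (3): inverse L1 norm of the
    unit-energy-normalised time-frequency magnitudes."""
    energy = math.sqrt(sum(abs(v) ** 2 for row in rows for v in row))
    return energy / sum(abs(v) for row in rows for v in row)
```

For a sequence `seq` of at least 101 bases, `concentration_measure([s_transform_voice(map_dna(seq), round(len(seq) / p), 50, alpha)])` gives the CM for period p (analysed here at the voice nearest N/p, again an assumption) and one value of α; this is the quantity the iteration of (9) maximizes. The direct O(N²) DFT keeps the sketch dependency-free; an FFT routine would be preferable for sequences of realistic length.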
In the screening phase, the maximum spectrum value at every base position is set to 1 if it exceeds a threshold; otherwise, all spectrum values at that base position are set to 0. The result is a binary time-frequency representation. The threshold for a base position is obtained by averaging the spectrum values at that position and multiplying the average by a constant of 1.5 (determined experimentally).

Figure 4: Time-frequency representation for AE001381 using KWAST.

Because the decision at each base position is based on its maximum spectrum value, the screening phase also takes care of occurrences of multiple periods at the same base position. The candidate repeats that survive screening are then pre-processed to further reduce false detections: repeats that do not satisfy the minimum of two contiguous copies required for a valid TR are removed. Finally, the verification phase checks for false positives and retains repeats that satisfy a permissible mismatch ratio of 0.6, similar to the value used in earlier reported methods [1, 3]. The results obtained after the screening, pre-processing, and verification phases are shown in Figures 5–7, respectively.

Figure 5: Binary time-frequency representation using KWAST for AE001381 after screening.
Figure 6: Binary time-frequency representation using KWAST for AE001381 after pre-processing.
Figure 7: Binary time-frequency representation using KWAST for AE001381 after verification.

Results

For the comparative study, genomic sequences with accession numbers X64775, AC004848, M65145, and AE001381 have been selected [18]. As these sequences have repeats up to a period of 15, they have been analyzed for periods 2–15 only, using TRF, parametric spectral estimation (PSE) [1], GWAST, and KWAST.
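For the validation of approximate tandem repeats (ATRs), a mismatch ratio of 0.6 [1] has been used for the PSE, GWAST, and KWAST methods.

Comparative simulation study

The comparative results are shown in Table 1. To give a clearer picture of the relative performance of the four methods, the total detections at each period are plotted in Figure 8.

Table 1: Comparative results. Entries under the sequence accession numbers are the detected copies in each sequence.

| Method | p | X64775 | AC004848 | M65145 | AE001381 | Total detections at a period | Total TR detections at all periods |
|---|---|---|---|---|---|---|---|
| TRF | 2 | – | – | 18 | 297 | 315 | 491 |
| | 3 | 14.7 | – | – | 20 | 34.7 | |
| | 4 | – | 10.3 | – | 35 | 45.3 | |
| | 5 | – | – | – | 17 | 17 | |
| | 6 | – | – | – | 32 | 32 | |
| | 7 | – | – | – | – | 0 | |
| | 8 | – | – | – | 18 | 18 | |
| | 9 | – | – | – | – | 0 | |
| | 10 | – | – | – | 4 | 4 | |
| | 11 | – | – | – | 3 | 3 | |
| | 12 | – | – | – | 6 | 6 | |
| | 13 | – | – | – | 3 | 3 | |
| | 14 | – | – | – | 5 | 5 | |
| | 15 | – | – | – | 8 | 8 | |
| PSE | 2 | – | 20 | 18 | 297 | 335 | 540 |
| | 3 | 24.7 | – | – | 20 | 44.7 | |
| | 4 | – | 10.3 | – | 35 | 45.3 | |
| | 5 | – | – | – | 17 | 17 | |
| | 6 | – | – | – | 32 | 32 | |
| | 7 | – | 7 | – | – | 7 | |
| | 8 | – | – | – | 18 | 18 | |
| | 9 | – | – | – | – | 0 | |
| | 10 | – | – | – | 4 | 4 | |
| | 11 | – | – | 4.5 | 11 | 15.5 | |
| | 12 | – | – | – | 6 | 6 | |
| | 13 | – | – | – | 3 | 3 | |
| | 14 | – | – | – | 5 | 5 | |
| | 15 | – | – | – | 8 | 8 | |
| GWAST | 2 | 4 | 20 | 15 | 349 | 388 | 754 |
| | 3 | 48 | 37 | – | 4 | 89 | |
| | 4 | – | 10 | 19 | 192 | 221 | |
| | 5 | – | 21 | 13 | 5 | 39 | |
| | 6 | – | – | 6 | – | 6 | |
| | 7 | – | 5 | – | – | 5 | |
| | 8 | – | – | – | – | 0 | |
| | 9 | – | – | 4 | – | 4 | |
| | 10 | – | – | – | – | 0 | |
| | 11 | – | – | – | – | 0 | |
| | 12 | – | – | 2 | – | 2 | |
| | 13 | – | – | – | – | 0 | |
| | 14 | – | – | – | – | 0 | |
| | 15 | – | – | – | – | 0 | |
| KWAST | 2 | – | 44 | 44 | 639 | 727 | 1721 |
| | 3 | 64 | 14 | 15 | 553 | 646 | |
| | 4 | – | 2 | 15 | 115 | 132 | |
| | 5 | – | 9 | – | 33 | 42 | |
| | 6 | – | – | 2 | 18 | 20 | |
| | 7 | – | 4 | 7 | 18 | 29 | |
| | 8 | – | 14 | – | 58 | 72 | |
| | 9 | – | – | – | 14 | 14 | |
| | 10 | – | – | – | 2 | 2 | |
| | 11 | – | 4 | 6 | 8 | 18 | |
| | 12 | – | 2 | 3 | 4 | 9 | |
| | 13 | – | – | – | 2 | 2 | |
| | 14 | – | – | 3 | 0 | 3 | |
| | 15 | – | – | – | 5 | 5 | |

Figure 8: Comparative TRs detection performance.

The following inferences can be drawn from these results:
- GWAST and KWAST perform better in detecting microsatellites. Except at period 4, KWAST outperforms GWAST in detecting microsatellites.
- GWAST performs poorly in detecting ATRs with periods greater than 6, whereas KWAST performance for large repeats is almost on par with TRF and PSE.
- The total number of detections with KWAST (1721) is substantially larger than with TRF (491), PSE (540), and GWAST (754).

Conclusions

In this work, an algorithm for identifying TRs in DNA sequences has been proposed. The method employs a frequency-dependent Kaiser window in the S-transform to enhance the CM, and the optimum window shape parameter at each period is determined using the LMS algorithm. The proposed method detected more microsatellites without compromising the detection of large-period repeats.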
Thus, the proposed algorithm can be used for the detection of microsatellites as well as minisatellites, with better performance than other reported signal processing based methods.

Author contributions: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.
Research funding: None declared.
Employment or leadership: None declared.
Honorarium: None declared.
Competing interests: The funding organization(s) played no role in the study design; in the collection, analysis, and interpretation of data; in the writing of the report; or in the decision to submit the report for publication.

References

1. Zhou H, Du L, Yan H. Detection of tandem repeats in DNA sequences based on parametric spectral estimation. IEEE Trans Inf Technol Biomed 2009;13:747–55. doi:10.1109/TITB.2008.920626
2. Mitas M. Trinucleotide repeats associated with human disease. Nucleic Acids Res 1997;25:2245–54. doi:10.1093/nar/25.12.2245
3. Sharma SD, Saxena R, Sharma SN. Identification of microsatellites in DNA using adaptive S-transform. IEEE J Biomed Health Inform 2015;19:1097–105. doi:10.1109/JBHI.2014.2330901
4. Grover A, Aishwarya V, Sharma PC. Searching microsatellites in DNA sequences: approaches used and tools developed. Physiol Mol Biol Plants 2012;18:11–9. doi:10.1007/s12298-011-0098-y
5. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res 1999;27:573–80. doi:10.1093/nar/27.2.573
6. Sharma SD, Shakya K, Sharma SN. Evaluation of DNA mapping schemes for exon detection. In: Computer, Communication and Electrical Technology (ICCCET), International Conference. IEEE, Tamilnadu, India, Mar 18, 2011:71–4.
7. Sharma D, Issac B, Raghava GP, Ramaswamy R. Spectral repeat finder (SRF): identification of repetitive sequences using Fourier transformation. Bioinformatics 2004;20:1405–12. doi:10.1093/bioinformatics/bth103
8. Buchner M, Janjarasjitt S. Detection and visualization of tandem repeats in DNA sequences. IEEE Trans Signal Process 2003;51:2280–7. doi:10.1109/TSP.2003.815396
9. Ravi G, Divya S, Ankush M, Kuldip S. A novel signal processing measure to identify exact and inexact tandem repeat patterns in DNA sequences. EURASIP J Bioinform Syst Biol 2007;2007:43596.
10. Brodzik AK. Quaternionic periodicity transform: an algebraic solution to the tandem repeat detection problem. Bioinformatics 2007;23:694–700. doi:10.1093/bioinformatics/btl674
11. Jiang R, Yan H. Detection and 2-dimensional display of short tandem repeats based on signal decomposition. Int J Data Min Bioinform 2011;5:661–90. doi:10.1504/IJDMB.2011.045416
12. de Ridder C, Kourie DG, Watson BW, Fourie TR, Reyneke PV. Fine-tuning the search for microsatellites. J Discrete Algorithms 2013;20:21–37. doi:10.1016/j.jda.2012.12.007
13. Stockwell RG, Mansinha L, Lowe RP. Localization of the complex spectrum: the S transform. IEEE Trans Signal Process 1996;44:998–1001. doi:10.1109/78.492555
14. Pei SC, Wang PW. Energy concentration enhancement using window width optimization in S transform. In: Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference. IEEE, Dallas, TX, USA, Mar 14, 2010:4106–9.
15. Sejdic E, Djurovic I, Jiang J. S-transform with frequency dependent Kaiser window. In: Acoustics, Speech and Signal Processing, ICASSP 2007, IEEE International Conference. IEEE, Honolulu, HI, USA, Apr 15, 2007;3:III-1165–68.
16. Chilukuri MV, Dash PK. Multiresolution S-transform-based fuzzy recognition system for power quality events. IEEE Trans Power Del 2004;19:323–30. doi:10.1109/TPWRD.2003.820180
17. Stanković L. A measure of some time-frequency distributions concentration. Sign Proc 2001;81:621–31. doi:10.1016/S0165-1684(00)00236-X
18. National Center for Biotechnology Information. Available at: www.ncbi.nlm.nih.gov. Accessed: Mar 2015.

Journal

Bio-Algorithms and Med-Systems, de Gruyter

Published: Sep 26, 2017
