Fast, Practical Algorithms for Computing All the Repeats in a String

Simon Puglisi; W. Smyth; Munina Yusufu

doi:10.1007/s11786-010-0033-6

Publisher: Springer Journals
Copyright: Copyright © 2010 by Birkhäuser / Springer Basel AG
Subject: Mathematics; Mathematics, general; Computer Science, general
ISSN: 1661-8270
eISSN: 1661-8289
DOI: 10.1007/s11786-010-0033-6

Abstract

Given a string x = x[1..n] on an alphabet of size α, and a threshold p_min ≥ 1, we describe four variants of an algorithm PSY1 that, using a suffix array, computes all the complete nonextendible repeats in x of length p ≥ p_min. The basic algorithm PSY1–1 and its simple extension PSY1–2 are fast on strings that occur in biological, natural language and other applications (not highly periodic strings), while PSY1–3 guarantees Θ(n) worst-case execution time. The final variant, PSY1–4, also achieves Θ(n) processing time and, over the complete range of strings tested, is the fastest of the four. The space requirement of all four algorithms is about 5n bytes, but all make use of the “longest common prefix” (LCP) array, whose construction requires about 6n bytes. The four algorithms are faster in applications and use less space than a recently proposed algorithm (Narisawa in Proceedings of 18th Annual Symposium on Combinatorial Pattern Matching, pp. 340–351, 2007) that produces equivalent output. The suffix array is not explicitly used by algorithms PSY1, but may be required for postprocessing; in this case, storage requirements rise to 9n bytes. We also describe two variants of a fast Θ(n)-time algorithm PSY2 for computing all complete supernonextendible repeats in x.
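
To make the role of the suffix array and LCP array concrete, the following is a minimal Python sketch and not the PSY1 or PSY2 algorithms themselves. It builds a suffix array by naive sorting, computes the LCP array with the standard linear-time scan usually attributed to Kasai et al., and lists substrings that are shared by two lexicographically adjacent suffixes and have length at least p_min. The function names (suffix_array, lcp_array, candidate_repeats) are hypothetical, indexing is 0-based rather than the abstract's x[1..n], and the output is only a set of candidate repeated substrings: it does not test left-extendibility and does not report occurrence positions, so it is not the set of complete nonextendible repeats described above.

```python
def suffix_array(s):
    """Naive suffix array: sort suffix start positions by the suffix text.

    Illustrative only; this takes O(n^2 log n) time in the worst case,
    unlike the specialised construction algorithms used in practice.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """LCP array via the linear-time scan commonly attributed to Kasai et al.

    lcp[r] is the length of the longest common prefix of the suffixes
    starting at sa[r - 1] and sa[r]; lcp[0] is set to 0 by convention.
    """
    n = len(s)
    rank = [0] * n
    for r, pos in enumerate(sa):
        rank[pos] = r
    lcp = [0] * n
    h = 0  # length of the match carried over from the previous suffix
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]  # suffix that precedes suffix i in sorted order
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # dropping the first character loses at most one match
    return lcp


def candidate_repeats(s, p_min=1):
    """Substrings of length >= p_min shared by two adjacent sorted suffixes.

    Assumption: a simplified stand-in for illustration. Every returned
    substring occurs at least twice in s, but no extendibility filtering
    is applied.
    """
    sa = suffix_array(s)
    lcp = lcp_array(s, sa)
    found = {s[sa[r]:sa[r] + lcp[r]] for r in range(1, len(s)) if lcp[r] >= p_min}
    return sorted(found)


if __name__ == "__main__":
    x = "banana"
    sa = suffix_array(x)
    print(sa)                       # suffix start positions in sorted order
    print(lcp_array(x, sa))         # LCP values between adjacent suffixes
    print(candidate_repeats(x, 2))  # repeated substrings of length >= 2
```

The threshold is applied directly to LCP values because an entry lcp[r] ≥ p_min certifies that two distinct suffixes agree on at least p_min characters, which is the basic reason an LCP array is sufficient to detect repeats of a given minimum length.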

Journal

Mathematics in Computer Science – Springer Journals

Published: Apr 15, 2010
