Discovering General Prominent Streaks in Sequence Data

Gensheng Zhang, The University of Texas at Arlington
Xiao Jiang, Shanghai Jiao Tong University
Ping Luo, HP Labs China
Min Wang, Google Research
Chengkai Li, The University of Texas at Arlington

Publisher: Association for Computing Machinery
Copyright: © 2014 by ACM Inc.
ISSN: 1556-4681
DOI: 10.1145/2601439

Abstract

This article studies the problem of prominent streak discovery in sequence data. Given a sequence of values, a prominent streak is a long consecutive subsequence consisting of only large (or only small) values, such as consecutive games of outstanding performance in sports, consecutive hours of heavy network traffic, or consecutive days of frequent mentions of a person in social media. Prominent streak discovery reveals insightful patterns for data analysis in many real-world applications and is an enabling technique for computational journalism. Given its real-world usefulness and complexity, research on prominent streaks in sequence data opens a spectrum of challenging problems. A baseline approach to finding prominent streaks is a quadratic algorithm that exhaustively enumerates all possible streaks and performs pairwise streak dominance comparison. For more efficient methods, we make the observation that prominent streaks are in fact skyline points in two dimensions: streak interval length and minimum value in the interval. Our solution thus hinges on the idea to separate the ...
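The baseline idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names (`enumerate_streaks`, `dominates`, `prominent_streaks`) are hypothetical, and the dominance rule is assumed to be the usual skyline definition (a streak dominates another if it is at least as long and has at least as large a minimum value, and is strictly better in one of the two). The sketch enumerates all O(n²) streaks and, instead of comparing every pair, uses a per-length sweep to keep only the non-dominated ones. It covers the "large values" case only.

```python
from typing import List, Sequence, Tuple

# (start index, end index, length, minimum value); indices are 0-based, inclusive.
Streak = Tuple[int, int, int, float]


def enumerate_streaks(values: Sequence[float]) -> List[Streak]:
    """List every consecutive subsequence together with its length and minimum."""
    streaks: List[Streak] = []
    for start in range(len(values)):
        running_min = values[start]
        for end in range(start, len(values)):
            running_min = min(running_min, values[end])
            streaks.append((start, end, end - start + 1, running_min))
    return streaks


def dominates(a: Streak, b: Streak) -> bool:
    """Assumed skyline dominance on the two dimensions (length, minimum value)."""
    return a[2] >= b[2] and a[3] >= b[3] and (a[2] > b[2] or a[3] > b[3])


def prominent_streaks(values: Sequence[float]) -> List[Streak]:
    """Return the streaks that no other streak dominates, longest first."""
    n = len(values)
    streaks = enumerate_streaks(values)

    # best[k]: largest minimum value among all streaks of length k.
    best = [float("-inf")] * (n + 2)
    for _, _, length, minimum in streaks:
        best[length] = max(best[length], minimum)

    # longer_best[k]: largest minimum value among streaks strictly longer than k.
    longer_best = [float("-inf")] * (n + 2)
    for k in range(n - 1, 0, -1):
        longer_best[k] = max(longer_best[k + 1], best[k + 1])

    # A streak survives if it is the best of its length and strictly beats
    # every longer streak on minimum value.
    result = [
        s for s in streaks
        if s[3] == best[s[2]] and s[3] > longer_best[s[2]]
    ]
    result.sort(key=lambda s: -s[2])
    return result


if __name__ == "__main__":
    for streak in prominent_streaks([3, 1, 7, 7, 2]):
        print(streak)
```

For the input `[3, 1, 7, 7, 2]`, three streaks survive: the whole sequence (length 5, minimum 1), the last three values (length 3, minimum 2), and the pair of 7s (length 2, minimum 7). Each single 7 is dropped because the pair of 7s is longer with the same minimum.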

Journal

ACM Transactions on Knowledge Discovery from Data (TKDD), Association for Computing Machinery

Published: Jun 1, 2014
