
Approximate distributed top-k queries

Distributed Computing, Volume 21 (1), Mar 19, 2008

References (28)

Publisher
Springer Journals
Copyright
Copyright © 2008 by Springer-Verlag
Subject
Computer Science; Theory of Computation; Software Engineering/Programming and Operating Systems; Computer Systems Organization and Communication Networks; Computer Hardware; Computer Communication Networks
ISSN
0178-2770
eISSN
1432-0452
DOI
10.1007/s00446-008-0055-3

Abstract

We consider a distributed system where each node keeps a local count for each item (similar to elections, where nodes are ballot boxes and items are candidates). A top-k query in such a system asks for the k items whose global count, summed across all nodes in the system, is largest. In this paper, we present a Monte Carlo algorithm that outputs, with high probability, a set of k candidates that approximates the top-k items. The algorithm is motivated by sensor networks, in that it focuses on reducing the communication complexity of each individual node. In contrast to previous algorithms, the communication complexity depends only on the global scores and not on how the scores are partitioned among nodes. When the number of nodes is large, our algorithm dramatically reduces the communication complexity compared with deterministic algorithms. We show that the complexity of our algorithm is close to a lower bound on the cell-probe complexity of any non-interactive top-k approximation algorithm. We also show that for some natural global distributions (such as the Geometric or Zipf distributions), our algorithm needs only a polylogarithmic number of communication bits per node.
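The abstract does not describe the algorithm itself, so the following Python sketch is only an illustration under stated assumptions. It contrasts the exact top-k query (sum all local counts, keep the k largest) with a simple sampling estimate. That estimate is a swapped-in technique and not necessarily the authors' method: each node keeps each of its votes independently with a hypothetical rate `p` and reports only the kept votes. The names `exact_top_k`, `sampled_top_k`, `nodes`, and `p` are hypothetical. The sketch shows one property the abstract emphasizes: the expected number of reported votes is `p` times the global vote total, regardless of how the votes are split among nodes.

```python
import random
from collections import Counter

def exact_top_k(local_counts, k):
    """Exact answer: add up every node's local counts and take the k largest."""
    total = Counter()
    for counts in local_counts:
        total.update(counts)  # Counter.update adds counts rather than replacing them
    return [item for item, _ in total.most_common(k)]

def sampled_top_k(local_counts, k, p, rng):
    """Hypothetical sampling estimate (an assumption, not the paper's algorithm).

    Each node keeps every single vote independently with probability p and
    reports only the kept votes. Returns the k items with the most sampled
    votes, plus the total number of votes reported; that total has
    expectation p * (global vote total), whatever the split among nodes.
    """
    sampled = Counter()
    reported = 0
    for counts in local_counts:  # one dict per node (ballot box)
        for item, c in counts.items():
            kept = sum(1 for _ in range(c) if rng.random() < p)
            if kept:
                sampled[item] += kept
                reported += kept
    return [item for item, _ in sampled.most_common(k)], reported

# Hypothetical data: three ballot boxes; global counts A=600, B=300, C=80, D=20.
nodes = [
    {"A": 200, "B": 150, "C": 40},
    {"A": 250, "B": 50, "D": 20},
    {"A": 150, "B": 100, "C": 40},
]
print(exact_top_k(nodes, 2))  # ['A', 'B']
print(sampled_top_k(nodes, 2, 0.5, random.Random(7)))
```

With `p = 1.0` every vote is kept, so the estimate equals the exact answer; a smaller `p` trades accuracy for fewer reported votes, and the strongly skewed counts in this example make the top two easy to recover. The sketch does not reproduce the paper's bounds or its cell-probe analysis.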

Journal

Distributed Computing, Springer Journals

Published: Mar 19, 2008