Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Finding Essential Attributes from Binary Data

Finding Essential Attributes from Binary Data We consider data sets that consist of n-dimensional binary vectors representing positive and negative examples for some (possibly unknown) phenomenon. A subset S of the attributes (or variables) of such a data set is called a support set if the positive and negative examples can be distinguished by using only the attributes in S. In this paper we study the problem of finding small support sets, a frequently arising task in various fields, including knowledge discovery, data mining, learning theory, logical analysis of data, etc. We study the distribution of support sets in randomly generated data, and discuss why finding small support sets is important. We propose several measures of separation (real valued set functions over the subsets of attributes), formulate optimization models for finding the smallest subsets maximizing these measures, and devise efficient heuristic algorithms to solve these (typically NP-hard) optimization problems. We prove that several of the proposed heuristics have a guaranteed constant approximation ratio, and we report on computational experience comparing these heuristics with some others from the literature both on randomly generated and on real world data sets. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Annals of Mathematics and Artificial Intelligence Springer Journals

Loading next page...
 
/lp/springer-journals/finding-essential-attributes-from-binary-data-eROfo505GD

References (69)

Publisher
Springer Journals
Copyright
Copyright © 2003 by Kluwer Academic Publishers
Subject
Computer Science; Artificial Intelligence (incl. Robotics); Mathematics, general; Computer Science, general; Complex Systems
ISSN
1012-2443
eISSN
1573-7470
DOI
10.1023/A:1024653703689
Publisher site
See Article on Publisher Site

Abstract

We consider data sets that consist of n-dimensional binary vectors representing positive and negative examples for some (possibly unknown) phenomenon. A subset S of the attributes (or variables) of such a data set is called a support set if the positive and negative examples can be distinguished by using only the attributes in S. In this paper we study the problem of finding small support sets, a frequently arising task in various fields, including knowledge discovery, data mining, learning theory, logical analysis of data, etc. We study the distribution of support sets in randomly generated data, and discuss why finding small support sets is important. We propose several measures of separation (real valued set functions over the subsets of attributes), formulate optimization models for finding the smallest subsets maximizing these measures, and devise efficient heuristic algorithms to solve these (typically NP-hard) optimization problems. We prove that several of the proposed heuristics have a guaranteed constant approximation ratio, and we report on computational experience comparing these heuristics with some others from the literature both on randomly generated and on real world data sets.

Journal

Annals of Mathematics and Artificial IntelligenceSpringer Journals

Published: Oct 10, 2004

There are no references for this article.