As a prolific research area in data mining, subspace clustering and related problems have induced a vast quantity of proposed solutions. However, many publications compare a new proposition, if at all, with one or two competitors, or even with a so-called “naïve” ad hoc solution, but fail to clarify the exact problem definition. As a consequence, even if two solutions are thoroughly compared experimentally, it will often remain unclear whether both solutions tackle the same problem or, if they do, whether they agree in certain tacit assumptions and how such assumptions may influence the outcome of an algorithm. In this survey, we try to clarify: (i) the different problem definitions related to subspace clustering in general; (ii) the specific difficulties encountered in this field of research; (iii) the varying assumptions, heuristics, and intuitions forming the basis of different approaches; and (iv) how several prominent solutions tackle different problems.
ACM Transactions on Knowledge Discovery from Data (TKDD) – Association for Computing Machinery
Published: Mar 1, 2009