Network‐based semisupervised clustering



References (60)

Publisher: Wiley
Copyright: © 2021 John Wiley & Sons, Ltd.
ISSN: 1524-1904
eISSN: 1526-4025
DOI: 10.1002/asmb.2618

Abstract

Semisupervised clustering extends standard clustering methods to the semisupervised setting, in some cases considering situations in which clusters are associated with a given outcome variable that acts as a “noisy surrogate,” that is, a good proxy of the unknown clustering structure. In this article, a novel approach to semisupervised clustering associated with an outcome variable, named network‐based semisupervised clustering (NeSSC), is introduced. It combines an initialization phase, a training phase, and an agglomeration phase. In the initialization and training phases, a matrix of pairwise affinities between the instances is estimated by a classifier. In the agglomeration phase, the affinity matrix is transformed into a complex network, in which a community detection algorithm searches for the underlying community structure. The result is a partition of the instances into clusters that are highly homogeneous in terms of the outcome. We consider a particular specification of NeSSC that uses classification or regression trees as classifiers and Louvain, label propagation, or Walktrap as the community detection algorithm. NeSSC's stopping criterion and the choice of the optimal partition of the original data are also discussed. Several applications to both real and simulated data demonstrate the effectiveness of the proposed method and its benefits in terms of improved interpretability of results relative to three alternative semisupervised clustering methods.
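The pipeline described in the abstract (tree-based affinity estimation followed by community detection on the resulting network) can be illustrated with a small, self-contained sketch. This is not the authors' implementation. It rests on several assumptions that the abstract does not state: affinity is taken to be the fraction of trees in which two instances fall in the same leaf, each tree is a depth-1 regression tree (a stump) fitted on a random subsample with one randomly chosen feature, edges are kept when affinity reaches a threshold `tau`, and community detection is a simplified label propagation with deterministic tie-breaking. All function names, parameters, and the toy data are hypothetical.

```python
import random
from itertools import combinations

def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(X, y, rows, feature):
    """Depth-1 regression tree: threshold on one feature minimising the SSE of y."""
    xs = sorted({X[i][feature] for i in rows})
    best_t, best_err = None, float("inf")
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        err = (sse([y[i] for i in rows if X[i][feature] <= t])
               + sse([y[i] for i in rows if X[i][feature] > t]))
        if err < best_err:
            best_t, best_err = t, err
    return best_t  # None when the subsample has a single distinct value

def affinity_matrix(X, y, n_trees=50, sample_frac=0.67, seed=0):
    """Assumed affinity: share of stumps placing two instances in the same leaf."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [[0] * n for _ in range(n)]
    size = max(2, round(sample_frac * n))
    used = 0
    for _ in range(n_trees):
        rows = rng.sample(range(n), size)   # subsample without replacement
        f = rng.randrange(p)                # one random feature per stump
        t = fit_stump(X, y, rows, f)
        if t is None:
            continue
        used += 1
        leaf = [X[i][f] <= t for i in range(n)]  # leaf of every instance
        for i, j in combinations(range(n), 2):
            if leaf[i] == leaf[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    return [[c / max(used, 1) for c in row] for row in counts]

def label_propagation(A, tau=0.5, max_sweeps=100):
    """Simplified label propagation on the network with edges where A[i][j] >= tau.
    Ties go to the smallest label (standard variants break ties at random)."""
    n = len(A)
    labels = list(range(n))
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            votes = {}
            for j in range(n):
                if j != i and A[i][j] >= tau:
                    votes[labels[j]] = votes.get(labels[j], 0.0) + A[i][j]
            if votes:
                best = min(votes, key=lambda lab: (-votes[lab], lab))
                if best != labels[i]:
                    labels[i], changed = best, True
        if not changed:
            break
    return labels

# Toy data (invented): two groups whose outcome y is a noisy surrogate of group membership.
X = [(0.1 * i, 1.0 + 0.2 * i) for i in range(6)] + [(9 + 0.1 * i, 7.0 + 0.2 * i) for i in range(6)]
y = [0.1 * i for i in range(6)] + [5 + 0.1 * i for i in range(6)]

A = affinity_matrix(X, y)
labels = label_propagation(A)
print(labels)
```

On this toy data the first six and last six instances receive two distinct community labels. A closer approximation of the article's setup would presumably swap the stump for a full classification or regression tree and the label propagation step for an established Louvain or Walktrap implementation.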

Journal

Applied Stochastic Models in Business and Industry, Wiley

Published: Mar 1, 2021

