To cluster web documents that all contain the same name entities, we first tried existing clustering algorithms such as K-means and spectral clustering. Unexpectedly, these algorithms proved ineffective on such documents. Our investigation found that clustering these web pages is harder for two reasons: (1) the number of true clusters (the ground truth) is larger than the two or three clusters typical of general clustering problems, and (2) the cluster sizes follow extremely skewed distributions. To overcome these problems, we propose in this paper a clustering algorithm that improves the accuracy of K-means and spectral clustering. In particular, to handle skewed cluster-size distributions, our algorithm performs both bisection and merge steps based on normalized cuts of the similarity graph G. Our experimental results show that our algorithm improves performance by approximately 56% compared to spectral bisection and 36% compared to K-means.
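As a rough illustration of the quantity that the bisection and merge steps rely on, the sketch below computes the normalized cut of a two-way partition of a small similarity graph. It assumes the standard definition, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), which may differ in detail from the criterion used in the paper. The function name `normalized_cut` and the toy matrix `W` are hypothetical and chosen only for this example.

```python
def normalized_cut(W, part_a):
    """Normalized cut of the bipartition (A, V minus A) of a weighted graph.

    W is a symmetric similarity matrix given as a list of lists;
    part_a is an iterable of node indices forming side A.
    Assumes the standard Ncut definition, not necessarily the paper's.
    """
    n = len(W)
    a = set(part_a)
    b = set(range(n)) - a
    if not a or not b:
        raise ValueError("both sides of the partition must be non-empty")

    # Total weight of edges crossing from A to B.
    cut = sum(W[i][j] for i in a for j in b)
    # Total weight of edges from each side to every node in the graph.
    assoc_a = sum(W[i][j] for i in a for j in range(n))
    assoc_b = sum(W[i][j] for i in b for j in range(n))
    if assoc_a == 0 or assoc_b == 0:
        raise ValueError("each side must have at least one weighted edge")

    return cut / assoc_a + cut / assoc_b


# Toy similarity graph (illustrative values): nodes 0-1 and 2-3 are
# strongly similar, and the two pairs are joined by one weak edge.
W = [
    [0.0, 1.0, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]

good_split = normalized_cut(W, {0, 1})  # separates the two tight pairs
bad_split = normalized_cut(W, {0, 2})   # cuts through both tight pairs
print(good_split, bad_split)
```

A lower value indicates a better split, so the partition that separates the two tight pairs scores lower than the one that cuts through them. Because each cut is divided by the total edge weight attached to its side, splitting off a small, weakly attached group is penalized less arbitrarily than with a raw cut value, which is one reason normalized cuts are a natural fit for skewed cluster sizes.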
Artificial Intelligence Review – Springer Journals
Published: Jan 18, 2011