MDL4BMF: Minimum Description Length for Boolean Matrix Factorization

Publisher
Association for Computing Machinery
Copyright
Copyright © 2014 by ACM Inc.
ISSN
1556-4681
DOI
10.1145/2601437

Abstract

Pauli Miettinen, Max-Planck Institute for Informatics
Jilles Vreeken, Max-Planck Institute for Informatics, Saarland University, University of Antwerp

Matrix factorizations, where a given data matrix is approximated by a product of two or more factor matrices, are powerful data mining tools. Among other tasks, matrix factorizations are often used to separate global structure from noise. This, however, requires solving the "model order selection problem" of determining the proper rank of the factorization, that is, to answer where fine-grained structure stops, and where noise starts.

Boolean Matrix Factorization (BMF), where data, factors, and matrix product are Boolean, has in recent years received increased attention from the data mining community. The technique has desirable properties, such as high interpretability and natural sparsity. Yet, so far no method for selecting the correct model order for BMF has been available.

In this article, we propose the use of the Minimum Description Length (MDL) principle for this task. Besides solving the problem, this well-founded approach has numerous benefits; for example, it is automatic, does not require a likelihood function, is fast, and, as experiments show, is highly accurate. We formulate the description length function for BMF in general, making it applicable
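As a rough illustration of the idea in the abstract, the sketch below computes a Boolean matrix product and compares two candidate ranks by a two-part description length (bits for the factors plus bits for the error matrix). The encoding used here is an assumption made for illustration only: each binary matrix is charged the bits needed to state its number of ones plus the bits needed to pick their positions. It is not claimed to be the encoding from the article, and all function names (`boolean_product`, `matrix_bits`, `description_length`) and the toy data are hypothetical.

```python
from math import comb, log2


def boolean_product(B, C):
    """Boolean matrix product: entry (i, j) is OR over l of (B[i][l] AND C[l][j])."""
    k = len(C)
    m = len(C[0]) if C else 0
    return [
        [int(any(row[l] and C[l][j] for l in range(k))) for j in range(m)]
        for row in B
    ]


def xor_matrix(X, Y):
    """Element-wise XOR: marks the cells where X and Y disagree."""
    return [[x ^ y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def matrix_bits(M):
    """Illustrative cost of a binary matrix (an assumed, simplistic encoding):
    bits to state how many ones there are, plus bits to choose their positions."""
    cells = sum(len(row) for row in M)
    ones = sum(sum(row) for row in M)
    return log2(cells + 1) + log2(comb(cells, ones))


def description_length(A, B, C):
    """Two-part cost: bits for the factors B and C, plus bits for the error A XOR (B o C)."""
    E = xor_matrix(A, boolean_product(B, C))
    return matrix_bits(B) + matrix_bits(C) + matrix_bits(E)


# Toy data (hypothetical): two non-overlapping 2x2 blocks of ones.
A = [[1, 1, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 1]]

# Rank-1 candidate: covers only the first block, the second block becomes error.
B1 = [[1], [1], [0], [0]]
C1 = [[1, 1, 0, 0]]

# Rank-2 candidate: covers both blocks exactly, so the error matrix is empty.
B2 = [[1, 0], [1, 0], [0, 1], [0, 1]]
C2 = [[1, 1, 0, 0],
      [0, 0, 1, 1]]

dl_rank1 = description_length(A, B1, C1)
dl_rank2 = description_length(A, B2, C2)
best_rank = 1 if dl_rank1 <= dl_rank2 else 2
```

On this toy matrix the rank-2 factorization costs fewer bits in total than the rank-1 one, because the extra factor is cheaper to describe than the error it removes. Choosing the rank with the smallest total cost is the kind of trade-off between model complexity and noise that model order selection is meant to resolve.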

Journal

ACM Transactions on Knowledge Discovery from Data (TKDD), Association for Computing Machinery

Published: Oct 7, 2014
