A theory of subtree matching and tree kernels based on the edit distance concept

Publisher: Springer Journals
Copyright: © 2015 by Springer International Publishing Switzerland
Subject: Computer Science; Artificial Intelligence (incl. Robotics); Mathematics, general; Computer Science, general; Statistical Physics, Dynamical Systems and Complexity
ISSN: 1012-2443
eISSN: 1573-7470
DOI: 10.1007/s10472-015-9467-5

Abstract

Edit distances provide us with an established method to capture structural features of data, and a distance between data objects represents their dissimilarity. In contrast, kernels form a category of similarity functions, and a positive definite kernel enables us to leverage abundant techniques of multivariate analysis. This paper aims to fill the gap between distances and kernels. In the literature, we have several formulas that convert a negative definite distance function into a positive definite kernel. Edit distance functions, however, are not necessarily negative definite, and our first contribution is to introduce an alternative method to derive positive definite kernels from edit distance functions that are not necessarily negative definite. The method is equipped with an easy-to-check and strong sufficient condition for positive definiteness, and the condition turns out to be closely related to the triangle inequality. In fact, to our knowledge, all of the edit distance functions in the literature that support the triangle inequality meet the condition for positive definiteness. Secondly, we apply this method to four well-known edit distance functions for trees to introduce four novel kernels and show that three of them are positive definite. Thirdly, we develop a theory of subtree matching to study these kernels. Our kernels count matchings between subtrees of the input trees with weights determined according to individual matchings. Although the number of such matchings is an exponential function of the size of the input trees (the number of vertices), our theory enables us to develop dynamic-programming-based algorithms, whose asymptotic computational complexities fall between a quadratic function and a cubic function of the size.
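
As a rough illustration of the background the abstract refers to, the sketch below applies one classical distance-to-kernel conversion, k(x, y) = exp(-λ d(x, y)), to the Levenshtein edit distance on strings. This is not the paper's alternative method, and strings stand in for trees only to keep the example short. The function names and the default λ = 0.5 are assumptions made for illustration. By Schoenberg's theorem this conversion yields a positive definite kernel for every λ > 0 exactly when d is negative definite, which is the property the abstract says edit distances may lack, so the resulting Gram matrix is not guaranteed to be positive semidefinite.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Unit-cost string edit distance via dynamic programming, O(len(a) * len(b))."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def exp_kernel(a: str, b: str, lam: float = 0.5) -> float:
    """Classical conversion exp(-lam * d); lam is an illustrative assumption."""
    return math.exp(-lam * levenshtein(a, b))


def gram_matrix(items, lam: float = 0.5):
    """Pairwise kernel values; positive semidefiniteness is NOT guaranteed here."""
    return [[exp_kernel(x, y, lam) for y in items] for x in items]
```

To check a particular data set, one could compute the eigenvalues of `gram_matrix(...)` with a linear-algebra library and look for negative values.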

Journal: Annals of Mathematics and Artificial Intelligence (Springer Journals)

Published: Jul 19, 2015
