Extracting indices from Japanese legal documents

Publisher
Springer Journals
Copyright
Copyright © 2015 by Springer Science+Business Media Dordrecht
Subject
Computer Science; Artificial Intelligence (incl. Robotics); International IT and Media Law, Intellectual Property Law; Philosophy of Law; Legal Aspects of Computing; Information Storage and Retrieval
ISSN
0924-8463
eISSN
1572-8382
DOI
10.1007/s10506-015-9168-8

Abstract

This article addresses the problem of automatically extracting legal indices, which express the important contents of legal documents. Legal indices are not limited to single-word and compound-word (or phrase) keywords; they also include clause keywords. We approach index extraction using the structural information of Japanese sentences, i.e. chunks and clauses. Based on the assumption that legal indices are composed of important tokens from the documents, we treat extracting legal indices as a problem of collecting chunks and clauses that contain as many important tokens as possible. Each token is assigned a weight, a statistical score such as TF–IDF or Okapi BM25, that indicates its importance. The importance of a chunk or clause is the average weight of the tokens it contains, and highly weighted chunks and clauses are then recognized as the indices for legal documents. Experimental results on Japanese National Pension Act data show that our proposed method achieves better performance (8.6% higher F1-score) than TextRank, the most popular unsupervised method for extracting single-word and compound-word keywords. In addition, the approach can also extract clause keywords with high performance.
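
The scoring idea in the abstract (weight each token, then rank chunks and clauses by the average weight of their tokens) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the documents are already tokenized and the candidate chunks and clauses are already identified, and it uses one plain TF–IDF variant; the exact weighting formula, the segmentation tools, and the selection cutoff used in the article are not given on this page. The function names `tfidf_weights` and `rank_candidates`, the `top_k` parameter, and the toy English tokens are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(docs, target):
    """Return a TF-IDF weight for each distinct token of docs[target].

    Assumption: TF is the relative frequency in the target document and
    IDF is log(N / df); the article may use a different variant.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    tf = Counter(docs[target])
    total = len(docs[target])
    return {
        tok: (count / total) * math.log(n_docs / df[tok])
        for tok, count in tf.items()
    }


def rank_candidates(candidates, weights, top_k):
    """Score each candidate by the mean weight of its tokens; keep top_k."""
    scored = []
    for cand in candidates:
        if not cand:
            continue  # an empty chunk has no average weight
        score = sum(weights.get(tok, 0.0) for tok in cand) / len(cand)
        scored.append((score, cand))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [cand for _, cand in scored[:top_k]]


# Toy, pre-tokenized corpus (hypothetical data); document 0 is the target.
docs = [
    ["pension", "insured", "person", "pension", "benefit"],
    ["person", "tax", "income"],
    ["person", "contract", "party"],
]
# Candidate chunks/clauses of the target document, given as token lists.
candidates = [["insured", "person"], ["pension", "benefit"], ["person"]]

weights = tfidf_weights(docs, target=0)
indices = rank_candidates(candidates, weights, top_k=2)
print(indices)
```

In this toy corpus "person" occurs in every document, so its weight is zero and it pulls down the average of any chunk that contains it. Averaging rather than summing keeps long clauses from outranking short chunks purely because of their length.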

Journal

Artificial Intelligence and Law, Springer Journals

Published: Sep 8, 2015
