Multi-word terms selection for information retrieval


References (50)

Publisher
Emerald Publishing
Copyright
© Emerald Publishing Limited
ISSN
2398-6247
eISSN
2398-6247
DOI
10.1108/idd-12-2021-0142

Abstract

Purpose: A number of approaches and algorithms have been proposed over the years as a basis for automatic indexing. Many of them suffer from poor precision at low recall levels. The choice of indexing units has a great impact on search system effectiveness. The authors go beyond single-term indexing to propose a framework for filtering and indexing multi-word terms (MWT).

Design/methodology/approach: In this paper, the authors rank MWT in order to filter them, keeping only the most effective ones for the indexing process. The proposed model filters MWT according to their ability to capture the document topic and to distinguish between different documents from the same collection. The authors rely on the hypothesis that the best MWT are those with the highest degree of association. The experiments are carried out on English and French data sets.

Findings: The results indicate that this approach improved precision at low recall levels and performed better than more advanced models based on term dependencies.

Originality/value: Different association measures are used and tested to select the MWT that best describe the documents, in order to improve precision among the first retrieved documents.
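The filtering idea described in the abstract (score candidate MWT by an association measure, keep the top-ranked ones as indexing units) can be illustrated with a minimal sketch. The abstract does not say which association measures the authors tested, so pointwise mutual information (PMI) over adjacent word pairs is used here purely as an assumed stand-in. The function names, the `min_count` and `top_k` parameters, and the toy documents are all hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def rank_bigrams_by_pmi(docs, min_count=2):
    """Rank adjacent word pairs by PMI (an assumed association measure)."""
    unigrams = Counter()
    bigrams = Counter()
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenization
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))

    # Highest association first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def select_index_terms(docs, top_k=2, min_count=2):
    """Keep the top_k highest-scoring pairs as candidate indexing units."""
    ranked = rank_bigrams_by_pmi(docs, min_count)
    return [" ".join(pair) for pair, _ in ranked[:top_k]]


# Toy collection (hypothetical, for illustration only)
docs = [
    "information retrieval systems rank documents",
    "information retrieval uses automatic indexing",
    "automatic indexing of documents uses terms",
]
selected = select_index_terms(docs, top_k=2)
print(selected)
```

In this sketch only pairs that occur at least `min_count` times are scored, which is a common guard because PMI tends to overrate rare co-occurrences. A fuller system would also need a candidate extraction step (for example, part-of-speech patterns) and a per-document weighting, neither of which is specified in the abstract.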

Journal

Information Discovery and Delivery, Emerald Publishing

Published: Jan 6, 2023

Keywords: Performance measurement; Statistics; Information systems; Information retrieval; Information science; Collection management; Indexing; Multi-word terms; Association measure; Precision
