Purpose
Many approaches and algorithms have been proposed over the years as a basis for automatic indexing. Many of them suffer from poor precision at low recall. The choice of indexing units has a great impact on search system effectiveness. The authors go beyond simple term indexing to propose a framework for filtering and indexing multi-word terms (MWT).

Design/methodology/approach
In this paper, the authors rank MWT and filter them by rank, keeping only the most effective ones for the indexing process. The proposed model filters MWT according to their ability to capture the document topic and to distinguish between different documents from the same collection. The authors rely on the hypothesis that the best MWT are those that achieve the highest degree of association. The experiments are carried out on English and French data sets.

Findings
The results indicate that this approach improves precision at low recall and performs better than more advanced models based on term dependencies.

Originality/value
The authors use and test different association measures to select the MWT that best describe the documents, in order to enhance precision among the first retrieved documents.
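The ranking-then-filtering idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which association measures the authors tested, so pointwise mutual information (PMI) is used here purely as an assumed example. The whitespace tokenizer, the restriction to two-word candidates, and the names `rank_bigrams_by_pmi`, `select_index_terms`, `min_count`, and `top_k` are all hypothetical choices made for this illustration and are not taken from the paper.

```python
import math
from collections import Counter


def rank_bigrams_by_pmi(docs, min_count=2):
    """Rank adjacent word pairs by PMI (an assumed association measure)."""
    unigrams = Counter()
    bigrams = Counter()
    for doc in docs:
        tokens = doc.lower().split()  # naive whitespace tokenization (assumption)
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scored.append(((w1, w2), math.log2(p_pair / (p_w1 * p_w2))))
    # Highest association first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


def select_index_terms(docs, top_k=10, min_count=2):
    """Keep only the top_k highest-scoring candidates as indexing units."""
    ranked = rank_bigrams_by_pmi(docs, min_count)
    return [pair for pair, _ in ranked[:top_k]]
```

Calling `select_index_terms(docs, top_k=50)` on a list of document strings would return the 50 word pairs with the highest PMI. Swapping in a different association measure only requires changing the scoring expression inside `rank_bigrams_by_pmi`.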
Information Discovery and Delivery – Emerald Publishing
Published: Jan 6, 2023
Keywords: Performance measurement; Statistics; Information systems; Information retrieval; Information science; Collection management; Indexing; Multi-word terms; Association measure; Precision