Abstract

Purpose: The authors study the extraction of useful phrases from a natural language database by statistical methods. The aim is to leverage human effort by providing preprocessed phrase lists with a high percentage of useful material.

Method: The approach is to develop six different scoring methods that are based on different aspects of phrase occurrence. The emphasis here is not on lexical information or syntactic structure but rather on the statistical properties of word pairs and triples that can be obtained from a large database.

Measurements: The Unified Medical Language System (UMLS) incorporates a large list of humanly acceptable phrases in the medical field as a part of its structure. The authors use this list of phrases as a gold standard for validating their methods. A good method is one that ranks the UMLS phrases high among all phrases studied. Measurements are 11-point average precision values and precision-recall curves based on the rankings.

Result: The authors find that each of the six scoring methods proves effective in identifying UMLS-quality phrases in a large subset of MEDLINE. These methods are applicable both to word pairs and word triples. All six methods are optimally combined to produce composite scoring methods that are more effective than any single method. The quality of the composite methods appears sufficient to support the automatic placement of hyperlinks in text at the site of highly ranked phrases.

Conclusion: Statistical scoring methods provide a promising approach to the extraction of useful phrases from a natural language database for the purpose of indexing or providing hyperlinks in text.
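The 11-point average precision named under Measurements can be illustrated with a short sketch. This is not the authors' code: it assumes the common interpolated definition (the highest precision reached at any recall at or above each of the levels 0.0, 0.1, ..., 1.0, averaged over the 11 levels), it assumes recall is counted only over gold-standard phrases that appear in the ranked list, and the function name and example phrases are hypothetical.

```python
from typing import List, Set


def eleven_point_average_precision(ranked: List[str], gold: Set[str]) -> float:
    """Interpolated 11-point average precision of a ranked phrase list.

    Assumptions (not taken from the paper): `ranked` holds unique phrases,
    best score first, and recall is measured against the gold phrases that
    actually occur in `ranked`.
    """
    total_relevant = len(gold.intersection(ranked))
    if total_relevant == 0:
        return 0.0

    # Record (recall, precision) each time a gold phrase is encountered.
    points = []
    hits = 0
    for rank, phrase in enumerate(ranked, start=1):
        if phrase in gold:
            hits += 1
            points.append((hits / total_relevant, hits / rank))

    # Interpolated precision at a recall level is the best precision
    # observed at any recall greater than or equal to that level.
    interpolated = []
    for step in range(11):
        level = step / 10
        reachable = [prec for rec, prec in points if rec >= level - 1e-12]
        interpolated.append(max(reachable) if reachable else 0.0)

    return sum(interpolated) / 11


if __name__ == "__main__":
    # Hypothetical candidate phrases, ordered by some scoring method.
    ranked_phrases = ["heart failure", "was found", "blood pressure", "of the"]
    gold_phrases = {"heart failure", "blood pressure"}
    print(eleven_point_average_precision(ranked_phrases, gold_phrases))
```

A scoring method that places every gold-standard phrase ahead of every other candidate scores 1.0 under this definition, which matches the abstract's criterion that a good method ranks the UMLS phrases high among all phrases studied.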
Journal of the American Medical Informatics Association – Oxford University Press
Published: Sep 1, 2000