Abstract

Objective: We aim to identify duplicate pairs of Medline citations, particularly when the documents are not identical but contain similar information.

Materials and methods: Duplicate pairs of citations are identified by comparing word n-grams in pairs of documents. N-grams are modified using two approaches which take account of the fact that the document may have been altered. These are: (1) deletion, an item in the n-gram is removed; and (2) substitution, an item in the n-gram is substituted with a similar term obtained from the Unified Medical Language System Metathesaurus. N-grams are also weighted using a score derived from a language model. Evaluation is carried out using a set of 520 Medline citation pairs, including a set of 260 manually verified duplicate pairs obtained from the Deja Vu database.

Results: The approach accurately detects duplicate Medline document pairs with an F1 measure score of 0.99. Allowing for word deletions and substitution improves performance. The best results are obtained by combining scores for n-grams of length 1–5 words.

Discussion: Results show that the detection of duplicate Medline citations can be improved by modifying n-grams and that high performance can also be obtained using only unigrams (F1=0.959), particularly when allowing for substitutions of alternative phrases.
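The deletion and substitution steps described in the methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the containment-style score, the rule that a deletion variant is matched against the source's (n-1)-grams, and the toy `synonyms` dictionary (standing in for a UMLS Metathesaurus lookup) are all assumptions made for illustration. Language-model weighting is omitted.

```python
from typing import Dict, List, Optional, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> Set[NGram]:
    """Return the set of word n-grams of length n."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def with_deletions(gram: NGram) -> Set[NGram]:
    """Variants of an n-gram with exactly one item removed."""
    if len(gram) < 2:
        return set()
    return {gram[:i] + gram[i + 1:] for i in range(len(gram))}


def with_substitutions(gram: NGram, synonyms: Dict[str, Set[str]]) -> Set[NGram]:
    """Variants with exactly one item replaced by a listed similar term."""
    variants: Set[NGram] = set()
    for i, word in enumerate(gram):
        for alt in synonyms.get(word, ()):
            variants.add(gram[:i] + (alt,) + gram[i + 1:])
    return variants


def containment(source: List[str], suspect: List[str], n: int,
                synonyms: Optional[Dict[str, Set[str]]] = None,
                allow_modified: bool = True) -> float:
    """Fraction of suspect n-grams found in the source.

    Assumed matching rule (hypothetical): an n-gram matches if it occurs in
    the source exactly, if a substitution variant occurs in the source, or
    if a deletion variant occurs among the source's (n-1)-grams.
    """
    suspect_grams = ngrams(suspect, n)
    if not suspect_grams:
        return 0.0
    source_grams = ngrams(source, n)
    shorter = ngrams(source, n - 1) if n > 1 else set()
    matched = 0
    for gram in suspect_grams:
        if gram in source_grams:
            matched += 1
        elif allow_modified and (
            with_substitutions(gram, synonyms or {}) & source_grams
            or with_deletions(gram) & shorter
        ):
            matched += 1
    return matched / len(suspect_grams)


# Toy data; the synonym table is a stand-in for a Metathesaurus lookup.
source = "the patient was treated with aspirin".split()
suspect = "the patient was given aspirin".split()
synonyms = {"given": {"treated"}}

exact_only = containment(source, suspect, 3, allow_modified=False)
modified = containment(source, suspect, 3, synonyms)
```

On this toy pair, only one of the three suspect trigrams matches exactly, while allowing a substitution lets a second trigram match, so the modified score is higher than the exact-only score. Scores for several n-gram lengths could then be combined, as the results describe for lengths 1–5.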
Journal of the American Medical Informatics Association – Oxford University Press
Published: Jan 28, 2014