Objective: To assess the variability in data distributions among data sources and over time through a case study of a large multisite repository as a systematic approach to data quality (DQ).
Materials and Methods: Novel probabilistic DQ control methods based on information theory and geometry are applied to the Public Health Mortality Registry of the Region of Valencia, Spain, with 512 143 entries from 2000 to 2012, disaggregated into 24 health departments. The methods provide DQ metrics and exploratory visualizations for (1) assessing the variability among multiple sources and (2) monitoring and exploring changes with time. The methods are suited to big data and to multitype, multivariate, and multimodal data.
Results: The repository was partitioned into 2 probabilistically separated temporal subgroups following a change in the Spanish National Death Certificate in 2009. Isolated temporal anomalies were observed, caused by a transient increase in missing data, along with outlying and clustered health departments attributable to differences in populations or in practices.
Discussion: Changes in protocols, differences in populations, biased practices, or other systematic DQ problems affected data variability. Even if semantic and integration aspects are addressed in data sharing infrastructures, probabilistic variability may still be present. Solutions include fixing or excluding data and analyzing different sites or time periods separately. A systematic approach to assessing temporal and multisite variability is proposed.
Conclusion: Multisite and temporal variability in data distributions affects DQ, hindering data reuse, and an assessment of such variability should be a part of systematic DQ procedures.
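To make the idea of multisource variability more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not name the distance or the metrics used, so the sketch assumes the Jensen-Shannon distance between each source's categorical distribution and a simple "mean distance to the other sources" score as an outlyingness indicator. The function names, the source labels, and the toy category frequencies are all hypothetical.

```python
import math
from collections import Counter


def to_distribution(values, categories):
    """Relative frequency of each category in `values` (zero if absent)."""
    counts = Counter(values)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no observations fall in the given categories")
    return [counts[c] / total for c in categories]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Jensen-Shannon distance (square root of the JS divergence, base 2).

    Assumption: this distance stands in for whatever measure the paper uses.
    It is symmetric and bounded in [0, 1] for base-2 logarithms.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding


def mean_distance_to_others(distributions):
    """Hypothetical outlyingness score: mean JS distance to every other source."""
    names = list(distributions)
    scores = {}
    for a in names:
        dists = [js_distance(distributions[a], distributions[b])
                 for b in names if b != a]
        scores[a] = sum(dists) / len(dists)
    return scores


# Toy data (invented for illustration): one coded variable, three sources.
categories = ["c1", "c2", "c3"]
sources = {
    "dept_A": ["c1"] * 50 + ["c2"] * 30 + ["c3"] * 20,
    "dept_B": ["c1"] * 48 + ["c2"] * 32 + ["c3"] * 20,
    "dept_C": ["c1"] * 10 + ["c2"] * 10 + ["c3"] * 80,
}
distributions = {name: to_distribution(vals, categories)
                 for name, vals in sources.items()}
scores = mean_distance_to_others(distributions)
most_outlying = max(scores, key=scores.get)
```

The same pairwise distances could be computed between consecutive time batches (for example, months) instead of between sites to explore temporal changes; a pairwise distance matrix of this kind is also the usual input to a low-dimensional projection for visual inspection.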
Journal of the American Medical Informatics Association – Oxford University Press
Published: Nov 23, 2016