ATLAS: an automated association test using probabilistically linked health records with application to genetic studies

Harrison G Zhang; Boris P Hejblum; Griffin M Weber; Nathan P Palmer; Susanne E Churchill; Peter Szolovits; Shawn N Murphy; Katherine P Liao; Isaac S Kohane; Tianxi Cai

doi:10.1093/jamia/ocab187



Publisher: Oxford University Press
Copyright: © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com
ISSN: 1067-5027
eISSN: 1527-974X

Abstract

Objective: Large amounts of health data are becoming available for biomedical research. Synthesizing information across databases may capture more comprehensive pictures of patient health and enable novel research studies. When no gold standard mappings between patient records are available, researchers may probabilistically link records from separate databases and analyze the linked data. However, previous linked data inference methods are constrained to certain linkage settings and exhibit low power. Here, we present ATLAS, an automated, flexible, and robust association testing algorithm for probabilistically linked data.

Materials and Methods: Missing variables are imputed at various thresholds using a weighted average method that propagates uncertainty from probabilistic linkage. Next, estimated effect sizes are obtained using a generalized linear model. ATLAS then conducts the threshold combination test by optimally combining P values obtained from data imputed at varying thresholds using Fisher’s method and perturbation resampling.

Results: In simulations, ATLAS controls for type I error and exhibits high power compared to previous methods. In a real-world genetic association study, meta-analysis of ATLAS-enabled analyses on a linked cohort with analyses using an existing cohort yielded additional significant associations between rheumatoid arthritis genetic risk score and laboratory biomarkers.

Discussion: Weighted average imputation weathers false matches and increases contribution of true matches to mitigate linkage error-induced bias. The threshold combination test avoids arbitrarily choosing a threshold to rule a match, thus automating linked data-enabled analyses and preserving power.

Conclusion: ATLAS promises to enable novel and powerful research studies using linked data to capitalize on all available data sources.
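
The two building blocks named in the abstract, weighted average imputation at a threshold and Fisher's method for combining P values, can be illustrated with a minimal Python sketch. This is not the ATLAS implementation: the function names, the match-probability matrix layout, and the choice to zero out candidates below the threshold and renormalize the remaining weights are all assumptions made for illustration. The chi-square reference distribution shown for Fisher's statistic is valid only for independent P values; P values computed from the same data at different thresholds are dependent, which is presumably why the abstract pairs Fisher's method with perturbation resampling rather than relying on that reference distribution.

```python
import math


def weighted_average_impute(match_probs, values, threshold):
    """Impute one value per row as a match-probability-weighted mean.

    Hypothetical layout: match_probs[i][j] is the estimated probability that
    record i in the first database matches record j in the second, and
    values[j] is the variable observed only in the second database.
    Candidates below `threshold` get zero weight (an assumption of this
    sketch); rows with no candidate at or above the threshold yield None.
    """
    imputed = []
    for row in match_probs:
        weights = [p if p >= threshold else 0.0 for p in row]
        total = sum(weights)
        if total == 0.0:
            imputed.append(None)
        else:
            imputed.append(sum(w * v for w, v in zip(weights, values)) / total)
    return imputed


def fisher_statistic(p_values):
    """Fisher's combined statistic: -2 times the sum of log P values."""
    return -2.0 * sum(math.log(p) for p in p_values)


def chi2_sf_even_df(x, k):
    """Upper-tail probability of a chi-square variable with 2*k degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{i<k} (x/2)^i / i!, which holds for
    even degrees of freedom. As a reference for Fisher's statistic it assumes
    the k P values are independent.
    """
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

In this sketch, a lower threshold lets more candidate matches contribute to each imputed value, while a higher threshold keeps only the confident ones; computing one P value per threshold and combining them is what removes the need to pick a single cutoff.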

Journal

Journal of the American Medical Informatics Association – Oxford University Press

Published: Oct 5, 2021

Keywords: electronic health records; record linkage; genetic association studies; biorepositories; perturbation resampling
