Using Electronic Health Records for the Learning Health System: Creation of a Diabetes Research Registry

JMIR MEDICAL INFORMATICS | Wells et al | https://medinform.jmir.org/2022/9/e39746 | JMIR Med Inform 2022 | vol. 10 | iss. 9 | e39746

Abstract

Electronic health records (EHRs) were originally developed for clinical care and billing. As such, the data are not collected, organized, and curated in a fashion that is optimized for secondary use to support the Learning Health System. Population health registries provide tools to support quality improvement. These tools are generally integrated with the live EHR, are intended to use a minimum of computing resources, and may not be appropriate for some research projects. Researchers may require different electronic phenotypes and variable definitions from those typically used for population health, and these definitions may vary from study to study. Establishing a formal registry that is mapped to the Observational Medical Outcomes Partnership common data model provides an opportunity to add custom mappings and more easily share these with other institutions. Performing preprocessing tasks such as data cleaning, calculation of risk scores, time-to-event analysis, imputation, and transforming data into a format for statistical analyses will improve efficiency and make the data easier to use for investigators. Research registries that are maintained outside the EHR also have the luxury of using significant computational resources without jeopardizing clinical care data. This paper describes a virtual Diabetes Registry at Atrium Health Wake Forest Baptist and the plan for its continued development.

(JMIR Med Inform 2022;10(9):e39746) doi: 10.2196/39746

KEYWORDS
electronic health record; EHR; Learning Health System; registry; diabetes

Background

The first electronic health records (EHRs) were developed to support clinical care, but later became primarily focused on billing after the creation of diagnosis-related group (DRG) codes [1]. DRGs are intended to provide precise estimates of resource use across different hospitals. Unfortunately, the documentation necessary to support billing frequently does not result in a data content and structure ideal for the secondary use of these data for research. Safran et al [2] outlined a framework for using EHR data for secondary purposes. The use of EHR data for research purposes has increased significantly at Wake Forest and elsewhere over the past several years. However, complex outcome studies that use data at different time points are still rare.

Research investigators struggle with the processing and statistical analyses of EHR-derived data due to the time-varying nature, inconsistency, inaccuracy, lack of documentation, and incompleteness of clinical data. Investigators report that the amount of time spent deciphering and cleaning these data makes many research projects impractical. A systematic review of the use of EHR data for population health identified several common barriers for the use of these data for population health, of which missing data were most cited [3]. Handling of missing data requires an understanding of the reasons for missing data, some of which can be project-specific reasons and related to decisions about how to handle them. Simply excluding patients with missing data may reduce sample size and can lead to biased results. One common method for handling missing data in EHR projects is multiple imputation, where statistical models are used to estimate values for missing data elements [4]. Investigators may be unfamiliar with these techniques or may lack the knowledge and skills to perform the task in a robust fashion. Imputation will be one of the services provided to investigators. Prior to imputation, it is necessary to explore data to identify implausible values that may arise due to inaccurate measurements or data entry errors. Textbox 1 highlights some of the data-processing steps that may be required prior to using clinical data for statistical analyses.

Textbox 1. Common data-processing steps required to analyze clinical data.

Common data-processing steps
- Removal of extreme values
- Correction of erroneous entries
- Imputation of missing values
- Calculation of predefined variables
- Determination of active medication classes on a given date
- Calculation of dates and time to events
- Creation of a single analytic data set with a single row per patient from normalized tables
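As a rough illustration of the first and third steps in Textbox 1, the sketch below sets implausible values to missing and then fills the gaps. The field names and plausibility limits are hypothetical, and single median imputation is used only as a simple stand-in; it is not the multiple imputation approach described in the text.

```python
from statistics import median

# Hypothetical plausibility limits; real limits would be set per study.
PLAUSIBLE_RANGES = {
    "systolic_bp": (50, 300),  # mm Hg
    "hba1c": (3.0, 20.0),      # percent
}


def remove_extreme_values(rows, ranges=PLAUSIBLE_RANGES):
    """Return copies of rows with out-of-range values set to None (missing)."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for field, (low, high) in ranges.items():
            value = new_row.get(field)
            if value is not None and not (low <= value <= high):
                new_row[field] = None
        cleaned.append(new_row)
    return cleaned


def impute_median(rows, field):
    """Fill missing values of one field with the median of observed values.

    Single median imputation is a stand-in; it is not multiple imputation.
    """
    observed = [r[field] for r in rows if r.get(field) is not None]
    if not observed:
        return [dict(r) for r in rows]
    fill = median(observed)
    return [
        {**r, field: fill if r.get(field) is None else r[field]} for r in rows
    ]


patients = [
    {"patient_id": 1, "systolic_bp": 128, "hba1c": 6.9},
    {"patient_id": 2, "systolic_bp": 1200, "hba1c": 7.4},  # likely entry error
    {"patient_id": 3, "systolic_bp": 142, "hba1c": None},
]
cleaned = remove_extreme_values(patients)
completed = impute_median(impute_median(cleaned, "systolic_bp"), "hba1c")
```

Both functions return new rows rather than editing the input, which mirrors the idea of cleaning at the registry or extract level while leaving the source clinical data untouched.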
Research Registries

Research registries derived from the EHR can provide a foundation that improves the efficiency for research projects in a specific disease area. Registries can provide formal documentation of the institutional knowledge gained over time from previous investigations and input from the research community. The sharing of experiences provides an opportunity for critical evaluation of the data from investigators with different areas of expertise, leading to improved data quality and knowledge of the data necessary for interinstitutional projects. Preprocessed data, predefined variables, linkage with other institutional databases (eg, echocardiogram and pulmonary function tests), linkage with external data (eg, American Community Survey and North Carolina Death Registry), and creation of statistical functions can greatly reduce the time and cost of secondary data analyses. Data preprocessing can include data cleaning (eg, removal of extreme values and imputation of missing data), which can reduce the risk of biased results but would be inappropriate for clinical data. Prescription medications provide another opportunity for data preprocessing. For example, calculation of dosages and quantity of medications can be determined by applying regular expressions to free text prescription instructions.

Research registries also provide a mechanism for pooling knowledge and resources from disparate research areas. For example, chart reviews conducted for one specific research study could provide important knowledge that benefits all users of the registry. Similarly, researchers could pool resources to purchase external data (eg, National Death Index or Centers for Medicare & Medicaid Services [CMS] data) that will benefit all. Research registries provide a repository for collecting research items not intended for the legal medical record to support activities such as creating risk prediction models and conducting epidemiologic studies. Furthermore, the research registries also provide potential populations of patients for research studies (clinical trials, pragmatic trials, implementation science, population health, and medical informatics). The increased recognition and credibility of an institution’s clinical data for research that comes with a successful registry can improve the chances for research funding.

Population Health Registries

There has been a proliferation of population health registries in EHR systems. These real-time data are necessary for clinical care, and these registries are designed to put minimal burden on the EHR system, especially given that they are using the live EHR system, which is critical for clinical care. These types of EHR-based population health registry tools (eg, Healthy Planet, Epic Systems) provide current snapshots of patients and are helpful for population health management. These operational reporting tools are fast, provide real-time data, and are incorporated into the clinical workflow. These minute-by-minute updates of clinical data are unnecessary for many types of secondary data analyses. Population health registries have motivations that may differ from research investigations. For example, population health registries support quality-based metrics such as indicators maintained by the National Quality Foundation, which may be publicly reported and are used to guide reimbursement incentives for programs such as the Medicare Shared Savings Plan. In these instances, disease phenotypes and variable definitions are pre-defined by the interested parties. In this scenario, there may be a single criterion used to define the population and associated metrics. Creating additional criteria would be counterproductive. By contrast, a research registry should provide comprehensive data on members collected over time, requires statistical analyses, and may contain multiple definitions for the same variable. These data allow evaluations at user-defined time points or time-varying analyses. Because the tool is not integrated into clinical workflows, there is an opportunity to incorporate large quantities of data into computationally intensive analyses that would otherwise be a drain on clinical systems.

Population health registries are ideally suited for clinical care and quality improvement in that they are available instantaneously on the live EHR, have standardized definitions, and use limited computing resources. By contrast, the type of research registry that we have created enables the creation of different cohorts for the same disease entities, makes use of additional computation resources that would be inappropriate for the clinical EHR, and allows different variable definitions depending on the specific study. Table 1 lists additional differences between our research registry and population health registries. Registries created from EHR data may have different goals and requirements. The table compares features of research and population health registries.

It should also be noted that EHR vendors each use their own proprietary technical data models that will map to ontologies such as International Classification of Diseases codes. The precise mappings are not made publicly available, which makes multicenter studies involving different EHR systems more difficult. The registry we have built is mapped to the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). CDMs such as OMOP have been instrumental in creating interoperability standards in support of clinical research networks that span multiple institutions. This registry will take advantage of the data mappings available in OMOP and benefit from the automated tools developed for OMOP for identifying potential data issues. The Phenotype Knowledge Base contains a repository of electronic phenotypes to support registry construction and variable definitions [5]. These phenotypes have been successfully integrated into the OMOP data model to facilitate implementation at different research institutions [6]. We will also have the opportunity to create additional custom mappings to our OMOP instance, which can be leveraged by local researchers.

Table 1. Characteristics of research registries vs population health registries.
Research registry | Population health registry
Intermittent updates | Real-time updates
Higher computational resources | Low resource use
Complex definitions from a variety of sources and multiple definitions for similar concepts | Simple definitions defined by QI-based reimbursement
Variety of external data sources | Data limited to EHR
Extensive data processing | Limited data processing
Complex temporal relationships | Single point in time
Easily accessible and detailed documentation | Documentation or coding sometimes lacking or not easily accessible
Does not need to be integrated into workflow | Integration in clinic workflow is crucial
Does not require front-end EHR access | Requires front-end EHR access with PHI
Mapped to open-source common data models | Mapped to vendor-based technical data models

QI: quality improvement. EHR: electronic health record. PHI: protected health information.

Custom Phenotypes

As mentioned previously, research projects may require variable definitions that are different from quality-based metrics, and variable definitions may vary from one project to the next. Varying variable definitions are also necessary for cohort discovery. The definition of diabetes may differ between projects. For example, a case control study needing a limited number of cases may want to have a highly specific definition for type 2 diabetes such as the one created by Kho [7]. By contrast, a study evaluating the accuracy of different electronic phenotypes may require a highly sensitive definition to capture all possible diabetes cases for manual chart review [8]. Figure 1 shows a Venn diagram illustrating the different patient populations that would be captured from our data warehouse depending on whether one uses diagnosis codes, hemoglobin A1c laboratory values, or prescriptions for hypoglycemic medications.

In other instances, existing definitions may be available from agencies such as Agency for Healthcare Research and Quality or the CMS. For example, we used the CMS definition for an acute exacerbation of chronic obstructive pulmonary disease for a study looking at the impact of a chronic obstructive pulmonary disease care pathway on reducing readmissions [9]. In addition to phenotypes used for cohort discovery, research projects require definitions for covariates included in the statistical analyses. Depending on the situation, investigators may desire different definitions for comorbidities such as hypertension. Textbox 2 shows the contrast between an example of a simple definition for hypertension based on diagnoses codes vs a complex definition that might be used for a study, where maximizing the sensitivity for identifying hypertension is key.

Figure 1. Sets of patients with possible diabetes according to definitions based on diagnoses codes (DX), laboratory values (LAB), or prescriptions (RX).

Textbox 2. Example definitions of hypertension.

Research registry
- International Classification of Diseases (ICD) code for hypertension (HTN) in encounter diagnoses, past medical history, or problem list
OR
- Minimum of 3 blood pressure (BP) readings >140/90 over 3 months in the electronic health records
  - Outpatient BP excluding urgent care clinics, emergency department, or observation visits
  - Based on last BP of encounter
  - Exclude BPs when associated temperature ≥38 °C
OR
- Active prescription for an antihypertensive agent

Population health
- ICD code for HTN in encounter diagnoses
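The two definitions in Textbox 2 can be expressed as small functions, which makes the difference in sensitivity concrete. This is only a sketch under stated assumptions: the record layout and field names are hypothetical, ">140/90" is read as systolic >140 or diastolic >90, "over 3 months" is read as any 90-day span, and the fever exclusion is applied after the last reading of each encounter is chosen.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=90)  # assumption: "over 3 months" = any 90-day span


def is_elevated(reading):
    # Assumption: ">140/90" means systolic >140 or diastolic >90.
    return reading["systolic"] > 140 or reading["diastolic"] > 90


def qualifying_bp_times(readings):
    """Times of elevated outpatient readings that pass the exclusions."""
    last_in_encounter = {}
    for r in readings:
        # Urgent care, emergency, and observation visits carry other setting
        # labels in this hypothetical schema, so they drop out here.
        if r["setting"] != "outpatient":
            continue
        prev = last_in_encounter.get(r["encounter_id"])
        if prev is None or r["taken_at"] > prev["taken_at"]:
            last_in_encounter[r["encounter_id"]] = r
    return sorted(
        r["taken_at"]
        for r in last_in_encounter.values()
        if is_elevated(r) and not (r["temp_c"] is not None and r["temp_c"] >= 38.0)
    )


def has_three_within_window(times, window=WINDOW):
    """True if any 3 sorted times fall within one window."""
    return any(times[i + 2] - times[i] <= window for i in range(len(times) - 2))


def hypertension_simple(patient):
    """Population health style: ICD code in encounter diagnoses only."""
    return patient["htn_code_in_encounter_dx"]


def hypertension_sensitive(patient):
    """Research style: any one of the three Textbox 2 criteria."""
    has_code = (
        patient["htn_code_in_encounter_dx"]
        or patient["htn_code_in_history"]
        or patient["htn_code_in_problem_list"]
    )
    has_bp = has_three_within_window(qualifying_bp_times(patient["bp_readings"]))
    return has_code or has_bp or patient["active_antihypertensive_rx"]


def bp(encounter_id, month, day, systolic, diastolic,
       setting="outpatient", temp_c=36.8):
    """Build one hypothetical BP reading."""
    return {"encounter_id": encounter_id, "taken_at": datetime(2022, month, day),
            "systolic": systolic, "diastolic": diastolic,
            "setting": setting, "temp_c": temp_c}


uncoded_patient = {
    "htn_code_in_encounter_dx": False,
    "htn_code_in_history": False,
    "htn_code_in_problem_list": False,
    "active_antihypertensive_rx": False,
    "bp_readings": [
        bp(1, 1, 10, 150, 85),
        bp(2, 2, 14, 138, 94),
        bp(3, 3, 21, 146, 88),
        bp(4, 2, 1, 180, 100, setting="emergency"),
    ],
}
febrile_patient = dict(uncoded_patient, bp_readings=[
    bp(1, 1, 10, 150, 85),
    bp(2, 2, 14, 138, 94),
    bp(3, 3, 21, 146, 88, temp_c=38.5),
])
```

Here `uncoded_patient` has no hypertension code, so the simple definition misses the patient while the sensitive definition flags the patient from three qualifying readings; `febrile_patient` loses one reading to the fever exclusion and is not flagged by either.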
OMOP Limitations

While the use of OMOP has many advantages in terms of standardization, there are still significant areas of limitations. Medications are one area where common data models are still lacking. For example, OMOP contains a single drug exposure table for prescriptions, drug administration, dispensing information, and patient-reported information. Unfortunately, dispensing information, patient-reported information, and compliance are rarely captured in structured EHR data. In addition, there are no explicitly linked medical reasons for the exposures in OMOP, and the RxNorm categorizations may not be appropriate for a specific research study. A registry cannot resolve all these issues, but the structure provides the flexibility to create and validate new phenotypes. For example, researchers can create and share relevant medication groupings, and algorithms based on specific prescription information (eg, dates of prescriptions, stop dates, number of pills, and number of refills) can be created as proxies for active medications and compliance. Similarly, associated information (eg, presence or absence of different diagnoses codes and laboratory values) can define reason for medication. These new phenotypes can be used locally and shared with the OMOP community without being formally integrated into the OMOP model.

OMOP will not be able to represent all the new phenotypes that the registry will require, making it necessary to characterize our own concepts. Some of these concepts may be derived entirely from existing OMOP concepts, but many will require the creation of our own. Like all CDMs, OMOP has limitations in its capacity to represent information inherent to the transformation from one data model (eg, EHR) to another. In addition, it will be crucial to have a formal data quality structure in place to ensure mappings are correct and routinely updated as data change. We have established a phenotype working group that includes the authors as well as additional faculty members in the Center for Biomedical Informatics.
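The idea of using prescription details as a proxy for active medications can be sketched as follows. The record fields, the medication class labels, and the 30-day grace period are all hypothetical choices made for illustration; a real algorithm would need validation against chart review or dispensing data.

```python
from datetime import date, timedelta


def estimated_end_date(rx, grace_days=30):
    """Estimate the last day a prescription is treated as active.

    Supply is estimated from quantity, refills, and pills per day; the grace
    period (an assumed value) allows for late refills. A recorded stop date
    always caps the estimate.
    """
    total_pills = rx["quantity"] * (1 + rx["refills"])
    days_supplied = total_pills // rx["pills_per_day"]
    end = rx["start_date"] + timedelta(days=days_supplied + grace_days)
    if rx.get("stop_date") is not None:
        end = min(end, rx["stop_date"])
    return end


def active_classes_on(prescriptions, on_date):
    """Return the set of medication classes judged active on on_date."""
    return {
        rx["drug_class"]
        for rx in prescriptions
        if rx["start_date"] <= on_date <= estimated_end_date(rx)
    }


prescriptions = [
    {"drug_class": "biguanide", "start_date": date(2022, 1, 1),
     "quantity": 60, "refills": 2, "pills_per_day": 2, "stop_date": None},
    {"drug_class": "ace_inhibitor", "start_date": date(2022, 1, 1),
     "quantity": 30, "refills": 0, "pills_per_day": 1,
     "stop_date": date(2022, 1, 10)},
]
active_in_march = active_classes_on(prescriptions, date(2022, 3, 15))
```

In this example the first prescription covers 90 days of supply plus the grace period, so it counts as active in mid-March, while the second is cut off by its stop date.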
Data Structure

The data structure of the EHR database, a typical population health registry, and a research registry can vary significantly. EHR databases are stored in database management systems using individual, partially normalized tables for each specific data domain. This structure reduces storage space and speeds data extractions. By contrast, most statistical analyses require an individual flat data table (also known as pivot table), where the unit of analyses is the individual rows of patients. The data sets need to include columns for both independent and dependent variables and may require calculations of follow-up time between baseline variables and the outcomes of interest. Some external variables that do not exist in the EHR may be linked with the data set. For example, we link our registry with the North Carolina state death index, allowing better ascertainment of mortality outcomes and censoring of follow-up time. Variables may also be derived from the source data (eg, highest blood pressure in the past 24 hours), and time-dependent analyses necessitate multiple rows for each patient that reflect the patient’s current state at a given point in time. Figure 2 provides a graphical representation of the different data structures between the EHR database, an EHR-based population health registry, and a research registry.

Figure 2. Comparing data structures of electronic health records (EHR), population health registries, and research registries.
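The step from normalized tables to a flat analytic table can be illustrated with a small sketch. The table layouts, field names, and dates below are hypothetical; the example takes the most recent laboratory value on or before a baseline date and computes follow-up time, censoring at the study end unless a linked death record falls inside the study period.

```python
from datetime import date

# Hypothetical normalized source tables.
patients = {
    1: {"sex": "F"},
    2: {"sex": "M"},
}
labs = [
    {"patient_id": 1, "name": "hba1c", "value": 7.2, "date": date(2019, 5, 1)},
    {"patient_id": 1, "name": "hba1c", "value": 8.1, "date": date(2019, 11, 15)},
    {"patient_id": 1, "name": "hba1c", "value": 6.5, "date": date(2020, 3, 1)},
]
deaths = {2: date(2021, 6, 30)}  # eg, from a linked external death index


def latest_lab(patient_id, name, on_or_before):
    """Most recent value of one lab on or before a date, or None."""
    candidates = [
        lab for lab in labs
        if lab["patient_id"] == patient_id
        and lab["name"] == name
        and lab["date"] <= on_or_before
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda lab: lab["date"])["value"]


def build_analytic_row(patient_id, baseline, study_end):
    """One flat row per patient: covariates, outcome flag, follow-up time."""
    death_date = deaths.get(patient_id)
    died = death_date is not None and baseline <= death_date <= study_end
    end = death_date if died else study_end  # censor at study end if alive
    return {
        "patient_id": patient_id,
        "sex": patients[patient_id]["sex"],
        "baseline_hba1c": latest_lab(patient_id, "hba1c", baseline),
        "died": died,
        "followup_days": (end - baseline).days,
    }


baseline, study_end = date(2020, 1, 1), date(2021, 12, 31)
analytic_table = [build_analytic_row(pid, baseline, study_end) for pid in patients]
```

For a time-dependent analysis the same function could be called at several baseline dates per patient, giving multiple rows that each reflect the patient's state at that time.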
Diabetes-Specific Registry

We chose diabetes as one of the first registries to make available in our Clinical and Translational Science Award Program given that it represents a focus area of our research enterprise. In addition, diabetes is a natural choice for a research registry given the rising incidence, chronic nature, established quality metrics, comorbidities, availability of treatments, and research funding. Research also indicates that blood sugar and associated risk factors are poorly controlled in patients with diabetes. In addition to a desire for improving the health of their patients, health care institutions have direct financial incentives for adequately treating patients with diabetes. Quality indicators approved by a successful diabetes research registry would provide an opportunity for the creation of risk prediction models that could be used to target patients at high risk as well as those who are most likely to benefit from a specific intervention. Thorough statistical evaluations of quality improvement projects and population health interventions would provide crucial feedback on the potential net benefits of these programs.

The identification of diabetes in EHR is surprisingly complex. Common methods for identifying potential cases include searches for medications, laboratory values, and diagnosis codes. Each of these approaches has its own limitations. Medications used for diabetes may also be used to treat other conditions. For example, metformin is commonly prescribed for polycystic ovarian syndrome in women. Blood glucose values may be abnormally elevated due to inadequate fasting times, which are generally not easily determined in the EHR. Diagnosis codes may be incorrectly used before patients meet formal criteria for diabetes or may be associated with the incorrect diabetes type. The issues in correctly identifying patients with diabetes highlight the importance of flexible research registries. Recognizing the potential need for different diabetes definitions, we chose to create our registry based on the concept of a highly sensitive Wide Net with the goal of capturing any evidence of possible diabetes in the EHR. Figure 3 provides a graphical display of this concept.

This approach mirrors the one used by the SEARCH for Diabetes in Youth evaluation of using EHRs for diabetes surveillance [8]. Approaches such as these are necessary given the infeasibility of manually reviewing all patient charts. The SEARCH work found that the simple use of diabetes codes could accurately determine EHR evidence of diabetes, and the ratio of type 1 to type 2 codes had a high sensitivity and specificity for identifying youth with type 1 diabetes. Additional work is needed to determine the accuracy of this approach in adults, and further algorithms are needed for identifying children with type 2 diabetes or other diabetes types. This registry provides a great source of data for future electronic phenotypic development and validation.

Our registry contains 128,218 patients with possible diabetes according to one or more of these 3 domains, while only 50,759 patients have evidence of possible diabetes based on all 3 variables simultaneously (Table 2). Identifying random subsets of patients who meet different combinations of these criteria provides an opportunity to glean valuable information from manual chart reviews of these patients. Annotated data sets allow for evaluation of existing and creation of new electronic phenotypes for diabetes status, type, and date of diagnoses.

Figure 3. Venn diagram showing the use of electronic algorithms combined with chart reviews to identify patients with diabetes. DM: Diabetes Mellitus; EHR: electronic health record; HbA1c: hemoglobin A1c; ICD: International Classification of Diseases.

Table 2. Characteristics of patients who showed evidence of possible diabetes based on diagnoses codes, laboratory values, or medications.

Characteristics | Cohort 1: diagnosis | Cohort 2: labs | Cohort 3: medications
Total unique patients, n (%) | 84,755 (66) | 90,967 (71) | 84,165 (66)
Age (years), median (IQR) | 66.02 (19.43) | 65.46 (20.20) | 64.62 (20.98)
Sex, n (%)
  Female | 43,510 (51.34) | 44,008 (48.38) | 43,374 (51.53)
  Male | 41,239 (48.66) | 46,950 (51.61) | 40,783 (48.46)
Race, n (%)
  White | 59,547 (70.26) | 65,693 (72.22) | 60,014 (71.30)
  Black | 19,120 (22.56) | 19,042 (20.93) | 17,905 (21.27)
  Other | 5794 (6.84) | 5938 (6.53) | 6004 (7.13)
  Missing | 286 (0.34) | 267 (0.29) | 223 (0.26)
Ever smoker, n (%) | 43,414 (51.22) | 48,842 (53.69) | 44,133 (52.44)
Insulin (1 or more prescriptions in the past year), n (%) | 25,663 (30.28) | 25,943 (28.52) | 26,685 (31.70)
Charlson comorbidity index, n (median) | 83,699 (2) | 89,692 (2) | 83,094 (2)
Median household income, n (median) | 66,034 (46,283) | 69,253 (45,688) | 64,839 (45,927)
Most recent hemoglobin A1c, n (median) | 64,959 (6.9) | 72,833 (7.1) | 69,933 (7.0)
Most recent eGFR, n (median) | 73,037 (70) | 88,633 (66) | 80,424 (70)
Most recent LDL, n (median) | 58,463 (88) | 60,398 (88) | 59,864 (89)

Patients may exist in 1, 2, or all 3 of the cohorts.
Labs: random blood sugar ≥200 mg/dL or hemoglobin A1c ≥6.5%.
eGFR: estimated glomerular filtration rate calculated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-Epi) equation.
LDL: low-density lipoprotein.
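The Wide Net idea, and the sampling of patients who meet different combinations of criteria, maps naturally onto set operations. The sketch below uses made-up patient identifiers and hypothetical variable names; it is not the registry's code, only an illustration of the union, the three-way intersection, and stratified sampling for chart review.

```python
import random
from collections import defaultdict

# Hypothetical patient identifiers meeting each criterion.
dx_ids = {1, 2, 3, 4}    # diagnosis codes
lab_ids = {2, 3, 5}      # laboratory values
rx_ids = {3, 4, 5, 6}    # medications

wide_net = dx_ids | lab_ids | rx_ids     # any evidence of possible diabetes
all_three = dx_ids & lab_ids & rx_ids    # evidence in every domain


def membership_pattern(patient_id):
    """Tuple of flags: (has diagnosis, has lab, has medication)."""
    return (patient_id in dx_ids, patient_id in lab_ids, patient_id in rx_ids)


# Group the wide-net population by which combination of criteria is met.
strata = defaultdict(list)
for patient_id in sorted(wide_net):
    strata[membership_pattern(patient_id)].append(patient_id)


def sample_for_chart_review(strata, per_stratum, seed=0):
    """Draw a reproducible random subset from each combination of criteria."""
    rng = random.Random(seed)
    return {
        pattern: sorted(rng.sample(ids, k=min(per_stratum, len(ids))))
        for pattern, ids in strata.items()
    }


review_sample = sample_for_chart_review(strata, per_stratum=1)
```

Sampling within each stratum, rather than from the whole population, ensures that rare combinations (for example, medication evidence alone) are represented among the reviewed charts.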
Schematic

Figure 4 shows a schematic of the overall architecture of the registry and highlights some of the guiding principles governing the registry creation.

Data processing will undoubtedly uncover errors in the clinical data (eg, implausible values), which will be cleaned for data analyses. Data cleaning will be performed at the registry or post–data extract level. We are not attempting (at least at this point) to try and change values in the source clinical data, which is a difficult process and could have clinical implications. It is our hope that the registry could be used for data quality projects that might recognize a way to improve data collection or documentation.

As mentioned previously, the registry is mapped to the OMOP CDM and linked with our existing translational data warehouse. This ensures the standardization of data within the registry while exploiting our established infrastructure. Infusion of additional data from the vendor EHR database as well as data external to our Clinical Information Systems and our institution provides flexibility and continued creation of additional phenotypes. We have created a digital phenotype working group that will prioritize electronic phenotype creation and ensure appropriate documentation. Access to the registry through Jupyter Notebooks increases transparency and simplifies the sharing of code between investigators.

Jupyter Notebooks

Much like the interinstitutional heuristic and algorithm sharing enabled by sites supporting an OMOP CDM, there is potential for intrainstitutional collaboration and technique leveraging. Views in OMOP can be created by the honest brokers to provision only the cohort and relevant data permitted by an institutional review board application to specific authorized study personnel.

Jupyter is a free, open-source, interactive web-based computational notebook widely adopted by data scientists across thousands of enterprises, including Fortune 500 companies, international research facilities, universities, and start-ups. A Jupyter hub server allows users to centrally create and share codes, equations, visualizations, as well as text and results. It will also allow researchers to interact directly with their data views in OMOP via a programmatic language of their choice, whether it be Python (Python Software Foundation), R (The R Foundation), or even direct SQL. A library of Jupyter Notebooks with example code and outputs provided by data analysts can give researchers a rich starting base of programmatic techniques that they can modify, improve, and share back for other researchers to use in their own Jupyter Notebook analyses, greatly reducing the learning curve and lessening code redundancy and reimplementation.
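A study-specific view of the kind an honest broker might provision can be sketched with an in-memory SQLite database. The `person` and `condition_occurrence` tables and their columns follow OMOP CDM naming, but the `study_cohort` table, the view name, the data, and the concept identifiers used as values are illustrative assumptions rather than the registry's actual setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        year_of_birth INTEGER
    );
    CREATE TABLE condition_occurrence (
        person_id INTEGER,
        condition_concept_id INTEGER,
        condition_start_date TEXT
    );
    -- Hypothetical list of patients approved for one study.
    CREATE TABLE study_cohort (person_id INTEGER PRIMARY KEY);

    INSERT INTO person VALUES (1, 1950), (2, 1962), (3, 1971);
    -- Concept IDs below are placeholders for illustration.
    INSERT INTO condition_occurrence VALUES
        (1, 201826, '2015-04-01'),
        (2, 201826, '2018-09-12'),
        (3, 316866, '2019-01-20');
    INSERT INTO study_cohort VALUES (1), (3);

    -- The view exposes only rows belonging to the approved cohort.
    CREATE VIEW study_condition_occurrence AS
        SELECT co.person_id, co.condition_concept_id, co.condition_start_date
        FROM condition_occurrence AS co
        JOIN study_cohort AS sc ON sc.person_id = co.person_id;
    """
)

# A researcher working in a notebook would query the view, not the base table.
rows = conn.execute(
    "SELECT person_id, condition_concept_id "
    "FROM study_condition_occurrence ORDER BY person_id"
).fetchall()
```

Because access is granted on the view rather than the underlying table, the patient outside the approved cohort never appears in the researcher's results.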
Figure 4. Schematic of the overall architecture of the registry, highlighting some of the guiding principles governing the registry creation. CDM: common data model; EHR: electronic health records; OMOP: Observational Medical Outcomes Partnership Common Data Model; PCOR: patient-centered outcomes research common data model; TDW: Translational Data Warehouse in the Wake Forest Clinical and Translational Science Institute; UMLS: Unified Medical Language System.

Data Extracts

Using existing R code created at Wake Forest will allow investigators to extract individual analytic tables that define patient characteristics at each given point in time per the specific study design. Figure 5 highlights how this table would appear. Additionally, a Wake Forest Center for Biomedical Informatics–sponsored pilot grant is establishing a tool for creating randomly selected control patients to simplify the conduct of case control studies. We also have existing R code for the imputation of missing values using multiple imputation with chained equations that can be applied after the analytic data set has been created. Creation of multiply imputed data sets allows an estimation of the amount of missing information and stability of coefficient estimates [4].
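Random selection of controls for a case control study can be sketched as below. This is not the tool under development; the function name, the record fields, and the choice to match on sex only are assumptions made to keep the example short.

```python
import random


def select_controls(cases, candidates, per_case=2, seed=0):
    """Randomly pick sex-matched controls for each case without reuse.

    Returns a dict mapping each case's patient_id to a list of control ids.
    A fixed seed makes the draw reproducible.
    """
    rng = random.Random(seed)
    case_ids = {c["patient_id"] for c in cases}
    # Controls must not themselves be cases.
    available = [p for p in candidates if p["patient_id"] not in case_ids]
    used = set()
    matched = {}
    for case in cases:
        pool = [
            p for p in available
            if p["sex"] == case["sex"] and p["patient_id"] not in used
        ]
        chosen = rng.sample(pool, k=min(per_case, len(pool)))
        used.update(p["patient_id"] for p in chosen)
        matched[case["patient_id"]] = [p["patient_id"] for p in chosen]
    return matched


cases = [
    {"patient_id": 1, "sex": "F"},
    {"patient_id": 2, "sex": "M"},
]
candidates = cases + [
    {"patient_id": pid, "sex": sex}
    for pid, sex in [(10, "F"), (11, "F"), (12, "F"), (13, "M"), (14, "M"), (15, "M")]
]
matched_controls = select_controls(cases, candidates, per_case=2, seed=42)
```

Real studies often match on more variables (eg, age band or index date); those would be added to the filter that builds `pool`.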
Figure 5. Example analytic data set extracted from the registry in a pivot table format. F: false; NA: not applicable; T: true.

Future

Although the registry will be based on coded information, we recognize the growth in the data science community of graph representation of data. The ability to use Jupyter Notebooks to access data and to create and share code will allow investigators to integrate new methods such as graph theory for statistical analyses and to create data visualizations to share. We are particularly interested in examining diabetes-related treatment pathways and intend to use the concept relationship table in OMOP to define treatment pathways commonly used as well as pathways based on guidelines. The characterization of treatment pathways is ripe for graph representation.

We recognize that the data, informatics tools, and analytic techniques available for EHR-based analyses are rapidly changing. We have identified a group of clinical, informatics, and statistical professionals who can serve as registry stakeholders. Periodic meetings will allow for continuous feedback that will guide decisions on registry directions and priorities. The Wake Forest Clinical and Translational Science Institute has an established mechanism for continuous evaluation of the informatics program, of which this registry will be a part. Evaluations will include metrics on registry use, publications and grants using the registry, as well as formal (eg, surveys) and informal feedback.

Summary

Secondary use of EHR data for research is still in its infancy, and tools to aid investigators in complex epidemiological-type studies needed for the Learning Health System are lacking. Typical population health registries do not provide the flexibility, computational resources, and data complexity necessary for many research endeavors. The virtual diabetes registry described in this paper is providing our researchers with tools that we hope will enable them to conduct sophisticated statistical analyses in the most transparent and efficient way possible. The registry is being built in a way that will allow for its continuous refinement based on user experience and in a format that will enable interinstitutional collaboration.

Acknowledgments

We would like to acknowledge the Informatics Program of the Wake Forest Clinical and Translational Science Institute (WF CTSI), which is supported by the National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, through Grant Award Number UL1TR001420.

Conflicts of Interest

None declared.

References

1. Fetter RB, Shin Y, Freeman JL, Averill RF, Thompson JD. Case mix definition by diagnosis-related groups. Med Care 1980 Feb;18(2 Suppl):iii, 1-53. [Medline: 7188781]
2. Safran C, Bloomrosen M, Hammond WE, Labkoff S, Markel-Fox S, Tang PC, Expert Panel. Toward a national framework for the secondary use of health data: an American Medical Informatics Association White Paper. J Am Med Inform Assoc 2007;14(1):1-9. [doi: 10.1197/jamia.M2273] [Medline: 17077452]
3. Kruse CS, Stein A, Thomas H, Kaur H. The use of Electronic Health Records to Support Population Health: A Systematic Review of the Literature. J Med Syst 2018 Sep 29;42(11):214. [doi: 10.1007/s10916-018-1075-6] [Medline: 30269237]
4. Wells BJ, Chagin KM, Nowacki AS, Kattan MW. Strategies for handling missing data in electronic health record derived data. EGEMS (Wash DC) 2013;1(3):1035. [doi: 10.13063/2327-9214.1035] [Medline: 25848578]
5. Kirby JC, Speltz P, Rasmussen LV, Basford M, Gottesman O, Peissig PL, et al. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability. J Am Med Inform Assoc 2016 Nov;23(6):1046-1052. [doi: 10.1093/jamia/ocv202] [Medline: 27026615]
6. Hripcsak G, Shang N, Peissig PL, Rasmussen LV, Liu C, Benoit B, et al. Facilitating phenotype transfer using a common data model. J Biomed Inform 2019 Aug;96:103253. [doi: 10.1016/j.jbi.2019.103253] [Medline: 31325501]
7. Kho AN, Hayes MG, Rasmussen-Torvik L, Pacheco JA, Thompson WK, Armstrong LL, et al. Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study. J Am Med Inform Assoc 2012;19(2):212-218. [doi: 10.1136/amiajnl-2011-000439] [Medline: 22101970]
8. Wells BJ, Lenoir KM, Wagenknecht LE, Mayer-Davis EJ, Lawrence JM, Dabelea D, et al. Detection of Diabetes Status and Type in Youth Using Electronic Health Records: The SEARCH for Diabetes in Youth Study. Diabetes Care 2020 Oct;43(10):2418-2425. [doi: 10.2337/dc20-0063] [Medline: 32737140]
9. Ohar JA, Loh CH, Lenoir KM, Wells BJ, Peters SP. A comprehensive care plan that reduces readmissions after acute exacerbations of COPD. Respir Med 2018 Aug;141:20-25. [doi: 10.1016/j.rmed.2018.06.014] [Medline: 30053968]

Abbreviations

CDM: common data model
DRG: diagnosis-related group
EHR: electronic health record
OMOP: Observational Medical Outcomes Partnership

Edited by C Lovis; submitted 20.05.22; peer-reviewed by L Patel; comments to author 26.06.22; revised version received 24.08.22; accepted 28.08.22; published 23.09.22

Please cite as:
Wells BJ, Downs SM, Ostasiewski B
JMIR Med Inform 2022;10(9):e39746
URL: https://medinform.jmir.org/2022/9/e39746
doi: 10.2196/39746
PMID:

©Brian J Wells, Stephen M Downs, Brian Ostasiewski. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 23.09.2022. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.

Using Electronic Health Records for the Learning Health System: Creation of a Diabetes Research Registry


References (10)

Publisher
JMIR Publications
Copyright
Copyright © The Author(s). Licensed under Creative Commons Attribution cc-by 4.0
ISSN
2291-9694
DOI
10.2196/39746

Abstract

Electronic health records (EHRs) were originally developed for clinical care and billing. As such, the data are not collected, organized, and curated in a fashion that is optimized for secondary use to support the Learning Health System. Population health registries provide tools to support quality improvement. These tools are generally integrated with the live EHR, are intended to use a minimum of computing resources, and may not be appropriate for some research projects. Researchers may require different electronic phenotypes and variable definitions from those typically used for population health, and these definitions may vary from study to study. Establishing a formal registry that is mapped to the Observational Medical Outcomes Partnership common data model provides an opportunity to add custom mappings and more easily share these with other institutions. Performing preprocessing tasks such as data cleaning, calculation of risk scores, time-to-event analysis, imputation, and transforming data into a format for statistical analyses will improve efficiency and make the data easier to use for investigators. Research registries that are maintained outside the EHR also have the luxury of using significant computational resources without jeopardizing clinical care data. This paper describes a virtual Diabetes Registry at Atrium Health Wake Forest Baptist and the plan for its continued development.

(JMIR Med Inform 2022;10(9):e39746) doi: 10.2196/39746

KEYWORDS
electronic health record; EHR; Learning Health System; registry; diabetes

Background

The first electronic health records (EHRs) were developed to support clinical care, but later became primarily focused on billing after the creation of diagnosis-related group (DRG) codes [1]. DRGs are intended to provide precise estimates of resource use across different hospitals. Unfortunately, the documentation necessary to support billing frequently does not result in a data content and structure ideal for the secondary use of these data for research. Safran et al [2] outlined a framework for using EHR data for secondary purposes. The use of EHR data for research purposes has increased significantly at Wake Forest and elsewhere over the past several years. However, complex outcome studies that use data at different time points are still rare.

Research investigators struggle with the processing and statistical analyses of EHR-derived data due to the time-varying nature, inconsistency, inaccuracy, lack of documentation, and incompleteness of clinical data. Investigators report that the amount of time spent deciphering and cleaning these data makes many research projects impractical. A systematic review of the use of EHR data for population health identified several common barriers to the use of these data, of which missing data were most cited [3]. Handling of missing data requires an understanding of the reasons for missing data, some of which can be project-specific reasons and related to decisions about how to handle them. Simply excluding patients with missing data may reduce sample size and can lead to biased results. One common method for handling missing data in EHR projects is multiple imputation, where statistical models are used to estimate values for missing data elements [4]. Investigators may be unfamiliar with these techniques or may lack the knowledge and skills to perform the task in a robust fashion. Imputation will be one of the services provided to investigators. Prior to imputation, it is necessary to explore data to identify implausible values that may arise due to inaccurate measurements or data entry errors. Textbox 1 highlights some of the data-processing steps that may be required prior to using clinical data for statistical analyses.

Textbox 1. Common data-processing steps required to analyze clinical data.

Common data-processing steps
- Removal of extreme values
- Correction of erroneous entries
- Imputation of missing values
- Calculation of predefined variables
- Determination of active medication classes on a given date
- Calculation of dates and time to events
- Creation of a single analytic data set with a single row per patient from normalized tables

Research Registries

Research registries derived from the EHR can provide a foundation that improves the efficiency for research projects in a specific disease area. Registries can provide formal documentation of the institutional knowledge gained over time from previous investigations and input from the research community. The sharing of experiences provides an opportunity for critical evaluation of the data from investigators with different areas of expertise, leading to improved data quality and knowledge of the data necessary for interinstitutional projects. Preprocessed data, predefined variables, linkage with other institutional databases (eg, echocardiogram and pulmonary function tests), linkage with external data (eg, American Community Survey and North Carolina Death Registry), and creation of statistical functions can greatly reduce the time and cost of secondary data analyses. Data preprocessing can include data cleaning (eg, removal of extreme values and imputation of missing data), which can reduce the risk of biased results but would be inappropriate for clinical data. Prescription medications provide another opportunity for data preprocessing. For example, calculation of dosages and quantity of medications can be determined by applying regular expressions to free text prescription instructions.

Research registries also provide a mechanism for pooling knowledge and resources from disparate research areas. For example, chart reviews conducted for one specific research study could provide important knowledge that benefits all users of the registry. Similarly, researchers could pool resources to purchase external data (eg, National Death Index or Centers for Medicare & Medicaid Services [CMS] data) that will benefit all. Research registries provide a repository for collecting research items not intended for the legal medical record to support activities such as creating risk prediction models and conducting epidemiologic studies. Furthermore, the research registries also provide potential populations of patients for research studies (clinical trials, pragmatic trials, implementation science, population health, and medical informatics). The increased recognition and credibility of an institution’s clinical data for research that comes with a successful registry can improve the chances for research funding.

Population Health Registries

There has been a proliferation of population health registries in EHR systems. These real-time data are necessary for clinical care, and these registries are designed to put minimal burden on the EHR system, especially given that they are using the live EHR system, which is critical for clinical care. These types of EHR-based population health registry tools (eg, Healthy Planet, Epic Systems) provide current snapshots of patients and are helpful for population health management. These operational reporting tools are fast, provide real-time data, and are incorporated into the clinical workflow. These minute-by-minute updates of clinical data are unnecessary for many types of secondary data analyses.

Population health registries have motivations that may differ from research investigations. For example, population health registries support quality-based metrics such as indicators maintained by the National Quality Forum, which may be publicly reported and are used to guide reimbursement incentives for programs such as the Medicare Shared Savings Program. In these instances, disease phenotypes and variable definitions are predefined by the interested parties. In this scenario, there may be a single criterion used to define the population and associated metrics. Creating additional criteria would be counterproductive. By contrast, a research registry should provide comprehensive data on members collected over time, requires statistical analyses, and may contain multiple definitions for the same variable. These data allow evaluations at user-defined time points or time-varying analyses. Because the tool is not integrated into clinical workflows, there is an opportunity to incorporate large quantities of data into computationally intensive analyses that would otherwise be a drain on clinical systems.

Population health registries are ideally suited for clinical care and quality improvement in that they are available instantaneously on the live EHR, have standardized definitions, and use limited computing resources. By contrast, the type of research registry that we have created enables the creation of different cohorts for the same disease entities, makes use of additional computational resources that would be inappropriate for the clinical EHR, and allows different variable definitions depending on the specific study. Table 1 lists additional differences between our research registry and population health registries.

Registries created from EHR data may have different goals and requirements. The table compares features of research and population health registries.

It should also be noted that EHR vendors each use their own proprietary technical data models that will map to ontologies such as International Classification of Diseases codes. The precise mappings are not made publicly available, which makes multicenter studies involving different EHR systems more difficult. The registry we have built is mapped to the Observational Medical Outcomes Partnership (OMOP) common data model (CDM). CDMs such as OMOP have been instrumental in creating interoperability standards in support of clinical research networks that span multiple institutions. This registry will take advantage of the data mappings available in OMOP and benefit from the automated tools developed for OMOP for identifying potential data issues. The Phenotype Knowledge Base contains a repository of electronic phenotypes to support registry construction and variable definitions [5]. These phenotypes have been successfully integrated into the OMOP data model to facilitate implementation at different research institutions [6]. We will also have the opportunity to create additional custom mappings to our OMOP instance, which can be leveraged by local researchers.

Table 1. Characteristics of research registries vs population health registries.
Research registry | Population health registry
Intermittent updates | Real-time updates
Higher computational resources | Low resource use
Complex definitions from a variety of sources and multiple definitions for similar concepts | Simple definitions defined by QI-based reimbursement
Variety of external data sources | Data limited to EHR
Extensive data processing | Limited data processing
Complex temporal relationships | Single point in time
Easily accessible and detailed documentation | Documentation or coding sometimes lacking or not easily accessible
Does not need to be integrated into workflow | Integration in clinic workflow is crucial
Does not require front-end EHR access | Requires front-end EHR access with PHI
Mapped to open-source common data models | Mapped to vendor-based technical data models

QI: quality improvement. EHR: electronic health record. PHI: protected health information.

Custom Phenotypes

As mentioned previously, research projects may require variable definitions that are different from quality-based metrics, and variable definitions may vary from one project to the next. Varying variable definitions are also necessary for cohort discovery. The definition of diabetes may differ between projects. For example, a case control study needing a limited number of cases may want to have a highly specific definition for type 2 diabetes such as the one created by Kho [7]. By contrast, a study evaluating the accuracy of different electronic phenotypes may require a highly sensitive definition to capture all possible diabetes cases for manual chart review [8]. Figure 1 shows a Venn diagram illustrating the different patient populations that would be captured from our data warehouse depending on whether one uses diagnosis codes, hemoglobin A1c laboratory values, or prescriptions for hypoglycemic medications.

In other instances, existing definitions may be available from agencies such as Agency for Healthcare Research and Quality or the CMS. For example, we used the CMS definition for an acute exacerbation of chronic obstructive pulmonary disease for a study looking at the impact of a chronic obstructive pulmonary disease care pathway on reducing readmissions [9].

In addition to phenotypes used for cohort discovery, research projects require definitions for covariates included in the statistical analyses. Depending on the situation, investigators may desire different definitions for comorbidities such as hypertension. Textbox 2 shows the contrast between an example of a simple definition for hypertension based on diagnoses codes vs a complex definition that might be used for a study, where maximizing the sensitivity for identifying hypertension is key.

Figure 1. Sets of patients with possible diabetes according to definitions based on diagnoses codes (DX), laboratory values (LAB), or prescriptions (RX).

Textbox 2. Example definitions of hypertension.

Research registry
- International Classification of Diseases (ICD) code for hypertension (HTN) in encounter diagnoses, past medical history, or problem list
OR
- Minimum of 3 blood pressure (BP) readings >140/90 over 3 months in the electronic health records
  - Outpatient BP excluding urgent care clinics, emergency department, or observation visits
  - Based on last BP of encounter
  - Exclude BPs when associated temperature ≥38 °C
OR
- Active prescription for an antihypertensive agent

Population health
- ICD code for HTN in encounter diagnoses

OMOP Limitations

While the use of OMOP has many advantages in terms of standardization, there are still significant areas of limitations. Medications are one area where common data models are still lacking. For example, OMOP contains a single drug exposure table for prescriptions, drug administration, dispensing information, and patient-reported information. Unfortunately, dispensing information, patient-reported information, and compliance are rarely captured in structured EHR data. In addition, there are no explicitly linked medical reasons for the exposures in OMOP, and the RxNorm categorizations may not be appropriate for a specific research study. A registry cannot resolve all these issues, but the structure provides the flexibility to create and validate new phenotypes. For example, researchers can create and share relevant medication groupings, and algorithms based on specific prescription information (eg, dates of prescriptions, stop dates, number of pills, and number of refills) can be created as proxies for active medications and compliance. Similarly, associated information (eg, presence or absence of different diagnoses codes and laboratory values) can define reason for medication. These new phenotypes can be used locally and shared with the OMOP community without being formally integrated into the OMOP model.

OMOP will not be able to represent all the new phenotypes that the registry will require, making it necessary to characterize our own concepts. Some of these concepts may be derived entirely from existing OMOP concepts, but many will require the creation of our own. Like all CDMs, OMOP has limitations in its capacity to represent information inherent to the transformation from one data model (eg, EHR) to another. In addition, it will be crucial to have a formal data quality structure in place to ensure mappings are correct and routinely updated as data change. We have established a phenotype working group that includes the authors as well as additional faculty members in the Center for Biomedical Informatics.

Data Structure

The data structure of the EHR database, a typical population health registry, and a research registry can vary significantly. EHR databases are stored in database management systems using individual, partially normalized tables for each specific data domain. This structure reduces storage space and speeds data extractions. By contrast, most statistical analyses require an individual flat data table (also known as pivot table), where the unit of analyses is the individual rows of patients. The data sets need to include columns for both independent and dependent variables and may require calculations of follow-up time between baseline variables and the outcomes of interest. Some external variables that do not exist in the EHR may be linked with the data set. For example, we link our registry with the North Carolina state death index, allowing better ascertainment of mortality outcomes and censoring of follow-up time. Variables may also be derived from the source data (eg, highest blood pressure in the past 24 hours) and time dependent analyses necessitate multiple rows for each patient that reflect the patient’s current state at a given point in time. Figure 2 provides a graphical representation of the different data structures between the EHR database, an EHR-based population health registry, and a research registry.

Figure 2. Comparing data structures of electronic health records (EHR), population health registries, and research registries.

Diabetes-Specific Registry

We chose diabetes as one of the first registries to make available in our Clinical and Translational Science Award Program given that it represents a focus area of our research enterprise. In addition, diabetes is a natural choice for a research registry given the rising incidence, chronic nature, established quality metrics, comorbidities, availability of treatments, and research funding. Research also indicates that blood sugar and associated risk factors are poorly controlled in patients with diabetes. In addition to a desire for improving the health of their patients, health care institutions have direct financial incentives for adequately treating patients with diabetes. Quality indicators approved by a successful diabetes research registry would provide an opportunity for the creation of risk prediction models that could be used to target patients at high risk as well as those who are most likely to benefit from a specific intervention. Thorough statistical evaluations of quality improvement projects and population health interventions would provide crucial feedback on the potential net benefits of these programs.

The identification of diabetes in EHR is surprisingly complex. Common methods for identifying potential cases include searches for medications, laboratory values, and diagnosis codes. Each of these approaches has its own limitations. Medications used for diabetes may also be used to treat other conditions. For example, metformin is commonly prescribed for polycystic ovarian syndrome in women. Blood glucose values may be abnormally elevated due to inadequate fasting times, which are generally not easily determined in the EHR. Diagnosis codes may be incorrectly used before patients meet formal criteria for diabetes or may be associated with the incorrect diabetes type. The issues in correctly identifying patients with diabetes highlight the importance of flexible research registries. Recognizing the potential need for different diabetes definitions, we chose to create our registry based on the concept of a highly sensitive Wide Net with the goal of capturing any evidence of possible diabetes in the EHR. Figure 3 provides a graphical display of this concept.

This approach mirrors the one used by the SEARCH for Diabetes in Youth evaluation of using EHRs for diabetes surveillance [8]. Approaches such as these are necessary given the infeasibility of manually reviewing all patient charts. The SEARCH work found that the simple use of diabetes codes could accurately determine EHR evidence of diabetes, and the ratio of type 1 to type 2 codes had a high sensitivity and specificity for identifying youth with type 1 diabetes. Additional work is needed to determine the accuracy of this approach in adults, and further algorithms are needed for identifying children with type 2 diabetes or other diabetes types. This registry provides a great source of data for future electronic phenotypic development and validation.

Our registry contains 128,218 patients with possible diabetes according to one or more of these 3 domains, while only 50,759 patients have evidence of possible diabetes based on all 3 variables simultaneously (Table 2). Identifying random subsets of patients who meet different combinations of these criteria provides an opportunity to glean valuable information from manual chart reviews of these patients. Annotated data sets allow for evaluation of existing and creation of new electronic phenotypes for diabetes status, type, and date of diagnoses.

Figure 3. Venn diagram showing the use of electronic algorithms combined with chart reviews to identify patients with diabetes.
DM: diabetes mellitus; EHR: electronic health record; HbA1c: hemoglobin A1c; ICD: International Classification of Diseases.

Table 2. Characteristics of patients who showed evidence of possible diabetes based on diagnoses codes, laboratory values, or medications.

Characteristics | Cohort 1: diagnosis | Cohort 2: labs | Cohort 3: medications
Total unique patients, n (%) | 84,755 (66) | 90,967 (71) | 84,165 (66)
Age (years), median (IQR) | 66.02 (19.43) | 65.46 (20.20) | 64.62 (20.98)
Sex, n (%) | | |
  Female | 43,510 (51.34) | 44,008 (48.38) | 43,374 (51.53)
  Male | 41,239 (48.66) | 46,950 (51.61) | 40,783 (48.46)
Race, n (%) | | |
  White | 59,547 (70.26) | 65,693 (72.22) | 60,014 (71.30)
  Black | 19,120 (22.56) | 19,042 (20.93) | 17,905 (21.27)
  Other | 5794 (6.84) | 5938 (6.53) | 6004 (7.13)
  Missing | 286 (0.34) | 267 (0.29) | 223 (0.26)
Ever smoker, n (%) | 43,414 (51.22) | 48,842 (53.69) | 44,133 (52.44)
Insulin (1 or more prescriptions in the past year), n (%) | 25,663 (30.28) | 25,943 (28.52) | 26,685 (31.70)
Charlson comorbidity index, n (median) | 83,699 (2) | 89,692 (2) | 83,094 (2)
Median household income, n (median) | 66,034 (46,283) | 69,253 (45,688) | 64,839 (45,927)
Most recent hemoglobin A1c, n (median) | 64,959 (6.9) | 72,833 (7.1) | 69,933 (7.0)
Most recent eGFR, n (median) | 73,037 (70) | 88,633 (66) | 80,424 (70)
Most recent LDL, n (median) | 58,463 (88) | 60,398 (88) | 59,864 (89)

Patients may exist in 1, 2, or all 3 of the cohorts.
Labs: random blood sugar ≥200 mg/dL or hemoglobin A1c ≥6.5%.
eGFR: estimated glomerular filtration rate calculated using the Chronic Kidney Disease Epidemiology Collaboration (CKD-Epi) equation.
LDL: low-density lipoprotein.
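The overlap between the three cohorts in Table 2 can be read as set logic: the Wide Net is the union of the diagnosis, laboratory, and medication cohorts, and the stricter definition is their intersection. The Python sketch below illustrates that idea only. The patient identifiers, the flags, and the function names (`meets_lab_criterion`, `cohort`, `strata`, `sample_for_chart_review`) are hypothetical and are not taken from the registry; the laboratory thresholds are the ones stated in the note to Table 2.

```python
import random

# Hypothetical patient-level evidence flags (illustrative only, not registry data).
patients = {
    "p1": {"dx": True, "lab": True, "rx": True},
    "p2": {"dx": True, "lab": False, "rx": False},
    "p3": {"dx": False, "lab": True, "rx": True},
    "p4": {"dx": False, "lab": False, "rx": True},
    "p5": {"dx": False, "lab": False, "rx": False},
}


def meets_lab_criterion(random_glucose_mg_dl=None, hba1c_percent=None):
    """Lab evidence per the Table 2 note: glucose >= 200 mg/dL or HbA1c >= 6.5%."""
    glucose_high = random_glucose_mg_dl is not None and random_glucose_mg_dl >= 200
    hba1c_high = hba1c_percent is not None and hba1c_percent >= 6.5
    return glucose_high or hba1c_high


def cohort(flag):
    """Return the set of patient ids with evidence in one domain."""
    return {pid for pid, flags in patients.items() if flags[flag]}


dx, lab, rx = cohort("dx"), cohort("lab"), cohort("rx")
wide_net = dx | lab | rx   # any evidence of possible diabetes
all_three = dx & lab & rx  # evidence in every domain simultaneously


def strata():
    """Group wide-net patients by their (dx, lab, rx) combination."""
    groups = {}
    for pid in sorted(wide_net):
        key = tuple(patients[pid][k] for k in ("dx", "lab", "rx"))
        groups.setdefault(key, []).append(pid)
    return groups


def sample_for_chart_review(groups, n_per_stratum, seed=0):
    """Draw a reproducible random subset from each combination of criteria."""
    rng = random.Random(seed)
    return {
        key: rng.sample(ids, min(n_per_stratum, len(ids)))
        for key, ids in groups.items()
    }
```

Sampling within each combination of criteria, rather than from the pooled Wide Net, is one way to make sure that small strata (for example, medication evidence only) are represented in a manual chart review.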
Schematic

Figure 4 shows a schematic of the overall architecture of the registry and highlights some of the guiding principles governing the registry creation.

Data processing will undoubtedly uncover errors in the clinical data (eg, implausible values), which will be cleaned for data analyses. Data cleaning will be performed at the registry or post–data extract level. We are not attempting (at least at this point) to change values in the source clinical data, which is a difficult process and could have clinical implications. It is our hope that the registry could be used for data quality projects that might recognize a way to improve data collection or documentation.

As mentioned previously, the registry is mapped to the OMOP CDM and linked with our existing translational data warehouse. This ensures the standardization of data within the registry while exploiting our established infrastructure. Infusion of additional data from the vendor EHR database as well as data external to our Clinical Information Systems and our institution provides flexibility and continued creation of additional phenotypes. We have created a digital phenotype working group that will prioritize electronic phenotype creation and ensure appropriate documentation. Access to the registry through Jupyter Notebooks increases transparency and simplifies the sharing of code between investigators.

Jupyter Notebooks

Much like the interinstitutional heuristic and algorithm sharing enabled by sites supporting an OMOP CDM, there is potential for intrainstitutional collaboration and technique leveraging. Views in OMOP can be created by the honest brokers to provision only the cohort and relevant data permitted by an institutional review board application to specific authorized study personnel.

Jupyter is a free, open-source, interactive web-based computational notebook widely adopted by data scientists across thousands of enterprises, including Fortune 500 companies, international research facilities, universities, and start-ups. A JupyterHub server allows users to centrally create and share code, equations, visualizations, as well as text and results. It will also allow researchers to interact directly with their data views in OMOP via a programmatic language of their choice, whether it be Python (Python Software Foundation), R (The R Foundation), or even direct SQL. A library of Jupyter Notebooks with example code and outputs provided by data analysts can give researchers a rich starting base of programmatic techniques that they can modify, improve, and share back for other researchers to use in their own Jupyter Notebook analyses, greatly reducing the learning curve and lessening code redundancy and reimplementation.

Figure 4. Schematic of the overall architecture of the registry, highlighting some of the guiding principles governing the registry creation. CDM: common data model; EHR: electronic health records; OMOP: Observational Medical Outcomes Partnership Common Data Model; PCOR: patient-centered outcomes research common data model; TDW: Translational Data Warehouse in the Wake Forest Clinical and Translational Science Institute; UMLS: Unified Medical Language System.

Data Extracts

Using existing R code created at Wake Forest will allow investigators to extract individual analytic tables that define patient characteristics at each given point in time per the specific study design. Figure 5 highlights how this table would appear. Additionally, a Wake Forest Center for Biomedical Informatics–sponsored pilot grant is establishing a tool for creating randomly selected control patients to simplify the conduct of case control studies. We also have existing R code for the imputation of missing values using multiple imputation with chained equations that can be applied after the analytic data set has been created. Creation of multiply imputed data sets allows an estimation of the amount of missing information and stability of coefficient estimates [4].

Figure 5. Example analytic data set extracted from the registry in a pivot table format. F: false; NA: not applicable; T: true.

Future

Although the registry will be based on coded information, we recognize the growth in the data science community of graph representation of data. The ability to use Jupyter Notebooks to access data and to create and share code will allow investigators to integrate new methods such as graph theory for statistical analyses and to create data visualizations to share. We are particularly interested in examining diabetes-related treatment pathways and intend to use the concept relationship table in OMOP to define treatment pathways commonly used as well as pathways based on guidelines. The characterization of treatment pathways is ripe for graph representation.

We recognize that the data, informatics tools, and analytic techniques available for EHR-based analyses are rapidly changing. We have identified a group of clinical, informatics, and statistical professionals who can serve as registry stakeholders. Periodic meetings will allow for continuous feedback that will guide decisions on registry directions and priorities. The Wake Forest Clinical and Translational Science Institute has an established mechanism for continuous evaluation of the informatics program, of which this registry will be a part. Evaluations will include metrics on registry use, publications and grants using the registry, as well as formal (eg, surveys) and informal feedback.

Summary

Secondary use of EHR data for research is still in its infancy, and tools to aid investigators in complex epidemiological-type studies needed for the Learning Health System are lacking. Typical population health registries do not provide the flexibility, computational resources, and data complexity necessary for many research endeavors. The virtual diabetes registry described in this paper is providing our researchers with tools that we hope will enable them to conduct sophisticated statistical analyses in the most transparent and efficient way possible. The registry is being built in a way that will allow for its continuous refinement based on user experience and in a format that will enable interinstitutional collaboration.

Acknowledgments

We would like to acknowledge the Informatics Program of the Wake Forest Clinical and Translational Science Institute (WF CTSI), which is supported by the National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, through Grant Award Number UL1TR001420.

Conflicts of Interest

None declared.

References

1. Fetter RB, Shin Y, Freeman JL, Averill RF, Thompson JD. Case mix definition by diagnosis-related groups. Med Care 1980 Feb;18(2 Suppl):iii, 1-iii,53. [Medline: 7188781]
2. Safran C, Bloomrosen M, Hammond WE, Labkoff S, Markel-Fox S, Tang PC, Expert Panel. Toward a national framework for the secondary use of health data: an American Medical Informatics Association White Paper. J Am Med Inform Assoc 2007;14(1):1-9 [doi: 10.1197/jamia.M2273] [Medline: 17077452]
3. Kruse CS, Stein A, Thomas H, Kaur H. The use of Electronic Health Records to Support Population Health: A Systematic Review of the Literature. J Med Syst 2018 Sep 29;42(11):214 [doi: 10.1007/s10916-018-1075-6] [Medline: 30269237]
4. Wells BJ, Chagin KM, Nowacki AS, Kattan MW. Strategies for handling missing data in electronic health record derived data. EGEMS (Wash DC) 2013;1(3):1035 [doi: 10.13063/2327-9214.1035] [Medline: 25848578]
5. Kirby JC, Speltz P, Rasmussen LV, Basford M, Gottesman O, Peissig PL, et al. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability. J Am Med Inform Assoc 2016 Nov;23(6):1046-1052 [doi: 10.1093/jamia/ocv202] [Medline: 27026615]
6. Hripcsak G, Shang N, Peissig PL, Rasmussen LV, Liu C, Benoit B, et al. Facilitating phenotype transfer using a common data model. J Biomed Inform 2019 Aug;96:103253 [doi: 10.1016/j.jbi.2019.103253] [Medline: 31325501]
7. Kho AN, Hayes MG, Rasmussen-Torvik L, Pacheco JA, Thompson WK, Armstrong LL, et al. Use of diverse electronic medical record systems to identify genetic risk for type 2 diabetes within a genome-wide association study. J Am Med Inform Assoc 2012;19(2):212-218 [doi: 10.1136/amiajnl-2011-000439] [Medline: 22101970]
8. Wells BJ, Lenoir KM, Wagenknecht LE, Mayer-Davis EJ, Lawrence JM, Dabelea D, et al. Detection of Diabetes Status and Type in Youth Using Electronic Health Records: The SEARCH for Diabetes in Youth Study. Diabetes Care 2020 Oct;43(10):2418-2425 [doi: 10.2337/dc20-0063] [Medline: 32737140]
9. Ohar JA, Loh CH, Lenoir KM, Wells BJ, Peters SP. A comprehensive care plan that reduces readmissions after acute exacerbations of COPD. Respir Med 2018 Aug;141:20-25 [doi: 10.1016/j.rmed.2018.06.014] [Medline: 30053968]

Abbreviations

CDM: common data model
DRG: diagnosis-related group
EHR: electronic health record
OMOP: Observational Medical Outcomes Partnership

Edited by C Lovis; submitted 20.05.22; peer-reviewed by L Patel; comments to author 26.06.22; revised version received 24.08.22; accepted 28.08.22; published 23.09.22

Please cite as:
Wells BJ, Downs SM, Ostasiewski B
JMIR Med Inform 2022;10(9):e39746
URL: https://medinform.jmir.org/2022/9/e39746
doi: 10.2196/39746

©Brian J Wells, Stephen M Downs, Brian Ostasiewski. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 23.09.2022. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.

Journal

JMIR Medical Informatics, JMIR Publications

Published: Sep 23, 2022

Keywords: electronic health record; EHR; Learning Health System; registry; diabetes
