Journal:Design and refinement of a data quality assessment workflow for a large pediatric research network

From LIMSWiki
Revision as of 19:10, 18 October 2019 by Shawndouglas (talk | contribs) (Saving and adding more.)
Full article title: Design and refinement of a data quality assessment workflow for a large pediatric research network
Journal: eGEMs
Author(s): Khare, Ritu; Utidjian, Levon H.; Razzaghi, Hanieh; Soucek, Victoria; Burrows, Evanette; Eckrich, Daniel; Hoyt, Richard; Weistein, Harris; Miller, Matthew W.; Soler, David; Tucker, Joshua; Bailey, L. Charles
Author affiliation(s): The Children’s Hospital of Philadelphia, Seattle Children’s Hospital, Nemours Children’s Health System, Nationwide Children’s Hospital
Primary contact: Email: kharer at email dot chop dot edu
Year published: 2019
Volume and issue: 7(1)
Page(s): 36
DOI: 10.5334/egems.294
ISSN: 2327-9214
Distribution license: Creative Commons Attribution 4.0 International
Website: https://egems.academyhealth.org/articles/10.5334/egems.294/
Download: https://egems.academyhealth.org/articles/10.5334/egems.294/galley/397/download/ (PDF)

Abstract

Background: Clinical data research networks (CDRNs) aggregate electronic health record (EHR) data from multiple hospitals to enable large-scale research. A critical operation toward building a CDRN is conducting continual evaluations to optimize data quality. The key challenges include determining the assessment coverage on big datasets, handling data variability over time, and facilitating communication with data teams. This study presents the evolution of a systematic workflow for data quality assessment in CDRNs.

Implementation: Using a specific CDRN as a use case, a workflow was iteratively developed and packaged into a toolkit. The resultant toolkit comprises 685 data quality checks to identify data quality issues, procedures to reconcile those issues against a history of known issues, and a contemporary GitHub-based reporting mechanism for organized tracking.

Results: During the first two years of network development, the toolkit assisted in discovering over 800 data characteristics and resolving over 1400 programming errors. Longitudinal analysis indicated that the variability in time to resolution (15-day mean, 24-day IQR) is due to the underlying cause of the issue, the perceived importance of the domain, and the complexity of assessment.

Conclusions: In the absence of a formalized data quality framework, CDRNs continue to face challenges in data management and query fulfillment. The proposed data quality toolkit was empirically validated on a particular network and is publicly available for other networks. While the toolkit is user-friendly and effective, the usage statistics indicated that the data quality process is very time-intensive, and sufficient resources should be dedicated to investigating problems and optimizing data for research.

Keywords: CDRN, checks, data quality, electronic health records, GitHub, issues

Background

Collaborations across multiple institutions are essential to achieve sufficient cohort sizes in clinical research and to strengthen findings in a wide range of scientific studies.[1][2] Clinical data research networks (CDRNs) combine electronic health record (EHR) data from multiple hospital systems to provide integrated access for conducting large-scale research studies. The results of CDRN-based studies, however, come with the caveat that EHR data are collected to support clinical operations rather than clinical research. Suboptimal quality of EHR data and incorrect interpretation of EHR-derived data not only lead to inaccurate study results but also increase the cost of conducting science.[3] Hence, one of the most critical aspects of building a CDRN is conducting continual quality evaluation to ensure that the patient-level clinical datasets are “fit for research use.”[3][4][5] A well-designed data quality (DQ) assessment program helps data developers identify programming and logic errors when deriving secondary datasets from EHRs (e.g., an incorrect mapping of a patient’s race information into controlled vocabularies). It also helps data consumers and scientists learn the peculiar characteristics of network data (e.g., “acute respiratory tract infections” and “attention deficit hyperactivity disorder” are likely to be among the most frequent diagnoses in a pediatric data resource) and assess the readiness of network data for specific research studies.[6]
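As a rough illustration of the two kinds of findings described above, the following Python sketch flags values that were never mapped into a permitted vocabulary and lists the most frequent values of a field. It is not taken from the toolkit: the record layout, the field names, the function names, and the set of allowed race codes are all hypothetical and chosen only for this example.

```python
from collections import Counter

# Hypothetical permitted codes; a real network would take these from its
# controlled vocabulary rather than from this illustration.
ALLOWED_RACE_CODES = {"WHITE", "BLACK", "ASIAN", "OTHER", "UNKNOWN"}


def find_unmapped_values(records, field, allowed):
    """Count the values of `field` that fall outside the permitted set."""
    counts = Counter()
    for record in records:
        value = record.get(field)
        if value not in allowed:
            counts[value] += 1
    return dict(counts)


def most_frequent(records, field, n):
    """Return the n most common values of `field` with their counts."""
    return Counter(record[field] for record in records).most_common(n)


# Toy data, invented for the example.
patients = [
    {"patient_id": 1, "race": "WHITE"},
    {"patient_id": 2, "race": "Caucasian"},  # source value never mapped
    {"patient_id": 3, "race": "ASIAN"},
    {"patient_id": 4, "race": None},         # missing value
]
diagnoses = [
    {"patient_id": 1, "dx": "acute respiratory tract infection"},
    {"patient_id": 2, "dx": "acute respiratory tract infection"},
    {"patient_id": 3, "dx": "asthma"},
]

# A likely programming or mapping error for data developers to fix.
unmapped_race = find_unmapped_values(patients, "race", ALLOWED_RACE_CODES)

# A data characteristic for data consumers to review.
top_diagnosis = most_frequent(diagnoses, "dx", 1)
```

In this sketch the first result would be routed to data developers as a candidate error, while the second is descriptive information that a scientist can compare against expectations for a pediatric population.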


References

  1. Bailey, L.C.; Milov, D.E.; Kelleher, K. et al. (2013). "Multi-Institutional Sharing of Electronic Health Record Data to Assess Childhood Obesity". PLoS One 8 (6): e66192. doi:10.1371/journal.pone.0066192. PMC3688837. PMID 23823186. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3688837.
  2. Brown, J.S.; Kahn, M.; Toh, S. (2013). "Data quality assessment for comparative effectiveness research in distributed data networks". Medical Care 51 (8 Suppl. 3): S22–9. doi:10.1097/MLR.0b013e31829b1e2c. PMC4306391. PMID 23793049. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4306391.
  3. Kahn, M.G.; Raebel, M.A.; Glanz, J.M. et al. (2012). "A pragmatic framework for single-site and multisite data quality assessment in electronic health record-based clinical research". Medical Care 50 (Suppl.): S21–9. doi:10.1097/MLR.0b013e318257dd67. PMC3833692. PMID 22692254. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3833692.
  4. Arts, D.G.; De Keizer, N.F.; Scheffer, G.J. (2002). "Defining and improving data quality in medical registries: A literature review, case study, and generic framework". JAMIA 9 (6): 600–11. doi:10.1197/jamia.m1087. PMC349377. PMID 12386111. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC349377.
  5. Weiskopf, N.G.; Weng, C. (2013). "Methods and dimensions of electronic health record data quality assessment: Enabling reuse for clinical research". JAMIA 20 (1): 144–51. doi:10.1136/amiajnl-2011-000681. PMC3555312. PMID 22733976. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3555312.
  6. Holve, E.; Kahn, M.; Nahm, M. et al. (2013). "A comprehensive framework for data quality assessment in CER". AMIA Joint Summits on Translational Science Proceedings 2013: 86–8. PMC3845781. PMID 24303241. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3845781.

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. Grammar was cleaned up for smoother reading. In some cases important information was missing from the references, and that information was added.