Journal:Improving data quality in clinical research informatics tools

Full article title	Improving data quality in clinical research informatics tools
Journal	Frontiers in Big Data
Author(s)	AbuHalimeh, Ahmed
Author affiliation(s)	University of Arkansas at Little Rock
Primary contact	Email: aaabuhalime at ualr dot edu
Editors	Ehrlinger, Lisa
Year published	2022
Volume and issue	5
Article #	871897
DOI	10.3389/fdata.2022.871897
ISSN	2624-909X
Distribution license	Creative Commons Attribution 4.0 International
Website	https://www.frontiersin.org/articles/10.3389/fdata.2022.871897/full
Download	https://www.frontiersin.org/articles/10.3389/fdata.2022.871897/pdf (PDF)



Abstract

Maintaining data quality is a fundamental requirement for any successful and long-term data management project. Providing high-quality, reliable, and statistically sound data is a primary goal for clinical research informatics. In addition, effective data governance and management are essential to ensuring accurate data counts, reports, and validation. As a crucial step of the clinical research process, it is important to establish and maintain organization-wide standards for data quality management to ensure consistency across all systems designed primarily for cohort identification. These systems allow users to perform an enterprise-wide search on a clinical research data repository to determine whether a set of patients meeting certain inclusion or exclusion criteria exists. Some of these clinical research tools are referred to as de-identified data tools.

Assessing and improving the quality of data used by clinical research informatics tools are both important and difficult tasks. For an increasing number of users who rely on information as one of their most important assets, enforcing high data quality levels represents a strategic investment to preserve the value of the data. In clinical research informatics, better data quality translates into better research results and better patient care. However, achieving high-quality data standards is a major task because of the variety of ways that errors might be introduced in a system and the difficulty of correcting them systematically. Problems with data quality tend to fall into two categories. The first category is related to inconsistencies among data resources, such as differences in format, syntax, and semantics. The second category is related to poor extract, transform, and load (ETL) and data mapping processes.

In this paper, we describe a real-life case study on assessing and improving data quality within a healthcare organization. The paper compares the results obtained from two de-identified data systems, TranSMART Foundation's i2b2 and Epic's SlicerDicer, and discusses the data quality dimensions specific to the clinical research informatics context, as well as the possible data quality issues between the de-identified systems. This work closes by proposing steps or rules for maintaining data quality among different systems to help data managers, information systems teams, and informaticists at any healthcare organization monitor and sustain data quality as part of their business intelligence, data governance, and data democratization processes.
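
As a rough illustration of what comparing results between two cohort identification systems can involve, the following Python sketch flags criteria whose patient counts disagree by more than a chosen relative tolerance. It is not the procedure used in the case study: the function name `compare_cohort_counts`, the 5% default tolerance, and all sample counts are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountDiscrepancy:
    criterion: str
    count_a: int
    count_b: int
    relative_diff: float


def compare_cohort_counts(counts_a, counts_b, tolerance=0.05):
    """Flag criteria whose counts differ between two systems.

    counts_a and counts_b map a criterion name to a patient count.
    A criterion is flagged when the relative difference exceeds
    `tolerance`, or when only one of the two systems reports it.
    """
    flagged = []
    for criterion in sorted(set(counts_a) | set(counts_b)):
        a = counts_a.get(criterion)
        b = counts_b.get(criterion)
        if a is None or b is None:
            # Missing in one system: treat as a full (100%) discrepancy.
            flagged.append(CountDiscrepancy(criterion, a or 0, b or 0, 1.0))
            continue
        denom = max(a, b)
        rel = abs(a - b) / denom if denom else 0.0
        if rel > tolerance:
            flagged.append(CountDiscrepancy(criterion, a, b, rel))
    return flagged


if __name__ == "__main__":
    # Hypothetical counts, invented for illustration only.
    system_a = {"diabetes": 1000, "hypertension": 2000, "asthma": 500}
    system_b = {"diabetes": 990, "hypertension": 1500, "copd": 300}
    for d in compare_cohort_counts(system_a, system_b):
        print(f"{d.criterion}: {d.count_a} vs {d.count_b} ({d.relative_diff:.0%})")
```

A flagged criterion only signals where to look. Whether a gap stems from format or semantic inconsistencies or from ETL and mapping problems still has to be traced in the underlying systems.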

Keywords: clinical research data, data quality, research informatics, informatics, management of clinical data

Introduction

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, grammar, and punctuation. In some cases important information was missing from the references, and that information was added. Numerous links that were originally posted inline in the text were turned into full citations for this version, adding significantly to the total citation count.