Journal:Transforming healthcare analytics with FHIR: A framework for standardizing and analyzing clinical data

From LIMSWiki
Revision as of 22:20, 8 August 2023 by Shawndouglas (talk | contribs) (Saving and adding more.)
Full article title: Transforming healthcare analytics with FHIR: A framework for standardizing and analyzing clinical data
Journal: Healthcare
Author(s): Ayaz, Muhammad; Pasha, Muhammad F.; Alahmadi, Tahani J.; Abdullah, Nik N.B.; Alkahtani, Hend K.
Author affiliation(s): Monash University, Princess Nourah bint Abdulrahman University
Primary contact: Email: muhammad dot ayaz at monash dot edu
Year published: 2023
Volume and issue: 11(12)
Article #: 1729
DOI: 10.3390/healthcare11121729
ISSN: 2227-9032
Distribution license: Creative Commons Attribution 4.0 International
Website: https://www.mdpi.com/2227-9032/11/12/1729
Download: https://www.mdpi.com/2227-9032/11/12/1729/pdf?version=1686738253 (PDF)

Abstract

In this study, we discuss our contribution to building a data analytic framework that supports clinical statistics and analysis by leveraging a scalable standards-based data model named Fast Healthcare Interoperability Resources (FHIR). We developed an intelligent algorithm that facilitates the clinical data analytics process on FHIR-based data. We designed several workflows for patient clinical data used in two hospital information systems (HIS), namely patient registration systems and laboratory information systems (LIS). These workflows exploit various FHIR application programming interfaces (APIs) to facilitate patient-centered and cohort-based interactive analyses. We developed a FHIR database implementation that utilizes FHIR APIs and a range of operations to facilitate descriptive data analytics (DDA) and patient cohort selection. A prototype user interface for DDA was developed with support for visualizing healthcare data analysis results in various forms. Healthcare professionals and researchers can use the developed framework to perform analytics on clinical data used in healthcare settings. Our experimental results demonstrate the proposed framework’s ability to generate various analytics from clinical data represented in FHIR resources.

Keywords: data analytics, data analysis, FHIR, EMR, EHR

Background

To give readers a comprehensive view of the applications of data analytics in the healthcare industry, this section introduces the data analytics concepts employed in the healthcare sector. We also discuss data analytics on clinical data represented in the latest healthcare data standard, Fast Healthcare Interoperability Resources (FHIR).

Healthcare data analytics

Healthcare data analytics is the process of analyzing and interpreting large sets of healthcare data to gain insights and improve healthcare outcomes. It involves using a range of analytical techniques and tools to process data from various sources, such as electronic health records (EHRs), electronic medical records (EMRs), medical devices, claims data, and patient-generated data. The rapid advancements in hardware and software technologies in recent years have ushered in a new era of data collection and processing, resulting in remarkable progress in the field of healthcare data analytics. In healthcare organizations, clinical data serve a dual purpose. First, they are used for the delivery of healthcare services to patients. Second, they are used for secondary purposes such as research, analysis, and quality improvement. In particular, the secondary use of clinical data has emerged as a critical component of healthcare data analytics. This has resulted in a paradigm shift in recent healthcare settings, where the secondary use of healthcare data is deemed just as important as its primary use.

EHR systems are leveraged to facilitate the secondary use of healthcare data, for activities such as quality improvement, safety measurement, payments, provider certification, marketing, and research. [1] Moreover, the secondary use of healthcare data has the potential to significantly enhance the healthcare experiences of individuals. It can facilitate the learning of diseases and their effective treatments, deepen people’s knowledge and understanding of the effectiveness and efficiency of healthcare systems, and aid in supporting public health initiatives. [1] However, the secondary use of healthcare data also raises complex ethical, social, and technical issues; for example, questions regarding data ownership and access privileges continue to challenge the field. [2]

The healthcare industry has witnessed a remarkable surge in the volume of healthcare data in recent times, primarily driven by the widespread adoption of EHR systems worldwide. [3] In addition, there has been unprecedented growth in other types of healthcare data, such as genome sequencing data and data on other biological structures. [4] The analysis of these clinical data is commonly referred to as analytics or healthcare data analytics, which falls under the category of secondary use of clinical data. While the term "data analytics" is extensively used in and outside of healthcare [3], our focus in this study is on its application in the healthcare industry.

Analytics has been deployed across various domains, including healthcare. However, experts from different fields offer diverse definitions of analytics. Nonetheless, the ultimate objective of analytics, as perceived by all experts, remains consistent. Data analytics experts characterize analytics as “the comprehensive exploitation of data, statistical and quantitative analysis, explanatory and predictive models, fact-based management to drive decisions, actions, and much more.” [5] Similarly, IBM defines analytics as “the methodical use of data and associated business insights developed through applied analytical disciplines (e.g., statistical, predictive, contextual, quantitative, cognitive, and other models) to drive evidence-based decision making for planning, management, measurement, and learning. Analytics can be descriptive, predictive, or prescriptive.” [6]

Moreover, the two eminent healthcare data analytics experts, Adams and Klein, outline three distinct levels and applications of analytics in the healthcare domain [7]. Each level is associated with increasing functionality and value:

  1. Descriptive: This level refers to standard reporting types that depict current situations and problems.
  2. Predictive: This level refers to simulation and modeling techniques that forecast trends and anticipate the outcomes of implemented actions.
  3. Prescriptive: This level concerns the optimization of clinical, financial, and other outcomes.

All three levels of healthcare data analytics are of paramount importance. However, predictive analytics has gained more attention in the current healthcare landscape [3], as medical experts seek to predict various clinical-related variables in healthcare data to enhance healthcare delivery services and optimize health and financial outcomes.

With the advent of digital medical records, hospitals and other healthcare organizations are accumulating vast amounts of data at an unprecedented rate. The clinical data captured by these organizations take many forms, ranging from structured data (such as laboratory results and images) to unstructured data (such as textual notes comprising clinical narratives, reports, and various other documents). For example, the well-known US healthcare company Kaiser Permanente has a current data store for over nine million members that surpasses 30 petabytes of data. [8] Another notable example is the American Society of Clinical Oncology (ASCO), which is developing its Cancer Learning Intelligence Network for Quality (CancerLinQ). [9] The clinical data accumulated by CancerLinQ serve many healthcare data analytics purposes, providing clinicians and researchers with an extensive platform for EHR data collection, data mining, and visualization, as well as the application of clinical decision support, among others.

The ultimate goal of healthcare data analytics is to use data to make informed decisions and identify patterns and trends that can help improve patient outcomes, optimize operational efficiency, and reduce costs. By analyzing data, healthcare providers can identify areas for improvement, predict health outcomes, and personalize care for individual patients.

Some common applications of healthcare data analytics include population health management, clinical decision support, disease surveillance and monitoring, and quality improvement initiatives. The field of healthcare data analytics is constantly evolving as new technologies and approaches emerge, and it is a critical area of focus for healthcare organizations looking to improve their performance and deliver better care to patients.

To summarize, data analytics has become a pivotal aspect of current healthcare settings, a core requirement for both the industry and its experts. [4] Moreover, the future of healthcare holds tremendous promise when it comes to data analytics. With the burgeoning volume of clinical and research data, coupled with the methods employed to analyze and put it to use, there is tremendous potential for improving healthcare delivery, personal health, and biomedical research. However, there is also a continuing need to improve the quality of clinical data and conduct research aimed at demonstrating how best to apply data analytics to address healthcare challenges.

Healthcare data analytics using the FHIR data standard

FHIR is the latest healthcare data standard and is gaining popularity in the healthcare sector. [10] FHIR provides a standardized way to represent and exchange healthcare information electronically. [11] The standard has attracted healthcare providers because of its ability to reduce the costs of interoperability and its potential to catalyze a new ecosystem of third-party applications. [12] FHIR’s interoperability capabilities surpass those of older data standards, such as Health Level 7 (HL7) v2, v3, and CDA.

In a recent survey of Australian and New Zealand healthcare executives, the adoption of FHIR was found to increase interoperability from 11% to 66%. [13] Consequently, the use of FHIR for data exchange is growing rapidly within the healthcare industry as the standard gains favor among stakeholders. The survey further revealed that 55% of healthcare providers are willing to make the shift to a FHIR-based interoperability platform. Additionally, it is estimated that FHIR will be widespread in the global healthcare industry by 2024. [14] These findings show the popularity of FHIR-based interoperability in the healthcare industry and healthcare providers’ interest in adopting it.

However, the healthcare industry’s needs go beyond mere clinical data exchange. Clinical data need to be processed for other purposes, such as data analysis, data analytics, and research. Thus, the clinical data represented in the FHIR standard need to fulfill these requirements. FHIR’s adoption is expected to increase data availability for analytics and solve the data exchange and analytics problems faced by the healthcare industry. [13] Nevertheless, the adoption of FHIR in the analytics domain remains relatively low, as the standard is still young. [15] Moreover, the tools supporting FHIR data analytics are still relatively immature. [16] Healthcare providers, however, argue that they are not only interested in sharing clinical data across healthcare organizations to improve data interoperability but are even more interested in processing clinical data for other purposes, such as data analysis and research, to provide real-time medical services to patients. Therefore, tools that provide these services are essential in the healthcare industry.

On the other hand, the FHIR standard for patient clinical information presents many new opportunities for visualizing, analyzing, and automating various types of healthcare data. New use cases for FHIR data analytics continue to emerge in the healthcare industry, such as real-time alerts for patient satisfaction, identifying patterns in patients’ medical records across datasets, real-time visibility into patient readmission rates, and cost savings that do not compromise care quality. [17,18,19,20] However, analyzing and implementing these use cases can prove challenging owing to FHIR’s immaturity and the practical difficulties of working with it.

To facilitate data processing and exchange, FHIR employs REST APIs. For FHIR data analytics, however, the FHIR APIs must support flexible data queries and processing. As data analytics are based on diverse types of data housed in varied FHIR resources, the FHIR APIs must query these data in various ways to enable effective data analysis. Additionally, FHIR has accelerated the delivery of a large number of new healthcare applications that can integrate with EHR or EMR data via the FHIR APIs. However, most of these applications are limited to reading data relevant to a single patient. [15] One contributing factor, among many others, could be that the FHIR APIs are not optimally suited to queries that aggregate and categorize data across a vast clinical dataset.
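To make the contrast between single-patient and cohort-style queries concrete, the following is a minimal sketch of how FHIR search URLs can be assembled. It is an illustration only and not the query engine described in this study. The base URL and the helper name `build_search_url` are hypothetical; the search parameters used (`subject`, `category`, `gender`, `birthdate`, `_summary`) are standard FHIR search parameters.

```python
from urllib.parse import urlencode

# Hypothetical base URL; substitute the endpoint of your own FHIR server.
FHIR_BASE = "https://fhir.example.org/baseR4"


def build_search_url(resource_type, params, base=FHIR_BASE):
    """Return a FHIR search URL for a resource type and search parameters.

    `params` is a list of (name, value) pairs so that a parameter such as
    `birthdate` can be repeated to express a date range.
    """
    query = urlencode(params)
    url = f"{base}/{resource_type}"
    return f"{url}?{query}" if query else url


# Single-patient query: laboratory observations for one patient.
patient_url = build_search_url(
    "Observation",
    [("subject", "Patient/123"), ("category", "laboratory")],
)

# Cohort-style query: count female patients born in the 1980s.
# `_summary=count` asks the server to return only the number of matches.
cohort_url = build_search_url(
    "Patient",
    [
        ("gender", "female"),
        ("birthdate", "ge1980-01-01"),
        ("birthdate", "le1989-12-31"),
        ("_summary", "count"),
    ],
)

print(patient_url)
print(cohort_url)
```

A count is about as far as plain search goes toward aggregation; grouping or categorizing results across resources generally has to happen on the client side or in a separate analytics layer, which is the gap the paragraph above points to.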

A related and parallel trend within the realm of health information systems involves investing in higher-quality structured data via the coding of clinical records at the point of care. With the implementation of EMRs, healthcare providers are now able to incorporate a multitude of concepts into medical records using advanced terminologies, including ICD-10, LOINC, and SNOMED CT. [21,22] This affords the opportunity for more detailed analysis by enabling access to specific clinical concepts as well as the ability to query the ontology based on additional attributes and relationships to other clinical concepts.
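As a small illustration of code-based analysis, the sketch below filters Observation resources by a LOINC code held in `code.coding`. The sample resources and the helper `has_code` are invented for this example; the two LOINC codes are used purely as illustrative values, and `http://loinc.org` is the system URI FHIR uses for LOINC.

```python
LOINC = "http://loinc.org"


def has_code(resource, system, code):
    """Return True if the resource's `code.coding` list contains (system, code)."""
    codings = resource.get("code", {}).get("coding", [])
    return any(
        c.get("system") == system and c.get("code") == code for c in codings
    )


# Invented sample data: two coded observations and one with free text only.
observations = [
    {"resourceType": "Observation", "id": "o1",
     "code": {"coding": [{"system": LOINC, "code": "718-7"}]}},
    {"resourceType": "Observation", "id": "o2",
     "code": {"coding": [{"system": LOINC, "code": "2339-0"}]}},
    {"resourceType": "Observation", "id": "o3",
     "code": {"text": "free-text only"}},
]

# Select the observations carrying the code of interest.
matching_ids = [o["id"] for o in observations if has_code(o, LOINC, "718-7")]
print(matching_ids)
```

Note that the free-text observation is silently skipped, which hints at why purely code-based selection is less useful for general concept analysis.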

While this technique is highly effective when analyzing clinical data based on specific codes or terminologies, it proves to be less fruitful in general concept analysis. Therefore, other approaches, including modifications to FHIR APIs, must be considered to enable various ways of analyzing medical data for deep clinical data analysis. However, modifying FHIR APIs is extremely challenging and requires extensive skill and experience to change their core implementation mechanisms.

Currently, the level of expertise required to make the best use of FHIR and clinical terminologies within a data analysis workflow is relatively rare in the healthcare domain. [23,24,25] The application of data analytics and analysis in healthcare settings using the FHIR data standard is also a relatively new concept and has scarcely been attempted. However, due to the rapid adoption of FHIR for medical data exchange, data analytics and analysis are now a core demand of the healthcare industry to process patient medical data in various ways and provide real-time medical services to improve healthcare delivery. In summary, the standardization of healthcare data plays a crucial role in clinical and translational data analysis systems, especially when large-scale data are involved. Moreover, healthcare applications for clinical statistics and analysis can significantly enhance healthcare by connecting clinical data with analytic tools, thereby engaging practitioners or clinicians in the process of medical data analysis. [26,27]

In response to the need to address the complex and multifaceted challenges of data analytics in the healthcare industry, this research study puts forth a FHIR standard-based data analytics framework. The framework is designed to tackle the healthcare industry’s data analytics issues and provide healthcare organizations with a scalable, standards-based data model. At present, the framework is tailored to workflows specifically designed for patient clinical data originating from two distinct hospital information systems: patient registration systems and laboratory information systems (LIS). Other data analysis workflows and customized research scenarios on patient data from other HIS could be performed on FHIR-based data but are not currently supported by our framework without modification.

The developed framework utilizes a FHIR database as its dataset, with FHIR RESTful APIs that query different types of FHIR resources from the database algorithmically. The mapping algorithm and analytic engine then process the retrieved data and generate various data analytics from patient clinical data, presenting the results to end-users via a user-friendly interface.

In short, this research study provides a solution for healthcare data analytics, offering healthcare professionals a platform to conduct data analysis on clinical data using FHIR. With the FHIR Data Analytics Framework, healthcare professionals can extract meaningful insights from patient data and use these insights to enhance patient care delivery, promote better health outcomes, and advance the healthcare industry.

This research work has three main contributions. First, the entire framework and workflow design follow the FHIR data standard, so they could be reused for other clinical data domains and could support any clinical data that follow the FHIR standard. Second, the data analysis workflow and tools incorporate the experience of clinical researchers and statisticians, which could provide a starting point for researchers working with FHIR. Third, the intelligent mapping algorithm is designed to facilitate data analytics and data analysis on FHIR-based data. The mapping algorithm could be reused for any other clinical data that follow the FHIR specification and need to be processed for other purposes, such as research or developing an artificial intelligence (AI) or machine learning (ML) model.

The FHIR Data Analytics Framework comprises six layers: the FHIR database, the FHIR query engine layer, the mapping algorithm/agent layer, the FHIR-compliant database layer, the analytics engine layer, and the user interface. The rest of this manuscript is structured accordingly. The next section provides a comprehensive literature review, followed by a discussion of the five major materials used in this study. Then, the framework’s architecture is described in detail, followed by the implementation details, an explanation of the experiment setup, and the results. We close by describing the limitations of this approach, as well as a discussion, future plans, and finally a conclusion.

Literature review

Over the years, financial and administrative data were deemed essential for planning purposes. However, in recent times, comprehensive healthcare data have become crucial to institutional strategic planning and self-analysis. [28] The healthcare industry relies heavily on various analytics methods, such as EHR analysis (EHRA), biomedical image analysis (BIA), sensor data analysis (SDA), biomedical signal analysis (BSA), genomic data analysis (GDA), and clinical text mining (CTM), to process and analyze clinical data. [29] Analyzing and performing data analytics on clinical data in healthcare settings is a fundamental requirement in the healthcare industry. Despite this, the literature scarcely acknowledges the use of data analytics in the healthcare industry.

In our literature review, we noticed some efforts that utilized various clinical data sources in the data analytics domain. For example, the Observational Health Data Sciences and Informatics (OHDSI) program has generated an enormous volume of work in the field of health data analytics, including the creation of the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM). [30] The OMOP CDM provides a target data model for health data analytics, along with analytic routines and common vocabularies that can be run over the common data model.

Furthermore, the OMOP has a rich ecosystem of applications that have been developed to assist in its implementation and use, such as the ATLAS user interface designed by the OHDSI community [31] to facilitate analytic queries over the OMOP data model. Moreover, researchers have also explored the use of the openEHR model within health data analytics, as exemplified by the work of Chunlan et al. [32] in developing the Archetype Query Language (AQL), which is a standard way of querying data from openEHR-based systems. [33] The AQL has been implemented in many EHRs and analytics software tools and provides important design features for this type of capability. [15]

However, while these techniques have been applied to EHR datasets, their application to data represented in the FHIR standard is a relatively new and challenging concept. Therefore, researchers are looking for new techniques with which they can apply data analytics to clinical data represented in the FHIR standard. As mentioned above, FHIR is a young data standard [15], and limited research related to FHIR analytics has been reported. [34] A recent scientific literature review reveals that only a few studies have discussed FHIR analytics. [16] Thus, FHIR data analytics is at an early stage, and applying data analytics or data analysis in this domain remains challenging. Nevertheless, some researchers have made initial efforts in FHIR-based analytics, such as the prediction of sepsis based on the FHIR standard by Lakshman et al. [35] and the deployment of clinical predictive models as web services via FHIR described by Khalilia et al. [36]

Furthermore, the use of FHIR to store and analyze medical data on a large scale has also been implemented and integrated into the Google Cloud and Microsoft Azure cloud platforms. [21,26] In addition, technology startups have recognized the analytical capabilities of FHIR; for example, the doc.ai application uses it to provide personalized medicine, automate the process of controlling audit files, and store data in a structured way. [37]

Moreover, FHIR has been used to support clinical decisions and to build a distributed phenotyping analytics platform. [38] Kreuzthaler et al. discussed the use and benefits of standardized data in analytical approaches. [39] In addition, Franz et al. developed a monitoring system with the FHIR data standard. [40] Liu et al. [41] described several ways to make bulk FHIR data available for analytic queries. The authors concluded that Apache Parquet [42] is the ideal format for storing and querying FHIR data in the context of large-scale analytics using Apache Spark.

Grimes et al. [15] discussed FHIR data analytics using the Pathling concept. However, it works in a limited domain because some operations, such as data aggregation and certain kinds of data searching, are not easily achieved, or are not currently possible, via the FHIR REST API specification. It is therefore extremely challenging to implement. Furthermore, Dunn et al. [43] explained genomic data analysis using FHIR in a cloud framework. However, that approach applies only to the analysis of genomic data in a cloud framework and would be challenging to apply to clinical data represented in FHIR or to implement in traditional healthcare settings. Similarly, Gruendner et al. [44] described FHIR data formatting for statistical analysis. However, this technique only generated the FHIR data and did not provide a platform for clinical data analysis or data analytics using REST APIs. It is therefore extremely challenging to generalize the concept and provide a platform for medical software developers and researchers to perform data analytics on clinical data or use the resulting data for research purposes.

Moreover, the notable health information technology (HIT) services provider Cerner Corporation produces the Bunsen [45] library, which encodes FHIR resources within Apache Spark [46] datasets. This work facilitates loading, transforming, and analyzing FHIR data. Cerner Corporation has also been involved with the implementation of the SQL on FHIR proposal [47], which is a projection of the FHIR data model onto the relational query model and the Structured Query Language (SQL). Additionally, Google discussed and implemented a method for encoding FHIR data using Protocol Buffers. [48] Google has also developed many tools and techniques [49] for using FHIR with the BigQuery analytics platform, integrating with the FHIR Bulk Data API, and using FHIR data within cloud-based data processing and machine learning pipelines.

Despite the various initial attempts at data analytics on clinical data represented in the FHIR data standard, there has been no user-friendly data analytics framework or visualization tool to help healthcare users such as practitioners, providers, and patients perform various data analytics on patient clinical data. To address this gap, our research study developed a framework with a user-friendly interface that enables healthcare practitioners, providers, and patients to perform data analytics on the clinical data used in two HIS and represented in the FHIR standard.

Materials

In this section, we discuss the materials that helped us develop our framework. This information helps readers understand the challenges and the pre-development procedures involved in this undertaking.

Required outcomes

Our goal was to develop a data analytics framework that could perform various types of analytics on clinical data typically used in healthcare facilities and represented in the FHIR standard. The resulting framework performs an array of analytical procedures on clinical data in healthcare environments and visualizes the resulting insights.

User research and inputs

As previously discussed, support for FHIR and related data analytics is still in its infancy. Moreover, the clinical data flow in healthcare settings and the data analytics concept on clinical data represented in the FHIR format remain unclear at this stage. Particularly for individuals outside of the medical field, comprehending this concept can prove to be a challenging task. Therefore, to better grasp the FHIR data analytics concept and its workflow in the healthcare environment, we decided to take input from various professionals working in healthcare settings. We conducted numerous interviews with doctors, practitioners, patients, pharmacists, and others in the healthcare industry to obtain a more comprehensive understanding of workflow and user requirements. These interviews consisted of both open-ended and closed questions related to the current challenges within the healthcare data analytics domain. Furthermore, we sought to understand the data analytics needs of various stakeholders, including patients, practitioners, and healthcare providers, regarding the healthcare industry.

This process helped us validate our assumptions about adopting FHIR data analytics in the healthcare industry and provided insight into the views of users (practitioners, patients, providers, etc.) regarding the adoption of FHIR data analytics, as well as their opinions on workflows with this new technique. It also identified a range of use-case scenarios for various analytics that we could implement in this prototype. Based on what we learned from this process, we selected the following two use cases to serve as the focal point of our work:

  • Patient cohort selection: This involves the selection and retrieval of patient information/records based on complex inclusion and exclusion criteria.
  • Data preparation: This includes processing and reshaping data in preparation for use with statistical models or tools.
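The cohort selection use case can be sketched as a filter over patient records with separate inclusion and exclusion rules. This is a minimal illustration under stated assumptions, not the selection logic of the framework: the flat record layout (`id`, `gender`, `birth_date`, `allergies`), the sample patients, and the helper names are all invented for the example.

```python
from datetime import date


def age_on(birth_date, today):
    """Return age in completed years on the reference date `today`."""
    years = today.year - birth_date.year
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def select_cohort(patients, include, exclude):
    """Keep patients that meet every inclusion rule and no exclusion rule."""
    return [
        p for p in patients
        if all(rule(p) for rule in include)
        and not any(rule(p) for rule in exclude)
    ]


REFERENCE_DATE = date(2023, 1, 1)  # fixed so the example is reproducible

# Invented sample records, already flattened from FHIR resources.
patients = [
    {"id": "p1", "gender": "female", "birth_date": date(1980, 5, 10),
     "allergies": {"penicillin"}},
    {"id": "p2", "gender": "female", "birth_date": date(1975, 3, 2),
     "allergies": set()},
    {"id": "p3", "gender": "male", "birth_date": date(1990, 7, 21),
     "allergies": set()},
    {"id": "p4", "gender": "female", "birth_date": date(2010, 11, 30),
     "allergies": set()},
]

# Inclusion: adult women. Exclusion: a recorded penicillin allergy.
include = [
    lambda p: p["gender"] == "female",
    lambda p: age_on(p["birth_date"], REFERENCE_DATE) >= 18,
]
exclude = [lambda p: "penicillin" in p["allergies"]]

cohort_ids = [p["id"] for p in select_cohort(patients, include, exclude)]
print(cohort_ids)
```

Keeping the rules as plain predicates means criteria can be added or combined without changing the selection function itself.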

Challenges

FHIR has a highly nested, complex, and graph-like data format that represents clinical data in a resource structure in JSON/XML format. Because the data elements are nested in a hierarchical tree structure, they are difficult to represent within traditional relational data models, especially when simplifying query logic is a primary goal. Yet representing the data in a traditional relational data model is essential for data analytics and analysis. The graph-like FHIR resource structure therefore poses a significant challenge, and optimizing the data structure for analytics and analysis queries across a wide range of use cases also raises performance issues.

To handle these challenges, we developed a mapping algorithm/agent that converts the clinical data stored in FHIR resources into a sample EMR format and stores them in a relational data model before any data analytics are conducted.
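The general idea of mapping a nested FHIR resource onto a relational row can be sketched as follows. This is an assumption-laden illustration and not the authors' mapping algorithm: the table layout, the helper `flatten_patient`, and the sample resource are invented, and only a handful of Patient elements (`id`, `name`, `gender`, `birthDate`) are handled.

```python
import sqlite3


def flatten_patient(resource):
    """Project a nested FHIR Patient resource onto a flat dictionary.

    Only the first entry of the repeating `name` element is used, which is
    a simplification made for this sketch.
    """
    names = resource.get("name") or [{}]
    name = names[0]
    return {
        "id": resource.get("id"),
        "family": name.get("family"),
        "given": " ".join(name.get("given", [])),
        "gender": resource.get("gender"),
        "birth_date": resource.get("birthDate"),
    }


# Invented sample resource in FHIR JSON form.
patient_resource = {
    "resourceType": "Patient",
    "id": "pat-1",
    "name": [{"family": "Rahman", "given": ["Aisha", "N."]}],
    "gender": "female",
    "birthDate": "1985-04-12",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient ("
    "id TEXT PRIMARY KEY, family TEXT, given TEXT, gender TEXT, birth_date TEXT)"
)
# Named placeholders are filled from the flattened dictionary.
conn.execute(
    "INSERT INTO patient VALUES (:id, :family, :given, :gender, :birth_date)",
    flatten_patient(patient_resource),
)

row = conn.execute(
    "SELECT id, family, given, gender, birth_date FROM patient"
).fetchone()
print(row)
```

Repeating elements such as `name` are where information is most easily lost in a flat projection; a fuller mapping would typically move them into child tables keyed by the patient identifier.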

The clinical data analysis workflow design

Performing data analytics on the data in a dataset is challenging, as it depends entirely on workflows (business use-case scenarios). Designing such workflows is difficult, particularly for non-medical experts, because they struggle to identify the parameters of clinical data used in healthcare settings, including user requirements and data constraints. Therefore, we held discussions with medical experts and, on the basis of their inputs and common clinical data analysis requirements, designed two general analysis workflows: patient-centered data analysis and cohort-based data analysis. We then elaborated on these and designed five primary workflows that are used in healthcare settings on patient data and are suitable for performing data analytics on our dataset (see Table 1).

Table 1. List of data analysis workflows (business use-cases) for patient data in the healthcare setting.
Number Description
1 Investigate registered patients in healthcare settings
2 Investigate registered patients in healthcare settings within a specified timeframe
3 Investigate patients having various types of allergies
4 Investigate various types of tests ordered by a physician, organization, etc.
5 Investigate various types of tests ordered by a physician, organization, etc., within a specified timeframe

The patient-centered data analysis workflow facilitates the browsing of various pieces of information focused on the individual. Patient-specific data derived from multiple sources are integrated under a single identifier. In the FHIR data model, Patient is an independent resource, while other resources such as Observation and Condition have a “subject” property that links them to a specific Patient resource, representing patient-centered relationships.
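The patient-centered linkage can be illustrated by grouping resources on their `subject.reference` value. The sample resources and the helper `group_by_patient` are invented for this sketch; it assumes each resource stores its patient link as a reference string of the form `Patient/<id>`.

```python
from collections import defaultdict


def group_by_patient(resources):
    """Group resource ids by the patient reference held in `subject`."""
    grouped = defaultdict(list)
    for resource in resources:
        reference = resource.get("subject", {}).get("reference")
        if reference:  # resources without a subject are skipped
            grouped[reference].append(resource["id"])
    return dict(grouped)


# Invented sample resources from different sources.
resources = [
    {"resourceType": "Observation", "id": "obs-1",
     "subject": {"reference": "Patient/pat-1"}},
    {"resourceType": "Condition", "id": "cond-1",
     "subject": {"reference": "Patient/pat-1"}},
    {"resourceType": "Observation", "id": "obs-2",
     "subject": {"reference": "Patient/pat-2"}},
    {"resourceType": "Observation", "id": "obs-3"},
]

by_patient = group_by_patient(resources)
print(by_patient)
```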

The cohort-based analysis workflow addresses more common data analysis needs in clinical statistics and studies. In this workflow, the Condition, AllergyIntolerance, Observation, and Practitioner data of a cohort are largely characterized by the distribution of patient characteristics along different dimensions. The workflow is designed to support a wide range of clinical data analysis tasks, including patient registration analysis, patient allergy timeline analysis, patient laboratory test analysis, and cohort gender/age distribution statistics. Overall, our workflows provide a robust framework for performing data analytics on healthcare datasets.
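A cohort gender/age distribution of the kind mentioned above can be computed with simple counting. The sketch below is illustrative only: the cohort records, the ten-year band width, and the helper `age_band` are assumptions made for the example rather than details taken from the framework.

```python
from collections import Counter


def age_band(age, width=10):
    """Return a label such as '30-39' for the band containing `age`."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


# Invented cohort records with ages already computed.
cohort = [
    {"gender": "female", "age": 34},
    {"gender": "male", "age": 41},
    {"gender": "female", "age": 47},
    {"gender": "female", "age": 52},
    {"gender": "male", "age": 38},
]

# Descriptive statistics: how the cohort splits by gender and by age band.
gender_counts = Counter(p["gender"] for p in cohort)
age_counts = Counter(age_band(p["age"]) for p in cohort)

print(dict(gender_counts))
print(dict(age_counts))
```

Counts of this form map directly onto the bar or pie charts a descriptive analytics interface would typically display.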

FHIR REST API's working mechanism

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation, though grammar and word usage were substantially updated for improved readability. In some cases important information was missing from the references, and that information was added.