Difference between revisions of "Journal:Establishing reliable research data management by integrating measurement devices utilizing intelligent digital twins"

Revision as of 16:16, 12 September 2023

Full article title: Establishing reliable research data management by integrating measurement devices utilizing intelligent digital twins
Journal: Sensors
Author(s): Lehmann, Joel; Schorz, Stefan; Rache, Alessa; Häußermann, Tim; Rädle, Matthias; Reichwald, Julian
Author affiliation(s): Mannheim University of Applied Sciences
Primary contact (email): j dot lehmann at hs dash mannheim dot de
Year published: 2023
Volume and issue: 23(1)
Article #: 468
DOI: 10.3390/s23010468
ISSN: 1424-8220
Distribution license: Creative Commons Attribution 4.0 International
Website: https://www.mdpi.com/1424-8220/23/1/468
Download: https://www.mdpi.com/1424-8220/23/1/468/pdf (PDF)

Abstract

One of the main topics within research activities is the management of research data. Large amounts of data acquired by heterogeneous scientific devices, sensor systems, measuring equipment, and experimental setups have to be processed and ideally managed by FAIR (findable, accessible, interoperable, and reusable) data management approaches in order to preserve their intrinsic value to researchers throughout the entire data lifecycle. The symbiosis of heterogeneous measuring devices, FAIR principles, and digital twin technologies is considered to be ideally suited to realize the foundation of reliable, sustainable, and open research data management. This paper contributes a novel architectural approach for gathering and managing research data aligned with the FAIR principles. A reference implementation as well as a subsequent proof of concept is given, leveraging digital twins to overcome common data management issues at equipment-intense research institutes. To facilitate implementation, a top-level knowledge graph has been developed to convey metadata from research devices along with the produced data. In addition, a reactive digital twin implementation of a specific measurement device was devised to facilitate reconfigurability and minimize design effort.

Keywords: cyber–physical system, sensor data, research data management, FAIR, digital twin, research 4.0, knowledge graph, ontology

Introduction

Initiated through the ongoing efforts of digitization, one of the new fields of activity within research concerns the management of research data. New technologies and the related increase in computing power can now generate large amounts of data, providing new paths to scientific knowledge. [1] Research is increasingly adopting toolsets and techniques introduced by Industry 4.0 while gearing itself up for Research 4.0. [2] The requirement for reliable research data management (RDM) can be met by following FAIR data management principles, which indicate that data must be findable, accessible, interoperable, and reusable through the entire data lifecycle in order to provide value to researchers. [3] In practice, however, implementation often fails due to the high heterogeneity of hardware and software, as well as outdated or decentralized data backup mechanisms. [4,5] This experience is confirmed by the work at the Center for Mass Spectrometry and Optical Spectroscopy (CeMOS), a research institute at the Mannheim University of Applied Sciences which employs approximately 80 interdisciplinary scientific staff. The institute’s research landscape spans various fields, including medical technology, biotechnology, artificial intelligence (AI), and digital transformation. Across these fields, a wide variety of hardware and software is required to collect and process the data that are generated, which has posed a significant challenge in initial efforts to achieve holistic data integration.

To cater to the respective disciplines, researchers of the institute develop experimental equipment such as mid-infrared (MIR) scanners for the rapid detection and imaging of biochemical substances in medical tissue sections, multimodal imaging systems generating hyperspectral images of tissue slices, or photometrical measurement devices for the detection of particle concentration. They also use non-customizable equipment such as mass spectrometers, microscopes, and cell imagers for their experiments. These appliances provide great benefits for further development within the respective research disciplines, which is why the data are of immense value and must be brought together accordingly in a reliable RDM system.

Research practice shows that the step into the digital world is associated with obstacles. As an innovative technology, the digital twin (DT) can be seen as a secure data source, as it mirrors a physical device (also called a physical twin or PT) into the digital world through a bilateral communication stream. [6] DTs are key actors for the implementation of Industry 4.0 prospects. [7] Consequently, additional reconfigurability of hardware and software of the digitally imaged devices becomes a reality. The data mapped by the DT thus enable the bridge to the digital world and hence to the digital use and management of the data. [1] Depending on the domain and use case, industry and research are creating new types of standardization-independent DTs. In most cases, only a certain part of the twin’s lifecycle is reflected. Only when utilized over the entire lifecycle of the physical entity does the DT become a powerful tool of digitization. [8,9] With the development of semantic modeling, hardware, and communication technology, there are more degrees of freedom to leverage the semantic representation of DTs, improving their usability. [10] For the internal interconnection in particular, the referencing of knowledge correlations distinguishes intelligent DTs. [11] The analysis of relevant literature reveals a research gap in the combination of both approaches (RDM and DTs), which the authors intend to address with this work.

In this paper, a centralized solution-based approach for data processing and storage is chosen, which is in contrast to the decentralized practice in RDM. Common problems of data management include research data that are distributed locally and decentrally; missing access authorizations; and missing experimental references, which is why the results become unusable over long periods of time. The resulting replication of data is followed by inconsistencies and interoperability issues. [12] Furthermore, these circumstances were also confirmed by empirical surveys at the authors’ institute. Therefore, a holistic infrastructure for data management is introduced, starting with the collection of the measurement series of the physical devices, up to the final reliable reusability of the data. Relevant requirements for a sustainable RDM leveraged by intelligent DTs are elaborated based on the related work. Enhancing RDM with DT paradigms can further improve its efficiency. This forms the basis for an architectural concept for reliable data integration into the infrastructure with the DTs of the fully mapped physical devices.

Due to the broad spectrum and interdisciplinarity of the institution, myriad data of different origins, forms, and quantities are created. The generic concept of DT allows evaluation units to be created agnostically from their specific use cases. The physical measuring devices and apparatuses benefit not only from flexible reconfiguration, made possible by providing their virtual representation with intelligent functions, but also directly from the great variety of harmonized data structures and interfaces made possible by DTs. The bidirectional communication stream between the twins enables the physical devices to be directly influenced. Accordingly, parameterization of the physical device takes place dynamically using the DT, instead of statically using firmware as is usually the case. In addition, due to the real-time data transmission and the seamless integration of the DT, an immediate and reliable response to outliers is possible. Data management and DTs, as disruptive technologies, are mutual enablers in terms of their realization. [1] Therefore, the designed infrastructure is based on the interacting functionality of both technologies to leverage their synergies, providing sustainable and reliable data management. In order to substantiate the feasibility and practicability, a demo implementation of a measuring device within the realized infrastructure is carried out using a photometrical measuring device developed at the institute. This also forms the basis for the proof of concept and the evaluation of the overall system.
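
The dynamic parameterization and outlier response described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names (PhysicalPhotometer, ReactiveTwin), the integration_time_ms parameter, the rolling-window outlier rule, and the reaction of halving the integration time are all assumptions chosen for illustration, and the in-process method calls stand in for whatever network link a real physical twin would use.

```python
from dataclasses import dataclass, field
from statistics import mean, stdev


@dataclass
class PhysicalPhotometer:
    """Hypothetical stand-in for the physical twin (PT)."""
    integration_time_ms: int = 100

    def apply(self, parameters: dict) -> None:
        # Downstream direction: the twin pushes new settings to the device.
        self.integration_time_ms = parameters.get(
            "integration_time_ms", self.integration_time_ms
        )


@dataclass
class ReactiveTwin:
    """Hypothetical reactive DT: mirrors readings and reacts to outliers."""
    device: PhysicalPhotometer
    window: int = 20         # number of recent readings used as baseline
    threshold: float = 3.0   # outlier limit in standard deviations (assumed)
    readings: list = field(default_factory=list)
    outliers: list = field(default_factory=list)

    def configure(self, **parameters) -> None:
        # Reparameterize the device at runtime instead of changing firmware.
        self.device.apply(parameters)

    def ingest(self, value: float) -> bool:
        # Upstream direction: mirror a reading; return True if it is an outlier.
        recent = self.readings[-self.window:]
        is_outlier = False
        if len(recent) >= 5:
            spread = stdev(recent)
            if spread > 0 and abs(value - mean(recent)) > self.threshold * spread:
                is_outlier = True
        if is_outlier:
            self.outliers.append(value)
            # Assumed reaction: shorten the integration time on the device.
            self.configure(
                integration_time_ms=max(1, self.device.integration_time_ms // 2)
            )
        else:
            self.readings.append(value)
        return is_outlier


device = PhysicalPhotometer()
twin = ReactiveTwin(device)
for reading in [1.0, 1.1, 0.9, 1.05, 0.95, 1.0]:
    twin.ingest(reading)
flagged = twin.ingest(10.0)  # far outside the recent baseline
```

In this sketch, the twin holds the device configuration and the reaction logic, so changing how the device is parameterized or how outliers are handled only requires changing the twin, not the device.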

As main contributions, the paper (1) presents a new type of approach for dealing with large amounts of research data according to FAIR principles; (2) identifies the need for the use of DTs to break down barriers for the digital transformation in research institutes in order to arm them for Research 4.0; (3) elaborates a high-level knowledge graph that addresses the pending issues of interoperability and meta-representation of experimental data and associated devices; (4) devises an implementation variant for reactive DTs as a basis for later proactive realizations going beyond DTs as pure, passive state representations; and (5) works out a design approach that is highly reconfigurable, using the example of a photometer, which opens up completely new possibilities with less development effort in hardware and software engineering by using the DT rather than the physical device itself.

This paper is organized as follows. The next section points out the state of the art and the related work in terms of RDM and DTs. Both subsections derive architectural requirements, which serve to evolve an architecture for sustainable and reliable RDM. Next, specifically picked use cases of the authors’ institute are outlined, followed by their implementation and subsequent proof of concept and evaluation. Finally, after a discussion that relates the predefined requirements to each other and to the implemented infrastructure, the work is concluded and future challenges are outlined.

Related work

References

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.