Difference between revisions of "Journal:Registered data-centered lab management system based on data ownership security architecture"
Shawndouglas (talk | contribs) (Saving and adding more.) |
Shawndouglas (talk | contribs) (Saving and adding more.) |
||
Line 354: | Line 354: | ||
|} | |} | ||
|} | |} | ||
==Experiments== | ==Experiments== | ||
Line 363: | Line 361: | ||
Table 4 shows the details of the experiment environment. The algorithm suits the corresponding experimental data class, which can consider both efficiency and security. | Table 4 shows the details of the experiment environment. The algorithm suits the corresponding experimental data class, which can consider both efficiency and security. | ||
{| | |||
| style="vertical-align:top;" | | |||
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="100%" | |||
|- | |||
| colspan="1" style="background-color:white; padding-left:10px; padding-right:10px;" |'''Table 4.''' Details of the experiment environment. | |||
|- | |||
| style="background-color:white; padding-left:10px; padding-right:10px;" | | |||
''Environment 1 HOST 1''<br /> | |||
Processor: Intel(R) Core (TM) i5-8300H CPU @ 2.30 GHz<br /> | |||
Memory: 8.0 GB<br /> | |||
Main hard disk: NVMe WDC PC SN520 SDA<br /> | |||
Operating system: Windows 11 ×64 Professional Edition Insider Preview<br /> | |||
Programming software: Microsoft Visual Studio Code<br /> | |||
Test work data: lab data xlsx | |||
|- | |||
| style="background-color:white; padding-left:10px; padding-right:10px;" | | |||
''Environment 2 HOST 2''<br /> | |||
Processor: AMD Ryzen 5 3550H with Radeon Vega Mobile Gfx 2.10 GHz<br /> | |||
Memory: 8.0 GB<br /> | |||
Main hard disk: SAMSUNG MZVLB512HAJQ-00000<br /> | |||
Operating system: Windows 10 Home<br /> | |||
Programming software: Pycharm Community Edition<br /> | |||
Test work data: lab data xlsx | |||
|- | |||
|} | |||
|} | |||
===Simulation key generation and transmission experiment=== | |||
The DAC module includes the right confirmation and authorization. The right confirmation determines the lab data ownership. The ownership of the experimental data belongs to the experimental team designer. After storing data in the DRC, the lab data product adds the research group public key to determine the ownership. | |||
During lab data transactions, the data user wants to purchase the experimental data product. After payment, the data user needs to find the data owner’s public key in the DAC, then the user uses the owner’s public key to open the lab data product. This is an authorization ownership process. | |||
In this experiment, a key pair was generated in Chengdu, Sichuan Province, China. The public key was sent to Beijing, China (DAC). We carried out 12 experiments generating key pairs, and the tests were performed on Environment 1 (env1) and Environment 2 (env2), respectively. Figure 9 shows a speed test of the time required to generate a key pair. The results of this experiment show that using the AES algorithm is the fastest (1.232 × 10−5 s/1.605 × 10−5 s) in two environments. Using SM4 (3.197 × 10−5 s/6.882 × 10−5 s) to create a key pair is the second fastest, which is close to AES. Meanwhile, the SM2 algorithm is the most time-consuming way. | |||
[[File:Fig10 Zheng Electronics23 12-8.png|1000px]] | |||
{{clear}} | |||
{| | |||
| style="vertical-align:top;" | | |||
{| border="0" cellpadding="5" cellspacing="0" width="1000px" | |||
|- | |||
| style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 9.''' Time consumption for generating key pairs using SM2, SM4, AES, and RSA algorithms in Environment 1 (env1) '''(a)''' and Environment 2 (env2) '''(b)'''.</blockquote> | |||
|- | |||
|} | |||
|} | |||
Further, under two different operating system environments, the experiment simulated the speed of public key transmission from Chengdu to Beijing, and the results are shown in Figure 10. Compared with the key generation experiment, the speed of AES transmission is still the fastest, and the average speed is 1.327 s/1.400 s. In contrast, SM2 is the slowest, with an average speed of 1.485 s/1.571 s. | |||
[[File:Fig9 Zheng Electronics23 12-8.png|1000px]] | |||
{{clear}} | |||
{| | |||
| style="vertical-align:top;" | | |||
{| border="0" cellpadding="5" cellspacing="0" width="1000px" | |||
|- | |||
| style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 10.''' Time consumption for simulating transmission public keys to DRC using SM2, SM4, AES, and RSA in Environment 1 (env1) '''(a)''' and Environment 2 (env2) '''(b)'''.</blockquote> | |||
|- | |||
|} | |||
|} | |||
===Confirmation and authorization experiments=== | |||
In the proposed system, we tried to test lab management digital records 50–300 KB, which were encrypted, respectively, using the SM2, SM4, RSA, and AES algorithms, then stored in the DRC of HOST 1 and 2. The encryption time of these data texts is shown in Figure 11. | |||
[[File:Fig11 Zheng Electronics23 12-8.png|1000px]] | |||
{{clear}} | |||
{| | |||
| style="vertical-align:top;" | | |||
{| border="0" cellpadding="5" cellspacing="0" width="1000px" | |||
|- | |||
| style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 11.''' Time consumption of authorization encryption of different experimental samples using SM2, SM4, AES, and RSA in Environment 1 (env1) '''(a)''' and Environment 2 (env2) '''(b)'''.</blockquote> | |||
|- | |||
|} | |||
|} | |||
After the lab data user obtained the data owner’s public key from DAC, the lab data products were decrypted with four different key management methods, and the time taken is shown in Figure 12. | |||
[[File:Fig12 Zheng Electronics23 12-8.png|1000px]] | |||
{{clear}} | |||
{| | |||
| style="vertical-align:top;" | | |||
{| border="0" cellpadding="5" cellspacing="0" width="1000px" | |||
|- | |||
| style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 12.''' Time consumption of authorized decryption of different experimental samples using SM2, SM4, AES, and RSA in Environment 1 (env1) '''(a)''' and Environment 2 (env2) '''(b)'''.</blockquote> | |||
|- | |||
|} | |||
|} | |||
This experiment was performed 16 times, with different sizes of experimental products as variables, under HOST 1 and 2. The following is the logic Algorithm 2 for this experiment: | |||
{| | |||
| style="vertical-align:top;" | | |||
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="100%" | |||
|- | |||
| colspan="1" style="background-color:white; padding-left:10px; padding-right:10px;" |'''Algorithm 2.''' Calculate encryption and decryption time ( ) | |||
|- | |||
| style="background-color:white; padding-left:10px; padding-right:10px;" | | |||
'''Data''': Example excel file<br /> | |||
'''Result''': Time consumption for encrypt and decrypt<br /> | |||
<code>1. Data format conversion:<br /> | |||
2. excel file -> json -> string -> bytes<br /> | |||
3. '''if''' data.type == bytes '''then'''<br /> | |||
4. time.record<br /> | |||
5. algorithm.encrypt(data)<br /> | |||
6. time.record<br /> | |||
7. algorithm.decrypt(data)<br /> | |||
8. time. record<br /> | |||
9. '''end if'''<br /> | |||
10. calc(time.consumption)</code> | |||
|- | |||
|} | |||
|} | |||
As shown in Figure 11 and Figure 12, AES was the fastest in performance to encrypt and decrypt lab data in the DOSA system. In particular, there are obvious advantages in decrypting data using AES. In the prior Table 2, we compared the security performance of these four algorithms. AES is a symmetric encryption algorithm, which has the conventional requirement to secretly distribute the key before communication, and the private key must be transmitted to the receiver through the network; thus, the key is not easy to keep secret and manage. However, as we proposed an ownership framework, the private key does not need to be transmitted to the data owner, so this shortcoming can be ignored. In summary, combined with security and speed performance, for lab data of class 1 to 4, it is recommended to use the AES algorithm to encrypt registration to determine ownership using the public key, and to decrypt to authorize with the private key based on DOSA. | |||
However, also need to take high-security measures (class 5 lab data) into account, such as special major laboratory explosion data, or state secret experiment plans. For these security class 5 lab data, we need to pay more attention to safety performance. Combined with Table 2, SM2 shows better safety performance. So, in the current experiment, class 5 was applied to samples to compare the authorization and confirmation time efficiency using SM2. | |||
Host 1’s speed of encryption storage of a larger-volume data text (over 230 KB) is double that of Host 2. Additionally, the encryption storage speed of Host 1 for small and medium volume experimental data was three to four times faster than Host 2. However, it seems there is no difference in the data decryption time consumption between the two environments. | |||
As shown in Figure 13, it is recommended to choose a larger experimental sample for class 5 data encryption and decryption, which could save time. | |||
[[File:Fig13 Zheng Electronics23 12-8.png|1000px]] | |||
{{clear}} | |||
{| | |||
| style="vertical-align:top;" | | |||
{| border="0" cellpadding="5" cellspacing="0" width="1000px" | |||
|- | |||
| style="background-color:white; padding-left:10px; padding-right:10px;" |<blockquote>'''Figure 13.''' Time consumption of encryption '''(a)''' and decryption '''(b)''' of different experimental samples using SM2.</blockquote> | |||
|- | |||
|} | |||
|} | |||
===Evaluate results=== | |||
Combining the above experiments based on the proposed system, it is recommended to use the AES algorithm to encrypt the registration, authorization, and decryption of ordinary security-level data (class 1–4). For a high level of security experimental data (class 5), we recommend using the SM2 algorithm to encrypt larger experimental documents at one time, which can improve the efficiency. | |||
The advantage of DOSA is that it binds data and ownership together, protects data with key technologies, and enables secure sharing. This study designed the whole conditional sharing and transaction process of laboratory-related data based on DOSA. In this process, class 1–5 data are protected by encryption technology to ensure their security. Through the key algorithm and the Gmssl (open-source toolbox), the problem of data leakage can be solved in the data transaction process. Using Python to encrypt the laboratory data with the tools provided by DOSA, it can be considered safe to share relevant laboratory data cross-domain and internally and externally. | |||
==Conclusions== | |||
To solve the problem of isolated islands of laboratory data, we proposed a data-ownership-based security architecture for confirming lab data rights, and we used a secret key algorithm to implement internal and external laboratory data conditional sharing. A data owner inputs encrypted data into the DRC, while confirming sharing rights. When a data user needs statistics or supervision, they can apply for targeted data by searching the directory. According to the different security levels of lab data, the appropriate key algorithm is selected for ownership authorization, which can realize efficient conditional sharing. | |||
Laboratory information and data necessarily need to be conditionally shared for global and cross-border purposes. Often, these data are dispersed across various information systems, posing a challenge in sharing data for a safe and efficient lab data management system. Meanwhile, it is not easy to summarize different sources of real-time data, and it is also difficult to search data urgently among the relevant departments of the independent systems. Moreover, the security performance of the different systems is also inconsistent. As such, we developed our data architecture to manage laboratory-related data by allowing relevant units, universities, and laboratories to encrypt data with public keys and upload them to the data register center to form a data directory. Data users can search “Directory” or “data owner” names to apply to the data owner for data viewing or use. Data can be shared conditionally through public key confirmation and private key authorization to break lab data barriers. Then, we used experiments to evaluate which algorithm is better for the efficiency of this system. We suggest AES for ordinary experimental data. For higher levels of experimental data, we can use SM2 to process larger data at one time, which addresses both efficiency and security of data. From these experiments, we verified the feasibility and efficient security of DOSA to manage lab-related data. | |||
This ownership security architecture for laboratory-managed information and data can also be applied to other fields, such as smart city construction. In the future, we will focus on the prediction of laboratory accidents on this system. Our long-term work goal is to use this data architecture to process the artificial intelligence analysis of experimental management data. Through the entry of a large amount of data, under the premise of the authorization of the data user, we can use the registered massive data to make predictions, such as analyzing the probability of an experimental accident occurring. | |||
==Abbreviations, acronyms, and initialisms== | |||
* '''API''': application programming interface | |||
* '''DAC''': data authorization center | |||
* '''DAU''': data application unit | |||
* '''DEC''': data exception center | |||
* '''DRC''': data registration center | |||
* '''DOSA''': data ownership security architecture | |||
* '''ECC''': elliptic-curve cryptography | |||
* '''env1''': Environment 1 | |||
* '''env2''': Environment 2 | |||
* '''LIMS''': laboratory information management system | |||
==Acknowledgements== | |||
===Author contributions=== | |||
Conceptualization, X.Z. and F.M.; methodology, X.Z. and N.C.; software, X.Z.; validation, P.U., X.Z. and F.M.; formal analysis, X.Z. and P.U.; data curation, X.Z. and P.U.; writing—original draft preparation, X.Z.; writing—review and editing, X.Z., P.U. and F.M; visualization, X.Z.; supervision, F.M., N.C. and P.U.; project administration, X.Z.; funding acquisition, X.Z. All authors have read and agreed to the published version of the manuscript. | |||
===Funding=== | |||
This research received no external funding. | |||
===Conflicts of interest=== | |||
The authors declare no conflict of interest. | |||
==References== | ==References== |
Revision as of 20:47, 31 July 2023
Full article title | Registered data-centered lab management system based on data ownership security architecture |
---|---|
Journal | Electronics |
Author(s) | Zheng, Xuying; Miao, Fang; Udomwong, Piyachat; Chakpitak, Nopasit |
Author affiliation(s) | Chiangmai University, Chengdu University |
Primary contact | Email: nopasit at cmuic dot net |
Year published | 2023 |
Volume and issue | 12(8) |
Article # | 1817 |
DOI | 10.3390/electronics12081817 |
ISSN | 2079-9292 |
Distribution license | Creative Commons Attribution 4.0 International |
Website | https://www.mdpi.com/2079-9292/12/8/1817 |
Download | https://www.mdpi.com/2079-9292/12/8/1817/pdf?version=1681215589 (PDF) |
This article should be considered a work in progress and incomplete. Consider this article incomplete until this notice is removed. |
Abstract
University and college laboratories are important places to train professional and technical personnel. Various regulatory departments in colleges and universities still rely on traditional laboratory management in research projects, which are prone to problems such as untimely information and data transmission. The present study aimed to propose a new method to solve the problem of data islands, explicit ownership, conditional sharing, data security, and efficiency during laboratory data management. Hence, this study aimed to develop a data-centered lab management system that enhances the security of laboratory data management and allows the data owners of the labs to control data sharing with other users. The architecture ensures data privacy by binding data ownership with a person using a key management method. To achieve secure data flow, data ownership conversion through the process of authorization and confirmation was introduced. The designed lab management system enables laboratory regulatory departments to receive data in a secure form by using this platform, which could solve data sharing barriers. Finally, the proposed system was applied and run in different server environments by implementing data security registration, authorization, confirmation, and conditional sharing using SM2, SM4, RSA, and AES algorithms. The system was evaluated in terms of the execution time for several lab data with different sizes. The findings of this study indicate that the proposed strategy is secure and efficient for lab data sharing across domains.
Keywords: university laboratory management, data sharing, data ownership safety architecture, conditional sharing, security and efficiency
Introduction
The importance of knowledge for organizations is now widely recognized, being one of the resources whose management influences the success of organizations through the exchange and sharing of information, knowledge, and experience among its members. Numerous challenges are associated with the management of laboratory data, as labs generate a lot of experimental and management data on a daily basis. Frequent experimental accidents and the leakage of hazardous chemicals is worthy of attention. Supervision departments, such as the Education Bureau, Emergency Bureau, and Public Security Department, need various types of laboratory management data to report while supervising laboratory safety.
At present, existing laboratory management systems used by each university, regulatory department, and even various laboratories are different, which leads to obvious problems when lab data collection or emergency response is needed. For example, which laboratory management data are classified confidential, and which data can be handed over to relevant departments? Who must claim the ownership of laboratory-related data? Which departments can view or use the data? Could lab data be transferred to other departments? These questions raised laboratory data ownership issues. Secondly, some data that relate to laboratory management need to be kept confidential. Where and how can we best secure this data? Is it safe for different departments to transmit data? How can we guarantee that there will be no data transfer to others? These can be summarized as data security storage and transmission questions. In recent years, frequent university laboratory accidents have occurred; in such cases, emergency responding units and other regulatory departments need to collect real-time lab-related data. If it is still a regular report retrieval process, it cannot meet the requirements of time efficiency. So, we need to find efficient and trustworthy methods to solve this problem of obtaining data in a timely manner.
The main reason for the above problems is that laboratory management data ownership is unclear, and the existing laboratory information management systems (LIMS) of each branch are independent. It is impossible to collect real-time, tamper-free, statistically accurate data when it is needed. To deal with these problems, we intend to adopt data ownership security architecture (DOSA) to securely circulate laboratory management data and cultivate an ecological and sustainable forest from the “flowerpot” of each independent system. [1]
Laboratories are currently using LIMS frequently. [2] A university may also have multiple management systems, including an equipment management system, a chemical management system, a safety examination system, etc. Related systems comprise relevant departments of the laboratory, such as the experiment management department of the Education Bureau, the accident handling department of the Emergency Bureau, and the hazardous chemicals supervision department of Public Security. Such complicated systems make it difficult to obtain information accurately and quickly when the same report requires the cooperation of different departments or laboratories. Compared to traditional manual records, the LIMS used by a specific business unit or company can be easily outdated and has high maintenance costs. Once the database needs major updates related to its data source, the existing system cannot meet the demand, and a new system must be designed so that all data can be transferred, resulting in increased costs. The linkage between different systems is weak, the efficiency is relatively low, and the security is not uniform and difficult to guarantee when it is necessary to submit data.
Blockchain poses one potential solution. The medical industry uses the blockchain system to manage medical imaging data, and the banking system uses blockchain as a new type of financial technology. [3] Blockchain has high energy consumption, expensive development costs, but relatively high security. At present, some scholars have produced a framework diagram of the blockchain management in university education, but there is no corresponding technical means. [4]
Considering the problems mentioned above, the lab industry would require a technology that allows data to flow securely and efficiently. This article’s contributes discussion about potential technology to solve such problems. In our work, experiment-related data management is based on the data ownership security architecture. This security technology is built such that data ownership binds with a person and it can be safely registered on this platform; meanwhile, ownership of data can be determined, and data users and owners can conditionally share or trade using ownership conversion across domains and across borders efficiently and securely. Our proposed laboratory data management architecture is shown in Figure 1.
|
The current study has been organized as follows. The next section explains related theories and key management. After that, we present the core strategies used, including laboratory data encrypted registration and an authorized transfer method for conditional sharing. We then detail the whole lab management framework and the incident searching case study based on DOSA using the above ownership conversion method. We then discuss and verify the effectiveness of the data flow process using key management based on proposed core strategies. Finally, we provide our conclusions.
Related work
Data ownership security architecture
As data has a tendency to not be securely circulated and shared, Miao et al. proposed a data ownership security architecture (DOSA) that contains one body with two wings. [5] The body represents data binding ownership, while one wing is the key and the other wing signifies value.
To solve the problem of data transfer barriers, DOSA proposes a global, cross-border, comprehensive, collaborative, secure, and efficient data architecture system. The whole field is no longer the cross-domain data of the head industry or a single system as previous, but all industry-related data are registered. Integration and collaboration mean that data are retrieved and used by different industry sources. This architecture has efficient security technologies for data security from source, registration, circulation, and use to supervision.
DOSA is composed of a data registration center (DRC), a data authorization center (DAC), a data exception center (DEC), data application units (DAUs), a key system, and other components. Safety and efficiency are the core of DOSA. Similar DOSA-like framework applications have been implemented in tourism, lab education, and data product trading processes. [9,10]
DRC
The DRC stores the data in an encrypted form, which is invisible even to the administrator. It not only confirms the ownership of the data but also keeps it safe and confidential. After the data are registered, the directory is automatically generated, which is convenient for data users to view and find relevant data sources. It can register the data of relevant units or laboratories to DRC or link to DRC through an application programming interface (API). [6]
DAC
To ensure the relationship between people and data ownership, DAC is responsible for the process of data ownership confirmation and authorization. [7] By guaranteeing the interests of data owners, they are confident to put their data on the platform. The owner of the data divides data-related personnel into data owners, data producers, and data users. In the current study, a design was proposed to save all the related public keys to the DAC, and the private key is kept by their client, without placing it on any platform or cloud, to ensure security.
DEC
To protect the interests of data owners, DEC needs to supervise during the data flow process. From the point of data register and data conditional sharing, even after the trading process, we took blockchain-related technical means, such as watermarks and timestamps, and added them to the data authorization process to ensure data users cannot transmit data again without the owner’s permission. [8]
Laboratory management technology
Existing laboratory management usually uses a LIMS that collects related data through various laboratories. Some benefits of using a LIMS are the ability to track individual data items or samples from reported results. [11] Some limitations of a LIMS for this specific application exist, as they may not be specialized for specific equipment, or applicable to certain chemistries. [12] Some LIMS chemistry systems are only available to connect with the Public Safety department. It is rare to find a LIMS system for chemistry widely used in related departments. There is also a high price for many commercial LIMS systems. Various systems’ costs are expensive for labs, and customization is limited to what the system offers. [13] Additionally, experimental data are private to external parties, yet it still needs to be shared among the research team.
Recently, researchers have designed some LIMS for education labs or other specific applications. Focusing on education, there is Chaoqun and Lanlan's interactive LIMS, which incoporates experimental teaching management, experimental equipment management, laboratory open management, communication, and interactive management modules. [14] For specific disciplines, there are LIMS and other laboratory informatics solutions specifically designed for high-level biosafety labs or biobanks. [15,16] However, these systems only flow data inside the lab, in which data cannot flow quickly to supervision departments or related companies. Some researchers realized that the LIMS needs to be both secure and enhance efficiency in order to enable laboratory staff to conduct their research as freely as possible; therefore, the information on research, laboratory personnel, experimental materials, and experimental equipment is generally collected and utilized by only one system. [17] However, the current research mainly focuses on such a system's design, and there is no specific experimental technical means to verify it. Thus, we employed DOSA to rapidly and securely move data inside or outside the laboratory. In Table 1, we compare our proposed architecture to existing lab management systems.
|
However, DOSA-related studies have designed models for some domains such as tourism, which only have frameworks without practical implementation. [9] We propose to use technical methods to solve data ownership by binding persons that used a key management system to protect data privacy.
Key system methods
To ensure data registration and sharing security, this work adopted RSA and AES algorithms, as well as Chinese domestic commercial keys SM2 and SM4, to compare the aspects of addressing efficiency and security of data. Table 2 summarizes the pros and cons of these key algorithms.
|
SM2 and SM4 are cryptographic standards authorized to be used in China. Relevant studies have shown that the SM2 and SM4 algorithms are more secure than ECDSA and AES. [22] During a controlled experiment, AES outperformed SM4 by a significant margin. [23] Symmetric encryption is usually used when the message sender needs to encrypt a large amount of data. It has the characteristics of an open algorithm, a small amount of calculation, and fast encryption speed. The advantage of the symmetric encryption algorithm lies in the high speed of encryption and decryption, and the difficulty of cracking when using a long key. The disadvantages of symmetric encryption are that key management and distribution are difficult and insecure. Before the data are transmitted, the sender and the receiver must agree on a secret key, and both parties must keep the key. If the key of one party is leaked, the encrypted information is insecure, and its security cannot be guaranteed. [24]
The advantage of asymmetric encryption is its higher security. The public key is made available, and the private key is kept by itself so there is no need to give the private key to others. The disadvantage of asymmetric encryption is that its speed is relatively slow, so it is only suitable for encrypting a small amount of data.
Methodology
Managed laboratory information and data can be divided into unconditional and conditional sharing categories. For example, the opening hours of the laboratory represent unconditional sharing data. The transmission of the laboratory's inventory and usage of hazardous chemicals needs conditional sharing by the administrative department (data owners). Laboratory safety data, accident statistics, and related training data are transnational conditional sharing data. Lab-related data are also classified into a security level ranging from zero to five. [25] Class zero data is data that can be public, such as laboratory introductions and research group members’ profiles which do not need any security protection. Class five data is critical national scientific research experiment data which must be kept strictly confidential. Classes one through four refer to data that the laboratory needs to share or that are trade-protected.
Lab data producers can be one or several persons, such as laboratory administrators, experimental instructors, or relevant teachers. A data owner is usually a person or a unit, such as a person in charge of the laboratory team or the emergency department. Data producers can input data, but they do not have the authority to share or trade data. After the data user has found targeted data by querying the DRC category, they need to contact the data owner. If the data owner considers that the data user can apply to view or use these data sources, the owner finds the user’s public key from the DAC then encrypts these data with the user’s public key to complete the authorization. The data user opens encrypted lab data with their private key to finish the conditional sharing of data (Figure 2a,b).
|
Data encryption registration and application process
The laboratory data are uploaded to the DRC by the owner, and the owner’s public key is added to verify the ownership of the data. It can guarantee the privacy and security of the data at the source. The data are stored in the DRC, which the administrator cannot view without permission. The encrypted lab data are transmitted to the DRC, and the directory is automatically generated by keywords. The relevant users can apply for permission to obtain the data after searching in the data directory. Figure 3 shows the data owner encrypting the registration data, which automatically generates an index directory, then data users apply for the required data by viewing the catalog.
|
Confirmation and authorization of the data transmission process
Mutual trust - Single-layer encryption
After the data user searches the DRC catalog and queries the data they need, the data user communicates with the data owner. If the user obtains permission, the data owner encrypts the data with the user’s public key. Then, the data user can use their private key to decrypt data. Meanwhile, to ensure the interests of the data owner, the entire process has traceability technology in the DOSA lab management system. For example, time stamps and digital signatures are adopted, which can ensure that data users do not transmit data outside the authorized context. [26] If the mutual parties trust each other, the process of conditional sharing occurs, as shown in Figure 4.
|
Algorithm 1 shows how the lab data user retrieves data in the DRC during the process of data ownership conversion. Table 3 shows the definition of symbols used in the algorithm.
|
|
Data users do not trust the data source — Double encryption
After checking the catalog in the DRC and communicating with the data owner about the data product’s price, the data user may doubt the authenticity of the data source and ownership. In this situation, the data owner will add this data product to the owner’s private key, then encrypt it by using the user’s public key, which is found in the DAC. As the next step, data owners input this encrypted data to the DRC and then inform the users.
Firstly, the data user finds the data owner’s public key from the DAC, and they use it to open the double-encrypted data product. As the next step, the data user utilizes their private key to decrypt the data. Figure 5 shows the transaction process of experimental data products when the data user questions the data authenticity or the data source’s ownership.
|
Using a person-binding data owner method to finish conditional data sharing can break data barriers. We also resolve issues of the laboratory data class and ownership conversion process.
DOSA framework components - Managed laboratory data linked to rich soils
Overall framework
This section briefly describes the entire laboratory data transfer framework using DOSA, designed by the core techniques. Figure 6 shows the overall structure of DOSA to enable laboratory-related data to break barriers. The existing LIMS is independent. We use the proposed data security protection technology such that data can be stored securely and conditionally. Every laboratory uses the public key algorithm to store encrypted data which can determine ownership. If any supervision departments (such as the Education Bureau) need to collect a certain type of data, they just need to apply for related laboratories. The data owners decide which data can be shared, then find the Education Bureau’s public key in DAC that can encrypt related data to the DRC of the Education Bureau so data users can obtain real-time, non-tampered data from various university laboratories. Compared with the current information collected, the DOSA method using the ownership conversion technique can solve existing problems with isolated laboratory data islands.
|
Laboratory accident searching based on DOSA
We selected the registration and inquiry of laboratory hazardous chemical materials as a case study. The full cycle of hazardous chemical materials includes application, acquirement, procurement, transportation, usage, recycling waste liquid, accidental leakage management, experimental operation accident, etc. The whole process needs to be supervised by multiple relevant units, data need to be circulated, and sharing is needed to break the information cocoon. There are many departments involved, and as such, routine security inspections need to be conducted and results should be submitted according to the protocols, which is not only inefficient but also error-prone and tamper-prone. In case there is an emergency leak or experiment-related explosion, an integrated and efficient system is necessary to query related data urgently. This workflow of searching for or collecting hazardous chemicals data among related branches, such as laboratories, the Education Bureau, the Public Security Bureau Inspection Department, and chemical companies, is shown in Figure 7.
|
The hazardous chemicals can be queried through the API specification interface (query by CAS or chemical name) directly among different branches. Hazardous chemical users, administrators, public security departments, emergency departments, and hazardous chemical suppliers can query in real time.
If an accident happens in the laboratory, we could obtain cross-domain and cross-branch information by each unit’s authorization in real time. The proposed structure, which can search the corresponding hazardous chemical information, equipment details, relevant personnel profile, and experimental project proposal, etc., is shown in Figure 8.
|
Experiments
In this section, we discuss testing the efficiency of managed lab data based on DOSA by putting the proposed framework into practice. We investigated the time consumption of key generation and public key transmission to DRC by different algorithms using Python 3.10.9. Then, we tried to use four key algorithms—SM2, SM4, RSA, and AES—to compare which algorithm was faster in the proposed architecture regarding authorized encryption and decryption. Combining the security level of the lab data mentioned in the above chapters, we try to find the most suitable solution.
Experimental environment
Table 4 shows the details of the experiment environment. The algorithm suits the corresponding experimental data class, which can consider both efficiency and security.
|
Simulation key generation and transmission experiment
The DAC module includes the right confirmation and authorization. The right confirmation determines the lab data ownership. The ownership of the experimental data belongs to the experimental team designer. After storing data in the DRC, the lab data product adds the research group public key to determine the ownership.
During lab data transactions, the data user wants to purchase the experimental data product. After payment, the data user needs to find the data owner’s public key in the DAC, then the user uses the owner’s public key to open the lab data product. This is an authorization ownership process.
In this experiment, a key pair was generated in Chengdu, Sichuan Province, China. The public key was sent to Beijing, China (DAC). We carried out 12 experiments generating key pairs, and the tests were performed on Environment 1 (env1) and Environment 2 (env2), respectively. Figure 9 shows a speed test of the time required to generate a key pair. The results of this experiment show that using the AES algorithm is the fastest (1.232 × 10−5 s/1.605 × 10−5 s) in two environments. Using SM4 (3.197 × 10−5 s/6.882 × 10−5 s) to create a key pair is the second fastest, which is close to AES. Meanwhile, the SM2 algorithm is the most time-consuming way.
|
Further, under two different operating system environments, the experiment simulated the speed of public key transmission from Chengdu to Beijing, and the results are shown in Figure 10. Compared with the key generation experiment, the speed of AES transmission is still the fastest, and the average speed is 1.327 s/1.400 s. In contrast, SM2 is the slowest, with an average speed of 1.485 s/1.571 s.
|
Confirmation and authorization experiments
In the proposed system, we tried to test lab management digital records 50–300 KB, which were encrypted, respectively, using the SM2, SM4, RSA, and AES algorithms, then stored in the DRC of HOST 1 and 2. The encryption time of these data texts is shown in Figure 11.
|
After the lab data user obtained the data owner’s public key from DAC, the lab data products were decrypted with four different key management methods, and the time taken is shown in Figure 12.
|
This experiment was performed 16 times, with different sizes of experimental products as variables, under HOST 1 and 2. The following is the logic Algorithm 2 for this experiment:
|
As shown in Figure 11 and Figure 12, AES was the fastest in performance to encrypt and decrypt lab data in the DOSA system. In particular, there are obvious advantages in decrypting data using AES. In the prior Table 2, we compared the security performance of these four algorithms. AES is a symmetric encryption algorithm, which has the conventional requirement to secretly distribute the key before communication, and the private key must be transmitted to the receiver through the network; thus, the key is not easy to keep secret and manage. However, as we proposed an ownership framework, the private key does not need to be transmitted to the data owner, so this shortcoming can be ignored. In summary, combined with security and speed performance, for lab data of class 1 to 4, it is recommended to use the AES algorithm to encrypt registration to determine ownership using the public key, and to decrypt to authorize with the private key based on DOSA.
However, also need to take high-security measures (class 5 lab data) into account, such as special major laboratory explosion data, or state secret experiment plans. For these security class 5 lab data, we need to pay more attention to safety performance. Combined with Table 2, SM2 shows better safety performance. So, in the current experiment, class 5 was applied to samples to compare the authorization and confirmation time efficiency using SM2.
Host 1’s speed of encryption storage of a larger-volume data text (over 230 KB) is double that of Host 2. Additionally, the encryption storage speed of Host 1 for small and medium volume experimental data was three to four times faster than Host 2. However, it seems there is no difference in the data decryption time consumption between the two environments.
As shown in Figure 13, it is recommended to choose a larger experimental sample for class 5 data encryption and decryption, which could save time.
|
Evaluate results
Combining the above experiments based on the proposed system, it is recommended to use the AES algorithm to encrypt the registration, authorization, and decryption of ordinary security-level data (class 1–4). For a high level of security experimental data (class 5), we recommend using the SM2 algorithm to encrypt larger experimental documents at one time, which can improve the efficiency.
The advantage of DOSA is that it binds data and ownership together, protects data with key technologies, and enables secure sharing. This study designed the whole conditional sharing and transaction process of laboratory-related data based on DOSA. In this process, class 1–5 data are protected by encryption technology to ensure their security. Through the key algorithm and the Gmssl (open-source toolbox), the problem of data leakage can be solved in the data transaction process. Using Python to encrypt the laboratory data with the tools provided by DOSA, it can be considered safe to share relevant laboratory data cross-domain and internally and externally.
Conclusions
To solve the problem of isolated islands of laboratory data, we proposed a data-ownership-based security architecture for confirming lab data rights, and we used a secret key algorithm to implement internal and external laboratory data conditional sharing. A data owner inputs encrypted data into the DRC, while confirming sharing rights. When a data user needs statistics or supervision, they can apply for targeted data by searching the directory. According to the different security levels of lab data, the appropriate key algorithm is selected for ownership authorization, which can realize efficient conditional sharing.
Laboratory information and data necessarily need to be conditionally shared for global and cross-border purposes. Often, these data are dispersed across various information systems, posing a challenge in sharing data for a safe and efficient lab data management system. Meanwhile, it is not easy to summarize different sources of real-time data, and it is also difficult to search data urgently among the relevant departments of the independent systems. Moreover, the security performance of the different systems is also inconsistent. As such, we developed our data architecture to manage laboratory-related data by allowing relevant units, universities, and laboratories to encrypt data with public keys and upload them to the data register center to form a data directory. Data users can search “Directory” or “data owner” names to apply to the data owner for data viewing or use. Data can be shared conditionally through public key confirmation and private key authorization to break lab data barriers. Then, we used experiments to evaluate which algorithm is better for the efficiency of this system. We suggest AES for ordinary experimental data. For higher levels of experimental data, we can use SM2 to process larger data at one time, which addresses both efficiency and security of data. From these experiments, we verified the feasibility and efficient security of DOSA to manage lab-related data.
This ownership security architecture for laboratory-managed information and data can also be applied to other fields, such as smart city construction. In the future, we will focus on the prediction of laboratory accidents on this system. Our long-term work goal is to use this data architecture to process the artificial intelligence analysis of experimental management data. Through the entry of a large amount of data, under the premise of the authorization of the data user, we can use the registered massive data to make predictions, such as analyzing the probability of an experimental accident occurring.
Abbreviations, acronyms, and initialisms
- API: application programming interface
- DAC: data authorization center
- DAU: data application unit
- DEC: data exception center
- DRC: data registration center
- DOSA: data ownership security architecture
- ECC: elliptic-curve cryptography
- env1: Environment 1
- env2: Environment 2
- LIMS: laboratory information management system
Acknowledgements
Author contributions
Conceptualization, X.Z. and F.M.; methodology, X.Z. and N.C.; software, X.Z.; validation, P.U., X.Z. and F.M.; formal analysis, X.Z. and P.U.; data curation, X.Z. and P.U.; writing—original draft preparation, X.Z.; writing—review and editing, X.Z., P.U. and F.M; visualization, X.Z.; supervision, F.M., N.C. and P.U.; project administration, X.Z.; funding acquisition, X.Z. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Conflicts of interest
The authors declare no conflict of interest.
References
Notes
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.