Journal:Using OpenBIS as a virtual research environment: An ELN-LIMS open-source database tool as a framework within the CRC 1411 Design of Particulate Products

From LIMSWiki
Full article title: Using OpenBIS as a virtual research environment: An ELN-LIMS open-source database tool as a framework within the CRC 1411 Design of Particulate Products
Journal: Data Science Journal
Author(s): Plass, Fabian; Englisch, Silvan; Zubiri, Benjamin A.; Pflug, Lukas; Spiecker, Erdmann; Stingl, Michael
Author affiliation(s): Friedrich-Alexander-Universität Erlangen-Nürnberg
Primary contact: Email: michael dot stingl at fau dot de
Year published: 2023
Volume and issue: 22
Article #: 44
DOI: 10.5334/dsj-2023-044
ISSN: 1683-1470
Distribution license: Creative Commons Attribution 4.0 International
Website: https://datascience.codata.org/articles/10.5334/dsj-2023-044
Download: https://datascience.codata.org/articles/1500/files/655de35843b0d.pdf (PDF)

Abstract

The transformation of existing technologies and the consequent use of new digital technologies have a substantial impact not only on society and companies, but also on science. Analog documentation and research, as we have known them for centuries, will eventually be replaced by intelligent, more FAIR (findable, accessible, interoperable, and reusable) digital methods and systems. In addition to the actual research data and results, metadata now plays an important role, not only for individual, independent projects, but also for future scientific use and for interdisciplinary research groups and disciplines. The solution presented here, consisting of an electronic laboratory notebook (ELN) and a laboratory information management system (LIMS) based on the openBIS (open Biology Information System) environment, offers useful features and advantages, especially for interdisciplinary work. The Collaborative Research Centre (CRC) 1411 "Design of Particulate Products" of the German Research Foundation is characterized by the cooperation of working groups in synthesis, characterization, and simulation, and therefore serves as a model environment for presenting this implementation of openBIS. As an open-source ELN-LIMS solution following the FAIR principles, openBIS provides a common set of general entries, with the possibility of sharing and linking (meta-)data to improve scientific exchange among all users.

Keywords: open science, research data management, databases, ELN-LIMS, interdisciplinary work

Introduction

Digital transformation is a key challenge that affects our entire society. The main players here are companies that use intelligent information technologies (IT) and networks for machines and processes. The move from flexible production, via modular and changeable production processes, to customer-specific solutions and products requires sophisticated data acquisition, processing, and analysis. [Bauernhansl et al. 2014; Lasi et al. 2014] Further examples of digital transformation include automation processes, machine-to-machine communication, internet of things (IoT) implementations, and augmented-reality-based workflows. [Bauernhansl et al. 2014; Egger & Masood 2020; Lasi et al. 2014; Li et al. 2015]

However, this change affects not only companies, but also government institutions (eGovernment) [Gisler 2001] and science itself. [Kimmig et al. 2021] Thus, "open science" has become a growing movement. [National Academies of Sciences, Engineering, and Medicine (U.S.) et al. 2018] In the European context, open science has been supported for almost five years by the Open Research Data and Data Management Plans of the European Research Council (ERC), which was established by the European Commission. However, the ERC has been promoting the cause of open science since 2007. [ERC Scientific Council 2021] This commitment has also manifested in other ways; for example, open-access publication of results from funded projects has already become mandatory to a certain extent. The DFG (German Research Foundation), for instance, supports infrastructure projects within Collaborative Research Centres (CRCs), which are long-term, university-based research institutions established for up to 12 years; the funding objective of these projects is to establish powerful information systems for research from a holistic perspective. [German Research Foundation 2021] Accordingly, new infrastructures have been established at the national level (i.e., Germany's National Research Data Infrastructure; Nationale Forschungsdateninfrastruktur or NFDI) and at the European level (i.e., the European Open Science Cloud or EOSC) [European Commission 2016; Mons et al. 2017], fostering research data management, data publication, and open science. Nevertheless, open science includes more than just the public provision of data in the context of open-access publications; it also includes open methodologies, sources, and data. The necessary implementation and representation of good scientific data management, data quality, and stewardship (data governance) are tremendously important. [Brous et al. 2016; Hildebrand et al. 2011; Ladley 2020; Wilkinson et al. 2016] The resulting benefits can be measured directly, for example as improvements in process efficiency or as cost and risk reductions, and indirectly, as increased acceptance, perception, and trust. [Brous et al. 2016; Hildebrand et al. 2011; Tallon 2013]

This paper presents an implementation of openBIS, an electronic laboratory notebook (ELN) and laboratory information management system (LIMS) to support open science broadly, including data management, handling, storage, and publishing within a scientific laboratory environment.

Current state of research data management

FAIR as part of research data management

One of the cornerstones of research data management (RDM) is the set of FAIR principles, which encourage research objects to be more findable, accessible, interoperable, and reusable. The emphasis placed on FAIRness and the growing awareness of it are, however, more than just an obligation that public funding agencies impose on researchers. Rather, FAIRness is key to knowledge discovery, innovation, and information transfer, as well as to the subsequent integration and reuse of research objects by the scientific community. [Wilkinson et al. 2016] Events such as the global COVID-19 pandemic demonstrate the need for, and the overall benefits of, making data available online in a reusable fashion. [Besançon et al. 2021; Tse et al. 2020] This leads not only to efficient research and increased innovation, but also to fair and transparent use of public funds and tax revenue, as well as to increased visibility, scientific reputation, and reliability, to name just a few benefits of open science. [Janssen et al. 2012]

The FAIR principles propose that all scholarly output should embody the characteristics of being findable, accessible, interoperable, and reusable. While these principles provide guidance on the expected behaviors of data resources, their practical implementation has been subject to varying interpretations. As support for these principles has grown, so has the diversity of interpretations surrounding their application. [Mons et al. 2017] The FAIR principles recognize the need for data to be accessible under defined conditions, but do not require complete openness. While transparency and clarity are required for accessing and reusing data, restrictions can still apply for reasons of privacy, security, and competition. FAIR promotes a balanced approach that allows diverse participation and partnerships while ensuring the availability of data within specified guidelines. [Mons et al. 2017]
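The point that FAIR data should be accessible under defined conditions, rather than necessarily open, can be illustrated with a small Python sketch. It assumes a hypothetical record type whose metadata always remains describable, while access to the data itself depends on one of three illustrative access levels (open, restricted to named users, or embargoed until a date). The names `Record`, `Access`, `describe`, and `can_read_data` are our own and are not taken from openBIS or any repository.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import Optional


class Access(Enum):
    """Illustrative access levels (hypothetical, not a repository standard)."""
    OPEN = "open"
    RESTRICTED = "restricted"  # readable only by explicitly allowed users
    EMBARGOED = "embargoed"    # closed until a release date


@dataclass
class Record:
    identifier: str
    metadata: dict
    access: Access
    allowed_users: set = field(default_factory=set)
    embargo_until: Optional[date] = None


def describe(record: Record) -> dict:
    """Metadata stays findable whatever the access level of the data."""
    return {"identifier": record.identifier, "access": record.access.value, **record.metadata}


def can_read_data(record: Record, user: str, today: date) -> bool:
    """Return True if `user` may read the data itself on `today`."""
    if record.access is Access.OPEN:
        return True
    if record.access is Access.RESTRICTED:
        return user in record.allowed_users
    if record.access is Access.EMBARGOED:
        return record.embargo_until is not None and today >= record.embargo_until
    return False
```

Keeping `describe` separate from `can_read_data` mirrors the distinction made in the text: a dataset can be findable, with transparent access conditions, even when the data themselves are not open.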

Data repositories and data publications

Data repositories are a key component of the digital transformation of science. Well-known examples include the commercial data repository service Figshare, open-access archives such as arXiv, and platforms such as Dataverse [Crosas 2011], EUDAT [Lecarpentier et al. 2013], and Zenodo, which is maintained by CERN and funded by the European Commission. In fact, most well-known repositories already take the high-level FAIR principles into account. In the case of Zenodo, uploaded data are assigned a digital object identifier (DOI) and can optionally be published as openly accessible and viewable. This, in turn, simplifies findability, accessibility, and usability for the scientific community, and makes individual datasets citable without their content having to be tied directly to a complete publication. As a result, even data, methods, or code that initially received little attention can be found by the general public and are not lost.
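To illustrate why a DOI improves findability and citability, the following Python sketch turns a bare DOI into a link through the doi.org resolver and assembles a simplified dataset citation. The citation layout (creators, year, title, publisher, link) is a simplified pattern of our own choosing rather than a formal citation standard, and `doi_to_url` and `cite_dataset` are hypothetical helpers. For example, `doi_to_url("10.5334/dsj-2023-044")` yields `https://doi.org/10.5334/dsj-2023-044`, the link to this article.

```python
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Turn a DOI such as '10.5334/dsj-2023-044' into a resolvable doi.org link."""
    doi = doi.strip()
    # Accept a few common written forms and reduce them to the bare DOI
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
    # Every DOI starts with the directory indicator "10." and has a prefix/suffix slash
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in URLs, keeping '/' intact
    return "https://doi.org/" + quote(doi, safe="/")


def cite_dataset(creators, year, title, publisher, doi):
    """Simplified data citation: Creators (Year). Title. Publisher. DOI link."""
    return f"{'; '.join(creators)} ({year}). {title}. {publisher}. {doi_to_url(doi)}"
```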

Despite all of this, justified doubts remain regarding open science policy. In particular, there are questions and concerns about the security of data against external interference and about compliance and regulatory requirements, for example in healthcare, where relevant patient data must be protected. These issues, which revolve around data sovereignty, need to be clarified while planning and before introducing digital technologies such as data management or cloud-based systems. [Clayton et al. 2019; Hummel et al. 2021]

Good research data management does not start with the publication and archiving of the work or the (meta-)data, but with their initial collection. This is because, in terms of RDM, not only content-related data and information are important; metadata is as well. Metadata is data that describes other data, that is, additional information about the described dataset. Typical metadata include the authors, the creation date and time, the type of the dataset, and its DOI. In fact, many distinct types of metadata exist, including descriptive, structural, administrative, and process metadata.
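The metadata categories named above can be made concrete with a small Python record. This is a minimal sketch under our own assumptions: the field names and their grouping into descriptive, administrative, structural, and process metadata are illustrative and do not follow a particular metadata schema or the openBIS data model. Serializing the record to JSON shows how such metadata can travel alongside the data it describes.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DatasetMetadata:
    # Descriptive metadata: what the dataset is and who created it
    title: str
    authors: list
    dataset_type: str
    # Administrative metadata: when it was created and how it is identified
    created: str   # ISO 8601 timestamp, e.g. "2023-05-01T10:00:00Z"
    doi: str = ""  # empty until a DOI has been assigned
    # Structural metadata: which files make up the dataset and in what order
    files: list = field(default_factory=list)
    # Process metadata: how the data were produced (instrument, settings, ...)
    process: dict = field(default_factory=dict)


def to_json(md: DatasetMetadata) -> str:
    """Serialize the record with a stable key order for storage or exchange."""
    return json.dumps(asdict(md), indent=2, sort_keys=True)


def from_json(text: str) -> DatasetMetadata:
    """Rebuild the record from its JSON form."""
    return DatasetMetadata(**json.loads(text))


if __name__ == "__main__":
    md = DatasetMetadata(
        title="Hypothetical tilt series of a particle sample",
        authors=["Doe, J."],
        dataset_type="image stack",
        created="2023-05-01T10:00:00Z",
        files=["tilt_000.tif", "tilt_001.tif"],
        process={"instrument": "hypothetical microscope", "tilt_step_deg": 2},
    )
    print(to_json(md))
```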

ELNs and LIMS

The scientific questions, choices of experimental procedures, materials and methods, data analyses, and interpretation of research and its results were traditionally recorded in detail in paper-based laboratory notebooks. [Barillari et al. 2016] Not only is this approach hard to justify today, since most data are generated electronically or stored as code on a network anyway, but it also fundamentally contradicts the overarching FAIR principles, as well as open science in general. The data are neither easy to find nor accessible or usable by scientists outside the local physical system in which they are stored. Consequently, paper-based notebooks should be avoided in favor of ELNs and LIMS.

Combining an ELN with a LIMS allows research labs to simplify the documentation and management of laboratory processes and data. An ELN is used for capturing and organizing experimental data, while a LIMS supports the management of laboratory resources, sample tracking, quality assurance, and other laboratory functions. Together, ELN and LIMS provide a comprehensive platform for efficient and secure management of laboratory information, promoting compliance with best practices and regulatory requirements. [Barillari et al. 2016; Bespalov et al. 2020; Machina & Wild 2013] ELNs can play a major role in successful RDM. In this way, a continuous workflow under FAIR conditions can be ensured from the very beginning: from the collection of data, through its use by the researchers themselves, their research group, and other researchers, and its publication, to long-term archiving, for example in data repositories such as Zenodo or RADAR. [Kraft et al. 2016] Further advantages of an ELN system include [Barillari et al. 2016]:

  • easy, metadata-based collection of information, which improves its shareability;
  • reliable long-term data storage on a secure server;
  • simplified access via a global or local network;
  • greater archiving possibilities in open data repositories; and
  • the possibility to implement custom applications within the system.
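The division of labor between the two components described above (the ELN records experiments, the LIMS tracks samples and resources) can be sketched in Python as two linked record types. This is an illustrative data model of our own; `Sample`, `NotebookEntry`, and `LabRegistry` are hypothetical names, not openBIS entities. The registry refuses notebook entries that reference unknown samples, which is one simple way to keep ELN and LIMS information consistent.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Sample:
    """LIMS side: a tracked physical sample with a simple location history."""
    sample_id: str
    material: str
    location: str
    history: list = field(default_factory=list)

    def move(self, new_location: str, when: datetime) -> None:
        """Record a change of storage location."""
        self.history.append((when, self.location, new_location))
        self.location = new_location


@dataclass
class NotebookEntry:
    """ELN side: an experiment description that refers to samples by ID."""
    title: str
    author: str
    text: str
    sample_ids: list = field(default_factory=list)


class LabRegistry:
    """Holds samples and notebook entries so links can be followed both ways."""

    def __init__(self) -> None:
        self.samples = {}  # sample_id -> Sample
        self.entries = []  # NotebookEntry objects in recording order

    def register(self, sample: Sample) -> None:
        if sample.sample_id in self.samples:
            raise ValueError(f"duplicate sample ID: {sample.sample_id}")
        self.samples[sample.sample_id] = sample

    def record(self, entry: NotebookEntry) -> None:
        # Reject entries that point at samples the LIMS side does not know
        missing = [s for s in entry.sample_ids if s not in self.samples]
        if missing:
            raise KeyError(f"unknown sample IDs: {missing}")
        self.entries.append(entry)

    def entries_for(self, sample_id: str) -> list:
        """All notebook entries that used the given sample."""
        return [e for e in self.entries if sample_id in e.sample_ids]
```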

Currently available project management tools connecting classical, collaborative project and data management efforts include platforms like OSF. [1] Classical ELN systems range from commercial applications like CERF [2], Benchling [3], and labfolder [4] to open-source options like Chemotion ELN [5], eLabFTW [6], and openBIS [7].

Current situation, methodology, and strategy

Current situation from an interdisciplinary point of view

The importance of open science and RDM, as well as of clear, well-defined data stewardship and governance, has been clearly recognized within the DFG-funded Collaborative Research Centre (CRC) 1411 Design of Particulate Products. However, no ELN-LIMS is currently available for use in an interdisciplinary field involving engineering, materials science, natural sciences, mathematics, theoretical modeling, and simulation. Furthermore, commercial systems are generally not recommended because of their potential licensing costs and their limited compatibility with an open science policy. Open-source products such as Chemotion ELN, in turn, are too subject-specific (in this particular case, to chemistry), which would limit their benefit for interdisciplinary work and research, for example through lower usability and, ultimately, lower acceptance in other disciplines. On the other hand, the openBIS system of the Department of Biosystems Science and Engineering and Biology of ETH Zurich [Barillari et al. 2016; Bauch et al. 2011] shows promising features that are worth a closer look, even though it was designed primarily for biological and medical disciplines. It is used not only by academic institutions, but also by companies in the industrial sector. [Bauch et al. 2011]

Accordingly, we describe the implementation of our openBIS system in several subsections, starting with the basic principles and working features of openBIS, continuing with the goals and requirements of researchers within the CRC and the wider scientific community for a well-functioning ELN tool, and ending with its implementation and usage. Among the available ELN-LIMS options, openBIS is a good starting point and framework for fostering cooperation in a digital environment using an ELN, while fulfilling the data management, handling, storage, and publishing requirements of the DFG and of NFDI consortia such as FAIRmat (for materials science, physics, chemistry, and mathematics). In addition, we share our experiences in further developing, implementing, and using the system in daily work.

General overview and structure of openBIS

Before we shed light on how openBIS can help us implement beneficial RDM, we first clarify what openBIS is, how it works, and what technical requirements need to be met. OpenBIS is an open-source platform that functions as both an ELN and a LIMS. Developed at ETH Zurich, openBIS provides an open-source database for research laboratories and was originally designed and implemented for the life sciences. [Barillari et al. 2016] The goal was to build a simple and efficient, yet comprehensive, ELN-LIMS system that meets the daily needs of a research institution. Everyday tasks such as storing materials; managing instrumental setups and devices; acquiring, describing, and processing large amounts of data; and sharing research with users and scientists within the openBIS system should be possible. [Bauch et al. 2011]

However, questions arise as to what technical conditions are needed to set up openBIS within a research group, whether openBIS is scalable, and whether such a system is conceivable at all for a scientific network (funded by the E.U. or the DFG), or even as a comprehensive database for an entire university. All of these questions should be answered before starting with an ELN-LIMS, and they will be discussed.

The development of openBIS and the platform on which it is based started in 2007, and openBIS is still actively maintained, nowadays by the Scientific IT Services team of ETH Zurich. openBIS requires a modern Unix-like operating system (OS), for instance Linux. However, because openBIS can be run on virtual machines and in Docker containers, it is largely platform-independent. A detailed look at the general technical background of openBIS is provided by the developers. [Bauch et al. 2011] In brief, openBIS consists of an application server (AS) and one or several data store servers (DSSs). The AS handles data provenance tasks such as metadata handling, while the DSS manages the raw data. Users access the system through a web browser on the front end. The AS uses a relational database management system (RDBMS) to store persistent information, such as index information about all datasets, whereas the data itself is stored on one or more DSSs. The DSS is responsible for creating, querying, and visualizing data, with these operations mediated by the AS. [Bauch et al. 2011] On the client side, the ELN-LIMS system is accessible via web browsers (the recommended ones are Chrome, Firefox, and Safari) and is reachable from any electronic device and operating system.
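The division of responsibilities between the AS and the DSS can be illustrated with a toy Python model. It is a deliberately simplified sketch under our own assumptions, not openBIS code or its API: an in-memory SQLite database stands in for the RDBMS that holds the dataset index, a directory stands in for the DSS file store, and the application server mediates every registration and download. Storing files under the SHA-256 hash of their content is our own design choice for the sketch, and the class and method names are hypothetical.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


class DataStoreServer:
    """Toy DSS: keeps raw bytes as files under a root directory."""

    def __init__(self, root: Path) -> None:
        self.root = root

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()  # content-addressed file name
        (self.root / key).write_bytes(data)
        return key

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class ApplicationServer:
    """Toy AS: indexes metadata in an RDBMS and mediates access to the DSS."""

    def __init__(self, dss: DataStoreServer) -> None:
        self.dss = dss
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE datasets (code TEXT PRIMARY KEY, dataset_type TEXT,"
            " owner TEXT, dss_key TEXT)"
        )

    def register(self, code: str, dataset_type: str, owner: str, data: bytes) -> None:
        key = self.dss.put(data)  # raw data goes to the DSS
        with self.db:  # only the index entry goes to the database (committed on exit)
            self.db.execute(
                "INSERT INTO datasets VALUES (?, ?, ?, ?)",
                (code, dataset_type, owner, key),
            )

    def search(self, dataset_type: str) -> list:
        """Answer queries from the index alone, without touching the raw data."""
        rows = self.db.execute(
            "SELECT code FROM datasets WHERE dataset_type = ? ORDER BY code",
            (dataset_type,),
        )
        return [code for (code,) in rows]

    def download(self, code: str) -> bytes:
        """Look up the dataset in the index, then fetch its bytes from the DSS."""
        row = self.db.execute(
            "SELECT dss_key FROM datasets WHERE code = ?", (code,)
        ).fetchone()
        if row is None:
            raise KeyError(code)
        return self.dss.get(row[0])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        app = ApplicationServer(DataStoreServer(Path(tmp)))
        app.register("DS-1", "TEM_IMAGE", "alice", b"\x00\x01")
        print(app.search("TEM_IMAGE"))
```

Because clients only talk to `ApplicationServer`, the file store behind it could be swapped or split across several stores without changing how datasets are searched, which illustrates one benefit of separating the index from the data.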

Moreover, as part of our INF project within CRC 1411 (see the funding information in the acknowledgements), we want to extend the concept of an ELN-LIMS toward a virtual research environment (VRE) in the future (see the next section for further information). Our goal is for VREs to be collaborative and to offer requirements-tailored tools that support web-based research. [Allan 2009; Candela et al. 2013; Lave & Wenger 1991] The DFG defines a "Virtuelle Forschungsumgebung" (German for "virtual research environment") as a platform for internet-based collaborative work that enables new forms of collaboration and new ways of dealing with research data and information. [Reimer & Carusi 2010]

Results

Features of openBIS as an ELN-LIMS system within CRC 1411

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.