Journal:Using OpenBIS as a virtual research environment: An ELN-LIMS open-source database tool as a framework within the CRC 1411 Design of Particulate Products

Full article title	Using OpenBIS as a virtual research environment: An ELN-LIMS open-source database tool as a framework within the CRC 1411 Design of Particulate Products
Journal	Data Science Journal
Author(s)	Plass, Fabian; Englisch, Silvan; Zubiri, Benjamin A.; Pflug, Lukas; Spiecker, Erdmann; Stingl, Michael
Author affiliation(s)	Friedrich-Alexander-Universität Erlangen-Nürnberg
Primary contact	Email: michael dot stingl at fau dot de
Year published	2023
Volume and issue	22
Article #	44
DOI	10.5334/dsj-2023-044
ISSN	1683-1470
Distribution license	Creative Commons Attribution 4.0 International
Website	https://datascience.codata.org/articles/10.5334/dsj-2023-044
Download	https://datascience.codata.org/articles/1500/files/655de35843b0d.pdf (PDF)


Abstract

The transformation of existing technologies and the consequent use of new digital technologies have a substantial impact not only on society and companies, but also on science. Analog documentation and research, as we have known it for centuries, will eventually be replaced by intelligent, more FAIR (findable, accessible, interoperable, and reusable) digital methods and systems. In addition to the actual research data and results, metadata now plays an important role not only for individual, independently existing projects, but also for future scientific use and interdisciplinary research groups and disciplines. The solution presented here, consisting of an electronic laboratory notebook (ELN) and laboratory information management system (LIMS) based on the openBIS (open Biology Information System) environment, offers interesting features and advantages, especially for interdisciplinary work. The Collaborative Research Centre (CRC) 1411 "Design of Particulate Products" of the German Research Foundation is characterized by the cooperation of different working groups of synthesis, characterization, and simulation, and therefore serves as a model environment to present this implementation of openBIS. OpenBIS, as an open-source ELN-LIMS solution following FAIR principles, provides a common set of general entries, with the possibility of sharing and linking (meta-)data to improve the scientific exchange between all users.

Keywords: open science, research data management, databases, ELN-LIMS, interdisciplinary work

Introduction

Digital transformation is a key challenge that impacts our entire society. The main players here are companies that use intelligent information technologies (IT) and networks for machines and processes. Starting with flexible production via modular, changeable production processes and moving to customer-specific solutions and products requires the help of sophisticated data acquisition, processing, and analysis. [Bauernhansl et al. 2014; Lasi et al. 2014] Further examples of digital transformation include automation processes, machine-to-machine communication, internet of things (IoT) process implementation, or even augmented-reality-based workflows. [Bauernhansl et al. 2014; Egger & Masood, 2020; Lasi et al. 2014; Li et al. 2015]

However, not only companies are subject to this change, but also government institutions (eGovernment) [Gisler 2001] and science itself. [Kimmig et al. 2021] Thus, the topic of "open science" has become a growing movement. [National Academies of Sciences, Engineering, and Medicine (U.S.) et al. 2018] Open science has been supported in the European context of the Open Research Data and Data Management Plans of the European Research Council (ERC), established by the European Commission for almost five years. However, the ERC has been promoting the causes of open science since 2007. [ERC Scientific Council 2021] This has also manifested in other ways; for example, open access publication of results from funded projects has already become mandatory to a certain extent. For example, the DFG (German Research Foundation) supports infrastructure projects within Collaborative Research Centres (CRCs), which are long-term university-based research institutions established for up to 12 years, and whose funding objective is to establish powerful information systems for research in a holistic perspective. [German Research Foundation 2021] Accordingly, new infrastructure on a national (i.e., Germany's National Research Data Infrastructure; Nationale Forschungsdateninfrastruktur or NFDI) and European level (i.e., the European Open Science Cloud or EOSC) [European Commission 2016; Mons et al. 2017] has been established, fostering the subject of research data management, data publications, and open science. Nevertheless, the topic of open science includes more than just the public provision of data in the context of open-access publications; it also includes the approaches of open methodologies, sources, and data. The necessary implementation and representation of good scientific data management, data quality, and stewardship (data governance) are tremendously important. [Brous et al. 2016; Hildebrand et al. 2011; Ladley 2020; Wilkinson et al. 2016] The resulting benefits can be measured directly, such as in terms of improvements in process efficiency or cost and risk reductions, and indirectly, such as increased acceptance, perception, and trust. [Brous et al. 2016; Hildebrand et al. 2011; Tallon 2013]

This paper presents an implementation of openBIS, an electronic laboratory notebook (ELN) and laboratory information management system (LIMS) to support open science broadly, including data management, handling, storage, and publishing within a scientific laboratory environment.

Current state of research data management

FAIR as part of research data management

One of the cornerstones of this overall research data management (RDM) is the FAIR principles, which encourage research objects to be more findable, accessible, interoperable, and reusable. The emphasis placed on FAIRness, and the growing awareness of it, is, however, more than just a duty that public funding agencies impose on research. Rather, it is key to knowledge discovery, innovation, and information transfer, as well as to the subsequent integration and reuse of research objects by the scientific community. [Wilkinson et al. 2016] Events such as the global COVID-19 pandemic demonstrate the need for, and the overall benefits of, making data available online in a reusable fashion. [Besançon et al. 2021; Tse et al. 2020] This leads not only to efficient research and increased innovation, but also to fair and transparent use of public funds and tax revenue, as well as increased visibility and scientific reputation and reliability, to name just a few benefits of open science. [Janssen et al. 2012]

The FAIR principles propose that all scholarly output should embody the characteristics of being findable, accessible, interoperable, and reusable. While these principles provide guidance on the expected behaviors of data resources, their practical implementation has been subject to varying interpretations. As the support for these principles has grown, so has the diversity of interpretations surrounding their application. [Mons et al. 2017] FAIR principles recognize the need for data accessibility under defined conditions, but do not necessitate complete openness. While transparency and clarity are required for accessing and reusing data, restrictions can still remain based on privacy, security, and competitive concerns. FAIR promotes a balanced approach that allows diverse participation and partnerships while ensuring the availability of data within specified guidelines. [Mons et al. 2017]

Data repositories and data publications

Data repositories are a key component of the digital transformation of science. Well-known examples are the commercial data repository service Figshare, open-access archives like arXiv, or platforms like Dataverse [Crosas 2011], EUData [Lecarpentier et al. 2013], and Zenodo, which is maintained by CERN and funded by the E.U. Commission. In fact, most of the known repositories already consider the high-level FAIR principles. In the case of Zenodo, the uploaded data is provided with a digital object identifier (DOI) and can optionally be published as openly accessible and viewable. This, in turn, simplifies findability, accessibility, and usability for the scientific community, and makes individual datasets citable, whose content no longer has to lead directly to a complete publication. This means that even data, methods, or code that initially received little attention can now be found by the general public and are not lost.

Despite all of this, justified doubts exist regarding the open science policy. In particular, there are questions and concerns about the security of data against external interference and about possible compliance and regulatory requirements, for example in healthcare, where relevant patient data must be protected. These issues, which revolve around data sovereignty, need to be clarified during planning and before digital technologies such as data management or cloud-based systems are introduced. [Clayton et al. 2019; Hummel et al. 2021]

Good research data management does not start with the publication and archiving process of the work or the (meta-)data, but with their initial collection. This is because not only content-related data/information is of importance in the sense of RDM; metadata is as well. Metadata is data that describes other data or, more generally, (additional) information about the described data(set). Examples of metadata are the authors, the creation time/date, the type of the dataset, and the DOI of the dataset. In fact, many distinct types of metadata exist, including descriptive, structural, administrative, and process data.
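As a minimal illustration of the metadata examples named above (authors, creation time/date, dataset type, DOI), the following Python sketch attaches them to a dataset record. The class name, field names, and values are hypothetical placeholders and do not correspond to the schema of any particular ELN or repository.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class DatasetMetadata:
    """Hypothetical metadata record; all field names are illustrative."""
    authors: list[str]                  # descriptive: who created the data
    dataset_type: str                   # descriptive: what kind of dataset this is
    created: datetime                   # administrative: creation time/date
    doi: Optional[str] = None           # persistent identifier, once assigned
    process: dict[str, str] = field(default_factory=dict)  # process data

    def is_citable(self) -> bool:
        # A dataset can be cited on its own once it carries a DOI.
        return self.doi is not None


record = DatasetMetadata(
    authors=["A. Researcher"],          # placeholder name
    dataset_type="image series",
    created=datetime(2023, 1, 15, tzinfo=timezone.utc),
    process={"instrument": "Instrument I"},
)
```

Collecting these fields at the moment the data is created, rather than at publication time, is the point the paragraph above makes: the record already exists when a DOI is later assigned.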

ELNs and LIMS

The scientific questions, choices of experimental procedures, materials and methods, data analyses, and interpretation of research and its results were traditionally recorded in detail in paper-based laboratory notebooks. [Barillari et al. 2016] Not only is this approach hard to justify today, since most data are generated electronically or stored as code on a network anyway, but it also directly contradicts the overarching principles of FAIR, as well as open science in general. The data are neither easy to find nor accessible or usable by scientists outside the local physical system in which they are stored. Consequently, the use of paper-based notebooks should be avoided, and a shift made to ELNs and LIMS.

The combination of an ELN with a LIMS allows research labs to facilitate the documentation and management of laboratory processes and data. An ELN is used for capturing and organizing experimental data, while a LIMS supports the management of laboratory resources, sample tracking, quality assurance, and other laboratory functions. Together, ELN and LIMS provide a comprehensive platform for efficient and secure management of laboratory information, promoting compliance with best practices and regulatory requirements. [Barillari et al. 2016; Bespalov et al. 2020; Machina & Wild 2013] ELNs can play a major role in a successful RDM effort. In this way, a continuous workflow under FAIR conditions can be guaranteed from the very beginning, starting with the collection of data, the use of the data by oneself or the research group and other researchers, the publication of the data, and, finally, long-term archiving, for example in data repositories like Zenodo or RADAR. [Kraft et al. 2016] Further advantages of an ELN system include [Barillari et al. 2016]:

the easy and metadata-based collection of information, improving its shareability;
an ensured long-lasting data storage on a secure server;
simplified accessibility via a global or local network;
greater archiving possibilities on open data repositories; and
self-implementable applications with the system.

Currently available project management tools connecting classical, collaborative project and data management efforts include platforms like OSF. [1] Classical ELN systems range from commercial applications like CERF [2], Benchling [3], and labfolder [4] to open-source options like Chemotion ELN [5], eLabFTW [6], and openBIS [7].

Current situation, methodology, and strategy

Current situation from an interdisciplinary point of view

The scope of open science and RDM, as well as a clear and well-defined data stewardship and governance, has been clearly recognized from the perspective of the DFG-funded Collaborative Research Centre (CRC) 1411 Design of Particulate Products. However, an ELN-LIMS suited for use within an interdisciplinary field involving engineering, materials science, natural sciences, mathematics, theoretical modeling, and simulation is currently lacking. Furthermore, commercial systems are generally not recommended because of their potential costs and because they do not fit well with an open science policy. Open-source products like Chemotion ELN are, in turn, too subject-specific (in this particular case, chemistry), which would reduce their benefit for interdisciplinary work and research, for example through lower usability and, ultimately, lower acceptance in other subject groups. On the other hand, the openBIS system of the Department of Biosystems Science and Engineering and Biology of the ETH Zurich [Barillari et al. 2016; Bauch et al. 2011] shows promising features that are worth a closer look, even though this system was designed primarily for biological and medical disciplines and is not only used by academic institutions, but also by companies in the industry sector. [Bauch et al. 2011]

Accordingly, we describe the implementation of our openBIS system in several subsections, starting with the basic principles and working features of openBIS, continuing with the goals and requirements of researchers within the CRC and the scientific community for a well-functioning ELN tool, and ending with its implementation and usage. Like other ELN-LIMS, openBIS is a good starting point and framework for fostering cooperation via a digital environment and for fulfilling the requirements of the DFG and of NFDI consortia such as FAIRmat (for materials science, physics, chemistry, and mathematics) with respect to data management, handling, storage, and publishing. In addition, we also want to share our experiences in further developing the system and its implementation and daily use.

General overview and structure of openBIS

Before we shed light on how openBIS can help us with a successful implementation of a beneficial RDM, we first clarify what openBIS is, how it works, and what technical requirements need to be met. OpenBIS is an open-source platform that functions both as an ELN and LIMS. Developed by ETH Zurich, openBIS provides an open-source database for research laboratories, especially designed and implemented in and for life sciences. [Barillari et al. 2016] The goal was to build a simple and efficient, yet comprehensive ELN-LIMS system that meets the daily needs of a research institution. Everyday things like storage of materials; instrumental setups and devices; acquisition, description, and processing of large amounts of data; and sharing of research with users and scientists within the openBIS system should be possible. [Bauch et al. 2011]

However, questions arise regarding what technical conditions are needed to set up openBIS within a research group, whether openBIS is scalable, and whether such a system is conceivable at all for a scientific network (E.U. or DFG funded) or even as a comprehensive database for an entire university. All of these questions should be answered before starting with an ELN-LIMS and will be discussed.

Development of openBIS and the platform on which it is based started in 2007, and the system is still actively maintained, nowadays by the Scientific IT Service Team of the ETH Zurich. openBIS requires a modern Unix-like operating system (OS), for instance Linux. However, openBIS can be run on virtual machines and in Docker containers and is therefore largely platform-independent. A detailed look at the general technical background of openBIS is provided by the developers. [Bauch et al. 2011] Briefly summarized, openBIS has an application server (AS) and one or several data store servers (DSS). On the AS, data provenance actions like metadata handling are conducted, while on the DSS the raw data is managed. On the front end, user access is facilitated via a web browser. The AS uses a relational database management system (RDBMS) to persist information such as index information about all datasets, while the data itself is stored within the DSS system(s). The latter are responsible for creating, querying, and visualizing data, mediated by the AS. [Bauch et al. 2011] On the application side, the ELN-LIMS system is accessible via browser-based tools (the recommended ones are Chrome, Firefox, and Safari) and is reachable from any electronic device and operating system.
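The division of labor between the AS and the DSS can be pictured with a toy in-memory model. This is only a sketch of the idea that metadata and index information live in one place while raw data lives in another; the class names, method names, and dataset codes are hypothetical and are not part of openBIS.

```python
from dataclasses import dataclass, field


@dataclass
class DataStoreServer:
    """Toy stand-in for a DSS: holds raw data only."""
    name: str
    blobs: dict[str, bytes] = field(default_factory=dict)


@dataclass
class ApplicationServer:
    """Toy stand-in for the AS: holds metadata plus an index of where data lives."""
    stores: dict[str, DataStoreServer] = field(default_factory=dict)
    index: dict[str, dict] = field(default_factory=dict)

    def register(self, code: str, raw: bytes, store: str, **metadata) -> None:
        # Raw bytes go to the chosen data store ...
        self.stores[store].blobs[code] = raw
        # ... while the AS keeps only metadata and index information.
        self.index[code] = {**metadata, "store": store, "size": len(raw)}

    def fetch(self, code: str) -> bytes:
        # Access is mediated by the AS, which knows which store holds the data.
        entry = self.index[code]
        return self.stores[entry["store"]].blobs[code]


dss = DataStoreServer("dss-1")
app = ApplicationServer(stores={"dss-1": dss})
app.register("DS-001", b"raw bytes", store="dss-1", kind="measurement")
```

In this sketch, adding a second `DataStoreServer` only requires another entry in `stores`, which mirrors the "one or several" DSS arrangement described above.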

Moreover, as part of our INF project of the CRC 1411 (see the funding information in the acknowledgements), we want to further extend the concept of ELN-LIMS to a more virtual research environment in the future (see the next section for further information). Our goal is for VREs to be collaborative and present requirements-tailored tools that support web-based research environments. [Allan 2009; Candela et al. 2013; Lave & Wenger 1991] The DFG defines “Virtuelle Forschungsumgebung” (a literal translation of "Virtual Research Environment") as a platform for internet-based collaborative working that enables new ways of collaboration and a new way of dealing with research data and information. [Reimer & Carusi 2010]

Results

Features of openBIS as an ELN-LIMS system within CRC 1411

As openBIS is accessible for users via a web app, even from outside the university network, users can reach their own data and data provided by others at any time and from anywhere. Beyond creating and implementing (meta-)data, every user can grant selected other users access to their projects. This covers different roles (e.g., observer, user, admin) with different rights like reading, writing, or even deleting. Here, the authorization process lies completely in the hands of each user and can be adjusted independently, with respect to other users within the network (i.e., role-based) or other projects by the same user. In general, openBIS has a predefined hierarchical structure, which is divided into several levels. The first and main level is the "Work" or "Data Space" level, which, in addition to general rights and access, primarily contains all projects. The second level, "Projects," pertains to the projects themselves. These, in turn, have "Collections" (third level), which consist of different "Object Types" (fourth level), with or without datasets (see Figure 1) of type "Experimental Step," "Entry," or "Instrument." [Bauch et al. 2011] This is also valid for the Collections level. In fact, the first and second level contain a persistent identifier with one specific path using the nomenclature “/user(workspace)/projectx.” Collections and Objects extend the identifier by a unique code. However, these are moveable between Projects and Spaces. In the example of Figure 1, we can move Collection 1 with all its entries to Project 2, which can be shared, but the Projects and Spaces are not moveable.

Figure 1. Logical structure layers of openBIS containing the Work- or Data space, Projects, Collections, Object Types (OBJs), and attached Datasets (red). This figure is adapted from Barillari et al. [2016]
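The layered structure and the identifier scheme can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the class and function names are hypothetical, and appending the Collection code to the Project path is our assumption about how the identifier is extended, not a statement about the openBIS implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    code: str                            # unique code; unchanged when moved
    objects: list[str] = field(default_factory=list)


@dataclass
class Project:
    space: str                           # first level: Work/Data Space
    name: str                            # second level: Project
    collections: dict[str, Collection] = field(default_factory=dict)

    @property
    def identifier(self) -> str:
        # Fixed path for the first two levels, e.g. "/USER/PROJECT1".
        return f"/{self.space}/{self.name}"

    def path_of(self, code: str) -> str:
        if code not in self.collections:
            raise KeyError(code)
        # Assumption: the Collection code is appended to the Project path.
        return f"{self.identifier}/{code}"


def move_collection(code: str, source: Project, target: Project) -> None:
    # Collections (with all their entries) can move; Projects and Spaces cannot.
    target.collections[code] = source.collections.pop(code)


project_1 = Project("USER", "PROJECT1")
project_2 = Project("USER", "PROJECT2")
project_1.collections["COLLECTION1"] = Collection("COLLECTION1", ["OBJ1", "OBJ2"])
move_collection("COLLECTION1", project_1, project_2)
```

After the move, the Collection keeps its code and its entries, while its full path now starts with the identifier of Project 2.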

There are no limitations on the maximum number of users working on one openBIS ELN system; the only limitations are those of the available server resources. openBIS can also link data on samples, materials, instruments, and experiments. Here, openBIS features the parent-children relationship model (see Figure 2 for further detail). This means that every created Object type can be logically linked to other Object types. As an example from our CRC, synthesis and characterization groups are trying to develop specific and well-defined particles considering different synthetic samples, materials, and processes. To ensure that the creation, modification, or deletion of any electronic records is traceable, computer-generated and time-stamped audit trails are used to record the date and time of any user interventions. As an example, to create Experimental Steps X and Y (see Figure 2), the researcher/user requires different amounts of Samples A and B, and one specific Instrument I. Moreover, for a specific Simulation Z, one uses Sample B and Software S. Here, Sample, Instrument, Software, Experimental Step, and Simulation, or later on Publication, are all different created Object types that are accessible and existing for every user of our openBIS system. Furthermore, all Object types within our system are classified into three different categories, depending on whether they are generalized entries (Entry and General Type), experimental procedures or analyses (like Experimental Step or Simulation), or properties and (real existing) objects (like Instrument or Software). In our basic example, Samples A and B and Instrument I are parents of Experimental Steps X and Y, and Sample B and Software S are parents of Simulation Z; X, Y, and Z are automatically the children of these Object types. If we now create an Object type Publication P, which is linked to and fed by Experimental Steps X and Y and Simulation Z, then, accordingly, P is the child of X, Y, and Z, which in turn are its parents. This results in a clear and well-understandable line of ancestry, which fulfills the FAIR concept. In our example, even after publishing (Publication P), one can clearly find and reuse (meta-)data of the corresponding project. Moreover, one does not have to create object types (and their underlying [meta-]data) multiple times, as one can reuse them if needed. This reduces redundancy of (meta-)data that is used across Spaces, Projects, and Collections, and saves storage and data maintenance time. Additionally, in an interdisciplinary environment, multiple groups utilize a common set of Object types, which would otherwise appear in a redundant manner in different fields. This can save a lot of time and make it possible to create and implement a more comprehensible laboratory and research system.

Figure 2. The hierarchical structure of the parent-children relation in action. Every object type (for example Sample, Experimental Step, and Publication) can be a child, a parent, or both, depending on its ancestry relation. In our example, Publication P is a descendant of every other object type, such as Instrument I or Experimental Step X.
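The ancestry in this example can be written down as a small directed graph and traversed to recover everything a publication depends on. The sketch below is plain Python, not openBIS code; the dictionary layout and function names are our own, and the links follow the example in the text (X and Y use Samples A and B and Instrument I, Z uses Sample B and Software S, and P is fed by X, Y, and Z).

```python
# Each object maps to the list of its direct parents.
PARENTS: dict[str, list[str]] = {
    "Experimental Step X": ["Sample A", "Sample B", "Instrument I"],
    "Experimental Step Y": ["Sample A", "Sample B", "Instrument I"],
    "Simulation Z": ["Sample B", "Software S"],
    "Publication P": ["Experimental Step X", "Experimental Step Y", "Simulation Z"],
}


def ancestors(obj: str, parents: dict[str, list[str]] = PARENTS) -> set[str]:
    """Return every direct and indirect parent of obj."""
    seen: set[str] = set()
    stack = list(parents.get(obj, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def children(obj: str, parents: dict[str, list[str]] = PARENTS) -> set[str]:
    """Return every object that lists obj as a direct parent."""
    return {child for child, direct in parents.items() if obj in direct}
```

Calling `ancestors("Publication P")` walks back through the experimental steps and the simulation to the samples, the instrument, and the software, which is the line of ancestry described above. `children("Sample B")` shows the reverse direction: one sample entry is reused by X, Y, and Z without being duplicated.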

OpenBIS can also be used for administrative and lab-specific workflow processes. To understand how to do this, we have to take one step back and consider how openBIS works from each user’s point of view. First, every user possesses their own Workspace, which can be compared to their own desktop or computer storage. Within this Workspace, projects, metadata, and other objects can be created and adjusted without necessarily being connected to other users. Sharing is possible using the Manage Access option, a function that allows users to grant access to those users within the system who should be allowed to read, write, and/or delete information in a project. All permitted users will appear as folder icons below the user’s own Workspace folder icon. However, each researcher or user belongs to a specific working group or institute containing several other researchers/users. A key consideration is therefore how to bring the entire working group together and define user roles effectively without repeatedly pressing the Manage Access button. This matters especially for general administrative items or instruments within the institute that are accessible to every researcher of the corresponding group: if each user created their own Instrument object types, this would not only be time-consuming, but would also risk creating different metadata for the same instrument. This would result in problems with the FAIR principles.
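The per-project sharing described above can be sketched as a small role table. Note the assumptions: the text names the roles (observer, user, admin) and the rights (read, write, delete) but does not state which role carries which right, so the mapping below is our guess; all class and function names are hypothetical and unrelated to the openBIS code base.

```python
from enum import Flag, auto


class Right(Flag):
    READ = auto()
    WRITE = auto()
    DELETE = auto()


# Assumed mapping of roles to rights (not taken from the source).
ROLE_RIGHTS = {
    "observer": Right.READ,
    "user": Right.READ | Right.WRITE,
    "admin": Right.READ | Right.WRITE | Right.DELETE,
}


class ProjectAccess:
    """Toy access list for one project; the owner starts as admin."""

    def __init__(self, owner: str) -> None:
        self.grants: dict[str, str] = {owner: "admin"}

    def grant(self, granted_by: str, user: str, role: str) -> None:
        # Only an admin of this project may hand out access.
        if self.grants.get(granted_by) != "admin":
            raise PermissionError(f"{granted_by} may not manage access")
        if role not in ROLE_RIGHTS:
            raise ValueError(f"unknown role: {role}")
        self.grants[user] = role

    def allowed(self, user: str, right: Right) -> bool:
        role = self.grants.get(user)
        return role is not None and right in ROLE_RIGHTS[role]


access = ProjectAccess(owner="alice")
access.grant("alice", "bob", "observer")
```

Here `bob` can read but not write, and a user without any grant has no access at all. Granting access user by user in this way is exactly the repetitive step that a shared, group-level area is meant to avoid.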

Moreover, openBIS includes an additional working area called Inventory, which serves as a "common work environment" for user-groups to access information and data stored within this section. This area houses data and information that is universally relevant or of interest to all users or larger groups, and it contains spaces for the CRC, such as an openBIS on-boarding or a common folder. Furthermore, we have made further adjustments to the Inventory section by implementing sub-Inventories that are specific to the working groups/institutes within our CRC.

These different spaces, including the Inventory and our institute-based Inventories, provide fields for collaborative projects and administrative workflows, offering a general and overall usage area, along with a sub-Inventory containing commonly used, institute-dependent instruments and materials, among others. With a collaboratively-used Inventory, each user has access to collections of items like Instrumentation, where necessary object types (e.g., Instrument I1, I2, …) are created once and can be utilized by any user within the workgroup (those with access permission to the specific folder of the Instrumentation section). The same concept applies to collections of samples or documents/protocols that are employed in large collaborative projects.

By following the principle of the parent-child relation depicted in Figure 2, we ensure that instruments (e.g., Instrument I from our example) within an Instrumentation folder can now be used for tracking as part of the Inventory folder, while avoiding redundant information and directly providing the required documentation. This makes it possible to combine a classical working group environment with interdisciplinarity and FAIR data management within an overall framework.
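The register-once idea behind a shared Inventory can be illustrated with a short sketch. It assumes a simple in-memory registry; the class, function, and item names (for example `INSTRUMENT-I1`) and the metadata values are hypothetical and only mirror the Instrument I1, I2, … example in the text.

```python
class Inventory:
    """Toy shared inventory: each item is registered once under a unique code."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, str]] = {}

    def register(self, code: str, **metadata: str) -> None:
        if code in self._items:
            # Refuse duplicates so that one instrument has one metadata record.
            raise ValueError(f"{code} is already registered; reuse it instead")
        self._items[code] = dict(metadata)

    def get(self, code: str) -> dict[str, str]:
        return self._items[code]


def new_experiment(name: str, inventory: Inventory, instrument_code: str) -> dict:
    # The experiment stores a link (the code), not a copy of the metadata.
    inventory.get(instrument_code)       # raises KeyError if not registered
    return {"name": name, "parents": [instrument_code]}


shared = Inventory()
shared.register("INSTRUMENT-I1", kind="microscope", location="Room 1")
step_x = new_experiment("Experimental Step X", shared, "INSTRUMENT-I1")
step_y = new_experiment("Experimental Step Y", shared, "INSTRUMENT-I1")
```

Because both experimental steps only hold the instrument's code, a correction to the instrument's metadata in the Inventory is seen by every experiment that links to it.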

OpenBIS as a multi-institutional framework within a material science-based environment: Adjustments and experiences

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.