Journal:A roadmap for LIMS at NIST Material Measurement Laboratory
Full article title | A roadmap for LIMS at NIST Material Measurement Laboratory |
---|---|
Author(s) | Greene, Gretchen; Ragland, Jared; Trautt, Zachary; Lau, June; Plante, Raymond; Taillon, Joshua; Creuziger, Adam; Becker, Chandler; Bennett, Joseph; Blonder, Niksa; Borsuk, Lisa; Campbell, Carelyn; Friss, Adam; Hale, Lucas; Halter, Michael; Hanisch, Robert; Hardin, Gary; Levine, Lyle; Maragh, Samantha; Miller, Sierra; Muzny, Christopher; Newrock, Marcus; Perkins, John; Plant, Anne; Ravel, Bruce; Ross, David; Scott, John H.; Szakal, Chris; Tona, Alessandro; Vallone, Peter |
Author affiliation(s) | National Institute of Standards and Technology |
Year published | 2022 |
Volume and issue | NIST Technical Note 2216 |
Page(s) | i–iii, 1–17 |
DOI | 10.6028/NIST.TN.2216 |
Distribution license | Public domain |
Website | https://www.nist.gov/publications/roadmap-lims-nist-material-measurement-laboratory |
Download | https://tsapps.nist.gov/publication/get_pdf.cfm?pub_id=934610 (PDF) |
Foreword
Over the past decade, emerging technology in laboratory and computational science has changed the landscape for research by accelerating the production, processing, and exchange of data. The NIST Material Measurement Laboratory community recognizes that to keep pace with the transformation of measurement science to a digital paradigm, it is essential to implement laboratory information management systems (LIMS). Effective introduction of LIMS early in the research life cycle provides direct support for planning and execution of experiments and accelerating research productivity. From this perspective, LIMS are not passive entities with isolated interaction, but rather key resources supporting collaboration, scientific integrity, and transfer of knowledge over time. They serve as a delivery system for organizational contributions to the broader federated data community, supporting both controlled and open access, determined by the sensitivity of the research.
The overall goal of a successful LIMS is to empower a research community by establishing common tools providing access to laboratory data resources. Modern LIMS should therefore provide several core functions and touchpoints:
- Workflow management – A research workflow describes steps to be performed to derive results. These patterns serve as a prescription for LIMS to control the progression of data and associated services or tools. Automation of a workflow simplifies the transfer of information through defined interfaces from a network of systems.
- Repository of data – Effective storage and retrieval of data (raw and derived)—including associated metadata, data products, calibration, software, logs, etc.—facilitates data discovery, processing, collaboration, and dissemination.
- Creation of data products and tools – A LIMS should support storage and processing of raw data, leading to products which can be shared and consumed. Examples would include sample data, instrument-generated data, and algorithms generating defined outputs. Inclusion of data models provides context and structure, and machine learning (ML) integration may generate related data which could be combined into more comprehensive data models. Tools may include visualization, evaluation, and analysis packages offering users advanced capabilities for their research projects.
- Organization of data for search and retrieval – Tools and interfaces give users access to sophisticated searches of data holdings and efficient mechanisms for data transfer in standardized formats. Searching should extend to domain or project-specific semantics, be coupled closely with related data, and go beyond individual research projects to include super-searches (e.g., use-case-driven interoperability between LIMS).
- Long-lived, stable, and agile structures – LIMS require institutional and architectural sustainability for long baseline research and curatorship. Technology tends to change faster than the practical lifetime of research programs, so paths must exist for maintaining IT infrastructure and introducing faster and more complex processes.
- Standards and best practices – LIMS benefit from standardization to support collaborations among research communities and make data workflows efficient and affordable. Community buy-in for standards and best practices is an essential part of LIMS, and organizational shared expertise naturally serves as a means for coordination and adaptation of standards.
- User involvement – In all the core functions listed above, it is critical to involve the subject matter experts from the beginning. LIMS should establish a working team that explicitly includes representatives from the end user community.
Abstract
Instrumentation generates data faster and in greater quantity than ever before, and inter-laboratory research is in historic demand domestically and internationally to stimulate economic innovation. Strategic mission needs of the NIST Material Measurement Laboratory (MML) to support a wide array of research disciplines therefore compel our organization to adopt advanced strategies for research data management. Laboratory information management systems (LIMS) provide a framework for managing data from the outset of the research life cycle, delivering new capabilities for machine learning (ML), data analysis, collaboration, and dissemination. This roadmap describes our current understanding and strategy for adapting our research workflows for LIMS throughout MML by embracing the use of standards and best practices from data science communities. The NIST research data cyber-infrastructure complements these goals for MML by providing a secure environment to host LIMS solutions. Additionally, integration of scientific workflows requires ongoing collaboration to bridge organizational LIMS with external scientific communities. Thus, MML LIMS will evolve over time in synergy with the technology and experimental environments, delivering new science. LIMS will broaden our mission impact through adoption of the FAIR Data Principles.
Keywords: data, laboratory information management systems, experimental data, research data, research workflows
Introduction
Beginning in late 2019, MML initiated, as part of its strategic plan, the development of "next-generation" data and informatics with a focus on LIMS as a key resource to support research data and science. This effort was complemented by initiatives for enhancing data management planning and data systems infrastructure. The first year’s groundwork established common needs for both individual researchers and teams to engage more readily with LIMS, with a goal of building greater capacity for interaction and use of data. A vision for LIMS was written to convey the purpose of these collective efforts:
“Laboratory information management systems will provide MML scientists a practical means for repeatability, traceability, reproducibility, efficiency, and compliance of research, serving as a beacon to both intramural and extramural community stakeholders.”
The MML approach to implementing LIMS started with defining goals for specific division research laboratory projects and establishing a cross-divisional Community of Interest (COI) group for sharing solutions, services, practices, and challenges. Comprehensive LIMS solutions have been successfully implemented in several NIST laboratories. Several shared resources have successfully demonstrated use of LIMS components, including repository platforms, a standard persistent identifier service, centralized research data storage with networked data transfer nodes, and data transfer services, in addition to expertise in data modeling and semantics. These resources, along with community best practices, contribute to a basic LIMS architecture for research.
A system view and architecture model provide the foundation for planned future outcomes. In this roadmap, we define a set of research-oriented LIMS capabilities which serves to guide implementation along with components to deliver these capabilities. More detailed guidance for use of specific LIMS resources is available internally to NIST and where possible shared on external repository websites. It is also anticipated that LIMS implementation will provide an important contribution to the goals of NIST program areas such as artificial intelligence (AI), biosystems, chemical informatics, additive manufacturing, and the materials science areas which spearheaded early innovation in data systems through the Materials Genome Initiative.
Roadmap objectives
This roadmap provides a framework for manifesting the MML LIMS vision and outlines the key objectives highlighted by the MML LIMS COI project goals. These are grouped into near- and long-term objectives for MML LIMS prioritization. These objectives, along with broader community efforts, will strengthen the NIST data-as-an-asset [1] strategic approach to research. More comprehensive goals such as the development of a "digital twin" [2] will enable models to probe the measurement science space to further analyze the physics and gain understanding, leading to new science.
Near-term objectives include:
- Establish a LIMS COI for MML (as of this writing, already in place)
- Develop pilot LIMS solutions for targeted research workflows (as of this writing, several solutions already piloted and deployed for laboratory operations)
- Design tiered LIMS architectures to support a range of research workflow implementations
- Establish core infrastructure services to support LIMS
- Develop data acquisition and experimental activity capture solutions
- Deploy key functional support services such as a Handle.Net service (supports persistent identifiers) and data transfer service (see the later subsection on supporting services)
- Prototype and exchange LIMS components as a basis for shared resources (e.g., repository platforms for experimental activity; instrumentation; samples; extract, transform, and load [ETL])
- Establish best practices in digital object management to support standards and community practice
Long-term objectives include:
- Establish best practices for development and implementation of data models and semantics
- Deploy prioritized LIMS end-to-end solutions (achieving multi-component level functionality)
- Develop use-case-driven solutions for cross-laboratory LIMS interconnectivity
- Provide on-demand system-level LIMS resources for research teams at NIST for rapid engagement
- Develop automated workflow integration between LIMS and computational platforms (e.g., high-performance computing [HPC], ML, and analysis applications like SciServer)
- Develop integration between LIMS and public data access systems
- Establish methodology for building and applying "digital twin" models
- Provide NIST leadership a strategy for adoption and implementation of LIMS to promote innovative data-driven science
Challenges
Within the MML LIMS COI exchange forum, lessons learned and shared by early adopters of LIMS highlighted several key challenges. These will be factored into data management plans for implementation and extended to the broader MML community, which relies on either the operation of LIMS or the end use of LIMS outputs.
One common challenge with instrumentation data is generation of vendor-proprietary output formats. A repository for sharing data format exchange software tools is a good example of a solution benefiting LIMS by supporting the need to transform vendor data into more consumable open data formats for downstream analysis and computation. A prototype repository was created by the Office of Data and Informatics (ODI) with a few extractor tools, and efforts are underway to explore how this may achieve wider utility. Community repositories such as Bio-Formats and MaterialsIO are examples of resources which support tools for conversions of third-party data into open data models. These community-oriented solutions successfully demonstrate methods to lower the barrier for LIMS through shared software.
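The idea of a shared repository of extractor tools can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `.vnd` suffix and its `key=value` layout stand in for a real vendor format, and the `register` and `convert_to_json` helpers are hypothetical, not part of the ODI prototype, Bio-Formats, or MaterialsIO.

```python
import json
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry: file suffix -> function that reads a vendor file
# and returns its content as a plain dictionary.
Extractor = Callable[[Path], dict]
EXTRACTORS: Dict[str, Extractor] = {}


def register(suffix: str):
    """Decorator that registers an extractor for one file suffix."""
    def wrap(func: Extractor) -> Extractor:
        EXTRACTORS[suffix.lower()] = func
        return func
    return wrap


@register(".vnd")
def extract_vnd(path: Path) -> dict:
    """Parse a made-up 'key=value' text format (stand-in for a vendor format)."""
    record = {}
    for line in path.read_text().splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            record[key.strip()] = value.strip()
    return record


def convert_to_json(path: Path) -> Path:
    """Write an open JSON copy of a vendor file next to the original."""
    extractor = EXTRACTORS.get(path.suffix.lower())
    if extractor is None:
        raise ValueError(f"no extractor registered for {path.suffix!r}")
    out_path = path.with_suffix(".json")
    out_path.write_text(json.dumps(extractor(path), indent=2, sort_keys=True))
    return out_path
```

A registry of this kind lets each laboratory contribute one small extractor per instrument format while downstream analysis code depends only on the open output.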
Another challenge is finding the appropriately skilled labor required to engineer domain-specific LIMS workflows. This is a common barrier to LIMS prioritization for research organizations. Integrating data structures requires close collaboration between domain and data science subject matter experts (SMEs) for modeling and mapping of multiple source data to repository storage.
Integration or migration of legacy systems and bespoke tools with next-generation LIMS architecture presents another challenge, especially for those with limited resources supporting maintenance. Legacy systems commonly lack sustainability due to factors such as end of funding support or unavailable expertise.
Data provenance is commonly required for sample tracking and traceability across laboratory processes (e.g., sample transformations, generation of parts, or inter-laboratory sample exchange). The latter is a challenge for inter-laboratory study because data management systems (including LIMS) are most commonly not standard or normalized. Supporting common data exchange protocols and chain of custody workflows will be an ongoing design consideration for interoperability, including concepts such as data trust and integrity.
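One way to make chain of custody and data integrity concrete is a hash-chained event log, sketched below. This is an illustrative assumption, not a method described in the roadmap: the `CustodyLog` class, its field names, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class CustodyLog:
    """Append-only log in which each entry stores the hash of the previous one."""
    entries: List[dict] = field(default_factory=list)

    @staticmethod
    def _digest(body: dict) -> str:
        # Serialize with sorted keys so the same body always hashes the same way.
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def record(self, sample_id: str, action: str, actor: str) -> dict:
        """Append one custody event linked to the previous entry."""
        prev_hash = self.entries[-1]["hash"] if self.entries else ""
        body = {
            "sample_id": sample_id,
            "action": action,
            "actor": actor,
            "prev_hash": prev_hash,
        }
        entry = dict(body, hash=self._digest(body))
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Return True only if no entry was altered, removed, or reordered."""
        prev_hash = ""
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev_hash"] != prev_hash or self._digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

Because each entry commits to its predecessor, a receiving laboratory could check the history of an exchanged sample without sharing a database with the sending laboratory.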
Another common challenge is operational security compliance for IT infrastructure. NIST adheres to the CIS Controls (Critical Security Controls), and as LIMS architectures rely on networked systems, this translates to requirements for vigilant monitoring of service and platform deployments to ensure organizational security.
MML LIMS stakeholders
MML LIMS has a number of stakeholders with differing needs and priorities. ODI and research project leads work with all stakeholder categories, balancing goals to develop options that provide the best overall benefit. Stakeholders include:
- Research community: Each research discipline area involves a community benefiting from the generation and exchange of data outcomes.
- MML Laboratory Office: The MML Lab Office ensures support of the NIST organizational mission through the productivity achieved from the use of LIMS.
- Program funding sources: Funding for strategic initiatives and research priorities for NIST and MML address both internal and external stakeholder needs and account for LIMS resources.
- Collaborative partners: Critical stakeholders help foster research innovation and science through exchange of LIMS data and software products. A key benefit to LIMS is the ability to support collaboration through exchange and access to data, data products, software, and related resources.
External solutions and partnering
Working with the external community has resulted in a more robust MML LIMS knowledge base, benefiting from partnerships with external researchers during LIMS development and implementation, commercial procurement, and collaboration. Several organizational partners have contributed to MML efforts, including the National Renewable Energy Laboratory [3], Oak Ridge National Laboratory (ORNL), Brookhaven National Laboratory (BNL), Air Force Research Laboratory, 3M Corporation, NASA, University of Illinois Urbana-Champaign, and national forensic crime labs in collaboration with the National Institute of Justice. [4] Many NIST strategic research areas rely on collaborative data engagement; as such, having a LIMS with capabilities for supporting external access is important in many cases.
Collaboration with external partners may involve co-development of a LIMS platform system (or component), adoption of a community LIMS, shared access to data/code, or support for commercial vendor solution customization. Such collaborations often include harmonizing requirements and goals to inform the design, architecture, and engineering of a solution. Open-source solutions often provide more mechanisms supporting the complexity of many research workflows through their flexible configuration, customization, and independent software enhancements. A few examples of such solutions include ORNL DataFed [5], SynBioHub, BNL’s BlueSky platform [6], and SciServer. While these may require engineering expertise to fit within the NIST infrastructure, each adds meaningful user capabilities.
As one example, NIST synchrotron beamline stations at the BNL National Synchrotron Light Source II (NSLS-II) Facility have implemented BlueSky in partnership with BNL’s Data Science and Systems Integration group.
Two solutions have also been developed by partnering within NIST, and both are highly customized to the research requirements of their respective user groups. They include:
- the NEXUS Electron Microscopy LIMS (NEXUS-LIMS [7]) workflow based on the Configurable Data Curation System (CDCS), a NIST-developed open platform, and
- a LIMS supporting real-time biosystem cell line sample tracking, with a custom Excel application used for experimental activity capture.
A few commercial solutions [1] were adopted in part or in full. In one example, the NIST Center for Automotive Lightweighting (NCAL) successfully implemented the commercial platform Ansys Granta. Other community and vendor solutions have been and continue to be evaluated for pilot use, such as 4CeeD [8], the Tadabase no-code solution, the Benchling cloud platform, Microsoft platforms, and others. Due to the challenging nature of customized research workflows and the complexity of secure integration with government networked infrastructure, commercial solutions may pose additional cost and skillset requirements for successful adaptation into the laboratory working environment. In most cases at NIST, use case development leads to the adoption of hybrid solutions. Closed-source solutions widely adopted in the scientific community (e.g., Globus) provide unique and robust capabilities that are difficult to recreate. In the instance of Globus, the linkage to several research community services and best practices like GridFTP, identity management linked to institutional authentication (including InCommon), a Python software development kit (SDK), and multi-platform storage connections provides high value and ease of adoption.
Implementation of the roadmap
System level solutions
A LIMS, as we define it, is a system of components which delivers the capabilities for the early stages of a research life cycle. [9] It is widely recognized there is no singular LIMS solution that ranges across all disciplines of research, yet shared components (and data assets) provide greater economy of scale and consistent usage across the organization and beyond. A LIMS design implements research workflow requirements and provides context for assembling an architecture supporting component integration to produce desired outcomes. Off-the-shelf (OTS) LIMS solutions are often challenging to adopt, primarily due to the limitations in both configuration and customization of components to match workflows. Monolithic solutions, whether homegrown or OTS, have proven challenging because they constrain interfaces between components. Workflow flexibility, orchestration, and evolution can be managed with lower risk [10] to overall performance when using service-oriented solutions. Therefore, implementations may vary as to which components are used and in which sequence to support the required research workflow. A tiered model ranging from basic components and plug-in architecture to instantiation of more complex data models with computational support can lower the barrier for entry. Given this context and the goal of flexibility, a few commonly used LIMS workflow components provide critical functionality.
LIMS tiers
Requirements for building or deploying LIMS are dependent upon the complexity of the research workflow and project needs. When considering the system, a basic three-tiered model serves as a general guide for mapping capabilities and implementation to level of effort (Fig. 1). As with many aspects of LIMS, this model may have variations in the strata depending on the optimal architecture for achieving desired outcomes.
Tier 1: On-demand resources
Tier 1 is primarily reliant on infrastructure and support services and can be readily adopted with available “on-demand” resources, meaning those resources that exist and can be provisioned for use upon request. The main functions for Tier 1 LIMS include data acquisition, near-line instrument data collection, sample tracking, and data movement—possibly automated—to a storage location, ideally with access for processing and analysis. Tier 1 LIMS support unstructured data and may provide more flexibility for researchers to experiment with data structures, formats, and tuning for workflow.
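The automated data movement described for Tier 1 can be sketched as a checksum-verified copy from an instrument-side folder to a storage location. This is a minimal illustration under assumptions: the function names and directory arguments are hypothetical, and a production deployment would more likely rely on a managed data transfer service than on a local copy.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Tuple


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_to_storage(source: Path, storage_dir: Path) -> Tuple[Path, str]:
    """Copy one acquired file to storage and verify the copy by checksum.

    The source file is left in place; deleting it is a separate policy decision.
    """
    storage_dir.mkdir(parents=True, exist_ok=True)
    target = storage_dir / source.name
    expected = sha256_of(source)
    shutil.copy2(source, target)  # copy2 also preserves file timestamps
    if sha256_of(target) != expected:
        target.unlink()
        raise IOError(f"checksum mismatch while copying {source}")
    return target, expected
```

Returning the checksum alongside the target path lets a later stage store it as metadata, which keeps unstructured Tier 1 data verifiable before any data model is applied.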
Tier 2: Data science and services
Tier 2 integrates research workflow design, including data structuring, formatting, metadata, and possible integration to a data repository solution. The Tier 2 LIMS also requires a greater level of effort and engagement between a research SME and data science engineers for design, installation, and operational configuration.
Tier 3: Discipline science
Tier 3 LIMS require the highest level of commitment and collaboration between the research SME and a data science expert to factor in all the functionality for more complex workflows. This might involve functionality such as computational system integration, support for external client tools through an application programming interface (API), and semantic modeling to factor in community adoption for data consumption. An example of a Tier 3 LIMS would include data generated from multiple instruments, each with variable processes, interconnected to computational tools and applications to produce data products, with dependencies on analytical requirements for external interoperability. Both the MML NEXUS LIMS and Granta NCAL systems are examples of Tier 3.
Supporting services
Common services support LIMS at various touch points throughout the research data workflow. These may be considered part of the infrastructure, i.e., they are underlying services shared and configured to support more than one stage of a LIMS workflow. However, by nature, they often require a contextual prescription for how they interface with a LIMS workflow or component. Examples of these key support services include:
- Data transfer services: manual, automation, and tools for file movement between storage locations
- Handle service: generates and resolves a standard persistent identifier (PID) known as a Handle
- Repository systems: OTS solutions or well-established and supported repository platforms (e.g., CDCS, Cordra, GitLab, GitHub) and open-source solutions supporting customization for (meta)data
- Containerized deployments: Docker, Kubernetes, virtual machine/cloud services supporting deployments of LIMS applications and tools
- Solutions brokering: Documentation and communications, including instructional guides (“playbooks”) and consultation for design and use of LIMS systems, services, and components
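As a small illustration of the Handle service listed above, the sketch below splits a Handle into its prefix and suffix and builds a resolution URL. It assumes the public `hdl.handle.net` proxy as the resolver; an internal deployment may expose a different endpoint, and the helper names are hypothetical. The example identifier is this article's own DOI, since DOIs are Handles.

```python
from typing import NamedTuple
from urllib.parse import quote

# Assumption: the public Handle proxy; a local Handle service may differ.
HANDLE_PROXY = "https://hdl.handle.net/"


class Handle(NamedTuple):
    prefix: str  # naming authority, e.g. "10.6028"
    suffix: str  # local name assigned under that authority


def parse_handle(text: str) -> Handle:
    """Split 'prefix/suffix' at the first slash; the suffix may contain slashes."""
    prefix, sep, suffix = text.strip().partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle of the form prefix/suffix: {text!r}")
    return Handle(prefix, suffix)


def resolver_url(handle: Handle) -> str:
    """Build a URL that asks the proxy to resolve the handle."""
    return f"{HANDLE_PROXY}{handle.prefix}/{quote(handle.suffix, safe='/')}"


doi = parse_handle("10.6028/NIST.TN.2216")
url = resolver_url(doi)
```

Storing the parsed prefix and suffix rather than a full URL keeps records independent of whichever resolver is in use.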
Infrastructure resources for LIMS
Organizationally, MML and NIST both provide infrastructure-as-a-service for networking, storage, and compute resources. These systems may be requested through internal NIST IT services and are readily available.
As part of the MML Data and Informatics strategic plan, both network and storage have been significantly expanded to support higher bandwidth for data transfer between laboratory instruments and storage (see Fig. 2). Data space allocations are designed to be flexible and can be established on request for cross-organizational projects, research teams, and instrument dedicated endpoints. Several network attached storage (NAS) solutions have been implemented to support both localized (data collection nodes) and central data storage (a dedicated MML Research Data storage array). Additionally, the NIST Amazon Web Service (AWS) cloud environment is available and integrated with the NIST VPN (virtual private network) providing both storage and compute on demand.
A research equipment network (REN) is used to manage pass-through and routing between laboratory instruments and data storage. Network configuration via the REN ensures that instrument-control computers are protected and isolated as needed for secure operations. Higher speed network backbones and a Science DMZ [11] are currently in a planning phase to provide higher throughput and secured zones for operating with LIMS architectures.
Current infrastructure resources include data collection nodes (DCN), NAS, central data storage (CDS), AWS, and local research data storage, as well as enki, AWS, SciServer, and HPC computational resources. Meanwhile, efforts are ongoing to integrate the CIS Controls into the infrastructure and expand capabilities in support of LIMS. Security for government research networks is a top priority for NIST, and all LIMS deployments must adhere to cybersecurity controls and processes for monitoring and managing access. This critical infrastructure element requires coordination between LIMS developers and IT security offices to ensure systems are protected and compliant with the CIS Controls, including monitoring, security patching, and notification of problems for mitigation. In developing a plan to implement LIMS, the resources needed to complete security assessment and authorization should be taken into consideration.
Community standards and best practices
Several standards and best practices in the research data community are key to building an effective LIMS workspace. While we list a few concepts core to LIMS and examples of use, these are only a small subset in the growing field of data science. Standards and best practices will continue to evolve, requiring resources for maintenance, expansion, community engagement, and user training, though one goal of a successful LIMS implementation is to minimize these burdens for end users. Examples of data standards and practices used with LIMS are found in Table 1.
References
Notes
This document falls in the U.S. public domain and is republished courtesy of the National Institute of Standards and Technology. This presentation is faithful to the original, with only a few minor changes to presentation, spelling, and grammar.