Scientific data management system

From LIMSWiki
Revision as of 16:14, 22 March 2024 by Shawndouglas (talk | contribs) (Updated for 2024)
NIST tests standard interfaces for its lab instruments. SDMSs allow labs to integrate raw and processed instrument data with other types of data, unstructured and structured.

A scientific data management system (SDMS), occasionally referred to as a laboratory data management system (LDMS)[1][2], is software that acts similarly to a document management system (DMS). It captures, catalogs, and archives data generated by laboratory instruments (e.g., high-performance liquid chromatography and mass spectrometry instruments) and applications (e.g., laboratory information management systems, electronic laboratory notebooks, and other analytical applications) in a compliant, often predefined manner best suited to its intended use, whether the data is structured, unstructured, or semi-structured.[3][4] The SDMS can also act as a gatekeeper, serving platform-independent data to informatics applications and other stakeholders.

Purpose and technology

An SDMS is used to address data handling and management problems in a number of scientific disciplines. Because the four Vs of modern big data (volume, variety, veracity, and velocity) increase the time spent on data acquisition and management, taking time away from other aspects of scientific research and complicating experimental reproducibility, solutions like an SDMS can help better manage the total lifecycle of data.[5] This is accomplished through a variety of tools, including data normalization and integration, data sharing and management, metadata capture and management, data object and record management, and robust search tools.[5]
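As a rough illustration of two of the tools named above (data normalization and metadata capture with search), the following minimal Python sketch maps vendor-specific metadata keys onto a common schema and then searches the captured records. The key names, the `KEY_ALIASES` mapping, and the `Catalog` class are hypothetical and invented for this example; a real SDMS would typically persist records in a database and apply far richer schemas.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from vendor-specific metadata keys to common names.
KEY_ALIASES = {
    "SampleID": "sample_id",
    "sample": "sample_id",
    "Operator": "operator",
    "user": "operator",
    "Instrument": "instrument",
    "device": "instrument",
}


@dataclass
class Record:
    """One cataloged data object with normalized metadata."""
    name: str
    metadata: dict = field(default_factory=dict)


def normalize(raw: dict) -> dict:
    """Rename known keys to the common schema; lowercase unknown keys."""
    return {KEY_ALIASES.get(k, k.lower()): str(v).strip() for k, v in raw.items()}


class Catalog:
    """In-memory stand-in for an SDMS metadata index (assumption)."""

    def __init__(self) -> None:
        self.records: list[Record] = []

    def capture(self, name: str, raw_metadata: dict) -> Record:
        """Normalize the metadata and add the record to the catalog."""
        record = Record(name, normalize(raw_metadata))
        self.records.append(record)
        return record

    def search(self, **criteria: str) -> list[Record]:
        """Return records whose metadata matches every key/value pair."""
        return [
            r for r in self.records
            if all(r.metadata.get(k) == v for k, v in criteria.items())
        ]


catalog = Catalog()
catalog.capture("run_001.raw", {"SampleID": "S-17", "Operator": "jdoe", "Instrument": "HPLC-2"})
catalog.capture("run_002.csv", {"sample": "S-17 ", "user": "asmith", "device": "MS-1"})
catalog.capture("run_003.raw", {"SampleID": "S-18", "Operator": "jdoe", "Instrument": "HPLC-2"})

hits = catalog.search(sample_id="S-17")
print([r.name for r in hits])
```

Because both hypothetical instruments end up with the same `sample_id` key, a single search finds the runs for one sample regardless of which system produced them.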

As with many other laboratory informatics tools, the lines between an SDMS, LIMS, ELN, and other systems are at times blurred, as functionality from one system makes its way into another.[1][4] However, some essential qualities distinguish an SDMS from other informatics systems:

1. While a LIMS has traditionally been built to handle structured, mostly homogeneous data, an SDMS (and systems like it) is built to handle unstructured, mostly heterogeneous data[6], though many can handle structured, unstructured, and semi-structured data.

2. An SDMS typically acts as a seamless "wrapper" for other data systems like LIMS and ELN in the laboratory, though sometimes the SDMS software is readily apparent.

3. An SDMS is designed primarily for data consolidation and reuse, knowledge integration and management, and knowledge asset discovery and realization.[4][7]

An SDMS can be seen as one potential solution for handling unstructured data, which can make up nearly 75 percent of a research and development unit's data.[8] This includes PDF files, images, instrument data, spreadsheets, and other forms of data rendered in many environments in the laboratory. Traditional SDMSs have focused on acting as a nearly invisible blanket or wrapper that integrates information from corporate offices (standard operating procedures, safety documents, etc.) with data from lab devices and other data management tools, all to be indexed and searchable from a central database. An SDMS must also focus on increasing research productivity without sacrificing data sharing and collaboration efforts.[8]
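The idea of indexing heterogeneous files into a central, searchable database can be sketched with Python's standard `sqlite3` and `hashlib` modules. This is only an illustrative sketch: the table layout, the `CATEGORIES` mapping, and the function names are assumptions made for this example and do not describe any particular SDMS product.

```python
import hashlib
import sqlite3
from pathlib import PurePath

# Hypothetical mapping from file extension to a coarse data category.
CATEGORIES = {
    ".pdf": "document",
    ".png": "image",
    ".jpg": "image",
    ".xlsx": "spreadsheet",
    ".csv": "spreadsheet",
    ".raw": "instrument",
}


def categorize(filename: str) -> str:
    """Classify a file by its extension; unknown extensions become 'other'."""
    return CATEGORIES.get(PurePath(filename).suffix.lower(), "other")


def open_index() -> sqlite3.Connection:
    """Create an in-memory index; a real system would use persistent storage."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files ("
        "sha256 TEXT PRIMARY KEY, name TEXT, category TEXT, source TEXT)"
    )
    return conn


def index_file(conn: sqlite3.Connection, name: str, content: bytes, source: str) -> str:
    """Index one file by content hash; identical content is stored only once."""
    digest = hashlib.sha256(content).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO files VALUES (?, ?, ?, ?)",
        (digest, name, categorize(name), source),
    )
    return digest


def find_by_category(conn: sqlite3.Connection, category: str) -> list[str]:
    """Return the names of indexed files in one category, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM files WHERE category = ? ORDER BY name", (category,)
    )
    return [row[0] for row in rows]


conn = open_index()
index_file(conn, "SOP-004.pdf", b"%PDF- example sop", "corporate")
index_file(conn, "gel_scan.PNG", b"example image bytes", "lab")
index_file(conn, "run_001.raw", b"\x00\x01\x02", "HPLC-2")
print(find_by_category(conn, "image"))
```

Keying the table on a content hash is one possible design choice: it lets the index detect duplicate files and later verify that an archived file has not changed.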

Some of the things a standard SDMS may be asked to do include, but are not limited to[4][5][9][10]:

  • store and archive raw data files;
  • interact in real time with simple and complex laboratory instruments;
  • retrieve worklists from LIMS and convert them to sequence files;
  • require review and approval of actions, with electronic signatures;
  • capture provenance information of data;
  • allow for annotation of records and collections to inform or warn other users of relevant changes or errors;
  • analyze and create reports on laboratory instrument functions;
  • perform complex calculations and comparisons of two different sample groups;
  • monitor environmental conditions and react when base operating parameters are out of range;
  • act as an operational database that allows selective importation/exportation of ELN data;
  • manage workflows based on data imported into the SDMS;
  • validate other computer systems and software in the laboratory; and
  • identify and retrieve data and metadata useful for training artificial intelligence and machine learning agents.
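
Two of the tasks listed above, converting a LIMS worklist into a sequence file and reacting to out-of-range environmental conditions, can be sketched in a few lines of Python. Sequence-file formats are vendor-specific, so the CSV layout here is made up for illustration, as are the function names, field names, and limits.

```python
import csv
import io


def worklist_to_sequence(worklist: list[dict], method: str) -> str:
    """Render a worklist as CSV text in a made-up sequence-file layout."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["position", "sample_id", "method"])
    # Assign injection positions in worklist order, starting at 1.
    for position, item in enumerate(worklist, start=1):
        writer.writerow([position, item["sample_id"], method])
    return buffer.getvalue()


def out_of_range(readings: dict, limits: dict) -> list[str]:
    """Return the names of readings that fall outside their (low, high) limits."""
    alerts = []
    for name, value in readings.items():
        low, high = limits[name]
        if not low <= value <= high:
            alerts.append(name)
    return alerts


# Hypothetical worklist as it might be retrieved from a LIMS.
sequence = worklist_to_sequence(
    [{"sample_id": "S-17"}, {"sample_id": "S-18"}],
    method="HPLC_GRADIENT_A",
)
print(sequence)

# Hypothetical environmental readings and base operating parameters.
alerts = out_of_range(
    {"temperature_c": 27.5, "humidity_pct": 40.0},
    {"temperature_c": (18.0, 25.0), "humidity_pct": (30.0, 60.0)},
)
print(alerts)
```

In practice, the reaction to an alert (notification, pausing a run, flagging affected records) would depend on the laboratory's own procedures.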

References

  1. Kranjc, Tilen (16 August 2021), Zupancic, Klemen; Pavlek, Tea; Erjavec, Jana, eds., "Introduction to Laboratory Software Solutions and Differences Between Them" (in en), Digital Transformation of the Laboratory (Wiley): 75–84, doi:10.1002/9783527825042.ch3, ISBN 978-3-527-34719-3, https://onlinelibrary.wiley.com/doi/10.1002/9783527825042.ch3 
  2. Avunjian, S. (17 November 2023). "Laboratory Software Systems: What You Need to Know to Make an Informed Decision". LigoLab Blog. LigoLab Information Systems. https://www.ligolab.com/post/laboratory-software-systems-what-you-need-to-know-to-make-an-informed-decision. Retrieved 22 March 2024. 
  3. Hayward, S. (15 May 2017). "Experts Explain: The Rise of Laboratory Data Lakes". Laboratory Equipment. Advantage Business Media. Archived from the original on 16 May 2017. https://web.archive.org/web/20170516235859/http://www.laboratoryequipment.com/article/2017/05/experts-explain-rise-laboratory-data-lakes. Retrieved 22 March 2024. 
  4. "ASTM E1578-18 Standard Guide for Laboratory Informatics". ASTM International. 23 August 2019. https://www.astm.org/e1578-18.html. Retrieved 22 March 2024. 
  5. Stansberry, Dale; Somnath, Suhas; Breet, Jessica; Shutt, Gregory; Shankar, Mallikarjun (1 December 2019). "DataFed: Towards Reproducible Research via Federated Data Management". 2019 International Conference on Computational Science and Computational Intelligence (CSCI) (Las Vegas, NV, USA: IEEE): 1312–1317. doi:10.1109/CSCI49370.2019.00245. ISBN 978-1-7281-5584-5. https://ieeexplore.ieee.org/document/9071425/. 
  6. Elliott, M.H. (31 October 2003). "Considerations for Management of Laboratory Data". Scientific Computing. Advantage Business Media. Archived from the original on 26 April 2017. https://web.archive.org/web/20170426150419/http://www.scientificcomputing.com/article/2003/10/considerations-management-laboratory-data. Retrieved 22 March 2024. 
  7. Wood, S. (September 2007). "Comprehensive Laboratory Informatics: A Multilayer Approach" (PDF). American Laboratory. p. 1. Archived from the original on 22 March 2024. https://web.archive.org/web/20170825181932/https://www.it.uu.se/edu/course/homepage/lims/vt12/ComprehensiveLaboratoryInformatics.pdf. 
  8. Deutsch, S. (31 December 2006). "Tomorrow’s Successful Research Organizations Face a Critical Challenge". R&D World. WTWH Media LLC. http://www.rdworldonline.com/tomorrows-successful-research-organizations-face-a-critical-challenge/. Retrieved 22 March 2024. 
  9. Valle, Mario. "Scientific Data Management". Swiss National Supercomputing Center. Archived from the original on 06 March 2012. http://web.archive.org/web/20120306015034/http://personal.cscs.ch/~mvalle/sdm/scientific-data-management.html. Retrieved 22 March 2024. 
  10. Heyward, J.E. II (5 November 2009). "Selection of a Scientific Data Management System (SDMS) Based on User Requirements". Indiana University-Purdue University Indianapolis. p. 5. doi:10.7912/C2/812.