Difference between revisions of "Scientific data management system"

From LIMSWiki
m (Cleaned up citations)
(Updated for 2024)
 
(3 intermediate revisions by the same user not shown)
Line 1: Line 1:
[[File:NIST Testing standard interfaces.jpg|right|thumb|NIST tests standard interfaces for its lab equipment. SDMSs allow labs to integrate equipment data with other types of data.]]A '''scientific data management system''' (SDMS) is software that acts as a document management system (DMS), capturing, cataloging, and archiving data generated by [[laboratory]] instruments ([[HPLC]], [[mass spectrometry]]) and applications ([[LIMS]], analytical applications, [[electronic laboratory notebook]]s) in a compliant manner. The SDMS also acts as a gatekeeper, serving platform-independent data to informatics applications and/or other consumers.
[[File:NIST Testing standard interfaces.jpg|right|thumb|360px|NIST tests standard interfaces for its lab instruments. SDMSs allow labs to integrate raw and processed instrument data with other types of data, unstructured and structured.]]A '''scientific data management system''' ('''SDMS''') (occasionally referenced to as a '''laboratory data management system''' ['''LDMS''']<ref name="KranjcIntro21">{{Citation |last=Kranjc |first=Tilen |date=2021-08-16 |editor-last=Zupancic |editor-first=Klemen |editor2-last=Pavlek |editor2-first=Tea |editor3-last=Erjavec |editor3-first=Jana |title=Introduction to Laboratory Software Solutions and Differences Between Them |url=https://onlinelibrary.wiley.com/doi/10.1002/9783527825042.ch3 |work=Digital Transformation of the Laboratory |language=en |edition=1 |publisher=Wiley |pages=75–84 |doi=10.1002/9783527825042.ch3 |isbn=978-3-527-34719-3}}</ref><ref name="AvunjianLab23">{{cite web |url=https://www.ligolab.com/post/laboratory-software-systems-what-you-need-to-know-to-make-an-informed-decision |title=Laboratory Software Systems: What You Need to Know to Make an Informed Decision |author=Avunjian, S. 
|work=LigoLab Blog |publisher=LigoLab Information Systems |date=17 November 2023 |accessdate=22 March 2024}}</ref>) is software that acts similarly to a document management system (DMS), capturing, cataloging, and archiving data generated by [[laboratory]] instruments (e.g., [[high-performance liquid chromatography]] and [[mass spectrometry]] instruments) and applications (e.g., [[laboratory information management system]]s, [[electronic laboratory notebook]]s, and other analytical applications) in a compliant, often pre-defined manner best suitable for its intended use, whether it be structured, unstructured, or semi-structured data.<ref name="HaywardExperts17">{{cite web |url=https://www.laboratoryequipment.com/article/2017/05/experts-explain-rise-laboratory-data-lakes |archiveurl=https://web.archive.org/web/20170516235859/http://www.laboratoryequipment.com/article/2017/05/experts-explain-rise-laboratory-data-lakes |title=Experts Explain: The Rise of Laboratory Data Lakes |author=Hayward, S. |work=Laboratory Equipment |publisher=Advantage Business Media |date=15 May 2017 |archivedate=16 May 2017 |accessdate=22 March 2024}}</ref><ref name="ASTME1578">{{cite web |url=https://www.astm.org/e1578-18.html |title=ASTM E1578-18 Standard Guide for Laboratory Informatics |publisher=ASTM International |date=23 August 2019 |accessdate=22 March 2024}}</ref> The SDMS can also act as a gatekeeper, serving platform-independent data to informatics applications and other stakeholders.


As with many other [[laboratory informatics]] tools, the lines between a [[LIMS]], [[ELN]], and an SDMS are at times blurred. However, there are some essential qualities that an SDMS owns that distinguishes it from other informatics systems:
==Purpose and technology==
An SDMS is used to improve data handling and management issues in a number of scientific disciplines. As the four Vs of modern big data—volume, variety, veracity, and velocity—increase time spent on data acquisition and management, taking time away from other aspects of scientific research and complicating aspects of experimental reproducibility, solutions like an SDMS can help better manage the total lifecycle of data.<ref name=StansberryDataFed19">{{Cite journal |last=Stansberry |first=Dale |last2=Somnath |first2=Suhas |last3=Breet |first3=Jessica |last4=Shutt |first4=Gregory |last5=Shankar |first5=Mallikarjun |date=2019-12 |title=DataFed: Towards Reproducible Research via Federated Data Management |url=https://ieeexplore.ieee.org/document/9071425/ |journal=2019 International Conference on Computational Science and Computational Intelligence (CSCI) |publisher=IEEE |place=Las Vegas, NV, USA |pages=1312–1317 |doi=10.1109/CSCI49370.2019.00245 |isbn=978-1-7281-5584-5}}</ref> This is accomplished through a variety of tools, including data normalization and integration, data sharing and management, metadata capture and management, data object and record management, and robust search tools.<ref name=StansberryDataFed19" />


1. While a LIMS has traditionally been built to handle structured, mostly homogeneous data, a SDMS (and systems like it) is built to handle unstructured, mostly heterogeneous data.<ref name="ElliottConsider03">{{cite web |url=https://www.scientificcomputing.com/article/2003/10/considerations-management-laboratory-data |title=Considerations for Management of Laboratory Data |author=Elliott, M.H. |work=Scientific Computing |publisher=Advantage Business Media |date=31 October 2003 |accessdate=29 September 2017}}</ref>
As with many other [[laboratory informatics]] tools, the lines between an SDMS, LIMS, ELN, and other systems are at times blurred, as functionality from these systems makes their way into each other.<ref name="KranjcIntro21" /><ref name="ASTME1578" /> However, there are some essential qualities that an SDMS owns that distinguishes it from other informatics systems:


2. A SDMS typically acts as a seamless "wrapper" for other data systems like LIMS and ELN in the laboratory, though sometimes the SDMS software is readily apparent.
1. While a LIMS has traditionally been built to handle structured, mostly homogeneous data, an SDMS (and systems like it) is built to handle unstructured, mostly heterogeneous data<ref name="ElliottConsider03">{{cite web |url=https://www.scientificcomputing.com/article/2003/10/considerations-management-laboratory-data |archiveurl=https://web.archive.org/web/20170426150419/http://www.scientificcomputing.com/article/2003/10/considerations-management-laboratory-data |title=Considerations for Management of Laboratory Data |author=Elliott, M.H. |work=Scientific Computing |publisher=Advantage Business Media |date=31 October 2003 |archivedate=26 April 2017 |accessdate=22 March 2024}}</ref>, though many can handle structured, unstructured, and semi-structured data.


3. A SDMS is designed primarily for data consolidation, knowledge management, and knowledge asset realization.<ref name="WoodComp07">{{cite web |url=https://www.it.uu.se/edu/course/homepage/lims/vt12/ComprehensiveLaboratoryInformatics.pdf |archiveurl=https://web.archive.org/web/20170825181932/https://www.it.uu.se/edu/course/homepage/lims/vt12/ComprehensiveLaboratoryInformatics.pdf |format=PDF |title=Comprehensive Laboratory Informatics: A Multilayer Approach |author=Wood, S. |work=American Laboratory |page=1 |date=September 2007 |archivedate=25 August 2017}}</ref>
2. An SDMS typically acts as a seamless "wrapper" for other data systems like LIMS and ELN in the laboratory, though sometimes the SDMS software is readily apparent.


An SDMS can be seen as one potential solution for handling unstructured data, which can make up nearly 75 percent of a research and development unit's data.<ref name="SciComp1">{{cite web |url=https://www.scientificcomputing.com/article/2006/12/tomorrow%E2%80%99s-successful-research-organizations-face-critical-challenge |author=Deutsch, S. |title=Tomorrow’s Successful Research Organizations Face a Critical Challenge |work=Scientific Computing |publisher=Advantage Business Media |date=31 December 2006 |accessdate=29 September 2017}}</ref> This includes PDF files, images, instrument data, spreadsheets, and other forms of data rendered in many environments in the laboratory. Traditional SDMSs have focused on acting as a nearly invisible blanket or wrapper that integrate [[information]] from corporate offices (standard operating procedures, safety documents, etc.) with data from lab devices and other data management tools, all to be indexed and searchable from a central database. An SDMS also must be focused on increasing research productivity without sacrificing data sharing and collaboration efforts.<ref name="SciComp1" />
3. An SDMS is designed primarily for data consolidation and reuse, knowledge integration and management, and knowledge asset discovery and realization.<ref name="ASTME1578" /><ref name="WoodComp07">{{cite web |url=https://www.it.uu.se/edu/course/homepage/lims/vt12/ComprehensiveLaboratoryInformatics.pdf |archiveurl=https://web.archive.org/web/20170825181932/https://www.it.uu.se/edu/course/homepage/lims/vt12/ComprehensiveLaboratoryInformatics.pdf |format=PDF |title=Comprehensive Laboratory Informatics: A Multilayer Approach |author=Wood, S. |work=American Laboratory |page=1 |date=September 2007 |archivedate=22 March 2024}}</ref>


Some of the things a standard SDMS may be asked to do include, but are not limited to<ref name="SDMArch">{{cite web |url=http://personal.cscs.ch/~mvalle/sdm/scientific-data-management.html |archiveurl=http://web.archive.org/web/20120306015034/http://personal.cscs.ch/~mvalle/sdm/scientific-data-management.html |author=Valle, Mario |title=Scientific Data Management |publisher=Swiss National Supercomputing Center |archivedate=06 March 2012 |accessdate=05 March 2013}}</ref><ref name="HetwardSelect09">{{cite web |url=https://scholarworks.iupui.edu/handle/1805/2000 |title=Selection of a Scientific Data Management System (SDMS) Based on User Requirements |author=Heyward, J.E. II |publisher=Indiana University-Purdue University Indianapolis |date=05 November 2009 |pages=5 |accessdate=29 September 2017}}</ref>:
An SDMS can be seen as one potential solution for handling unstructured data, which can make up nearly 75 percent of a research and development unit's data.<ref name="SciComp1">{{cite web |url=http://www.rdworldonline.com/tomorrows-successful-research-organizations-face-a-critical-challenge/ |author=Deutsch, S. |title=Tomorrow’s Successful Research Organizations Face a Critical Challenge |work=R&D World |publisher=WTWH Media LLC |date=31 December 2006 |accessdate=22 March 2024}}</ref> This includes PDF files, images, instrument data, spreadsheets, and other forms of data rendered in many environments in the laboratory. Traditional SDMSs have focused on acting as a nearly invisible blanket or wrapper that integrate [[information]] from corporate offices (standard operating procedures, safety documents, etc.) with data from lab devices and other data management tools, all to be indexed and searchable from a central database. An SDMS also must be focused on increasing research productivity without sacrificing data sharing and collaboration efforts.<ref name="SciComp1" />


* retrieve worklists from LIMS and convert them to sequence files
Some of the things a standard SDMS may be asked to do include, but are not limited to<ref name="ASTME1578" /><ref name=StansberryDataFed19" /><ref name="SDMArch">{{cite web |url=http://personal.cscs.ch/~mvalle/sdm/scientific-data-management.html |archiveurl=http://web.archive.org/web/20120306015034/http://personal.cscs.ch/~mvalle/sdm/scientific-data-management.html |author=Valle, Mario |title=Scientific Data Management |publisher=Swiss National Supercomputing Center |archivedate=06 March 2012 |accessdate=22 March 2024}}</ref><ref name="HetwardSelect09">{{cite web |title=Selection of a Scientific Data Management System (SDMS) Based on User Requirements |author=Heyward, J.E. II |publisher=Indiana University-Purdue University Indianapolis |date=05 November 2009 |pages=5 |doi=10.7912/C2/812 |accessdate=22 March 2024}}</ref>:
* interact real-time with simple and complex laboratory instruments
* analyze and create reports on laboratory instrument functions
* perform complex calculations and comparisons of two different sample groups
* monitor environmental conditions and react when base operating parameters are out of range
* act as an operational database that allows selective importation/exportation of ELN data
* manage workflows based on data imported into the SDMS
* validate other computer systems and software in the laboratory


==SDMS vendors==
*store and archive raw data files;
*interact real-time with simple and complex laboratory instruments;
*retrieve worklists from LIMS and convert them to sequence files;
*require review and approval of actions, with electronic signatures;
*capture provenance information of data;
*allow for annotation of records and collections to inform or warn other users of relevant changes or errors;
*analyze and create reports on laboratory instrument functions;
*perform complex calculations and comparisons of two different sample groups;
*monitor environmental conditions and react when base operating parameters are out of range;
*act as an operational database that allows selective importation/exportation of ELN data;
*manage workflows based on data imported into the SDMS;
*validate other computer systems and software in the laboratory; and
*identify and retrieve data and metadata useful for training [[artificial intelligence]] and [[machine learning]] agents.


See the [[SDMS vendor]] page for a list of SDMS vendors past and present.
==Further reading==
 
*{{Cite journal |last=Stansberry |first=Dale |last2=Somnath |first2=Suhas |last3=Breet |first3=Jessica |last4=Shutt |first4=Gregory |last5=Shankar |first5=Mallikarjun |date=2019-12 |title=DataFed: Towards Reproducible Research via Federated Data Management |url=https://ieeexplore.ieee.org/document/9071425/ |journal=2019 International Conference on Computational Science and Computational Intelligence (CSCI) |publisher=IEEE |place=Las Vegas, NV, USA |pages=1312–1317 |doi=10.1109/CSCI49370.2019.00245 |isbn=978-1-7281-5584-5}}
== References ==
==References==
<references />
{{Reflist|colwidth=30em}}


<!---Place all category tags here-->
<!---Place all category tags here-->
[[Category:Laboratory informatics]]
[[Category:Laboratory informatics]]
[[Category:Software systems]]
[[Category:Software systems]]

Latest revision as of 16:14, 22 March 2024

NIST tests standard interfaces for its lab instruments. SDMSs allow labs to integrate raw and processed instrument data with other types of data, unstructured and structured.

A scientific data management system (SDMS) (occasionally referred to as a laboratory data management system [LDMS][1][2]) is software that acts similarly to a document management system (DMS), capturing, cataloging, and archiving data generated by laboratory instruments (e.g., high-performance liquid chromatography and mass spectrometry instruments) and applications (e.g., laboratory information management systems, electronic laboratory notebooks, and other analytical applications) in a compliant, often pre-defined manner best suited to its intended use, whether the data is structured, unstructured, or semi-structured.[3][4] The SDMS can also act as a gatekeeper, serving platform-independent data to informatics applications and other stakeholders.

Purpose and technology

An SDMS is used to address data handling and management issues in a number of scientific disciplines. The four Vs of modern big data (volume, variety, veracity, and velocity) increase the time spent on data acquisition and management, which takes time away from other aspects of scientific research and complicates experimental reproducibility; solutions like an SDMS can help better manage the total lifecycle of data.[5] This is accomplished through a variety of tools, including data normalization and integration, data sharing and management, metadata capture and management, data object and record management, and robust search tools.[5]
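
The metadata capture, record management, and search tools mentioned above can be illustrated with a minimal sketch. It is not taken from any SDMS product: the `Record` and `Catalog` classes, the metadata keys, and the file names are all hypothetical, and a real system would rely on a database and a managed file store rather than an in-memory list.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Record:
    """One cataloged data object (hypothetical structure, for illustration only)."""
    name: str
    checksum: str
    metadata: dict = field(default_factory=dict)


class Catalog:
    """Toy in-memory catalog standing in for an SDMS repository."""

    def __init__(self):
        self._records = []

    def capture(self, name, payload, **metadata):
        # Store a SHA-256 checksum so the raw data can later be verified as unchanged.
        record = Record(name, hashlib.sha256(payload).hexdigest(), dict(metadata))
        self._records.append(record)
        return record

    def search(self, **criteria):
        # Return records whose metadata matches every given key/value pair.
        return [
            r for r in self._records
            if all(r.metadata.get(k) == v for k, v in criteria.items())
        ]


catalog = Catalog()
catalog.capture("run_001.raw", b"\x00\x01\x02", instrument="HPLC", analyst="jdoe")
catalog.capture("run_002.csv", b"a,b\n1,2\n", instrument="MS", analyst="jdoe")
hplc_runs = catalog.search(instrument="HPLC")
```

Keeping the checksum next to the metadata is one simple way to tie a searchable record to the exact bytes that were archived; other designs are equally plausible.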

As with many other laboratory informatics tools, the lines between an SDMS, LIMS, ELN, and other systems are at times blurred, as functionality from each of these systems makes its way into the others.[1][4] However, an SDMS has some essential qualities that distinguish it from other informatics systems:

1. While a LIMS has traditionally been built to handle structured, mostly homogeneous data, an SDMS (and systems like it) is built to handle unstructured, mostly heterogeneous data[6], though many can handle structured, unstructured, and semi-structured data.

2. An SDMS typically acts as a seamless "wrapper" for other data systems like LIMS and ELN in the laboratory, though sometimes the SDMS software is readily apparent.

3. An SDMS is designed primarily for data consolidation and reuse, knowledge integration and management, and knowledge asset discovery and realization.[4][7]

An SDMS can be seen as one potential solution for handling unstructured data, which can make up nearly 75 percent of a research and development unit's data.[8] This includes PDF files, images, instrument data, spreadsheets, and other forms of data rendered in many environments in the laboratory. Traditional SDMSs have focused on acting as a nearly invisible blanket or wrapper that integrates information from corporate offices (standard operating procedures, safety documents, etc.) with data from lab devices and other data management tools, all to be indexed and searchable from a central database. An SDMS must also focus on increasing research productivity without sacrificing data sharing and collaboration efforts.[8]

Some of the things a standard SDMS may be asked to do include, but are not limited to[4][5][9][10]:

  • store and archive raw data files;
  • interact in real time with simple and complex laboratory instruments;
  • retrieve worklists from LIMS and convert them to sequence files;
  • require review and approval of actions, with electronic signatures;
  • capture provenance information of data;
  • allow for annotation of records and collections to inform or warn other users of relevant changes or errors;
  • analyze and create reports on laboratory instrument functions;
  • perform complex calculations and comparisons of two different sample groups;
  • monitor environmental conditions and react when base operating parameters are out of range;
  • act as an operational database that allows selective importation/exportation of ELN data;
  • manage workflows based on data imported into the SDMS;
  • validate other computer systems and software in the laboratory; and
  • identify and retrieve data and metadata useful for training artificial intelligence and machine learning agents.
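
As one illustration of the list above, the item on monitoring environmental conditions could be sketched as follows. This is a hedged example rather than a description of how any particular SDMS works: the function name `out_of_range`, the parameter names, and the limit values are hypothetical, and the "reaction" is reduced to returning a list of alerts.

```python
def out_of_range(readings, limits):
    """Return (parameter, value) pairs that fall outside their (low, high) limits."""
    alerts = []
    for parameter, (low, high) in limits.items():
        value = readings.get(parameter)
        if value is None:
            # No reading was supplied for this parameter, so there is nothing to check.
            continue
        if not low <= value <= high:
            alerts.append((parameter, value))
    return alerts


# Hypothetical limits and readings; real values depend on the laboratory.
limits = {"temperature_c": (18.0, 25.0), "humidity_pct": (30.0, 60.0)}
readings = {"temperature_c": 27.5, "humidity_pct": 45.0}
alerts = out_of_range(readings, limits)
```

In practice the returned alerts would feed whatever notification or workflow mechanism the laboratory uses.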

Further reading

References

  1. Kranjc, Tilen (16 August 2021), Zupancic, Klemen; Pavlek, Tea; Erjavec, Jana, eds., "Introduction to Laboratory Software Solutions and Differences Between Them" (in en), Digital Transformation of the Laboratory (Wiley): 75–84, doi:10.1002/9783527825042.ch3, ISBN 978-3-527-34719-3, https://onlinelibrary.wiley.com/doi/10.1002/9783527825042.ch3
  2. Avunjian, S. (17 November 2023). "Laboratory Software Systems: What You Need to Know to Make an Informed Decision". LigoLab Blog. LigoLab Information Systems. https://www.ligolab.com/post/laboratory-software-systems-what-you-need-to-know-to-make-an-informed-decision. Retrieved 22 March 2024. 
  3. Hayward, S. (15 May 2017). "Experts Explain: The Rise of Laboratory Data Lakes". Laboratory Equipment. Advantage Business Media. Archived from the original on 16 May 2017. https://web.archive.org/web/20170516235859/http://www.laboratoryequipment.com/article/2017/05/experts-explain-rise-laboratory-data-lakes. Retrieved 22 March 2024. 
  4. "ASTM E1578-18 Standard Guide for Laboratory Informatics". ASTM International. 23 August 2019. https://www.astm.org/e1578-18.html. Retrieved 22 March 2024.
  5. Stansberry, Dale; Somnath, Suhas; Breet, Jessica; Shutt, Gregory; Shankar, Mallikarjun (December 2019). "DataFed: Towards Reproducible Research via Federated Data Management". 2019 International Conference on Computational Science and Computational Intelligence (CSCI) (Las Vegas, NV, USA: IEEE): 1312–1317. doi:10.1109/CSCI49370.2019.00245. ISBN 978-1-7281-5584-5. https://ieeexplore.ieee.org/document/9071425/.
  6. Elliott, M.H. (31 October 2003). "Considerations for Management of Laboratory Data". Scientific Computing. Advantage Business Media. Archived from the original on 26 April 2017. https://web.archive.org/web/20170426150419/http://www.scientificcomputing.com/article/2003/10/considerations-management-laboratory-data. Retrieved 22 March 2024. 
  7. Wood, S. (September 2007). "Comprehensive Laboratory Informatics: A Multilayer Approach" (PDF). American Laboratory. p. 1. Archived from the original on 25 August 2017. https://web.archive.org/web/20170825181932/https://www.it.uu.se/edu/course/homepage/lims/vt12/ComprehensiveLaboratoryInformatics.pdf.
  8. Deutsch, S. (31 December 2006). "Tomorrow’s Successful Research Organizations Face a Critical Challenge". R&D World. WTWH Media LLC. http://www.rdworldonline.com/tomorrows-successful-research-organizations-face-a-critical-challenge/. Retrieved 22 March 2024.
  9. Valle, Mario. "Scientific Data Management". Swiss National Supercomputing Center. Archived from the original on 6 March 2012. http://web.archive.org/web/20120306015034/http://personal.cscs.ch/~mvalle/sdm/scientific-data-management.html. Retrieved 22 March 2024.
  10. Heyward, J.E. II (5 November 2009). "Selection of a Scientific Data Management System (SDMS) Based on User Requirements". Indiana University-Purdue University Indianapolis. p. 5. doi:10.7912/C2/812.