Difference between revisions of "Journal:MASTR-MS: A web-based collaborative laboratory information management system (LIMS) for metabolomics"

From LIMSWiki
|title_full  = MASTR-MS: A web-based collaborative laboratory information management system (LIMS) for metabolomics
|journal      = ''Metabolomics''
|authors      = Hunter, A.; Dayalan, S.; De Souza, D.; Power, B.; Lorrimar, R.; Szabo, T.; Nguyen, T.; O'Callaghan, S.; Hack, J.;<br />Pyke, J.; Nahid, A.; Barrero, R.; Roessner, U.; Likic, V.; Tull, D.; Bacic, A.; McConville, M.; Bellgard, M.
|affiliations = Murdoch University, The University of Melbourne, The Australian Wine Research Institute
|contact      = Email: malcolmm at unimelb dot edu dot au -or- mbellgard at ccg dot murdoch dot edu dot au
|download    = [https://link.springer.com/content/pdf/10.1007%2Fs11306-016-1142-2.pdf https://link.springer.com/content/pdf/10.1007%2Fs11306-016-1142-2.pdf] (PDF)
}}
}}
==Abstract==
===Background===


===Results===
Here we present [[MASTR-MS]], a downloadable and installable LIMS solution that can be deployed either within a single laboratory or used to link workflows across a multisite network. It comprises the Node Management System that can be used to link and manage projects across one or multiple collaborating laboratories; the User Management System which defines different user groups and privileges of users; the Quote Management System where client quotes are managed; the Project Management System in which metadata is stored and all aspects of project management, including experimental setup, sample tracking and instrument analysis, are defined; and the Data Management System that allows the automatic capture and storage of raw and processed data from the analytical instruments to the LIMS.


===Conclusion===


==Introduction==
Metabolomic approaches aim to detect and quantitate levels of all small molecules in a biological system and, together with other "omic" approaches, can be used to generate a systems-wide understanding of biological processes. Metabolomic approaches typically involve the use of advanced mass spectrometry and nuclear magnetic resonance (NMR) spectrometry platforms to maximize coverage of the chemically diverse metabolites that make up biological systems. In many cases, these analytical platforms are located in institutional and/or national core facilities that offer a range of metabolomics capabilities to researchers.<ref name="MetabolomicsAust">{{cite web |url=http://www.metabolomics.net.au/ |title=Metabolomics Australia |publisher=Metabolomics Australia |accessdate=05 December 2014}}</ref><ref name="MetabolomicsInnovCent">{{cite web |url=http://www.metabolomicscentre.ca/ |title=The Metabolomics Innovation Centre |publisher=University of Alberta |accessdate=05 December 2014}}</ref><ref name="NIHMetabolomics">{{cite web |url=http://commonfund.nih.gov/metabolomics/index |title=Metabolomics |work=The Common Fund |publisher=National Institutes of Health, Office of Strategic Coordination |accessdate=05 December 2014}}</ref><ref name="MetaboHUB">{{cite web |url=http://www.metabohub.fr/ |title=MetaboHUB |publisher=MetaboHUB Centre INRA Bordeaux - Aquitaine |accessdate=05 December 2014}}</ref><ref name="ECGuidelines16">{{cite web |url=http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf |format=PDF |title=Guidelines on FAIR Data Management in Horizon 2020 - Version 3.0 |publisher=European Commission |date=26 July 2016 |accessdate=05 December 2014}}</ref> These core facilities, as well as individual research groups with sophisticated metabolomics infrastructure and capability, are faced with the challenge of tracking large numbers of samples and the associated metadata, and linking this information with the raw
datasets generated by multiple analytical platforms, as well as processed down-stream data sets. Data handling extends beyond collection and curation of raw data, to the management of metadata that defines how the raw data is generated. Major funding agencies, such as Europe’s Horizon 2020<ref name="ECGuidelines16" />, the NIH<ref name="NIHDataSharing03">{{cite web |url=https://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm |title=NIH Data Sharing Policy and Implementation Guidance |work=Grants and Funding |publisher=National Institutes of Health |date=05 March 2003 |accessdate=05 December 2014}}</ref>, The Wellcome Trust<ref name="WTGuidance14">{{cite web |url=http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Guidance-for-researchers/index.htm |archiveurl=https://web-beta.archive.org/web/20141018165611/http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Guidance-for-researchers/index.htm |title=Guidance for researchers: Developing a data management and sharing plan |work=Policy and position statements |publisher=Wellcome Trust |date=2014 |archivedate=18 October 2014 |accessdate=05 December 2014}}</ref> and Australia’s NHMRC<ref name="AGNHMRC_AustralianCode07">{{cite web |url=https://www.nhmrc.gov.au/guidelines-publications/r39 |title=Australian Code for the Responsible Conduct of Research |publisher=National Health and Medical Research Council, Australia |date=2007 |accessdate=05 December 2014}}</ref> have established data management plans that researchers are expected to follow in order to capture, store and share data generated by their grants. 
Scientific journals are also increasingly requesting that experimental data and metadata associated with metabolomics experiments be made available to the scientific community<ref name="NatureDPs">{{cite web |url=https://www.nature.com/sdata/policies/data-policies |title=Data Policies |work=Nature |publisher=Macmillan Publishers Limited |accessdate=05 December 2014}}</ref><ref name="GigaScienceResearch">{{cite web |url=http://www.gigasciencejournal.com:80/authors/instructions/research |archiveurl=https://web.archive.org/web/20140515012736/http://www.gigasciencejournal.com:80/authors/instructions/research |title=Instructions for authors: Research Articles |work=GigaScience |publisher=BioMed Central Ltd |date=2014 |archivedate=15 May 2014 |accessdate=05 December 2014}}</ref>, leading to the establishment of data repositories, such as MetaboLights<ref name="HaugMetabo13">{{cite journal |title=MetaboLights--An open-access general-purpose repository for metabolomics studies and associated meta-data |journal=Nucleic Acids Research |author=Haug, K.; Salek, R.M.; Conesa, P. |volume=41 |issue=D1 |pages=D781-6 |year=2013 |doi=10.1093/nar/gks1004 |pmid=23109552 |pmc=PMC3531110}}</ref> and Metabolomics Workbench.<ref name="UCSDMetab">{{cite web |url=http://www.metabolomicsworkbench.org/ |title=Metabolomics Workbench |publisher=University of California San Diego |accessdate=05 December 2014}}</ref>
LIMS are software solutions that aim to manage the entire workflow of a laboratory. A number of LIMS have been developed or adapted from other applications for curating metabolomics experiments and data management (e.g., SetupX, [[Sesame LIMS|Sesame]]). While these LIMS have features that allow capture of project metadata, experiments and samples, data storage, and data sharing, they are limited in their capacity to accommodate instruments from different vendors and offer restricted functionality for collaboration between geographically distributed laboratories. In this paper we present MASTR-MS, the first wholly functional, open-source LIMS solution specifically designed for metabolomics laboratories.
==Materials and methods==
MASTR-MS runs as a Python<ref name="Python">{{cite web |url=https://www.python.org/ |title=Python |publisher=Python Software Foundation |accessdate=05 December 2014}}</ref> web application built on the Django<ref name="Django">{{cite web |url=https://www.djangoproject.com/ |title=Django |publisher=Django Software Foundation |accessdate=05 December 2014}}</ref> framework, utilising a [[PostgreSQL]]<ref name="PostgreSQL">{{cite web |url=https://www.postgresql.org/ |title=PostgreSQL |publisher=PostgreSQL Global Development Group |accessdate=05 December 2014}}</ref> or [[MySQL]]<ref name="MySQL">{{cite web |url=https://www.mysql.com/ |title=MySQL |publisher=Oracle Corporation |accessdate=05 December 2014}}</ref> relational database. MASTR-MS leverages the functionality of the Django framework for user management, user permissions and security. Django is a mature web framework and provides multiple security tools and mechanisms. For example, specific protection is provided against cross-site scripting (XSS), cross-site request forgery (CSRF), SQL injection and clickjacking. A security middleware is also used to enforce SSL/HTTPS for all traffic. MASTR-MS is built using open-source components and communicates using open standards. The client-side browser interface leverages JavaScript and AJAX for fluid data display and submission, giving a user experience much like a desktop application, but with the flexibility of being available from any internet-connected location on any operating system, with no client-side download or installation.
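As an illustration of the protections listed above, the following is a minimal sketch of a Django settings fragment, not the MASTR-MS configuration itself. It assumes Django 1.10 or later, where the middleware setting is named <code>MIDDLEWARE</code>; the setting and middleware names are Django's own, while the values chosen are assumptions.

```python
# Hypothetical settings fragment; the values are illustrative assumptions,
# not the configuration shipped with MASTR-MS.
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",           # HTTPS redirect
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",               # CSRF token checks
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.middleware.clickjacking.XFrameOptionsMiddleware",  # X-Frame-Options header
]

SECURE_SSL_REDIRECT = True    # redirect plain HTTP requests to HTTPS
SESSION_COOKIE_SECURE = True  # send the session cookie over HTTPS only
CSRF_COOKIE_SECURE = True     # send the CSRF cookie over HTTPS only
X_FRAME_OPTIONS = "DENY"      # forbid rendering the site inside frames
```

Protection against SQL injection and XSS needs no setting of its own: it comes from using Django's query API and its auto-escaping templates.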
The DataSync Client is a small desktop application that runs on an instrument’s acquisition computer. This software constantly communicates with the MASTR-MS server and is responsible for transferring raw data from the acquisition computer to the MASTR-MS repository (Supplemental Fig. S9A). The DataSync Client is written in the Python programming language using the wxWidgets<ref name="wxWidgets">{{cite web |url=https://www.wxwidgets.org/ |title=wxWidgets |publisher=wxWidgets Development Team |accessdate=10 November 2016}}</ref> GUI library and runs on Windows and Linux systems. Data is uploaded using the rsync protocol<ref name="rsync">{{cite web |url=https://rsync.samba.org/ |title=rsync |publisher=Wayne Davison |accessdate=05 December 2014}}</ref> and the libraries and plugins required for this are included in the installation package.
As the MASTR-MS server-side component is written in the Python 2.7 programming language, any operating system that has Python 2.7 available for running web applications with a web server can run the application. In practice, the application has only been tested on the Linux operating system and the Apache web server. For installation, operating system packages are available in RPM format for CentOS 6.5. Similarly, as the DataSync Client is also written in Python 2.7, it can run on any operating system that has Python 2.7 available. However, it is typically installed on a Windows platform with a connected analytical instrument. For this reason, the DataSync Client is distributed as a Windows executable (.exe) installer. The DataSync Client application is also self-updating by means of a user option to upgrade to a newer version if available.
==Results==
MASTR-MS is a web-based LIMS solution for metabolomics laboratories. The different modules of MASTR-MS allow users to:
* Track all metabolomics samples and associated meta-, analytical- and processed data sets. This starts from the capture of client/collaborator communication; the establishment of new projects, experimental design and sample definitions; and the automatic capture of raw data generated by the instruments.
* Develop an electronic notebook, where users record all relevant information about projects and experiments in MASTR-MS, thus allowing multiple users to work on the same project.
* Methodically manage the vast amount of data generated by the analytical instruments, by associating it with the project, experiment and sample details.
* Facilitate collaboration between geographically distributed laboratories through the sharing of projects and experiment data.
MASTR-MS is suited for use in either a large core facility or a single-/multi-laboratory environment, so both large national facilities and small individual laboratories can benefit equally from it.
MASTR-MS comprises five major modules: (1) the Node Management System, (2) the User Management System, (3) the Quote Management System, (4) the Project Management System and (5) the Data Management System. Figure 1 shows the workflow of MASTR-MS using the different functionalities and features. These functions are described in detail below. The user is initially connected to the dashboard when they first log into MASTR-MS, and the available functions are tailored to the level of access of the user. The dashboard gives an “at-a-glance” summary of recent activity on the site and items requiring attention. Depending on the user’s status/level of access, the dashboard shows pending user requests, quotes requiring attention, recently created / modified projects, and recently created / modified experiments.
[[File:Fig1 Hunter Metabolomics2017 13-2.gif|384px]]
{{clear}}
{|
| STYLE="vertical-align:top;"|
{| border="0" cellpadding="5" cellspacing="0" width="384px"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"| <blockquote>'''Figure 1.''' Overview of MASTR-MS system workflow</blockquote>
|-
|}
|}
===Node management system===
This module allows multiple laboratories to be added to a single MASTR-MS network. For example, a group of geographically dispersed laboratories can have a single deployment of MASTR-MS and share projects and experiments. Such a setup would be established by the module through the generation of different nodes. On the other hand, MASTR-MS can be used within a single laboratory environment in which this module would comprise a single node.
===User management system===
This module defines the different user groups used in MASTR-MS. Each user group has different privileges and permissions to access the different functionalities of MASTR-MS. In addition, this module allows the generation and management of users of the system. MASTR-MS has several user groups.
====Systems administrator====
This user group has access to all functionalities of MASTR-MS. There would normally be one assigned Systems Administrator who would act as the query point for all other users accessing the system, although it is possible to have more than one Systems Administrator. The Systems Administrator has a Laboratory Name assigned to their account (like all other users), allowing a nominated user, usually a member of the organization/laboratory that is hosting the project, to act as the Systems Administrator. The Systems Administrator can add new users to the system, assign user groups to any users in any laboratory, edit details of users and delete users of any laboratory.
====Administrator====
This user group has full access to all projects, experiments and experimental data, user accounts and quotes within MASTR-MS, regardless of node. This user group allows selected users to view all projects and experiments across different nodes, allowing seamless sharing and collaboration of data across nodes. Where multiple laboratories have a single MASTR-MS deployment, but prefer not to share projects and experiments, no users would be assigned the Administrator role.
====Node representative====
Members of this user group have full access to quotes for their node and are the preferred contact for quotes and projects run by this node (detailed more in the "Quote management system" section). In a multi-node setup there would typically be at least one user assigned to this group per node.
====Project leader====
This user group is able to create new projects and experiments for their node. Additionally, this group is able to assign staff to specific projects and experiments.
====Staff====
Users of this group are able to participate in the projects and experiments for their node.
====Client====
All other users of the system are clients. This group has no privileges other than viewing the progress of projects to which they have been assigned.
Any user of the system can update their own user record and change their password at any time.
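The group privileges described above can be sketched as a small lookup table. This is a simplified illustration rather than the MASTR-MS permission model: the action names are hypothetical, and confining every group other than the two administrator groups to its own node is an assumption drawn from the descriptions.

```python
from enum import Enum


class Group(Enum):
    SYSTEMS_ADMINISTRATOR = "Systems Administrator"
    ADMINISTRATOR = "Administrator"
    NODE_REPRESENTATIVE = "Node Representative"
    PROJECT_LEADER = "Project Leader"
    STAFF = "Staff"
    CLIENT = "Client"


# Hypothetical action names; the mapping paraphrases the group descriptions.
PERMISSIONS = {
    Group.NODE_REPRESENTATIVE: {"manage_quotes"},
    Group.PROJECT_LEADER: {"create_project", "create_experiment", "assign_staff"},
    Group.STAFF: {"work_on_experiment"},
    Group.CLIENT: {"view_progress"},
}

# Groups described as having access regardless of node.
CROSS_NODE_GROUPS = {Group.SYSTEMS_ADMINISTRATOR, Group.ADMINISTRATOR}


def is_allowed(group, action, user_node, target_node):
    """Return True if a member of `group` may perform `action` on `target_node`."""
    if group in CROSS_NODE_GROUPS:
        return True
    if user_node != target_node:
        return False  # assumption: other groups are confined to their own node
    return action in PERMISSIONS.get(group, set())
```

For example, <code>is_allowed(Group.PROJECT_LEADER, "create_project", "Melbourne", "Perth")</code> is false under these assumptions, while the same call for <code>Group.ADMINISTRATOR</code> is true.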
===Quote management system===
This module was designed specifically for core facilities that provide metabolomic services to client researchers. Potential clients can request a pricing quote for running samples of an experiment through the quote request system without having to sign up for an account. At a nominated stage, clients are required to register in MASTR-MS by completing a short information dialog box. This module allows collection of contact details and information about the nature of the request. Files in various formats can be attached to this module. In a multi-node facility, the user can either direct their quote to a specific node with relevant expertise or they can select "Don’t Know" to have all the Node Representatives alerted.
Quote requests made through the system by clients and collaborators are tracked and flagged until they have been attended to, so that Node Representatives can quickly see new quotes which require attention. Quotes can only be seen by members of the node to which they were sent, unless the "Don’t Know" option was selected. Node Representatives are able to forward quotes to other nodes if required. The Node Representatives can then begin a dialogue with the potential client and with their team, clarifying the task, and providing formal quotes, attached as PDFs if necessary. Each step of the communications process is time-stamped and tracked within this module. The quote requests and any resulting quotes would eventually be associated with a project and experiment through a selection option in the Experimental Design stage. All documentation relating to the project, including the client and quote issued for the project, along with the project and experimental setup, is thus kept together.
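The visibility rule for quote requests can be illustrated with a short sketch. The class and function names are hypothetical, and modelling the "Don’t Know" option as a missing node is an assumption made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuoteRequest:
    client_email: str
    node: Optional[str]     # None models the "Don't Know" option (assumption)
    attended: bool = False  # unattended requests are flagged for attention


def visible_to(quote, member_node):
    """A request is visible to its target node, or to every node if none was chosen."""
    return quote.node is None or quote.node == member_node


def needs_attention(quotes, member_node):
    """Requests that a Node Representative of `member_node` has yet to attend to."""
    return [q for q in quotes if visible_to(q, member_node) and not q.attended]
```

Forwarding a quote to another node would then amount to changing its <code>node</code> value.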
===Project management system===
This module allows the management of projects, experiments, and samples as well as the creation of analytical sample runs. As detailed above, users of different user groups are able to create projects and experiments. When a project is created by either a MASTR-MS Administrator or Project Leader, it can be linked to a specific client from the user list. This allows the client to monitor how the project is progressing. Assigning a Project Manager to the project allows that user to manage all aspects of the project, including experiment creation and further access control on an experiment-by-experiment basis (Supplemental Fig. S4). As sample metadata is linked to all experiments within MASTR-MS, sample classes and/or individual samples can be organized into groups and subsequently analyzed on an instrument.
====Experiment details====
The Experiment Status defaults to "New" when first opened, and all experiment metadata is captured in this field (Supplemental Fig. S5A). Once the experiment design has been completed, the Project Manager can change the setting to "Designed" to prevent further changes. The experiment can also be linked to a formal quote that has been previously entered in the quotes system, and if needed, can be assigned an internal job number.
====Access control/roles====
Users can be assigned to an experiment, giving them access to edit the experimental workflow and create samples and runs. Client users can also be added here, giving them access to project progress information (Supplemental Fig. S5B).
====Sample metadata====
MASTR-MS uses sample metadata in order to generate sample classes, which can then be populated with individual samples (Supplemental Fig. S5C).
====Origin/organs/parts metadata====
The first metadata category is the Origin field, which contains information on sample origin and preparation (Supplemental Fig. S5D). Different metadata fields are available depending on whether the source is Microbial, Plant, Animal, Human, Synthetic, or Other.
====Timeline/treatment metadata====
MASTR-MS also accepts time course and treatment metadata, where samples have been collected over multiple time points, or after different experimental treatments. The Origin, Timeline, and Treatment fields are then used to automatically generate sample classes.
====Sample preparation====
MASTR-MS allows a standard operating procedure (SOP) document to be uploaded and associated with an experiment. Multiple SOPs can be uploaded and additional notes recorded for each. An SOP is linked with the methods used during runs at the time of setting up a run. The SOP is linked at the experiment level, and the option of choosing methods is provided at the run level. This accommodates users who wish to run multiple methods during a run (either by resampling the same vial or from a different vial).
====Automatic sample class generation====
From the metadata entered in the Origin, Timeline, and Treatment steps, sample classes are automatically generated as permutations of the available metadata (Supplemental Fig. S7A). If abbreviations have been provided for a particular metadata category, these will be used during sample class generation. Samples can then be created in each sample class.
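Sample class generation of this kind can be sketched as a Cartesian product over the three metadata categories. The function name, the underscore-joined label format and the example values are assumptions made for illustration; the source does not specify how MASTR-MS names its classes.

```python
from itertools import product


def generate_sample_classes(origins, timelines, treatments, abbreviations=None):
    """Return one class label per combination of origin, time point and treatment."""
    abbreviations = abbreviations or {}

    def label(value):
        # Use the abbreviation when one has been provided for this value.
        return abbreviations.get(value, value)

    return [
        "_".join(label(value) for value in combination)
        for combination in product(origins, timelines, treatments)
    ]


classes = generate_sample_classes(
    ["leaf", "root"],
    ["0h", "24h"],
    ["control", "drought"],
    abbreviations={"control": "ctl"},
)
# Two origins x two time points x two treatments give eight classes,
# starting with "leaf_0h_ctl".
```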
Samples can then be viewed and collected together to form a run on a designated analytical instrument platform (Supplemental Fig. S7B). Additional sample information can be imported via CSV and exported from MASTR-MS in the same way. Samples can be randomized before putting them into a run if desired.
====Runs====
Selected samples are added to a new or existing run by clicking the "Add Selected Samples to Run" button. This will display a dialog allowing the user to add the samples either to a new run or to any previous run which is still unlocked for editing (Supplemental Fig. S8A). Runs remain unlocked as long as a worklist has not yet been generated for them. Locked runs can be edited and reused if needed using the “Run Cloning” feature, which will duplicate the run data into a new unlocked run.
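The locking and cloning behaviour can be modelled in a few lines. This is a hypothetical sketch, not the MASTR-MS data model: the <code>Run</code> class, its fields and the clone title are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Run:
    title: str
    samples: List[str] = field(default_factory=list)
    worklist: Optional[List[str]] = None  # None until a worklist is generated

    @property
    def locked(self):
        # A run becomes locked once a worklist has been generated for it.
        return self.worklist is not None

    def add_samples(self, new_samples):
        if self.locked:
            raise ValueError("run is locked; clone it to make changes")
        self.samples.extend(new_samples)

    def clone(self):
        # Copy the sample list into a new run that has no worklist yet.
        return Run(title=self.title + " (clone)", samples=list(self.samples))
```

Because the clone receives its own copy of the sample list, editing it leaves the locked original unchanged.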
====Worklist generation====
The goal of run configuration is to streamline sample analysis and generate instrument worklists in a convenient and flexible manner. After sample data has been added to a run, the order and sequencing of additional run elements (Sweeps, Solvents, etc.) can be added via the Rules Generator.
The Rules Generator provides a customizable set of steps (rules) which dictate how worklists are built. It consists of a Start Block, Sample Block, and End Block, each of which allows the insertion of non-sample components into the worklist. These include Pooled Biological QC, Sweep, Reagent Blank, Solvent Blank and Pure Standard.
The sample block, containing the experiment samples, allows ''n'' components to be inserted every ''m'' samples, in random or position order (Supplemental Fig. S8B). Once all three blocks have been designed, the Rule Generator can be enabled, disabling further editing and making the rule available for inclusion in run worklist generation. Rule Generators can be restricted to use by a single user, an entire node, or everybody on the system. Enabled Rule Generators can be cloned in order to generate a new version, which can then be extended and modified.
To generate a worklist within a run, the user selects an instrument (configured and made available by Administrators) and a Rule Generator, if needed, and clicks the "Generate Worklist" button. Once the worklist is generated, further modification of the run is not possible. The specific worklist format is customizable by site administrators to provide flexibility among various instrument models. Once the worklist is generated, it can be used with the instrument to automate the raw data collection process.
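The block structure described above can be sketched as follows. This is an illustrative reading of the rule, not the MASTR-MS implementation: the function name and parameters are hypothetical, "random or position order" is interpreted as the order of the samples, and no component is inserted after the final sample because the end block follows it.

```python
import random


def build_worklist(samples, start_block, end_block, insert, n, m,
                   randomize=False, seed=None):
    """Assemble start block, samples with `n` copies of `insert` after every
    `m` samples, and end block into a single ordered worklist."""
    if m < 1:
        raise ValueError("m must be at least 1")
    ordered = list(samples)
    if randomize:
        random.Random(seed).shuffle(ordered)  # seeded for reproducible order
    worklist = list(start_block)
    for index, sample in enumerate(ordered, start=1):
        worklist.append(sample)
        # Insert components after every m-th sample, except after the last one.
        if index % m == 0 and index < len(ordered):
            worklist.extend([insert] * n)
    worklist.extend(end_block)
    return worklist


worklist = build_worklist(
    ["S1", "S2", "S3", "S4", "S5"],
    start_block=["Solvent Blank"],
    end_block=["Pure Standard"],
    insert="Pooled Biological QC",
    n=1,
    m=2,
)
```

With these inputs the pooled QC appears after S2 and after S4, between the opening solvent blank and the closing pure standard.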
===Data management system===
This module facilitates the capture and storage of raw data produced by the instruments. The raw data is captured by the DataSync Client as detailed below and is linked to associated project and experiment details. In addition, post-processed data and any other related files such as presentations, reports and papers can be linked to the data.
====Data acquisition and the DataSync Client====
The DataSync Client allows data to be transferred from connected instruments at nominated frequencies and will run in the background of the acquisition computers as an icon in the System Tray. The software is fully integrated with the MASTR-MS web application. When data synchronization is requested, either scheduled or manually, the DataSync Client communicates with the MASTR-MS system to query all incomplete experiment runs which have been configured for the connected instrument. It then searches the acquired data for required files and transmits them to the MASTR-MS repository via a configurable rsync transport, allowing compression and check-summing for efficient data transfer. The options for individual DataSync Nodes are fully configurable via the MASTR-MS administration interface.
To enable DataSync Client uploads on the instrument, the user simply selects the connected instrument from the list which has been configured on the MASTR-MS system and enters the Rsync username which they have been assigned (Supplemental Fig. S9B). OpenSSH Public Keys can be uploaded to the MASTR-MS system for secure password-less usage, which allows the client to run seamless automated data uploads without need for operator intervention.
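A transfer of this kind can be sketched with Python's <code>subprocess</code> module and the standard rsync command-line options. The function names, host and paths are hypothetical, and the source does not state which rsync flags the DataSync Client actually passes.

```python
import subprocess


def build_rsync_command(source_dir, user, host, dest_dir, key_path):
    """Build an rsync-over-SSH command with compression and checksumming."""
    return [
        "rsync",
        "--archive",   # recurse and preserve timestamps and permissions
        "--compress",  # compress data in transit
        "--checksum",  # compare files by checksum rather than size and mtime
        # Key-based login that never prompts; assumes key_path has no spaces.
        "-e", "ssh -i {} -o BatchMode=yes".format(key_path),
        source_dir.rstrip("/") + "/",  # trailing slash: copy directory contents
        "{}@{}:{}".format(user, host, dest_dir),
    ]


def upload(source_dir, user, host, dest_dir, key_path):
    """Run the transfer; raises CalledProcessError if rsync exits non-zero."""
    subprocess.check_call(
        build_rsync_command(source_dir, user, host, dest_dir, key_path)
    )
```

Passing the command as a list rather than a shell string avoids quoting problems with file names that contain spaces.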
The DataSync Client can also be configured with some advanced options. Data archival allows the raw sample data to be automatically replicated in a specified location (e.g., on another hard disk) once confirmation of upload has been achieved, allowing the original data to be deleted if desired.
The software can also be forced to re-synchronize experiment data that has been marked as complete in case the need arises (Supplemental Fig. S9C).
====Run progress====
As data is synced with the MASTR-MS system, run progress is updated to reflect the number of confirmed files acquired versus the number expected. Once the MASTR-MS system has confirmed that run progress is at 100 percent, the run is marked complete and the run data is available to authorized users for download. Component files and Sample files are available for download separately, and Sample files can be packed into compressed archives (zip, tar.gz, tar.bzip) to minimize download sizes.
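The progress calculation can be illustrated as a comparison between expected and confirmed file names. The function names and the example file names are hypothetical, and treating a run with no expected files as incomplete is an assumption.

```python
def run_progress(expected_files, confirmed_files):
    """Percentage of expected files whose upload has been confirmed."""
    expected = set(expected_files)
    if not expected:
        return 0.0  # assumption: a run with no expected files shows no progress
    confirmed = expected & set(confirmed_files)  # ignore unexpected files
    return 100.0 * len(confirmed) / len(expected)


def is_complete(expected_files, confirmed_files):
    """A run is complete once every expected file has been confirmed."""
    expected = set(expected_files)
    return bool(expected) and expected <= set(confirmed_files)


progress = run_progress(
    ["s1.d", "s2.d", "qc1.d", "blank1.d"],
    ["s1.d", "qc1.d", "notes.txt"],
)
# Two of the four expected files are confirmed, so progress is 50.0.
```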
MASTR-MS is designed in a generic form such that it accommodates the automatic capture and transfer of any type of data from an acquisition computer to the server. This feature allows MASTR-MS to be used with instruments from different vendors with different file types.
==Discussion==
The systematic tracking, analysis and sharing of complex datasets generated by high-throughput omics technologies, such as those used in metabolomics, represents a major and expanding challenge. Reliance on outdated methods for recording information about projects, experiments, samples and instruments is cumbersome and error-prone. The methodical management of lab data can be achieved by software solutions such as LIMS and electronic notebook systems. An ideal LIMS solution should be able to manage the users of the lab and their privileges; manage the setting up of projects, experiments and samples; and manage the resulting data. It should also facilitate the sharing of metadata and experimental data with collaborating laboratories. The advantages of using a task-specific LIMS over the old manual [[laboratory notebook]] or even simple spreadsheets are enormous. With a well-designed LIMS, search and retrieval become easy and efficient, especially in a lab that has been operating for several years and has collected information on hundreds of projects and experiments. In addition, security plays an important role in LIMS solutions: access to information and data about projects, experiments and samples is restricted to authorized individuals. Finally, all information can be backed up to secure locations, thereby reducing the risk of accidental loss of data (Table 1).
{|
| STYLE="vertical-align:top;"|
{| class="wikitable" border="1" cellpadding="5" cellspacing="0" width="70%"
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;" colspan="2"|'''Table 1.''' User roles and access privileges
|-
  ! style="background-color:#dddddd; padding-left:10px; padding-right:10px;"|User type
  ! style="background-color:#dddddd; padding-left:10px; padding-right:10px;"|Access privilege
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Administrator
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Complete read-write access to all modules and nodes
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Node representative
  | style="background-color:white; padding-left:10px; padding-right:10px;"|For their specific node, complete read-write access to all modules
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Project manager
  | style="background-color:white; padding-left:10px; padding-right:10px;"|For their specific node, read-write access to only projects and experiments associated with them
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Lab assistant
  | style="background-color:white; padding-left:10px; padding-right:10px;"|For their specific node, read-write access to only experiments associated with them
|-
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Client
  | style="background-color:white; padding-left:10px; padding-right:10px;"|Read access to only experiments associated with them
|-
|}
|}
MASTR-MS is a comprehensive web-based LIMS solution that has been tailor-made for metabolomic experiments and is suitable for implementation within a single laboratory environment or across a multi-node research consortium/core facility. It (a) captures the entire lifecycle of a sample, from project and experimental design to the automatic capture and methodical storage of raw data generated by the multiple analytical instruments; (b) stores metadata about projects, experiments and samples and links the raw data with the metadata; (c) acts as a comprehensive electronic workbook; (d) acts as a storage solution for the vast amount of high throughput data generated by metabolomic experiments and (e) facilitates collaboration between different laboratories.
===Scope of MASTR-MS===
MASTR-MS efficiently manages the lifecycle of a sample, capturing information from client communication through to establishing projects, experiments, samples and continuing to automatic capture of raw data from the analytical instruments. MASTR-MS also stores processed data along with results of any statistical analysis and project reports. By design, MASTR-MS does not provide tools for data processing or statistical analysis, allowing researchers maximum flexibility for data processing and analysis, while allowing processed data to be imported and linked to raw data.
An important function of MASTR-MS is to act as an electronic laboratory notebook. To facilitate this, information is collected through free-flowing text fields. The advantage of this approach is that it allows users to enter the same types of information that they would enter in a traditional lab notebook. The limitation of this approach is that the entries are not controlled by ontologies, and therefore adopting standards becomes challenging. Changing the free text entry to a controlled vocabulary and incorporating the current MSI standards, as well as adopting the metabolomics community standards (ISA-Tab, mw-Tab), will be considered in future iterations of MASTR-MS.
===Comparison to similar software===
MASTR-MS offers a number of features that distinguish it from other metabolomics LIMS such as SetupX and Sesame. SetupX<ref name="ScholzSetupX07">{{cite journal |title=SetupX -- A public study design database for metabolomic projects |journal=Pacific Symposium on Biocomputing |author=Scholz, M.; Fiehn, O. |volume=2007 |pages=169–80 |year=2007 |pmid=17990490}}</ref> is a web-based metabolomics LIMS solution that is XML compatible and built around a relational database management core. It is particularly oriented towards the capture and display of GC–MS metabolomic data through its metabolic annotation database, BinBase.<ref name="SkogersonTheVol11">{{cite journal |title=The volatile compound BinBase mass spectral database |journal=BMC Bioinformatics |author=Skogerson, K.; Wohlgemuth, G.; Barupal, D.K.; Fiehn, O. |volume=12 |pages=321 |year=2011 |doi=10.1186/1471-2105-12-321 |pmid=21816034 |pmc=PMC3199763}}</ref> SetupX is able to handle a wide variety of BioSources (spatial, historical, environmental and genotypic descriptions of biological objects undergoing metabolomic investigations) and Treatments (experimental alterations that influence the metabolic states of BioSources). Unlike SetupX, MASTR-MS does not yet associate its input fields with ontologies, although it is intended that this will be incorporated into future versions of MASTR-MS as international standards are increasingly adopted. On the other hand, MASTR-MS offers the following advantages over SetupX: it is able to cover multiple collaborating labs with a single deployment; lab-based users can generate the sequence list of samples to be run in the analytical instruments, thereby saving time and reducing the possibility of human errors; raw data generated by analyses is automatically captured by MASTR-MS; the user management system is extensive; and collaborators and clients are able to interact with the nodes using the Quote Management System.
Sesame<ref name="ZolnaiProject03">{{cite journal |url=http://www.springerlink.com/content/p3u654x38832uv73/fulltext.pdf |format=PDF |title=Project management system for structural and functional proteomics: Sesame |journal=Journal of Structural and Functional Genomics |author=Zolnai, Zsolt; Lee, Peter T.; Li, Jing; Chapman, Michael R.; Newman, Craig S.; Phillips Jr., George N.; Rayment, Ivan; Ulrich, Eldon L.; Volkman, Brian F.; Markley, John L. |date=January 2003 |volume=4 |issue=1 |pages=11–23 |doi=10.1023/A:1024684404761}}</ref> is also a web-based, platform-independent LIMS. It is based on Java CORBA, a commercial and open-source RDBMS, and was originally developed to facilitate NMR-based structural genomics studies.<ref name="Markley">{{cite journal |title=New bioinformatics resources for metabolomics |journal=Pacific Symposium on Biocomputing |author=Markley, J.L.; Anderson, M.E.; Cui, Q. et al. |volume=2007 |pages=157–168 |year=2007 |pmid=17990489}}</ref> The Sesame module for metabolomics is called "Lamp." The Lamp module was originally designed to process NMR metabolomic analyses of ''Arabidopsis'', although it is flexible enough to be easily adapted to other biological systems and other analytical methods. It consists of a number of different "Views" which provide details about the data, the instruments, and system resources used in a given study. In Sesame, the Views are designed to operate on various kinds of data, and facilitate data capture, editing, processing, analysis, retrieval and report generation. Sesame is a broad LIMS solution whose origins are in structural and functional proteomics, managing data from NMR platforms. Lamp, the module of Sesame that manages metabolomics data, is one of nine application modules of Sesame and was originally designed to manage information about the expression and purification of proteins and store this information. 
As Sesame and Lamp were not originally designed for metabolomics, their functions and features do not directly reflect the workflow of a typical metabolomics experiment. For example, even though Sesame has an extensive user management system, it lacks the functionalities that MASTR-MS provides specifically for metabolomics, such as an exhaustive project, experiment and sample management system; the ability of lab users to generate the sequence list of samples to be run in the analytical instruments; automatic capture of raw data from instruments; and the ability of collaborators and clients to interact with the nodes using the Quote Management System.


In addition to the open-source solutions discussed above, there are several commercial LIMS solutions such as MetaboLIMS from Core Informatics<ref name="CoreInfor">{{cite web |url=https://www.coreinformatics.com/ |title=Core Informatics |publisher=Core Informatics, LLC |accessdate=October 2016}}</ref>, MetLIMS from BioCrates<ref name="BioCrates">{{cite web |url=http://www.biocrates.com/ |title=Biocrates Life Sciences |publisher=Biocrates Life Sciences AG |accessdate=October 2016}}</ref> and Clarity LIMS from GenoLogics.<ref name="ClarityLIMS">{{cite web |url=https://www.genologics.com/editions/clarity-lims-gold/ |title=Clarity LIMS Gold |publisher=GenoLogics Life Sciences Software Inc. |accessdate=October 2016}}</ref> Due to their commercial nature, their functions and features are not readily available to compare against MASTR-MS.


==Conclusion==
This paper describes MASTR-MS, a new, fully integrated, open-source LIMS solution specifically designed for metabolomics laboratories. MASTR-MS can be used to track and share metabolomics experiments within a single laboratory or across large collaborative networks. Its comprehensive functions and features enable researchers and facilities to effectively manage a wide range of different project and experimental data types, and they facilitate the mining of new and existing datasets. The generic design of the data management component of MASTR-MS ensures that it can be used with instruments from different vendors. In addition, we have found that MASTR-MS can provide a LIMS solution for other data-rich technology platforms, such as proteomics, NMR and imaging facilities. MASTR-MS already has considerable community support, and new features will continue to be incorporated, including the capacity for researchers to directly upload their metadata and data to public metabolomics repositories such as MetaboLights and the Metabolomics Workbench. In addition, a reporting and export function is being developed at the user level, enabling the user to query the system and download data. To make automatic querying and retrieval easy, an API for MASTR-MS is also being planned.


==Availability and requirements==


==Supplementary material==
[https://static-content.springer.com/esm/art%3A10.1007%2Fs11306-016-1142-2/MediaObjects/11306_2016_1142_MOESM1_ESM.jpg Supplementary material 1 (S9A)] (JPG 137 KB)

[https://static-content.springer.com/esm/art%3A10.1007%2Fs11306-016-1142-2/MediaObjects/11306_2016_1142_MOESM2_ESM.png Supplementary material 2 (S4)] (PNG 48 KB)

[https://static-content.springer.com/esm/art%3A10.1007%2Fs11306-016-1142-2/MediaObjects/11306_2016_1142_MOESM3_ESM.png Supplementary material 3 (S5A–D)] (PNG 6836 KB)

[https://static-content.springer.com/esm/art%3A10.1007%2Fs11306-016-1142-2/MediaObjects/11306_2016_1142_MOESM4_ESM.png Supplementary material 4 (S7A–B)] (PNG 3593 KB)

[https://static-content.springer.com/esm/art%3A10.1007%2Fs11306-016-1142-2/MediaObjects/11306_2016_1142_MOESM5_ESM.png Supplementary material 5 (S8A–B)] (PNG 2913 KB)


==References==


==Notes==
This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article's references were in alphabetical order; the references here are shown in order of appearance in the article due to the way the wiki processes references. In one case the original URL had changed; an archived version of the URL was used instead. In two other cases, the original authors inadvertently used an incorrect reference (for Sesame LIMS and for Java CORBA); a correct reference was added to each for this version. The supplemental figures mentioned in the text have a different name than the ones supplied at the end; a best-effort attempt has been made to match the supplemental figure numbers in the text to the files linked in the "Supplementary material" section. (Though S9B and S9C are mentioned, they don't seem to be included, unless they are referencing S9A again.) Some grammar and spelling corrections were also made.


<!--Place all category tags here-->
[[Category:LIMSwiki journal articles (added in 2017)]]
[[Category:LIMSwiki journal articles (all)]]
[[Category:LIMSwiki journal articles on bioinformatics]]
[[Category:LIMSwiki journal articles on laboratory informatics]]
[[Category:LIMSwiki journal articles on software]]

Latest revision as of 18:54, 13 July 2020

Full article title MASTR-MS: A web-based collaborative laboratory information management system (LIMS) for metabolomics
Journal Metabolomics
Author(s) Hunter, A.; Dayalan, S.; De Souza, D.; Power, B.; Lorrimar, R.; Szabo, T.; Nguyen, T.; O'Callaghan, S.; Hack, J.;
Pyke, J.; Nahid, A.; Barrero, R.; Roessner, U.; Likic, V.; Tull, D.; Bacic, A.; McConville, M.; Bellgard, M.
Author affiliation(s) Murdoch University, The University of Melbourne, The Australian Wine Research Institute
Primary contact Email: malcolmm at unimelb dot edu dot au -or- mbellgard at ccg dot murdoch dot edu dot au
Year published 2017
Volume and issue 13 (2)
Page(s) 14
DOI 10.1007/s11306-016-1142-2
ISSN 1573-3890
Distribution license Creative Commons Attribution 4.0 International
Website https://link.springer.com/article/10.1007/s11306-016-1142-2
Download https://link.springer.com/content/pdf/10.1007%2Fs11306-016-1142-2.pdf (PDF)

Abstract

Background

An increasing number of research laboratories and core analytical facilities around the world are developing high throughput metabolomic analytical and data processing pipelines that are capable of handling hundreds to thousands of individual samples per year, often over multiple projects, collaborations and sample types. At present, there are no laboratory information management systems (LIMS) that are specifically tailored for metabolomics laboratories that are capable of tracking samples and associated metadata from the beginning to the end of an experiment, including data processing and archiving, and which are also suitable for use in large institutional core facilities or multi-laboratory consortia as well as single laboratory environments.

Results

Here we present MASTR-MS, a downloadable and installable LIMS solution that can be deployed either within a single laboratory or used to link workflows across a multisite network. It comprises the Node Management System that can be used to link and manage projects across one or multiple collaborating laboratories; the User Management System which defines different user groups and privileges of users; the Quote Management System where client quotes are managed; the Project Management System in which metadata is stored and all aspects of project management, including experimental setup, sample tracking and instrument analysis, are defined; and the Data Management System that allows the automatic capture and storage of raw and processed data from the analytical instruments to the LIMS.

Conclusion

MASTR-MS is a comprehensive LIMS solution specifically designed for metabolomics. It captures the entire lifecycle of a sample, starting from project and experiment design to sample analysis, data capture and storage. It acts as an electronic notebook, facilitating project management within a single laboratory or a multi-node collaborative environment. This software is being developed in close consultation with members of the metabolomics research community. It is freely available under the GNU GPL v3 license and can be accessed from https://muccg.github.io/mastr-ms/.

Keywords

MASTR-MS, metabolomics, LIMS, omics

Introduction

Metabolomic approaches aim to detect and quantitate levels of all small molecules in a biological system and, together with other "omic" approaches, can be used to generate a systems-wide understanding of biological processes. Metabolomic approaches typically involve the use of advanced mass spectrometry and nuclear magnetic resonance (NMR) spectrometry platforms to maximize coverage of the chemically diverse metabolites that make up biological systems. In many cases, these analytical platforms are located in institutional and/or national core facilities that offer a range of metabolomics capabilities to researchers.[1][2][3][4][5] These core facilities, as well as individual research groups with sophisticated metabolomics infrastructure and capability, are faced with the challenge of tracking large numbers of samples and the associated metadata, and linking this information with the raw datasets generated by multiple analytical platforms, as well as processed downstream data sets. Data handling extends beyond collection and curation of raw data to the management of metadata that defines how the raw data is generated. Major funding agencies, such as Europe’s Horizon 2020[5], the NIH[6], The Wellcome Trust[7] and Australia’s NHMRC[8], have established data management plans that researchers are expected to follow in order to capture, store and share data generated by their grants. Scientific journals are also increasingly requesting that experimental data and metadata associated with metabolomics experiments be made available to the scientific community[9][10], leading to the establishment of data repositories, such as MetaboLights[11] and Metabolomics Workbench.[12]

LIMS are software solutions that aim to manage the entire workflow of a laboratory. A number of LIMS have been developed or adapted from other applications for curating metabolomics experiments and data management (e.g., SetupX, Sesame). While these LIMS have features that allow capture of project metadata, experiments and samples, data storage, and data sharing, they exhibit a number of limitations around their capacity to accommodate different vendor instruments and have restricted functionalities to facilitate a collaborative configuration between geographically distributed laboratories. In this paper we present MASTR-MS, the first wholly functional, open-source LIMS solution specifically designed for metabolomics laboratories.

Materials and methods

MASTR-MS runs as a Python[13] web application built on the Django[14] framework, utilising a PostgreSQL[15] or MySQL[16] relational database. MASTR-MS leverages the functionality of the Django framework for user management, user permissions and security. Django is a mature web framework and provides multiple security tools and mechanisms. For example, specific protection is provided against cross-site scripting (XSS), cross-site request forgery (CSRF), SQL injection and clickjacking. A security middleware is also used to enforce SSL/HTTPS for all traffic. MASTR-MS is built using open-source components and communicates using open standards. The client-side browser interface leverages JavaScript and AJAX for fluid data display and submission, giving a user experience much like a desktop application, but with the flexibility of being available from any internet-connected location on any operating system, with no client-side download or installation.
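
The protections listed above map onto standard Django settings. The fragment below is a minimal sketch of how such a configuration might look in a settings module; it is not taken from the MASTR-MS source, and it uses the setting names of current Django releases (older releases, such as those that ran on Python 2.7, used MIDDLEWARE_CLASSES rather than MIDDLEWARE). XSS and SQL injection protection come from Django's template auto-escaping and parameterised ORM queries, so they need no entry here.

```python
# Hypothetical settings fragment; illustrative only, not MASTR-MS's actual file.
MIDDLEWARE = [
    "django.middleware.security.SecurityMiddleware",            # HTTPS redirect and security headers
    "django.contrib.sessions.middleware.SessionMiddleware",
    "django.middleware.csrf.CsrfViewMiddleware",                # CSRF token checks on unsafe requests
    "django.contrib.auth.middleware.AuthenticationMiddleware",
    "django.middleware.clickjacking.XFrameOptionsMiddleware",   # sets the X-Frame-Options header
]

SECURE_SSL_REDIRECT = True    # redirect plain-HTTP requests to HTTPS
SESSION_COOKIE_SECURE = True  # send the session cookie over HTTPS only
CSRF_COOKIE_SECURE = True     # send the CSRF cookie over HTTPS only
X_FRAME_OPTIONS = "DENY"      # refuse to be rendered inside a frame
```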

The DataSync Client is a small desktop application that runs on an instrument’s acquisition computer. This software constantly communicates with the MASTR-MS server and is responsible for transferring raw data from the acquisition computer to the MASTR-MS repository (Supplemental Fig. S9A). The DataSync Client is written in the Python programming language using the wxWidgets[17] GUI library and runs on Windows and Linux systems. Data is uploaded using the rsync protocol[18] and the libraries and plugins required for this are included in the installation package.

As the MASTR-MS server-side component is written in the Python 2.7 programming language, any operating system that has Python 2.7 available for running web applications with a web server can run the application. In practice, the application has only been tested on the Linux operating system and the Apache web server. For installation, operating system packages are available in RPM format for CentOS 6.5. Similarly, as the DataSync Client is also written in Python 2.7, it can run on any operating system that has Python 2.7 available. However, it is typically installed on a Windows platform with a connected analytical instrument. For this reason, the DataSync Client is distributed as a Windows executable (.exe) installer. The DataSync Client application is also self-updating by means of a user option to upgrade to a newer version if available.

Results

MASTR-MS is a web-based LIMS solution for metabolomics laboratories. The different modules of MASTR-MS allow users to:

  • Track all metabolomics samples and associated meta-, analytical- and processed data sets. This starts from the capture of client/collaborator communication; the establishment of new projects, experimental design and sample definitions; and the automatic capture of raw data generated by the instruments.
  • Develop an electronic notebook, where users record all relevant information about projects and experiments in MASTR-MS, thus allowing multiple users to work on the same project.
  • Methodically manage the vast amount of data generated by the analytical instruments, by associating it with the project, experiment and sample details.
  • Facilitate collaboration between geographically distributed laboratories through the sharing of projects and experiment data.

MASTR-MS is equally suited for use in either a large core facility or single-/multi-laboratory environment. Thus, both large national facilities and small individual laboratories would equally benefit from using MASTR-MS.

MASTR-MS comprises five major modules: (1) the Node Management System, (2) the User Management System, (3) the Quote Management System, (4) the Project Management System and (5) the Data Management System. Figure 1 shows the workflow of MASTR-MS using the different functionalities and features. These functions are described in detail below. The user is initially connected to the dashboard when they first log into MASTR-MS, and the available functions are tailored to the level of access of the user. The dashboard gives an “at-a-glance” summary of recent activity on the site and items requiring attention. Depending on the user’s status/level of access, the dashboard shows pending user requests, quotes requiring attention, recently created / modified projects, and recently created / modified experiments.



Figure 1. Overview of MASTR-MS system workflow

Node management system

This module allows the addition of multiple laboratories to be part of a single MASTR-MS network. For example, a group of geographically dispersed laboratories can have a single deployment of MASTR-MS and share projects and experiments. Such a setup would be established by the module through the generation of different nodes. On the other hand, MASTR-MS can be used within a single laboratory environment in which this module would comprise a single node.

User management system

This module defines the different user groups used in MASTR-MS. Each user group has different privileges and permissions to access the different functionalities of MASTR-MS. In addition, this module allows the generation and management of users of the system. MASTR-MS has several user groups.

Systems administrator

This user group has access to all functionalities of MASTR-MS. There would normally be one assigned Systems Administrator who would act as the query point for all other users accessing the system, although it is possible to have more than one Systems Administrator. The Systems Administrator has a Laboratory Name assigned to their account (like all other users), allowing a nominated user, usually a member of the organization/laboratory that is hosting the project, to act as the Systems Administrator. The Systems Administrator can add new users to the system, assign user groups to any users in any laboratory, edit details of users and delete users of any laboratory.

Administrator

This user group has full access to all projects, experiments and experimental data, user accounts and quotes within MASTR-MS, regardless of node. This user group allows selected users to view all projects and experiments across different nodes, allowing seamless sharing and collaboration of data across nodes. Where multiple laboratories have a single MASTR-MS deployment, but prefer not to share projects and experiments, no users would be assigned the Administrator role.

Node representative

This user group has full access to quotes for its node and is the preferred contact for quotes and projects run by that node (detailed further in the "Quote management system" section). In a multi-node setup there would typically be at least one user assigned to this group per node.

Project leader

This user group is able to create new projects and experiments for their node. Additionally, this group is able to assign staff to specific projects and experiments.

Staff

Users of this group are able to participate in the projects and experiments for their node.

Client

All other users of the system are clients. This group has no privileges other than viewing the progress of projects to which they have been assigned.

Any user of the system can update their own user record and change their password at any time.
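
The access rules for these user groups can be sketched as a small permission check. The code below is a simplified illustration, not MASTR-MS's Django implementation: the class and function names are hypothetical, the Systems Administrator group is left out because it concerns user management, and the rule that Project Leaders and Staff see only projects they are assigned to follows the access privilege table.

```python
from dataclasses import dataclass, field
from enum import Enum


class Group(Enum):
    ADMINISTRATOR = "Administrator"
    NODE_REP = "Node representative"
    PROJECT_LEADER = "Project leader"
    STAFF = "Staff"
    CLIENT = "Client"


@dataclass
class User:
    name: str
    group: Group
    node: str


@dataclass
class Project:
    title: str
    node: str
    members: set = field(default_factory=set)  # names of assigned leaders and staff
    clients: set = field(default_factory=set)  # names of linked clients


def can_view(user: User, project: Project) -> bool:
    """Return True if the user may see the project (hypothetical rule set)."""
    if user.group is Group.ADMINISTRATOR:
        return True                          # full access regardless of node
    if user.group is Group.CLIENT:
        return user.name in project.clients  # only projects they are linked to
    if user.node != project.node:
        return False                         # other groups are confined to their node
    if user.group is Group.NODE_REP:
        return True                          # everything within their own node
    return user.name in project.members      # leaders and staff: assigned projects only
```

For example, with a project at a "Melbourne" node, a Node Representative from "Perth" is refused while an Administrator from "Perth" is allowed, which mirrors the cross-node sharing described for the Administrator group.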

Quote management system

This module was designed specifically for core facilities that provide metabolomic services to client researchers. Potential clients can request a pricing quote for running samples of an experiment through the quote request system without having to sign up for an account. At a nominated stage, clients are required to register in MASTR-MS by completing a short information dialog box. This module allows collection of contact details and information about the nature of the request. Files in various formats can be attached to this module. In a multi-node facility, the user can either direct their quote to a specific node with relevant expertise or they can select "Don’t Know" to have all the Node Representatives alerted.

Quote requests made through the system by clients and collaborators are tracked and flagged until they have been attended to, so that Node Representatives can quickly see new quotes which require attention. Quotes can only be seen by members of the node to which they were sent, unless the "Don’t Know" option was selected. Node Representatives are able to forward quotes to other nodes if required. The Node Representatives can then begin a dialogue with the potential client and with their team, clarifying the task, and providing formal quotes, attached as PDFs if necessary. Each step of the communications process is time-stamped and tracked within this module. The quote requests and any resulting quotes would eventually be associated with a project and experiment through a selection option in the Experimental Design stage. All documentation relating to the project, including the client and quote issued for the project, along with the project and experimental setup, is thus kept together.
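
The routing rule for new quote requests can be illustrated in a few lines. This is a sketch under the assumption that nodes are identified by name; the function name and the plain-apostrophe spelling of the "Don't Know" option are hypothetical choices made for the example.

```python
DONT_KNOW = "Don't Know"  # hypothetical constant for the catch-all option


def nodes_to_alert(requested_node, all_nodes):
    """Return the nodes whose representatives should see a new quote request."""
    if requested_node == DONT_KNOW:
        return list(all_nodes)   # every Node Representative is alerted
    if requested_node not in all_nodes:
        raise ValueError("unknown node: " + requested_node)
    return [requested_node]      # visible only to the chosen node
```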

Project management system

This module allows the management of projects, experiments, and samples as well as the creation of analytical sample runs. As detailed above, users of different user groups are able to create projects and experiments. When a project is created by either a MASTR-MS Administrator or Project Leader, it can be linked to a specific client from the user list. This allows the client to monitor how the project is progressing. Assigning Project Managers to the project allows those users to manage all aspects of the project, including experiment creation and further access control on an experiment-by-experiment basis (Supplemental Fig. S4). As sample metadata is linked to all experiments within MASTR-MS, sample classes and/or individual samples can be organized into groups and subsequently analyzed on an instrument.

Experiment details

The Experiment Status defaults to "New" when first opened, and all experiment metadata is captured in this field (Supplemental Fig. S5A). Once the experiment design has been completed, the Project Manager can change the setting to "Designed" to prevent further changes. The experiment can also be linked to a formal quote that has been previously entered in the quotes system, and if needed, can be assigned an internal job number.

Access control/roles

Users can be assigned to an experiment, giving them access to edit the experimental workflow and create samples and runs. Client users can also be added here, giving them access to project progress information (Supplemental Fig. S5B).

Sample metadata

MASTR-MS uses sample metadata in order to generate sample classes, which can then be populated with individual samples (Supplemental Fig. S5C).

Origin/organs/parts metadata

The first metadata category is the Origin field, which contains information on sample origin and preparation (Supplemental Fig. S5D). Different metadata fields are available depending on whether the source is Microbial, Plant, Animal, Human, Synthetic, or Other.

Timeline/treatment metadata

MASTR-MS also accepts time course and treatment metadata, where samples have been collected over multiple time points, or after different experimental treatments. The Origin, Timeline, and Treatment fields are then used to automatically generate sample classes.

Sample preparation

MASTR-MS allows standard operating procedure (SOP) documents to be uploaded and associated with an experiment. Multiple SOPs can be uploaded and additional notes recorded for each. An SOP is linked with the methods used during a run when the run is set up. The SOP itself is linked at the experiment level, while the choice of methods is made at the run level. This accommodates users who want to apply multiple methods during a run (either by resampling the same vial or by sampling from a different vial).

Automatic sample class generation

Sample classes are automatically generated from permutations of the metadata entered in the Origin, Timeline, and Treatment steps (Supplemental Fig. S7A). If abbreviations have been provided for a particular metadata category, these will be used during sample class generation. Samples can then be created in each sample class.
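
One way to picture this step is as a Cartesian product over the three metadata categories, producing one sample class per combination. The sketch below assumes that reading; the function name, the (label, abbreviation) input format and the underscore-joined class names are illustrative assumptions rather than MASTR-MS's actual naming scheme.

```python
from itertools import product


def generate_sample_classes(origins, timepoints, treatments):
    """Return one class name per combination of the three metadata lists.

    Each argument is a list of (label, abbreviation) pairs; the abbreviation
    is used when one was provided, otherwise the full label.
    """
    def short(entry):
        label, abbreviation = entry
        return abbreviation or label

    # An empty category contributes a single empty slot so that it does not
    # reduce the whole product to nothing.
    categories = [c if c else [("", "")] for c in (origins, timepoints, treatments)]

    names = []
    for combo in product(*categories):
        parts = [short(entry) for entry in combo if short(entry)]
        names.append("_".join(parts))
    return names


classes = generate_sample_classes(
    origins=[("Leaf", "LF"), ("Root", None)],
    timepoints=[("0 h", "T0"), ("24 h", "T24")],
    treatments=[("Control", "CTL"), ("Drought", "DRT")],
)
# Two origins x two time points x two treatments gives eight classes,
# from "LF_T0_CTL" through "Root_T24_DRT".
```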

Samples can then be viewed and collected together to form a run on a designated analytical instrument platform (Supplemental Fig. S7B). Additional sample information can be imported via CSV and exported from MASTR-MS in the same way. Samples can be randomized before putting them into a run if desired.

Runs

Selected samples are added to a new or existing run by clicking the "Add Selected Samples to Run" button. This will display a dialog allowing the user to add the samples either to a new run or to any previous run which is still unlocked for editing (Supplemental Fig. S8A). Runs remain unlocked as long as a worklist has not yet been generated for them. Locked runs can be edited and reused if needed using the “Run Cloning” feature, which will duplicate the run data into a new unlocked run.
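
The locking behaviour can be modelled with a small class in which a run counts as locked as soon as it holds a worklist. This is a hypothetical sketch; the class, its attributes and the placeholder worklist contents are assumptions made for illustration and do not reflect the MASTR-MS data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Run:
    title: str
    samples: List[str] = field(default_factory=list)
    worklist: Optional[List[str]] = None  # set once a worklist is generated

    @property
    def locked(self) -> bool:
        # A run is locked from the moment a worklist exists for it.
        return self.worklist is not None

    def add_samples(self, new_samples):
        if self.locked:
            raise ValueError("run is locked; clone it to make changes")
        self.samples.extend(new_samples)

    def generate_worklist(self):
        # Placeholder: a real worklist would also hold non-sample entries.
        self.worklist = list(self.samples)
        return self.worklist

    def clone(self, title=None):
        # Copy the run data into a new run that has no worklist, so it is unlocked.
        return Run(title=title or self.title + " (clone)", samples=list(self.samples))
```

Cloning copies the sample list rather than sharing it, so edits to the clone leave the locked original untouched.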

Worklist generation

The goal of run configuration is to streamline sample analysis and generate instrument worklists in a convenient and flexible manner. After sample data has been added to a run, the order and sequencing of additional run elements (Sweeps, Solvents, etc.) can be added via the Rules Generator.

The Rules Generator provides a customizable set of steps (rules) which dictate how worklists are built. It consists of a Start Block, Sample Block, and End Block, each of which allows the insertion of non-sample components into the worklist. These include Pooled Biological QC, Sweep, Reagent Blank, Solvent Blank and Pure Standard.

The sample block, containing the experiment samples, allows n components to be inserted every m samples, in random or position order (Supplemental Fig. S8B). Once all three blocks have been designed, the Rule Generator can be enabled, disabling further editing and making the rule available for inclusion in run worklist generation. Rule Generators can be restricted to use by a single user, an entire node, or everybody on the system. Enabled Rule Generators can be cloned in order to generate a new version, which can then be extended and modified.
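
The three-block structure can be sketched as a function that concatenates a start block, an interleaved sample block and an end block. The code below is an illustration under stated assumptions, not the Rules Generator itself: the function and parameter names are hypothetical, "random or position order" is read as applying to the samples, and inserted components are not added after the final sample.

```python
import random


def build_worklist(samples, start_block, end_block, insert, every, count=1,
                   shuffle=False, seed=None):
    """Assemble a worklist as start block + sample block + end block.

    Inside the sample block, `count` copies of `insert` are placed after
    every `every` samples.
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    ordered = list(samples)
    if shuffle:
        random.Random(seed).shuffle(ordered)  # randomise the sample order

    sample_block = []
    for position, sample in enumerate(ordered, start=1):
        sample_block.append(sample)
        # Insert after each full group of samples, but not after the last one.
        if position % every == 0 and position < len(ordered):
            sample_block.extend([insert] * count)

    return list(start_block) + sample_block + list(end_block)


worklist = build_worklist(
    samples=["s1", "s2", "s3", "s4", "s5"],
    start_block=["Solvent Blank", "Pure Standard"],
    end_block=["Sweep"],
    insert="Pooled Biological QC",
    every=2,
)
# worklist is: Solvent Blank, Pure Standard, s1, s2, Pooled Biological QC,
# s3, s4, Pooled Biological QC, s5, Sweep
```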

To generate a worklist within a run, the user selects an instrument (configured and made available by Administrators) and a Rule Generator, if needed, and clicks the "Generate Worklist" button. Once the worklist is generated, further modification of the run is not possible. The specific worklist format is customizable by site administrators to provide flexibility among various instrument models. Once the worklist is generated, it can be used with the instrument to automate the raw data collection process.

Data management system

This module facilitates the capture and storage of raw data produced by the instruments. The raw data is captured by the DataSync Client as detailed below and is linked to associated project and experiment details. In addition, post-processed data and any other related files such as presentations, reports and papers can be linked to the data.

Data acquisition and the DataSync Client

The DataSync Client allows data to be transferred from connected instruments at nominated frequencies and runs in the background on the acquisition computers as an icon in the System Tray. The software is fully integrated with the MASTR-MS web application. When data synchronization is requested, either on a schedule or manually, the DataSync Client communicates with the MASTR-MS system to query all incomplete experiment runs that have been configured for the connected instrument. It then searches the acquired data for the required files and transmits them to the MASTR-MS repository via a configurable rsync transport, which allows compression and check-summing for efficient data transfer. The options for individual DataSync Nodes are fully configurable via the MASTR-MS administration interface.
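The file selection and transfer step can be sketched in Python as follows. This is not the DataSync Client's actual code: the function names, the host name and the destination path are hypothetical, and the exact rsync options used by the client are not stated in the text. The sketch only builds an rsync command line using standard rsync options for archive mode, compression and checksum comparison over SSH; it does not execute it.

```python
from pathlib import Path
from typing import Iterable, List


def find_required_files(data_dir: Path, expected_names: Iterable[str]) -> List[Path]:
    """Return the expected files (or data folders) already present locally."""
    candidates = (data_dir / name for name in sorted(set(expected_names)))
    return [path for path in candidates if path.exists()]


def build_rsync_command(
    files: Iterable[Path], user: str, host: str, dest_dir: str
) -> List[str]:
    """Build (but do not run) an rsync command for the given files."""
    return [
        "rsync",
        "--archive",   # preserve timestamps, permissions and directory trees
        "--compress",  # compress data in transit
        "--checksum",  # compare files by checksum rather than size and time
        "-e", "ssh",   # use SSH as the remote shell (key-based login assumed)
        *[str(path) for path in files],
        f"{user}@{host}:{dest_dir}",
    ]
```

The resulting list could be passed to subprocess.run on a machine where rsync and SSH keys are set up; whether the real client invokes rsync this way is an assumption.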

To enable DataSync Client uploads on the instrument, the user simply selects the connected instrument from the list which has been configured on the MASTR-MS system and enters the Rsync username which they have been assigned (Supplemental Fig. S9B). OpenSSH Public Keys can be uploaded to the MASTR-MS system for secure password-less usage, which allows the client to run seamless automated data uploads without need for operator intervention.

The DataSync Client can also be configured with some advanced options. Data archival allows the raw sample data to be automatically replicated in a specified location (e.g., on another hard disk) once confirmation of upload has been achieved, allowing the original data to be deleted if desired.

The software can also be forced to re-synchronize experiment data that has been marked as complete in case the need arises (Supplemental Fig. S9C).

Run progress

As data is synced with the MASTR-MS system, run progress is updated to reflect the number of confirmed files acquired versus the number expected. Once the MASTR-MS system has confirmed that run progress is at 100 percent, the run is marked complete and the run data is available to authorized users for download. Component files and Sample files are available for download separately, and Sample files can be packed into compressed archives (zip, tar.gz, tar.bzip) to minimize download sizes.
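The progress calculation and archive packing described above can be sketched in Python using only the standard library. The function names are hypothetical, and the sketch assumes progress is simply the ratio of confirmed to expected files; only the zip format is shown.

```python
import zipfile
from pathlib import Path
from typing import Iterable


def run_progress(confirmed: int, expected: int) -> float:
    """Percentage of expected files that have been confirmed on the server."""
    if expected <= 0:
        raise ValueError("expected file count must be positive")
    return 100.0 * min(confirmed, expected) / expected


def is_complete(confirmed: int, expected: int) -> bool:
    """A run is complete once every expected file has been confirmed."""
    return expected > 0 and confirmed >= expected


def pack_sample_files(paths: Iterable[Path], archive_path: Path) -> Path:
    """Pack sample files into a compressed zip archive for download."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            # Store each file under its base name inside the archive.
            archive.write(path, arcname=path.name)
    return archive_path
```

Completion is decided from the integer file counts rather than from the floating-point percentage, which avoids rounding issues when comparing against 100.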

MASTR-MS is designed in a generic form such that it accommodates the automatic capture and transfer of any type of data from an acquisition computer to the server. This feature allows MASTR-MS to be used with instruments from different vendors with different file types.

Discussion

The systematic tracking, analysis and sharing of complex datasets generated by high-throughput omics technologies such as those used in metabolomics represents a major and expanding challenge. Reliance on outdated methods for recording information about projects, experiments, samples and instruments is cumbersome and error-prone. The methodical management of lab data can be achieved by software solutions such as LIMS and electronic notebook systems. An ideal LIMS solution should be able to manage the lab's users and their privileges; manage the setting up of projects, experiments and samples; and manage the resulting data. It should also facilitate the sharing of metadata and experimental data with collaborating laboratories. The advantages of using a task-specific LIMS over the traditional paper laboratory notebook or even simple spreadsheets are enormous. With a well-designed LIMS, search and retrieval become easy and efficient, especially in a lab that has been operating for several years and has collected information on hundreds of projects and experiments. In addition, security plays an important role in LIMS solutions: access to information and data about projects, experiments and samples is restricted to authorized individuals. Finally, all information can be backed up to secure locations, thereby reducing the risk of accidental loss of data (Table 1).

Table 1. User roles and access privileges
User type | Access privilege
Administrator | Complete read-write access to all modules and nodes
Node representative | For their specific node, complete read-write access to all modules
Project manager | For their specific node, read-write access to only projects and experiments associated with them
Lab assistant | For their specific node, read-write access to only experiments associated with them
Client | Read access to only experiments associated with them
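The access rules in Table 1 can be expressed as a small Python sketch. The Role enumeration, the Experiment class and the access_level function are hypothetical names introduced for illustration, and the sketch simplifies the model by treating projects and experiments alike; it is not the permission logic used by MASTR-MS itself.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class Role(Enum):
    ADMINISTRATOR = "Administrator"
    NODE_REPRESENTATIVE = "Node representative"
    PROJECT_MANAGER = "Project manager"
    LAB_ASSISTANT = "Lab assistant"
    CLIENT = "Client"


@dataclass(frozen=True)
class Experiment:
    node: str                 # node that owns the experiment
    members: FrozenSet[str]   # users associated with the experiment


def access_level(role: Role, user: str, user_node: str, exp: Experiment) -> Optional[str]:
    """Return "read-write", "read" or None following Table 1."""
    if role is Role.ADMINISTRATOR:
        return "read-write"  # all modules and nodes
    same_node = user_node == exp.node
    if role is Role.NODE_REPRESENTATIVE:
        return "read-write" if same_node else None
    associated = user in exp.members
    if role in (Role.PROJECT_MANAGER, Role.LAB_ASSISTANT):
        # Read-write only within their node and only for associated work.
        return "read-write" if same_node and associated else None
    if role is Role.CLIENT:
        return "read" if associated else None
    return None
```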

MASTR-MS is a comprehensive web-based LIMS solution that has been tailor-made for metabolomic experiments and is suitable for implementation within a single laboratory environment or across a multi-node research consortium/core facility. It (a) captures the entire lifecycle of a sample, from project and experimental design to the automatic capture and methodical storage of raw data generated by the multiple analytical instruments; (b) stores metadata about projects, experiments and samples and links the raw data with the metadata; (c) acts as a comprehensive electronic workbook; (d) acts as a storage solution for the vast amount of high-throughput data generated by metabolomic experiments; and (e) facilitates collaboration between different laboratories.

Scope of MASTR-MS

MASTR-MS efficiently manages the lifecycle of a sample, capturing information from client communication through to establishing projects, experiments, samples and continuing to automatic capture of raw data from the analytical instruments. MASTR-MS also stores processed data along with results of any statistical analysis and project reports. By design, MASTR-MS does not provide tools for data processing or statistical analysis, allowing researchers maximum flexibility for data processing and analysis, while allowing processed data to be imported and linked to raw data.

An important function of MASTR-MS is to act as an electronic laboratory notebook. To facilitate this, information is collected through free-text fields. The advantage of this approach is that it allows users to enter the same types of information that they would enter in their traditional lab notebook. The limitation of this approach is that the entries are not controlled by ontologies, and therefore adapting to standards becomes challenging. Replacing free-text entry with controlled vocabularies and incorporating the current MSI standards, as well as adopting the metabolomics community standards (ISA-Tab, mw-Tab), will be considered in future iterations of MASTR-MS.

Comparison to similar software

MASTR-MS offers a number of features that distinguish it from other metabolomics LIMS such as SetupX and Sesame. SetupX[19] is a web-based metabolomics LIMS solution that is XML compatible and built around a relational database management core. It is particularly oriented towards the capture and display of GC–MS metabolomic data through its metabolic annotation database, BinBase.[20] SetupX is able to handle a wide variety of BioSources (spatial, historical, environmental and genotypic descriptions of biological objects undergoing metabolomic investigations) and Treatments (experimental alterations that influence the metabolic states of BioSources). Unlike SetupX, MASTR-MS does not associate its input fields with ontologies, although it is intended that this will be incorporated into future versions of MASTR-MS as international standards are increasingly being adopted. Compared to SetupX, MASTR-MS offers the following advantages: it is able to cover multiple collaborating labs with a single deployment; lab-based users can generate the sequence list of samples to be run in the analytical instruments, thereby saving time and reducing the possibility of human error; raw data generated by analyses is automatically captured by MASTR-MS; the user management system is extensive; and collaborators and clients are able to interact with the nodes using the Quote Management System.

Sesame[21] is also a web-based, platform-independent LIMS. It is based on Java CORBA, a commercial and open-source RDBMS, and was originally developed to facilitate NMR-based structural genomics studies.[22] The Sesame module for metabolomics is called "Lamp." The Lamp module was originally designed to process NMR metabolomic analyses of Arabidopsis, although it is flexible enough to be easily adapted to other biological systems and other analytical methods. It consists of a number of different "Views" which provide details about the data, the instruments, and system resources used in a given study. In Sesame, the Views are designed to operate on various kinds of data, and facilitate data capture, editing, processing, analysis, retrieval and report generation. Sesame is a broad LIMS solution whose origins are in structural and functional proteomics, managing data from NMR platforms. Lamp, the module of Sesame that manages metabolomics data, is one of nine application modules of Sesame and was originally designed to manage information about the expression and purification of proteins and store this information. As Sesame and Lamp were not originally designed for metabolomics, their functions and features do not directly reflect the workflow of a typical metabolomics experiment. For example, even though Sesame has an extensive user management system, it lacks functionality that MASTR-MS, which was specifically designed for metabolomics, provides, such as an exhaustive project, experiment and sample management system; the ability of lab users to generate the sequence list of samples to be run in the analytical instruments; automatic capture of raw data from instruments; and the ability of collaborators and clients to interact with the nodes using the Quote Management System.

In addition to the open-source solutions discussed above, there are several commercial LIMS solutions such as MetaboLIMS from Core Informatics[23], MetLIMS from BioCrates[24] and Clarity LIMS from GenoLogics.[25] Due to their commercial nature, details of their functions and features are not readily available for comparison with MASTR-MS.

Conclusion

This paper describes MASTR-MS, a new, fully integrated, open-source LIMS solution specifically designed for metabolomics laboratories. MASTR-MS can be used to track and share metabolomics experiments within a single laboratory or across large collaborative networks. Its comprehensive functions and features enable researchers and facilities to effectively manage a wide range of different project and experimental data types, and it facilitates the mining of new and existing datasets. The generic design of the data management component of MASTR-MS ensures that it can be used with instruments from different vendors. In addition, we have found that MASTR-MS can provide a LIMS solution for other data-rich technology platforms, such as proteomics, NMR and imaging facilities. MASTR-MS already has considerable community support, and new features will continuously be incorporated, including the capacity for researchers to directly upload their metadata and data to public metabolomics repositories such as MetaboLights and the Metabolomics Workbench. In addition, a reporting and export function is being developed at the user level, enabling the user to query the system and download data. To make automatic querying and retrieval easy, an API for MASTR-MS is also being planned.

Availability and requirements

Project name: MASTR-MS

Project home page: https://muccg.github.io/mastr-ms/

Operating system(s): Server Installation: CentOS 6.x (x86_64); Client: Any operating system and modern web browser can be used as the web client to access MASTR-MS; DataSync Client: Linux or Windows

Programming language: Python 2.7

Software requirements: Apache 2.2 or higher, PostgreSQL 8.4 or higher

License: GNU GPL v3

Any restrictions to use by non-academics: See GNU GPL v3

Acknowledgements

This project is supported by Bioplatforms Australia Ltd., the Australian National Collaborative Research Infrastructure Strategy Program and the Education Investment Fund Super Science Initiative. The authors gratefully acknowledge additional funding from the Australian National Health and Medical Research Council (APP634485, APP1055319) and the EU FP7 Project (HEALTH.2012.2.1.1-1-C): RD Connect: An integrated platform connecting databases, registries, biobanks and clinical bioinformatics for rare disease research. MJM is a NHMRC Principal Research Fellow. AB acknowledges the support of the ARC Centre of Excellence in Plant Cell Walls. The authors acknowledge the many contributions made by other researchers in the Bioplatforms Australia network, including Michael Clarke, Hayden Walker, Dorothee Hayne, Robert Trengove and Catherine Rawlinson.

Adam Hunter, Saravanan Dayalan and David De Souza have contributed equally to this work.

Compliance with ethical standards

Conflict of interest

All authors declare that they have no conflicts of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Supplementary material

Supplementary material 1 (S9A) (JPG 137 KB)

Supplementary material 2 (S4) (PNG 48 KB)

Supplementary material 3 (S5A–D) (PNG 6836 KB)

Supplementary material 4 (S7A–B) (PNG 3593 KB)

Supplementary material 5 (S8A–B) (PNG 2913 KB)

References

  1. "Metabolomics Australia". Metabolomics Australia. http://www.metabolomics.net.au/. Retrieved 05 December 2014. 
  2. "The Metabolomics Innovation Centre". University of Alberta. http://www.metabolomicscentre.ca/. Retrieved 05 December 2014. 
  3. "Metabolomics". The Common Fund. National Institutes of Health, Office of Strategic Coordination. http://commonfund.nih.gov/metabolomics/index. Retrieved 05 December 2014. 
  4. "MetaboHUB". MetaboHUB Centre INRA Bordeaux - Aquitaine. http://www.metabohub.fr/. Retrieved 05 December 2014. 
  5. "Guidelines on FAIR Data Management in Horizon 2020 - Version 3.0" (PDF). European Commission. 26 July 2016. http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf. Retrieved 05 December 2014. 
  6. "NIH Data Sharing Policy and Implementation Guidance". Grants and Funding. National Institutes of Health. 5 March 2003. https://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm. Retrieved 05 December 2014. 
  7. "Guidance for researchers: Developing a data management and sharing plan". Policy and position statements. Wellcome Trust. 2014. Archived from the original on 18 October 2014. https://web-beta.archive.org/web/20141018165611/http://www.wellcome.ac.uk/About-us/Policy/Spotlight-issues/Data-sharing/Guidance-for-researchers/index.htm. Retrieved 05 December 2014. 
  8. "Australian Code for the Responsible Conduct of Research". National Health and Medical Research Council, Australia. 2007. https://www.nhmrc.gov.au/guidelines-publications/r39. Retrieved 05 December 2014. 
  9. "Data Policies". Nature. Macmillan Publishers Limited. https://www.nature.com/sdata/policies/data-policies. Retrieved 05 December 2014. 
  10. "Instructions for authors: Research Articles". GigaScience. BioMed Central Ltd. 2014. Archived from the original on 15 May 2014. https://web.archive.org/web/20140515012736/http://www.gigasciencejournal.com:80/authors/instructions/research. Retrieved 05 December 2014. 
  11. Haug, K.; Salek, R.M.; Conesa, P. (2013). "MetaboLights--An open-access general-purpose repository for metabolomics studies and associated meta-data". Nucleic Acids Research 41 (D1): D781-6. doi:10.1093/nar/gks1004. PMC3531110. PMID 23109552. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531110. 
  12. "Metabolomics Workbench". University of California San Diego. http://www.metabolomicsworkbench.org/. Retrieved 05 December 2014. 
  13. "Python". Python Software Foundation. https://www.python.org/. Retrieved 05 December 2014. 
  14. "Django". Django Software Foundation. https://www.djangoproject.com/. Retrieved 05 December 2014. 
  15. "PostgreSQL". PostgreSQL Global Development Group. https://www.postgresql.org/. Retrieved 05 December 2014. 
  16. "MySQL". Oracle Corporation. https://www.mysql.com/. Retrieved 05 December 2014. 
  17. "wxWidgets". wxWidgets Development Team. https://www.wxwidgets.org/. Retrieved 10 November 2016. 
  18. "rsync". Wayne Davison. https://rsync.samba.org/. Retrieved 05 December 2014. 
  19. Scholz, M.; Fiehn, O. (2007). "SetupX -- A public study design database for metabolomic projects". Pacific Symposium on Biocomputing 2007: 169–80. PMID 17990490. 
  20. Skogerson, K.; Wohlgemuth, G.; Barupal, D.K.; Fiehn, O. (2011). "The volatile compound BinBase mass spectral database". BMC Bioinformatics 12: 321. doi:10.1186/1471-2105-12-321. PMC3199763. PMID 21816034. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3199763. 
  21. Zolnai, Zsolt; Lee, Peter T.; Li, Jing; Chapman, Michael R.; Newman, Craig S.; Phillips Jr., George N.; Rayment, Ivan; Ulrich, Eldon L.; Volkman, Brian F.; Markley, John L. (January 2003). "Project management system for structural and functional proteomics: Sesame" (PDF). Journal of Structural and Functional Genomics 4 (1): 11–23. doi:10.1023/A:1024684404761. http://www.springerlink.com/content/p3u654x38832uv73/fulltext.pdf. 
  22. Markley, J.L.; Anderson, M.E.; Cui, Q. et al. (2007). "New bioinformatics resources for metabolomics". Pacific Symposium on Biocomputing 2007: 157–168. PMID 17990489. 
  23. "Core Informatics". Core Informatics, LLC. https://www.coreinformatics.com/. Retrieved October 2016. 
  24. "Biocrates Life Sciences". Biocrates Life Sciences AG. http://www.biocrates.com/. Retrieved October 2016. 
  25. "Clarity LIMS Gold". GenoLogics Life Sciences Software Inc. https://www.genologics.com/editions/clarity-lims-gold/. Retrieved October 2016. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added. The original article's references were in alphabetical order; the references here are shown in order of appearance in the article due to the way the wiki processes references. In one case the original URL had changed; an archived version of the URL was used instead. In two other cases, the original authors inadvertently used an incorrect reference (for Sesame LIMS and for Java CORBA); a correct reference was added to each for this version. The supplemental figures mentioned in the text have a different name than the ones supplied at the end; a best-effort attempt has been made to match the supplemental figure numbers in the text to the files linked in the "Supplementary material" section. (Though S9B and S9C are mentioned, they don't seem to be included, unless they are referencing S9A again.) Some grammar and spelling corrections were also made.