Journal:NG6: Integrated next generation sequencing storage and processing environment
Full article title | NG6: Integrated next generation sequencing storage and processing environment |
---|---|
Journal | BMC Genomics |
Author(s) | Mariette, J.; Escudié, F.; Allias, N.; Salin, G.; Noirot, C.; Thomas, S.; Klopp, C. |
Author affiliation(s) | Biométrie et Intelligence Artificielle and Génétique Cellulaire |
Primary contact | E-mail: Jerome.Mariette@toulouse.inra.fr |
Year published | 2012 |
Volume and issue | 13 |
Page(s) | 462 |
DOI | 10.1186/1471-2164-13-462 |
ISSN | 1471-2164 |
Distribution license | Creative Commons Attribution 2.0 Generic |
Website | http://bmcgenomics.biomedcentral.com/articles/10.1186/1471-2164-13-462 |
Download | http://bmcgenomics.biomedcentral.com/track/pdf/10.1186/1471-2164-13-462 (PDF) |
Abstract
Background
Next generation sequencing platforms are now well established in sequencing centres and some laboratories. Upcoming smaller scale machines such as the 454 Junior from Roche or the MiSeq from Illumina will increase the number of laboratories hosting a sequencer. In such a context, it is important to provide these teams with an easily manageable environment to store and process the reads they produce.
Results
We describe a user-friendly information system able to manage large sets of sequencing data. It includes, on the one hand, a workflow environment that already contains pipelines adapted to different input formats (sff, fasta, fastq and qseq), different sequencers (Roche 454, Illumina HiSeq) and various analyses (quality control, assembly, alignment, diversity studies, etc.) and, on the other hand, a secure web site giving access to the results. A logged-in user can download raw and processed data and browse the analysis result statistics. The provided workflows can easily be modified or extended, and new ones can be added. Ergatis is used as the workflow building, running and monitoring system. The analyses can be run locally or in a cluster environment using Sun Grid Engine.
Conclusions
NG6 is a complete information system designed to answer the needs of a sequencing platform. It provides a user-friendly interface to process, store and download high-throughput sequencing data.
Background
Sequencer manufacturers pursue different objectives with different platforms.[1] First, they release upgrades of second generation platforms that produce more data through updated hardware and sequencing kits. This lowers the sequencing cost per base pair but often restricts these machines to medium or large projects. Second, they introduce new laboratory-scale platforms, such as the Illumina MiSeq or the Roche Junior, which target smaller projects. Finally, they are working on third generation machines, which will not depend on amplified material and should therefore avoid some biases. The first two machine types, which are already on the market, together with a broader range of sequencing protocols that enable new studies, are driving more sequencing projects and more users.
Once the sequencing is done, the largest part of the work and the longest period of the project are dedicated to data analysis. It is therefore important to provide the new smaller production units, and the laboratories in which the projects are conducted, with efficient and user-friendly processing environments that enable quality control and routine analysis. Such software should offer several features, including access control, metadata storage for the produced reads, quality control that checks for known biases, and standard analyses. NG6 was developed to meet these goals and to be as flexible as possible, in order to keep pace with sequencing technology upgrades.
Laboratory information management systems (LIMS) are often focused on the traceability of the biological material. Some of them, such as PIMS[2] or SLIMS[3], have included extensions to monitor the sequencing process. However, few open-source LIMS also provide a data processing environment. This feature is present in the Galaxy[4] sample tracking module, which is based on the Galaxy workflow engine and provides users with an interface to create and track sequencing requests. Once the sequences have been produced, users can transfer their data files, then build and run workflows to process them.
NG6 is an extensible LIMS oriented towards sequencing providers. It includes read quality control and first-level analysis processes, which ease the data validation carried out jointly by the sequencing facility staff and the end users. It provides a secure, user-friendly interface to visualize and download the raw sequence files and the analysis results.
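To make the idea of read-level quality control concrete, the following is a minimal sketch in Python of the kind of summary statistics such a step might compute from a fastq file. It is not taken from NG6 and does not reflect its actual pipelines: the function names `fastq_records` and `summarize`, the four-lines-per-record layout, and the Phred+33 quality encoding are all assumptions made for illustration.

```python
import io
from statistics import mean


def fastq_records(handle):
    """Yield (read_id, sequence, quality) tuples from a FASTQ stream.

    Assumes the simple layout of exactly four lines per record.
    """
    while True:
        header = handle.readline()
        if not header:
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        yield header.rstrip("\n")[1:], sequence, quality


def summarize(handle, phred_offset=33):
    """Return basic quality-control statistics for a FASTQ stream.

    phred_offset=33 is an assumption; older Illumina files use 64.
    """
    n_reads = 0
    total_bases = 0
    gc_bases = 0
    per_read_quality = []
    for _read_id, sequence, quality in fastq_records(handle):
        n_reads += 1
        total_bases += len(sequence)
        gc_bases += sum(1 for base in sequence.upper() if base in "GC")
        if quality:
            per_read_quality.append(
                mean(ord(char) - phred_offset for char in quality)
            )
    return {
        "reads": n_reads,
        "bases": total_bases,
        "mean_length": total_bases / n_reads if n_reads else 0.0,
        "gc_fraction": gc_bases / total_bases if total_bases else 0.0,
        "mean_quality": mean(per_read_quality) if per_read_quality else 0.0,
    }


# Hypothetical two-read input used only to exercise the sketch.
SAMPLE = "@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\n!!!!\n"
stats = summarize(io.StringIO(SAMPLE))
```

A production quality-control step would also need to handle multi-line records, compressed input and the other formats mentioned above (sff, fasta, qseq). The sketch only shows the shape of the per-run statistics that a results page could display.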
References
- Glenn, T.C. (2011). "Field guide to next-generation DNA sequencers". Molecular Ecology Resources 11 (5): 759–769. doi:10.1111/j.1755-0998.2011.03024.x. PMID 21592312.
- Troshin, P.V.; Postis, V.L.; Ashworth, D. et al. (2011). "PIMS sequencing extension: a laboratory information management system for DNA sequencing facilities". BMC Research Notes 4: 48. doi:10.1186/1756-0500-4-48. PMC3058032. PMID 21385349. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3058032.
- Van Rossum, T.; Tripp, B.; Daley, D. (2010). "SLIMS: A user-friendly sample operations and inventory management system for genotyping labs". Bioinformatics 26 (14): 1808–1810. doi:10.1093/bioinformatics/btq271. PMC2894515. PMID 20513665. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2894515.
- Giardine, B.; Riemer, C.; Hardison, R.C. et al. (2005). "Galaxy: A platform for interactive large-scale genome analysis". Genome Research 15 (10): 1451–1455. doi:10.1101/gr.4086505. PMC1240089. PMID 16169926. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1240089.
Notes
This presentation is faithful to the original, with only a few minor formatting changes. In some cases important information was missing from the references, and that information was added.