Journal:Neuroimaging, genetics, and clinical data sharing in Python using the CubicWeb framework

From LIMSWiki
Revision as of 21:05, 19 June 2017 by Shawndouglas (Saving and adding more.)
Full article title: Neuroimaging, genetics, and clinical data sharing in Python using the CubicWeb framework
Journal: Frontiers in Neuroinformatics
Author(s): Grigis, Antoine; Goyard, David; Cherbonnier, Robin; Gareau, Thomas; Papadopoulos Orfanos, Dimitri; Chauvat, Nicolas; Di Mascio, Adrien; Schumann, Gunter; Spooren, Will; Murphy, Declan; Frouin, Vincent
Author affiliation(s): Université Paris-Saclay, Logilab, King’s College London, F. Hoffmann-La Roche Pharmaceuticals
Primary contact: Email: antoine dot grigis at cea dot fr
Editors: Marcus, Daniel
Year published: 2017
Volume and issue: 11
Page(s): 18
DOI: 10.3389/fninf.2017.00018
ISSN: 1662-5196
Distribution license: Creative Commons Attribution 4.0 International
Website: http://journal.frontiersin.org/article/10.3389/fninf.2017.00018/full
Download: http://journal.frontiersin.org/article/10.3389/fninf.2017.00018/pdf (PDF)

Abstract

In neuroscience and psychiatry, the emergence of large multi-center population imaging studies raises numerous technological challenges. From distributed data collection across different institutions and countries to the final data publication service, one must handle massive, heterogeneous, and complex data from genetics, imaging, demographics, and clinical scores. These data must be both obtained efficiently and made downloadable. We present a Python solution, based on the CubicWeb open-source semantic framework, aimed at building population imaging study repositories. In addition, we focus on the tools developed around this framework to overcome the challenges associated with data sharing and collaborative requirements. We describe a set of three highly adaptive web services that transform the CubicWeb framework into (1) a multi-center upload platform, (2) a collaborative quality assessment platform, and (3) a publication platform endowed with massive-download capabilities. Two major European projects, IMAGEN and EU-AIMS, are currently supported by the described framework. We also present a Python package that enables end users to remotely query neuroimaging, genetics, and clinical data from scripts.

Keywords: web service, data sharing, database, neuroimaging, genetics, medical informatics, Python

Introduction

Health research strategies using neuroimaging have shifted in recent years: the focus has moved from patient care alone to a combination of patient care and prevention. In the case of neurodegenerative and psychiatric diseases, this drives the creation of an increasing number of massive imaging studies, also known as population imaging (PI) surveys.[1][2] It should be noted that PI studies no longer consist of image data only. The recent wide availability of high-throughput genomics has augmented the subject data with genetics, epigenetics, and functional genomics. Likewise, the standardization of personality, demographics, and deficit tests in psychiatry facilitates the acquisition of clinical/behavioral records to enrich the subject data in large population studies. Moreover, PI studies now typically encompass more than one imaging session per subject and cover heterogeneous experiments at multiple time points. Ultimately, these studies with complex imaging and extended data (PIx) require multi-center acquisitions to build a large target population.

A typical PIx infrastructure has to cover three main topics: (1) data collection, (2) quality control (QC) with data processing, and (3) data indexing and publication with controlled data sharing mechanisms. Furthermore, PIx infrastructures must evolve during the life cycle of a population imaging project, and they must also be resilient to major changes in data content and management. In the projects we manage, we have experienced several such changes. First, the published dataset may change, for example through the addition of a new modality for all subjects, a new time point, or a new subcohort. Second, the amount of data requested grows dramatically as the project consortium expands.[3] Finally, internal ontologies have to evolve constantly in order to match ongoing interoperability initiatives.[4][5]

Several existing open-source frameworks support one or more of these topics, sometimes for only one specific data type. The following is a brief overview of existing systems, some of which have also been reviewed by Nichols and Pohl.[6] IDA[7] is a neuroimaging data repository and management system that supports data collection (topic one) and data sharing (topic three). With this system, published datasets can be searched using automatically extracted metadata. The XNAT framework[8] is widely used for neuroimaging data and supports all the PIx infrastructure topics, focusing on tools to pipeline and audit the processing of image data (topic two). The LORIS[9] and NiDB[10] frameworks represent a significant effort to account for the multimodal data involved in PIx studies. These frameworks, although they address all the required topics, mainly support neuroimaging data. OpenClinica[11] and REDCap[12] facilitate the collection of electronic data such as electronic case report forms (eCRFs) or questionnaires, and are well established in projects of various sizes for data collection (topic one). Likewise, laboratory information management systems such as SIMBioMS[13] were developed for the collection of genomic measurements. Finally, the COINS framework brings essential tools for multimodal data support and, more interestingly, emphasizes the importance of providing sharing tools (topics one and three).[14]

The two European studies we manage require a tailored PIx infrastructure. Existing frameworks neither fully handle the diversity of our PIx requirements and project life cycle nor provide efficient tools to collect, quality-check, and publish evolving data. Additional developments were therefore required to build such a complete infrastructure. We based these developments on a more general framework than the dedicated applications described above. In collaboration with Logilab (Logilab SA, Paris, France), we developed three highly adaptive web services, based on the pure-Python CubicWeb (CW) framework, aimed at creating (1) a multi-center upload platform, (2) a collaborative quality assessment platform, and (3) a publication platform with massive-download features.[15] These developments were originally undertaken for the IMAGEN and EU-AIMS projects in order to host their data on mental health in adolescents[16] and autism[17], respectively. The corresponding studies require key features such as uploading and browsing published data from the web, dynamic selection and filtering of displayed data, support for flexible download operations, a high-level request language, multilevel access rights, remote data access, remote management of user access rights, collaborative QC, and interoperability.
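To make two of the listed requirements more concrete (multilevel access rights, and dynamic selection and filtering of displayed data), here is a minimal, purely illustrative Python sketch. It does not use CubicWeb or any code from the article. The names `AccessLevel`, `Record`, and `visible_records`, as well as the three access tiers, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class AccessLevel(IntEnum):
    """Hypothetical access tiers; the article does not define these names."""
    PUBLIC = 0
    CONSORTIUM = 1
    CURATOR = 2


@dataclass(frozen=True)
class Record:
    """A toy stand-in for one published data item of a study."""
    subject_id: str
    modality: str          # e.g. "MRI", "genetics", "questionnaire"
    timepoint: str         # e.g. "baseline", "followup"
    min_level: AccessLevel  # lowest tier allowed to see this record


def visible_records(records, user_level, **filters):
    """Return the records the user may see that also match every filter.

    Each keyword filter is compared for equality against the attribute
    of the same name on the record (e.g. modality="MRI").
    """
    return [
        record for record in records
        if user_level >= record.min_level
        and all(getattr(record, key) == value for key, value in filters.items())
    ]
```

For instance, calling `visible_records(records, AccessLevel.CONSORTIUM, modality="MRI")` on a list of `Record` objects would keep only the MRI records that a consortium-level user is allowed to see. A real repository would enforce such rules on the server side rather than in client code.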

References

  1. Hurko, O.; Black, S.E.; Doody, R. et al. (2012). "The ADNI Publication Policy: Commensurate recognition of critical contributors who are not authors". NeuroImage 59 (4): 4196–4200. doi:10.1016/j.neuroimage.2011.10.085. PMC3676932. PMID 22100665. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3676932.
  2. Poldrack, R.A.; Gorgolewski, K.J. (2014). "Making big data open: Data sharing in neuroimaging". Nature Neuroscience 17 (11): 1510–7. doi:10.1038/nn.3818. PMID 25349916.
  3. Gorgolewski, K.J.; Varoquaux, G.; Rivera, G. et al. (2015). "NeuroVault.org: A web-based repository for collecting and sharing unthresholded statistical maps of the human brain". Frontiers in Neuroinformatics 9: 8. doi:10.3389/fninf.2015.00008. PMC4392315. PMID 25914639. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4392315.
  4. Scheufele, E.; Aronzon, D.; Coopersmith, R. et al. (2014). "tranSMART: An Open Source Knowledge Management and High Content Data Analytics Platform". AMIA Joint Summits on Translational Science 2014: 96–101. PMC4333702. PMID 25717408. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4333702.
  5. Gorgolewski, K.J.; Auer, T.; Calhoun, V.D. et al. (2016). "The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments". Scientific Data 3: 160044. doi:10.1038/sdata.2016.44. PMC4978148. PMID 27326542. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978148.
  6. Nichols, B.N.; Pohl, K.M. (2015). "Neuroinformatics Software Applications Supporting Electronic Data Capture, Management, and Sharing for the Neuroimaging Community". Neuropsychology Review 25 (3): 356–68. doi:10.1007/s11065-015-9293-x. PMC5400666. PMID 26267019. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5400666.
  7. Van Horn, J.D.; Toga, A.W. (2009). "Is it time to re-prioritize neuroimaging databases and digital repositories?". NeuroImage 47 (4): 1720–34. doi:10.1016/j.neuroimage.2009.03.086. PMC2754579. PMID 19371790. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2754579.
  8. Marcus, D.S.; Harms, M.P.; Snyder, A.Z. et al. (2013). "Human Connectome Project informatics: quality control, database services, and data visualization". NeuroImage 80: 202–19. doi:10.1016/j.neuroimage.2013.05.077. PMC3845379. PMID 23707591. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3845379.
  9. Das, S.; Zijdenbos, A.P.; Harlap, J. et al. (2012). "LORIS: A web-based data management system for multi-center studies". Frontiers in Neuroinformatics 5: 37. doi:10.3389/fninf.2011.00037. PMC3262165. PMID 22319489. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3262165.
  10. Book, G.A.; Anderson, B.M.; Stevens, M.C. et al. (2013). "Neuroinformatics Database (NiDB) - A modular, portable database for the storage, analysis, and sharing of neuroimaging data". Neuroinformatics 11 (4): 495–505. doi:10.1007/s12021-013-9194-1. PMC3864015. PMID 23912507. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3864015.
  11. "OpenClinica User Documentation". OpenClinica, LLC. 18 April 2016. https://docs.openclinica.com/.
  12. Harris, P.A.; Taylor, R.; Thielke, R. et al. (2009). "Research electronic data capture (REDCap) - A metadata-driven methodology and workflow process for providing translational research informatics support". Journal of Biomedical Informatics 42 (2): 377–81. doi:10.1016/j.jbi.2008.08.010. PMC2700030. PMID 18929686. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2700030.
  13. Krestyaninova, M.; Zarins, A.; Viksna, J. et al. (2009). "A system for information management in biomedical studies – SIMBioMS". Bioinformatics 25 (20): 2768–2769. doi:10.1093/bioinformatics/btp420. PMC2759553. PMID 19633095. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2759553.
  14. Scott, A.; Courtney, W.; Wood, D. et al. (2011). "COINS: An Innovative Informatics and Neuroimaging Tool Suite Built for Large Heterogeneous Datasets". Frontiers in Neuroinformatics 5: 33. doi:10.3389/fninf.2011.00033. PMC3250631. PMID 22275896. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3250631.
  15. "CubicWeb - The Semantic Web is a construction game!". Logilab. 2016. https://www.cubicweb.org/.
  16. Schumann, G.; Loth, E.; Banaschewski, T. et al. (2010). "The IMAGEN study: Reinforcement-related behaviour in normal brain function and psychopathology". Molecular Psychiatry 15 (12): 1128–39. doi:10.1038/mp.2010.4. PMID 21102431.
  17. Murphy, D.; Spooren, W. (2012). "EU-AIMS: A boost to autism research". Nature Reviews Drug Discovery 11 (11): 815–6. doi:10.1038/nrd3881. PMID 23123927.

Notes

This presentation is faithful to the original, with only a few minor changes to wording and layout. In some cases important information was missing from the references, and that information was added. References are listed in order of appearance rather than alphabetically (as in the original) due to the way this wiki works.