Journal:Eleven quick tips for architecting biomedical informatics workflows with cloud computing

From LIMSWiki
Revision as of 19:06, 26 June 2018 by Shawndouglas
Full article title: Eleven quick tips for architecting biomedical informatics workflows with cloud computing
Journal: PLoS Computational Biology
Author(s): Cole, Brian S.; Moore, Jason H.
Author affiliation(s): University of Pennsylvania
Primary contact: Email: colebr at upenn dot edu
Editors: Ouellette, Francis
Year published: 2018
Volume and issue: 14(3)
Page(s): e1005994
DOI: 10.1371/journal.pcbi.1005994
ISSN: 1553-7358
Distribution license: Creative Commons Attribution 4.0 International
Website: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005994
Download: http://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1005994&type=printable (PDF)

Abstract

Cloud computing has revolutionized the development and operations of hardware and software across diverse technological arenas, yet academic biomedical research has lagged behind despite the numerous and weighty advantages that cloud computing offers. Biomedical researchers who embrace cloud computing can reap rewards in cost reduction, decreased development and maintenance workload, increased reproducibility, ease of sharing data and software, enhanced security, horizontal and vertical scalability, high availability, a thriving technology partner ecosystem, and much more. Despite these advantages that cloud-based workflows offer, the majority of scientific software developed in academia does not utilize cloud computing and must be migrated to the cloud by the user. In this article, we present 11 quick tips for designing biomedical informatics workflows on compute clouds, distilling knowledge gained from experience developing, operating, maintaining, and distributing software and virtualized appliances on the world’s largest cloud. Researchers who follow these tips stand to benefit immediately by migrating their workflows to cloud computing and embracing the paradigm of abstraction.

Introduction

Cloud computing is the on-demand use of computational hardware, software, and networks provided by a third party.[1] The rise of the internet allowed companies to offer fully internet-based file storage services, including Amazon Web Services’ Simple Storage Service, which launched in 2006.[2] Throughout the past decade, cloud computing has expanded from simple file and object storage to a comprehensive array of on-demand services ranging from bare metal servers and networks to fully managed databases and clusters of computers capable of data processing at a massive scale.[3][4]

Modern cloud computing providers and the customers who use their services share responsibility for computer systems: the cloud provider manages the physical hardware and virtualization software, and the customer uses the cloud services to design workflows that may include applications, databases, systems and networks, storage, web servers, and much more.[5][6] In this way, cloud computing allows users to offload the burden of managing physical systems and focus on building and operating solutions.

Cloud computing has revolutionized the way businesses operate. By using a cloud provider instead of operating private data centers, companies can reduce costs by paying only for the hardware they use and only when they use it. Cloud-based solutions also offer important advantages over conventional enterprise data centers, including the ability to dynamically scale up under increased load, recover from disaster incidents automatically, remotely monitor application states, automate hardware and software deployments, and manage security through code. In addition, many cloud providers operate multiple data centers across continents, providing geographic redundancy that increases fault tolerance and reduces latency. Finally, cloud computing has given rise to a new paradigm of microservice-centric application design, wherein the traditional monolithic software stack is replaced with loosely coupled components that can each be scaled individually, updated individually, and even replaced with fully managed cloud services such as message passing services, serverless function execution services, managed databases and data lakes, and container management services. Businesses have exploited these advantages to gain an edge in a competitive landscape, ushering in a new era of computing that emphasizes abstraction, agility, and virtualization.
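As a rough illustration of loosely coupled components, the following Python sketch connects a producer to a pool of workers through a queue. It is only a minimal sketch under stated assumptions: the in-process queue stands in for a managed message passing service, and the names gc_fraction and run_pipeline, along with the toy GC-content task, are hypothetical and do not come from the original article.

```python
import queue
import threading


def gc_fraction(seq: str) -> float:
    """Toy unit of work (hypothetical): fraction of G/C bases in a DNA string."""
    return sum(base in "GC" for base in seq.upper()) / len(seq) if seq else 0.0


def run_pipeline(sequences, n_workers=2):
    """Process sequences with n_workers consumers reading from a shared queue.

    The queue is the only link between the producer and the workers, so the
    number of workers can be changed without touching the producer code.
    """
    tasks = queue.Queue()  # stand-in for a managed message passing service
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            seq = tasks.get()
            if seq is None:  # sentinel: no more work for this worker
                break
            value = gc_fraction(seq)
            with lock:  # protect the shared results dictionary
                results[seq] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for seq in sequences:  # producer side: publish one message per sequence
        tasks.put(seq)
    for _ in threads:  # one sentinel per worker so every worker exits
        tasks.put(None)
    for t in threads:
        t.join()
    return results
```

Calling run_pipeline(["GGCC", "ATAT", "ACGT"], n_workers=3) yields the same results as n_workers=1; only the degree of parallelism changes. This is the sense in which a component can be scaled individually. In a real deployment the queue and the workers would typically be separate cloud services rather than threads in one process.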

Scientific computing in academic research environments still mostly utilizes in-house enterprise compute systems such as high-performance computing (HPC) clusters.[7] In these systems, all software, hardware, data storage, networking, and security are the responsibility of the institution, including compliance with applicable state and federal laws such as HIPAA and other regulations which govern data storage for protected health information and human genetic data. The fact that scientific institutions manage their own separate compute systems poses serious problems for reproducibility due to differences in hardware and software across institutions.[8][9][10] Additionally, the HPC model fails to allow researchers to capitalize on the innovations offered by cloud computing. For these reasons, we have compiled a set of eleven quick tips to help biomedical researchers and their teams design solutions using cloud computing. We provide a high-level overview of some best practices for cloud computing with an emphasis on reproducibility, cost reduction, efficiency of development and operations, and ease of implementation.

References

  1. Charlebois, K.; Palmour, N.; Knoppers, B.M. (2016). "The Adoption of Cloud Computing in the Field of Genomics Research: The Influence of Ethical and Legal Issues". PLoS One 11 (10): e0164347. doi:10.1371/journal.pone.0164347. PMC PMC5068798. PMID 27755563. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5068798. 
  2. Fusaro, V.A.; Patil, P.; Gafni, E. et al. (2011). "Biomedical cloud computing with Amazon Web Services". PLoS Computational Biology 7 (8): e1002147. doi:10.1371/journal.pcbi.1002147. PMC PMC3161908. PMID 21901085. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3161908. 
  3. Schadt, E.E.; Linderman, M.D.; Sorenson, J. et al. (2011). "Cloud and heterogeneous computing solutions exist today for the emerging big data problems in biology". Nature Reviews Genetics 12 (3): 224. doi:10.1038/nrg2857-c2. PMID 21301474. 
  4. Muth, T.; Peters, J.; Blackburn, J. et al. (2013). "ProteoCloud: a full-featured open source proteomics cloud computing pipeline". Journal of Proteomics 88: 104–8. doi:10.1016/j.jprot.2012.12.026. PMID 23305951. 
  5. Grossman, R.L.; White, K.P. (2012). "A vision for a biomedical cloud". Journal of Internal Medicine 271 (2): 122–30. doi:10.1111/j.1365-2796.2011.02491.x. PMID 22142244. 
  6. Stein, L.D.; Knoppers, B.M.; Campbell, P. (2015). "Data analysis: Create a cloud commons". Nature 523 (7559): 149–51. doi:10.1038/523149a. PMID 26156357. 
  7. Jackson, K.R.; Ramakrishnan, L.; Muriki, K. et al. (2010). "Performance Analysis of High Performance Computing Applications on the Amazon Web Services Cloud". IEEE Second International Conference on Cloud Computing Technology and Science: 159–168. doi:10.1109/CloudCom.2010.69. 
  8. Sandve, G.K.; Nekrutenko, A.; Taylor, J. et al. (2013). "Ten simple rules for reproducible computational research". PLoS Computational Biology 9 (10): e1003285. doi:10.1371/journal.pcbi.1003285. PMC PMC3812051. PMID 24204232. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3812051. 
  9. Begley, C.G.; Ioannidis, J.P. (2015). "Reproducibility in science: improving the standard for basic and preclinical research". Circulation Research 116 (1): 116–26. doi:10.1161/CIRCRESAHA.114.303819. PMID 25552691. 
  10. Peng, R.D. (2011). "Reproducible research in computational science". Science 334 (6060): 1226–7. doi:10.1126/science.1213847. PMC PMC3383002. PMID 22144613. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3383002. 

Notes

This presentation is faithful to the original, with only a few minor changes to formatting. Grammar has been updated for clarity. In some cases important information was missing from the references, and that information was added. The original title uses "architecting" as a verb; we've kept it in the title to reference the original article, but uses of the word in the body text have been changed to "designing."