Journal:AI meets exascale computing: Advancing cancer research with large-scale high-performance computing

From LIMSWiki
Revision as of 17:08, 19 November 2019 by Shawndouglas
Full article title AI meets exascale computing: Advancing cancer research with large-scale high-performance computing
Journal Frontiers in Oncology
Author(s) Bhattacharya, Tanmoy; Brettin, Thomas; Doroshow, James H.; Evrard, Yvonne A.; Greenspan, Emily J.; Gryshuk, Amy L.; Hoang, Thuc T.; Vea Lauzon, Carolyn B.; Nissley, Dwight; Penberthy, Lynne; Stahlberg, Eric; Stevens, Rick; Streitz, Fred; Tourassi, Georgia; Xia, Fangfang; Zaki, George
Author affiliation(s) Los Alamos National Laboratory, Argonne National Laboratory, National Cancer Institute, Frederick National Laboratory for Cancer Research, Lawrence Livermore National Laboratory, National Nuclear Security Administration, U.S. Department of Energy Office of Science, University of Chicago, Oak Ridge National Laboratory
Primary contact Email: george dot zaki at nih dot gov
Editors Meerzaman, Daoud
Year published 2019
Volume and issue 9
Page(s) 984
DOI 10.3389/fonc.2019.00984
ISSN 2234-943X
Distribution license Creative Commons Attribution 4.0 International
Website https://www.frontiersin.org/articles/10.3389/fonc.2019.00984/full
Download https://www.frontiersin.org/articles/10.3389/fonc.2019.00984/pdf (PDF)

Abstract

The application of data science in cancer research has been boosted by major advances in three primary areas: (1) data: diversity, amount, and availability of biomedical data; (2) advances in artificial intelligence (AI) and machine learning (ML) algorithms that enable learning from complex, large-scale data; and (3) advances in computer architectures allowing unprecedented acceleration of simulation and machine learning algorithms. These advances help build in silico ML models that can provide transformative insights from data, including molecular dynamics simulations, next-generation sequencing, omics, imaging, and unstructured clinical text documents. Unique challenges persist, however, in building ML models related to cancer, including: (1) access, sharing, labeling, and integration of multimodal and multi-institutional data across different cancer types; (2) developing AI models for cancer research capable of scaling on next-generation high-performance computers; and (3) assessing robustness and reliability in the AI models. In this paper, we review the National Cancer Institute (NCI)-Department of Energy (DOE) collaboration, the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C), a multi-institution collaborative effort focused on advancing computing and data technologies to accelerate cancer research on the molecular, cellular, and population levels. This collaboration integrates various types of generated data, pre-exascale compute resources, and advances in ML models to increase understanding of basic cancer biology, identify promising new treatment options, predict outcomes, and, eventually, prescribe specialized treatments for patients with cancer.

Keywords: cancer research, high-performance computing, artificial intelligence, deep learning, natural language processing, multi-scale modeling, precision medicine, uncertainty quantification

Introduction

Predictive computational models for patients with cancer could, in the future, support prevention and treatment decisions by informing the choices that lead to the best possible clinical outcome. Toward this vision, the national Precision Medicine Initiative (PMI)[1] was announced in 2015, motivating efforts to target and advance precision oncology, including looking ahead to the scientific, data, and computational capabilities needed to advance this vision. At the same time, the computing horizon in the life sciences was shifting: the capabilities and transformations enabled by exascale computing were coming into focus, driven by accelerating growth in data volumes and by the new sources of information anticipated from new technologies and initiatives such as PMI.

The National Strategic Computing Initiative (NSCI) in 2015 named the Department of Energy (DOE) as a lead agency for “advanced simulation through a capable exascale computing program” and the National Institutes of Health (NIH) as one of the deployment agencies to participate “in the co-design process to integrate the special requirements of their respective missions.” This interagency coordination structure opened the avenue for a tight collaboration between the NCI and the DOE. With shared aims to advance cancer research while shaping the future for exascale computing, the NCI and DOE established the Joint Design of Advanced Computing Solutions for Cancer (JDACS4C) in June of 2016 through a five-year memorandum of understanding with three co-designed pilot efforts to address both national priorities. The high-level goals of these three pilots were to push the frontiers of computing technologies in specific areas of cancer research:

  • at the cellular level: advance the capabilities of patient-derived pre-clinical models to identify new treatments;
  • at the molecular level: further understand the basic biology of undruggable targets; and
  • at the population level: gain critical insights on the drivers of population cancer outcomes.

The pilots would also develop new uncertainty quantification (UQ) methods to evaluate confidence in the AI model predictions.

Using co-design principles, each of the pilots in the JDACS4C collaboration is based on, and driven by, team science, which is the hallmark of the collaboration's success. Enabled by deep learning, Pilot One (cellular level) combines data in innovative ways to develop computationally predictive models of tumor response to novel therapeutic agents. Pilot Two (molecular level) combines experimental data, simulation, and AI to open new windows for understanding and exploring the biology of cancers related to the Ras superfamily of proteins. Pilot Three (population level) uses AI and clinical information at unprecedented scales to enable precision cancer surveillance that can transform cancer care.

AI and large-scale computing to predict tumor treatment response

Despite years of effort within the research and pharmaceutical sectors, many patients with cancer still do not respond to standard-of-care treatments, and the emergence of therapy resistance is common. Precision medicine may someday change this through targeted therapeutics, individually tailored to each patient on the basis of predictive models that use molecular and drug signatures. The Predictive Modeling for Pre-Clinical Screening Pilot (Pilot One) aims to develop capabilities for predicting drug response in pre-clinical models of cancer, in order to improve and expedite the selection and development of new targeted therapies for patients with cancer. Highlights of the work done in Pilot One are shown in Figure 1.



Figure 1. Pilot 1 research aims, general workflow, and supporting data

As omics data continue to accumulate, computational models that integrate multimodal data sources become possible. Multimodal deep learning[2] aims to enhance the features learned for one task by learning features across multiple modalities. Early Pilot One work[3] measured the performance of multimodal deep neural network models of drug-pair response using five-fold cross-validation. On the NCI-ALMANAC[4] data, the best performance was obtained when gene expression, microRNA, proteome, and Dragon7 drug descriptors[5] were combined, yielding an R-squared value of 0.944. This indicates that over 94% of the variance in tumor response is explained by the model built from the contributing gene expression, microRNA expression, proteomics, and drug property data.
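To make the evaluation terms in the paragraph above concrete, the following is a minimal Python sketch; it is not the Pilot One code. It assumes the simplest "early fusion" form of multimodal input (each sample's feature vectors from every modality concatenated into one vector), which is a simplification of the multimodal networks described in the cited work. The function names `r_squared`, `k_fold_indices`, and `concat_modalities` are hypothetical. R-squared is computed as 1 - SS_res/SS_tot, so a value of 0.944 means the residual sum of squares is 5.6% of the total sum of squares around the mean.

```python
import random
from typing import List, Sequence, Tuple


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)              # total variation
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle sample indices once, then build k (train, test) index splits."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]


def concat_modalities(*blocks: Sequence[Sequence[float]]) -> List[List[float]]:
    """Early fusion (an assumption): join each sample's vectors from every modality."""
    n = len(blocks[0])
    return [[v for block in blocks for v in block[i]] for i in range(n)]


if __name__ == "__main__":
    y_true = [1.0, 2.0, 3.0, 4.0]
    y_pred = [1.1, 1.9, 3.2, 3.8]
    print(round(r_squared(y_true, y_pred), 3))  # 0.98
    for train, test in k_fold_indices(10, k=5):
        print(len(train), len(test))  # 8 2 for every fold
```

In five-fold cross-validation, a model would be fit on each `train` split and scored with `r_squared` on the matching `test` split, and the five scores averaged; every sample is held out exactly once.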

Mechanistically informed feature selection is an alternative approach that has the potential to increase predictive model performance. The LINCS landmark genes[6], for example, have been used to train deep learning models that predict the expression of non-landmark genes[7] and to classify drug-target interactions.[8] Ongoing work in Pilot One is exploring how prediction is affected by using gene sets such as the LINCS landmark genes and other mechanistically defined gene sets. The potential of mechanistically informed feature selection extends beyond improving prediction accuracy to building models grounded in existing biological knowledge.
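Mechanistically informed feature selection, in its simplest form, means restricting the model's input columns to a predefined gene set before training. The Python sketch below illustrates only that filtering step under stated assumptions: expression profiles are held as a dictionary of per-sample gene values, and the gene names (`GENE_A` and so on) are placeholders rather than actual LINCS landmark genes. The function `select_gene_set` is hypothetical.

```python
from typing import Dict, List, Set, Tuple

Profiles = Dict[str, Dict[str, float]]  # sample -> {gene -> expression value}


def select_gene_set(
    expression: Profiles, gene_set: Set[str]
) -> Tuple[List[str], List[List[float]], List[str]]:
    """Restrict expression features to a predefined gene set.

    Returns the kept genes (sorted), a samples x genes matrix (samples sorted
    by name), and the genes from the set that are absent from the data.
    """
    # Genes measured in every sample; a gene missing from any sample is dropped.
    measured = set.intersection(*(set(profile) for profile in expression.values()))
    kept = sorted(gene_set & measured)
    missing = sorted(gene_set - measured)
    matrix = [[expression[s][g] for g in kept] for s in sorted(expression)]
    return kept, matrix, missing


if __name__ == "__main__":
    # Placeholder data; a real run would load a curated gene list and profiles.
    expression = {
        "cell_line_1": {"GENE_A": 1.0, "GENE_B": 2.0, "GENE_C": 3.0},
        "cell_line_2": {"GENE_A": 4.0, "GENE_B": 5.0, "GENE_C": 6.0},
    }
    kept, matrix, missing = select_gene_set(expression, {"GENE_A", "GENE_C", "GENE_X"})
    print(kept, matrix, missing)
```

Reporting the `missing` genes matters in practice: a curated gene set may include genes a given platform did not measure, and silently dropping them would change what the "mechanistic" feature set actually contains.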

Transfer learning is another important area of research activity. Its goal is to improve learning in a target task by leveraging knowledge from an existing source task.[9] Because it is challenging to obtain sufficient data for patient-derived xenografts (PDXs), in which tumors are grown in host mice, ongoing transfer learning work holds promise for using cell lines as the source task and PDX prediction as the target task. Pilot One is first working on generating models that generalize across cell line studies, a precursor to transfer learning from cell lines to PDXs.
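The source-to-target idea can be sketched in a few lines. The Python example below is illustrative only: it substitutes a plain linear model trained by stochastic gradient descent for the deep networks used in Pilot One, and its data are synthetic and hypothetical. The "cell line" (source) and "PDX" (target) tasks share slopes and differ only in intercept, a deliberately favorable setup for transfer. The model is pretrained on plentiful source data, then fine-tuned on a handful of target samples starting from the source weights, and compared with a model trained from scratch on the same small target budget.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], float]


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear(data: List[Sample], w: Optional[Sequence[float]] = None,
                 b: float = 0.0, lr: float = 0.05, epochs: int = 200) -> Tuple[List[float], float]:
    """SGD on squared error; passing (w, b) warm-starts from existing weights."""
    w = list(w) if w is not None else [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def mse(data: List[Sample], w: Sequence[float], b: float) -> float:
    return sum((predict(w, b, x) - y) ** 2 for x, y in data) / len(data)


def make_task(points, bias: float) -> List[Sample]:
    # Hypothetical related tasks: identical slopes, shifted intercept.
    return [((x1, x2), 2.0 * x1 - 1.0 * x2 + bias) for x1, x2 in points]


grid = [(a / 2, c / 2) for a in range(-2, 3) for c in range(-2, 3)]
source = make_task(grid, bias=0.5)  # plentiful source data ("cell lines")
target_train = make_task([(1, 0), (0, 1), (-1, 0), (0, -1), (0.5, 0.5)], bias=1.0)  # scarce ("PDX")
target_test = make_task([(0.5, -0.5), (-0.5, 0.5), (1, 1), (-1, -1)], bias=1.0)

w_src, b_src = train_linear(source)                                   # 1. pretrain on source
w_ft, b_ft = train_linear(target_train, w=w_src, b=b_src, epochs=5)   # 2. fine-tune on target
w_scratch, b_scratch = train_linear(target_train, epochs=5)           # baseline: no transfer

print("source model on target:", round(mse(target_test, w_src, b_src), 4))
print("fine-tuned:            ", round(mse(target_test, w_ft, b_ft), 4))
print("from scratch:          ", round(mse(target_test, w_scratch, b_scratch), 4))
```

With only five target samples and five epochs, the warm-started model ends far closer to the target task than the model trained from zero. Real cell line and PDX data are not this cooperative, which is one reason the pilot first tests generalization across cell line studies.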

Using data from the NCI-ALMANAC[4], NCI-60[10], GDSC[11], CTRP[12], gCSI[13], and CCLE[14], models can be constructed that generalize across cell line studies. For example, multi-task networks that learn three additional classification tasks (tumor/normal, cancer type, and cancer site) alongside the drug response task could capture more of the total variance and improve precision and recall when trained on CTRP and evaluated on CCLE. Demonstrating cross-study model capability will provide additional confidence that general models can be developed for prediction tasks on cell lines, PDXs, and organoids.
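A multi-task network of the kind described above is typically trained on a single combined objective. The Python sketch below shows one common formulation, which is an assumption rather than the pilot's documented loss: squared error on the drug response output plus a softmax cross-entropy term for each auxiliary classification head, with equal weights unless specified. The task keys (`tumor_normal`, `cancer_type`, `cancer_site`) and all numbers in the demo are hypothetical. A `precision_recall` helper is included for the case where predictions are binarized, for example into responders and non-responders.

```python
import math
from typing import Dict, Sequence, Tuple


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy for one example, computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multitask_loss(response_pred: float, response_true: float,
                   class_logits: Dict[str, Sequence[float]],
                   class_labels: Dict[str, int],
                   weights: Dict[str, float]) -> float:
    """Squared error on drug response plus weighted cross-entropy per auxiliary task."""
    loss = weights.get("response", 1.0) * (response_pred - response_true) ** 2
    for task, logits in class_logits.items():
        loss += weights.get(task, 1.0) * cross_entropy(logits, class_labels[task])
    return loss


def precision_recall(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # One hypothetical training example with made-up head outputs.
    loss = multitask_loss(
        response_pred=0.5, response_true=0.0,
        class_logits={
            "tumor_normal": [2.0, -1.0],
            "cancer_type": [0.1, 0.3, -0.2],
            "cancer_site": [1.0, 0.0, 0.0, -1.0],
        },
        class_labels={"tumor_normal": 0, "cancer_type": 1, "cancer_site": 0},
        weights={},  # equal weighting (an assumption)
    )
    print(round(loss, 4))
    print(precision_recall([1, 1, 0, 0], [1, 0, 1, 0]))
```

The auxiliary heads act as extra supervision on the shared layers: they push the network toward representations that also encode tumor status, type, and site, which is one plausible route to the cross-study gains the paragraph describes. The relative weights are a tuning choice.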

References

  1. "What is the Precision Medicine Initiative?". Genetics Home Reference. National Institutes of Health. 2019. https://ghr.nlm.nih.gov/primer/precisionmedicine/initiative. Retrieved 20 September 2019.
  2. Sun, D.; Wang, M.; Li, A. (2019). "A Multimodal Deep Neural Network for Human Breast Cancer Prognosis Prediction by Integrating Multi-Dimensional Data". IEEE/ACM Transactions on Computational Biology and Bioinformatics 16 (3): 841–50. doi:10.1109/TCBB.2018.2806438. 
  3. Xia, F.; Shukla, M.; Brettin, T. et al. (2018). "Predicting tumor cell line response to drug pairs with deep learning". BMC Bioinformatics 19 (Suppl. 18): 486. doi:10.1186/s12859-018-2509-3. PMC6302446. PMID 30577754. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6302446.
  4. Holbeck, S.L.; Camalier, R.; Crowell, J.A. et al. (2017). "The National Cancer Institute ALMANAC: A Comprehensive Screening Resource for the Detection of Anticancer Drug Pairs with Enhanced Therapeutic Activity". Cancer Research 77 (13): 3564-3576. doi:10.1158/0008-5472.CAN-17-0489. PMC5499996. PMID 28446463. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5499996.
  5. "Dragon". Kode Chemoinformatics srl. 2019. https://chm.kode-solutions.net/products_dragon.php. Retrieved 30 April 2019.
  6. Subramanian, A.; Narayan, R.; Corsello, S.M. et al. (2017). "A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles". Cell 171 (6): P1437-1452.E17. doi:10.1016/j.cell.2017.10.049. PMC5990023. PMID 29195078. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5990023.
  7. Chen, Y.; Li, Y.; Narayan, R. et al. (2016). "Gene expression inference with deep learning". Bioinformatics 32 (12): 1832-9. doi:10.1093/bioinformatics/btw074. PMC4908320. PMID 26873929. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908320.
  8. Xie, L.; He, S.; Song, X. et al. (2018). "Deep learning-based transcriptome data classification for drug-target interaction prediction". BMC Genomics 19 (Suppl. 7): 667. doi:10.1186/s12864-018-5031-0. PMC6156897. PMID 30255785. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6156897.
  9. Torrey, L.; Shavlik, J. (2010). "Chapter 11: Transfer Learning". In Olivas, E.S.; Guerrero, J.D.M.; Martinez-Sober, M. et al. Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques. IGI Global. pp. 242–64. doi:10.4018/978-1-60566-766-9. ISBN 9781605667669.
  10. "NCI-60 Human Tumor Cell Lines Screen". Developmental Therapeutics Program. National Institutes of Health. 26 August 2015. https://dtp.cancer.gov/discovery_development/nci-60/default.htm.
  11. Yang, W.; Soares, J.; Greninger, P. et al. (2013). "Genomics of Drug Sensitivity in Cancer (GDSC): A resource for therapeutic biomarker discovery in cancer cells". Nucleic Acids Research 41 (DB1): D955-61. doi:10.1093/nar/gks1111. PMC3531057. PMID 23180760. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531057.
  12. Basu, A.; Bodycombe, N.E.; Cheah, J.H. et al. (2013). "An interactive resource to identify cancer genetic and lineage dependencies targeted by small molecules". Cell 154 (5): 1151–61. doi:10.1016/j.cell.2013.08.003. PMC3954635. PMID 23993102. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3954635.
  13. Klijn, C.; Durinck, S.; Stawiski, E.W. et al. (2015). "A comprehensive transcriptional portrait of human cancer cell lines". Nature Biotechnology 33 (3): 306–12. doi:10.1038/nbt.3080. PMID 25485619.
  14. Barretina, J.; Caponigro, G.; Stransky, N. et al. (2012). "The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity". Nature 483 (7391): 603-7. doi:10.1038/nature11003. PMC3320027. PMID 22460905. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3320027.

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.