Journal:Explainability for artificial intelligence in healthcare: A multidisciplinary perspective

Full article title Explainability for artificial intelligence in healthcare: A multidisciplinary perspective
Journal BMC Medical Informatics and Decision Making
Author(s) Amann, Julia; Blasimme, Alessandro; Vayena, Effy; Frey, Dietmar; Madai, Vince I.; Precise4Q Consortium
Author affiliation(s) ETH Zürich, Charité – Universitätsmedizin Berlin, Birmingham City University
Primary contact Online contact form
Year published 2020
Volume and issue 20
Page(s) 310
DOI 10.1186/s12911-020-01332-6
ISSN 1472-6947
Distribution license Creative Commons Attribution 4.0 International
Website https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-020-01332-6
Download https://bmcmedinformdecismak.biomedcentral.com/track/pdf/10.1186/s12911-020-01332-6.pdf (PDF)

Abstract

Background: Explainability is one of the most heavily debated topics when it comes to the application of artificial intelligence (AI) in healthcare. Even though AI-driven systems have been shown to outperform humans in certain analytical tasks, the lack of explainability continues to spark criticism. Yet, explainability is not a purely technological issue; instead, it invokes a host of medical, legal, ethical, and societal questions that require thorough exploration. This paper provides a comprehensive assessment of the role of explainability in medical AI and makes an ethical evaluation of what explainability means for the adoption of AI-driven tools into clinical practice.

Methods: Taking AI-based clinical decision support systems as a case in point, we adopted a multidisciplinary approach to analyze the relevance of explainability for medical AI from the technological, legal, medical, and patient perspectives. Drawing on the findings of this conceptual analysis, we then conducted an ethical assessment using Beauchamp and Childress' Principles of Biomedical Ethics (autonomy, beneficence, nonmaleficence, and justice) as an analytical framework to determine the need for explainability in medical AI.

Results: Each of the domains highlights a different set of core considerations and values that are relevant for understanding the role of explainability in clinical practice. From the technological point of view, explainability has to be considered both in terms of how it can be achieved and what is beneficial from a development perspective. When looking at the legal perspective, we identified informed consent, certification and approval as medical devices, and liability as core touchpoints for explainability. Both the medical and patient perspectives emphasize the importance of considering the interplay between human actors and medical AI. We conclude that omitting explainability in clinical decision support systems poses a threat to core ethical values in medicine and may have detrimental consequences for individual and public health.

Conclusions: To ensure that medical AI lives up to its promises, there is a need to sensitize developers, healthcare professionals, and legislators to the challenges and limitations of opaque algorithms in medical AI and to foster multidisciplinary collaboration moving forward.

Background

All over the world, healthcare costs are skyrocketing. Increasing life expectancy, soaring rates of chronic diseases, and the continuous development of costly new therapies contribute to this trend. Thus, it comes as no surprise that scholars predict a grim future for the sustainability of healthcare systems throughout the world. Artificial intelligence (AI) promises to alleviate the impact of these developments by improving healthcare and making it more cost-effective.[1] In clinical practice, AI often comes in the form of clinical decision support systems (CDSSs), assisting clinicians in disease diagnosis and treatment decisions. Where conventional CDSSs match the characteristics of individual patients to an existing knowledge base, AI-based CDSSs apply artificial intelligence models trained on data from patients matching the use case at hand. Yet, despite its undeniable potential, AI is not a universal solution. As history has shown, technological progress always goes hand in hand with novel questions and significant challenges. Some of these challenges are tied to the technical properties of AI, while others relate to the legal, medical, and patient perspectives, making it necessary to adopt a multidisciplinary perspective.

In this paper, we take such a multidisciplinary view on a major medical AI challenge: explainability. In its essence, explainability can be understood as a characteristic of an AI-driven system allowing a person to reconstruct why a certain AI came up with the predictions it offered. An important point to note here is that explainability has many facets and, unfortunately, the terminology of explainability is not well defined. Other terms such as interpretability and/or transparency are often used synonymously.[2][3] We thus simply refer to explainability or explainable AI throughout the manuscript and add the necessary context for understanding.

Explainability is a heavily debated topic with far-reaching implications that extend beyond the technical properties of AI. Even though research indicates that AI algorithms can outperform humans in certain analytical tasks (e.g., pattern recognition in imaging), the lack of explainability for AI in the medical domain has been criticized.[4] Legal and ethical uncertainties surrounding this issue may impede progress and prevent novel technologies from fulfilling their potential to improve patient and population health. Yet, without thorough consideration of the role of explainability in medical AI, these technologies may forgo core ethical and professional principles, disregard regulatory issues, and cause considerable harm.[5]

To contribute to the discourse on explainable AI in medicine, this paper seeks to draw attention to the interdisciplinary nature of explainability and its implications for the future of healthcare. In particular, our work focuses on the relevance of explainability for a CDSS. The originality of our work lies in the fact that we look at explainability from multiple perspectives that are often regarded as independent and separable from each other. This paper has two central aims: (1) to provide a comprehensive assessment of the role of explainability in CDSSs for use in clinical practice; and (2) to make an ethical evaluation of what explainability means for the adoption of AI-driven tools into clinical practice.

Methods

Taking AI-based CDSSs as a case in point, we discuss the relevance of explainability for medical AI from the technological, legal, medical, and patient perspectives. To this end, we performed a conceptual analysis of the pertinent literature on explainable AI in these domains. In our analysis, we aimed to identify aspects relevant to determining the necessity and role of explainability for each domain. Drawing on these different perspectives, we then conclude by distilling the ethical implications of explainability for the future use of AI in the healthcare setting. We do the latter by examining explainability against the four ethical principles of autonomy, beneficence, non-maleficence, and justice.

Results

Technological perspective

From the technological perspective, we will explore two issues. First, what explainability methods are, and second, where they are applied in medical AI development.

With regard to methodology, explainability can either be an inherent characteristic of an algorithm or be approximated by other methods.[2] The latter is particularly important for methods that have until recently been labeled “black-box models,” such as artificial neural network (ANN) models; numerous methods now exist to explain their predictions.[6] Importantly, however, inherent explainability will generally be more accurate than methods that only approximate explainability.[2] This can be attributed to the complex characteristics of many modern machine learning methods: in ANNs, for example, the inner workings of sometimes millions of weights between artificial neurons need to be interpreted in a way that humans can understand. Methods with inherent explainability therefore have a crucial advantage. However, these are usually also traditional methods, such as linear or logistic regression, and for many use cases these traditional methods are inferior in performance to modern state-of-the-art methods such as ANNs.[7] There is thus a trade-off between performance and explainability, and this trade-off is a major challenge for developers of CDSSs. It should be noted that some argue this trade-off does not exist in reality but is a mere artifact of suboptimal modelling approaches, as pointed out by Rudin.[2] While Rudin's work is important in raising attention to the shortcomings of approximating explainability methods, it is likely that some approximating methods, contrary to Rudin's position[2], do have value given the complex nature of explaining machine learning models. Additionally, while we can make the qualitative assessment that inherent explainability is likely better than approximated explainability, only initial exploratory attempts exist to rank explainability methods quantitatively.[8] Notwithstanding, for many applications, and generally in AI product development, there is a de facto preference for modern algorithms such as ANNs. Additionally, it cannot be ruled out that for some applications such modern methods do exhibit genuinely higher performance. This necessitates further critical assessment of explainability methods, both with regard to technical development (e.g., ranking of methods and optimization of methods for certain inputs) and with regard to the role of explainability from a multi-stakeholder view, as done in the current work.
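To illustrate the distinction, consider the following minimal sketch, which is not part of the original analysis. It contrasts an inherently explainable logistic regression, whose fitted coefficients directly constitute the explanation, with a post-hoc approximation of explainability (permutation importance) applied to a small neural network. The use of scikit-learn, the synthetic data, and all parameter choices are illustrative assumptions only.

```python
# Minimal sketch: inherent vs. approximated explainability (illustrative only).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.inspection import permutation_importance

# Synthetic stand-in data; a real CDSS would use curated clinical features.
X, y = make_classification(n_samples=2000, n_features=8, n_informative=3,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Inherently explainable model: the fitted coefficients *are* the explanation.
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Logistic regression coefficients:", lr.coef_.round(2))

# "Black-box" model: the explanation must be approximated post hoc,
# here via permutation importance (one of many possible methods).
ann = MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500,
                    random_state=0).fit(X_train, y_train)
pi = permutation_importance(ann, X_test, y_test, n_repeats=10, random_state=0)
print("Approximated feature importances:", pi.importances_mean.round(3))
```

The sketch mirrors the trade-off described above: the regression model is directly interpretable but may underperform, whereas the neural network's behavior can only be characterized indirectly through an approximating method.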

From the development point of view, explainability is regularly helpful for developers to sanity-check their AI models beyond mere performance. For example, it is highly beneficial to rule out that prediction performance is based on metadata rather than the data itself. A famous non-medical example was a classification task to discern between huskies and wolves, where the prediction was driven solely by the identification of a snowy background rather than by real differences between huskies and wolves.[9] This is also called a “Clever Hans” phenomenon.[10] Clever Hans phenomena are also found in medicine. An example is a model developed by researchers from Mount Sinai Health System, which performed very well in distinguishing high-risk patients from non-high-risk patients based on x-ray imaging. However, when the tool was applied outside of Mount Sinai, its performance plummeted. As it turned out, the AI model had not learned clinically relevant information from the images. In analogy to the snowy background in the example above, the prediction was based on hardware-related metadata tied to the specific x-ray machine used to image the high-risk ICU patients exclusively at Mount Sinai.[11] The system was thus able to distinguish only which machine was used for imaging, not the risk of the patients. Explainability methods allow developers to identify these types of errors before AI tools enter clinical validation and the certification process, as the Clever Hans predictors (snowy background, hardware information) would be identified as prediction-relevant by the explainability methods rather than features that are meaningful from a domain perspective. This saves time and development costs. It should be noted that explainability methods aimed at giving developers insight into their models have different prerequisites than systems aimed at technologically unsavvy end users such as clinical doctors and patients; for developers, these methods can be more complex in their approach and visualization.
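The following hypothetical sketch, not taken from the cases cited above, illustrates how such a developer-side sanity check might look. A synthetic metadata column, here called scanner_id, leaks the outcome label, and a generic explainability method (permutation importance, standing in for whichever method a team actually uses) flags it as the dominant predictor before the model ever reaches clinical validation. All names, data, and parameters are assumptions for illustration.

```python
# Hypothetical "Clever Hans" check: a metadata column leaks the label, and an
# explainability method exposes it as the dominant predictor (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, size=n)                  # 1 = high-risk patient
clinical = rng.normal(size=(n, 5))              # weakly informative clinical features
clinical[:, 0] += 0.3 * y
# Confound: high-risk patients are imaged almost exclusively on scanner 1.
scanner_id = (y + (rng.random(n) < 0.05)) % 2
X = np.column_stack([clinical, scanner_id])
feature_names = [f"clinical_{i}" for i in range(5)] + ["scanner_id"]

model = RandomForestClassifier(random_state=0).fit(X, y)
pi = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for name, imp in sorted(zip(feature_names, pi.importances_mean),
                        key=lambda t: -t[1]):
    print(f"{name:12s} {imp:.3f}")
# If scanner_id tops the ranking, the model has learned the acquisition
# hardware rather than patient risk, mirroring the x-ray example above.
```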

Legal perspective

From the legal perspective, the question arises whether, and if so to what extent, explainability in AI is legally required. Taking the cue from other fields such as public administration, transparency and traceability have to meet even higher standards when it comes to healthcare and the individual patient.[12] As shown above, artificial intelligence approaches such as machine learning and deep learning have the potential to significantly advance the quality of healthcare. Identifying patterns in diagnostics, detecting anomalies, and, ultimately, providing decision support are already changing standards of care and clinical practice. To fully exploit these opportunities for improving patients' outcomes and saving lives by advancing the detection, prevention, and treatment of diseases, the sensitive issues of data privacy and security, patient consent, and autonomy have to be fully considered. From a legal perspective, this means that data, across the cycle of acquisition, storage, transfer, processing, and analysis, will have to comply with all laws, regulations, and further legal requirements. In addition, the law and its interpretation and implementation have to constantly adapt to the evolving state of the art in technology.[13] Even when all of these rather obvious requirements are fulfilled, the question remains whether the application of AI-driven solutions and tools demands explainability. In other words, do doctors and patients need information not only about the results that are provided, but also about the characteristics and features these results are based upon and the respective underlying assumptions? And might the necessary inclusion of other stakeholders require an understanding and explainability of algorithms and models?

From a Western legal point of view, we identified three core fields for explainability: (1) informed consent, (2) certification and approval as medical devices (under U.S. Food and Drug Administration [FDA] and European Union Medical Device Regulation [MDR] rules), and (3) liability.

Under the law, personal health data may only be processed after the individual consents to its use. In the absence of general laws facilitating the use of personal data and information, this informed consent is the standard for today's use of patient data in AI applications.[14] This is particularly challenging, since the consent has to be specified in advance, i.e., the purpose of the given project and its aims have to be outlined. The natural advantage of AI is that it does not necessitate pre-selection of features and can identify novel patterns or find new biomarkers. If restricted to specific purposes, as required for informed consent, this unique advantage might not be fully exploitable. For obtaining informed consent for diagnostic procedures or interventions, the law requires individual and comprehensive information about, and understanding of, these processes. In the case of AI-based decision support, the underlying processes and algorithms therefore have to be explained to the individual patient. Just as when obtaining consent for undergoing a magnetic resonance imaging (MRI) procedure, the patient might not necessarily need to know every detail, but certainly has to be informed about the core principles and especially the risks. Yet, contrary to an MRI procedure, physicians are unable to provide this type of information for an opaque CDSS. What physicians should at least be able to provide are explanations around two principles: (1) the agent view of AI, i.e., what the system takes as input, what it does with its environment, and what it produces as output; and (2) how the mapping that produces the output is trained by learning from examples, which encompasses unsupervised, supervised, and reinforcement learning. Yet, it is important to note that for AI-based CDSSs, the extent of this information is a priori highly difficult to define, has to be adjusted to the respective use case, and will certainly need clarification from legislative bodies. For this, a framework for defining the "right" level of explainability, as Maxwell et al.[15] put it, should be developed. Clearly, this also raises important questions about the role and tasks of physicians, underscoring the need for tailored training and professional development in the area of medical AI.

With regard to certification and approval as medical devices, the respective bodies have been slow to introduce requirements for explainable AI and to address its implications for the development and marketing of products. In a recent discussion paper outlining its total product lifecycle (TPLC) approach, the FDA provides for the constant development and improvement of AI-based medical products. Explainability is not mentioned, but an "appropriate level of transparency (clarity) of the output and the algorithm aimed at users" is required.[16] This is mainly aimed at the functions of the software and its modifications over time. The MDR does not specifically regulate the need for explainability with regard to medical devices that use artificial intelligence and machine learning in particular. Here, too, however, requirements for accountability and transparency are set, and the evolution of explainable AI (xAI) might lead the legislative and the notified bodies to change the regulations and their interpretation accordingly.

In conclusion, both the FDA and the MDR currently require explainability only vaguely, i.e., information supporting the traceability, transparency, and explainability of the development of machine learning/deep learning (ML/DL) models that inform medical treatment. Most certainly, these requirements will be defined more precisely in the future, mandating producers of AI-based medical devices and software to provide insight into the training and testing of the models, the data, and the overall development processes. We would also like to mention that there is an ongoing debate on whether the European Union's General Data Protection Regulation (GDPR) requires the use of explainable AI in tools working with patient data.[17][18] Here, too, it cannot be ruled out that the currently ambiguous phrasings will be amended in the future in favor of ones that promote explainability.


Medical perspective

Patient perspective

Ethical implications

Conclusion

References

  1. Higgins, D.; Madai, V.I. (2020). "From Bit to Bedside: A Practical Framework for Artificial Intelligence Product Development in Healthcare". Advanced Intelligent Systems 2 (10): 2000052. doi:10.1002/aisy.202000052. 
  2. Rudin, C. (2019). "Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead". Nature Machine Intelligence 1: 206–15. doi:10.1038/s42256-019-0048-x. 
  3. Doran, D.; Schulz, S.; Besold, T.R. (2017). "What Does Explainable AI Really Mean? A New Conceptualization of Perspectives". arXiv. https://arxiv.org/abs/1710.00794v1. 
  4. Shortliffe E.H.; Sepúlveda, M.J. (2018). "Clinical Decision Support in the Era of Artificial Intelligence". JAMA 320 (21): 2199–2200. doi:10.1001/jama.2018.17163. 
  5. Obermeyer, Z.; Powers, B.; Vogeli, C. et al. (2019). "Dissecting racial bias in an algorithm used to manage the health of populations". Science 366 (6464): 447-453. doi:10.1126/science.aax2342. 
  6. Samek, W.; Montavon, G.; Vedaldi, A. et al., ed. (2019). Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. Springer Nature. doi:10.1007/978-3-030-28954-6. ISBN 9783030289546. 
  7. Esteva, A.; Robicquet, A.; Ramsundar, B. et al. (2019). "A guide to deep learning in healthcare". Nature Medicine 25 (1): 24-29. doi:10.1038/s41591-018-0316-z. PMID 30617335. 
  8. Islam, S.R.; Eberle, W.; Ghafoor, S.K. (2019). "Towards Quantification of Explainability in Explainable Artificial Intelligence Methods". arXiv. https://arxiv.org/abs/1911.10104v1. 
  9. Samek, W.; Montavon, G.; Lapuschkin, S. et al. (2020). "Toward Interpretable Machine Learning: Transparent Deep Neural Networks and Beyond". arXiv. https://arxiv.org/abs/2003.07631v1. 
  10. Lapuschkin, S.; Wäldchen, S.; Binder, A. et al. (2019). "Unmasking Clever Hans predictors and assessing what machines really learn". Nature Communications 10 (1): 1096. doi:10.1038/s41467-019-08987-4. PMC PMC6411769. PMID 30858366. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6411769. 
  11. Zech, J.R.; Badgeley, M.A.; Liu, M. et al. (2018). "Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study". PLoS Medicine 15 (11): e1002683. doi:10.1371/journal.pmed.1002683. PMC PMC6219764. PMID 30399157. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6219764. 
  12. Olsen, H.P.; Slosser, J.L.; Hildebrandt, T.T. et al. (2019). "What's in the Box? The Legal Requirement of Explainability in Computationally Aided Decision-Making in Public Administration". iCourts Working Paper Series No. 162. SSRN. doi:10.2139/ssrn.3402974. 
  13. Hörnle, J. (2019). "Juggling more than three balls at once: multilevel jurisdictional challenges in EU Data Protection Regulation". International Journal of Law and Information Technology 27 (2): 142–170. doi:10.1093/ijlit/eaz002. 
  14. Cohen, I.G. (2020). "Informed Consent and Medical Artificial Intelligence: What to Tell the Patient?". Georgetown Law Journal 108: 1425–69. doi:10.2139/ssrn.3529576. 
  15. Maxwell, W.; Beaudouin, V.; Bloch, I. et al. (2020). "Identifying the 'Right' Level of Explanation in a Given Situation". CEUR Workshop Proceedings 2659: 63. doi:10.2139/ssrn.3604924. 
  16. U.S. Food and Drug Administration (2020). "Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-based Software as a Medical Device (SaMD)" (PDF). pp. 20. https://www.fda.gov/files/medical%20devices/published/US-FDA-Artificial-Intelligence-and-Machine-Learning-Discussion-Paper.pdf. Retrieved 05 July 2020. 
  17. Hacker, P.; Krestel, R.; Grundmann, S. et al. (2020). "Explainable AI under contract and tort law: Legal incentives and technical challenges". Artificial Intelligence and Law 28: 415–39. doi:10.1007/s10506-020-09260-6. 
  18. Ferretti, A.; Schneider, M.; Blasimme, A. (2018). "Machine Learning in Medicine: Opening the New Data Protection Black Box". European Data Protection Law Review 4 (3): 320–32. doi:10.21552/edpl/2018/3/10. 

Notes

This presentation is faithful to the original, with only a few minor changes to presentation. In some cases important information was missing from the references, and that information was added.