Definitions, methods, and applications in interpretable machine learning

Bibliographic Details
Published in	Proceedings of the National Academy of Sciences - PNAS Vol. 116; no. 44; pp. 22071 - 22080
Main Authors	Murdoch, W. James, Singh, Chandan, Kumbier, Karl, Abbasi-Asl, Reza, Yu, Bin
Format	Journal Article
Language	English
Published	United States: National Academy of Sciences, 29.10.2019
Subjects	Artificial intelligence; Computer simulation; Explainability; Interpretability; Learning algorithms; Machine learning; Mathematics and computing; Modularity; Physical sciences; Predictions; Relevancy; Subgroups
Summary:	Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the predictive, descriptive, relevant (PDR) framework for discussing interpretations. The PDR framework provides 3 overarching desiderata for evaluation: predictive accuracy, descriptive accuracy, and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post hoc categories, with subgroups including sparsity, modularity, and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often underappreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.
Bibliography:	Contributed by Bin Yu, July 1, 2019 (sent for review January 16, 2019; reviewed by Rich Caruana and Giles Hooker). Reviewers: R.C., Microsoft Research; and G.H., Cornell University. Author contributions: W.J.M., C.S., K.K., R.A.-A., and B.Y. designed research; W.J.M., C.S., K.K., and R.A.-A. performed research; and W.J.M. and C.S. wrote the paper. W.J.M. and C.S. contributed equally to this work. K.K. and R.A.-A. contributed equally to this work. Organizations listed: USDOE Office of Science (SC); Natural Sciences and Engineering Research Council of Canada (NSERC); National Science Foundation (NSF); US Army Research Office (ARO); US Department of the Navy, Office of Naval Research (ONR). Grant and contract numbers listed: AC02-05CH11231; W911NF1710005; N00014-16-1-2664; DMS-1613002; IIS 1741340; CCF-0939370.
ISSN:	0027-8424, 1091-6490
DOI:	10.1073/pnas.1900654116