Dr. Suraj Srinivas
École Polytechnique Fédérale de Lausanne (EPFL) & Idiap Research Institute
Suraj Srinivas recently completed his PhD at the Idiap Research Institute and EPFL, where he worked with Prof. Francois Fleuret on gradient-based interpretability and transfer learning. Before that, he completed a master's by research at the Indian Institute of Science, Bangalore, where he worked with Prof. Venkatesh Babu on neural network compression. His research interests broadly concern the interpretability, robustness, and compression of deep neural networks. He serves as a programme committee member for NeurIPS, ICML, and ICLR.
Pitfalls of Saliency Map Interpretation in Deep Learning
A popular way of interpreting neural networks is via saliency maps, which assign an importance score to each input feature of the model. In this talk, I will discuss two of our recent works that expose pitfalls in these methods. First, we show that existing saliency maps cannot satisfy two desirable properties simultaneously, and we propose the “full-gradient representation”, which avoids this problem. Based on this representation, we introduce an approximate saliency method called FullGrad, which we find explains model behaviour better than competing methods in the literature. Second, we find that a popular saliency method, the input-gradient, can be arbitrarily structured due to the shift-invariance of softmax. We investigate why standard neural network models have input-gradients with interpretable structure even when this is unnecessary, and we find that standard models contain an implicit generative modelling component that is responsible for this behaviour. Overall, our work shows that interpreting black-box models with off-the-shelf interpretability methods can be risky, and such methods must be used with caution.
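The shift-invariance the second result relies on can be checked directly: adding any constant to every logit leaves the softmax output unchanged, so the pre-softmax function (and hence its input-gradient) is underdetermined. A minimal sketch in plain Python, with illustrative logit values (not from the talk):

```python
import math

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, -1.0, 0.5]
shifted = [z + 10.0 for z in logits]  # add the same constant to every logit

p_original = softmax(logits)
p_shifted = softmax(shifted)

# The output probabilities are identical despite the shift.
assert all(abs(a - b) < 1e-12 for a, b in zip(p_original, p_shifted))
```

Because the shift can itself be an arbitrary function of the input, the logits can be modified to have arbitrarily structured input-gradients without changing the model's predictions, which is the pitfall the talk examines.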