Prof. Henning Müller and Mara Graziani (PhD student)
University of Applied Sciences of Western Switzerland
The three dimensions of interpretability. Activation Maximization and Feature Attribution
This class analyzes interpretability methods along the three dimensions of interpretability defined by Samek et al. on heatmapping.org: interpreting the data, interpreting the model, and interpreting the outcomes. We then examine one example from each of the last two dimensions: Activation Maximization to interpret the model, and Feature Attribution methods to interpret the outcomes. You will learn to distinguish the main theoretical differences between these two approaches. During this class, you will see examples of the Lucid toolbox used to generate visualizations of what maximally activates a given neuron in the network, and you will learn how to implement class activation mapping as a feature attribution method.
- Overview on the three dimensions of interpretability
- Dimension 1: interpreting the data. Maximum Mean Discrepancy criticism and Influential Instances
- Dimension 2: interpreting the model with built-in interpretability, e.g. Interpretable Decision Sets
- Dimension 2: interpreting the model with Feature Visualization
- Dimension 3: interpreting the outcome with feature attribution
Olah, Chris, et al., 2017. “Feature Visualization: How neural networks build up their understanding of images”. Distill.
Chapters 4, 8 and 9 of Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K. and Müller, K.R., eds., 2019. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (Vol. 11700). Springer Nature.
Visualize some entire layers by optimizing the deepdream function. What kind of differences do you see as you go deeper in the network?
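As a starting point, the gradient-ascent loop behind DeepDream and feature visualization can be sketched with a toy one-neuron "network" in NumPy. The weight vector `W`, the step size and the iteration count are illustrative stand-ins; in the exercise itself the role of `activation` is played by a layer of a trained network optimized through Lucid.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a network layer: a fixed linear filter followed by a
# ReLU. In the exercise, this role is played by a convolutional layer of
# a trained network, optimized through the Lucid/DeepDream machinery.
W = rng.normal(size=(64,))

def activation(x):
    return np.maximum(0.0, W @ x)          # ReLU(w . x)

def grad_activation(x):
    # d/dx ReLU(w . x) = w where w . x > 0, and 0 elsewhere
    return W if W @ x > 0 else np.zeros_like(W)

# Activation maximization: gradient *ascent* on the input, nudging the
# "image" x so that the neuron's activation keeps growing.
x0 = 0.01 * W / np.linalg.norm(W)          # small start aligned with W
x, lr = x0.copy(), 0.1
for _ in range(100):
    x = x + lr * grad_activation(x)

print(activation(x0), activation(x))       # the activation has grown
```

Optimizing an entire layer works the same way, with the scalar objective replaced by (for instance) the mean of all the layer's activations, which is exactly what the deepdream objective does.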
Keep working on the Grad-CAM notebook. Extract the class object out of the image by applying a fixed threshold and segmenting out anything below that threshold, obtaining images similar to the one below.
How can you obtain this segmentation? How does the segmentation change as you move the threshold?
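One possible shape for the thresholding step is sketched below; the Gaussian `heatmap` and all-white `image` are synthetic stand-ins for the Grad-CAM map (upsampled to image size and normalised to [0, 1]) and the input image from the notebook.

```python
import numpy as np

# Synthetic stand-ins: in the Grad-CAM notebook, `image` is the input
# image and `heatmap` the class activation map, upsampled to the image
# size and normalised to [0, 1].
h, w = 8, 8
yy, xx = np.mgrid[0:h, 0:w]
heatmap = np.exp(-((yy - 3) ** 2 + (xx - 4) ** 2) / 8.0)   # peak near (3, 4)
image = np.ones((h, w, 3))

def segment(image, heatmap, threshold=0.5):
    """Keep only the pixels whose heatmap value reaches the threshold."""
    mask = heatmap >= threshold
    return image * mask[..., None]

# Raising the threshold shrinks the segmented region around the peak.
for t in (0.2, 0.5, 0.8):
    kept = (segment(image, heatmap, t).sum(axis=-1) > 0).mean()
    print(f"threshold {t}: {kept:.0%} of pixels kept")
```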
Check out the Grad-CAM paper. How can you change the grad_cam function in the Colab notebook to obtain counterfactual explanations? Generate some counterfactual explanations for the images in A2.
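For orientation, here is a minimal NumPy sketch of the Grad-CAM weighting with a `counterfactual` switch that negates the gradients, which is the modification the Grad-CAM paper proposes for counterfactual explanations. The activations and gradients are random stand-ins for those extracted from the network in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for the last conv layer's activations A (H, W, K)
# and the gradients of the class score with respect to them, dY/dA.
acts = rng.random((7, 7, 16))
grads = rng.normal(size=(7, 7, 16))

def grad_cam(acts, grads, counterfactual=False):
    """Grad-CAM: ReLU of the gradient-weighted sum of activation maps.
    Negating the gradients highlights regions whose presence would
    *lower* the class score, yielding a counterfactual explanation."""
    if counterfactual:
        grads = -grads
    weights = grads.mean(axis=(0, 1))       # global-average-pool the gradients
    cam = np.maximum(0.0, acts @ weights)   # ReLU(sum_k w_k * A_k)
    return cam / (cam.max() + 1e-8)         # normalise to [0, 1]

cam = grad_cam(acts, grads)
ccam = grad_cam(acts, grads, counterfactual=True)
```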
Comment on the limitations of Grad-CAM and Feature visualization.