
Prof. Henning Müller and Mara Graziani (PhD student)
University of Applied Sciences of Western Switzerland
The three dimensions of interpretability: Activation Maximization and Feature Attribution
This class will analyze interpretability methods according to the three dimensions of interpretability defined by Samek et al. on heatmapping.org, namely interpreting the data, interpreting the model, and interpreting the outcomes. We will then analyze two examples of the last two dimensions: Activation Maximization to interpret the model, and Feature Attribution methods to interpret the outcomes. You will learn to distinguish the main theoretical differences behind these two approaches. During this class, you will see examples of the Lucid toolbox to generate visualizations of what maximally activates a given neuron in the network. You will also learn how to implement class activation mapping as a feature attribution method.
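The core idea behind Activation Maximization is small: start from a (near-)random input and follow the gradient of a chosen unit's activation with respect to the pixels. The sketch below is not the Lucid implementation used in class; it is a minimal TF2/Keras version, and the model (VGG16), layer name (block4_conv1), channel index, step count, learning rate and image size are illustrative placeholders you would adapt to your own architecture.

```python
import tensorflow as tf

def activation_maximization(model, layer_name, channel, steps=256, lr=0.05):
    """Gradient ascent on the input so that one channel of `layer_name`
    is maximally activated (the core idea behind activation maximization)."""
    feature_extractor = tf.keras.models.Model(
        model.inputs, model.get_layer(layer_name).output)
    # Start from a low-contrast random image and optimize it directly.
    image = tf.Variable(tf.random.uniform((1, 224, 224, 3), 0.4, 0.6))
    for _ in range(steps):
        with tf.GradientTape() as tape:
            activation = feature_extractor(image)
            loss = tf.reduce_mean(activation[..., channel])  # mean activation of one channel
        grads = tape.gradient(loss, image)
        grads /= tf.norm(grads) + 1e-8                       # normalized-gradient ascent step
        image.assign_add(lr * grads)
    return image[0].numpy()

# Hypothetical usage: visualize one channel of a mid-level VGG16 block.
model = tf.keras.applications.VGG16(weights="imagenet", include_top=False)
vis = activation_maximization(model, "block4_conv1", channel=10)
```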
Class Outline
- Overview on the three dimensions of interpretability
- Dimension 1: interpreting the data. Maximum Mean Discrepancy criticism and Influential Instances
- Dimension 2: interpreting the model with built-in interpretability, e.g. Interpretable Decision Sets
- Dimension 2: interpreting the model with Feature Visualization
- Deconvolution
- Gradient Ascent – Assignment 1
- Dissection
- Dimension 3: interpreting the outcome with feature attribution
- Saliency maps
- Sensitivity maps
- Activation maps – Assignment 2
Material
“Feature Visualization: How neural networks build up their understanding of images”, Chris Olah et al., 2017, Distill
Book Chapters
Chapters 4, 8 and 9. Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K. and Müller, K.R., eds., 2019. Explainable AI: Interpreting, Explaining and Visualizing Deep Learning (Vol. 11700). Springer Nature.
Assignments
A1.
Clone the GitHub repository https://github.com/maragraziani/intro-interpretableAI.git.
Open the first Colab notebook, L2A1 – Lucid Tutorial. Follow the notebook and generate visualizations with the toolbox. Change the models being analyzed in the notebook: for example, visualize InceptionV1 or ResnetV1_101_slim. Extend the visualizations to multiple layers of your architecture. Analyze some of the neurons. What do you think they are looking at? Can you try to guess for some of them? Post your results on the Facebook group page, describing what you observe. If you need help or inspiration, check the full analysis of InceptionV1 by Olah et al.
Visualize some entire layers by optimizing the deepdream objective (see the sketch below). What kind of differences do you see as you go deeper in the network?
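If you want a quick starting point outside the notebook, the snippet below follows the public Lucid tutorial (note that Lucid runs on TensorFlow 1.x); the channel mixed4a_pre_relu:476 and the mixed4a layer passed to the deepdream objective are just illustrative choices.

```python
import lucid.modelzoo.vision_models as models
from lucid.optvis import objectives, render

# Load InceptionV1 (GoogLeNet) from the Lucid model zoo.
model = models.InceptionV1()
model.load_graphdef()

# Visualize a single channel: what makes unit 476 of mixed4a_pre_relu fire?
_ = render.render_vis(model, "mixed4a_pre_relu:476")

# Second part of A1: optimize the deepdream objective over a whole layer.
_ = render.render_vis(model, objectives.deepdream("mixed4a"))
```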
A2.
Open the Colab notebook L2A2 – GradCAM Tutorial. Generate Grad-CAM visualizations for the inputs provided in the notebook. Extend the experiments by introducing more models into the analysis: generate visualizations for the same inputs with VGG, ResNet and InceptionV3 using ImageNet-pretrained weights. How do the activation maps differ? How are they similar?
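If you want a reference to compare your implementation against, here is a minimal Grad-CAM sketch in TF2/Keras. It is not the notebook's grad_cam function; the layer name block5_conv3 is simply the usual choice of last convolutional layer for VGG16, and the random tensor stands in for a preprocessed input image.

```python
import tensorflow as tf

def grad_cam(model, image, layer_name, class_index):
    """Minimal Grad-CAM: weight the feature maps of `layer_name` by the
    global-average-pooled gradients of the class score, then apply ReLU."""
    grad_model = tf.keras.models.Model(
        model.inputs, [model.get_layer(layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image)          # image: batch of shape (1, H, W, 3)
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)     # d(score) / d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))     # one weight per feature map
    cam = tf.reduce_sum(weights[:, None, None, :] * conv_out, axis=-1)
    cam = tf.nn.relu(cam)[0].numpy()                 # keep only positive evidence
    return cam / (cam.max() + 1e-8)                  # normalize to [0, 1]

# Hypothetical usage with an ImageNet-pretrained VGG16.
model = tf.keras.applications.VGG16(weights="imagenet")
x = tf.random.uniform((1, 224, 224, 3))             # replace with a preprocessed input image
heatmap = grad_cam(model, x, "block5_conv3", class_index=282)  # 282 = "tiger cat"
```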
A3.
Keep working on the Grad-CAM notebook. Extract the class object from the image by applying a fixed threshold to the Grad-CAM heatmap and segmenting out everything below that threshold, obtaining images similar to the one below.

How can you obtain this segmentation? How does the segmentation change as you move the threshold?
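One possible route, assuming you keep the Grad-CAM heatmap from A2 (here called cam) and the original image as a NumPy array: upsample the heatmap to the image size, threshold it, and use the result as a binary mask. The helper below is only a sketch; segment_from_cam, the 0.5 threshold and the choice of skimage for resizing are illustrative, not part of the notebook.

```python
import numpy as np
from skimage.transform import resize   # any bilinear resize (e.g. cv2.resize) works too

def segment_from_cam(image, cam, threshold=0.5):
    """Zero out every pixel whose upsampled Grad-CAM value is below the threshold."""
    cam_up = resize(cam, image.shape[:2], order=1, preserve_range=True)
    mask = (cam_up >= threshold).astype(image.dtype)
    return image * mask[..., None]     # broadcast the mask over the colour channels

# segmented = segment_from_cam(original_image, heatmap, threshold=0.5)
```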
A4.
Check out the Grad-CAM paper. How can you change the grad_cam function in the Colab notebook to obtain counterfactual explanations? Generate some counterfactual explanations for the images in A2.
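As a hint, the counterfactual explanations in the Grad-CAM paper come from a one-line change: negate the gradients before pooling them, so the map highlights regions whose evidence lowers the class score. The sketch below mirrors the hypothetical grad_cam sketch from A2, not the notebook's implementation.

```python
import tensorflow as tf

def counterfactual_grad_cam(model, image, layer_name, class_index):
    """Same pipeline as Grad-CAM, but with negated gradients: the resulting
    map highlights regions that argue *against* the chosen class."""
    grad_model = tf.keras.models.Model(
        model.inputs, [model.get_layer(layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image)
        class_score = preds[:, class_index]
    grads = -tape.gradient(class_score, conv_out)    # the only change vs. standard Grad-CAM
    weights = tf.reduce_mean(grads, axis=(1, 2))
    cam = tf.nn.relu(tf.reduce_sum(weights[:, None, None, :] * conv_out, axis=-1))[0]
    return cam.numpy() / (cam.numpy().max() + 1e-8)
```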
A5.
Comment on the limitations of Grad-CAM and Feature visualization.