Existing methods to interpret AI models can be placed along three axes, each representing a “dimension” of interpretability.

The three dimensions of interpretability were first defined by Samek et al. on heatmapping.org: interpreting the data, interpreting the model, and interpreting the outcomes. The video below shows example methods for each dimension. We will then analyze one approach from each of the last two dimensions: Activation Maximization to interpret the model, and Feature Attribution methods to interpret the outcomes. You will learn to distinguish the main theoretical differences between these two approaches. During this class, you will use the Lucid toolbox to generate visualizations of what maximally activates a given neuron in the network, and you will learn how to implement class activation mapping as a feature attribution method.
READING
- Interpreting the data. Maximum Mean Discrepancy criticism and Influential Instances
- Interpreting the model with built-in interpretability. Ex. Interpretable Decision Sets
- Interpreting the model with Feature Visualization
- Deconvolution
- Gradient Ascent – Assignment 1
- Dissection
- Interpreting the outcome with feature attribution
- Saliency maps
- Sensitivity maps
- Activation maps – Assignment 2
ASSIGNMENTS
Feel free to compare your results, share insights, or ask for help on our Discord.
- Clone the GitHub repository https://github.com/maragraziani/intro-interpretableAI.git and open the first Colab notebook, L2A1 – Lucid Tutorial. Follow the notebook and generate visualizations with the toolbox. Change the models being analyzed in the notebook: for example, visualize InceptionV1 or ResnetV1_101_slim. Extend the visualizations to multiple layers of your architecture and analyze some of the neurons. What do you think they are looking at? Can you guess for some of them? Post your results, together with a short description, on our Discord. If you need help or inspiration, check the full analysis of InceptionV1 by Olah et al. Finally, visualize some entire layers by optimizing the deepdream objective (see the Lucid sketch after this list). What kind of differences do you see as you go deeper in the network?
- Open the Colab notebook L2A2 – GradCAM Tutorial and generate Grad-CAM visualizations for the inputs provided there. Extend the experiments to more models: generate visualizations for the same inputs with VGG, ResNet, and InceptionV3 using ImageNet-pretrained weights (a self-contained Grad-CAM sketch follows this list). How do the activation maps differ? How are they similar?
- Keep working on the Grad-CAM notebook. Extract the class object from the image by applying a fixed threshold and segmenting out everything below it, obtaining images similar to the one below. How can you obtain this segmentation (see the thresholding sketch after this list)? How does it change as you move the threshold?
- Check out the Grad-CAM paper. How can you change the grad_cam function in the Colab notebook to obtain counterfactual explanations (see the counterfactual sketch after this list)? Generate some counterfactual explanations for the images in A2, and comment on the limitations of Grad-CAM and feature visualization methods.
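For the first assignment, the sketch below shows roughly what neuron- and layer-level visualizations look like with Lucid, assuming a TensorFlow 1.x environment with the lucid package installed. The layer and channel names (mixed4a_pre_relu, channel 476) are only illustrative and are not prescribed by the notebook.

```python
# Minimal Lucid sketch (TensorFlow 1.x): activation maximization for one channel,
# then a whole-layer visualization with the deepdream objective.
# The layer/channel names are illustrative; inspect model.layers for valid names.
import lucid.modelzoo.vision_models as models
from lucid.optvis import objectives, render

model = models.InceptionV1()   # swap for models.ResnetV1_101_slim() to compare architectures
model.load_graphdef()

# Maximize the activation of a single channel in a mid-level mixed layer
channel_obj = objectives.channel("mixed4a_pre_relu", 476)
render.render_vis(model, channel_obj)

# Visualize an entire layer by optimizing the deepdream objective
layer_obj = objectives.deepdream("mixed4a")
render.render_vis(model, layer_obj)
```

When you swap the model, remember to update the layer names to those listed in model.layers.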
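For the second assignment, here is a compact, self-contained Grad-CAM sketch written from scratch with TensorFlow 2 / Keras. It is not the notebook's own grad_cam helper; the function name grad_cam_heatmap and the VGG16 example layer block5_conv3 are illustrative choices.

```python
# Self-contained Grad-CAM sketch (TensorFlow 2 / Keras), independent of the notebook.
import numpy as np
import tensorflow as tf

def grad_cam_heatmap(model, image, conv_layer_name, class_index=None):
    """Weight the conv feature maps by the pooled gradients of the class score,
    then ReLU and normalize to [0, 1]."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_maps, preds = grad_model(image[np.newaxis])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))        # explain the top predicted class
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_maps)         # d(score) / d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))       # global-average-pool the gradients
    cam = tf.reduce_sum(conv_maps[0] * weights, axis=-1)  # weighted sum over channels
    cam = tf.nn.relu(cam)                                 # keep only positive evidence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()

# Example with an ImageNet-pretrained VGG16; "block5_conv3" is its last conv layer.
model = tf.keras.applications.VGG16(weights="imagenet")
image = np.random.uniform(0, 255, (224, 224, 3)).astype("float32")  # replace with a real image
image = tf.keras.applications.vgg16.preprocess_input(image)
heatmap = grad_cam_heatmap(model, image, "block5_conv3")
```

Upsampling the heatmap to the input resolution and overlaying it on the image reproduces the usual Grad-CAM figures; for ResNet50 and InceptionV3, only the name of the last convolutional layer changes.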
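For the third assignment, one way to turn such a heatmap into a rough object segmentation is to upsample it to the input resolution and mask everything below a fixed threshold. This thresholding sketch continues from the grad_cam_heatmap example above; the 0.5 threshold is an arbitrary starting point.

```python
# Rough segmentation from a Grad-CAM heatmap: resize to the image size,
# threshold, and zero out everything below the threshold.
import numpy as np
import tensorflow as tf

def segment_with_cam(image, heatmap, threshold=0.5):
    cam = tf.image.resize(heatmap[..., np.newaxis], image.shape[:2]).numpy()[..., 0]
    mask = (cam >= threshold).astype(image.dtype)   # binary object mask
    return image * mask[..., np.newaxis], mask

# Reusing `image` and `heatmap` from the Grad-CAM sketch above
segmented, mask = segment_with_cam(image, heatmap, threshold=0.5)
```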
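For the last assignment, the counterfactual explanations described in the Grad-CAM paper are obtained by negating the gradients before pooling, so the map highlights regions that argue against the class. The sketch below repeats the grad_cam_heatmap recipe with only that sign flip; in your notebook you would more likely add a flag to the existing grad_cam function than duplicate it.

```python
# Counterfactual Grad-CAM sketch: identical to grad_cam_heatmap above except that
# the gradients are negated before pooling (the paper's counterfactual explanations).
import numpy as np
import tensorflow as tf

def counterfactual_cam(model, image, conv_layer_name, class_index=None):
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_maps, preds = grad_model(image[np.newaxis])
        if class_index is None:
            class_index = int(tf.argmax(preds[0]))
        class_score = preds[:, class_index]
    grads = -tape.gradient(class_score, conv_maps)   # the only change: flip the sign
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))
    cam = tf.nn.relu(tf.reduce_sum(conv_maps[0] * weights, axis=-1))
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()
```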
REFERENCES
“Learning Deep Features for Discriminative Localization”, Zhou, B., et al., 2016
“Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization”, Selvaraju, R. R., et al., 2017
“Feature Visualization: How neural networks build up their understanding of images”, Olah, C., et al., 2017