Lecture 1

May 27 16.00 – 17.30

Prof. Henning Müller and Mara Graziani (PhD student)

University of Applied Sciences of Western Switzerland

The Where and Why of Interpretability

This class will focus on understanding the definition of interpretability. Understanding the inner mechanisms of machine learning algorithms, and especially of deep learning models, has been the subject of several research works. The variety in the objectives of these works, e.g. debugging, justifying outcomes, evaluating robustness or fairness, has led to inconsistencies in how interpretable AI is defined. Multiple works use different terminology, and this may cause confusion when approaching AI interpretability.

During this class, we will look at the definitions of the words “interpretable”, “explainable”, “intelligible”, “understandable”, “transparent” and “comprehensible”, which are often used interchangeably in the literature. To understand multiple perspectives on the terminology, we will read and discuss papers describing the problem of “interpreting” automated decisions. We will start from the formal definition by Doshi-Velez and Kim of interpretability as the act of “explaining or presenting in understandable terms to a human”.

What does “understandable to a human” mean from the cognitive and psychological perspective? What are the legal constraints around giving explanations? What is the social impact of an explanation, what are the ethical concerns? These questions, among others, will be discussed in this class.

Class Outline
  1. Introduction to the course and motivation by Prof. Henning Müller
  2. Overview of the course structure, resources, assignments and contact points
  3. Description of course assessment modalities
  4. On the “lack of rigor” in the definition of interpretability
  5. Viewpoints from the cognitive sciences, implications for society, and the ethical use of AI
  6. Requirements from the General Data Protection Regulation about the use of AI tools (European legislation)
  7. Taxonomy based on the methodology and other taxonomies


Material

“The Mythos of Model Interpretability”, Zachary Lipton, 2017.

“Towards a Rigorous Science of Interpretable Machine Learning”, Finale Doshi-Velez and Been Kim, 2017.

Medium posts referenced in the video: A Gentle Introduction to Deep Learning — [Part 1 ~ Introduction] and Deep Learning vs Classical Machine Learning

“Interpretable Machine Learning: Definitions, Methods, and Applications”, Murdoch et al., 2019.


Introduce yourself on the Facebook group dedicated to the course. What is your name? What university are you enrolled in? What is your application domain? What sort of tasks would you want to apply interpretability techniques to? No score will be given to this assignment, but you are strongly encouraged to interact with the other students.

Read the papers in the material section with a critical eye. Join the discussion on the group page. No score will be given to this assignment, but still, try to find an answer to the questions in the list below:

  • The very first AI systems were easily interpretable. Do you have an example of an easy-to-interpret machine learning model? Can you make this model more complex (e.g. a larger number of features, more steps in the decision-making process, intense parameter tuning)? Can you see a trade-off between the complexity of this model and its interpretability? How can we define model transparency? What is the main difference when we refer to transparency as opposed to the explainability of a model?
  • Interpretability is formally defined as the “ability to explain or to present in understandable terms to a human”. How can we characterize an explanation and what makes some explanations better than others?
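The trade-off raised in the first question can be sketched with a toy example. Everything below is a hypothetical illustration, not part of the course material: the credit-allowance scenario, the feature names and the thresholds are all invented. A one-rule “decision stump” is fully transparent, since the entire decision procedure fits in a single readable rule; adding features and branches keeps each individual step readable, yet the model's overall behaviour becomes harder to summarise at a glance.

```python
# Hypothetical credit-allowance example: thresholds and features are invented.

def stump_approve(income):
    """Transparent model: the whole decision is one readable rule."""
    return income >= 50_000  # approve if annual income is at least 50k


def deeper_tree_approve(income, debt_ratio, years_employed):
    """More complex model: each branch is still human-readable,
    but the global behaviour is harder to grasp as one rule."""
    if income >= 50_000:
        if debt_ratio < 0.4:
            return True
        return years_employed >= 5
    if years_employed >= 10:
        return debt_ratio < 0.2
    return False
```

With many more features, deeper branching, and tuned split thresholds, the same tree-growing process eventually produces a model whose individual steps remain simple but whose overall logic no one can hold in mind, which is one way to make the complexity–interpretability trade-off concrete.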


In the next course, we will analyze the three dimensions of interpretable AI. There is, however, a further dimension of generating explanations for automated outcomes that must be considered as a preliminary step to the interpretability analysis. Explanations have a social dimension: they are an important part of human interactions and enable the construction of trust and confidence around algorithms.

  • Given an application scenario of your choice (e.g. autonomous driving, healthcare, credit allowance, …), can you imagine how many different recipients may require an explanation for the model’s automated decision? What kind of detail and information content would be needed for each of them? For example, for AI applications in the medical domain, recipients of explanations may be patients, doctors and institutions. Each of these recipients may need a different type of information content as an explanation.


A. Weller in “Transparency: Motivation and Challenges” identifies two main roles involved in transparency: the audience of an explanation and the beneficiary. What is the distinction between these two roles? Do you see any ethical concerns that may arise around the generation of explanations? (Hint: what if explanations were developed to soothe, or to convince, users?)
