The “where” and “why” of interpretability

Interpretability of AI has to be placed in the wider context of AI Safety.

WHY Interpretability?

[Video: Henning’s opening of the course, May 2021]

WHERE Interpretability?

While not all ML models require interpretability, it is desirable in many applications, particularly for high-stakes predictions. For example, interpretability matters when ML is deployed in critical areas such as medicine, finance, and law [2]. Interpreting a model can provide insights about the problem and the data, help detect biases in the model, and expose its failure cases. Moreover, the social acceptance of ML models incorporated into people’s daily lives may depend on an understanding of their behavior. The EU regulation on data protection asserts a right to meaningful information about the logic involved in automated decision processes, which has started a discussion about the meaning of interpretation and algorithmic transparency [9].

Let’s look again at Doshi-Velez and Kim’s alternative formal definition of interpretability as the act of “explaining or presenting in understandable terms to a human”.

What does “understandable to a human” mean from the cognitive and psychological perspective?

Understandability is the property of an object, whether a model or the output of an interpretability method, of being understood by a human. Because the wiring of the neurons in the brain areas that constitute semantic memory results from individual experience, understandability involves some degree of subjectivity and variability: what is understandable to one person may not be understandable to another. In the specific context of AI interpretability, achieving understandability should not require prior training in feature extraction, hyper-parameter selection, or the training of ML models.

What are the legal constraints around giving explanations? What is the social impact of an explanation, what are the ethical concerns?


Use these discussion prompts in our Discord

– The very first AI systems were easily interpretable. Can you give an example of an easy-to-interpret machine learning model? Can you make this model more complex (e.g. a larger number of features, more steps in the decision making, intensive parameter tuning)? Do you see a trade-off between the complexity of this model and its interpretability? How can we define model transparency? What is the main difference between the transparency and the explainability of a model?

– Interpretability is formally defined as the “ability to explain or to present in understandable terms to a human”. How can we characterize an explanation and what makes some explanations better than others?

– There is a dimension of generating explanations for automated outcomes that must be considered as a prior step to the interpretability analysis. Explanations have a social dimension: they are an important part of human interactions and enable the construction of trust and confidence around algorithms. Given an application scenario of your choice (e.g. autonomous driving, healthcare, credit approval, …), how many different recipients may require an explanation of the model’s automated decisions? What level of detail and what information content would each of them need? For example, for AI applications in the medical domain, recipients of explanations may include patients, doctors, and institutions, each needing a different type of information content as an explanation.

– A. Weller, in “Transparency: Motivation and Challenges”, identifies two main roles around transparency: the audience of an explanation and the beneficiary. What is the distinction between these two figures? Do you see any ethical concerns that may arise around the generation of explanations? (Hint: what if explanations were developed to soothe, or to convince, users?)
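As a starting point for the first prompt above, here is a minimal sketch of a model that is transparent by construction: a hand-written linear scoring rule, where each coefficient states exactly how much each feature contributes to the decision. The feature names and weights are purely hypothetical, chosen for illustration.

```python
# A hypothetical linear scoring rule for, say, credit decisions.
# It is interpretable because the "explanation" is the model itself:
# every prediction decomposes exactly into per-feature contributions.
WEIGHTS = {"income": 0.5, "debt": -0.8, "years_employed": 0.3}
BIAS = -0.2

def score(features):
    """Linear score: bias plus the sum of weight * value per feature."""
    return BIAS + sum(WEIGHTS[name] * value for name, value in features.items())

def explain(features):
    """Per-feature contributions to the score, readable by a human."""
    return {name: WEIGHTS[name] * value for name, value in features.items()}

applicant = {"income": 1.0, "debt": 0.5, "years_employed": 2.0}
print(score(applicant))    # a single transparent number
print(explain(applicant))  # the exact contribution of each feature
```

Notice how interpretability degrades as complexity grows: replace the linear rule with interactions between features, thousands of weights, or learned nonlinear transformations, and the per-feature decomposition above no longer tells the whole story. This is the trade-off the prompt asks about.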


“Interpretable machine learning: definitions, methods, and applications”, W. J. Murdoch et al., 2019

“Towards A Rigorous Science of Interpretable Machine Learning”, Finale Doshi-Velez and Been Kim, 2017

“The Mythos of Model Interpretability”, Zachary Lipton, 2017

A Gentle Introduction to Deep Learning — [Part 1 ~ Introduction] and Deep Learning vs Classical Machine Learning

