
Transparency and interpretability by design


Early machine learning models, such as linear models and decision trees, are often described as inherently interpretable, although how interpretable they actually are depends on the task and on the type of model.

Consider the following example. Given N input-output pairs (x, y), we assume that a linear function maps each input x to its label y. A linear regression model is a function of the form:

f(x) = xW + b

Learning a linear regression model means finding the parameters W and b that minimise the error over the observed data, typically the mean of the squared residuals (the mean squared error). The coefficients in W are directly interpretable: each one states how much f(x) changes when the corresponding feature of x increases by one unit, with all other features held fixed.
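To make this concrete, here is a minimal NumPy sketch that fits f(x) = xW + b by least squares on synthetic data. The ground-truth values `W_true` and `b_true` are illustrative assumptions, chosen so we can check that the fitted coefficients can be read off directly.

```python
import numpy as np

# Synthetic data: N samples, 3 features, known (illustrative) ground-truth W and b
rng = np.random.default_rng(0)
N = 500
X = rng.normal(size=(N, 3))
W_true = np.array([2.0, -1.0, 0.0])
b_true = 0.5
y = X @ W_true + b_true + 0.1 * rng.normal(size=N)

# Append a column of ones so the bias b is learned as an extra weight
X1 = np.hstack([X, np.ones((N, 1))])
# Least squares minimises (1/N) * sum_i (y_i - X1_i @ theta)^2, i.e. the mean squared error
theta, *_ = np.linalg.lstsq(X1, y, rcond=None)
W_hat, b_hat = theta[:3], theta[3]

mse = np.mean((X1 @ theta - y) ** 2)
for j, w in enumerate(W_hat):
    # Each coefficient: change in f(x) per unit increase of feature j, others held fixed
    print(f"feature {j}: {w:+.2f}")
print(f"bias: {b_hat:+.2f}, training MSE: {mse:.4f}")
```

The recovered coefficients are close to 2, -1 and 0, so feature 2 is correctly identified as irrelevant just by inspecting W.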

Not all problems, however, can be modelled sufficiently well* as one or a combination of linear regressions.

*sufficiently well = with high test accuracy


Explainable Boosting Machines (EBMs) are a type of generalised additive model that models the effect of each feature, plus selected pairwise interactions, in a fully interpretable way. Have a look at the documentation here.

Generalized Additive Model with automatic interaction detection. EBMs are often as accurate as state-of-the-art blackbox models while remaining TRANSPARENT
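The additive structure behind EBMs can be illustrated with a simplified sketch. This is an EBM-inspired toy, not the InterpretML implementation: it handles only squared-loss regression, represents each shape function f_j as a lookup table over quantile bins (a hypothetical choice), and cycles through the features round-robin with a small learning rate. Real EBMs boost shallow trees with bagging and can add automatically detected pairwise terms (the FAST procedure of Lou et al.), all of which is omitted here. The function names `fit_cyclic_gam` and `contributions` are hypothetical.

```python
import numpy as np

def fit_cyclic_gam(X, y, n_bins=16, n_rounds=200, lr=0.1):
    """Simplified, EBM-inspired additive regressor under squared loss.

    f(x) = intercept + sum_j f_j(x_j), where each shape function f_j is a
    lookup table over quantile bins of feature j. Training cycles through the
    features and, for one feature at a time, adds lr times the per-bin mean
    residual (a one-feature "tree" with one leaf per bin).
    """
    n, d = X.shape
    intercept = y.mean()
    inner_q = np.linspace(0, 1, n_bins + 1)[1:-1]
    edges = [np.unique(np.quantile(X[:, j], inner_q)) for j in range(d)]
    bins = np.column_stack([np.digitize(X[:, j], edges[j]) for j in range(d)])
    shapes = [np.zeros(len(e) + 1) for e in edges]
    pred = np.full(n, intercept)
    for _ in range(n_rounds):
        for j in range(d):
            resid = y - pred
            m = len(shapes[j])
            sums = np.bincount(bins[:, j], weights=resid, minlength=m)
            counts = np.bincount(bins[:, j], minlength=m)
            update = lr * sums / np.maximum(counts, 1)
            shapes[j] += update
            pred += update[bins[:, j]]
    return intercept, edges, shapes

def contributions(model, X):
    """Per-feature additive terms f_j(x_j); prediction = intercept + row sum."""
    intercept, edges, shapes = model
    terms = np.column_stack(
        [shapes[j][np.digitize(X[:, j], edges[j])] for j in range(X.shape[1])]
    )
    return intercept, terms

rng = np.random.default_rng(0)
X = rng.uniform(size=(2000, 3))
y = 2.0 * X[:, 0] + (X[:, 1] > 0.5).astype(float)  # feature 2 is irrelevant
model = fit_cyclic_gam(X, y)
b0, terms = contributions(model, X)
mse = np.mean((b0 + terms.sum(axis=1) - y) ** 2)
print("spread of each term:", terms.std(axis=0).round(3))
print("training MSE:", round(float(mse), 4))
```

Because the prediction is a plain sum of one-dimensional terms, plotting `shapes[j]` against the bin edges shows exactly how each feature drives the output: a ramp for feature 0, a step for feature 1 and a nearly flat line for feature 2.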

In the context of deep learning, making networks inherently interpretable generally means increasing their decomposability and algorithmic transparency by reducing their complexity and exploiting prior knowledge about the structure of the data. Enforcing geometric structure on hidden features also helps: equivariant representation learning, for example, maps transformations of the inputs to predictable transformations of the activations.
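The textbook instance of equivariance is the translation equivariance of convolutions. The minimal check below assumes a 1-D signal with circular boundaries (so shifts wrap around) and an arbitrary random filter; `circular_conv` is a hypothetical helper written for illustration.

```python
import numpy as np

def circular_conv(x, k):
    """1-D circular cross-correlation: out[i] = sum_m k[m] * x[(i + m) % n]."""
    n = len(x)
    return np.array([sum(k[m] * x[(i + m) % n] for m in range(len(k))) for i in range(n)])

rng = np.random.default_rng(0)
x = rng.normal(size=8)   # input signal
k = rng.normal(size=3)   # filter weights (arbitrary, untrained)
shift = 3

# Equivariance: shifting the input shifts the feature map by the same amount
lhs = circular_conv(np.roll(x, shift), k)
rhs = np.roll(circular_conv(x, k), shift)
print(np.allclose(lhs, rhs))  # True
```

Knowing this property in advance means that, for a shifted input, we can predict where every activation will move without running the network again.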


Built-in interpretability aims at building models that are interpretable by construction. This can be achieved along two paths: (i) developing models with a transparent design and inherent interpretability, or (ii) adding a self-explanatory module that generates explanations for the model's predictions.

In (i), transparency can be introduced in model design in multiple ways:

  1. By introducing parameter sparsity constraints, such as in Gradient LASSO.

An L1 penalty is added to the gradients w.r.t. the score function of the prediction compared to a uniformly random guess:


where K is the number of classes.

2. By using intelligible decision functions, e.g. monotone functions [Nguyen et al.].

3. By adding interpretability constraints to the model optimization, e.g. the interpretable decision sets of Lakkaraju et al. [KDD16], whose objective rewards parsimony, non-redundancy and class coverage of the learned rules.

4. By methods similar to concept whitening, which aligns the axes of a network's latent space with predefined concepts.
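The sparsity idea behind item 1 can be sketched with an L1 penalty on model parameters. Note that this is plain LASSO on the weights of a linear model, solved by proximal gradient descent (ISTA), and not the Gradient LASSO formulation itself; the data, `lam` value and helper names are illustrative assumptions.

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of t * ||w||_1: shrinks every weight towards zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    """Minimise (1/(2N)) * ||y - X w||^2 + lam * ||w||_1 by proximal gradient (ISTA)."""
    N = X.shape[0]
    step = N / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / N
        w = soft_threshold(w - step * grad, step * lam)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
w_true = np.zeros(10)
w_true[[0, 3]] = [3.0, -2.0]  # only two features matter
y = X @ w_true + 0.1 * rng.normal(size=200)

w_hat = lasso_ista(X, y, lam=0.1)
print(np.round(w_hat, 2))  # irrelevant features are driven to exactly 0
```

The soft-thresholding step sets small weights to exactly zero rather than merely shrinking them, so the surviving non-zero weights directly list the features the model relies on.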

In (ii), an explanation generator is added to the deep learning architecture to obtain a self-explanatory design that is trained without ground-truth explanations (i.e. unsupervised). For example, a selector module can be trained to choose a subset of the input features before they are passed to the network. Since the network's outcome depends only on this subset, the selected subset is itself an explanation of the outcome.
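The select-then-predict mechanism can be sketched as follows. This assumes a hypothetical selector that scores features with a linear map and keeps the top k, followed by a small predictor network. All weights are random placeholders: in practice both modules are trained jointly (for example via a continuous relaxation of the top-k choice), and that training step is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, hidden = 6, 2, 4

# Hypothetical, untrained parameters (training of selector and predictor omitted)
W_sel = rng.normal(size=(d, d))  # selector: produces one score per feature
W1, W2 = rng.normal(size=(d, hidden)), rng.normal(size=hidden)

def select(x):
    """Binary mask keeping the k highest-scoring features of x."""
    scores = x @ W_sel
    mask = np.zeros(d)
    mask[np.argsort(scores)[-k:]] = 1.0
    return mask

def predict(x_masked):
    """Predictor network: only ever sees the masked input."""
    return np.tanh(x_masked @ W1) @ W2

def explain_and_predict(x):
    mask = select(x)
    return predict(x * mask), mask  # the mask itself is the explanation

x = rng.normal(size=d)
y_hat, mask = explain_and_predict(x)
print("selected features:", np.flatnonzero(mask), "prediction:", round(float(y_hat), 3))
```

Because the predictor receives zeros in place of unselected features, changing those features cannot alter the prediction for a given mask, which is what makes the mask a faithful explanation.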


Use these discussion prompts in our Discord

  • Linear regression and generalised additive models are often held up as the epitome of transparent decision-making. Read the paper Scrutinizing XAI using linear ground-truth data with suppressor variables. What do the authors observe for the model in Eq. 2? What are suppressor variables?
  • Run the code example of EBMs on the Adult dataset, where the task is to predict whether a person earns more than $50k per year from general demographic and employment data. Run the explain_global() function. Which features emerge as informative? What happens if you re-train the EBM without the most informative feature? Optional: perform an ablation study driven by EBMs. Optional: compare the EBM's performance to that of XGBoost.
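For the ablation prompt, a drop-one-feature loop can be sketched independently of the model. In the sketch below the model is a least-squares stand-in scored by test R^2 on synthetic data, and the feature names are hypothetical; for the exercise you would replace `fit_and_score` with an EBM (or XGBoost) fit on the Adult data and its test accuracy.

```python
import numpy as np

def fit_and_score(X_tr, y_tr, X_te, y_te):
    """Stand-in model: least-squares linear regression scored by test R^2."""
    A_tr = np.hstack([X_tr, np.ones((len(X_tr), 1))])
    A_te = np.hstack([X_te, np.ones((len(X_te), 1))])
    theta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    resid = y_te - A_te @ theta
    return 1.0 - np.mean(resid ** 2) / np.var(y_te)

def drop_one_ablation(X_tr, y_tr, X_te, y_te, names):
    """Re-train once per feature with that feature removed; report the score drop."""
    base = fit_and_score(X_tr, y_tr, X_te, y_te)
    drops = {}
    for j, name in enumerate(names):
        keep = [i for i in range(X_tr.shape[1]) if i != j]
        drops[name] = base - fit_and_score(X_tr[:, keep], y_tr, X_te[:, keep], y_te)
    # Largest drop first: the most informative feature heads the list
    return base, dict(sorted(drops.items(), key=lambda kv: -kv[1]))

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * rng.normal(size=1000)
names = ["a", "b", "c"]  # hypothetical feature names; "c" is pure noise
base, drops = drop_one_ablation(X[:800], y[:800], X[800:], y[800:], names)
print(f"full model R^2: {base:.3f}")
for name, drop in drops.items():
    print(f"without {name}: R^2 drops by {drop:.3f}")
```

Comparing this ranking with the global importances reported by explain_global() is one way to check whether what the EBM shows matches what the model actually needs.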


Scrutinizing XAI using linear ground-truth data with suppressor variables.

Yin Lou, Rich Caruana, Johannes Gehrke, and Giles Hooker. Accurate intelligible models with pairwise interactions. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, 623–631. 2013.

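Himabindu Lakkaraju, Stephen H. Bach, and Jure Leskovec. Interpretable Decision Sets: A Joint Framework for Description and Prediction. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.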

Zachary Lipton. The Mythos of Model Interpretability. 2017.
