SHAP feature importances tested
I am currently reading Advances in Financial Machine Learning by Marcos Lopez de Prado, and the author emphasises examining trained models before putting any faith in them - something I wholeheartedly agree with. Since interpreting models is important, Marcos put several methods of computing feature importances to the test, in the hope of determining the strengths and weaknesses of each method.
I am interested in examining tree-based models, which I briefly talked about in my previous posts 1, 2, and I have become an advocate for using the SHAP library for computing feature importances. Marcos examined permutation feature importance, mean impurity decrease and single-feature importance (where a classifier is trained on a single feature at a time), and determined that the first two do quite well: they rank features that are really important higher than non-important ones.
Unfortunately, SHAP is missing from his analysis, so I decided to replicate his test on synthesised data for the library.
Creating a synthetic dataset
import pandas as pd
from sklearn.datasets import make_classification
n_samples = 10000
n_features = 40
n_informative = 10
n_redundant = 10
# With shuffle=False the columns come out ordered: informative, redundant, non-informative.
X_train, y_train = make_classification(n_samples=n_samples,
                                       n_features=n_features,
                                       n_informative=n_informative,
                                       n_redundant=n_redundant,
                                       shuffle=False)
col_names = [f'I_{i}' for i in range(n_informative)]
col_names += [f'R_{i}' for i in range(n_redundant)]
col_names += [f'N_{i}' for i in range(n_features - n_informative - n_redundant)]
df_train = pd.DataFrame(X_train, columns=col_names)
Following the methodology of Marcos, I created a dataset with 10 informative features I_*, 10 redundant features R_* (linear combinations of the informative ones) and 20 non-informative features N_*.
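As a quick sanity check (my own addition, not part of the original methodology), the redundant block can be regressed on the informative block; near-zero residuals confirm that every R_* column really is a linear combination of the I_* columns:
import numpy as np
# Regress the R_* columns on the I_* columns; tiny residuals mean the
# redundant features are exact linear combinations of the informative ones.
I_block = df_train[[f'I_{i}' for i in range(n_informative)]].values
R_block = df_train[[f'R_{i}' for i in range(n_redundant)]].values
_, residuals, _, _ = np.linalg.lstsq(I_block, R_block, rcond=None)
print(residuals)  # one near-zero sum of squared residuals per R_* column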
I then trained a LightGBM classifier and computed the SHAP values. The LightGBM library computes SHAP values without installing extra dependencies:
import numpy as np
from lightgbm import LGBMClassifier
classifier = LGBMClassifier()
classifier.fit(df_train, y_train)
# pred_contrib=True returns per-row SHAP contributions; the last column is the
# expected value (the bias term), so it is dropped.
shap_values = classifier.predict(df_train, pred_contrib=True)[:, :-1]
# Aggregate: sum of absolute contributions per feature, normalised to sum to one.
shap_feature = np.abs(shap_values).sum(axis=0)
shap_feature = shap_feature / shap_feature.sum()
feature_importances = pd.DataFrame(
    {'SHAP importance': shap_feature, 'feature name': col_names})
In the above, I sum the absolute SHAP contributions over all rows for each feature. Notice I had to take the absolute value, as contributions can be negative. Finally, let's plot the SHAP feature importances using Altair:
import altair as alt
base = alt.Chart(feature_importances)
bar = base.mark_bar().encode(
    x='SHAP importance:Q',
    y=alt.Y("feature name:O",
            sort=alt.EncodingSortField(
                field='SHAP importance',
                order='descending')))
rule = base.mark_rule(color='red').encode(
    x='mean(SHAP importance):Q')
(bar + rule).properties(width=630)
In the above bar chart we see that all informative and redundant features score higher than the non-informative ones. This is a manifestation of the consistency of SHAP values: more important features should score higher. The red line is the mean score. We see that certain informative and redundant features, specifically I_2, I_9, R_6, R_9 and R_5, are below the average. I don't think that's a particularly bad sign, although Marcos typically remarks on such an observation as a negative aspect of the method.
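As an aside, the same importances can be reproduced with the shap package itself instead of LightGBM's built-in pred_contrib. Here is a minimal sketch, assuming shap is installed; the return type of shap_values for binary classifiers has changed between shap versions, hence the defensive handling:
import shap
explainer = shap.TreeExplainer(classifier)
sv = explainer.shap_values(df_train)
# Depending on the shap version, a binary classifier may return a list of
# per-class arrays or a 3-d array; keep the positive-class contributions.
if isinstance(sv, list):
    sv = sv[1]
elif sv.ndim == 3:
    sv = sv[..., 1]
check = np.abs(sv).sum(axis=0)
check = check / check.sum()  # should closely match shap_feature from above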
Conclusion
So far, nothing wrong with SHAP values has been detected. For completeness, I am including mean impurity decrease (MID) and permutation importances (based on the ROC AUC score). All three methods are in rough agreement, so perhaps this test isn't very informative.
And the code:
from sklearn.metrics import roc_auc_score
# Permutations: drop in ROC AUC after shuffling each column in turn
base_score = roc_auc_score(y_train, classifier.predict_proba(X_train)[:, 1])
importances = []
for i in range(X_train.shape[1]):
    A = X_train.copy()
    A[:, i] = np.random.permutation(A[:, i])
    proba = classifier.predict_proba(A)[:, 1]
    importances.append(base_score - roc_auc_score(y_train, proba))
importances = np.array(importances) / np.array(importances).sum()
perm_importances = pd.DataFrame(
    {'permutation importance': importances, 'feature name': col_names})
# MID: LightGBM's built-in feature importances, normalised.
# Note: LGBMClassifier defaults to importance_type='split' (split counts);
# pass importance_type='gain' for an impurity-decrease-style measure.
mid = classifier.feature_importances_ / np.sum(classifier.feature_importances_)
mid_importances = pd.DataFrame(
    {'MID importance': mid, 'feature name': col_names})
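For comparison with the third method mentioned in the introduction, here is a rough sketch of single-feature importance (my own addition, not part of the original post): train a fresh classifier on one column at a time and record its in-sample ROC AUC.
# Single-feature importance (SFI): one classifier per feature, scored by ROC AUC.
sfi = []
for i in range(X_train.shape[1]):
    clf = LGBMClassifier()
    clf.fit(X_train[:, [i]], y_train)
    sfi.append(roc_auc_score(y_train, clf.predict_proba(X_train[:, [i]])[:, 1]))
sfi_importances = pd.DataFrame(
    {'SFI importance (ROC AUC)': sfi, 'feature name': col_names})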