A vector containing the contribution of each feature to the prediction for every input object, together with the expected value of the model prediction for the object (the average prediction given no knowledge about the object).
- $\phi_i$ is the contribution of the i-th feature.
- $\phi_0$ is the expected value of the model prediction.
For a given object, the sum of all feature contributions and the expected value, $\phi_0 + \sum_i \phi_i$, is equal to the prediction on this object.
This is an implementation of the Consistent Individualized Feature Attribution for Tree Ensembles approach.
See the ShapValues file format.
Use the SHAP package to plot the returned values.
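The feature importance $\phi_i$ is calculated as follows for each feature $i$:

$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (M - |S| - 1)!}{M!} \left[ f_x(S \cup \{i\}) - f_x(S) \right]$$

- $M$ is the number of input features.
- $N$ is the set of all input features.
- $S$ is the set of non-zero feature indices (the features that are being observed and not unknown).
- $f_x(S) = E[f(x) \mid x_S]$ is the model's prediction for the input $x$, where $E[f(x) \mid x_S]$ is the expected value of the function conditioned on a subset $S$ of the input features.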
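
As a rough illustration of the quantities listed above, the sketch below computes exact Shapley values by enumerating every feature subset and then checks that the contributions plus the expected value add up to the prediction. It is not the tree-based algorithm the library uses: the enumeration is exponential in the number of features and only practical for tiny inputs. The toy `model`, the `BACKGROUND` rows, and the choice to estimate the conditional expectation by substituting background values for unobserved features are all assumptions made for this example.

```python
from itertools import combinations
from math import factorial


def model(x):
    # Hypothetical toy model (an assumption): one additive term, one interaction.
    return 2.0 * x[0] + x[1] * x[2]


# Hypothetical background dataset used to average out unobserved features.
BACKGROUND = [
    (0.0, 0.0, 0.0),
    (1.0, 2.0, 0.0),
    (2.0, 0.0, 4.0),
    (1.0, 2.0, 4.0),
]


def value(x, subset):
    """Estimate f_x(S): features in `subset` are fixed to their values in x,
    the remaining features are taken from each background row in turn."""
    total = 0.0
    for row in BACKGROUND:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += model(mixed)
    return total / len(BACKGROUND)


def shapley_values(x):
    """Brute-force Shapley values: for each feature i, sum the weighted
    marginal gain f_x(S + {i}) - f_x(S) over all subsets S without i."""
    m = len(x)  # number of input features
    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        contribution = 0.0
        for size in range(m):
            # Weight |S|! (M - |S| - 1)! / M! depends only on the subset size.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for combo in combinations(others, size):
                s = set(combo)
                contribution += weight * (value(x, s | {i}) - value(x, s))
        phi.append(contribution)
    return phi


x = (3.0, 3.0, 1.0)
phi = shapley_values(x)
expected_value = value(x, set())  # prediction with no feature observed
prediction = model(x)

# The contributions and the expected value should add up to the prediction.
print(phi, expected_value, prediction)
```

Because the weights form a probability distribution over feature orderings, the additivity check holds for any model and any background set, which is the same property stated for the returned vector.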
The complexity of computation depends on several conditions:
If the mean leaf count in the tree is less than the number of documents and trees are oblivious:
In all other cases:
- samples_count is the number of documents in the dataset.
- dimension is the dimensionality for Multiclassification and Multiregression.
- trees_count is the number of trees.
- depth is the maximum depth of the trees.
- average_depth is the average depth of the trees.
- leaves_in_tree is the number of leaves in the tree.
- features_in_tree_count is the number of features in the tree.