ShapValues
A vector $v$ with the contributions of each feature to the prediction for every input object and the expected value of the model prediction for the object (the average prediction given no knowledge about the object).
- $v_i$ for $i \in [0, M-1]$ is the contribution of the i-th feature, where $M$ is the number of input features.
- $v_M$ is the expected value of the model prediction.
For a given object, the sum $\sum_{i=0}^{M} v_i$ is equal to the prediction on this object.
This is an implementation of the Consistent Individualized Feature Attribution for Tree Ensembles approach.
See the ShapValues file format.
Use the SHAP package to plot the returned values.
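As a minimal sketch of how one row of the returned values can be read, the snippet below splits a row into per-feature contributions and the expected value, then recovers the prediction as their sum. It assumes the expected value is stored as the last element of the row, which is the layout returned by `get_feature_importance(..., type="ShapValues")` in the CatBoost Python package. The helper name `split_shap_row` and the numbers in `row` are hypothetical and serve only as an illustration.

```python
def split_shap_row(row):
    """Split one ShapValues row into (contributions, expected_value, prediction).

    Assumes the row holds one contribution per feature followed by the
    expected value of the model prediction as the last element.
    """
    contributions = list(row[:-1])   # one value per input feature
    expected_value = row[-1]         # average prediction with no knowledge of the object
    prediction = sum(row)            # contributions + expected value
    return contributions, expected_value, prediction


# Hypothetical row for a model with three features (made-up values).
row = [0.5, -0.25, 1.0, 2.0]
contributions, expected_value, prediction = split_shap_row(row)
print(contributions, expected_value, prediction)
```

For classification models the sum typically corresponds to the raw model output rather than a probability, so compare it against the raw prediction when checking this property.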
Calculation principles
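The feature importance $\phi_i$ is calculated as follows for each feature $i$:
$$\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (M - |S| - 1)!}{M!} \left[ f_x(S \cup \{i\}) - f_x(S) \right]$$
- $M$ is the number of input features.
- $N$ is the set of all input features.
- $S$ is the set of non-zero feature indices (the features that are being observed and not unknown).
- $f_x(S) = E[f(x) \mid x_S]$ is the model's prediction for the input $x$, where $E[f(x) \mid x_S]$ is the expected value of the function conditioned on a subset $S$ of the input features.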
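To make the definition concrete, here is a brute-force sketch that enumerates every subset of features and weights each marginal contribution by |S|! (M - |S| - 1)! / M!. It is not how CatBoost computes the values (the tree-based algorithm avoids this exponential enumeration). As a simplifying assumption, the conditional expectation is estimated by averaging the model over a small background sample with the observed features fixed. The toy model `f`, the `background` rows, and all function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def conditional_expectation(f, x, background, subset):
    """Estimate f_x(S): average f over background rows with features in S fixed to x."""
    total = 0.0
    for row in background:
        mixed = [x[j] if j in subset else row[j] for j in range(len(x))]
        total += f(mixed)
    return total / len(background)


def shapley_values(f, x, background):
    """Return [phi_0, ..., phi_{M-1}, expected_value] by exhaustive enumeration."""
    m = len(x)
    phi = []
    for i in range(m):
        others = [j for j in range(m) if j != i]
        value = 0.0
        for size in range(m):
            # Weight of every subset S with |S| = size that excludes feature i.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                with_i = conditional_expectation(f, x, background, s | {i})
                without_i = conditional_expectation(f, x, background, s)
                value += weight * (with_i - without_i)
        phi.append(value)
    expected_value = conditional_expectation(f, x, background, set())
    return phi + [expected_value]


def f(z):
    # Hypothetical toy model: one linear term plus one interaction term.
    return 2 * z[0] + z[1] * z[2]


background = [[0, 0, 0], [1, 1, 1], [0, 1, 0], [1, 0, 1]]
x = [1, 2, 3]
v = shapley_values(f, x, background)
print(v, sum(v), f(x))
```

The cost grows as 2^M model evaluations per feature, so this is only practical for a handful of features. It is useful mainly for checking that the contributions plus the expected value add up to the prediction.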
Complexity of computation
The complexity of computation depends on several conditions:
- If the mean leaf count in the tree is less than the number of documents and trees are oblivious:
- In all other cases:
Used variables:
- samples_count is the number of documents in the dataset.
- dimension is the dimensionality for Multiclassification and Multiregression.
- trees_count is the number of trees.
- depth is the maximum depth of the trees.
- average_depth is the average depth of the trees.
- leaves_in_tree is the number of leaves in the tree.
- features_in_tree_count is the number of features in the tree.
Related information
Detailed information regarding usage specifics for different CatBoost implementations.