get_feature_importance
Calculate and return the feature importances.
Method call format
get_feature_importance(data=None,
reference_data=None,
type=EFstrType.FeatureImportance,
prettified=False,
thread_count=-1,
verbose=False,
log_cout=sys.stdout,
log_cerr=sys.stderr)
Parameters
data
Description
The dataset for feature importance calculation.
The required dataset depends on the selected feature importance calculation type (specified in the type parameter):
- PredictionValuesChange — Either None or the same dataset that was used for training if the model does not contain information regarding the weight of leaves. All models trained with CatBoost version 0.9 or higher contain leaf weight information by default.
- LossFunctionChange — Any dataset. Feature importances are calculated on a subset for large datasets.
- PredictionDiff — A list of object pairs.
Possible types
catboost.Pool
Default value
None. A dataset is required for the LossFunctionChange and ShapValues types of feature importance, and when the model does not contain information regarding the weight of leaves.
reference_data
Description
Reference data for Independent Tree SHAP values, as described in "Explainable AI for Trees: From Local Explanations to Global Understanding". If type is ShapValues and reference_data is not None, then Independent Tree SHAP values are calculated.
Possible types
catboost.Pool
Default value
None
type
Alias: fstr_type (deprecated, use type instead)
Description
The type of feature importance to calculate.
Possible values:
- FeatureImportance: Equal to PredictionValuesChange for non-ranking metrics and LossFunctionChange for ranking metrics (the value is determined automatically).
- ShapValues: A vector with contributions of each feature to the prediction for every input object and the expected value of the model prediction for the object (average prediction given no knowledge about the object).
- Interaction: The value of the feature interaction strength for each pair of features.
- PredictionDiff: A vector with contributions of each feature to the RawFormulaVal difference for each pair of objects.
Possible types
EFstrType
Default value
FeatureImportance
prettified
Description
Return the feature importances as a list of the following pairs sorted by feature importance:
(feature_id, feature importance)
Should be used if one of the following values of the type parameter is selected:
- PredictionValuesChange
- LossFunctionChange
Possible types
bool
Default value
False
thread_count
Description
The number of threads to use when calculating feature importance.
Optimizes the speed of execution. This parameter doesn't affect results.
Possible types
int
Default value
-1 (the number of threads is equal to the number of processor cores)
verbose
Description
The purpose of this parameter depends on the type of the given value:
- bool: Output progress to stdout. Works with the ShapValues type of feature importance calculation.
- int: The logging period.
Possible types
- bool
- int
Default value
False
log_cout
Output stream or callback for logging.
Possible types
- callable Python object
- Python object providing the write() method
Default value
sys.stdout
log_cerr
Error stream or callback for logging.
Possible types
- callable Python object
- Python object providing the write() method
Default value
sys.stderr
Type of return value
Depends on the selected feature strength calculation method:
- PredictionValuesChange or LossFunctionChange with the prettified parameter set to False: a list of length [n_features] with a float feature importance value for each feature
- PredictionValuesChange or LossFunctionChange with the prettified parameter set to True: a list of length [n_features] with (feature_id (string), feature_importance (float)) pairs, sorted by feature importance values in descending order
- ShapValues: np.array of shape (n_objects, n_features + 1) with float ShapValues for each (object, feature)
- Interaction: list of length [n_features] of three-element lists of (first_feature_index, second_feature_index, interaction_score (float))