get_feature_importance

Calculate and return the feature importances.

Method call format

get_feature_importance(data=None,
                       reference_data=None,
                       type=EFstrType.FeatureImportance,
                       prettified=False,
                       thread_count=-1,
                       verbose=False,
                       log_cout=sys.stdout,
                       log_cerr=sys.stderr)

Parameters

data

Description

The dataset for feature importance calculation.

The required dataset depends on the selected feature importance calculation type (specified in the type parameter):

  • PredictionValuesChange — Either None or the same dataset that was used for training if the model does not contain information regarding the weight of leaves. All models trained with CatBoost version 0.9 or higher contain leaf weight information by default.
  • LossFunctionChange — Any dataset. Feature importances are calculated on a subset for large datasets.
  • PredictionDiff — A list of object pairs.

Possible types

catboost.Pool

Default value

Required for the LossFunctionChange and ShapValues types of feature importance, and for any type when the model does not contain leaf weight information.

None otherwise.

reference_data

Description

Reference data for Independent Tree SHAP values from Explainable AI for Trees: From Local Explanations to Global Understanding. If type is ShapValues and reference_data is not None, then Independent Tree SHAP values are calculated.

Possible types

catboost.Pool

Default value

None

type

Alias: fstr_type (deprecated, use type instead)

Description

The type of feature importance to calculate.

Possible values:

  • FeatureImportance: Equal to PredictionValuesChange for non-ranking metrics and LossFunctionChange for ranking metrics (the value is determined automatically).

  • ShapValues: A vector v with contributions of each feature to the prediction for every input object and the expected value of the model prediction for the object (average prediction given no knowledge about the object).

  • Interaction: The value of the feature interaction strength for each pair of features.

  • PredictionDiff: A vector with contributions of each feature to the RawFormulaVal difference for each pair of objects.

Possible types

Note

It is recommended to use EFstrType for this parameter.

Default value

FeatureImportance

prettified

Description

Return the feature importances as a list of the following pairs sorted by feature importance:

(feature_id, feature importance)

Should be used if one of the following values of the type parameter is selected:

  • PredictionValuesChange
  • LossFunctionChange

Possible types

bool

Default value

False

thread_count

Description

The number of threads to use for operation.

Optimizes the speed of execution. This parameter doesn't affect results.

Possible types

int

Default value

-1 (the number of threads is equal to the number of processor cores)

verbose

Description

The purpose of this parameter depends on the type of the given value:

  • bool — Output progress to stdout.

    Works with the ShapValues type of feature importance calculation.

  • int — The logging period.

Possible types

  • bool
  • int

Default value

False

log_cout

Description

Output stream or callback for logging.

Possible types

  • callable Python object
  • Python object providing the write() method

Default value

sys.stdout

log_cerr

Description

Error stream or callback for logging.

Possible types

  • callable Python object
  • Python object providing the write() method

Default value

sys.stderr

Type of return value

Depends on the selected feature strength calculation method:

  • PredictionValuesChange or LossFunctionChange with the prettified parameter set to False: a list of length [n_features] with float feature importance values for each feature
  • PredictionValuesChange or LossFunctionChange with the prettified parameter set to True: a list of length [n_features] with (feature_id (string), feature_importance (float)) pairs, sorted by feature importance values in descending order
  • ShapValues: np.array of shape (n_objects, n_features + 1) with float ShapValues for each (object, feature)
  • Interaction: list of length [n_features] of three-element lists of (first_feature_index, second_feature_index, interaction_score (float))