Calculate feature importance

Execution format

catboost fstr [-m <model name>] [--input-path] <dataset> --fstr-type <output format>  [other parameters]

Options

--fstr-type

Description

The feature importance output format.

Possible values include:

  • PredictionValuesChange.
  • LossFunctionChange.
  • ShapValues.

Default value

Required parameter

-m, --model-file, --model-path

Description

The name of the input file that contains the trained model.

Default value

model.bin

--model-format

Description

The format of the input model.

Possible values:

  • CatboostBinary.
  • AppleCoreML (only datasets without categorical features are currently supported).
  • json (multiclassification models are not currently supported). Refer to the CatBoost JSON model tutorial for format details.

Default value

CatboostBinary

--input-path

Description

The name of the input file with the dataset description.

This parameter is required in the following cases:

  • The feature importance format is set to LossFunctionChange or ShapValues.
  • The feature importance format is set to PredictionValuesChange and the model does not contain leaf weight information. All models trained with CatBoost version 0.9 or higher contain leaf weight information by default.

Default value

input.tsv
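
The rules above can be sketched as a guarded invocation. This is a minimal sketch rather than a reference command: the file names model.bin, input.tsv and train.cd are hypothetical placeholders, and the guard only prevents the command from running when the tool or the files are missing. ShapValues is used because it is one of the formats that needs the dataset.

```shell
#!/bin/sh
# Hypothetical file names; substitute your own paths.
MODEL="model.bin"
DATASET="input.tsv"
COLUMNS="train.cd"

if command -v catboost >/dev/null 2>&1 \
    && [ -f "$MODEL" ] && [ -f "$DATASET" ] && [ -f "$COLUMNS" ]; then
    # ShapValues requires both the dataset and its column description.
    catboost fstr -m "$MODEL" \
        --input-path "$DATASET" \
        --column-description "$COLUMNS" \
        --fstr-type ShapValues
    STATUS="ran"
else
    STATUS="skipped"
    echo "catboost or one of the input files was not found; nothing was run"
fi
```

For PredictionValuesChange with a model that stores leaf weight information, the --input-path and --column-description arguments can be dropped from the command.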

--column-description, --cd

Description

The path to the input file that contains the columns description.

This parameter is required in the following cases:

  • The feature importance format is set to LossFunctionChange or ShapValues.
  • The feature importance format is set to PredictionValuesChange and the model does not contain leaf weight information. All models trained with CatBoost version 0.9 or higher contain leaf weight information by default.

Default value

If omitted, it is assumed that the first column in the file with the dataset description defines the label value, and the other columns are the values of numerical features.
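
When the dataset does not match that default layout, a columns description file has to be supplied. The sketch below writes a small one, assuming the usual CatBoost layout of tab-separated lines with a zero-based column index followed by a column type. The dataset shape (label in column 0, a categorical feature in column 2) and the file name train.cd are hypothetical, and columns that are not listed are assumed to be treated as numerical.

```shell
#!/bin/sh
# Hypothetical location for the columns description file.
CD_FILE="${TMPDIR:-/tmp}/train.cd"

# One line per described column: <zero-based index><TAB><type>.
# Column 0 holds the label, column 2 is a categorical feature.
printf '0\tLabel\n2\tCateg\n' > "$CD_FILE"

# Show the resulting file.
cat "$CD_FILE"
```

The file can then be passed to the command with --column-description (or --cd) together with --input-path.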

--input-graph

Description

The path to the input file that contains the graph information for the dataset.

This information is used to calculate Graph aggregated features.

Default value

None

-o, --output-path

Description

The path to the output file with data for feature analysis.

Default value

feature_strength.tsv

-T, --thread-count

Description

The number of threads to use for the operation.

Optimizes the speed of execution. This parameter doesn't affect results.

Default value

The number of processor cores
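
As an illustration of the output and threading options, the following sketch writes the result to an explicit file and limits the number of threads. The thread count of 4 and the file names are hypothetical, and the command assumes a model that stores leaf weight information, so that PredictionValuesChange can be computed without a dataset.

```shell
#!/bin/sh
# Hypothetical values; adjust to your environment.
MODEL="model.bin"
OUTPUT="feature_strength.tsv"
THREADS=4

if command -v catboost >/dev/null 2>&1 && [ -f "$MODEL" ]; then
    # -o sets the output file, -T the number of threads.
    catboost fstr -m "$MODEL" \
        --fstr-type PredictionValuesChange \
        -o "$OUTPUT" \
        -T "$THREADS"
    STATUS="ran"
else
    STATUS="skipped"
    echo "catboost or $MODEL was not found; nothing was run"
fi
```

Changing the thread count only changes how long the calculation takes; the contents of the output file should be the same.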