Select features

Purpose

Select the most useful features and drop harmful ones from the dataset.

Execution format

catboost select-features -f <file path> --features-for-select <comma-separated indices or names> --num-features-to-select <integer> [optional parameters]

Options

All options of the Train a model mode are also available in this mode. The options below are specific to feature selection.

--features-for-select

Description

Features that participate in the selection. Supported formats: indices, names, index ranges, and name ranges. Separate values with commas, for example: 0,3,5,6,10-15,City,Player1-Player11.
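To make the value format concrete, here is a minimal Python sketch that expands such a specification into column indices. It is an illustration only, not CatBoost's own parser: the function name expand_feature_spec is hypothetical, and the sketch assumes that ranges are inclusive on both ends and that a name range follows the column order of the dataset.

```python
def expand_feature_spec(spec, feature_names):
    """Expand a spec such as '0,City,Player1-Player3' into sorted column indices.

    Illustrative only; assumes inclusive ranges and column-ordered name ranges.
    """
    index_of = {name: i for i, name in enumerate(feature_names)}

    def resolve(token):
        # A single token is either a known feature name or a non-negative index.
        if token in index_of:
            return index_of[token]
        if token.isdigit():
            return int(token)
        raise ValueError(f"unknown feature: {token!r}")

    selected = set()
    for token in spec.split(","):
        token = token.strip()
        if token in index_of or token.isdigit():
            selected.add(resolve(token))
            continue
        # Otherwise treat the token as 'start-end' (assumed inclusive).
        start, sep, end = token.partition("-")
        if not sep:
            raise ValueError(f"unknown feature: {token!r}")
        lo, hi = resolve(start), resolve(end)
        if lo > hi:
            raise ValueError(f"empty range: {token!r}")
        selected.update(range(lo, hi + 1))
    return sorted(selected)


names = ["Age", "City", "Player1", "Player2", "Player3", "Score"]
print(expand_feature_spec("0,City,Player1-Player3", names))  # [0, 1, 2, 3, 4]
```

Feature names that themselves contain a hyphen are matched as whole names first, so they are not mistaken for ranges in this sketch.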

Default value

Required parameter

Supported processing units

CPU and GPU

--num-features-to-select

Description

The number of features to select from those listed in --features-for-select.

Default value

Required parameter

Supported processing units

CPU and GPU

--features-selection-steps

Description

The number of times the model is trained during selection. Use more steps for a more accurate selection.

Default value

1

Supported processing units

CPU and GPU

--features-selection-algorithm

Description

The selection algorithm is Recursive Feature Elimination (RFE). The possible values differ in how feature importance is calculated:

  • RecursiveByPredictionValuesChange: the fastest and least accurate method (not recommended for ranking losses).
  • RecursiveByLossFunctionChange: the best balance between accuracy and speed.
  • RecursiveByShapValues: the most accurate method.
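The general idea of Recursive Feature Elimination can be sketched in a few lines of Python. This is a toy illustration, not CatBoost's implementation: the names recursive_feature_elimination and toy_importance are hypothetical, the importance function stands in for retraining a model and computing importances, and spreading the eliminations evenly across steps is an assumption made for the sketch.

```python
import math


def recursive_feature_elimination(candidates, num_to_select, steps, importance_fn):
    """Drop the least important candidates over several steps (toy sketch)."""
    selected = list(candidates)
    eliminated = []
    for step in range(steps):
        excess = len(selected) - num_to_select
        if excess <= 0:
            break
        # Assumption: spread the remaining eliminations evenly over the remaining steps.
        num_to_drop = math.ceil(excess / (steps - step))
        # A real RFE run would retrain the model on `selected` at this point.
        scores = importance_fn(selected)
        weakest = sorted(selected, key=lambda f: scores[f])[:num_to_drop]
        eliminated.extend(weakest)
        selected = [f for f in selected if f not in weakest]
    return selected, eliminated


def toy_importance(features):
    # Stand-in importance: a feature's score is simply its index.
    return {f: float(f) for f in features}


selected, eliminated = recursive_feature_elimination(range(10), 3, 2, toy_importance)
print(selected)    # [7, 8, 9]
print(eliminated)  # [0, 1, 2, 3, 4, 5, 6]
```

Swapping in a different importance_fn is the sketch's analogue of choosing between the algorithm values listed above: the elimination loop stays the same and only the scoring changes.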

Default value

RecursiveByShapValues

Supported processing units

CPU and GPU

--shap-calc-type

Description

The method used to calculate SHAP values. Possible values, in order of increasing accuracy:

  • Approximate
  • Regular
  • Exact

Used by the RecursiveByLossFunctionChange and RecursiveByShapValues algorithms.

Default value

Regular

Supported processing units

CPU and GPU

--train-final-model

Description

If specified, a model is trained on the selected features and saved to the path given by --model-file.

Default value

False

Supported processing units

CPU and GPU

--features-selection-result-path

Description

Path to the output file where the selection results are written in JSON format.
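The results file can be inspected with any JSON reader. The Python sketch below is a hedged example: the helper name summarize_selection is hypothetical, and the key names selected_features and eliminated_features are an assumption based on the summary returned by the Python select_features method, so check them against the file your run actually produces.

```python
import json


def summarize_selection(path="selection_result.json"):
    """Load a selection results file and return its main parts (sketch)."""
    with open(path, encoding="utf-8") as f:
        result = json.load(f)
    # Assumed key names; missing keys fall back to empty lists.
    known = ("selected_features", "eliminated_features")
    return {
        "selected": result.get("selected_features", []),
        "eliminated": result.get("eliminated_features", []),
        # Any other top-level keys are reported so that nothing is silently ignored.
        "other_keys": sorted(k for k in result if k not in known),
    }
```

For example, summarize_selection("selection_result.json")["selected"] would return the selected feature indices if the file uses the assumed keys.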

Default value

selection_result.json

Supported processing units

CPU and GPU

Usage examples

catboost select-features \
  --learn-set train.csv \
  --test-set test.csv \
  --column-description train.cd \
  --loss-function RMSE \
  --iterations 100 \
  --features-for-select 0-99 \
  --num-features-to-select 10 \
  --features-selection-steps 3 \
  --features-selection-algorithm RecursiveByShapValues \
  --train-final-model
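When the command is launched from a script, it can help to assemble the arguments programmatically. The following Python sketch builds an argument list from the options used in the example above. The helper build_select_features_command is hypothetical and not part of CatBoost. It only constructs the command and assumes the catboost binary is available on PATH when the command is eventually run.

```python
import shlex


def build_select_features_command(learn_set, column_description, features_for_select,
                                  num_features_to_select, steps=1,
                                  algorithm="RecursiveByShapValues",
                                  train_final_model=False, extra_args=()):
    """Assemble a 'catboost select-features' argument list (sketch)."""
    cmd = [
        "catboost", "select-features",
        "--learn-set", learn_set,
        "--column-description", column_description,
        "--features-for-select", features_for_select,
        "--num-features-to-select", str(num_features_to_select),
        "--features-selection-steps", str(steps),
        "--features-selection-algorithm", algorithm,
    ]
    if train_final_model:
        # Passed as a bare flag, as in the usage example.
        cmd.append("--train-final-model")
    # Any training options, such as --loss-function or --iterations, go here.
    cmd.extend(extra_args)
    return cmd


cmd = build_select_features_command(
    "train.csv", "train.cd", "0-99", 10, steps=3, train_final_model=True,
    extra_args=["--loss-function", "RMSE", "--iterations", "100"],
)
print(shlex.join(cmd))
# To execute it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file paths that contain spaces.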