Select features

Purpose

Select the most useful features and drop harmful ones from the dataset.

Execution format

catboost select-features -f <file path> --features-for-select <comma-separated indices or names> --num-features-to-select <integer> [optional parameters]
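
When the command is launched from a script, the execution format above can be assembled as an argument list. The following is a minimal sketch, not part of CatBoost: the helper name `build_select_features_command` and its parameters are hypothetical, and only flags shown in this section are used.

```python
from typing import List, Optional, Sequence


def build_select_features_command(
    learn_set: str,
    features_for_select: str,
    num_features_to_select: int,
    extra_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Assemble the argv list for `catboost select-features` (hypothetical helper)."""
    if num_features_to_select < 1:
        raise ValueError("num_features_to_select must be a positive integer")
    command = [
        "catboost", "select-features",
        "-f", learn_set,
        "--features-for-select", features_for_select,
        "--num-features-to-select", str(num_features_to_select),
    ]
    # Optional parameters are passed through unchanged, in the order given.
    command.extend(extra_args or [])
    return command


if __name__ == "__main__":
    cmd = build_select_features_command(
        "train.csv", "0-99", 10, ["--features-selection-steps", "3"]
    )
    print(" ".join(cmd))
    # To execute it, pass the list to subprocess.run(cmd, check=True),
    # assuming the catboost binary is available on PATH.
```

Building a list rather than a single string avoids shell quoting problems when file paths or feature names contain spaces.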

Options

Option Description Default value Supported processing units
--features-for-select

The features that participate in the selection. The following formats are supported: indices, names, index ranges, and name ranges. Values are separated by commas, for example: 0,3,5,6,10-15,City,Player1-Player11.

Required parameter

CPU and GPU
--num-features-to-select

The number of features to select from those listed in --features-for-select.

Required parameter

CPU and GPU
--features-selection-steps

The number of times the model is trained during selection. More steps give a more accurate selection.

1

CPU and GPU
--features-selection-algorithm

The selection algorithm. All options use Recursive Feature Elimination (RFE) and differ in how feature importance is calculated:

  • RecursiveByPredictionValuesChange — the fastest algorithm and the least accurate method (not recommended for ranking losses).
  • RecursiveByLossFunctionChange — the optimal option according to accuracy/speed balance.
  • RecursiveByShapValues — the most accurate method.

RecursiveByShapValues

CPU and GPU
--shap-calc-type

The method for calculating SHAP values. Possible values, in order of increasing accuracy:

  • Approximate
  • Regular
  • Exact

Used by the RecursiveByLossFunctionChange and RecursiveByShapValues algorithms.

Regular

CPU and GPU
--train-final-model

If specified, a final model is trained on the selected features and saved to the path given by --model-file.

False

CPU and GPU
--features-selection-result-path

The path to the output file with the selection results, in JSON format.

selection_result.json

CPU and GPU
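
The value format of --features-for-select can be illustrated with a small parser. This is a sketch only and does not reproduce CatBoost's own parsing: the function name `split_feature_spec` is hypothetical, index ranges are assumed to include both bounds, and names or name ranges are kept as plain strings because resolving them requires the dataset's column names.

```python
import re
from typing import List, Tuple

_INDEX = re.compile(r"^\d+$")
_INDEX_RANGE = re.compile(r"^(\d+)-(\d+)$")


def split_feature_spec(spec: str) -> Tuple[List[int], List[str]]:
    """Split a comma-separated spec into expanded indices and unresolved names."""
    indices: List[int] = []
    names: List[str] = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        range_match = _INDEX_RANGE.match(token)
        if _INDEX.match(token):
            indices.append(int(token))
        elif range_match:
            start, end = int(range_match.group(1)), int(range_match.group(2))
            if start > end:
                raise ValueError(f"invalid index range: {token}")
            # Assumption: both ends of an index range are included.
            indices.extend(range(start, end + 1))
        else:
            # A name or a name range such as Player1-Player11; left as-is.
            names.append(token)
    return indices, names


if __name__ == "__main__":
    idx, unresolved = split_feature_spec("0,3,5,6,10-15,City,Player1-Player11")
    print(idx)
    print(unresolved)
```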

The other options are the same as in Train a model mode.
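
To show how --num-features-to-select and --features-selection-steps interact in Recursive Feature Elimination, here is a generic sketch of the technique. It is not CatBoost's implementation. The `importance_fn` callback is a hypothetical stand-in for retraining the model and scoring the remaining features, and splitting the eliminations evenly across steps is an assumption made for illustration.

```python
from typing import Callable, Dict, List, Sequence


def recursive_feature_elimination(
    features: Sequence[str],
    num_to_select: int,
    steps: int,
    importance_fn: Callable[[List[str]], Dict[str, float]],
) -> List[str]:
    """Drop the least important features over `steps` rounds (generic RFE sketch)."""
    remaining = list(features)
    if not 1 <= num_to_select <= len(remaining):
        raise ValueError("num_to_select must be between 1 and the number of features")
    if steps < 1:
        raise ValueError("steps must be at least 1")

    to_eliminate = len(remaining) - num_to_select
    for step in range(steps):
        steps_left = steps - step
        # Assumption: eliminations are spread evenly over the steps left (ceiling division).
        drop_count = -(-to_eliminate // steps_left)
        if drop_count == 0:
            break
        # Stand-in for "train the model, then compute feature importance".
        importance = importance_fn(remaining)
        remaining.sort(key=lambda name: importance[name], reverse=True)
        remaining = remaining[: len(remaining) - drop_count]
        to_eliminate -= drop_count
    return remaining


if __name__ == "__main__":
    # Toy importance values; a real run would recompute them after each retraining.
    weights = {"a": 5.0, "b": 1.0, "c": 4.0, "d": 2.0, "e": 3.0, "f": 0.0}
    selected = recursive_feature_elimination(
        list(weights), num_to_select=2, steps=3,
        importance_fn=lambda feats: {f: weights[f] for f in feats},
    )
    print(selected)
```

With a single step, all surplus features are dropped after one importance calculation. With more steps, importance is recomputed on a smaller feature set each round, which is why more steps tend to give a more accurate selection at the cost of more training runs.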

Usage examples

catboost select-features --learn-set train.csv --test-set test.csv --column-description train.cd --loss-function RMSE --iterations 100 --features-for-select 0-99 --num-features-to-select 10 --features-selection-steps 3 --features-selection-algorithm RecursiveByShapValues --train-final-model
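
After a run like the one above, the selection results are written to the file given by --features-selection-result-path (selection_result.json by default). The sketch below loads that file and lists its top-level entries. It assumes only that the top level is a JSON object and makes no assumption about key names. The helper names are hypothetical.

```python
import json
from typing import Any, Dict, List


def load_selection_result(path: str = "selection_result.json") -> Dict[str, Any]:
    """Load the selection results file, assuming its top level is a JSON object."""
    with open(path, "r", encoding="utf-8") as handle:
        result = json.load(handle)
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object at the top level")
    return result


def describe_selection_result(result: Dict[str, Any]) -> List[str]:
    """Return one summary line per top-level key, sorted by key name."""
    lines = []
    for key in sorted(result):
        value = result[key]
        if isinstance(value, (list, dict)):
            size = f"{len(value)} items"
        else:
            size = repr(value)
        lines.append(f"{key}: {type(value).__name__} ({size})")
    return lines


if __name__ == "__main__":
    for line in describe_selection_result(load_selection_result()):
        print(line)
```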