Select features
Purpose
Select the best features and drop harmful features from the dataset.
Execution format
catboost select-features -f <file path> --features-for-select <comma-separated indices or names> --num-features-to-select <integer> [optional parameters]
Options
All options not listed below are the same as in the Train a model mode.
--features-for-select
Description
Features that participate in the selection. The following formats are supported: indices, names, index ranges, name ranges. Values are separated by commas, for example: 0,3,5,6,10-15,City,Player1-Player11
Default value
Required parameter
Supported processing units
CPU and GPU
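The index and index-range forms of --features-for-select can be illustrated with a short Python sketch. It expands a value such as 0,3,5,6,10-15 into a list of indices. The function name expand_feature_indices is hypothetical, the sketch assumes that a range includes both endpoints, and it deliberately skips feature names and name ranges, which would require the dataset's column list.

```python
from typing import List


def expand_feature_indices(spec: str) -> List[int]:
    """Expand a comma-separated list of indices and index ranges.

    Assumption: a range such as 10-15 includes both endpoints.
    Feature names and name ranges are not handled in this sketch.
    """
    indices = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if "-" in token:
            start_text, end_text = token.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid range: {token}")
            indices.extend(range(start, end + 1))
        else:
            indices.append(int(token))
    return indices


print(expand_feature_indices("0,3,5,6,10-15"))
# -> [0, 3, 5, 6, 10, 11, 12, 13, 14, 15]
```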
--num-features-to-select
Description
The number of features to select from those listed in --features-for-select.
Default value
Required parameter
Supported processing units
CPU and GPU
--features-selection-steps
Description
The number of times the model is trained during selection. More steps give a more accurate selection.
Default value
1
Supported processing units
CPU and GPU
--features-selection-algorithm
Description
The main algorithm is Recursive Feature Elimination (RFE), combined with one of the following methods for calculating feature importance:
RecursiveByPredictionValuesChange: the fastest and least accurate method (not recommended for ranking losses).
RecursiveByLossFunctionChange: the best balance between accuracy and speed.
RecursiveByShapValues: the most accurate method.
Default value
RecursiveByShapValues
Supported processing units
CPU and GPU
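The idea behind Recursive Feature Elimination can be sketched in a few lines of Python. This is a generic illustration, not CatBoost's implementation: the names recursive_feature_elimination and importance_fn are hypothetical, importance_fn stands in for training a model and scoring its features with one of the methods above, and spreading the eliminations evenly across steps is an assumption made for simplicity.

```python
from typing import Callable, Dict, List


def recursive_feature_elimination(
    features: List[str],
    num_to_select: int,
    steps: int,
    importance_fn: Callable[[List[str]], Dict[str, float]],
) -> List[str]:
    """Drop the least important features over several steps.

    importance_fn is a hypothetical stand-in for "train a model on these
    features and return an importance score for each of them".
    Assumption: eliminations are spread as evenly as possible over the steps.
    """
    if not 0 < num_to_select <= len(features):
        raise ValueError("num_to_select must be between 1 and len(features)")
    if steps < 1:
        raise ValueError("steps must be at least 1")

    remaining = list(features)
    for step in range(steps):
        to_remove_total = len(remaining) - num_to_select
        if to_remove_total == 0:
            break
        steps_left = steps - step
        # Ceiling division; on the last step this removes everything
        # that still has to be eliminated.
        drop_now = -(-to_remove_total // steps_left)
        scores = importance_fn(remaining)  # one "training" per step
        weakest = set(sorted(remaining, key=lambda name: scores[name])[:drop_now])
        # Preserve the original feature order among the survivors.
        remaining = [name for name in remaining if name not in weakest]
    return remaining


# Toy importance: fixed scores instead of a trained model.
toy_scores = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.05, "e": 0.7}
selected = recursive_feature_elimination(
    list(toy_scores),
    num_to_select=2,
    steps=3,
    importance_fn=lambda names: {n: toy_scores[n] for n in names},
)
print(selected)  # -> ['a', 'e']
```

With steps set to 1, all surplus features are dropped after a single scoring pass; with more steps, importance is recomputed on the shrinking feature set, which is why more steps tend to give a more accurate selection.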
--shap-calc-type
Description
The method used to calculate SHAP values. Possible values, in order of increasing accuracy:
Approximate
Regular
Exact
Used by the RFE algorithms RecursiveByLossFunctionChange and RecursiveByShapValues.
Default value
Regular
Supported processing units
CPU and GPU
--train-final-model
Description
If specified, a model is trained on the selected features and saved to the path given by --model-file.
Default value
False
Supported processing units
CPU and GPU
--features-selection-result-path
Description
The path to the file where the selection results are written in JSON format.
Default value
selection_result.json
Supported processing units
CPU and GPU
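After a run, the results file can be inspected with any JSON tool. The Python sketch below only loads the file and lists its top-level keys, because the schema of the file is not described in this section. The helper name load_selection_result is hypothetical, and the demo writes a placeholder file with a made-up key so that the snippet runs without CatBoost installed.

```python
import json
import tempfile
from pathlib import Path


def load_selection_result(path="selection_result.json"):
    """Load the selection results file and return its parsed content."""
    with open(path, encoding="utf-8") as result_file:
        return json.load(result_file)


# Demo with a placeholder file. The key below is made up for illustration
# and is NOT the real schema of the results file.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "selection_result.json"
    demo_path.write_text(json.dumps({"placeholder_key": [0, 1, 2]}), encoding="utf-8")
    result = load_selection_result(demo_path)
    print(sorted(result))  # top-level keys of the parsed JSON object
```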
Usage examples
catboost select-features --learn-set train.csv --test-set test.csv --column-description train.cd --loss-function RMSE --iterations 100 --features-for-select 0-99 --num-features-to-select 10 --features-selection-steps 3 --features-selection-algorithm RecursiveByShapValues --train-final-model
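When the command is launched from a script, building the argument list programmatically avoids quoting mistakes. The Python sketch below assembles an invocation equivalent to the one above for use with subprocess.run. The helper name build_select_features_command is hypothetical, only flags that appear in this section are used, and the command is printed rather than executed so that the snippet runs without the catboost binary installed.

```python
import shlex
from typing import List, Optional, Sequence


def build_select_features_command(
    learn_set: str,
    features_for_select: str,
    num_features_to_select: int,
    steps: int = 1,
    algorithm: str = "RecursiveByShapValues",
    train_final_model: bool = False,
    extra_args: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the argument list for a `catboost select-features` run."""
    command = [
        "catboost", "select-features",
        "--learn-set", learn_set,
        "--features-for-select", features_for_select,
        "--num-features-to-select", str(num_features_to_select),
        "--features-selection-steps", str(steps),
        "--features-selection-algorithm", algorithm,
    ]
    if train_final_model:
        command.append("--train-final-model")
    # Training options (loss function, iterations, ...) are passed through as-is.
    command.extend(extra_args or [])
    return command


command = build_select_features_command(
    "train.csv",
    "0-99",
    10,
    steps=3,
    train_final_model=True,
    extra_args=[
        "--test-set", "test.csv",
        "--column-description", "train.cd",
        "--loss-function", "RMSE",
        "--iterations", "100",
    ],
)
print(shlex.join(command))
# To run it for real: subprocess.run(command, check=True)
```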