Calculate object importance
Purpose
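Calculate the effect of objects from the training dataset on the optimized metric values for the objects from the input dataset:
- Positive values reflect that the optimized metric increases.
- Negative values reflect that the optimized metric decreases.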
This mode implements the approach described in the paper "Finding Influential Training Samples for Gradient Boosted Decision Trees".
Execution format
catboost ostr [optional parameters]
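As a minimal sketch of the execution format, the script below assembles an `ostr` call using only options listed in the Options table. The file names `train.tsv`, `test.tsv`, and `train.cd` are hypothetical placeholders, and the thread count of 4 is an arbitrary choice. The command is printed first and is run only if the `catboost` binary and all input files are present.

```shell
#!/bin/sh
# Hypothetical file names; replace them with your own paths.
MODEL=model.bin                 # default model file name from the Options table
LEARN=train.tsv                 # training dataset (assumed tab-separated)
TEST=test.tsv                   # validation dataset in the same format
CD=train.cd                     # columns description file
OUT=object_importances.tsv      # default output file name

# Collect the arguments as positional parameters so that paths stay quoted.
set -- ostr \
    --model-file "$MODEL" \
    --learn-set "$LEARN" \
    --test-set "$TEST" \
    --column-description "$CD" \
    --output-path "$OUT" \
    --thread-count 4

# Show the command that would be executed.
CMD="catboost $*"
echo "$CMD"

# Run only when the binary and every input file actually exist.
if command -v catboost >/dev/null 2>&1 \
    && [ -f "$MODEL" ] && [ -f "$LEARN" ] && [ -f "$TEST" ] && [ -f "$CD" ]; then
    catboost "$@"
fi
```

Only `--learn-set` and `--test-set` are marked as required in the table; the other options are spelled out here to make the defaults visible.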
Options
Option | Description | Default value |
---|---|---|
-m --model-file --model-path | The name of the input file with the description of the model obtained as the result of training. | model.bin |
--model-format | The format of the input model. Possible values:
| CatboostBinary |
-f --learn-set | The path to the input file that contains the dataset description. Format:
| Required parameter (the path must be specified). |
-t --test-set | The path to the input file that contains the validation dataset description (the format must be the same as used in the training dataset). | Required parameter |
--cd --column-description | The path to the input file that contains the columns description. | If omitted, it is assumed that the first column in the file with the dataset description defines the label value, and the other columns are the values of numerical features. |
-o --output-path | The path to the output file with calculated object importances. | object_importances.tsv |
-T --thread-count | The number of threads to use for the calculation. Optimizes the speed of execution. This parameter doesn't affect results. | The number of processor cores |
--delimiter | The delimiter character used to separate the data in the dataset description input file. Only single-character delimiters are supported. If the specified value contains more than one character, only the first one is used. Note. Used only if the dataset is given in the Delimiter-separated values format. | The input data is assumed to be tab-separated |
--has-header | Read the column names from the first line of the dataset description file if this parameter is set. Note. Used only if the dataset is given in the Delimiter-separated values format. | False (the first line is treated as data, like the rest of the file) |
--update-method | The algorithm accuracy method. Possible values:
Supported parameters:
For example, the following value sets the method to TopKLeaves and limits the number of leaves to 3:
| SinglePoint |
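The `--delimiter` and `--has-header` options described above can be combined for comma-separated input that starts with a row of column names. The following is a sketch under assumptions: `train.csv`, `test.csv`, and `train.cd` are hypothetical file names, and the short option forms are taken from the table. As before, the command is printed and run only if everything it needs is available.

```shell
#!/bin/sh
# Hypothetical comma-separated inputs whose first line holds column names.
set -- ostr \
    -m model.bin \
    -f train.csv \
    -t test.csv \
    --cd train.cd \
    --delimiter ',' \
    --has-header \
    -o object_importances.tsv

# Show the command that would be executed.
CSV_CMD="catboost $*"
echo "$CSV_CMD"

# Run only when the binary and every input file actually exist.
if command -v catboost >/dev/null 2>&1 \
    && [ -f model.bin ] && [ -f train.csv ] && [ -f test.csv ] && [ -f train.cd ]; then
    catboost "$@"
fi
```

Because only the first character of the `--delimiter` value is used, passing a multi-character separator would silently fall back to its first character.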