CTR settings
simple_ctr
Description
Quantization settings for simple categorical features. Use this parameter to specify the principles for defining the class of the object for regression tasks. By default, an object is considered to belong to the positive class if its label value is greater than the median of all label values in the dataset.
Format:
['CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]',
'CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]',
...]
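Each list entry is a single colon-delimited string, so assembling one by hand is easy to get wrong. The sketch below builds a description string from its components in the order the template shows. `build_ctr_description` is a hypothetical helper written for illustration; it is not part of CatBoost and does not check whether a combination of components is valid for a given CtrType or device.

```python
def build_ctr_description(ctr_type, priors=(), **options):
    """Assemble one CTR description string following the format template.

    Hypothetical helper for illustration only; not part of CatBoost.
    """
    # Components are emitted in the same order as in the format template.
    order = ["TargetBorderCount", "TargetBorderType", "CtrBorderCount", "CtrBorderType"]
    unknown = set(options) - set(order)
    if unknown:
        raise ValueError(f"unknown components: {sorted(unknown)}")
    parts = [ctr_type]
    for key in order:
        if key in options:
            parts.append(f"{key}={options[key]}")
    # Several priors are allowed; each one becomes its own ':Prior=...' component.
    for prior in priors:
        parts.append(f"Prior={prior}")
    return ":".join(parts)


simple_ctr = [
    build_ctr_description("Borders", TargetBorderCount=2),
    build_ctr_description("Buckets", TargetBorderType="Uniform", priors=["0", "0.5"]),
]
print(simple_ctr)
# ['Borders:TargetBorderCount=2', 'Buckets:TargetBorderType=Uniform:Prior=0:Prior=0.5']
```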
Components:
-
CtrType
— The method for transforming categorical features to numerical features. Supported methods for training on CPU:
- Borders
- Buckets
- BinarizedTargetMeanValue
- Counter
Supported methods for training on GPU:
- Borders
- Buckets
- FeatureFreq
- FloatTargetMeanValue
-
TargetBorderCount
— The number of borders for label value quantization. Only used for regression problems. Allowed values are integers from 1 to 255 inclusive. The default value is 1. This option is available for training on CPU only.
-
TargetBorderType
— The quantization type for the label value. Only used for regression problems. Possible values:
- Median
- Uniform
- UniformAndQuantiles
- MaxLogSum
- MinEntropy
- GreedyLogSum
The default value is MinEntropy.
This option is available for training on CPU only.
-
CtrBorderCount
— The number of splits for categorical features. Allowed values are integers from 1 to 255 inclusive.
-
CtrBorderType
— The quantization type for categorical features. Supported values for training on CPU:
- Uniform
Supported values for training on GPU:
- Median
- Uniform
- UniformAndQuantiles
- MaxLogSum
- MinEntropy
- GreedyLogSum
-
Prior
— Use the specified priors during training (several values can be specified). Possible formats:
- One number — Adds the value to the numerator.
- Two slash-delimited numbers (for GPU only) — Use this format to set a fraction. The first number is added to the numerator and the second is added to the denominator.
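To see what a prior does to a CTR value, the sketch below assumes the common form ctr = (count_in_class + prior_numerator) / (total_count + prior_denominator). It also assumes that a single-number prior adds 1 to the denominator; that default is an assumption and is not stated above. `parse_prior` and `ctr_value` are hypothetical names, and exact fractions are used so the arithmetic is easy to follow.

```python
from fractions import Fraction


def parse_prior(text):
    """Parse 'num' or 'num/denom' into (numerator_add, denominator_add).

    Assumption: a single number adds 1 to the denominator.
    """
    if "/" in text:
        num, denom = text.split("/", 1)
        return Fraction(num), Fraction(denom)
    return Fraction(text), Fraction(1)


def ctr_value(count_in_class, total_count, prior):
    """Smoothed CTR for one category value under the assumed formula."""
    num_add, denom_add = parse_prior(prior)
    return (count_in_class + num_add) / (total_count + denom_add)


# A category value seen 4 times, 3 of them in the positive class.
print(ctr_value(3, 4, "0.5"))  # 7/10
print(ctr_value(3, 4, "1/2"))  # 2/3
```

The prior matters most for rare values: a value never seen in training gets ctr_value(0, 0, "0.5") == 1/2 instead of an undefined 0/0.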
Examples
-
simple_ctr='Borders:TargetBorderCount=2'
Two new features with different quantization settings are generated. The first one treats an object as belonging to the positive class when its label value exceeds the first border. The second one treats an object as belonging to the positive class when its label value exceeds the second border.
For example, if the label takes three different values (0, 1, 2), the first border is 0.5 and the second one is 1.5.
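The Borders example can be reproduced in a few lines. The sketch places borders halfway between consecutive distinct label values, which matches the 0.5 and 1.5 in the example. In practice, border placement depends on TargetBorderType, so the midpoint rule is an illustrative assumption. Each border yields one binary target: positive when the label exceeds that border.

```python
def midpoint_borders(labels):
    """Borders halfway between consecutive distinct label values (illustrative)."""
    values = sorted(set(labels))
    return [(a + b) / 2 for a, b in zip(values, values[1:])]


def binarize_by_borders(labels, borders):
    """One binary target per border: 1 when the label exceeds that border."""
    return [[int(y > border) for y in labels] for border in borders]


labels = [0, 2, 1, 0, 2]
borders = midpoint_borders(labels)
print(borders)                               # [0.5, 1.5]
print(binarize_by_borders(labels, borders))  # [[0, 1, 1, 0, 1], [0, 1, 0, 0, 1]]
```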
-
simple_ctr='Buckets:TargetBorderCount=2'
The number of features depends on the number of different labels. For example, three new features are generated if the label takes three different values (0, 1, 2). In this case, the first one treats an object as belonging to the positive class when its label value is equal to 0 or falls into the bucket with index 0. The second one treats an object as belonging to the positive class when its label value is equal to 1 or falls into the bucket with index 1, and so on.
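The Buckets variant differs from Borders in that each target is "falls into bucket i" rather than "exceeds border i". That produces one feature per bucket, that is, one more than the number of borders. The sketch reuses the borders 0.5 and 1.5 from the example. The decision that a value equal to a border goes into the lower bucket is an assumption made for illustration.

```python
from bisect import bisect_left


def bucket_index(y, borders):
    """Index of the bucket containing y; borders must be sorted ascending.

    Values equal to a border fall into the lower bucket (assumption).
    """
    return bisect_left(borders, y)


def bucket_indicators(labels, borders):
    """One binary target per bucket: 1 when the label falls into that bucket."""
    n_buckets = len(borders) + 1
    idx = [bucket_index(y, borders) for y in labels]
    return [[int(i == b) for i in idx] for b in range(n_buckets)]


labels = [0, 2, 1, 0, 2]
print(bucket_indicators(labels, [0.5, 1.5]))
# [[1, 0, 0, 1, 0], [0, 0, 1, 0, 0], [0, 1, 0, 0, 1]]
```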
Type
string
Supported processing units
CPU and GPU
combinations_ctr
Description
Quantization settings for combinations of categorical features.
Format:
['CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]',
'CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]',
...]
Components:
-
CtrType
— The method for transforming categorical features to numerical features. Supported methods for training on CPU:
- Borders
- Buckets
- BinarizedTargetMeanValue
- Counter
Supported methods for training on GPU:
- Borders
- Buckets
- FeatureFreq
- FloatTargetMeanValue
-
TargetBorderCount
— The number of borders for label value quantization. Only used for regression problems. Allowed values are integers from 1 to 255 inclusive. The default value is 1. This option is available for training on CPU only.
-
TargetBorderType
— The quantization type for the label value. Only used for regression problems. Possible values:
- Median
- Uniform
- UniformAndQuantiles
- MaxLogSum
- MinEntropy
- GreedyLogSum
The default value is MinEntropy.
This option is available for training on CPU only.
-
CtrBorderCount
— The number of splits for categorical features. Allowed values are integers from 1 to 255 inclusive.
-
CtrBorderType
— The quantization type for categorical features. Supported values for training on CPU:
- Uniform
Supported values for training on GPU:
- Uniform
- Median
-
Prior
— Use the specified priors during training (several values can be specified). Possible formats:
- One number — Adds the value to the numerator.
- Two slash-delimited numbers (for GPU only) — Use this format to set a fraction. The first number is added to the numerator and the second is added to the denominator.
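For combinations_ctr, the CtrBorderType values allowed on GPU are narrower than for simple_ctr (only Uniform and Median). A small pre-flight check can catch a mismatch before training starts. The sketch below encodes the two lists above. `combinations_border_type_ok` is a hypothetical helper; CatBoost performs its own validation, and this check is not a substitute for it.

```python
# Supported CtrBorderType values for combinations_ctr, as listed above.
COMBINATIONS_CTR_BORDER_TYPES = {
    "CPU": {"Uniform"},
    "GPU": {"Uniform", "Median"},
}


def combinations_border_type_ok(description, task_type):
    """Return False if the description sets a CtrBorderType unsupported on task_type.

    Hypothetical pre-flight check; a missing CtrBorderType is accepted.
    """
    allowed = COMBINATIONS_CTR_BORDER_TYPES[task_type]
    # Skip the leading CtrType and inspect each 'Key=Value' component.
    for component in description.split(":")[1:]:
        key, _, value = component.partition("=")
        if key == "CtrBorderType" and value not in allowed:
            return False
    return True


print(combinations_border_type_ok("Borders:CtrBorderType=Median", "GPU"))  # True
print(combinations_border_type_ok("Borders:CtrBorderType=Median", "CPU"))  # False
```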
Type
string
Supported processing units
CPU and GPU
per_feature_ctr
Description
Per-feature quantization settings for categorical features.
Format:
['FeatureId:CtrType:[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]',
'FeatureId:CtrType:[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]',
...]
Components:
FeatureId
— A zero-based feature identifier. The other components have the same meaning as for simple_ctr.
Type
string
Supported processing units
CPU and GPU
ctr_target_border_count
Description
The maximum number of borders to use in target quantization for categorical features that need it. Allowed values are integers from 1 to 255 inclusively.
The value of the TargetBorderCount component overrides this parameter if it is specified for one of the following parameters:
simple_ctr
combinations_ctr
per_feature_ctr
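The precedence described here can be written as a short resolution function. A TargetBorderCount in the CTR description wins. Otherwise ctr_target_border_count applies. Otherwise the documented default applies. `effective_target_border_count` is a hypothetical name. The check `loss_function == "MultiClass"` is a simplified stand-in for "multiclassification problem" and is an assumption, not how CatBoost detects the problem type.

```python
def effective_target_border_count(*, component_value=None, ctr_target_border_count=None,
                                  loss_function="RMSE", task_type="CPU", n_classes=None):
    """Resolve the target border count for one CTR description (illustrative)."""
    if component_value is not None:
        # TargetBorderCount set in simple_ctr / combinations_ctr / per_feature_ctr wins.
        return component_value
    if ctr_target_border_count is not None:
        return ctr_target_border_count
    # Documented default: Number_of_classes - 1 for multiclassification on CPU, else 1.
    # Assumption: loss_function == "MultiClass" stands in for "multiclassification".
    if loss_function == "MultiClass" and task_type == "CPU":
        return n_classes - 1
    return 1


print(effective_target_border_count(component_value=2, ctr_target_border_count=5))  # 2
print(effective_target_border_count(loss_function="MultiClass", n_classes=4))        # 3
```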
Type
int
Default value
Number_of_classes - 1 for Multiclassification problems when training on CPU, 1 otherwise
Supported processing units
CPU and GPU
counter_calc_method
Description
The method for calculating the Counter CTR type.
Possible values:
- SkipTest — Objects from the validation dataset are not considered at all.
- Full — All objects from both the learn and validation datasets are considered.
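The difference between the two methods shows up in which objects contribute to the frequency counts behind the Counter CTR. The sketch returns raw counts per category value and deliberately leaves out any normalization or priors, because those details are not described here. `counter_counts` is a hypothetical helper used only to contrast SkipTest with Full.

```python
from collections import Counter


def counter_counts(learn_cats, valid_cats, method="SkipTest"):
    """Raw frequency of each category value under counter_calc_method (illustrative)."""
    if method == "SkipTest":
        counted = list(learn_cats)  # validation objects are ignored entirely
    elif method == "Full":
        counted = list(learn_cats) + list(valid_cats)
    else:
        raise ValueError(f"unknown method: {method}")
    counts = Counter(counted)
    # Report every value seen in either dataset; Counter returns 0 for absent keys.
    return {value: counts[value] for value in sorted(set(learn_cats) | set(valid_cats))}


learn = ["a", "b", "a"]
valid = ["b", "c"]
print(counter_counts(learn, valid, "SkipTest"))  # {'a': 2, 'b': 1, 'c': 0}
print(counter_counts(learn, valid, "Full"))      # {'a': 2, 'b': 2, 'c': 1}
```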
Type
string
Default value
None (SkipTest is used)
Supported processing units
CPU and GPU
max_ctr_complexity
Description
The maximum number of features that can be combined.
Each resulting combination consists of one or more categorical features and can optionally contain binary features of the form numeric feature > value.
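Raising max_ctr_complexity quickly enlarges the space of possible combinations. The sketch counts every subset of 1 to max_ctr_complexity categorical features and ignores the optional binary numeric-feature conditions. This is only an upper bound on the search space for illustration. It does not reproduce how CatBoost actually selects combinations during training, and the feature names are hypothetical.

```python
from itertools import combinations


def candidate_combinations(cat_features, max_ctr_complexity):
    """All subsets with 1..max_ctr_complexity categorical features (upper bound only)."""
    return [combo
            for k in range(1, max_ctr_complexity + 1)
            for combo in combinations(cat_features, k)]


features = ["city", "device", "browser", "os", "hour"]  # hypothetical feature names
print(len(candidate_combinations(features, 1)))  # 5
print(len(candidate_combinations(features, 4)))  # 30
```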
Type
int
Default value
The default value depends on the processing unit type, combined features' type and the selected mode:
- GPU for categorical features in MultiClass and MultiClassOneVsAll modes: 1
- In all other cases: 4
Supported processing units
CPU and GPU
ctr_leaf_count_limit
Description
The maximum number of leaves with categorical features. If this number exceeds the specified value, some of the leaves are discarded.
The leaves to be discarded are selected as follows:
- The leaves are sorted by the frequency of the values.
- The top N leaves are selected, where N is the value specified in the parameter.
- All leaves starting from N+1 are discarded.
This option reduces the resulting model size and the amount of memory required for training. Note that the resulting quality of the model can be affected.
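The selection rule above (sort by frequency, keep the top N, discard the rest) can be sketched with `collections.Counter.most_common`. The sketch breaks ties by first occurrence because that is how `most_common` orders equal counts. Whether CatBoost breaks ties the same way is not stated here, so treat that detail as an assumption. `limit_ctr_leaves` is a hypothetical helper.

```python
from collections import Counter


def limit_ctr_leaves(values, ctr_leaf_count_limit=None):
    """Keep the ctr_leaf_count_limit most frequent category values (illustrative).

    Ties keep first-occurrence order, as Counter.most_common does.
    """
    counts = Counter(values)
    if ctr_leaf_count_limit is None:
        return set(counts)  # default: the number of values is not limited
    return {value for value, _ in counts.most_common(ctr_leaf_count_limit)}


values = ["a", "b", "a", "c", "a", "b"]
print(sorted(limit_ctr_leaves(values, 2)))  # ['a', 'b']
print(sorted(limit_ctr_leaves(values)))     # ['a', 'b', 'c']
```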
Type
int
Default value
None
The number of different category values is not limited
Supported processing units
CPU
store_all_simple_ctr
Description
Ignore categorical features, which are not used in feature combinations, when choosing candidates for exclusion.
There is no point in using this parameter without ctr_leaf_count_limit (--ctr-leaf-count-limit in the command-line version).
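Because store_all_simple_ctr only matters together with ctr_leaf_count_limit, a configuration check can flag the pointless case. The sketch inspects a plain dictionary of training parameters using the parameter names from this section. `check_leaf_limit_params` is a hypothetical helper, and the dictionary is not passed to any library here.

```python
import warnings


def check_leaf_limit_params(params):
    """Warn and return False when store_all_simple_ctr is set without ctr_leaf_count_limit."""
    if params.get("store_all_simple_ctr") and params.get("ctr_leaf_count_limit") is None:
        warnings.warn("store_all_simple_ctr is pointless without ctr_leaf_count_limit")
        return False
    return True


# Both options are CPU-only, per this section.
params = {"task_type": "CPU", "ctr_leaf_count_limit": 1000, "store_all_simple_ctr": True}
print(check_leaf_limit_params(params))  # True
```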
Type
bool
Default value
None (set to False)
Both simple features and feature combinations are taken into account when limiting the number of leaves with categorical features
Supported processing units
CPU
final_ctr_computation_mode
Description
Final CTR computation mode.
Possible values:
- Default — Compute final CTRs for learn and validation datasets.
- Skip — Do not compute final CTRs for learn and validation datasets. In this case, the resulting model cannot be applied. This mode decreases the size of the resulting model. It can be useful for research purposes when only metric values need to be calculated.
Type
string
Default value
Default
Supported processing units
CPU and GPU