CTR settings

simple_ctr

Description

Quantization settings for simple categorical features. Use this parameter to specify the principles for defining the class of the object for regression tasks. By default, an object is considered to belong to the positive class if its label value is greater than the median of all label values in the dataset.

Format:

['CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]',
 'CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]',
  ...]

Components:

  • CtrType — The method for transforming categorical features to numerical features.

    Supported methods for training on CPU:

    • Borders
    • Buckets
    • BinarizedTargetMeanValue
    • Counter

    Supported methods for training on GPU:

    • Borders
    • Buckets
    • FeatureFreq
    • FloatTargetMeanValue
  • TargetBorderCount — The number of borders for label value quantization. Only used for regression problems. Allowed values are integers from 1 to 255 inclusive. The default value is 1.

    This option is available for training on CPU only.

  • TargetBorderType — The quantization type for the label value. Only used for regression problems.

    Possible values:

    • Median
    • Uniform
    • UniformAndQuantiles
    • MaxLogSum
    • MinEntropy
    • GreedyLogSum

    By default, MinEntropy.

    This option is available for training on CPU only.

  • CtrBorderCount — The number of splits for categorical features. Allowed values are integers from 1 to 255 inclusive.

  • CtrBorderType — The quantization type for categorical features.

    Supported values for training on CPU:

    • Uniform

    Supported values for training on GPU:

    • Median
    • Uniform
    • UniformAndQuantiles
    • MaxLogSum
    • MinEntropy
    • GreedyLogSum
  • Prior — Use the specified priors during training (several values can be specified).

    Possible formats:

    • One number — Adds the value to the numerator.
    • Two slash-delimited numbers (for GPU only) — Use this format to set a fraction. The first number is added to the numerator and the second to the denominator.
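To make the format concrete, here is a minimal sketch that splits one CTR description string into the components listed above. `parse_ctr_description` is a hypothetical helper written for illustration only; CatBoost parses these strings internally and does not expose such a function. The range check mirrors the documented 1 to 255 limit for the two count components.

```python
def parse_ctr_description(description):
    """Split one CTR description string into its components.

    Hypothetical helper for illustration; not part of the CatBoost API.
    """
    ctr_type, *options = description.split(":")
    components = {"CtrType": ctr_type}
    priors = []
    for option in options:
        key, sep, value = option.partition("=")
        if not sep:
            raise ValueError(f"expected Key=Value, got {option!r}")
        if key == "Prior":
            # Prior may be repeated; keep each value as written ("0.5" or "1/2").
            priors.append(value)
        elif key in ("TargetBorderCount", "CtrBorderCount"):
            count = int(value)
            if not 1 <= count <= 255:
                raise ValueError(f"{key} must be an integer from 1 to 255")
            components[key] = count
        else:
            # TargetBorderType and CtrBorderType are kept as plain strings.
            components[key] = value
    components["Prior"] = priors
    return components


print(parse_ctr_description("Borders:TargetBorderCount=2:Prior=0:Prior=0.5/1"))
# {'CtrType': 'Borders', 'TargetBorderCount': 2, 'Prior': ['0', '0.5/1']}
```

Each element of the list passed to simple_ctr is one such string, so a list with several elements yields several independent CTR settings.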

Examples

  •   simple_ctr='Borders:TargetBorderCount=2'
    

Two new features with differing quantization settings are generated. The first one concludes that an object belongs to the positive class when the label value exceeds the first border. The second one concludes that an object belongs to the positive class when the label value exceeds the second border.

For example, if the label takes three different values (0, 1, 2), the first border is 0.5 while the second one is 1.5.

  •   simple_ctr='Buckets:TargetBorderCount=2'
    

The number of features depends on the number of different label values. For example, three new features are generated if the label takes three different values (0, 1, 2). In this case, the first one concludes that an object belongs to the positive class when the value of the feature is equal to 0 or belongs to the bucket indexed 0. The second one concludes that an object belongs to the positive class when the value of the feature is equal to 1 or belongs to the bucket indexed 1, and so on.
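Both examples can be reproduced with a small sketch for the label values 0, 1, 2. It assumes each border lies halfway between neighbouring distinct label values, which matches the 0.5 and 1.5 borders above; this only holds when the label has at most TargetBorderCount + 1 distinct values, and the real quantizer (selected by TargetBorderType) places borders differently otherwise. The function names are hypothetical.

```python
from bisect import bisect_left


def midpoint_borders(labels, border_count):
    """Place one border halfway between each pair of neighbouring distinct labels."""
    values = sorted(set(labels))
    if len(values) - 1 > border_count:
        raise ValueError("sketch assumes at most border_count + 1 distinct labels")
    return [(low + high) / 2 for low, high in zip(values, values[1:])]


def borders_targets(labels, borders):
    """Borders: one binary target per border, positive when label > border."""
    return [[1 if y > b else 0 for y in labels] for b in borders]


def buckets_targets(labels, borders):
    """Buckets: one binary target per bucket, positive when the label falls in it.

    The bucket index is the number of borders the label strictly exceeds,
    so len(borders) borders give len(borders) + 1 buckets.
    """
    bucket_of = [bisect_left(borders, y) for y in labels]
    return [[1 if b == k else 0 for b in bucket_of] for k in range(len(borders) + 1)]


labels = [0, 1, 2, 1, 0]
borders = midpoint_borders(labels, border_count=2)
print(borders)                           # [0.5, 1.5]
print(borders_targets(labels, borders))  # [[0, 1, 1, 1, 0], [0, 0, 1, 0, 0]]
print(buckets_targets(labels, borders))  # [[1, 0, 0, 0, 1], [0, 1, 0, 1, 0], [0, 0, 1, 0, 0]]
```

With two borders, Borders yields two binary targets and Buckets yields three, which is why the two examples above generate two and three new features respectively.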

Type

string

Supported processing units

CPU and GPU

combinations_ctr

Description

Quantization settings for combinations of categorical features.

Format:

['CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]',
 'CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]',
  ...]

Components:

  • CtrType — The method for transforming categorical features to numerical features.

    Supported methods for training on CPU:

    • Borders
    • Buckets
    • BinarizedTargetMeanValue
    • Counter

    Supported methods for training on GPU:

    • Borders
    • Buckets
    • FeatureFreq
    • FloatTargetMeanValue
  • TargetBorderCount — The number of borders for label value quantization. Only used for regression problems. Allowed values are integers from 1 to 255 inclusive. The default value is 1.

    This option is available for training on CPU only.

  • TargetBorderType — The quantization type for the label value. Only used for regression problems.

    Possible values:

    • Median
    • Uniform
    • UniformAndQuantiles
    • MaxLogSum
    • MinEntropy
    • GreedyLogSum

    By default, MinEntropy.

    This option is available for training on CPU only.

  • CtrBorderCount — The number of splits for categorical features. Allowed values are integers from 1 to 255 inclusive.

  • CtrBorderType — The quantization type for categorical features.

    Supported values for training on CPU:

    • Uniform

    Supported values for training on GPU:

    • Uniform
    • Median
  • Prior — Use the specified priors during training (several values can be specified).

    Possible formats:

    • One number — Adds the value to the numerator.
    • Two slash-delimited numbers (for GPU only) — Use this format to set a fraction. The first number is added to the numerator and the second to the denominator.
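When a combinations_ctr list is generated programmatically, the CPU/GPU restrictions above can be checked before the string is assembled. The sketch below is a hypothetical helper, not part of CatBoost; its checks cover only the CtrType lists, the CPU-only target options, and the GPU-only two-number prior described in this section.

```python
SUPPORTED_CTR_TYPES = {
    "CPU": {"Borders", "Buckets", "BinarizedTargetMeanValue", "Counter"},
    "GPU": {"Borders", "Buckets", "FeatureFreq", "FloatTargetMeanValue"},
}


def make_ctr_description(ctr_type, device="CPU", priors=(), **components):
    """Assemble one CtrType[:Key=Value]...[:Prior=...] string (hypothetical helper)."""
    if ctr_type not in SUPPORTED_CTR_TYPES[device]:
        raise ValueError(f"{ctr_type} is not listed for training on {device}")
    if device == "GPU" and ("TargetBorderCount" in components
                            or "TargetBorderType" in components):
        raise ValueError("TargetBorderCount and TargetBorderType are CPU-only")
    for prior in priors:
        # A 'num/denom' prior is only accepted for GPU training.
        if "/" in str(prior) and device != "GPU":
            raise ValueError("two-number priors are GPU-only")
    parts = [ctr_type]
    parts += [f"{key}={value}" for key, value in components.items()]
    parts += [f"Prior={prior}" for prior in priors]
    return ":".join(parts)


combinations_ctr = [
    make_ctr_description("Borders", TargetBorderCount=1, priors=[0, 0.5]),
    make_ctr_description("Counter", priors=[0]),
]
print(combinations_ctr)  # ['Borders:TargetBorderCount=1:Prior=0:Prior=0.5', 'Counter:Prior=0']
```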

Type

string

Supported processing units

CPU and GPU

per_feature_ctr

Description

Per-feature quantization settings for categorical features.

Format:

['FeatureId:CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]',
 'FeatureId:CtrType[:TargetBorderCount=BorderCount][:TargetBorderType=BorderType][:CtrBorderCount=Count][:CtrBorderType=Type][:Prior=num_1/denum_1]..[:Prior=num_N/denum_N]',
  ...]

Components:

  • FeatureId — A zero-based feature identifier.

The remaining components are the same as for the simple_ctr parameter.
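A per_feature_ctr entry is a CTR description prefixed with the zero-based feature index. The sketch below builds the list from a mapping of feature indices to descriptions; `per_feature_ctr_entries` is a hypothetical helper, not part of CatBoost.

```python
def per_feature_ctr_entries(settings):
    """Turn {feature_id: [ctr descriptions]} into per_feature_ctr strings.

    Hypothetical helper; feature_id is the zero-based feature index.
    """
    entries = []
    for feature_id, descriptions in sorted(settings.items()):
        if not isinstance(feature_id, int) or feature_id < 0:
            raise ValueError("FeatureId must be a non-negative (zero-based) integer")
        entries += [f"{feature_id}:{description}" for description in descriptions]
    return entries


print(per_feature_ctr_entries({3: ["Borders:TargetBorderCount=2"], 0: ["Counter"]}))
# ['0:Counter', '3:Borders:TargetBorderCount=2']
```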

Type

string

Supported processing units

CPU and GPU

ctr_target_border_count

Description

The maximum number of borders to use in target quantization for categorical features that need it. Allowed values are integers from 1 to 255 inclusive.

The value of the TargetBorderCount component overrides this parameter if it is specified for one of the following parameters:

  • simple_ctr
  • combinations_ctr
  • per_feature_ctr

Type

int

Default value

Number of classes - 1 for multiclassification problems when training on CPU, 1 otherwise
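The precedence between TargetBorderCount and ctr_target_border_count, together with the default value, can be summarized in a short sketch. The function and its arguments are hypothetical; `is_multiclass` stands in for whatever makes the problem a multiclassification one.

```python
def effective_target_border_count(description, ctr_target_border_count=None,
                                  is_multiclass=False, n_classes=None, device="CPU"):
    """Resolve the target border count for one CTR description (hypothetical sketch)."""
    # 1. A TargetBorderCount component in the description itself wins.
    #    A leading FeatureId (per_feature_ctr) is harmless here: it has no '='.
    for option in description.split(":")[1:]:
        key, _, value = option.partition("=")
        if key == "TargetBorderCount":
            return int(value)
    # 2. Otherwise an explicitly set ctr_target_border_count applies.
    if ctr_target_border_count is not None:
        return ctr_target_border_count
    # 3. Otherwise the default: number of classes - 1 for multiclassification
    #    on CPU, 1 in every other case.
    if is_multiclass and device == "CPU":
        return n_classes - 1
    return 1


print(effective_target_border_count("Borders:TargetBorderCount=3", 5))  # 3
print(effective_target_border_count("Borders", 5))                      # 5
print(effective_target_border_count("Borders", is_multiclass=True, n_classes=4))  # 3
```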

Supported processing units

CPU and GPU

counter_calc_method

Description

The method for calculating the Counter CTR type.

Possible values:

  • SkipTest — Objects from the validation dataset are not considered at all.
  • Full — All objects from both the learn and validation datasets are considered.
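The Counter CTR type is based on how often each category value occurs, so the two methods differ only in which objects are counted. The sketch below shows the raw per-category frequencies under each method. It is a simplification: the actual Counter CTR turns these counts into a normalized value with a prior, which is not reproduced here, and the function name is hypothetical.

```python
from collections import Counter


def category_counts(learn_values, validation_values, counter_calc_method="SkipTest"):
    """Raw per-category frequencies under SkipTest or Full (hypothetical sketch)."""
    counts = Counter(learn_values)
    if counter_calc_method == "Full":
        counts.update(validation_values)  # validation objects are counted as well
    elif counter_calc_method != "SkipTest":
        raise ValueError("expected 'SkipTest' or 'Full'")
    return counts


learn = ["red", "blue", "red"]
validation = ["blue", "blue"]
print(category_counts(learn, validation))          # Counter({'red': 2, 'blue': 1})
print(category_counts(learn, validation, "Full"))  # Counter({'blue': 3, 'red': 2})
```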

Type

string

Default value

None (SkipTest is used)

Supported processing units

CPU and GPU

max_ctr_complexity

Description

The maximum number of features that can be combined.

Each resulting combination consists of one or more categorical features and can optionally contain binary features in the following form: numeric feature > value.
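To see what the limit bounds, the sketch below counts every group of 1 to max_ctr_complexity categorical features. It ignores the optional "numeric feature > value" parts, and it only illustrates how quickly the space of possible combinations grows; it does not reproduce how CatBoost actually selects combinations during training. The feature names are made up.

```python
from itertools import combinations


def candidate_combinations(cat_features, max_ctr_complexity):
    """All groups of 1..max_ctr_complexity categorical features (illustration only)."""
    groups = []
    for size in range(1, max_ctr_complexity + 1):
        groups.extend(combinations(cat_features, size))
    return groups


features = ["city", "device", "browser"]
print(len(candidate_combinations(features, 1)))  # 3
print(len(candidate_combinations(features, 2)))  # 6
print(len(candidate_combinations(features, 4)))  # 7
```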

Type

int

Default value

The default value depends on the processing unit type, the type of the combined features, and the selected mode:

  • GPU for categorical features in MultiClass and MultiClassOneVsAll modes: 1
  • In all other cases: 4
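The default rule can be written as a one-line decision. This hypothetical function simplifies the GPU case by not distinguishing feature types; the mode names are those listed above.

```python
def default_max_ctr_complexity(task_type, loss_function):
    """Sketch of the documented default: 1 on GPU for the two multiclass modes, else 4."""
    if task_type == "GPU" and loss_function in ("MultiClass", "MultiClassOneVsAll"):
        return 1
    return 4


print(default_max_ctr_complexity("GPU", "MultiClass"))  # 1
print(default_max_ctr_complexity("CPU", "MultiClass"))  # 4
```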

Supported processing units

CPU and GPU

ctr_leaf_count_limit

Description

The maximum number of leaves with categorical features. If this number is exceeded, some of the leaves are discarded.

The leaves to be discarded are selected as follows:

  1. The leaves are sorted by the frequency of the values.
  2. The top N leaves are selected, where N is the value specified in the parameter.
  3. All leaves starting from N+1 are discarded.
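The three steps above can be sketched with a frequency count, under the simplifying assumption that each distinct category value corresponds to one leaf. The function name is hypothetical, and ties are broken by first occurrence, which is how `Counter.most_common` orders equal counts; CatBoost's own tie-breaking is not documented here.

```python
from collections import Counter


def kept_category_values(values, ctr_leaf_count_limit=None):
    """Keep the N most frequent category values (hypothetical sketch)."""
    counts = Counter(values)
    if ctr_leaf_count_limit is None:
        return list(counts)  # default: nothing is discarded
    # 1. Sort by frequency (most_common breaks ties by first occurrence).
    ranked = [value for value, _ in counts.most_common()]
    # 2-3. Keep the top N; everything from position N+1 onwards is discarded.
    return ranked[:ctr_leaf_count_limit]


values = ["a", "b", "a", "c", "a", "b", "d"]
print(kept_category_values(values, 2))  # ['a', 'b']
```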

This option reduces the resulting model size and the amount of memory required for training. Note that it may also affect the quality of the resulting model.

Type

int

Default value

None (the number of different category values is not limited)

Supported processing units

CPU

store_all_simple_ctr

Description

Ignore categorical features that are not used in feature combinations when choosing candidates for exclusion.

There is no point in using this parameter without the ctr_leaf_count_limit parameter (--ctr-leaf-count-limit in the command-line version).
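One way to read this option is sketched below: with store_all_simple_ctr enabled, the simple CTR of a categorical feature that appears in no combination is left out of the candidates that ctr_leaf_count_limit may truncate. This is an interpretation of the description above, not CatBoost's implementation; the function, the tuple-based keys, and the feature names are all hypothetical.

```python
def exclusion_candidates(ctr_keys, store_all_simple_ctr=False):
    """Which CTR tables may be truncated by ctr_leaf_count_limit (hypothetical model).

    ctr_keys: tuples of categorical feature names; a 1-tuple is a simple CTR,
    a longer tuple is a feature combination.
    """
    if not store_all_simple_ctr:
        return list(ctr_keys)  # default: simple CTRs and combinations alike
    used_in_combinations = {f for key in ctr_keys if len(key) > 1 for f in key}
    # Leave out simple CTRs of features that appear in no combination.
    return [key for key in ctr_keys
            if len(key) > 1 or key[0] in used_in_combinations]


keys = [("city",), ("device",), ("city", "browser")]
print(exclusion_candidates(keys, store_all_simple_ctr=True))
# [('city',), ('city', 'browser')]
```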

Type

bool

Default value

None (set to False)

Both simple features and feature combinations are taken into account when limiting the number of leaves with categorical features.

Supported processing units

CPU

final_ctr_computation_mode

Description

Final CTR computation mode.

Possible values:

  • Default — Compute final CTRs for learn and validation datasets.
  • Skip — Do not compute final CTRs for learn and validation datasets. In this case, the resulting model cannot be applied. This mode decreases the size of the resulting model. It can be useful for research purposes when only the metric values need to be calculated.

Type

string

Default value

Default

Supported processing units

CPU and GPU