Bootstrap options
Regularization
To prevent overfitting, the weight of each training example is re-drawn between the steps of choosing different splits or building different trees (not between scoring different candidates for the same split).
Speeding up
When building a new tree, CatBoost calculates a score for each of the numerous split candidates. The computation complexity of this procedure is $O(|C| \cdot n)$, where:
- $|C|$ is the number of numerical features, each providing many split candidates.
- $n$ is the number of examples.
Usually, this computation dominates over all other steps of each CatBoost iteration (see Table 1 in the paper). Hence, it seems appealing to speed up this procedure by using only a part of examples for scoring all the split candidates.
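To make this cost concrete, here is a minimal, self-contained sketch (toy code, not CatBoost internals) of exhaustive split scoring: every candidate needs a pass over the examples, so the full search costs $O(|C| \cdot n)$, and scoring on a Bernoulli sample of rate $p$ touches only about $p \cdot n$ examples. The squared-error gain formula is an assumed illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_features, cands_per_feature = 50_000, 10, 16
X = rng.normal(size=(n, n_features))
g = rng.normal(size=n)  # per-example gradients of the loss


def best_split(X, g):
    """Score every (feature, threshold) candidate; each candidate takes
    one O(n) pass, so the whole search is O(|C| * n)."""
    best_gain, best_cand = -np.inf, None
    for j in range(X.shape[1]):
        thresholds = np.quantile(X[:, j], np.linspace(0.05, 0.95, cands_per_feature))
        for t in thresholds:
            left = X[:, j] <= t  # O(n) work per candidate
            n_left = int(left.sum())
            n_right = len(g) - n_left
            if n_left == 0 or n_right == 0:
                continue
            # Toy squared-error gain: larger when the split separates gradients well.
            gain = g[left].sum() ** 2 / n_left + g[~left].sum() ** 2 / n_right
            if gain > best_gain:
                best_gain, best_cand = gain, (j, t)
    return best_cand


# Scoring on a subsample of rate p costs roughly p * |C| * n,
# i.e. the dominant step runs about 1/p times faster.
p = 0.25
mask = rng.random(n) < p
print(best_split(X, g), best_split(X[mask], g[mask]))
```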
Bootstrap type | Description | Associated parameters |
---|---|---|
Bayesian | The weight of each example is set to $w = a^{t}$, where $a$ is sampled from the exponential distribution $\mathrm{Exp}(1)$ and $t$ is the value of the bagging_temperature parameter. Note. The Bayesian bootstrap serves only for regularization, not for speeding up. | bagging_temperature (command-line version: --bagging-temperature). Defines the settings of the Bayesian bootstrap, which is used by default in classification and regression modes. Use the Bayesian bootstrap to assign random weights to objects. The weights are sampled from the exponential distribution if the value of this parameter is set to "1"; all weights are equal to 1 if the value of this parameter is set to "0". Possible values are in the range $[0, +\infty)$. This parameter can be used only if the selected bootstrap type is Bayesian. sampling_unit (command-line version: --sampling-unit). The sampling scheme. Possible values: Object, Group. |
Bernoulli | Corresponds to Stochastic Gradient Boosting (SGB, refer to the Friedman paper for details). Each example is independently sampled for choosing the current split with the probability defined by the subsample parameter. All the sampled examples have equal weights. Though SGB was originally proposed for regularization, it speeds up calculations almost $1/\mathrm{subsample}$ times. | subsample (command-line version: --subsample). Sample rate for bagging. This parameter can be used if the selected bootstrap type is Bernoulli, MVS, or Poisson. sampling_unit (command-line version: --sampling-unit). The sampling scheme. Possible values: Object, Group. |
MVS (supported only on CPU) | Implements the importance sampling algorithm called Minimum Variance Sampling (MVS). Scoring of a split candidate is based on estimating the expected gradient in each leaf provided by this candidate, where the gradient $g_i$ is the derivative of the loss function with respect to the current prediction for example $i$. For this estimation, MVS samples about $\mathrm{subsample} \cdot n$ examples, keeping example $i$ with probability $p_i = \min\bigl(1, \sqrt{g_i^2 + \lambda}/\mu\bigr)$, where $\lambda$ is set by mvs_reg and the threshold $\mu$ is chosen so that the expected sample size equals $\mathrm{subsample} \cdot n$. Then, the estimate of the expected gradient in a leaf is calculated as $\frac{1}{n_{leaf}} \sum_{i \in leaf \cap S} g_i / p_i$, where $S$ is the sampled set. | mvs_reg (command-line version: --mvs-reg). Affects the weight of the denominator and can be used for balancing between importance sampling and Bernoulli sampling (setting it to 0 implies pure importance sampling; setting it to $+\infty$ implies Bernoulli). sampling_unit (command-line version: --sampling-unit). The sampling scheme. Possible values: Object, Group. |
Poisson (refer to the Chamandy et al. paper for details; supported only on GPU) | The weights of examples are i.i.d. sampled from the Poisson distribution with the parameter defined via subsample. | subsample (command-line version: --subsample). Sample rate for bagging. This parameter can be used if the selected bootstrap type is Bernoulli, MVS, or Poisson. sampling_unit (command-line version: --sampling-unit). The sampling scheme. Possible values: Object, Group. |
No | All training examples are used with equal weights. | sampling_unit (command-line version: --sampling-unit). The sampling scheme. Possible values: Object, Group. |
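For reference, a sketch of selecting each bootstrap type from the Python package. The parameter names (bootstrap_type, bagging_temperature, subsample, mvs_reg, task_type) are the documented ones; the concrete values are arbitrary illustrations, not recommendations.

```python
from catboost import CatBoostClassifier

# One constructor call per row of the table above; values are illustrative only.
bayesian = CatBoostClassifier(bootstrap_type="Bayesian", bagging_temperature=1.0)
bernoulli = CatBoostClassifier(bootstrap_type="Bernoulli", subsample=0.66)
mvs = CatBoostClassifier(bootstrap_type="MVS", subsample=0.8, mvs_reg=1e-3)  # CPU only
poisson = CatBoostClassifier(bootstrap_type="Poisson", subsample=0.66, task_type="GPU")  # GPU only
no_bootstrap = CatBoostClassifier(bootstrap_type="No")
```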
The frequency of resampling and reweighting is defined by the sampling_frequency parameter:
- PerTree — Before constructing each new tree
- PerTreeLevel — Before choosing each new split of a tree
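As a brief sketch, switching the resampling frequency from the Python package (assuming the documented PerTree and PerTreeLevel values):

```python
from catboost import CatBoostRegressor

# Re-draw the Bernoulli sample before each split level rather than once per tree.
model = CatBoostRegressor(
    bootstrap_type="Bernoulli",
    subsample=0.8,
    sampling_frequency="PerTreeLevel",
)
```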
It is recommended to use MVS when speeding up is the priority and regularization is not; this is usually the case when training on large datasets. For regularization, the other options may be more appropriate.
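To illustrate why MVS stays accurate on a fraction of the data, here is a minimal numpy sketch of the sampling rule described in the table (it follows the published MVS formulas; mvs_sample and its bisection threshold search are illustrative helpers, not CatBoost internals):

```python
import numpy as np


def mvs_sample(g, sample_rate, mvs_reg, rng):
    """Keep example i with probability p_i = min(1, sqrt(g_i^2 + reg) / mu),
    where mu is tuned so the expected sample size is sample_rate * len(g)."""
    r = np.sqrt(g ** 2 + mvs_reg)      # regularized importance of each example
    target = sample_rate * g.size
    lo, hi = 1e-12, r.sum() / target   # at hi, sum(min(1, r/hi)) <= target
    for _ in range(60):                # bisection on the threshold mu
        mu = 0.5 * (lo + hi)
        if np.minimum(1.0, r / mu).sum() > target:
            lo = mu                    # expected size too large -> raise mu
        else:
            hi = mu
    p = np.minimum(1.0, r / mu)
    keep = rng.random(g.size) < p
    return keep, 1.0 / p[keep]         # inverse-probability weights


rng = np.random.default_rng(0)
g = rng.normal(size=100_000)
keep, w = mvs_sample(g, sample_rate=0.2, mvs_reg=1e-2, rng=rng)
# The weighted gradient sum over ~20% of the examples stays close to the
# full-sample sum, which is what keeps split scores nearly unchanged.
print(g.sum(), (g[keep] * w).sum(), keep.mean())
```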
Related papers
- Estimating Uncertainty for Massive Data Streams
  N. Chamandy, O. Muralidharan, A. Najmi, and S. Naidu, 2012.
- Stochastic gradient boosting
  J. H. Friedman. Computational Statistics & Data Analysis, 38(4):367–378, 2002.
- Training Deep Models Faster with Robust, Approximate Importance Sampling
  T. B. Johnson and C. Guestrin. In Advances in Neural Information Processing Systems, pages 7276–7286, 2018.
- LightGBM: A highly efficient gradient boosting decision tree
  G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, and T.-Y. Liu. In Advances in Neural Information Processing Systems, pages 3146–3154, 2017.
- CatBoost: unbiased boosting with categorical features
  L. Prokhorenkova, G. Gusev, A. Vorobev, A. V. Dorogush, and A. Gulin. NeurIPS, 2018. Explains the ordered boosting principle and ordered statistics for categorical features.