Overfitting detector

If overfitting occurs, CatBoost can stop training earlier than the training parameters dictate. For example, training can stop before the specified number of trees has been built. The overfitting detector is configured in the starting parameters.
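As an illustration of how the detector might be configured, here is a minimal sketch assuming the CatBoost Python package, where the detector is controlled by the `od_type`, `od_wait` and `od_pval` training parameters. The numeric values are illustrative only, not recommendations, and the dictionary names are hypothetical.

```python
# Parameter sets for the two detector types covered in this article.
# The values (1000, 1e-3, 20) are illustrative assumptions.

inc_to_dec_params = {
    "iterations": 1000,      # upper bound on the number of trees
    "od_type": "IncToDec",   # detector type
    "od_pval": 1e-3,         # threshold compared against CurrentPValue
}

iter_params = {
    "iterations": 1000,
    "od_type": "Iter",       # detector type
    "od_wait": 20,           # iterations allowed since the best metric value
}

# With the Python package installed, a dictionary like these could be passed
# as keyword arguments, e.g. CatBoostClassifier(**iter_params). The detector
# only has an effect when a validation dataset is supplied for training.
```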

The following overfitting detection methods are supported:

IncToDec

Before building each new tree, CatBoost checks the resulting loss change on the validation dataset. The overfit detector is triggered if the $Threshold$ value set in the starting parameters is greater than $CurrentPValue$:

$$CurrentPValue < Threshold$$

How $CurrentPValue$ is calculated from a set of values for the maximizing metric $score[i]$:

  1. $ExpectedInc$ is calculated:

    $$ExpectedInc = \max_{i_{1} \leq i_{2} \leq i} 0.99^{i - i_{1}} \cdot (score[i_{2}] - score[i_{1}])$$

  2. $x$ is calculated:

    $$x = \frac{ExpectedInc[i]}{\max_{j \leq i} score[j] - score[i]}$$

  3. $CurrentPValue$ is calculated:

    $$CurrentPValue = \exp \left(- \frac{0.5}{x}\right)$$
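The three steps above can be sketched in plain Python. This is a direct transcription of the formulas, not CatBoost's internal implementation. The function name `current_p_value` and the `decay` argument are hypothetical, and the handling of the two degenerate cases (zero drop from the best score, zero expected increase) is an assumption made so that the sketch never divides by zero.

```python
import math


def current_p_value(scores, decay=0.99):
    """Compute CurrentPValue for the latest iteration.

    scores[i] is the value of the maximized metric on the validation
    dataset after iteration i (higher is better).
    """
    i = len(scores) - 1

    # Step 1: ExpectedInc, the largest decayed gain score[i2] - score[i1]
    # over all i1 <= i2 <= i. The pair i1 == i2 contributes 0, so the
    # maximum is never negative.
    expected_inc = 0.0
    for i1 in range(i + 1):
        best_later = max(scores[i1:])  # best score[i2] for i2 >= i1
        gain = decay ** (i - i1) * (best_later - scores[i1])
        expected_inc = max(expected_inc, gain)

    # Step 2: x, the expected increase relative to the drop from the
    # best score seen so far.
    drop = max(scores) - scores[i]
    if drop <= 0:
        return 1.0  # assumption: current score is the best, no sign of overfitting
    if expected_inc <= 0:
        return 0.0  # assumption: limit of exp(-0.5 / x) as x approaches 0
    x = expected_inc / drop

    # Step 3: CurrentPValue.
    return math.exp(-0.5 / x)
```

Under this sketch, the detector would be triggered at the current iteration when `current_p_value(scores) < threshold`. A larger drop from the best score, relative to the gains seen earlier, yields a smaller value and therefore an earlier trigger.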

Iter

Before building each new tree, CatBoost checks the number of iterations since the iteration with the optimal loss function value.

The model is considered overfitted if the number of iterations exceeds the value specified in the training parameters.
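The Iter check can be sketched as follows. This is a minimal illustration of the rule described above, not CatBoost's own code. The function names and the `wait` argument are hypothetical, and the sketch assumes a loss where lower is better and that ties are resolved in favor of the earliest best iteration.

```python
def iterations_since_best(losses):
    """Number of iterations elapsed since the lowest validation loss."""
    # min() returns the first index among ties, i.e. the earliest best iteration.
    best_index = min(range(len(losses)), key=losses.__getitem__)
    return len(losses) - 1 - best_index


def is_overfitted(losses, wait):
    """True if more than `wait` iterations have passed since the best loss."""
    return iterations_since_best(losses) > wait
```

For example, with `losses = [0.9, 0.7, 0.6, 0.65, 0.66, 0.7]` the best value is at index 2, so three iterations have passed since the optimum. The sketch reports overfitting for `wait=2` but not for `wait=3`.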
