Score functions

The common approach to solve supervised learning tasks is to minimize the loss function $L$:

$$L\left(\{f(x_i)\}, \{y_i\}\right) = \sum_i w_i \, l\left(f(x_i), y_i\right) + J(f)$$

  • $l(f(x), y)$ is the value of the loss function at the point $(x, y)$
  • $w_i$ is the weight of the $i$-th object
  • $J(f)$ is the regularization term.
For example, these formulas take the following form for linear regression:
  • $l(f(x), y) = \left(f(x) - y\right)^2$ (mean squared error)
  • $J(f) = \lambda \lVert \beta \rVert^2$ for the weight vector $\beta$ of the linear model $f(x) = \beta^\top x$ (L2 regularization)
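For the linear regression case, a minimal sketch of this objective in Python (the function and argument names are illustrative, not part of any library API):

```python
import numpy as np

def linear_regression_loss(beta, X, y, w, lam):
    """Weighted MSE plus L2 regularization:
    sum_i w_i * (f(x_i) - y_i)^2 + lam * ||beta||^2, with f(x) = beta^T x."""
    preds = X @ beta
    return np.sum(w * (preds - y) ** 2) + lam * np.sum(beta ** 2)
```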

Gradient boosting

Boosting is a method which builds a prediction model $F$ as an ensemble of weak learners: $F = \sum_t f^t$.

In our case, $f^t$ is a decision tree. Trees are built sequentially, and each next tree is built to approximate the negative gradients $g_i$ of the loss function $l$ at the predictions of the current ensemble:

$$g_i = -\left.\frac{\partial l(a, y_i)}{\partial a}\right|_{a = F^{t-1}(x_i)}$$

Thus, it performs a gradient descent optimization of the function $L$. The quality of the gradient approximation is measured by a score function $S = S(f, \{g_i\})$.
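For the mean squared error example above, these negative gradients are just scaled residuals. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def mse_negative_gradients(preds, y):
    # For l(a, y) = (a - y)^2, dl/da = 2 * (a - y), so the negative
    # gradient at the current predictions is 2 * (y - preds).
    return 2.0 * (np.asarray(y) - np.asarray(preds))
```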

Types of score functions

Let's suppose that it is required to add a new tree to the ensemble. A score function is required in order to choose between candidate trees. Given a candidate tree $f$, let $a_i$ denote $f(x_i)$, $w_i$ the weight of the $i$-th object, and $g_i$ the corresponding gradient of $l$. Let's consider the following score functions:

$$\mathrm{L2}(f, \{g_i\}) = -\sum_i w_i \left(a_i - g_i\right)^2$$

$$\mathrm{Cosine}(f, \{g_i\}) = \frac{\sum_i w_i a_i g_i}{\sqrt{\sum_i w_i a_i^2} \cdot \sqrt{\sum_i w_i g_i^2}}$$
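As an illustration, here is a minimal sketch of these two score functions in Python (not CatBoost's internal implementation; `a`, `g`, and `w` hold the per-object predictions, gradients, and weights):

```python
import numpy as np

def l2_score(a, g, w):
    # Negative weighted squared deviation of predictions from gradients;
    # larger (closer to zero) is better.
    return -np.sum(w * (a - g) ** 2)

def cosine_score(a, g, w):
    # Weighted cosine similarity between predictions and gradients.
    num = np.sum(w * a * g)
    denom = np.sqrt(np.sum(w * a ** 2)) * np.sqrt(np.sum(w * g ** 2))
    return num / denom
```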

Finding the optimal tree structure

Let's suppose that it is required to find the structure for a tree $f$ of depth 1. The structure of such a tree is determined by the index $j$ of some feature and a border value $c$. Let $x_{i,j}$ be the value of the $j$-th feature on the $i$-th object, and let $a_{\mathrm{left}}$ and $a_{\mathrm{right}}$ be the values at the leafs of $f$. Then $f(x_i)$ equals $a_{\mathrm{left}}$ if $x_{i,j} < c$ and $a_{\mathrm{right}}$ if $x_{i,j} \ge c$. Now the goal is to find the best $j$, $c$, $a_{\mathrm{left}}$, and $a_{\mathrm{right}}$ in terms of the chosen score function.

For the L2 score function the formula takes the following form:

$$S(f, \{g_i\}) = -\sum_{i:\, x_{i,j} < c} w_i \left(a_{\mathrm{left}} - g_i\right)^2 - \sum_{i:\, x_{i,j} \ge c} w_i \left(a_{\mathrm{right}} - g_i\right)^2$$

Let's denote $W_{\mathrm{left}} = \sum_{i:\, x_{i,j} < c} w_i$ and $W_{\mathrm{right}} = \sum_{i:\, x_{i,j} \ge c} w_i$.

The optimal values for $a_{\mathrm{left}}$ and $a_{\mathrm{right}}$ are the weighted averages of the gradients:

$$a_{\mathrm{left}}^* = \frac{\sum_{i:\, x_{i,j} < c} w_i g_i}{W_{\mathrm{left}}}, \qquad a_{\mathrm{right}}^* = \frac{\sum_{i:\, x_{i,j} \ge c} w_i g_i}{W_{\mathrm{right}}}$$

After expanding the brackets and removing the terms that are constant in the optimization:

$$(j^*, c^*) = \underset{j,\, c}{\arg\max} \left[ W_{\mathrm{left}} \left(a_{\mathrm{left}}^*\right)^2 + W_{\mathrm{right}} \left(a_{\mathrm{right}}^*\right)^2 \right]$$

The latter argmax can be calculated by a brute-force search over all features $j$ and border values $c$, as in the sketch below.
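A minimal Python sketch of this brute-force search under the L2 score (CatBoost itself works over precomputed histogram borders; here every distinct feature value is tried, and all names are illustrative):

```python
import numpy as np

def best_depth1_split(X, g, w):
    """Return (j, c, a_left, a_right) maximizing the simplified L2 score
    W_left * a_left^2 + W_right * a_right^2 for a depth-1 tree."""
    best_score, best = -np.inf, None
    n, d = X.shape
    for j in range(d):
        for c in np.unique(X[:, j])[1:]:   # candidate border values
            left = X[:, j] < c
            W_left, W_right = w[left].sum(), w[~left].sum()
            if W_left == 0 or W_right == 0:
                continue
            a_left = np.sum(w[left] * g[left]) / W_left      # weighted average
            a_right = np.sum(w[~left] * g[~left]) / W_right  # weighted average
            score = W_left * a_left ** 2 + W_right * a_right ** 2
            if score > best_score:
                best_score, best = score, (j, c, a_left, a_right)
    return best
```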

The L2 score function can be calculated for each leaf separately. This is not true for Cosine, whose denominator couples all leaves of the tree. The restrictions this imposes on the growing policy are shown below.

The situation is slightly more complex when the tree depth is bigger than 1. In this case, it is required to consider the growing policy of the tree:
  • SymmetricTree or Depthwise — $S$ is calculated over all leaves at once; for the L2 score function:

$$S(f, \{g_i\}) = -\sum_{\mathrm{leaf}} \sum_{i \in \mathrm{leaf}} w_i \left(a_{\mathrm{leaf}} - g_i\right)^2$$

  • Lossguide — The leaves are updated one by one. Therefore, it is required to compute the gain of a split — the change of the score function after the split. For the L2 score function the gain is calculated as follows (see the sketch after this list):

$$\mathrm{Gain} = W_{\mathrm{left}} \left(a_{\mathrm{left}}^*\right)^2 + W_{\mathrm{right}} \left(a_{\mathrm{right}}^*\right)^2 - W \left(a^*\right)^2$$

where $W = W_{\mathrm{left}} + W_{\mathrm{right}}$ and $a^*$ is the optimal value in the leaf before the split.
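A small sketch of this gain computation, assuming the leaf values are the weighted averages derived above (names illustrative):

```python
def l2_split_gain(W_left, a_left, W_right, a_right):
    # The optimal value of the parent leaf is the pooled weighted average
    # of the gradients, so it can be recovered from the two children.
    W = W_left + W_right
    a_parent = (W_left * a_left + W_right * a_right) / W
    return W_left * a_left ** 2 + W_right * a_right ** 2 - W * a_parent ** 2
```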

Second-order score functions

Let's apply the second-order Taylor expansion to the loss function at the point $F^{t-1}(x_i)$:

$$L \approx \sum_i w_i \left[ l\left(F^{t-1}(x_i), y_i\right) - g_i f(x_i) + \frac{1}{2} h_i f(x_i)^2 \right] + \frac{\lambda}{2} \sum_{\mathrm{leaf}} a_{\mathrm{leaf}}^2$$

  • $g_i$ is the negative gradient defined above and $h_i = \left.\frac{\partial^2 l(a, y_i)}{\partial a^2}\right|_{a = F^{t-1}(x_i)}$ is the second derivative of the loss
  • $\lambda$ is the l2 regularization parameter.

Since the first term is constant in the optimization, the formula takes the following form after regrouping by leaves:

$$\sum_{\mathrm{leaf}} \left[ -\left(\sum_{i \in \mathrm{leaf}} w_i g_i\right) a_{\mathrm{leaf}} + \frac{1}{2} \left(\sum_{i \in \mathrm{leaf}} w_i h_i + \lambda\right) a_{\mathrm{leaf}}^2 \right]$$

So, the optimal value of $a_{\mathrm{leaf}}$ is:

$$a_{\mathrm{leaf}}^* = \frac{\sum_{i \in \mathrm{leaf}} w_i g_i}{\sum_{i \in \mathrm{leaf}} w_i h_i + \lambda}$$

The summation is over such $i$ that the object $x_i$ gets into the considered leaf. These optimal values of $a_{\mathrm{leaf}}$ can then be used instead of the weighted averages of the gradients ($a_{\mathrm{left}}^*$ and $a_{\mathrm{right}}^*$ in the example above) in the same score functions.
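A minimal sketch of this second-order (Newton) leaf estimate, given per-object negative gradients `g`, second derivatives `h`, and weights `w` (the default value of `l2_reg` is illustrative):

```python
import numpy as np

def newton_leaf_value(g, h, w, l2_reg=1.0):
    # Second-order leaf estimate: weighted gradient sum divided by the
    # weighted Hessian sum plus the l2 regularization parameter.
    return np.sum(w * g) / (np.sum(w * h) + l2_reg)
```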

CatBoost score functions

CatBoost provides the following score functions:

| Score function | Description |
|----------------|-------------|
| L2 | Use the first derivatives during the calculation. |
| Cosine (cannot be used with the Lossguide tree growing policy) | Use the first derivatives during the calculation. |
| NewtonL2 | Use the second derivatives during the calculation. This may improve the resulting quality of the model. |
| NewtonCosine (cannot be used with the Lossguide tree growing policy) | Use the second derivatives during the calculation. This may improve the resulting quality of the model. |
| LOOL2, SolarL2, SatL2 | Provide different heuristics of the L2 estimates. |

Usage

Use the corresponding parameter to set the score function during the training:
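For example, with the Python package (`score_function` is the parameter listed in the table below; the chosen value is just an illustration):

```python
from catboost import CatBoostRegressor

# Select the score function used to choose the next split during tree
# construction. Note that per the restriction below, second-order score
# functions such as NewtonL2 require GPU training; L2 also works on CPU.
model = CatBoostRegressor(iterations=100, score_function='L2')
```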

Restriction.

The supported score functions vary depending on the processing unit type:

  • GPU — All score types

  • CPU — Cosine, L2

| Python package | R package | Command-line interface | Description |
|----------------|-----------|------------------------|-------------|
| score_function | score_function | --score-function | The score type used to select the next split during the tree construction. |

Possible values:

  • Cosine (do not use this score type with the Lossguide tree growing policy)
  • L2
  • LOOL2
  • NewtonCosine (do not use this score type with the Lossguide tree growing policy)
  • NewtonL2
  • SatL2
  • SolarL2