Score functions
The common approach to solving supervised learning tasks is to minimize the loss function $L$:

$$L\left(f(x), y\right) = \sum_i w_i \, l\left(f(x_i), y_i\right) + J(f)$$

where:

- $l\left(f(x), y\right)$ is the value of the loss function at the point $(x, y)$
- $w_i$ is the weight of the $i$-th object
- $J(f)$ is the regularization term.

For example:

- $l\left(f(x), y\right) = \left(f(x) - y\right)^2$ (mean squared error)
- $J(f) = \lambda \lVert f \rVert^2$ (L2 regularization)
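As a concrete numeric check, here is a minimal sketch of this objective for the MSE/L2 case; the function name and the `model_norm_sq` argument (standing in for $\lVert f \rVert^2$) are illustrative, not part of CatBoost:

```python
import numpy as np

def weighted_loss(pred, y, w, model_norm_sq, l2_reg=3.0):
    # L = sum_i w_i * l(f(x_i), y_i) + J(f)
    # with l = squared error and J(f) = l2_reg * ||f||^2.
    return np.dot(w, (pred - y) ** 2) + l2_reg * model_norm_sq

print(weighted_loss(np.array([0.5, 1.5]), np.array([1.0, 1.0]),
                    np.array([1.0, 2.0]), model_norm_sq=0.25))
```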
Gradient boosting
Boosting is a method which builds a prediction model $F^T$ as an ensemble of weak learners $F^T = \sum_{t=1}^{T} f^t$.

In our case, $f^t$ is a decision tree. Trees are built sequentially, and each next tree is built to approximate the negative gradients $g_i$ of the loss function $l$ at the predictions of the current ensemble:

$$g_i = -\left.\frac{\partial l(a, y_i)}{\partial a}\right|_{a = F^{t-1}(x_i)}$$

Thus, boosting performs a gradient descent optimization of the function $L$. The quality of the gradient approximation is measured by the score function $S$.
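To make the step concrete, here is a minimal sketch of one boosting iteration under the MSE loss, where the negative gradient is proportional to the residual; `boosting_step` and the use of scikit-learn's tree are illustrative, not CatBoost internals:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boosting_step(X, y, current_pred, learning_rate=0.1):
    # For l(a, y) = (a - y)^2 the negative gradient at the current
    # predictions is proportional to the residual y - a.
    neg_gradients = y - current_pred
    # Fit the next weak learner to approximate the negative gradients.
    tree = DecisionTreeRegressor(max_depth=1)
    tree.fit(X, neg_gradients)
    # Shift the ensemble prediction along the approximated gradient.
    return current_pred + learning_rate * tree.predict(X), tree

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=100)
pred = np.full_like(y, y.mean())
for _ in range(10):
    pred, _ = boosting_step(X, y, pred)
```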
Types of score functions
Finding the optimal tree structure
Let's suppose that it is required to find the structure of a tree of depth 1. The structure of such a tree is determined by the index $j$ of some feature and a border value $c$. Let $x^j_i$ be the value of the $j$-th feature on the $i$-th object, and let $a_{left}$ and $a_{right}$ be the values at the leaves of $f$. Then $f(x_i)$ equals $a_{left}$ if $x^j_i \le c$ and $a_{right}$ if $x^j_i > c$. Now the goal is to find the best $j$, $c$, $a_{left}$, and $a_{right}$ in terms of the chosen score function.
For the L2 score function the formula takes the following form:

$$S(j, c, a_{left}, a_{right}) = -\sum_{i \colon x^j_i \le c} w_i \left(a_{left} - g_i\right)^2 - \sum_{i \colon x^j_i > c} w_i \left(a_{right} - g_i\right)^2 \to \max$$

Let's denote $W_{left} = \sum_{i \colon x^j_i \le c} w_i$ and $W_{right} = \sum_{i \colon x^j_i > c} w_i$. For a fixed split, the optimal leaf values are the weighted averages of the gradients:

$$a^*_{left} = \frac{\sum_{i \colon x^j_i \le c} w_i g_i}{W_{left}}, \qquad a^*_{right} = \frac{\sum_{i \colon x^j_i > c} w_i g_i}{W_{right}}$$

After expanding the brackets and removing the terms that are constant in the optimization:

$$(j^*, c^*) = \underset{j, c}{\operatorname{argmax}} \left( W_{left} \cdot \left(a^*_{left}\right)^2 + W_{right} \cdot \left(a^*_{right}\right)^2 \right)$$

The latter argmax can be calculated by brute-force search.
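A minimal sketch of that brute-force search, assuming positive object weights (function and variable names are illustrative):

```python
import numpy as np

def best_depth1_split(X, g, w):
    """Find (j*, c*) maximizing W_left * a_left*^2 + W_right * a_right*^2."""
    best_j, best_c, best_score = None, None, -np.inf
    for j in range(X.shape[1]):
        # Every distinct feature value except the largest is a candidate border.
        for c in np.unique(X[:, j])[:-1]:
            left = X[:, j] <= c
            W_left, W_right = w[left].sum(), w[~left].sum()
            a_left = np.dot(w[left], g[left]) / W_left
            a_right = np.dot(w[~left], g[~left]) / W_right
            score = W_left * a_left**2 + W_right * a_right**2
            if score > best_score:
                best_j, best_c, best_score = j, c, score
    return best_j, best_c
```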
For deeper trees the next split is chosen in the same way, but the exact form of the search depends on the tree growing policy:

- L2 score function: $S$ is converted into a sum over leaves: $S = -\sum_{leaf} \sum_{i \in leaf} w_i \left(a_{leaf} - g_i\right)^2$. The next step is to find $\underset{split}{\operatorname{argmax}} \sum_{leaf} W_{leaf} \cdot \left(a^*_{leaf}\right)^2$, where $a^*_{leaf}$ are the optimal values in the leaves after the split.
- Depthwise and Lossguide methods: $a$ is a set of leaf values $a_{leaf}$, and $leaf(i)$ stands for the index of the leaf the $i$-th object falls into, so the score function takes the following form: $S = -\sum_i w_i \left(a_{leaf(i)} - g_i\right)^2$. Since $S$ is a convex function, different $j$ and $c$ (splits for different leaves) can be searched separately by finding the optimal $a^*_{leaf}$ for each leaf.
- SymmetricTree method: The same $j$ and $c$ must be found for every leaf at once, so the total sum over all leaves has to be optimized: $\underset{j, c}{\operatorname{argmax}} \sum_{leaf} W_{leaf} \cdot \left(a^*_{leaf}\right)^2$ (see the sketch after this list).
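A minimal sketch of scoring one symmetric split candidate against all current leaves, reusing the notation above (names illustrative):

```python
import numpy as np

def symmetric_split_score(leaf_ids, X, g, w, j, c):
    """Score a single candidate (j, c) applied to every current leaf:
    sum over the resulting leaves of W_leaf * (a_leaf*)^2."""
    score = 0.0
    go_left = X[:, j] <= c
    for leaf in np.unique(leaf_ids):
        for side in (go_left, ~go_left):
            mask = (leaf_ids == leaf) & side
            W = w[mask].sum()
            if W > 0:
                a_star = np.dot(w[mask], g[mask]) / W
                score += W * a_star**2
    return score
```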
Second-order score functions
Let's apply the second-order Taylor expansion to the loss function at the point $F^{t-1}(x_i)$:

$$L\left(F^{t-1} + f\right) \approx \sum_i w_i \left[ l\left(F^{t-1}(x_i), y_i\right) - g_i f(x_i) + \frac{1}{2} h_i f(x_i)^2 \right] + \frac{\lambda}{2} \lVert f \rVert^2$$

where $g_i$ is the negative gradient defined above, $h_i = \left.\frac{\partial^2 l(a, y_i)}{\partial a^2}\right|_{a = F^{t-1}(x_i)}$ is the second derivative, and $\lambda$ is the l2 regularization parameter.

Since the first term is constant in the optimization, the formula takes the following form after regrouping by leaves:

$$\sum_{leaf} \left[ -a_{leaf} \sum_{i \in leaf} w_i g_i + \frac{1}{2} \, a_{leaf}^2 \left( \sum_{i \in leaf} w_i h_i + \lambda \right) \right] \to \min$$

So, the optimal value of $a_{leaf}$ is:

$$a^*_{leaf} = \frac{\sum_{i \in leaf} w_i g_i}{\sum_{i \in leaf} w_i h_i + \lambda}$$

The summation is over $i$ such that the object $x_i$ gets to the considered leaf. These optimal values $a^*_{leaf}$ can then be used instead of the weighted averages of the gradients ($a^*_{left}$ and $a^*_{right}$ in the example above) in the same score functions.
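A minimal sketch of this second-order leaf value, using the negative-gradient convention from above (names illustrative):

```python
import numpy as np

def newton_leaf_value(g, h, w, l2_reg=3.0):
    # a* = sum(w * g) / (sum(w * h) + lambda), where g are negative
    # gradients and h second derivatives of the loss in this leaf.
    return np.dot(w, g) / (np.dot(w, h) + l2_reg)

# For l(a, y) = (a - y)^2: g = 2 * (y - a), h = 2.
y = np.array([1.0, 2.0, 3.0])
a = np.zeros(3)          # current ensemble predictions in the leaf
w = np.ones(3)           # object weights
print(newton_leaf_value(2.0 * (y - a), np.full(3, 2.0), w))
```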
CatBoost score functions
CatBoost provides the following score functions:
| Score function | Description |
|---|---|
| L2 | Use the first derivatives during the calculation. |
| Cosine (can not be used with the Lossguide tree growing policy) | Use the first derivatives during the calculation. |
| NewtonL2 | Use the second derivatives during the calculation. This may improve the resulting quality of the model. |
| NewtonCosine (can not be used with the Lossguide tree growing policy) | Use the second derivatives during the calculation. This may improve the resulting quality of the model. |
Per-object and per-feature penalties
- Per-feature penalties apply to the first occurrence of a feature in the model. The given value is subtracted from the score if the current candidate is the first one to include the feature in the model.
- Per-object penalties apply to the first use of a feature for an object. The given value is multiplied by the number of objects that are divided by the current split and use the feature for the first time.
The final score is calculated as follows:

$$Score'(s, l) = Score(s, l) \cdot \prod_{f \in s} W_f \;-\; \sum_{f \in s \,\colon\, f \text{ is new to the model}} P_f \;-\; \sum_{f \in s} O_f \cdot U_{f,l}$$

where:

- $W_f$ is the feature weight
- $P_f$ is the per-feature penalty
- $O_f$ is the per-object penalty
- $s$ is the current split
- $l$ is the current leaf
- $U_{f,l}$ is the number of objects in leaf $l$ that are divided by the split $s$ and use the feature $f$ for the first time.
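These weights and penalties can be set through training parameters. A minimal sketch with the Python package, assuming the parameter names of recent CatBoost versions (feature_weights, first_feature_use_penalties, per_object_feature_penalties); the feature indices and values are illustrative:

```python
from catboost import CatBoostRegressor

# Down-weight splits on feature 2 and penalize bringing it
# into the model or applying it to new objects.
model = CatBoostRegressor(
    iterations=100,
    feature_weights=[1.0, 1.0, 0.5],              # W_f: multiplies the score
    first_feature_use_penalties=[0.0, 0.0, 5.0],  # P_f: first use in the model
    per_object_feature_penalties=[0.0, 0.0, 1.0], # O_f: first use per object
)
```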
Usage
Use the corresponding parameter to set the score function during training. The supported score functions vary depending on the processing unit type:

- GPU — all score types
- CPU — Cosine, L2
| Python package | R package | Command-line interface | Description |
|---|---|---|---|
| score_function | score_function | --score-function | The score type used to select the next split during the tree construction. Possible values: Cosine (the default), L2, NewtonCosine, NewtonL2. |
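For example, with the Python package:

```python
from catboost import CatBoostRegressor

# L2 is available on both CPU and GPU; the second-order scores
# (NewtonL2, NewtonCosine) require task_type='GPU' per the table above.
model = CatBoostRegressor(iterations=100, score_function='L2')
```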