Choosing the tree structure
This is a greedy method. Features and their splits are selected in order and substituted into each leaf. Candidates are selected based on data from the preliminary calculation of splits and from the transformation of categorical features to numerical features. The tree depth and other rules for choosing the structure are set in the starting parameters.
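A minimal sketch of how these starting parameters are passed, assuming the CatBoost Python package; the toy dataset and the chosen parameter values are illustrative only, while depth limits the tree structure and cat_features declares which columns go through the categorical-to-numerical transformation:

```python
from catboost import CatBoostClassifier

# Toy data: column 0 is categorical (strings), column 1 is numerical.
train_data = [["spring", 1.0], ["summer", 2.5], ["autumn", 0.3], ["winter", 4.1]]
train_labels = [0, 1, 0, 1]

model = CatBoostClassifier(
    iterations=10,     # number of trees to build
    depth=2,           # tree depth, one of the structure-choosing rules
    learning_rate=0.5,
    random_seed=42,
    verbose=False,
)
model.fit(train_data, train_labels, cat_features=[0])  # column 0 is categorical
```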
How a feature-split pair is chosen for a leaf:
1. A list is formed of possible candidates (feature-split pairs) to be assigned to a leaf as the split.
2. A number of penalty functions are calculated for each object (on the condition that all of the candidates obtained from step 1 have been assigned to the leaf).
3. The split with the smallest penalty is selected.
The resulting value is assigned to the leaf.
This procedure is repeated for all following leaves (the number of leaves needs to match the depth of the tree).
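A simplified sketch of this greedy procedure, not the library's internal code; the objects, candidate list, penalty function, and the choice of one feature-split pair per tree level are illustrative assumptions:

```python
def choose_tree_structure(objects, candidates, depth, penalty):
    """Greedily pick one feature-split pair per level of the tree."""
    chosen = []
    for _ in range(depth):
        best, best_penalty = None, float("inf")
        for cand in candidates:                    # step 1: candidate feature-split pairs
            p = penalty(objects, chosen + [cand])  # step 2: penalty if this candidate is assigned
            if p < best_penalty:                   # step 3: keep the split with the smallest penalty
                best, best_penalty = cand, p
        chosen.append(best)                        # repeat for the following leaves
    return chosen
```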
Before building each new tree, a random permutation of the classification objects is performed. A metric, which determines the direction for further improving the function, is used to select the structure of the next tree. Its value is calculated sequentially for each object, using the permutation obtained before building the tree: the objects are processed in the order given by that permutation.
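A hedged sketch of the per-tree permutation and the sequential, object-by-object metric evaluation described above; metric and build_tree are placeholder callables, not library API:

```python
import random

def build_next_tree(objects, metric, build_tree, seed=None):
    """Illustration only: permute the objects, then evaluate the metric
    sequentially in the permuted order before choosing the next tree."""
    rng = random.Random(seed)
    order = list(range(len(objects)))
    rng.shuffle(order)                        # random permutation before the tree is built
    permuted = [objects[i] for i in order]

    seen, scores = [], []
    for obj in permuted:                      # the value is calculated sequentially,
        scores.append(metric(seen, obj))      # object by object, in the permuted order
        seen.append(obj)

    return build_tree(permuted, scores)       # structure chosen using these values
```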