Feature interaction

Interaction

The value of the feature interaction strength for each pair of features.

All splits of features $f_1$ and $f_2$ in all trees of the resulting ensemble are examined when calculating the interaction between these features.

If a tree contains splits of both features, the calculation measures how much the leaf values differ between the leaves where these splits have the same value and the leaves where they have opposite values.

See the Interaction file format.

Calculation principles

$$interaction(f_{1}, f_{2}) = \sum_{trees} \left| \sum_{leafs: split(f_1) = split(f_2)} LeafValue - \sum_{leafs: split(f_1) \neq split(f_2)} LeafValue \right|$$
The sum inside the absolute value always contains an even number of terms. The first half consists of the leaf values for which the splits by $f_1$ have the same value as the splits by $f_2$. The second half consists of the leaf values for which the two splits have different values, and it enters the sum with the opposite sign.

The larger the difference between the two sums of leaf values, the stronger the interaction. This reflects the following idea: fix one feature and check whether changing the other one leads to large changes in the formula value.
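The calculation can be illustrated with a minimal sketch. It is not CatBoost's implementation: it assumes toy oblivious trees in which each feature is split at most once, stores each tree as a hypothetical pair `(split_features, leaf_values)`, and takes bit `k` of the leaf index as the outcome of the split at level `k`. The function names `tree_interaction` and `interaction` are made up for this example.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy representation: (feature index per level, leaf values).
Tree = Tuple[Sequence[int], Sequence[float]]


def tree_interaction(split_features: Sequence[int],
                     leaf_values: Sequence[float],
                     f1: int, f2: int) -> float:
    """Contribution of one toy tree to interaction(f1, f2)."""
    # A tree contributes only if it contains splits of both features.
    if f1 not in split_features or f2 not in split_features:
        return 0.0
    level1 = split_features.index(f1)
    level2 = split_features.index(f2)
    total = 0.0
    for leaf_index, value in enumerate(leaf_values):
        bit1 = (leaf_index >> level1) & 1  # outcome of the split by f1
        bit2 = (leaf_index >> level2) & 1  # outcome of the split by f2
        # Same outcomes enter with a plus sign, different ones with a minus.
        total += value if bit1 == bit2 else -value
    return abs(total)


def interaction(trees: List[Tree], f1: int, f2: int) -> float:
    """Sum of the absolute per-tree differences over the ensemble."""
    return sum(tree_interaction(s, v, f1, f2) for s, v in trees)


# Leaf value = 1 * bit0 + 2 * bit1: purely additive, no interaction.
additive_tree = ([0, 1], [0.0, 1.0, 2.0, 3.0])
# Non-zero only when both splits are true: a pure interaction.
and_tree = ([0, 1], [0.0, 0.0, 0.0, 4.0])
# Does not split on feature 1, so it adds nothing for the pair (0, 1).
unrelated_tree = ([0, 2], [1.0, 5.0, -2.0, 0.5])
trees = [additive_tree, and_tree, unrelated_tree]

print(interaction(trees, 0, 1))  # 4.0
```

In this sketch the additive tree contributes zero because the two sums of leaf values cancel, while the tree whose leaf value depends on both splits at once contributes the whole score.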

InternalInteraction

The value of the feature interaction strength for each pair of features that are used in the model. Internally the model uses feature combinations as separate features. All feature combinations that are used in the model are listed separately. For example, if the model contains a feature named F1 and a combination of features {F2, F3}, the interaction between F1 and the combination of features {F2, F3} is listed in the output file.

See the InternalInteraction file format.

Calculation principles

$$interaction(f_{1}, f_{2}) = \sum_{trees} \left| \sum_{leafs: split(f_1) = split(f_2)} LeafValue - \sum_{leafs: split(f_1) \neq split(f_2)} LeafValue \right|$$

Detailed information regarding usage specifics for different CatBoost implementations.