Feature importance

The following types of feature importance files are created depending on the task and the execution parameters:

Regular feature importance

Contains

The individual importance values for each of the input features (the default feature importance calculation method for non-ranking metrics).

Format

  • The rows are sorted in descending order of the feature importance value.

  • Each row contains information related to one feature.

    Format:

    <feature strength><\t><feature name>
    
    • feature strength is the value of the regular feature importance.
    • feature name is the zero-based index of the feature.

    If an alphanumeric identifier is specified for the corresponding Num or Categ column in the columns description file, that identifier is used instead of the index.

    For example, let's assume that the columns description file has the following structure:

    0<\t>Label
    1<\t>Num
    2<\t>Num<\t>ratio
    3<\t>Categ
    4<\t>Auxiliary
    5<\t>Num
    

    The dataset file contains the following line:

    120<\t>80<\t>0.8<\t>rock<\t>some useless information<\t>12
    

    The table below shows the correspondence between the given feature values and the feature indices or names.

    Feature value    Feature index or name
    80               0
    0.8              ratio
    rock             2
    12               3
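The index-to-name mapping in the table can be reproduced with a short Python sketch. It assumes that only Num and Categ columns count as input features, that zero-based indices are assigned in column order, and that an identifier in the third field of a columns description line replaces the index. The helpers `parse_column_description` and `feature_names_by_column` are hypothetical and are not part of any library API.

```python
from typing import Dict, List, Tuple

# Assumption: only these column types are input features; Label,
# Auxiliary and any other column types do not receive a feature index.
FEATURE_TYPES = {"Num", "Categ"}


def parse_column_description(cd_text: str) -> List[Tuple[int, str, str]]:
    """Split lines of the form <column index>\t<type>[\t<identifier>] (hypothetical helper)."""
    columns = []
    for line in cd_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        parts = line.split("\t")
        identifier = parts[2].strip() if len(parts) > 2 else ""
        columns.append((int(parts[0]), parts[1].strip(), identifier))
    return columns


def feature_names_by_column(cd_text: str) -> Dict[int, str]:
    """Map each feature column to the name shown in importance files (hypothetical helper)."""
    names: Dict[int, str] = {}
    feature_index = 0  # zero-based, counts feature columns only
    for column_index, column_type, identifier in sorted(parse_column_description(cd_text)):
        if column_type not in FEATURE_TYPES:
            continue
        # An alphanumeric identifier, when given, replaces the numeric index,
        # but the column still consumes an index position.
        names[column_index] = identifier or str(feature_index)
        feature_index += 1
    return names


if __name__ == "__main__":
    cd = "0\tLabel\n1\tNum\n2\tNum\tratio\n3\tCateg\n4\tAuxiliary\n5\tNum\n"
    row = "120\t80\t0.8\trock\tsome useless information\t12".split("\t")
    for column_index, name in feature_names_by_column(cd).items():
        print(row[column_index], name)
```

Run on the example above, this prints the same four value and name pairs as the table. Note that the named column `ratio` still occupies index 1, which is why `rock` is reported as 2 rather than 1.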

Example

8.4<\t>2
5.5<\t>0
2.6<\t>3
1.5<\t>ratio
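A minimal Python sketch for reading a regular feature importance file into (strength, name) pairs, assuming one tab-separated row per feature as described in the format above. The function name `parse_regular_importance` is hypothetical; the descending-order check simply enforces the sorting rule stated in the format description.

```python
from typing import List, Tuple


def parse_regular_importance(text: str) -> List[Tuple[float, str]]:
    """Read rows of the form <feature strength>\t<feature name> (hypothetical helper)."""
    rows: List[Tuple[float, str]] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines, e.g. after a trailing newline
        strength, name = line.split("\t", 1)
        rows.append((float(strength), name.strip()))
    # The format description states that rows are sorted by strength, descending.
    if any(prev[0] < cur[0] for prev, cur in zip(rows, rows[1:])):
        raise ValueError("rows are not sorted in descending order of importance")
    return rows


if __name__ == "__main__":
    sample = "8.4\t2\n5.5\t0\n2.6\t3\n1.5\tratio\n"
    for strength, name in parse_regular_importance(sample):
        print(f"feature {name}: {strength}")
```

Names are kept as strings because a feature can be identified either by its index or by an alphanumeric identifier.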

InternalFeatureImportance

Contains

The importance values both for each of the input features and for their combinations (if any).

Format

  • The rows are sorted in descending order of the feature importance value.

  • Each row contains information related to one feature or a combination of features.

    Format:

    <feature strength><\t>{<feature name 1>, ..., <feature name n>} pr<value> tb<value> type<value>
    
    • feature strength is the value of the internal feature importance.

    • feature name is the zero-based index of the feature.

    If an alphanumeric identifier is specified for the corresponding Num or Categ column in the columns description file, that identifier is used instead of the index.

    For example, let's assume that the columns description file has the following structure:

    0<\t>Label
    1<\t>Num
    2<\t>Num<\t>ratio
    3<\t>Categ
    4<\t>Auxiliary
    5<\t>Num
    

    The dataset file contains the following line:

    120<\t>80<\t>0.8<\t>rock<\t>some useless information<\t>12
    

    The table below shows the correspondence between the given feature values and the feature indices or names.

    Feature value    Feature index or name
    80               0
    0.8              ratio
    rock             2
    12               3

    • pr is the prior value.
    • tb is the label border value.
    • type is the feature border type.

Example

8.4<\t>0
5.2<\t>{2, ratio} pr2 tb0 type0
2.6<\t>{2} pr2 tb0 type0
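A Python sketch for parsing one row of an internal feature importance file, following the format line above. It assumes that a description without braces names a single feature (as in the first row of the example) and keeps the pr, tb, and type values as raw strings, since the format does not state their types. The names `InternalImportanceRow`, `parse_internal_line`, and `COMBINATION_RE` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches "{<name 1>, ..., <name n>} pr<value> tb<value> type<value>".
COMBINATION_RE = re.compile(
    r"^\{(?P<names>[^}]*)\}\s+pr(?P<pr>\S+)\s+tb(?P<tb>\S+)\s+type(?P<type>\S+)$"
)


@dataclass
class InternalImportanceRow:
    strength: float
    features: List[str]
    pr: Optional[str] = None     # text after "pr", kept as written in the file
    tb: Optional[str] = None     # text after "tb", kept as written in the file
    type_: Optional[str] = None  # text after "type", kept as written in the file


def parse_internal_line(line: str) -> InternalImportanceRow:
    """Parse one row of an internal feature importance file (hypothetical helper)."""
    strength_text, description = line.split("\t", 1)
    strength = float(strength_text)
    description = description.strip()
    match = COMBINATION_RE.match(description)
    if match is None:
        # Assumption: a description without braces names a single feature.
        return InternalImportanceRow(strength, [description])
    names = [name.strip() for name in match.group("names").split(",")]
    return InternalImportanceRow(
        strength, names, match.group("pr"), match.group("tb"), match.group("type")
    )


if __name__ == "__main__":
    sample = ["8.4\t0", "5.2\t{2, ratio} pr2 tb0 type0", "2.6\t{2} pr2 tb0 type0"]
    for line in sample:
        print(parse_internal_line(line))
```

Returning a small record type rather than a tuple keeps the optional pr, tb, and type fields explicit for rows that describe a plain feature.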