Feature importance
The following types of feature importance files are created depending on the task and the execution parameters:
Regular feature importance
Contains
The individual importance values for each of the input features (the default feature importance calculation method for non-ranking metrics).
Possible values:
- PredictionValuesChange for non-ranking metrics
- LossFunctionChange for ranking metrics
Format
- The rows are sorted in descending order of the feature importance value.
- Each row contains information related to one feature.
Format:
<feature strength><\t><feature name>
- feature strength is the value of the regular feature importance.
- feature name is the zero-based index of the feature. An alphanumeric identifier is used instead if one is specified in the corresponding Num or Categ column of the input data.

For example, let's assume that the columns description file has the following structure:
0<\t>Label value<\t>
1<\t>Num
2<\t>Num<\t>ratio
3<\t>Categ
4<\t>Auxiliary
5<\t>Num
The input dataset file contains the following line:
120<\t>80<\t>0.8<\t>rock<\t>some useless information<\t>12
The table below shows the correspondence between the given feature values and the feature indices or names.
Feature value | Feature index or name |
---|---|
80 | 0 |
0.8 | ratio |
rock | 2 |
12 | 3 |
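The index assignment in the table above can be reproduced with the minimal sketch below. It assumes that only Num and Categ columns are features. Other column types, such as Label and Auxiliary, are skipped and do not consume a feature index. An optional third tab-separated field supplies the alphanumeric name. The function name `feature_ids` is hypothetical and is not part of any library API.

```python
# Hypothetical helper: derive the identifiers used in feature importance files
# from a columns description. Assumption: only Num and Categ columns are
# features, so other column types (Label, Auxiliary, ...) do not get an index.
FEATURE_TYPES = {"Num", "Categ"}


def feature_ids(column_description: str) -> dict:
    """Map each data column index to its feature index or alphanumeric name."""
    ids = {}
    feature_index = 0
    for line in column_description.strip().splitlines():
        fields = [f.strip() for f in line.split("\t")]
        column, column_type = int(fields[0]), fields[1]
        if column_type not in FEATURE_TYPES:
            continue  # not a feature column: no index is consumed
        # A non-empty third field replaces the numeric index with a name.
        has_name = len(fields) > 2 and fields[2]
        ids[column] = fields[2] if has_name else str(feature_index)
        feature_index += 1
    return ids


cd = "0\tLabel\n1\tNum\n2\tNum\tratio\n3\tCateg\n4\tAuxiliary\n5\tNum"
row = "120\t80\t0.8\trock\tsome useless information\t12".split("\t")
for column, name in feature_ids(cd).items():
    print(row[column], "->", name)  # 80 -> 0, 0.8 -> ratio, rock -> 2, 12 -> 3
```

The Auxiliary column (column 4) gets no identifier. That is why the last Num column ends up as feature 3 rather than 4.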
Example
8.4 <\t> 2
5.5 <\t> 0
2.6 <\t> 3
1.5 <\t> ratio
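To read a file in this format, a short Python sketch can split each row at the first tab. It also checks the documented descending order. The function names are hypothetical. The sketch assumes one `<feature strength><\t><feature name>` pair per line, with optional blank lines.

```python
from typing import List, Tuple


def parse_regular_fstr(text: str) -> List[Tuple[float, str]]:
    """Parse rows of the form '<feature strength>\t<feature name>'."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        strength, name = line.split("\t", 1)
        # float() tolerates surrounding whitespace; strip the name explicitly.
        rows.append((float(strength), name.strip()))
    return rows


def is_sorted_descending(rows: List[Tuple[float, str]]) -> bool:
    """Check the documented ordering: descending by feature strength."""
    return all(a[0] >= b[0] for a, b in zip(rows, rows[1:]))


example = "8.4\t2\n5.5\t0\n2.6\t3\n1.5\tratio\n"
rows = parse_regular_fstr(example)
print(rows)  # [(8.4, '2'), (5.5, '0'), (2.6, '3'), (1.5, 'ratio')]
print(is_sorted_descending(rows))  # True
```

Feature names are kept as strings. A name such as `2` is a zero-based feature index, while `ratio` is the alphanumeric identifier from the columns description.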
InternalFeatureImportance
Contains
The importance values both for each of the input features and for their combinations (if any).
Format
- The rows are sorted in descending order of the feature importance value.
- Each row contains information related to one feature or a combination of features.
Format:
<feature strength><\t>{feature name 1, ..., feature name n} pr<value> tb<value> type<value>
- feature strength is the value of the internal feature importance.
- feature name is the zero-based index of the feature. An alphanumeric identifier is used instead if one is specified in the corresponding Num or Categ column of the input data. For example, let's assume that the columns description file has the following structure:
0<\t>Label value<\t>
1<\t>Num
2<\t>Num<\t>ratio
3<\t>Categ
4<\t>Auxiliary
5<\t>Num
The input dataset file contains the following line:
120<\t>80<\t>0.8<\t>rock<\t>some useless information<\t>12
The table below shows the correspondence between the given feature values and the feature indices or names.
Feature value | Feature index or name |
---|---|
80 | 0 |
0.8 | ratio |
rock | 2 |
12 | 3 |
- pr is the prior value.
- tb is the label value border.
- type is the feature border type.
Example
8.4<\t>0
5.2<\t>{2, ratio} pr2 tb0 type0
2.6<\t>{2} pr2 tb0 type0
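The following minimal Python sketch parses one row of this format into a small record. It makes these assumptions, taken from the example above:

- A row without braces (such as `8.4<\t>0`) describes a single plain feature and has no pr/tb/type fields.
- A braced row lists comma-separated feature names followed by the pr, tb and type values.

The pr, tb and type values are kept as raw strings because their numeric format is not specified here. The class, field and function names are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern for "{name 1, ..., name n} pr<value> tb<value> type<value>".
_COMBINATION = re.compile(
    r"\{(?P<names>[^}]*)\}\s+pr(?P<pr>\S+)\s+tb(?P<tb>\S+)\s+type(?P<type>\S+)"
)


@dataclass
class InternalImportance:
    strength: float
    features: List[str]
    prior: Optional[str] = None         # raw value after "pr"
    label_border: Optional[str] = None  # raw value after "tb"
    border_type: Optional[str] = None   # raw value after "type"


def parse_internal_fstr_line(line: str) -> InternalImportance:
    strength, rest = line.split("\t", 1)
    rest = rest.strip()
    match = _COMBINATION.fullmatch(rest)
    if match is None:
        # Plain feature row such as "8.4\t0": no braces, no pr/tb/type fields.
        return InternalImportance(float(strength), [rest])
    names = [name.strip() for name in match["names"].split(",")]
    return InternalImportance(
        float(strength), names, match["pr"], match["tb"], match["type"]
    )


for row in ["8.4\t0", "5.2\t{2, ratio} pr2 tb0 type0", "2.6\t{2} pr2 tb0 type0"]:
    print(parse_internal_fstr_line(row))
```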