Model values

The results of applying the model on a dataset.

The output information and format depend on the machine learning problem being solved: Regression or Ranking, Classification, Multiclassification, or Multiregression.

Additionally, the columns to output can be set with the --output-columns command-line parameter when applying the model. By default, all columns that the chosen parameters call for are output.

Parameter description

A comma-separated list of column names to output when forming the results of applying the model (including the ones obtained for the validation dataset when training).

Prediction and feature values can be output for each object of the input dataset. Auxiliary columns can be output if a name for the column was specified in the column description. Additionally, some column types can be output if they are specified in the input data.

Supported prediction types
  • Probability
  • Class
  • RawFormulaVal
  • Exponent
  • LogProbability
Supported column types
  • Label
  • Baseline
  • Weight
  • SampleId (DocId)
  • GroupId (QueryId)
  • QueryId
  • SubgroupId
  • Timestamp
  • GroupWeight

The output columns can be set in any order. Format:

<prediction type 1>,[<prediction type 2> .. <prediction type N>][columns to output],[#<feature index 1>[:<name to output (user-defined)>] .. #<feature index N>[:<column name to output>]]

Example

--output-columns Probability,#3,#4:Feature4,Label,SampleId

In this example, features with indices 3 and 4 are output. The header contains the index (#3) for the feature indexed 3 and the string Feature4 for the feature indexed 4.

A fragment of the output

Probability	#3	Feature4	Label	SampleId
0.4984999565	1	50.7799987793	0	0
0.8543220144	1	48.6333312988	2	1
0.7358535042	1	52.5699996948	1	2
0.8788711681	1	48.1699981689	2	3

Note

At least one of the specified columns must contain prediction values. For example, the following value raises an error:

--output-columns SampleId
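
The rule in the note can be checked before launching the command. The following is a minimal sketch, not part of the tool itself: the helper name check_output_columns is hypothetical, and the set of prediction types is simply copied from the lists in this section.

```python
# Prediction types named in this section (assumption: no other types are needed).
PREDICTION_TYPES = {
    "Probability", "Class", "RawFormulaVal", "Exponent", "LogProbability",
    "RMSEWithUncertainty", "VirtEnsembles", "TotalUncertainty",
}


def check_output_columns(value):
    """Hypothetical helper: split an --output-columns value into its parts.

    Returns (prediction types, other columns, features), where each feature
    is a pair (feature index, header name).
    """
    predictions, columns, features = [], [], []
    for item in value.split(","):
        item = item.strip()
        if item in PREDICTION_TYPES:
            predictions.append(item)
        elif item.startswith("#"):
            # "#4:Feature4" -> index 4, header "Feature4"; "#3" -> header "#3".
            index, _, name = item[1:].partition(":")
            features.append((int(index), name or item))
        else:
            columns.append(item)
    if not predictions:
        raise ValueError("at least one column must contain prediction values")
    return predictions, columns, features


predictions, columns, features = check_output_columns(
    "Probability,#3,#4:Feature4,Label,SampleId"
)
```

With the example value from this section, the helper reports one prediction type, two dataset columns, and two features. Passing SampleId alone raises ValueError.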

Regression or Ranking

Contains

A number resulting from applying the model.

  • RMSEWithUncertainty (only for models trained with the RMSEWithUncertainty loss): RawFormulaVal with the exponent function applied to the second dimension of approx to obtain an estimate of the variance (see the Uncertainty section).
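
As an illustration of the RMSEWithUncertainty description, the transformation can be sketched as follows. The function name and the two-element input layout are assumptions made for this example. Only the use of the exponent on the second dimension comes from the text above.

```python
import math


def rmse_with_uncertainty(raw_approx):
    """Hypothetical helper: raw_approx holds the two dimensions of approx.

    The first dimension is returned unchanged. The exponent is applied to
    the second dimension to obtain the variance estimate.
    """
    mean, raw_second_dimension = raw_approx
    return mean, math.exp(raw_second_dimension)


mean, variance = rmse_with_uncertainty((2.5, 0.0))
```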

Header format

The first row in the output file contains a tab-separated description of data in the corresponding column.

Format:

[EvalSet:]SampleId<\t><Prediction type 1><\t>..<\t><Prediction type N>[<\t>Label]
  • EvalSet: is output for the evaluation file only if several validation datasets are input.

  • Prediction type is specified in the starting parameters and takes one or several of the following values:

    • Probability
    • Class
    • RawFormulaVal
    • Exponent
    • LogProbability
    • RMSEWithUncertainty
    • VirtEnsembles
    • TotalUncertainty
  • Label is only output for the validation dataset in training mode and the cross-validation dataset in cross-validation mode if it is specified in the input dataset.

Format

Each row starting from the second contains tab-separated information about a single object from the input dataset.

Format:

[<Validation dataset ID>:]<SampleId><\t><model value for prediction type 1><\t>..<\t><model value for prediction type N>[<\t><Label>]
  • Validation dataset ID is the serial number of the input validation dataset. The value is output if several validation datasets are input for model evaluation purposes.
  • SampleId is an alphanumeric ID of the object given in the Dataset description in delimiter-separated values format. If the identifiers are not set in the input data, the objects are sequentially numbered, starting from zero.
  • model value for prediction type is the float number resulting from applying the model for the corresponding prediction type.
  • label is the label value for the object. This value is only output for the validation dataset in training mode and the cross-validation dataset in cross-validation mode if it is specified in the input dataset.

Example

The resulting file without alphanumeric IDs:

SampleId<\t>Probability<\t>Class
0<\t>0.8<\t>1
1<\t>0.3<\t>0

The resulting file for the cross-validation mode with alphanumeric IDs set:

SampleId<\t>Probability<\t>Label
LT<\t>75.1<\t>73.6
LV<\t>73.2<\t>72.15
PL<\t>78.22<\t>77.5
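
The header and row formats above can be read with a few lines of Python. This is a minimal sketch under stated assumptions: the function name read_model_values is hypothetical, the whole file is passed as one string, and the first header field is expected to start with EvalSet: when several validation datasets are input.

```python
import csv
import io


def read_model_values(text):
    """Parse tab-separated model values into a list of dicts keyed by header."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    has_eval_set = header[0].startswith("EvalSet:")
    if has_eval_set:
        # Split "EvalSet:SampleId" into two separate header names.
        header = ["EvalSet", header[0][len("EvalSet:"):]] + header[1:]
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip empty lines
        if has_eval_set:
            # The first field looks like "<Validation dataset ID>:<SampleId>".
            eval_set, _, sample_id = fields[0].partition(":")
            fields = [eval_set, sample_id] + fields[1:]
        rows.append(dict(zip(header, fields)))
    return rows


rows = read_model_values("SampleId\tProbability\tClass\n0\t0.8\t1\n1\t0.3\t0\n")
```

All values are kept as strings so that alphanumeric identifiers such as LT or PL pass through unchanged. Convert prediction columns with float() where numbers are needed.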

Classification

Contains

Depends on the selected output mode for approximated values of the formula:

  • RawFormulaVal: A number resulting from applying the model.
  • Probability: A number indicating the probability that the object belongs to the class (a sigmoid of the result of applying the model).
  • Class: The predicted class (output with the value 1 if the probability is higher than 0.5, otherwise 0).
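
The relation between the three output modes can be sketched in Python. The function names below are chosen for this example only. The sigmoid and the 0.5 threshold follow the descriptions above.

```python
import math


def sigmoid(x):
    """Logistic function; sufficient for moderate inputs (no overflow guard)."""
    return 1.0 / (1.0 + math.exp(-x))


def classify(raw_formula_val, threshold=0.5):
    """Return (Probability, Class) for a single RawFormulaVal."""
    probability = sigmoid(raw_formula_val)
    predicted_class = 1 if probability > threshold else 0
    return probability, predicted_class


probability, predicted_class = classify(0.1685379577)
```

A positive RawFormulaVal gives a probability above 0.5 and therefore Class 1. A value of exactly 0 gives a probability of 0.5 and Class 0, because the comparison is strict.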

Header format

The first row in the output file contains a tab-separated description of data in the corresponding column.

Format:

[EvalSet:]SampleId<\t><Prediction type 1><\t>..<\t><Prediction type N>[<\t>Label]
  • EvalSet: is output for the evaluation file only if several validation datasets are input.

  • Prediction type is specified in the starting parameters and takes one or several of the following values:

    • Probability
    • Class
    • RawFormulaVal
    • Exponent
    • LogProbability
    • VirtEnsembles
    • TotalUncertainty
  • Label is only output for the validation dataset in training mode and the cross-validation dataset in cross-validation mode if it is specified in the input dataset.

Format

Each row in the output file contains tab-separated information about a single object from the input dataset.

Format:

[<Validation dataset ID>:]<SampleId><\t><model value>[<\t><Label>]
  • Validation dataset ID is the serial number of the input validation dataset. The value is output if several validation datasets are input for model evaluation purposes.
  • SampleId is an alphanumeric ID of the object given in the Dataset description in delimiter-separated values format. If the identifiers are not set in the input data, the objects are sequentially numbered, starting from zero.
  • model value is the number resulting from applying the model for the corresponding prediction type.
  • label is the label value for the object. This value is only output for the validation dataset in training mode and the cross-validation dataset in cross-validation mode if it is specified in the input dataset.

Example

The resulting file for the RawFormulaVal cross-validation mode:

SampleId<\t>RawFormulaVal<\t>Label
0<\t>0.1685379577<\t>1
1<\t>0.2379356203<\t>1
2<\t>-0.04871954376<\t>1

The resulting file for the Probability cross-validation mode with alphanumeric IDs set for objects:

SampleId<\t>Probability<\t>Label
SampleId1<\t>0.5592048528<\t>1
SampleId2<\t>0.5595881735<\t>1
SampleId3<\t>0.5592048528<\t>1

The resulting file for the Class mode:

SampleId<\t>Class
0<\t>0
1<\t>1
2<\t>1
3<\t>0

Multiclassification

Contains

Depends on the selected output mode for approximated values of the formula:

  • RawFormulaVal: A list of numbers resulting from applying the model. Values for the different classes are tab-separated.
  • Probability: A list of numbers indicating the probability that the object belongs to each of the classes. Values for the different classes are tab-separated.
  • Class: The number of the class that the object most likely belongs to.

Header format

The first row in the output file contains a tab-separated description of data in the corresponding column.

Format:

[EvalSet:]SampleId<\t><PredictionType1>[:Class=<ClassID>]<\t>..<\t><PredictionTypeN>[:Class=<ClassID>][<\t>Label]
  • EvalSet: is output for the evaluation file only if several validation datasets are input.

  • Prediction type is specified in the starting parameters and takes one or several of the following values:

    • Probability
    • Class
    • RawFormulaVal
    • Exponent
    • LogProbability
    • VirtEnsembles
    • TotalUncertainty
  • ClassID is the identifier of the class being described in the column. It is omitted for the Class prediction type.

  • Label is only output for the validation dataset in training mode and the cross-validation dataset in cross-validation mode if it is specified in the input dataset.

The number of Prediction type–ClassID pairs depends on the input parameters. It is always limited to one pair for the Class prediction type.
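
To work with the Prediction type and ClassID pairs programmatically, the header can be split on the :Class= marker. The following is a minimal sketch. The function name parse_multiclass_header is hypothetical, and class identifiers are kept as strings because this section does not state that they are always numeric.

```python
def parse_multiclass_header(header_line):
    """Return one (column name, base name, class ID or None) tuple per column."""
    columns = []
    for name in header_line.rstrip("\n").split("\t"):
        base, marker, class_id = name.partition(":Class=")
        # Columns without the marker (SampleId, Class, Label) get no class ID.
        columns.append((name, base, class_id if marker else None))
    return columns


columns = parse_multiclass_header(
    "SampleId\tProbability:Class=0\tProbability:Class=1\tLabel"
)
```

Columns that share a base name, such as Probability, can then be grouped to rebuild the per-class list for each object.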

Format

Each row in the output file contains tab-separated information about a single object from the input dataset.

Format:

[<Validation dataset ID>:]<SampleId><\t><Model value 1><\t>..<\t><Model value N>[<\t><Label>]
  • Validation dataset ID is the serial number of the input validation dataset. The value is output if several validation datasets are input for model evaluation purposes.
  • SampleId is an alphanumeric ID of the object given in the Dataset description in delimiter-separated values format. If the identifiers are not set in the input data, the objects are sequentially numbered, starting from zero.
  • Model value is a number or a list of numbers depending on the selected output mode for approximated values of the formula for the corresponding prediction type.
  • label is the label value for the object. This value is only output for the validation dataset in training mode and the cross-validation dataset in cross-validation mode if it is specified in the input dataset.

Example

The resulting file for prediction in Class mode with alphanumeric IDs set for objects:

SampleId<\t>Class
SampleId1<\t>2
SampleId2<\t>1
SampleId3<\t>2

The resulting file for the Probability cross-validation mode:

SampleId<\t>Probability:Class=0<\t>Probability:Class=1<\t>Probability:Class=2<\t>Label
1<\t>0.3232259635<\t>0.315456703<\t>0.3613173334<\t>2
2<\t>0.335771253<\t>0.3247524917<\t>0.3394762553<\t>0
3<\t>0.3181931812<\t>0.3242628483<\t>0.3575439705<\t>1

The resulting file for the RawFormulaVal cross-validation mode:

SampleId<\t>RawFormulaVal:Class=0<\t>RawFormulaVal:Class=1<\t>RawFormulaVal:Class=2<\t>Label
1<\t>0.001232427024<\t>-0.04141999431<\t>0.04018756728<\t>2
2<\t>-0.04822847313<\t>-0.05520994445<\t>0.1034384176<\t>2
3<\t>-0.05717915565<\t>-0.06548867981<\t>0.1226678355<\t>2

The resulting file for prediction in RawFormulaVal and Probability modes:

SampleId<\t>Probability:Class=0<\t>Probability:Class=1<\t>RawFormulaVal:Class=0<\t>RawFormulaVal:Class=1
1<\t>0.01593276944<\t>0.02337982256<\t>-1.494255509<\t>-1.110760101
2<\t>0.4060707366<\t>0.09565861257<\t>0.4137085351<\t>-1.032033103
3<\t>0.006235130003<\t>0.01759049831<\t>-2.03020042<\t>-0.9930409613

Multiregression

Contains

The numbers resulting from applying the model.

Header format

The first row in the output file contains a tab-separated description of data in the corresponding column.

Format:

[EvalSet:]SampleId<\t><Prediction type 1:Dim=0><\t>..<\t><Prediction type 1:Dim=M><\t>..<\t><Prediction type N:Dim=0><\t>..<\t><Prediction type N:Dim=M>[<\t>Label 1<\t>..<\t>Label M]
  • EvalSet: is output for the evaluation file only if several validation datasets are input.

  • Prediction type is specified in the starting parameters and takes one or several of the following values:

    • Probability
    • RawFormulaVal
  • Dim is the identifier of the label value from the labels vector.

  • Label is only output for the validation dataset in training mode and the cross-validation dataset in cross-validation mode if it is specified in the input dataset.

Format

Each row starting from the second contains tab-separated information about a single object from the input dataset.

Format:

[<Validation dataset ID>:]<SampleId><\t><model value for prediction type 1 and dimension 0><\t>..<\t><model value for prediction type N and dimension 0><\t>..<\t><model value for prediction type N and dimension M>[<\t><Label 1><\t>..<\t><Label M>]
  • Validation dataset ID is the serial number of the input validation dataset. The value is output if several validation datasets are input for model evaluation purposes.
  • SampleId is an alphanumeric ID of the object given in the Dataset description in delimiter-separated values format. If the identifiers are not set in the input data, the objects are sequentially numbered, starting from zero.
  • model value for prediction type and dimension is the float number resulting from applying the model for the corresponding prediction type and label dimension ID.
  • label is the label value for the object. This value is only output for the validation dataset in training mode and the cross-validation dataset in cross-validation mode if it is specified in the input dataset.

Example

The resulting file for one prediction type and two label values:

SampleId	RawFormulaVal:Dim=0	RawFormulaVal:Dim=1	Label:Dim=0	Label:Dim=1
0	-0.8314181536	0.0933995366	-1.3208890577723018	0.025460321322479378
1	0.2543254277	0.2431282743	0.6210866265458542	0.4546559804798615
2	0.1160567752	0.2253731494	-0.11214938100684924	0.1511700693162882
3	0.6113665201	-0.4325967375	1.7442698560770338	-0.3495077953076593
4	1.939011554	0.132468688	2.2505580326330246	0.2428380077970017
5	0.7703675176	-0.1621147587	0.9890286771162449	-0.2159423001955859
6	-1.404801493	0.1512662166	-2.5933998532067952	-0.41641507539874467
7	-1.029800989	0.3759548141	-3.210838860491183	0.9898948495861314
8	1.068953517	-0.2148923107	1.0980660342295991	-0.36311073650281733
9	-0.6949856863	0.1497956154	-0.8800937021376194	0.10275208758300074
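
The per-dimension columns of a row can be collected back into vectors. The following is a minimal sketch that follows the column names shown in the example above (RawFormulaVal:Dim=0, Label:Dim=0, and so on). The function name split_by_dimension is hypothetical.

```python
def split_by_dimension(header, fields):
    """Group the values of one row by base column name, ordered by Dim index."""
    grouped = {}
    for name, value in zip(header, fields):
        base, marker, dim = name.partition(":Dim=")
        if marker:  # columns without ":Dim=" (for example SampleId) are skipped
            grouped.setdefault(base, {})[int(dim)] = float(value)
    return {base: [dims[i] for i in sorted(dims)] for base, dims in grouped.items()}


header = "SampleId\tRawFormulaVal:Dim=0\tRawFormulaVal:Dim=1\tLabel:Dim=0\tLabel:Dim=1".split("\t")
row = "0\t-0.8314181536\t0.0933995366\t-1.3208890577723018\t0.025460321322479378".split("\t")
vectors = split_by_dimension(header, row)
```

For the first row of the example, vectors holds one two-element list under RawFormulaVal and one under Label, which makes it straightforward to compare predictions with labels dimension by dimension.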
