Apply a model

Note

The model prediction results are correct only if the feature data in the input dataset contains all the features used in the model. Typically, these features must appear in the same order as the corresponding columns provided during training. However, if feature names are specified both during training and in the third column of the feature descriptions in the columns description file (specified with the --column-description/--cd parameter), features can be matched by name instead of by column order.
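For illustration, the Python sketch below builds the text of a columns description file whose third column names the features, which is what allows matching by name. It assumes the tab-separated layout `<zero-based column index>`, `<column type>`, optional `<feature name>`; the column layout, the feature names `age` and `income`, and the helper `make_cd` are hypothetical and not part of CatBoost.

```python
# Minimal sketch: build the text of a columns description (cd) file.
# Each line: <zero-based column index> TAB <column type> [TAB <feature name>].
# The layout and names below are hypothetical examples.

def make_cd(columns):
    """columns: list of (index, column_type, name_or_None) tuples."""
    lines = []
    for index, col_type, name in columns:
        fields = [str(index), col_type]
        if name is not None:
            # Third column: the feature name that enables matching by name.
            fields.append(name)
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


cd_text = make_cd([
    (0, "Label", None),
    (1, "Num", "age"),
    (2, "Num", "income"),
])
print(cd_text, end="")
```

If the same names are also used during training, the order of the `age` and `income` columns in the input dataset no longer has to match the training order.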

Execution format

catboost calc [optional parameters]
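As a rough illustration, the Python sketch below assembles a typical invocation from options described in this section and runs it only when a `catboost` executable and the input files are actually present. The file names `model.bin`, `test.tsv` and `test.cd` are placeholders, and launching the CLI through `subprocess` is just one possible choice.

```python
import os
import shlex
import shutil
import subprocess

# Placeholder file names; every flag is documented in this section.
argv = [
    "catboost", "calc",
    "--model-file", "model.bin",
    "--input-path", "test.tsv",
    "--column-description", "test.cd",
    "--output-path", "stream://stdout",
    "--prediction-type", "Probability",
]
print(shlex.join(argv))

# Run only if the executable is on PATH and the placeholder files exist.
inputs_exist = all(os.path.exists(p) for p in ("model.bin", "test.tsv", "test.cd"))
if shutil.which("catboost") and inputs_exist:
    subprocess.run(argv, check=True)
```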

Options

-m, --model-file, --model-path

Description

The name of the input file with the description of the model obtained as the result of training.

Default value

model.bin

--model-format

Description

The format of the input model.

Possible values:

  • CatboostBinary.
  • AppleCoreML (only datasets without categorical features are currently supported).
  • json (multiclassification models are not currently supported). Refer to the CatBoost JSON model tutorial for format details.

Default value

CatboostBinary

--input-path

Description

The name of the input file with the dataset description.

Default value

input.tsv

--column-description, --cd

Description

The path to the input file that contains the columns description.

Default value

If omitted, it is assumed that the first column in the file with the dataset description defines the label value, and the other columns are the values of numerical features.
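The Python sketch below shows how a single line of a tab-separated dataset would be read under this default layout. It assumes the label is numeric and that there are no missing values; `parse_default_row` is a hypothetical helper, not part of CatBoost.

```python
# Minimal sketch of the default layout used when no --cd file is given:
# column 0 is the label, every other column is a numerical feature.
# Assumes a numeric label and no missing values.

def parse_default_row(line):
    fields = line.rstrip("\n").split("\t")
    label = float(fields[0])
    features = [float(value) for value in fields[1:]]
    return label, features


label, features = parse_default_row("1\t0.5\t2.25\n")
print(label, features)  # 1.0 [0.5, 2.25]
```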

--input-pairs

Description

The path to the input file that contains the pairs description for the dataset.

This information is used for the calculation of Pairwise metrics.

Default value

Omitted

Pairwise metrics require pairs of data. If these pairs are not provided explicitly with this parameter, they are generated automatically in each group using the object label values.
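To make label-based pair generation concrete, here is a Python sketch. It assumes a rule in which every two objects of the same group with different labels form a (winner, loser) pair, with the higher label winning. CatBoost's exact internal rule may differ, and `generate_pairs` is a hypothetical helper.

```python
from itertools import combinations

# Minimal sketch of label-based pair generation within groups.
# Assumption: objects in the same group with different labels form a
# (winner, loser) pair; the higher label wins. Equal labels form no pair.

def generate_pairs(group_ids, labels):
    groups = {}
    for index, group in enumerate(group_ids):
        groups.setdefault(group, []).append(index)
    pairs = []
    for members in groups.values():
        for i, j in combinations(members, 2):
            if labels[i] > labels[j]:
                pairs.append((i, j))
            elif labels[j] > labels[i]:
                pairs.append((j, i))
    return pairs


pairs = generate_pairs(group_ids=[0, 0, 0, 1, 1], labels=[2, 0, 1, 1, 1])
print(pairs)  # [(0, 1), (0, 2), (2, 1)]
```

In this example the objects of group 1 have equal labels, so under this assumed rule they produce no pairs.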

--input-graph

Description

The path to the input file that contains the graph information for the dataset.

This information is used for the calculation of Graph aggregated features.

Default value

Omitted

-o, --output-path

Description

Defines the output settings for the resulting values of the model.

Supported value formats and types:

  • stream://<stream> — Output the results to one of the program's standard output streams.

    stream is the name of the output stream. Possible values: stdout or stderr.

    For example, set the following value to output the results of applying the model to stdout:

    -o stream://stdout
    
  • [<path>/]<filename>.tsv — Write the results into the specified file.

    • path is the optional path to the directory where the resulting file is saved. By default, the file is saved to the directory from which the application is launched.
    • filename is the name of the output file.

    For example, set the following value to output the results of applying the model to the /home/model/output-results.tsv file:

    -o /home/model/output-results.tsv
    

The output data format depends on the machine learning task being solved.
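The two supported forms of the --output-path value can be told apart by their prefix. The Python sketch below illustrates this. `parse_output_path` is a hypothetical helper and only mirrors the rules described above, not CatBoost's own parsing code.

```python
# Minimal sketch: classify an --output-path value as a stream or a file.
# parse_output_path is a hypothetical helper, not part of CatBoost.

STREAM_PREFIX = "stream://"


def parse_output_path(value):
    if value.startswith(STREAM_PREFIX):
        stream = value[len(STREAM_PREFIX):]
        if stream not in ("stdout", "stderr"):
            raise ValueError(f"unsupported stream: {stream}")
        return ("stream", stream)
    # Anything else is a file path; relative paths resolve against the
    # directory from which the application is launched.
    return ("file", value)


print(parse_output_path("stream://stdout"))
print(parse_output_path("/home/model/output-results.tsv"))
```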

Default value

output.tsv

--output-columns

Description

A comma-separated list of column names to output when forming the results of applying the model (including the results obtained for the validation dataset during training).

Prediction and feature values can be output for each object of the input dataset. Auxiliary columns can be output if a name was specified for them in the column description. Additionally, some column types can be output if they are present in the input data.

Supported prediction types
  • Probability
  • Class
  • RawFormulaVal
  • Exponent
  • LogProbability
Supported column types
  • Label
  • Baseline
  • Weight
  • SampleId (DocId)
  • GroupId (QueryId)
  • QueryId
  • SubgroupId
  • Timestamp
  • GroupWeight

The output columns can be set in any order. Format:

<prediction type 1>[,<prediction type 2> .. <prediction type N>][,<columns to output>][,#<feature index 1>[:<name to output (user-defined)>] .. #<feature index N>[:<name to output (user-defined)>]]
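The Python sketch below assembles a value in this format. The helper `build_output_columns` is hypothetical. It places feature entries before the other columns only so that its output matches the example that follows; any order is accepted.

```python
# Minimal sketch: assemble an --output-columns value.
# build_output_columns is a hypothetical helper, not part of CatBoost.

def build_output_columns(prediction_types, columns=(), features=()):
    """features: iterable of (feature_index, output_name_or_None) pairs."""
    parts = list(prediction_types)
    for index, name in features:
        # "#<index>" alone, or "#<index>:<name>" for a user-defined header.
        parts.append(f"#{index}" if name is None else f"#{index}:{name}")
    parts.extend(columns)
    return ",".join(parts)


value = build_output_columns(
    ["Probability"],
    columns=["Label", "SampleId"],
    features=[(3, None), (4, "Feature4")],
)
print(value)  # Probability,#3,#4:Feature4,Label,SampleId
```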

Example

--output-columns Probability,#3,#4:Feature4,Label,SampleId

In this example, features with indices 3 and 4 are output. The header contains the index (#3) for the feature indexed 3 and the string Feature4 for the feature indexed 4.

A fragment of the output
Probability	#3	Feature4	Label	SampleId
0.4984999565	1	50.7799987793	0	0
0.8543220144	1	48.6333312988	2	1
0.7358535042	1	52.5699996948	1	2
0.8788711681	1	48.1699981689	2	3
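Because the result is a tab-separated file with a header row, it can be read with Python's standard `csv` module. In the sketch below, the fragment above is embedded as a string for self-containment. In practice, you would open the file given by -o/--output-path instead.

```python
import csv
import io

# The output fragment above, embedded as a string for illustration.
FRAGMENT = (
    "Probability\t#3\tFeature4\tLabel\tSampleId\n"
    "0.4984999565\t1\t50.7799987793\t0\t0\n"
    "0.8543220144\t1\t48.6333312988\t2\t1\n"
    "0.7358535042\t1\t52.5699996948\t1\t2\n"
    "0.8788711681\t1\t48.1699981689\t2\t3\n"
)

rows = list(csv.DictReader(io.StringIO(FRAGMENT), delimiter="\t"))
for row in rows:
    # Header names become keys, including "#3" and the user-defined "Feature4".
    print(row["SampleId"], float(row["Probability"]), float(row["Feature4"]))
```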

Note

At least one of the specified columns must contain prediction values. For example, the following value raises an error:

--output-columns SampleId
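The rule can be checked before invoking the tool. The Python sketch below uses the prediction types listed in this section. `has_prediction_column` is a hypothetical helper, not a CatBoost function.

```python
# Minimal sketch: check that an --output-columns value names at least one
# prediction type. has_prediction_column is a hypothetical helper.

PREDICTION_TYPES = {"Probability", "Class", "RawFormulaVal", "Exponent", "LogProbability"}


def has_prediction_column(value):
    return any(part in PREDICTION_TYPES for part in value.split(","))


print(has_prediction_column("Probability,#3,#4:Feature4,Label,SampleId"))  # True
print(has_prediction_column("SampleId"))  # False
```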

Default value

All columns that are supposed to be output according to the chosen parameters are output.

-T, --thread-count

Description

The number of threads to use for operation.

Optimizes the speed of execution. This parameter doesn't affect results.

Default value

The number of processor cores

--tree-count-limit

Description

The number of trees from the model to use when applying it. If specified, only the first trees of the model, up to this number, are used.

Default value

0 (if the value is 0, this parameter is ignored and all trees from the model are used)
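The Python sketch below shows which trees are selected. It assumes that a limit larger than the model's tree count simply uses all trees, which this section does not state. `trees_to_use` is a hypothetical helper.

```python
# Minimal sketch of tree selection by --tree-count-limit.
# Assumption: a limit above the model's tree count uses all trees.

def trees_to_use(total_trees, tree_count_limit=0):
    if tree_count_limit == 0:
        # 0 means the limit is ignored: every tree is used.
        return range(total_trees)
    # Otherwise only the first tree_count_limit trees are used.
    return range(min(tree_count_limit, total_trees))


print(list(trees_to_use(5)))     # [0, 1, 2, 3, 4]
print(list(trees_to_use(5, 3)))  # [0, 1, 2]
```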

--eval-period

Description

To reduce the number of tree ranges for which results are computed when the model is applied or metrics are calculated, set the step between successive tree counts with --eval-period.

This parameter defines the step to iterate over the range [--ntree-start; --ntree-end). For example, let's assume that the following parameter values are set:

  • --ntree-start is set to 0
  • --ntree-end is set to N (the total tree count)
  • --eval-period is set to 2

In this case, the results are returned for the following tree ranges: [0, 2), [0, 4), ... , [0, N).
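The Python sketch below enumerates these ranges. It assumes two behaviors that the example above does not spell out. First, every range starts at --ntree-start. Second, when the span is not a multiple of the period, the last range is cut off at --ntree-end. `staged_ranges` is a hypothetical helper.

```python
# Minimal sketch of the tree ranges evaluated in staged prediction mode.
# Assumptions: ranges start at ntree_start, and the last range is cut off at
# ntree_end when the span is not a multiple of eval_period.

def staged_ranges(ntree_start, ntree_end, eval_period):
    if eval_period <= 0:
        raise ValueError("eval_period must be positive; 0 turns staged mode off")
    ranges = []
    end = ntree_start + eval_period
    while end < ntree_end:
        ranges.append((ntree_start, end))
        end += eval_period
    ranges.append((ntree_start, ntree_end))
    return ranges


print(staged_ranges(0, 8, 2))  # [(0, 2), (0, 4), (0, 6), (0, 8)]
```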

Default value

0 (the staged prediction mode is turned off)

--prediction-type

Description

A comma-separated list of prediction types.

Supported prediction types:

  • Probability
  • Class
  • RawFormulaVal
  • Exponent
  • LogProbability
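For intuition only, the Python sketch below relates the prediction types to RawFormulaVal for a single object. It assumes a binary classification model trained with Logloss and reports values for the positive class. The 0.5 class threshold and the `convert` helper are assumptions for illustration. Exponent is shown for completeness, although it is typically used with models such as Poisson regression rather than classifiers.

```python
import math

# Minimal sketch, assuming a binary Logloss model and the positive class.
# convert is a hypothetical helper, not a CatBoost function.

def convert(raw, prediction_type):
    probability = 1.0 / (1.0 + math.exp(-raw))  # sigmoid of the raw value
    if prediction_type == "RawFormulaVal":
        return raw
    if prediction_type == "Probability":
        return probability
    if prediction_type == "Class":
        return int(probability > 0.5)  # assumed threshold
    if prediction_type == "Exponent":
        return math.exp(raw)
    if prediction_type == "LogProbability":
        return math.log(probability)
    raise ValueError(f"unknown prediction type: {prediction_type}")


for t in ["RawFormulaVal", "Probability", "Class", "Exponent", "LogProbability"]:
    print(t, convert(2.0, t))
```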

Default value

RawFormulaVal