Apply a model

Note.
The model's predictions are correct only if the feature data in the input dataset contains all the features used in the model. Typically, the order of these features must match the order of the corresponding columns provided during training. However, if feature names are provided both during training and in the third column of the feature descriptions in the columns description file (specified with the --column-description/--cd parameter), the features can be matched by name instead of by column order.
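
To illustrate name-based matching, the sketch below writes a small columns description file with feature names in the third column. The file name train.cd and the feature names age and income are hypothetical, and the Label and Num column types are assumed from the usual CatBoost columns description format.

```shell
# Write a hypothetical tab-separated columns description file.
# Fields: <zero-based column index> <column type> <feature name (optional)>
cd_file="train.cd"
printf '0\tLabel\n'       >  "$cd_file"
printf '1\tNum\tage\n'    >> "$cd_file"
printf '2\tNum\tincome\n' >> "$cd_file"

# Collect the feature names (third field) that name-based matching would rely on.
feature_names=$(awk -F'\t' 'NF >= 3 { print $3 }' "$cd_file" | paste -sd, -)
echo "$feature_names"
```

The same names would have to be present in the model, that is, supplied during training, for the matching by name to apply.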

Execution format

catboost calc [optional parameters]
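
As a rough illustration, an invocation that combines several of the options described on this page might look like the sketch below. All file names are hypothetical placeholders, and the command is only run if a catboost binary and the input files are actually present.

```shell
# Hypothetical file names; replace them with your own.
model="model.bin"
input="input.tsv"
cd_file="train.cd"
output="output.tsv"

# Assemble the command from options documented on this page.
cmd="catboost calc -m $model --input-path $input --cd $cd_file -o $output"
echo "$cmd"

# Run it only when the binary and all input files are available.
if command -v catboost >/dev/null 2>&1 \
    && [ -f "$model" ] && [ -f "$input" ] && [ -f "$cd_file" ]; then
    $cmd
fi
```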

Options

Option Description Default value

-m

--model-path

The name of the input file with the description of the model obtained as the result of training.

model.bin
--model-format

The format of the input model.

Possible values:
  • CatboostBinary.
  • AppleCoreML (only datasets without categorical features are currently supported).
  • json (multiclassification models are not currently supported). Refer to the CatBoost JSON model tutorial for format details.
CatboostBinary

--input-path

The name of the input file with the dataset description.

input.tsv

--column-description

--cd

The path to the input file that contains the columns description.

If omitted, it is assumed that the first column in the file with the dataset description defines the label value, and the other columns are the values of numerical features.

-o

--output-path

Defines the output settings for the resulting values of the model.

Supported value formats and types:
  • stream://<stream> — Output the results to one of the program's standard output streams.

    stream is the name of the output stream. Possible values: stdout or stderr.

    For example, set the following value to output the results of applying the model to stdout:
    -o stream://stdout
  • [<path>/]<filename>.tsv — Write the results into the specified file.
    • path is the optional path to the directory where the resulting file should be saved. By default, the file is saved to the directory from which the application is launched.
    • filename is the name of the output file.
    For example, set the following value to output the results of applying the model to the /home/model/output-results.tsv file:
    -o /home/model/output-results.tsv
The output data format depends on the machine learning task being solved.
output.tsv
--output-columns

A comma-separated list of column names to output when forming the results of applying the model.

Prediction and feature values can be output for each object of the input dataset. Additionally, some column types can be output if specified in the input data.

Supported prediction types
  • Probability
  • Class
  • RawFormulaVal
Supported column types
  • Label
  • Baseline
  • Weight
  • SampleId (Alias: DocId)
  • GroupId (Alias: QueryId)
  • QueryId
  • SubgroupId
  • Timestamp
The output columns can be set in any order. Format:
<prediction type 1>,[<prediction type 2> .. <prediction type N>][columns to output],[#<feature index 1>[:<name to output (user-defined)>] .. #<feature index N>[:<column name to output>]]
Example
--output-columns Probability,#3,#4:Feature4,Label,SampleId
A fragment of the output
Probability	#3	Feature4	Label	SampleId
0.4984999565	1	50.7799987793	0	0
0.8543220144	1	48.6333312988	2	1
0.7358535042	1	52.5699996948	1	2
0.8788711681	1	48.1699981689	2	3
Note.
At least one of the specified columns must contain prediction values. For example, the following value raises an error:
--output-columns SampleId

All columns that are supposed to be output according to the chosen parameters are output
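
Because the result is a tab-separated file with a header row, its columns can be addressed by name in post-processing. The sketch below recreates the output fragment from the --output-columns example and prints the SampleId of every object whose Probability exceeds 0.8. The file name output-fragment.tsv and the 0.8 threshold are arbitrary choices made for illustration.

```shell
# Recreate the output fragment from the --output-columns example.
out_file="output-fragment.tsv"
{
    printf 'Probability\t#3\tFeature4\tLabel\tSampleId\n'
    printf '0.4984999565\t1\t50.7799987793\t0\t0\n'
    printf '0.8543220144\t1\t48.6333312988\t2\t1\n'
    printf '0.7358535042\t1\t52.5699996948\t1\t2\n'
    printf '0.8788711681\t1\t48.1699981689\t2\t3\n'
} > "$out_file"

# Map header names to field positions, then filter rows by Probability.
high=$(awk -F'\t' -v col="Probability" '
    NR == 1 { for (i = 1; i <= NF; i++) idx[$i] = i; next }
    $(idx[col]) > 0.8 { print $(idx["SampleId"]) }
' "$out_file" | paste -sd, -)
echo "$high"
```

Looking columns up by header name keeps the script working when the order given in --output-columns changes.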

-T

--thread-count

The number of threads to use when applying the model.

Optimizes the speed of execution. This parameter doesn't affect results.

The number of processor cores

--tree-count-limit

The number of trees from the model to use when applying. If specified, the first <value> trees are used.

0 (if the value is 0, this parameter is ignored and all trees from the model are used)
--eval-period

Reduces the number of trees used when the model is applied or the metrics are calculated by setting the step between the evaluated tree counts.

This parameter defines the step to iterate over the range [--ntree-start; --ntree-end). For example, let's assume that the following parameter values are set:

  • --ntree-start is set to 0
  • --ntree-end is set to N (the total tree count)
  • --eval-period is set to 2

In this case, the results are returned for the following tree ranges: [0, 2), [0, 4), ... , [0, N).

0 (the staged prediction mode is turned off)
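
The tree ranges produced by a given step can be enumerated with a few lines of shell. This is only a sketch of the arithmetic described for --eval-period, not of CatBoost's implementation: the variable names and the values (6 trees, step 2) are hypothetical, and capping the last range at the end of the interval is an assumption for the case where the step does not divide the range evenly.

```shell
# Hypothetical values: a model with 6 trees, evaluated with a step of 2.
ntree_start=0
ntree_end=6
eval_period=2   # must be greater than 0 here; 0 turns staged prediction off

ranges=""
end=$((ntree_start + eval_period))
while [ "$end" -lt "$ntree_end" ]; do
    ranges="${ranges}[${ntree_start}, ${end}) "
    end=$((end + eval_period))
done
# Assumption: the last range is capped at ntree_end.
ranges="${ranges}[${ntree_start}, ${ntree_end})"
echo "$ranges"
```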

--prediction-type

A comma-separated list of prediction types.

Supported prediction types:
  • Probability
  • Class
  • RawFormulaVal
RawFormulaVal