Apply a model
Note
The model prediction results are correct only if the feature data in the input dataset contains all the features used in the model. Typically, the order of these features must match the order of the corresponding columns provided during training. However, if feature names are provided both during training and in the third column of the feature descriptions in the columns description file (specified with the --column-description/--cd parameter), features can be matched by name instead of by column order.
Execution format
catboost calc [optional parameters]
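As a rough illustration of the execution format, the sketch below assembles a `catboost calc` argument list in Python using only options documented on this page. The helper name `build_calc_command` and the file name `train.cd` are hypothetical and chosen for the example; the defaults mirror the default values listed under Options.

```python
import shlex


def build_calc_command(model_file="model.bin", input_path="input.tsv",
                       column_description=None, output_path="output.tsv",
                       prediction_types=("RawFormulaVal",)):
    """Assemble an argv list for `catboost calc` (hypothetical helper)."""
    argv = [
        "catboost", "calc",
        "-m", model_file,
        "--input-path", input_path,
        "-o", output_path,
        "--prediction-type", ",".join(prediction_types),
    ]
    # --cd is optional; when omitted, the tool falls back to its default
    # interpretation of the dataset columns.
    if column_description is not None:
        argv += ["--cd", column_description]
    return argv


# "train.cd" is a placeholder columns description file name.
command = build_calc_command(column_description="train.cd",
                             output_path="stream://stdout",
                             prediction_types=("Probability", "Class"))
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where the `catboost` binary is available.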
Options
-m, --model-file, --model-path
Description
The name of the input file with the description of the model obtained as the result of training.
Default value
model.bin
--model-format
Description
The format of the input model.
Possible values:
- CatboostBinary.
- AppleCoreML (only datasets without categorical features are currently supported).
- json (multiclassification models are not currently supported). Refer to the CatBoost JSON model tutorial for format details.
Default value
CatboostBinary
--input-path
Description
The name of the input file with the dataset description.
Default value
input.tsv
--column-description, --cd
Description
The path to the input file that contains the columns description.
Default value
If omitted, it is assumed that the first column in the file with the dataset description defines the label value, and the other columns are the values of numerical features.
--input-pairs
Description
The path to the input file that contains the pairs description for the dataset.
This information is used for the calculation of Pairwise metrics.
Default value
Omitted
Pairwise metrics require pairs of data. If this data is not provided explicitly by specifying this parameter, pairs are generated automatically in each group using object label values.
--input-graph
Description
The path to the input file that contains the graph information for the dataset.
This information is used for the calculation of Graph aggregated features.
Default value
Omitted
-o, --output-path
Description
Defines the output settings for the resulting values of the model.
Supported value formats and types:
- stream://<stream>: Output the results to one of the program's standard output streams. stream is the name of the output stream. Possible values: stdout or stderr.
  For example, set the following value to output the results of applying the model to stdout:
  -o stream://stdout
- [<path>/]<filename>.tsv: Write the results into the specified file. path is the optional path to the directory where the resulting file should be saved. By default, the file is saved to the directory from which the application is launched. filename is the name of the output file.
  For example, set the following value to output the results of applying the model to the /home/model/output-results.tsv file:
  -o /home/model/output-results.tsv
The output data format depends on the machine learning task being solved.
Default value
output.tsv
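The two value formats described above can be told apart with a simple prefix check. The following is a minimal sketch of that distinction, not the tool's own parsing logic; `parse_output_path` is a hypothetical helper introduced for illustration.

```python
def parse_output_path(value):
    """Classify a -o/--output-path value (hypothetical helper).

    Returns ("stream", name) for stream://<stream> values and
    ("file", path) for everything else.
    """
    prefix = "stream://"
    if value.startswith(prefix):
        stream = value[len(prefix):]
        # Only the two standard output streams are documented.
        if stream not in ("stdout", "stderr"):
            raise ValueError(f"unknown stream: {stream!r}")
        return ("stream", stream)
    return ("file", value)


print(parse_output_path("stream://stdout"))
print(parse_output_path("/home/model/output-results.tsv"))
```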
--output-columns
Description
A comma-separated list of column names to output when forming the results of applying the model (including the ones obtained for the validation dataset when training).
Prediction and feature values can be output for each object of the input dataset. An auxiliary column can be output if a name for the column was specified in the column description. Additionally, some column types can be output if they are specified in the input data.
Supported prediction types
- Probability
- Class
- RawFormulaVal
- Exponent
- LogProbability
Supported column types
- Label
- Baseline
- Weight
- SampleId (DocId)
- GroupId (QueryId)
- QueryId
- SubgroupId
- Timestamp
- GroupWeight
The output columns can be set in any order. Format:
<prediction type 1>,[<prediction type 2> .. <prediction type N>][columns to output],[#<feature index 1>[:<name to output (user-defined)>] .. #<feature index N>[:<column name to output>]]
Example
--output-columns Probability,#3,#4:Feature4,Label,SampleId
In this example, features with indices 3 and 4 are output. The header contains the index (#3) for the feature indexed 3 and the string Feature4 for the feature indexed 4.
A fragment of the output
Probability #3 Feature4 Label SampleId
0.4984999565 1 50.7799987793 0 0
0.8543220144 1 48.6333312988 2 1
0.7358535042 1 52.5699996948 1 2
0.8788711681 1 48.1699981689 2 3
Note
At least one of the specified columns must contain prediction values. For example, the following value raises an error:
--output-columns SampleId
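To make the format and the rule in the note concrete, the sketch below derives the header row from an --output-columns value and rejects values that contain no prediction type. It is an illustration under stated assumptions, not the tool's implementation: `output_header` is a hypothetical helper, and the actual tool may name prediction columns differently for some tasks.

```python
PREDICTION_TYPES = {"Probability", "Class", "RawFormulaVal",
                    "Exponent", "LogProbability"}


def output_header(spec):
    """Return the header names implied by an --output-columns value
    (hypothetical helper)."""
    items = spec.split(",")
    # At least one of the specified columns must contain prediction values.
    if not any(item in PREDICTION_TYPES for item in items):
        raise ValueError("at least one prediction type is required")
    header = []
    for item in items:
        if item.startswith("#"):
            # "#4:Feature4" -> "Feature4"; "#3" has no name, so keep "#3".
            index, _, name = item.partition(":")
            header.append(name or index)
        else:
            header.append(item)
    return header


print(output_header("Probability,#3,#4:Feature4,Label,SampleId"))
```

For the example value above, the helper returns the same five names that appear in the header of the output fragment.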
Default value
All columns that are supposed to be output according to the chosen parameters are output
-T, --thread-count
Description
The number of threads to use for operation.
Optimizes the speed of execution. This parameter doesn't affect results.
Default value
The number of processor cores
--tree-count-limit
Description
The number of trees from the model to use when applying. If specified, the first <value> trees are used.
Default value
0 (if the value is 0, this parameter is ignored and all trees from the model are used)
--eval-period
Description
To reduce the number of trees used when the model is applied or the metrics are calculated, set the tree step with eval-period.
This parameter defines the step to iterate over the range [--ntree-start; --ntree-end). For example, let's assume that the following parameter values are set:
- --ntree-start is set to 0
- --ntree-end is set to N (the total tree count)
- --eval-period is set to 2
In this case, the results are returned for the following tree ranges: [0, 2), [0, 4), ... , [0, N).
Default value
0 (the staged prediction mode is turned off)
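The tree ranges from the example above can be enumerated with a short sketch. `staged_tree_ranges` is a hypothetical helper, and two details are assumptions rather than documented behavior: every range starts at --ntree-start, and the final range always ends at --ntree-end, even when the tree count is not a multiple of the step.

```python
def staged_tree_ranges(ntree_start, ntree_end, eval_period):
    """List the half-open tree ranges [start, end) evaluated in staged
    prediction mode (hypothetical helper)."""
    if eval_period <= 0:
        # A value of 0 turns staged prediction off: one range covers all trees.
        return [(ntree_start, ntree_end)]
    ends = list(range(ntree_start + eval_period, ntree_end, eval_period))
    # Assumption: the last range always reaches ntree_end.
    ends.append(ntree_end)
    return [(ntree_start, end) for end in ends]


print(staged_tree_ranges(0, 6, 2))
```

With N = 6 and a step of 2, this yields the ranges [0, 2), [0, 4), and [0, 6).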
--prediction-type
Description
A comma-separated list of prediction types.
Supported prediction types:
- Probability
- Class
- RawFormulaVal
- Exponent
- LogProbability
Default value
RawFormulaVal
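This page does not define how the prediction types relate to each other. As a hedged sketch only, the code below shows how they are commonly derived from the raw value for a binary classification model: it assumes that Probability is the sigmoid of RawFormulaVal, LogProbability is its logarithm, Exponent is the exponent of the raw value, and Class uses a 0.5 probability threshold. The helper `from_raw_formula_val` is hypothetical, and other tasks (for example, multiclassification or regression) behave differently.

```python
import math


def from_raw_formula_val(raw):
    """Derive the other prediction types from a raw value, assuming a
    binary classification model (hypothetical helper)."""
    probability = 1.0 / (1.0 + math.exp(-raw))  # sigmoid of the raw value
    return {
        "RawFormulaVal": raw,
        "Probability": probability,
        "LogProbability": math.log(probability),
        "Exponent": math.exp(raw),
        # Assumption: the positive class is chosen above a 0.5 threshold.
        "Class": int(probability > 0.5),
    }


print(from_raw_formula_val(0.0))
```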