Scale and bias

Purpose

Set and/or print the model scale and bias.

Execution format

catboost normalize-model [optional parameters]

Options

Option Description Default value

-m

--model-file

--model-path

The name of the input file with the description of the model obtained as the result of training.

model.bin
--model-format

The format of the input model.

Possible values:

  • CatboostBinary.
  • AppleCoreML (only datasets without categorical features are currently supported).
  • json (multiclassification models are not currently supported). Refer to the CatBoost JSON model tutorial for format details.
  • onnx — ONNX-ML format (only datasets without categorical features are currently supported). Refer to https://onnx.ai for details. See the ONNX section for details on applying the resulting model.
  • pmml — PMML version 4.3 format. Categorical features must be interpreted as one-hot encoded during the training if present in the training dataset. This can be accomplished by setting the --one-hot-max-size/one_hot_max_size parameter to a value that is greater than the maximum number of unique categorical feature values among all categorical features in the dataset. See the PMML section for details on applying the resulting model.

CatboostBinary

--column-description

--cd

The path to the input file that contains the columns description.

If omitted, it is assumed that the first column in the file with the dataset description defines the label value, and the other columns are the values of numerical features.

--delimiter

The delimiter character used to separate the data in the dataset description input file.

Only single-character delimiters are supported. If the specified value contains more than one character, only the first one is used.
Note. Used only if the dataset is given in the Delimiter-separated values format.
The input data is assumed to be tab-separated
--has-header

Read the column names from the first line of the dataset description file if this parameter is set.

Note. Used only if the dataset is given in the Delimiter-separated values format.
False (the first line is assumed to contain data, like the remaining lines)
--set-scale

The model scale.

1
--set-bias

The model bias.

The model prediction results are calculated as follows:

$\mathrm{prediction} = \mathrm{scale} \cdot \sum \mathrm{leaf\_values} + \mathrm{bias}$

The value of this parameter affects the prediction by changing the default value of the bias.

Depends on the value of the boost_from_average parameter:

  • True — The best constant value for the specified loss function
  • False — 0
--print-scale-and-bias

Return the scale and bias of the model.

These values affect the results of applying the model, since the model prediction results are calculated as follows:

$\mathrm{prediction} = \mathrm{scale} \cdot \sum \mathrm{leaf\_values} + \mathrm{bias}$

Scale and bias are not output
--logging-level

The logging level to output to stdout.

Possible values:
  • Silent — Do not output any logging information to stdout.

  • Verbose — Output the following data to stdout:

    • optimized metric
    • elapsed time of training
    • remaining time of training
  • Info — Output additional information and the number of trees.

  • Debug — Output debugging information.
Info

-T

--thread-count

The number of threads to use.

4

--input-path

The name of the input file with the dataset description.

input.tsv
--output-model

The path to the output model.

model.bin
--output-model-format
The format of the output model.

Possible values:

  • CatboostBinary.
  • AppleCoreML (only datasets without categorical features are currently supported).
  • CPP (multiclassification models are not currently supported). See the C++ section for details on applying the resulting model.
  • Python (multiclassification models are not currently supported). See the Python section for details on applying the resulting model.
  • json (multiclassification models are not currently supported). Refer to the CatBoost JSON model tutorial for format details.
  • onnx — ONNX-ML format (only datasets without categorical features are currently supported). Refer to https://onnx.ai for details. See the ONNX section for details on applying the resulting model.
  • pmml — PMML version 4.3 format. Categorical features must be interpreted as one-hot encoded during the training if present in the training dataset. This can be accomplished by setting the --one-hot-max-size/one_hot_max_size parameter to a value that is greater than the maximum number of unique categorical feature values among all categorical features in the dataset. See the PMML section for details on applying the resulting model.

CatboostBinary
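
The options above can be combined into a single command line. The following is a minimal Python sketch that only assembles the argument list; it does not run CatBoost. The helper name build_normalize_command and the file name scaled_model.bin are hypothetical, and only flags listed in the table above are used.

```python
import shlex


def build_normalize_command(model_path, scale=None, bias=None,
                            output_model=None, print_scale_and_bias=False):
    """Return the argument list for a `catboost normalize-model` call.

    Hypothetical helper: it only uses flags from the options table
    and leaves every omitted option at its default value.
    """
    cmd = ["catboost", "normalize-model", "--model-file", model_path]
    if scale is not None:
        cmd += ["--set-scale", str(scale)]
    if bias is not None:
        cmd += ["--set-bias", str(bias)]
    if output_model is not None:
        cmd += ["--output-model", output_model]
    if print_scale_and_bias:
        cmd.append("--print-scale-and-bias")
    return cmd


if __name__ == "__main__":
    cmd = build_normalize_command(
        "model.bin",
        scale=0.8,
        bias=0.8,
        output_model="scaled_model.bin",  # hypothetical output path
        print_scale_and_bias=True,
    )
    # Print the command as a shell-quoted string for inspection.
    print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run if the catboost binary is available on the PATH; that step is left out here so the sketch stays runnable without CatBoost installed.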

Usage examples

Set the scale and bias to 0.8:

catboost normalize-model --set-scale 0.8 --set-bias 0.8 --print-scale-and-bias

The output of this example:

Input model scale 1 bias 1.405940652
Output model scale 0.8 bias 0.8
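
To see what the change in this example means for predictions, the sketch below parses the two printed lines and converts a raw prediction from the input model's scale and bias to the output model's. It assumes that a raw prediction equals scale * (sum of leaf values) + bias and that the printed lines always follow the layout shown above. The function names and the sample prediction value are hypothetical.

```python
import re

# Matches lines such as "Input model scale 1 bias 1.405940652".
# Assumption: the layout is exactly the one shown in the example output.
_LINE_RE = re.compile(r"^(Input|Output) model scale (\S+) bias (\S+)$")


def parse_scale_and_bias(line):
    """Return (kind, scale, bias) parsed from one printed line."""
    match = _LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    kind, scale, bias = match.groups()
    return kind, float(scale), float(bias)


def convert_prediction(old_prediction, old, new):
    """Map a raw prediction made with (scale, bias) `old` to the value
    the same trees would give with (scale, bias) `new`.

    Assumes prediction = scale * sum_of_leaf_values + bias,
    and that the old scale is non-zero.
    """
    old_scale, old_bias = old
    new_scale, new_bias = new
    leaf_sum = (old_prediction - old_bias) / old_scale
    return new_scale * leaf_sum + new_bias


if __name__ == "__main__":
    _, in_scale, in_bias = parse_scale_and_bias("Input model scale 1 bias 1.405940652")
    _, out_scale, out_bias = parse_scale_and_bias("Output model scale 0.8 bias 0.8")
    # Hypothetical raw prediction of the input model (leaf sum of about 1.0).
    old_prediction = 2.405940652
    new_prediction = convert_prediction(
        old_prediction, (in_scale, in_bias), (out_scale, out_bias)
    )
    print(new_prediction)  # about 1.6, up to floating-point rounding
```

Under the stated assumption, the leaf values themselves are unchanged; only the linear transform applied to their sum differs between the input and output models.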