Input file settings
These parameters are available only in the command-line version.
-f, --learn-set
Description
The path to the input file that contains the dataset description.
Format:
[scheme://]<path>
- scheme (optional) defines the type of the input dataset. Possible values:
  - quantized:// : catboost.Pool quantized pool.
  - libsvm:// : dataset in the extended libsvm format.
  If omitted, a dataset in the Native CatBoost Delimiter-separated values format is expected.
- path defines the path to the dataset description.
Default value
Required parameter (the path must be specified).
Supported processing units
CPU and GPU
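As a minimal sketch of how the -f value can be composed, the helper below prepends the optional scheme to a path. The name dataset_spec and the file name train.libsvm are hypothetical and used only for illustration; only the two schemes listed above are accepted.

```python
from typing import Optional

# Schemes listed in the description above; None means the native
# delimiter-separated values format is expected.
KNOWN_SCHEMES = {"quantized", "libsvm"}


def dataset_spec(path: str, scheme: Optional[str] = None) -> str:
    """Build a [scheme://]<path> string for -f / --learn-set (hypothetical helper)."""
    if scheme is None:
        return path
    if scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return f"{scheme}://{path}"


# Argument list for a training run; the file name is a placeholder.
argv = ["catboost", "fit", "--learn-set", dataset_spec("train.libsvm", "libsvm")]
print(" ".join(argv))
```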
-t, --test-set
Description
A comma-separated list of input files that contain the validation dataset description (the format must be the same as used in the training dataset).
Default value
Omitted. If this parameter is omitted, the validation dataset isn't used.
Supported processing units
CPU and GPU
Alert
Only a single validation dataset can be specified if training is performed on GPU (--task-type is set to GPU).
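The comma-separated form and the GPU restriction can be sketched as follows. The helper validation_args is hypothetical, not part of CatBoost; it only assembles the arguments and does not check that the files exist or share the training dataset's format.

```python
from typing import List


def validation_args(paths: List[str], task_type: str = "CPU") -> List[str]:
    """Return the --test-set arguments for the given validation files."""
    if not paths:
        # Parameter omitted: no validation dataset is used.
        return []
    if task_type == "GPU" and len(paths) > 1:
        raise ValueError("only a single validation dataset is allowed on GPU")
    return ["--test-set", ",".join(paths)]


# File names are placeholders.
print(validation_args(["valid_a.tsv", "valid_b.tsv"]))
```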
--cd, --column-description
Description
The path to the input file that contains the columns description.
Default value
If omitted, it is assumed that the first column in the file with the dataset description defines the label value, and the other columns are the values of numerical features.
Supported processing units
CPU and GPU
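A minimal sketch of producing a columns description file, assuming the tab-separated layout of zero-based column index, column type, and optional feature name, one column per line. The helper name column_description, the feature name city, and the choice of columns are illustrative assumptions.

```python
from typing import Dict, Optional


def column_description(types: Dict[int, str],
                       names: Optional[Dict[int, str]] = None) -> str:
    """Render a columns description: index<TAB>type[<TAB>name] per line."""
    lines = []
    for index in sorted(types):
        fields = [str(index), types[index]]
        if names and index in names:
            fields.append(names[index])
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Column 0 holds the label and column 2 is categorical; columns that are
# not listed are assumed here to be treated as numerical features.
cd_text = column_description({0: "Label", 2: "Categ"}, {2: "city"})
print(cd_text, end="")
```

When --cd is omitted, the behavior described above corresponds to a file that marks only column 0 as the label.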
--learn-pairs
Description
The path to the input file that contains the pairs description for the training dataset.
This information is used to calculate and optimize the objectives and metrics described in the "Ranking: objectives and metrics" section.
Default value
Omitted.
Pairwise metrics require pairs data. If this data is not provided explicitly by specifying this parameter, pairs are generated automatically in each group using object label values.
Supported processing units
CPU and GPU
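To illustrate what generating pairs "in each group using object label values" can mean, the sketch below emits a (winner, loser) pair for every two objects of the same group whose labels differ. This is an assumption about the general idea, not a description of CatBoost's exact procedure; the function name and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import List, Tuple


def generate_pairs(group_ids: List[str],
                   labels: List[float]) -> List[Tuple[int, int]]:
    """Return (winner_index, loser_index) pairs within each group."""
    by_group = defaultdict(list)
    for index, group in enumerate(group_ids):
        by_group[group].append(index)

    pairs = []
    for members in by_group.values():
        for a, b in combinations(members, 2):
            if labels[a] > labels[b]:
                pairs.append((a, b))
            elif labels[b] > labels[a]:
                pairs.append((b, a))
            # Equal labels express no preference, so no pair is emitted.
    return pairs


pairs = generate_pairs(["q1", "q1", "q1", "q2", "q2"],
                       [2.0, 0.0, 1.0, 1.0, 1.0])
print(pairs)
```

Objects from different groups are never paired, and a group whose objects all share one label contributes no pairs.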
--test-pairs
Description
The path to the input file that contains the pairs description for the validation dataset.
This information is used for calculation and optimization of Pairwise metrics.
Default value
Omitted.
Pairwise metrics require pairs data. If this data is not provided explicitly by specifying this parameter, pairs are generated automatically in each group using object label values.
Supported processing units
CPU and GPU
--learn-group-weights
Description
The path to the input file that contains the weights of groups. Refer to the Group weights section for format details.
The dataset must contain the GroupId column in order to apply the file with the group weights.
If group weights are also specified in the dataset description file (delimiter-separated values format), the weights from this file take precedence.
Default value
Omitted (group weights are either read from the dataset description or set to 1 for all groups if absent in the input dataset)
Supported processing units
CPU and GPU
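The precedence described above (group weights file, then the dataset description, then a weight of 1) can be sketched as a simple lookup. The function resolve_group_weights and its dictionary inputs are hypothetical; how a group that is missing from the chosen source is handled is not specified in this section, so the sketch simply fails with KeyError in that case.

```python
from typing import Dict, Iterable, Optional


def resolve_group_weights(
    group_ids: Iterable[str],
    file_weights: Optional[Dict[str, float]] = None,
    dataset_weights: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Pick the weight source by precedence and return a weight per group."""
    if file_weights is not None:
        source = file_weights      # --learn-group-weights file wins
    elif dataset_weights is not None:
        source = dataset_weights   # weights from the dataset description
    else:
        return {group: 1.0 for group in group_ids}  # default weight
    return {group: float(source[group]) for group in group_ids}


print(resolve_group_weights(["q1", "q2"],
                            file_weights={"q1": 0.5, "q2": 2.0},
                            dataset_weights={"q1": 1.0, "q2": 1.0}))
```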
--test-group-weights
Description
The path to the input file that contains the weights of groups for the validation dataset. Refer to the Group weights section for format details.
The dataset must contain the GroupId column in order to apply the file with the group weights.
If group weights are also specified in the dataset description file (delimiter-separated values format), the weights from this file take precedence.
Default value
Omitted (group weights are either read from the dataset description or set to 1 for all groups if absent in the input dataset)
Supported processing units
CPU and GPU
--force-unit-auto-pair-weights
Description
For each auto-generated pair in pairwise losses, set the pair weight equal to one.
Default value
Omitted (for each auto-generated pair, the weight is set equal to the weight of the group containing the elements of the pair)
Supported processing units
CPU and GPU
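The effect of this flag on a single auto-generated pair can be written as a one-line rule. The function name auto_pair_weight is hypothetical, and the sketch covers only auto-generated pairs, not pairs read from a pairs file.

```python
def auto_pair_weight(group_weight: float, force_unit: bool = False) -> float:
    """Weight of an auto-generated pair inside a group.

    force_unit mirrors --force-unit-auto-pair-weights: when set, the pair
    weight is 1; otherwise it equals the weight of the containing group.
    """
    return 1.0 if force_unit else group_weight


print(auto_pair_weight(0.25))                   # default behavior
print(auto_pair_weight(0.25, force_unit=True))  # flag set
```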
--learn-baseline
Description
The path to the input file that contains baseline values for the training dataset. Refer to the Baseline section for format details.
Default value
Omitted
Supported processing units
CPU and GPU
--test-baseline
Description
The path to the input file that contains baseline values for the validation dataset. Refer to the Baseline section for format details.
Default value
Omitted
Supported processing units
CPU and GPU
--learn-graph
Description
The path to the input file that contains the graph information for the training dataset.
Graph information is used to calculate the graph aggregated features.
Default value
Omitted.
Supported processing units
CPU and GPU
--test-graph
Description
The path to the input file that contains the graph information for the validation dataset.
Graph information is used to calculate the graph aggregated features.
Default value
Omitted.
Supported processing units
CPU and GPU
--delimiter
Description
The delimiter character used to separate the data in the dataset description input file.
Only single-character delimiters are supported. If the specified value contains more than one character, only the first one is used.
Note
Used only if the dataset is given in the Delimiter-separated values format.
Default value
The input data is assumed to be tab-separated
Supported processing units
CPU and GPU
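A small sketch of the "first character only" rule. The helper effective_delimiter is hypothetical; how an empty value is handled is not specified in this section, so the sketch rejects it.

```python
def effective_delimiter(value: str = "\t") -> str:
    """Return the delimiter that would actually be used for a given value."""
    if not value:
        # Assumption: an empty delimiter is treated as invalid in this sketch.
        raise ValueError("delimiter must not be empty")
    return value[0]  # only the first character is used


row = "1;0.5;red"
fields = row.split(effective_delimiter(";;"))
print(fields)
```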
--has-header
Description
Read the column names from the first line of the dataset description file if this parameter is set.
Note
Used only if the dataset is given in the Delimiter-separated values format.
Default value
False (the first line is treated as data, like all other lines)
Supported processing units
CPU and GPU
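The difference between the two settings can be illustrated with a simplified reader. read_dsv is a hypothetical helper that splits lines on the delimiter and ignores quoting and escaping, so it is not a model of CatBoost's actual parser.

```python
from typing import List, Optional, Tuple


def read_dsv(text: str, delimiter: str = "\t",
             has_header: bool = False) -> Tuple[Optional[List[str]], List[List[str]]]:
    """Split delimiter-separated text into (column_names, data_rows)."""
    rows = [line.split(delimiter) for line in text.splitlines() if line]
    if has_header and rows:
        return rows[0], rows[1:]   # first line supplies the column names
    return None, rows              # first line is ordinary data


sample = "label\tage\n1\t42\n0\t35\n"
names, rows = read_dsv(sample, has_header=True)
print(names, rows)
```

Reading the same text without has_header would treat the line "label\tage" as a data row, which is why the flag needs to match the file.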
--params-file
Description
The path to the input JSON file that contains the training parameters, for example:
{
    "learning_rate": 0.1,
    "bootstrap_type": "No"
}
Names of training parameters are the same as for the Python package or the R package.
If a parameter is specified in both the JSON file and the corresponding command-line parameter, the command-line value is used.
Default value
Omitted
Supported processing units
CPU and GPU
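The precedence rule (a command-line value overrides the same parameter from the JSON file) behaves like a dictionary update. The sketch below uses the JSON example above; effective_params and the override value 0.03 are hypothetical.

```python
import json
from typing import Any, Dict


def effective_params(file_params: Dict[str, Any],
                     cli_params: Dict[str, Any]) -> Dict[str, Any]:
    """Merge parameters so that command-line values win over file values."""
    merged = dict(file_params)
    merged.update(cli_params)
    return merged


file_params = json.loads('{"learning_rate": 0.1, "bootstrap_type": "No"}')
# Suppose the learning rate is also given on the command line.
merged = effective_params(file_params, {"learning_rate": 0.03})
print(merged)
```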
--nan-mode
Description
The method for processing missing values in the input dataset.
Possible values:
- "Forbidden": Missing values are not supported; their presence is treated as an error.
- "Min": Missing values are processed as the minimum value (less than all other values) for the feature.
- "Max": Missing values are processed as the maximum value (greater than all other values) for the feature.
Using the Min or Max value of this parameter guarantees that a split between missing values and other values is considered when selecting a new split in the tree.
Note
The method for processing missing values can be set individually for each feature in the Custom quantization borders and missing value modes input file. Such values override the ones specified in this parameter.
Default value
Min
Supported processing units
CPU and GPU
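The ordering implied by each mode can be illustrated on a single feature column. This sketch substitutes negative or positive infinity for missing values so that they sort below or above every finite value; it shows the ordering only and is not a description of CatBoost's internal quantization. The function name apply_nan_mode is hypothetical.

```python
import math
from typing import List


def apply_nan_mode(values: List[float], mode: str = "Min") -> List[float]:
    """Replace NaN so that missing values sort below (Min) or above (Max)."""
    if mode not in ("Forbidden", "Min", "Max"):
        raise ValueError(f"unknown nan mode: {mode!r}")
    if not any(math.isnan(v) for v in values):
        return list(values)
    if mode == "Forbidden":
        raise ValueError("missing values are not allowed")
    fill = -math.inf if mode == "Min" else math.inf
    return [fill if math.isnan(v) else v for v in values]


column = [1.0, float("nan"), 3.0]
print(apply_nan_mode(column, "Min"))
print(apply_nan_mode(column, "Max"))
```

With either substitution, a threshold placed between the fill value and the finite values separates missing from present values, which matches the split guarantee described above.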