Input file settings

-f, --learn-set
-t, --test-set
--cd, --column-description
--learn-pairs
--test-pairs
--learn-group-weights
--test-group-weights
--force-unit-auto-pair-weights
--learn-baseline
--test-baseline
--delimiter
--has-header
--params-file
--nan-mode

These parameters are available only in the command-line version.

-f, --learn-set

Description

The path to the input file that contains the dataset description.

Format:

[scheme://]<path>

scheme (optional) defines the type of the input dataset. Possible values:
- quantized:// — catboost.Pool quantized pool.
- libsvm:// — dataset in the extended libsvm format.
If omitted, a dataset in the Native CatBoost Delimiter-separated values format is expected.
path defines the path to the dataset description.
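The following minimal Python sketch shows how a value in the [scheme://]<path> format can be split into its scheme and path parts. The helper name split_dataset_path and the "dsv" label for the default Delimiter-separated values format are hypothetical and used only for illustration. This is not CatBoost's own parser.

```python
# Hypothetical helper, not part of CatBoost: splits "[scheme://]<path>".
KNOWN_SCHEMES = {"quantized", "libsvm"}  # schemes listed in this section


def split_dataset_path(value: str) -> tuple[str, str]:
    """Return (scheme, path); scheme is "dsv" when no prefix is given."""
    scheme, sep, path = value.partition("://")
    if not sep:
        # No "://" separator: the Delimiter-separated values format is expected.
        return "dsv", value
    if scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return scheme, path


print(split_dataset_path("train.tsv"))
print(split_dataset_path("quantized://train.bin"))
print(split_dataset_path("libsvm://data/train.svm"))
```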

Default value

Required parameter (the path must be specified).

Supported processing units

CPU and GPU

-t, --test-set

Description

A comma-separated list of input files that contain the validation dataset description (the format must be the same as used in the training dataset).

Default value

Omitted. If this parameter is omitted, the validation dataset isn't used.

Supported processing units

CPU and GPU

Alert

Only a single validation dataset can be input if the training is performed on GPU (--task-type is set to GPU).
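A minimal Python sketch of assembling the -t value from several validation files while respecting the GPU limit above. The helper name build_test_set_arg is hypothetical, and the sketch assumes that the individual paths contain no commas.

```python
# Hypothetical helper: builds the -t/--test-set fragment of a command line.
def build_test_set_arg(paths: list[str], task_type: str = "CPU") -> list[str]:
    """Return ["-t", "<comma-separated paths>"], or [] if there are no paths."""
    if not paths:
        return []  # parameter omitted: no validation dataset is used
    if task_type == "GPU" and len(paths) > 1:
        # Only a single validation dataset is accepted when training on GPU.
        raise ValueError("only a single validation dataset is supported on GPU")
    return ["-t", ",".join(paths)]


print(build_test_set_arg(["val1.tsv", "val2.tsv"]))
```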

--cd, --column-description

Description

The path to the input file that contains the columns description.

Default value

If omitted, it is assumed that the first column in the file with the dataset description defines the label value, and the other columns are the values of numerical features.
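The default assumption can be spelled out as an explicit column description. The following Python sketch generates one, assuming the tab-separated "<column index><TAB><data type>" layout of CatBoost column description files with the Label and Num data types. The helper name default_column_description is hypothetical.

```python
# Hypothetical helper: writes out the layout assumed when --cd is omitted.
def default_column_description(n_columns: int) -> str:
    """Column 0 is the label; every other column is a numerical feature."""
    if n_columns < 1:
        raise ValueError("the dataset must have at least a label column")
    lines = ["0\tLabel"]  # first column holds the label value
    lines += [f"{i}\tNum" for i in range(1, n_columns)]  # numerical features
    return "\n".join(lines) + "\n"


print(default_column_description(3), end="")
```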

Supported processing units

CPU and GPU

--learn-pairs

Description

The path to the input file that contains the pairs description for the training dataset.

This information is used for calculation and optimization of pairwise objectives and metrics (refer to the Ranking: objectives and metrics section).

Default value

Omitted.

Pairwise metrics require pairs data. If this data is not provided explicitly by specifying this parameter, pairs are generated automatically in each group using object label values.
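The exact procedure CatBoost uses to generate pairs is not described in this section. The following Python sketch shows one plausible scheme, assuming that within each group every two objects with different labels form a pair and the object with the higher label is the winner. The function name generate_pairs is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def generate_pairs(labels, group_ids):
    """Return (winner, loser) object-index pairs formed inside each group.

    Simplified sketch: the object with the higher label wins; objects with
    equal labels express no preference, so no pair is produced for them.
    """
    groups = defaultdict(list)
    for idx, gid in enumerate(group_ids):
        groups[gid].append(idx)

    pairs = []
    for members in groups.values():
        for i, j in combinations(members, 2):
            if labels[i] > labels[j]:
                pairs.append((i, j))
            elif labels[j] > labels[i]:
                pairs.append((j, i))
    return pairs


print(generate_pairs([2, 0, 1, 1, 0], ["q1", "q1", "q1", "q2", "q2"]))
```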

Supported processing units

CPU and GPU

--test-pairs

Description

The path to the input file that contains the pairs description for the validation dataset.

This information is used for calculation and optimization of Pairwise metrics.

Default value

Omitted.

Pairwise metrics require pairs data. If this data is not provided explicitly by specifying this parameter, pairs are generated automatically in each group using object label values.

Supported processing units

CPU and GPU

--learn-group-weights

Description

The path to the input file that contains the weights of groups. Refer to the Group weights section for format details.

The dataset must contain the GroupId column in order to apply the file with the group weights.

If group weights are also specified in the dataset description file (Delimiter-separated values format), the weights from this file take precedence.

Default value

Omitted (group weights are either read from the dataset description or set to 1 for all groups if absent in the input dataset).
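The precedence rules above can be summarized in a short Python sketch. It assumes the weights have already been loaded into dictionaries keyed by group ID and resolves each group independently, which is a simplification. The function name effective_group_weights is hypothetical.

```python
# Hypothetical helper illustrating the precedence rules for group weights.
def effective_group_weights(group_ids, dataset_weights=None, file_weights=None):
    """Resolve one weight per group.

    Order: the group weights file, then the dataset description, then 1.0.
    """
    dataset_weights = dataset_weights or {}
    file_weights = file_weights or {}
    resolved = {}
    for gid in group_ids:
        if gid in file_weights:
            resolved[gid] = file_weights[gid]  # the file takes precedence
        elif gid in dataset_weights:
            resolved[gid] = dataset_weights[gid]
        else:
            resolved[gid] = 1.0  # no weight given anywhere
    return resolved


print(effective_group_weights(["a", "b", "c"], {"a": 2.0, "b": 3.0}, {"a": 5.0}))
```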

Supported processing units

CPU and GPU

--test-group-weights

Description

The path to the input file that contains the weights of groups for the validation dataset. Refer to the Group weights section for format details.

The dataset must contain the GroupId column in order to apply the file with the group weights.

If group weights are also specified in the dataset description file (Delimiter-separated values format), the weights from this file take precedence.

Default value

Omitted (group weights are either read from the dataset description or set to 1 for all groups if absent in the input dataset).

Supported processing units

CPU and GPU

--force-unit-auto-pair-weights

Description

For each auto-generated pair in pairwise losses, set the pair weight equal to one.

Default value

Omitted (for each auto-generated pair, the weight is set equal to the weight of the group containing the elements of the pair).
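A minimal Python sketch of the two weighting options for auto-generated pairs. The force_unit argument stands in for --force-unit-auto-pair-weights. The function name auto_pair_weights and the group_of list (mapping each object index to its group ID) are hypothetical.

```python
# Hypothetical helper: weights auto-generated (winner, loser) pairs.
def auto_pair_weights(pairs, group_of, group_weights, force_unit=False):
    """Return one weight per pair.

    force_unit=True: every pair gets weight 1.0.
    Otherwise: each pair inherits the weight of the group containing it
    (both objects of an auto-generated pair belong to the same group).
    """
    if force_unit:
        return [1.0 for _ in pairs]
    return [group_weights[group_of[winner]] for winner, _ in pairs]


pairs = [(0, 1), (3, 4)]
group_of = ["q1", "q1", "q1", "q2", "q2"]
print(auto_pair_weights(pairs, group_of, {"q1": 2.0, "q2": 0.5}))
```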

Supported processing units

CPU and GPU

--learn-baseline

Description

The path to the input file that contains baseline values for the training dataset. Refer to the Baseline section for format details.

Default value

Omitted.

Supported processing units

CPU and GPU

--test-baseline

Description

The path to the input file that contains baseline values for the validation dataset. Refer to the Baseline section for format details.

Default value

Omitted.

Supported processing units

CPU and GPU

--delimiter

Description

The delimiter character used to separate the data in the dataset description input file.

Only single-character delimiters are supported. If the specified value contains more than one character, only the first one is used.

Note

Used only if the dataset is given in the Delimiter-separated values format.

Default value

The input data is assumed to be tab-separated.
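The delimiter rules can be illustrated with a minimal Python sketch: only the first character of the value is used, and tab is the default. The function name split_row is hypothetical, and the sketch does not handle quoted fields.

```python
# Hypothetical helper: splits one data line following the --delimiter rules.
def split_row(line: str, delimiter: str = "\t") -> list[str]:
    """Split a line on the first character of delimiter (tab by default)."""
    if not delimiter:
        raise ValueError("delimiter must not be empty")
    return line.rstrip("\r\n").split(delimiter[0])


print(split_row("1\t0.5\tred\n"))
print(split_row("1;0.5;red", ";;"))  # only ";" is used
```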

Supported processing units

CPU and GPU

--has-header

Description

Read the column names from the first line of the dataset description file if this parameter is set.

Note

Used only if the dataset is given in the Delimiter-separated values format.

Default value

False (the first line is treated as data, like the rest of the lines).
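A minimal Python sketch of how the header setting changes the interpretation of the first line. The function name read_dsv is hypothetical, and quoting is not handled.

```python
# Hypothetical helper: reads delimiter-separated lines with an optional header.
def read_dsv(lines, delimiter="\t", has_header=False):
    """Return (column_names, rows).

    has_header=True: the first line supplies the column names.
    has_header=False: the first line is data; column_names is None.
    """
    rows = [line.rstrip("\r\n").split(delimiter[0]) for line in lines]
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows


data = ["label\tf1\n", "1\t0.5\n", "0\t0.7\n"]
print(read_dsv(data, has_header=True))
print(read_dsv(data))
```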

Supported processing units

CPU and GPU

--params-file

Description

The path to the input JSON file that contains the training parameters, for example:

{
    "learning_rate": 0.1,
    "bootstrap_type": "No"
}

The names of the training parameters are the same as in the Python package and the R package.

If a parameter is specified in both the JSON file and the corresponding command-line parameter, the command-line value is used.
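The precedence rule can be sketched in Python. The sketch assumes the command-line values have already been converted to the JSON parameter names (for example, learning_rate). The function name resolve_params is hypothetical.

```python
import json


# Hypothetical helper: merges --params-file contents with command-line values.
def resolve_params(params_json: str, cli_params: dict) -> dict:
    """Command-line values override values read from the JSON file."""
    params = json.loads(params_json)
    params.update(cli_params)  # the command-line value wins on conflicts
    return params


file_text = '{"learning_rate": 0.1, "bootstrap_type": "No"}'
print(resolve_params(file_text, {"learning_rate": 0.03}))
```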

Default value

Omitted.

Supported processing units

CPU and GPU

--nan-mode

Description

The method for processing missing values in the input dataset.

Possible values:

"Forbidden": Missing values are not supported; their presence is interpreted as an error.
"Min": Missing values are processed as the minimum value (less than all other values) for the feature.
"Max": Missing values are processed as the maximum value (greater than all other values) for the feature.

Using the Min or Max value of this parameter guarantees that a split separating missing values from all other values is considered when selecting a new split in the tree.

Note

The method for processing missing values can be set individually for each feature in the Custom quantization borders and missing value modes input file. Such values override the ones specified in this parameter.
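A minimal Python sketch of the three modes: Min maps missing values below every other value, Max maps them above, and Forbidden rejects them. A per-feature mode overrides the global one. Mapping to -inf and +inf is an illustration of the ordering, not CatBoost's internal representation. The names apply_nan_mode and resolve_nan_mode are hypothetical.

```python
import math


def resolve_nan_mode(feature_index, global_mode="Min", per_feature=None):
    """A per-feature mode (if any) overrides the global --nan-mode value."""
    return (per_feature or {}).get(feature_index, global_mode)


def apply_nan_mode(values, mode="Min"):
    """Map NaN to -inf ("Min") or +inf ("Max"); raise for "Forbidden"."""
    if mode not in ("Forbidden", "Min", "Max"):
        raise ValueError(f"unknown nan mode: {mode!r}")
    out = []
    for v in values:
        if math.isnan(v):
            if mode == "Forbidden":
                raise ValueError("missing values are not allowed")
            out.append(-math.inf if mode == "Min" else math.inf)
        else:
            out.append(v)
    return out


column = [1.0, float("nan"), 3.0]
print(apply_nan_mode(column, resolve_nan_mode(0)))
print(apply_nan_mode(column, resolve_nan_mode(0, per_feature={0: "Max"})))
```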

Default value

Min

Supported processing units

CPU and GPU
