catboost.save_pool

catboost.save_pool(data, 
                   label = NULL, 
                   weight = NULL, 
                   baseline = NULL, 
                   pool_path = "data.pool", 
                   cd_path = "cd.pool")

Purpose

Save the dataset to the CatBoost format. Files with the following data are created:
  • Dataset description (written to pool_path)
  • Columns description (written to cd_path)

Use the catboost.load_pool function to read the resulting files. These files can also be used in the command-line version and the Python package.
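
The round trip described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the data, the column names, and the temporary file locations are made up, and passing the columns description file through the column_description argument of catboost.load_pool is an assumption about the installed package version. The CatBoost calls are skipped when the package is not installed.

```r
# Illustrative data; the column names and values are made up for this sketch.
features <- data.frame(
  age  = c(23, 35, 41, 58),                         # double column
  city = factor(c("Oslo", "Rome", "Oslo", "Lima"))  # factor column (categorical)
)
labels <- c(0, 1, 1, 0)

# Write to a temporary directory instead of the working directory.
pool_path <- file.path(tempdir(), "data.pool")
cd_path   <- file.path(tempdir(), "cd.pool")

# Run the CatBoost calls only if the package is installed.
if (requireNamespace("catboost", quietly = TRUE)) {
  catboost::catboost.save_pool(features, label = labels,
                               pool_path = pool_path, cd_path = cd_path)

  # Assumption: catboost.load_pool accepts the columns description file
  # through its column_description argument.
  pool <- catboost::catboost.load_pool(pool_path, column_description = cd_path)
}
```

Setting pool_path and cd_path explicitly avoids writing data.pool and cd.pool into the current working directory, which is where the default values would place them.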

Arguments

data

A data.frame or matrix with features.

The following column types are supported:
  • double
  • factor. It is assumed that categorical features are given in this type of columns. A standard CatBoost processing procedure is applied to this type of columns:
    1. The values are converted to strings.
    2. The ConvertCatFeatureToFloat function is applied to the resulting string.

Default value: none (required argument)

label

The target variables (in other words, the objects' label values) of the dataset.

Default value: NULL

weight

The weights of objects.

Default value: NULL

baseline

A vector of formula values for all input objects. The training starts from these values for all input objects instead of starting from zero.

Default value: NULL

pool_path

The path to the output file that contains the dataset description.

Default value: data.pool

cd_path

The path to the output file that contains the columns description.

Default value: cd.pool
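
Because only double and factor columns are listed as supported, it can help to coerce other column types before saving. The sketch below shows one way to do that in base R and then passes the optional weight and baseline vectors. The data frame, its column names, the weight and baseline values, and the output file names are all hypothetical; the CatBoost call is skipped when the package is not installed.

```r
# Hypothetical raw data with an integer column and a character column.
raw <- data.frame(
  visits = c(3L, 7L, 1L),
  plan   = c("free", "pro", "free"),
  stringsAsFactors = FALSE
)

# Coerce to the supported types: integer -> double, character -> factor.
features <- raw
features$visits <- as.double(features$visits)
features$plan   <- factor(features$plan)

labels   <- c(0, 1, 0)
weights  <- c(1, 2, 1)        # one weight per object
baseline <- c(0.1, 0.4, 0.1)  # one starting formula value per object

# Run the CatBoost call only if the package is installed.
if (requireNamespace("catboost", quietly = TRUE)) {
  catboost::catboost.save_pool(
    features, label = labels, weight = weights, baseline = baseline,
    pool_path = file.path(tempdir(), "train.pool"),  # hypothetical file name
    cd_path   = file.path(tempdir(), "train.cd")     # hypothetical file name
  )
}
```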