create_cd
Generate the columns description file with the given structure.
Method call format
create_cd(label=None,
cat_features=None,
text_features=None,
embedding_features=None,
weight=None,
baseline=None,
doc_id=None,
group_id=None,
subgroup_id=None,
timestamp=None,
auxiliary_columns=None,
feature_names=None,
output_path='train.cd')
Parameters
label
Description
A zero-based index of the column that defines the target variable (in other words, the object's label value).
Possible types
int
Default value
None
cat_features
Description
Zero-based indices of columns that define categorical features.
Possible types
- int
- list of int
Default value
None
text_features
Description
Zero-based indices of columns that define text features.
Possible types
- int
- list of int
Default value
None
embedding_features
Description
Zero-based indices of columns that define embedding features.
Possible types
- int
- list of int
Default value
None
weight
Description
A zero-based index of the column that defines the object's weight.
Possible types
int
Default value
None
baseline
Description
A zero-based index of the column that defines the initial formula values for all input objects.
Possible types
int
Default value
None
doc_id
Description
A zero-based index of the column that defines the alphanumeric ID of the object.
Possible types
int
Default value
None
group_id
Description
A zero-based index of the column that defines the identifier of the object's group.
Possible types
int
Default value
None
subgroup_id
Description
A zero-based index of the column that defines the identifier of the object's subgroup.
Possible types
int
Default value
None
timestamp
Description
A zero-based index of the column that defines the timestamp of the object.
Possible types
int
Default value
None
auxiliary_columns
Description
Zero-based indices of columns that define arbitrary data.
Possible types
- int
- list of int
Default value
None
feature_names
Description
A dictionary with the list of column indices and the corresponding feature names.
Possible types
dict
For example, use the feature_names
dictionary to set the names of features in the columns indexed as 4, 5 and 12:
feature_names = {
4: 'Categ1',
5: 'Categ2',
12: 'Num1'
}
Default value
None
output_path
Description
The path to the output file with columns description.
Possible types
string
Default value
train.cd
Note
A parameter for creating columns of the Num type is not provided, because columns that contain numerical features don't require descriptions.
Usage examples
from catboost.utils import create_cd
feature_names = {
4: 'Categ1',
5: 'Categ2',
12: 'Num1'
}
create_cd(
label=0,
cat_features=(4, 5, 6),
weight=1,
baseline=2,
doc_id=3,
group_id=7,
subgroup_id=8,
timestamp=9,
auxiliary_columns=(10, 11),
feature_names=feature_names,
output_path='train.cd'
)