Pool

class Pool(data, 
           label=None,
           cat_features=None,
           column_description=None,
           pairs=None,
           delimiter='\t',
           has_header=False,
           weight=None, 
           group_id=None,
           group_weight=None,
           subgroup_id=None,
           pairs_weight=None,
           baseline=None,
           feature_names=None,
           thread_count=-1)

Purpose

Dataset processing.

When most or all of the features are numerical, the fastest way to pass feature data to the Pool constructor (and to the other CatBoost, CatBoostClassifier, and CatBoostRegressor methods that accept it) is to use the FeaturesData class. For datasets that contain only numerical features, passing the feature data as a numpy.ndarray with the numpy.float32 dtype gives similar performance.
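As a minimal sketch of the numpy.float32 approach described above, the snippet below converts a small all-numerical dataset to a float32 array and passes it to Pool. The helper name as_float32_matrix, the toy feature values, and the labels are invented for illustration and are not part of the CatBoost API. The Pool construction is skipped if catboost cannot be imported.

```python
import numpy as np


def as_float32_matrix(rows):
    """Hypothetical helper: convert numeric rows to a 2-D float32 array."""
    matrix = np.asarray(rows, dtype=np.float32)
    if matrix.ndim != 2:
        raise ValueError("expected a two-dimensional feature matrix")
    return matrix


# Toy data invented for this example: three objects, three numerical features.
features = as_float32_matrix([
    [1.0, 4.0, 5.0],
    [4.0, 5.0, 6.0],
    [30.0, 40.0, 50.0],
])
labels = [1, 0, 1]

try:
    from catboost import Pool
except ImportError:
    Pool = None  # catboost is not installed; only the conversion above runs

if Pool is not None:
    # Pass the float32 matrix as `data` and the targets as `label`.
    pool = Pool(data=features, label=labels)
    print(pool.num_row(), pool.num_col())
```

Converting once up front keeps the dtype explicit, so the array handed to Pool is already numpy.float32 rather than the numpy default of float64.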

Parameters

Parameter Possible types Description Default value
data
  • list
  • numpy.array
  • pandas.DataFrame
  • pandas.Series

Dataset in the form of a two-dimensional feature matrix.

Required parameter
catboost.FeaturesData