grid_search

Method call format
Parameters
Return value
Examples

A simple grid search over specified parameter values for a model.

Note

After searching, the model is trained and ready to use.

Method call format

grid_search(param_grid,
            X,
            y=None,
            cv=3,
            partition_random_seed=0,
            calc_cv_statistics=True,
            search_by_train_test_split=True,
            refit=True,
            shuffle=True,
            stratified=None,
            train_size=0.8,
            verbose=True,
            plot=False,
            log_cout=sys.stdout,
            log_cerr=sys.stderr)

Dictionary with parameters names (string) as keys and lists of parameter settings to try as values, or a list of such dictionaries, in which case the grids spanned by each dictionary in the list are explored.

This enables searching over any sequence of parameter settings.

Possible types

dict
list

Default value

Required parameter

X

Description

The description is different for each group of possible types.

Possible types

catboost.Pool

The input training dataset.

Note

If a nontrivial value of the cat_features parameter is specified in the constructor of this class, CatBoost checks the equivalence of categorical features indices specification from the constructor parameters and in this Pool class.

numpy.ndarray, pandas.DataFrame

The input training dataset in the form of a two-dimensional feature matrix.

pandas.SparseDataFrame, scipy.sparse.spmatrix (all subclasses except dia_matrix)

The input training dataset in the form of a two-dimensional sparse feature matrix.

Possible types

The input training dataset in the form of a two-dimensional sparse feature matrix.

Default value

Required parameter

y

Description

The target variables (in other words, the objects' label values) for the training dataset.

Must be in the form of a one- or two- dimensional array. The type of data in the array depends on the machine learning task being solved:

Regression — One-dimensional array of numeric values.
Multiregression - Two-dimensional array of numeric values. The first index is for a dimension, the second index is for an object.

Note

Do not use this parameter if the input training dataset (specified in the X parameter) type is catboost.Pool.

Possible types

list
numpy.ndarray
pandas.DataFrame
pandas.Series

Default value

None

Supported processing units

CPU and GPU

cv

Description

The cross-validation splitting strategy.

The interpretation of this parameter depends on the input data type:

None — Use the default three-fold cross-validation.
int — The number of folds in a (Stratified)KFold
object — One of the scikit-learn Splitter Classes with the split method.
An iterable yielding train and test splits as arrays of indices.

Possible types

int
scikit-learn splitter object
cross-validation generator
iterable

Default value

None

partition_random_seed

Description

Use this as the seed value for random permutation of the data.

Description

The purpose of this parameter depends on the type of the given value:

int — The frequency of iterations to print the information to stdout.
bool — Print the information to stdout on every iteration (if set to True) or disable any logging (if set to False).

Possible types

bool
int

Default value

True

plot

Description

Draw train and evaluation metrics for every set of parameters in Jupyter Jupyter Notebook.

Possible types

bool

Default value

False

log_cout

Output stream or callback for logging.

Possible types

callable Python object
python object providing the write() method

Default value

sys.stdout

log_cerr

Error stream or callback for logging.

Possible types

callable Python object
python object providing the write() method

Default value

sys.stderr

Return value

Dict with two fields:

params — dict of best-found parameters.
cv_results — dict or pandas.core.frame.DataFrame with cross-validation results. Сolumns are: test-error-mean, test-error-std, train-error-mean, train-error-std.

Examples

from catboost import CatBoostRegressor
import numpy as np

train_data = np.random.randint(1, 100, size=(100, 10))
train_labels = np.random.randint(2, size=(100))

model = CatBoostRegressor()

grid = {'learning_rate': [0.03, 0.1],
        'depth': [4, 6, 10],
        'l2_leaf_reg': [1, 3, 5, 7, 9]}

grid_search_result = model.grid_search(grid,
                                       X=train_data,
                                       y=train_labels,
                                       plot=True)

The following is a chart plotted with Jupyter Notebook for the given example.

grid_search

Method call formatMethod call format

ParametersParameters

param_gridparam_grid

DescriptionDescription

XX

DescriptionDescription

yy

DescriptionDescription

cvcv

DescriptionDescription

partition_random_seedpartition_random_seed

DescriptionDescription

calc_cv_statisticscalc_cv_statistics

DescriptionDescription

search_by_train_test_splitsearch_by_train_test_split

DescriptionDescription

refitrefit

DescriptionDescription

shuffleshuffle

DescriptionDescription

stratifiedstratified

DescriptionDescription

train_sizetrain_size

DescriptionDescription

verboseverbose

DescriptionDescription

plotplot

DescriptionDescription

log_coutlog_cout

log_cerrlog_cerr

Return valueReturn value

ExamplesExamples

Was the article helpful?

Method call format

Parameters

param_grid

Description

X

Description

y

Description

cv

Description

partition_random_seed

Description

calc_cv_statistics

Description

search_by_train_test_split

Description

refit

Description

shuffle

Description

stratified

Description

train_size

Description

verbose

Description

plot

Description

log_cout

log_cerr

Return value

Examples