compare

Draw train and evaluation metrics in Jupyter Notebook for two trained models.

Method call format

compare(model,
        data=None,
        metrics=None,
        ntree_start=0,
        ntree_end=0,
        eval_period=1,
        thread_count=-1,
        tmp_dir=None,
        log_cout=sys.stdout,
        log_cerr=sys.stderr)

Parameters

model

Description

The CatBoost model to compare with.

Possible types

CatBoost Model

Default value

Required parameter

metrics

Description

The list of metrics to be calculated.
Supported metrics
For example, if the AUC and Logloss metrics should be calculated, use the following construction:

['Logloss', 'AUC']

Possible types

list of strings

Default value

Required parameter

data

Description

A file or matrix with the input dataset, on which the compared metric values should be calculated.

Possible types

catboost.Pool

Default value

Required parameter

ntree_start

Description

To reduce the number of trees to use when the model is applied or the metrics are calculated, set the range of the tree indices to[ntree_start; ntree_end) and the eval_period parameter to k to calculate metrics on every k-th iteration.

This parameter defines the index of the first tree to be used when applying the model or calculating the metrics (the inclusive left border of the range). Indices are zero-based.

Possible types

int

Default value

ntree_end

Description

This parameter defines the index of the first tree not to be used when applying the model or calculating the metrics (the exclusive right border of the range). Indices are zero-based.

Possible types

int

Default value

0 (the index of the last tree to use equals to the number of trees in the
model minus one)

eval_period

Description

This parameter defines the step to iterate over the range [ntree_start; ntree_end). For example, let's assume that the following parameter values are set:

ntree_start is set 0
ntree_end is set to N (the total tree count)
eval_period is set to 2

In this case, the metrics are calculated for the following tree ranges: [0, 2), [0, 4), ... , [0, N)

Possible types

int

Default value

1 (the trees are applied sequentially: the first tree, then the first two
trees, etc.)

thread_count

Description

The number of threads to use.

Optimizes the speed of execution. This parameter doesn't affect results.

Possible types

int

Default value

-1 (the number of threads is equal to the number of processor cores)

tmp_dir

Description

The name of the temporary directory for intermediate results.

Possible types

String

Default value

None (the name is generated)

log_cout

Output stream or callback for logging.

Possible types

callable Python object
python object providing the write() method

Default value

sys.stdout

log_cerr

Error stream or callback for logging.

Possible types

callable Python object
python object providing the write() method

Default value

sys.stderr

Examples

from catboost import Pool, CatBoost

train_data = [[0, 3],
              [4, 1],
              [8, 1],
              [9, 1]]
train_labels = [0, 0, 1, 1]

eval_data = [[1, 3],
             [4, 2],
             [8, 2],
             [8, 3]]

eval_labels = [1, 0, 0, 1]

train_dataset = Pool(train_data, train_labels)

eval_dataset = Pool(eval_data, eval_labels)

model1 = CatBoost(params={'iterations': 100,
                          'learning_rate': 0.1})
model1.fit(train_dataset, verbose=False)

model2 = CatBoost(params={'iterations': 100,
                          'learning_rate': 0.3})
model2.fit(train_dataset, eval_set=eval_dataset, verbose=False)

model1.compare(model2, eval_dataset, ['RMSE'])

The following is a chart plotted with Jupyter Notebook for the given example.

Was the article helpful?

calc_feature_statistics

copy