compare
Draw train and evaluation metrics in Jupyter Notebook for two trained models.
Method call format
compare(model,
data=None,
metrics=None,
ntree_start=0,
ntree_end=0,
eval_period=1,
thread_count=-1,
tmp_dir=None,
log_cout=sys.stdout,
log_cerr=sys.stderr)
Parameters
model
Description
The CatBoost model to compare with.
Possible types
CatBoost Model
Default value
Required parameter
metrics
Description
The list of metrics to be calculated.
Supported metrics
For example, if the AUC and Logloss metrics should be calculated, use the following construction:
['Logloss', 'AUC']
Possible types
list of strings
Default value
Required parameter
data
Description
A file or matrix with the input dataset, on which the compared metric values should be calculated.
Possible types
catboost.Pool
Default value
Required parameter
ntree_start
Description
To reduce the number of trees to use when the model is applied or the metrics are calculated, set the range of the tree indices to[ntree_start; ntree_end)
and the eval_period
parameter to k to calculate metrics on every k-th iteration.
This parameter defines the index of the first tree to be used when applying the model or calculating the metrics (the inclusive left border of the range). Indices are zero-based.
Possible types
int
Default value
0
ntree_end
Description
To reduce the number of trees to use when the model is applied or the metrics are calculated, set the range of the tree indices to[ntree_start; ntree_end)
and the eval_period
parameter to k to calculate metrics on every k-th iteration.
This parameter defines the index of the first tree not to be used when applying the model or calculating the metrics (the exclusive right border of the range). Indices are zero-based.
Possible types
int
Default value
0 (the index of the last tree to use equals to the number of trees in the
model minus one)
eval_period
Description
To reduce the number of trees to use when the model is applied or the metrics are calculated, set the range of the tree indices to[ntree_start; ntree_end)
and the eval_period
parameter to k to calculate metrics on every k-th iteration.
This parameter defines the step to iterate over the range [
ntree_start;
ntree_end)
. For example, let's assume that the following parameter values are set:
ntree_start
is set 0ntree_end
is set to N (the total tree count)eval_period
is set to 2
In this case, the metrics are calculated for the following tree ranges: [0, 2)
, [0, 4)
, ... , [0, N)
Possible types
int
Default value
1 (the trees are applied sequentially: the first tree, then the first two
trees, etc.)
thread_count
Description
The number of threads to use.
Optimizes the speed of execution. This parameter doesn't affect results.
Possible types
int
Default value
-1 (the number of threads is equal to the number of processor cores)
tmp_dir
Description
The name of the temporary directory for intermediate results.
Possible types
String
Default value
None (the name is generated)
log_cout
Output stream or callback for logging.
Possible types
- callable Python object
- python object providing the
write()
method
Default value
sys.stdout
log_cerr
Error stream or callback for logging.
Possible types
- callable Python object
- python object providing the
write()
method
Default value
sys.stderr
Examples
from catboost import Pool, CatBoost
train_data = [[0, 3],
[4, 1],
[8, 1],
[9, 1]]
train_labels = [0, 0, 1, 1]
eval_data = [[1, 3],
[4, 2],
[8, 2],
[8, 3]]
eval_labels = [1, 0, 0, 1]
train_dataset = Pool(train_data, train_labels)
eval_dataset = Pool(eval_data, eval_labels)
model1 = CatBoost(params={'iterations': 100,
'learning_rate': 0.1})
model1.fit(train_dataset, verbose=False)
model2 = CatBoost(params={'iterations': 100,
'learning_rate': 0.3})
model2.fit(train_dataset, eval_set=eval_dataset, verbose=False)
model1.compare(model2, eval_dataset, ['RMSE'])
The following is a chart plotted with Jupyter Notebook for the given example.