Usage examples

Train and apply a classification model

Train a classification model with default parameters in silent mode, then calculate model predictions on a custom dataset. The output contains the predicted probability of class 1 for each object:

catboost fit --learn-set train.tsv --test-set test.tsv --column-description train.cd --loss-function Logloss --logging-level Silent

catboost calc -m model.bin --input-path custom_data --cd train.cd -o custom_data.eval -T 4 --prediction-type Probability
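
The commands above depend on a column description file (train.cd) whose contents are not shown on this page. As a rough sketch, assuming a hypothetical dataset whose first column is the label, second column is a categorical feature, and third column is numeric, such a file could be written as follows. The file name example.cd and the feature names city and age are invented for illustration.

```shell
# Sketch only: example.cd, "city" and "age" are hypothetical names.
# Each line is tab-separated: <zero-based column index> <type> [<feature name>]
printf '0\tLabel\n'      >  example.cd
printf '1\tCateg\tcity\n' >> example.cd
printf '2\tNum\tage\n'    >> example.cd

# Show the result; pass the file with --column-description (or --cd).
cat example.cd
```

Columns that are not listed in the file are treated as numeric features, so in practice only the label and the categorical columns usually need an entry.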

Train a regression model on a CSV file with header

Train a regression model with 100 trees on a comma-separated dataset that has a header row:

catboost fit --learn-set train.csv --test-set test.csv --column-description train.cd --loss-function RMSE --iterations 100 --delimiter=',' --has-header

Train a classification model in verbose mode with multiple error functions

The Verbose logging level outputs additional information during training, such as the current error on the learn set and the current and best error on the test set. Elapsed and remaining time are also displayed.

The --custom-metric parameter logs additional error functions on the learn and test sets at each iteration.

catboost fit --learn-set train --test-set test --column-description train.cd --loss-function Logloss --custom-metric="AUC,Precision,Recall" -i 4 --logging-level Verbose

Example test_error.tsv result:

iter    Logloss         AUC             Precision       Recall
0       0.6638384193    0.8759125663    0.8537374221    0.9592193809
1       0.6350880554    0.8840660536    0.8565563873    0.9547779273
2       0.6098460477    0.8914710667    0.8609022556    0.9554508748
3       0.5834954183    0.8954216255    0.8608579414    0.9534320323
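
A metrics file like the one above can be post-processed with standard tools. The following is a minimal sketch, assuming the file is tab-separated with a single header row as in the sample; it recreates the sample rows under the hypothetical name sample_test_error.tsv and picks the iteration with the lowest Logloss.

```shell
# Recreate the sample rows from this page (tab-separated); the file name
# sample_test_error.tsv is hypothetical.
{
    printf 'iter\tLogloss\tAUC\tPrecision\tRecall\n'
    printf '0\t0.6638384193\t0.8759125663\t0.8537374221\t0.9592193809\n'
    printf '1\t0.6350880554\t0.8840660536\t0.8565563873\t0.9547779273\n'
    printf '2\t0.6098460477\t0.8914710667\t0.8609022556\t0.9554508748\n'
    printf '3\t0.5834954183\t0.8954216255\t0.8608579414\t0.9534320323\n'
} > sample_test_error.tsv

# Skip the header (NR > 1) and remember the row whose Logloss (column 2)
# is the smallest seen so far; print its iteration number at the end.
best=$(awk -F'\t' 'NR > 1 && (min == "" || $2 + 0 < min) { min = $2 + 0; it = $1 }
                   END { print it }' sample_test_error.tsv)

echo "best iteration by Logloss: $best"
```

The same pattern works for a metric that should be maximized, such as AUC, by changing the column index and flipping the comparison.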

Train a classification model with a preferred memory limit

CTR computation on large pools can cause out-of-memory errors. In this case, use the --used-ram-limit parameter to give CatBoost a hint about the available memory:

catboost fit --learn-set train.tsv --test-set test.tsv --column-description train.cd --loss-function Logloss --used-ram-limit 4GB
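
The limit does not have to be hard-coded. The sketch below derives a value from an available-memory figure given in kilobytes. The helper name ram_limit_from_kb and the choice to use half of the available memory are assumptions made for illustration, not CatBoost recommendations; the command is only printed, not executed.

```shell
# Hypothetical helper: convert an available-memory figure (in KB) into a
# --used-ram-limit value that budgets half of that memory, expressed in MB.
ram_limit_from_kb() {
    # Integer arithmetic: KB -> MB, then take half.
    echo "$(( $1 / 1024 / 2 ))MB"
}

# On Linux the figure could be read from /proc/meminfo, for example:
#   avail_kb=$(awk '/MemAvailable/ { print $2 }' /proc/meminfo)
# It is hard-coded here (8 GB) so that the sketch runs anywhere.
avail_kb=8388608

limit=$(ram_limit_from_kb "$avail_kb")

# Dry run: print the resulting command instead of starting a training job.
echo "catboost fit --learn-set train.tsv --test-set test.tsv --column-description train.cd --loss-function Logloss --used-ram-limit $limit"
```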

Train a model on GPU

Train a classification model on GPU:

catboost fit --learn-set ../pytest/data/adult/train_small --column-description ../pytest/data/adult/train.cd --loss-function Logloss --task-type GPU

Random subspace method

To enable the random subspace method for feature bagging, use the --rsm parameter:

catboost fit --learn-set train.tsv --test-set test.tsv --column-description train.cd --loss-function Logloss --rsm 0.5
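
Because the best --rsm value depends on the dataset, it is common to try several. The following dry-run sketch only prints one training command per candidate value rather than running it. The candidate values and the output names (model_rsm_*.bin, rsm_* directories) are arbitrary choices for illustration.

```shell
# Build one command per candidate rsm value (each must lie in (0, 1]).
# Separate --model-file and --train-dir values keep runs from overwriting
# each other's model and log files.
cmds=$(for rsm in 0.25 0.5 0.75 1; do
    echo catboost fit --learn-set train.tsv --test-set test.tsv \
        --column-description train.cd --loss-function Logloss \
        --rsm "$rsm" --model-file "model_rsm_${rsm}.bin" \
        --train-dir "rsm_${rsm}"
done)

# Print the commands for review; pipe them to `sh` to actually run them.
echo "$cmds"
```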

Calculate the object importances

To calculate the object importances:

1. Train the model:

catboost fit --loss-function Logloss -f train.tsv -t test.tsv --column-description train.cd

2. Calculate the object importances using the trained model:

catboost ostr -f train.tsv -t test.tsv --column-description train.cd -o object_importances.tsv
