Jupyter Notebook

Additional packages for data visualization support must be installed to plot charts in Jupyter Notebook.
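As a quick pre-flight check, the sketch below tests whether a visualization package can be imported in the current kernel. It assumes the package in question is ipywidgets, which this section does not name; substitute whichever package your installation instructions list. The helper name is_package_available is hypothetical and not part of CatBoost.

```python
import importlib.util


def is_package_available(name: str) -> bool:
    """Return True if the top-level package `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


# Assumption: "ipywidgets" is the widget package needed for plotting.
WIDGET_PACKAGE = "ipywidgets"

if is_package_available(WIDGET_PACKAGE):
    print(f"{WIDGET_PACKAGE} is installed in this environment.")
else:
    print(f"{WIDGET_PACKAGE} is missing; install it before plotting charts.")
```

Running the check in the same kernel that will do the training matters, because a notebook kernel can use a different environment than the shell it was launched from.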

Choose the approach that fits your case to plot information about completed or ongoing trainings. Each approach is described below with an example.

  1. Add a training parameter
  2. Read data from the specified directory only using MetricVisualizer
  3. Gather and read data from all subdirectories of the specified directory using MetricVisualizer

Add a training parameter

It is possible to plot a chart while training by setting the plot parameter to True. This approach applies to training methods that accept the plot parameter, such as the fit method used below.

Train a model with the plot parameter set:

from catboost import CatBoostClassifier

train_data = [[1, 3], [0, 4], [1, 7], [0, 3]]
train_labels = [1, 0, 1, 1]

model = CatBoostClassifier(learning_rate=0.03)

model.fit(train_data,
          train_labels,
          verbose=False,
          plot=True)

Read data from the specified directory only using MetricVisualizer

import catboost

w = catboost.MetricVisualizer('/path/to/trains/1')
w.start()

Refer to the MetricVisualizer section for details.

Gather data from the specified directory only

  1. Train a model from the root of the file system (/):
    from catboost import CatBoostClassifier
    
    cat_features = [0,1,2]
    
    train_data = [["a", "b", 1, 4, 5, 6],
                  ["a", "b", 4, 5, 6, 7],
                  ["c", "d", 30, 40, 50, 60]]
    
    train_labels = [1,1,0]
    
    model = CatBoostClassifier(iterations=20, 
                               loss_function = "CrossEntropy", 
                               train_dir = "crossentropy")
    
    model.fit(train_data, train_labels, cat_features)
    predictions = model.predict(train_data)
  2. Plot a chart using the information regarding the previous training (from the crossentropy directory):
    import catboost
    
    w = catboost.MetricVisualizer('/crossentropy/')
    w.start()

    The following is a chart plotted with Jupyter Notebook for the given example.
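The example above passes a relative train_dir ("crossentropy") but reads from an absolute path ('/crossentropy/'). That works because a relative path is resolved against the current working directory, which is / in this example. The following minimal sketch shows the resolution with the standard library only; resolve_train_dir is a hypothetical helper, not a CatBoost function.

```python
from pathlib import Path
from typing import Optional


def resolve_train_dir(train_dir: str, cwd: Optional[str] = None) -> Path:
    """Resolve `train_dir` against `cwd` (default: the current working directory).

    An absolute `train_dir` is returned unchanged.
    """
    base = Path(cwd) if cwd is not None else Path.cwd()
    return base / train_dir


# Training launched from the root of the file system, as in the example above.
print(resolve_train_dir("crossentropy", cwd="/"))

# The same relative name launched from the current directory of this process.
print(resolve_train_dir("crossentropy"))
```

If training is launched from a different directory, pass the resolved path to MetricVisualizer instead of '/crossentropy/'.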

Gather and read data from all subdirectories of the specified directory using MetricVisualizer

import catboost

w = catboost.MetricVisualizer('/path/to/trains', subdirs=True)
w.start()

Any data located directly in the /path/to/trains folder is ignored. The information is read from its subdirectories only.

Refer to the MetricVisualizer section for details.
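To preview which directories would be considered before starting the widget, a sketch like the following lists the subdirectories of a folder and skips files stored directly in it. It assumes only immediate subdirectories matter; whether MetricVisualizer also descends into nested directories is not stated here. The helper list_train_subdirs is hypothetical, and the demo uses a temporary folder in place of /path/to/trains.

```python
import tempfile
from pathlib import Path
from typing import List


def list_train_subdirs(root: str) -> List[Path]:
    """Return the immediate subdirectories of `root`, sorted by name."""
    return sorted(p for p in Path(root).iterdir() if p.is_dir())


# Demo on a temporary folder that stands in for /path/to/trains.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "crossentropy").mkdir()
    (Path(root) / "logloss").mkdir()
    # A file placed directly in the root folder is not a subdirectory.
    (Path(root) / "stray_file.tsv").write_text("ignored")

    names = [p.name for p in list_train_subdirs(root)]
    print(names)  # ['crossentropy', 'logloss']
```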

Gather and read data from all subdirectories

  1. Train two models from the root of the file system (/):
    1. A model with the CrossEntropy loss function, saved to the crossentropy directory:
      from catboost import CatBoostClassifier
      
      cat_features = [0,1,2]
      
      train_data = [["a", "b", 1, 4, 5, 6],
                    ["a", "b", 4, 5, 6, 7],
                    ["c", "d", 30, 40, 50, 60]]
      
      train_labels = [1,1,0]
      
      model = CatBoostClassifier(iterations=20, 
                                 loss_function = "CrossEntropy", 
                                 train_dir = "crossentropy")
      
      model.fit(train_data, train_labels, cat_features)
      predictions = model.predict(train_data)
    2. A model with the default loss function, saved to the logloss directory:
      from catboost import CatBoostClassifier
      
      cat_features = [0,1,2]
      
      train_data = [["a", "b", 1, 4, 5, 6],
                    ["a", "b", 4, 5, 6, 7],
                    ["c", "d", 30, 40, 50, 60]]
      
      train_labels = [1,1,0]
      
      model = CatBoostClassifier(iterations=20, 
                                 train_dir = "logloss")
      
      model.fit(train_data, train_labels, cat_features)
      predictions = model.predict(train_data)
  2. Plot charts using the information from all subdirectories (crossentropy and logloss) of the root of the file system:
    import catboost
    
    w = catboost.MetricVisualizer('/', subdirs=True)
    w.start()

    The following is a chart plotted with Jupyter Notebook for the given example.

Perform cross-validation

Perform cross-validation on the given dataset:

from catboost import Pool, cv

cv_data = [["France", 1924, 44],
           ["USA", 1932, 37],
           ["Switzerland", 1928, 25],
           ["Norway", 1952, 30],
           ["Japan", 1972, 35],
           ["Mexico", 1968, 112]]

labels = [1, 1, 0, 0, 0, 1]

cat_features = [0]

cv_dataset = Pool(data=cv_data,
                  label=labels,
                  cat_features=cat_features)

params = {"iterations": 100,
          "depth": 2,
          "loss_function": "Logloss",
          "verbose": False}

scores = cv(cv_dataset,
            params,
            fold_count=2, 
            plot=True)

The following is a chart plotted with Jupyter Notebook for the given example.