Missing values processing

The missing values processing mode depends on the feature type.

Numerical features

CatBoost interprets the value of a numerical feature as a missing value if it is equal to one of the following values:

  • None

  • One of the following strings when loading the values from files or as Python strings:

    “”, “#N/A”, “#N/A N/A”, “#NA”, “-1.#IND”, “-1.#QNAN”, “-NaN”, “-nan”, “1.#IND”, “1.#QNAN”, “N/A”, “NA”, “NULL”, “NaN”, “n/a”, “nan”, “null”, “NAN”, “Na”, “na”, “Null”, “none”, “None”, “-”

    This is an extended version of the default missing values list in pandas.
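As a minimal sketch (not CatBoost's own parser), the check below tests whether a raw value would be recognized as missing under the list above. `MISSING_TOKENS`, `is_missing_token`, and `parse_numeric` are hypothetical names. The set copies the listed strings verbatim. Comparison is exact and case-sensitive, which is an assumption suggested by the separate case variants in the list.

```python
from typing import Optional

# Strings treated as missing for numerical features, copied from the list above.
MISSING_TOKENS = frozenset({
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null",
    "NAN", "Na", "na", "Null", "none", "None", "-",
})


def is_missing_token(value: Optional[str]) -> bool:
    """Return True if a raw numerical-feature value counts as missing.

    Hypothetical helper: None is missing, and strings are matched exactly
    (case-sensitively, by assumption) against MISSING_TOKENS.
    """
    if value is None:
        return True
    return value in MISSING_TOKENS


def parse_numeric(value: Optional[str]) -> Optional[float]:
    """Convert a raw value to float, or return None if it is a missing token."""
    if is_missing_token(value):
        return None
    return float(value)  # raises ValueError for other non-numeric strings
```

For example, `parse_numeric("3.5")` gives `3.5`, while `parse_numeric("N/A")` and `parse_numeric("-")` give `None`. A string such as `"NONE"` is not in the list, so under this sketch it is not treated as missing and `float()` rejects it.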

The following modes for processing missing values are supported:
  • Forbidden: missing values are not supported; their presence is treated as an error.
  • Min: missing values are processed as the minimum value of the feature (less than all other values). A split that separates missing values from all other values is guaranteed to be considered when selecting splits for the trees.
  • Max: missing values are processed as the maximum value of the feature (greater than all other values). A split that separates missing values from all other values is guaranteed to be considered when selecting splits for the trees.
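The effect of Min and Max can be illustrated with a small sketch. It assumes a simplified model in which each missing value (None or NaN) is replaced by negative or positive infinity before candidate thresholds are computed. This is not CatBoost's quantization code: `substitute_missing` and `candidate_splits` are hypothetical helpers, and the "x <= t goes to the left branch" convention is an assumption. Because the substituted value lies strictly below (Min) or above (Max) every real value, one candidate threshold always puts all missing values on one side and all other values on the other.

```python
import math
from typing import List, Optional


def substitute_missing(values: List[Optional[float]], mode: str) -> List[float]:
    """Replace missing values (None or NaN) according to a Forbidden/Min/Max mode."""
    result = []
    for v in values:
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        if not missing:
            result.append(float(v))
        elif mode == "Forbidden":
            raise ValueError("missing value found but mode is Forbidden")
        elif mode == "Min":
            result.append(-math.inf)  # below every real value
        elif mode == "Max":
            result.append(math.inf)  # above every real value
        else:
            raise ValueError(f"unknown mode: {mode!r}")
    return result


def candidate_splits(values: List[float]) -> List[float]:
    """Thresholds between consecutive distinct sorted values (x <= t goes left)."""
    distinct = sorted(set(values))
    splits = []
    for lo, hi in zip(distinct, distinct[1:]):
        if math.isinf(lo):
            splits.append(hi - 1.0)  # any threshold below hi isolates -inf
        elif math.isinf(hi):
            splits.append(lo + 1.0)  # any threshold above lo isolates +inf
        else:
            splits.append((lo + hi) / 2)  # ordinary midpoint
    return splits
```

With the values `[3.0, None, 1.0, 2.0]`, Min yields thresholds `[0.0, 1.5, 2.5]`, where `0.0` sends exactly the missing value to the left branch. Max yields `[1.5, 2.5, 4.0]`, where `4.0` sends exactly the missing value to the right branch.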

The default processing mode is Min. It can be changed for all features at once with the nan_mode training parameter, which accepts the mode names listed above.

Categorical features

CatBoost does not process missing values of categorical features in any specific way.