Missing values processing

The missing values processing mode depends on the feature type and the selected package.

Numerical features

CatBoost interprets the value of a numerical feature as a missing value if it is equal to one of the following values, which are package-dependent:

  • Python package:

      • None
      • Floating point NaN value
      • One of the strings listed below, when loading the values from files or passing them as Python strings

  • R package:

      • Floating point NaN value
      • One of the strings listed below, when loading the values from files

  • Command-line version:

      • One of the strings listed below, when reading the values from an input file

The recognized missing-value strings are the empty string and the following:

    #N/A, #N/A N/A, #NA, -1.#IND, -1.#QNAN, -NaN, -nan, 1.#IND, 1.#QNAN, N/A, NA, NULL, NaN, n/a, nan, null, NAN, Na, na, Null, none, None, -

This is an extended version of the default missing values list in pandas.
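The Python-package rule can be sketched as a small predicate. This is an illustration, not CatBoost's parser: `MISSING_TOKENS` and `is_missing_numeric` are hypothetical names, and the exact, case-sensitive string comparison is an assumption (the list spells out several case variants, such as NAN, Na and na, which suggests exact matching).

```python
import math

# Hypothetical set of the missing-value tokens listed above; "" is the empty string.
MISSING_TOKENS = frozenset({
    "", "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan",
    "1.#IND", "1.#QNAN", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null",
    "NAN", "Na", "na", "Null", "none", "None", "-",
})


def is_missing_numeric(value) -> bool:
    """Return True if `value` would count as a missing numerical value
    under the Python-package rules described above (sketch only)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str):
        # Assumption: exact, case-sensitive match with no whitespace stripping.
        return value in MISSING_TOKENS
    return False


values = [1.5, None, float("nan"), "NA", "-", "0", "n/a", "N/a"]
print([is_missing_numeric(v) for v in values])
# [False, True, True, True, True, False, True, False]
```

Under the exact-match assumption, "N/a" is not treated as missing because that spelling is not in the list.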

The following modes for processing missing values are supported:

  • "Forbidden": Missing values are not supported; their presence is interpreted as an error.
  • "Min": Missing values are processed as the minimum value (less than all other values) for the feature. A split that separates missing values from all other values is guaranteed to be considered when building the trees.
  • "Max": Missing values are processed as the maximum value (greater than all other values) for the feature. A split that separates missing values from all other values is guaranteed to be considered when building the trees.

The default processing mode is Min. The method for changing the default mode is package-dependent.
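To make the three modes concrete, the sketch below maps missing values (None or NaN) to -inf for Min and +inf for Max, rejects them for Forbidden, and then lists candidate split thresholds. `apply_nan_mode` and `candidate_thresholds` are hypothetical helpers, and using every distinct value as a threshold is a simplification of real split selection, not CatBoost's algorithm.

```python
import math
from typing import List, Optional, Sequence


def _is_missing(v) -> bool:
    return v is None or (isinstance(v, float) and math.isnan(v))


def apply_nan_mode(values: Sequence[Optional[float]], mode: str = "Min") -> List[float]:
    """Replace missing values according to `mode` (hypothetical helper)."""
    if mode == "Forbidden":
        if any(_is_missing(v) for v in values):
            raise ValueError("missing values are forbidden for this feature")
        return [float(v) for v in values]
    if mode == "Min":
        sentinel = -math.inf   # missing values sort below every real value
    elif mode == "Max":
        sentinel = math.inf    # missing values sort above every real value
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return [sentinel if _is_missing(v) else float(v) for v in values]


def candidate_thresholds(values: Sequence[float]) -> List[float]:
    """Simplified split candidates: a split is `x <= t` for every distinct
    value t except the largest, so each gap between values can be cut."""
    distinct = sorted(set(values))
    return distinct[:-1]


raw = [3.0, None, 1.0, float("nan"), 2.0]
print(apply_nan_mode(raw, "Min"))                        # [3.0, -inf, 1.0, -inf, 2.0]
print(candidate_thresholds(apply_nan_mode(raw, "Min")))  # [-inf, 1.0, 2.0]
print(candidate_thresholds(apply_nan_mode(raw, "Max")))  # [1.0, 2.0, 3.0]
```

With Min, the threshold -inf puts exactly the missing values on one side of the split; with Max, the largest finite value (3.0 here) does the same. This is why a split that isolates missing values is always among the candidates in either mode.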

Categorical features

CatBoost does not apply any specific missing-value processing to categorical features.