Missing values processing
The missing values processing mode depends on the feature type and the selected package.
Numerical features
CatBoost interprets the value of a numerical feature as missing if it is equal to one of the following values, which are package-dependent:
Python package:
- None
- One of the following strings when loading the values from files or passing them as Python strings: "" (empty string), "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan", "1.#IND", "1.#QNAN", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null", "NAN", "Na", "na", "Null", "none", "None", "-"
This is an extended version of the default missing values list in pandas.
R package:
- One of the following strings when loading the values from files: "" (empty string), "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan", "1.#IND", "1.#QNAN", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null", "NAN", "Na", "na", "Null", "none", "None", "-"
This is an extended version of the default missing values list in pandas.
Command-line version:
- One of the following strings when reading the values from an input file: "" (empty string), "#N/A", "#N/A N/A", "#NA", "-1.#IND", "-1.#QNAN", "-NaN", "-nan", "1.#IND", "1.#QNAN", "N/A", "NA", "NULL", "NaN", "n/a", "nan", "null", "NAN", "Na", "na", "Null", "none", "None", "-"
This is an extended version of the default missing values list in pandas.
The following modes for processing missing values are supported:
- "Forbidden" — Missing values are not supported, their presence is interpreted as an error.
- "Min" — Missing values are processed as the minimum value (less than all other values) for the feature. It is guaranteed that a split that separates missing values from all other values is considered when selecting trees.
- "Max" — Missing values are processed as the maximum value (greater than all other values) for the feature. It is guaranteed that a split that separates missing values from all other values is considered when selecting trees.
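The effect of the three modes on the ordering of feature values can be illustrated with a small sketch. It is not CatBoost's internal implementation: the function sort_key_for_mode is a hypothetical name, and it assumes that real feature values are finite, so that a missing value placed at minus or plus infinity sorts strictly below or above all of them.

```python
import math


def sort_key_for_mode(value, nan_mode="Min"):
    """Hypothetical helper: ordering key for a feature value under a missing-value mode."""
    missing = value is None or (isinstance(value, float) and math.isnan(value))
    if not missing:
        return float(value)
    if nan_mode == "Min":
        return -math.inf  # missing sorts below every finite value
    if nan_mode == "Max":
        return math.inf   # missing sorts above every finite value
    if nan_mode == "Forbidden":
        raise ValueError("missing value encountered while nan_mode is 'Forbidden'")
    raise ValueError(f"unknown nan_mode: {nan_mode!r}")


values = [2.5, math.nan, -1.0, 4.0]
ordered_min = sorted(values, key=lambda v: sort_key_for_mode(v, "Min"))
ordered_max = sorted(values, key=lambda v: sort_key_for_mode(v, "Max"))
print(ordered_min)  # [nan, -1.0, 2.5, 4.0]
print(ordered_max)  # [-1.0, 2.5, 4.0, nan]
```

Because the missing values end up at one end of the ordering, a single threshold between them and the nearest real value is enough to separate them from all other values, which is the split the descriptions above refer to.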
The default processing mode is Min. The methods for changing the default mode are package-dependent:
Python package:
- Globally for all features, in the nan_mode training parameter.
- Individually for each feature, in the Custom quantization borders and missing value modes input file. Such values override the global default setting.
R package:
- Globally for all features, in the nan_mode training parameter.
- Individually for each feature, in the Custom quantization borders and missing value modes input file. Such values override the global default setting.
Command-line version:
- Globally for all features, in the --nan-mode training parameter.
- Individually for each feature, in the Custom quantization borders and missing value modes input file. Such values override the global default setting.
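As a minimal sketch of the global setting in the Python package, the nan_mode training parameter can be passed together with the other training parameters. The values of iterations and verbose below are arbitrary choices for the illustration, and the import is guarded only so that the snippet also runs where catboost is not installed.

```python
# Training parameters; nan_mode takes one of the modes listed above.
params = {"nan_mode": "Max", "iterations": 10, "verbose": False}

try:
    from catboost import CatBoostRegressor
except ImportError:
    CatBoostRegressor = None  # catboost is not installed; params still shows the setting

if CatBoostRegressor is not None:
    # The mode applies to all numerical features unless overridden per feature.
    model = CatBoostRegressor(**params)
    print(model.get_params()["nan_mode"])
```

Per-feature overrides are not shown here, since they are set in the separate input file mentioned above rather than in the training parameters.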
Categorical features
CatBoost does not process missing values in categorical features in any specific way.