QuantizationParams

class catboost_spark.QuantizationParams(borderCount=None, featureBorderType=None, ignoredFeaturesIndices=None, ignoredFeaturesNames=None, inputBorders=None, nanMode=None, perFloatFeatureQuantizaton=None, threadCount=None)

Bases: pyspark.ml.wrapper.JavaParams

Parameters
borderCount : int

The number of splits for numerical features. Allowed values are integers from 1 to 65535 inclusive. Default value is 254.

featureBorderType : EBorderSelectionType

The quantization mode for numerical features. See the CatBoost documentation for details. Default value is 'GreedyLogSum'.

ignoredFeaturesIndices : list

Feature indices to exclude from training.

ignoredFeaturesNames : list

Feature names to exclude from training.

inputBorders : str

Load custom quantization borders and missing value modes from a file instead of generating them.

nanMode : ENanMode

The method for processing missing values in the input dataset. See the CatBoost documentation for details. Default value is 'Min'.

perFloatFeatureQuantizaton : list

The quantization description for the given list of features (one or more). Description format for a single feature: FeatureId[:border_count=BorderCount][:nan_mode=BorderType][:border_type=border_selection_method]

threadCount : int

The number of CPU threads to use for parallel operations on the client.
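
Example (sketch)

As a minimal sketch of how these keyword arguments might be assembled, the helper below collects values into a dict and checks borderCount against the documented range of 1 to 65535 before anything reaches the constructor. The names quantization_kwargs and make_quantization_params are hypothetical and not part of catboost_spark; constructing the object itself is assumed to require an installed catboost_spark package and a running Spark session, so that import is done lazily.

```python
def quantization_kwargs(border_count=None, thread_count=None,
                        ignored_features_indices=None):
    """Collect constructor keyword arguments, skipping values left as None."""
    kwargs = {}
    if border_count is not None:
        # Documented range for borderCount: 1 to 65535 inclusive.
        if not 1 <= border_count <= 65535:
            raise ValueError("borderCount must be in [1, 65535]")
        kwargs["borderCount"] = border_count
    if thread_count is not None:
        kwargs["threadCount"] = thread_count
    if ignored_features_indices is not None:
        kwargs["ignoredFeaturesIndices"] = list(ignored_features_indices)
    return kwargs


def make_quantization_params(**kwargs):
    """Hypothetical wrapper; assumes catboost_spark and a Spark session exist."""
    import catboost_spark  # lazy import so the helper above works without Spark
    return catboost_spark.QuantizationParams(**kwargs)


kwargs = quantization_kwargs(border_count=128, thread_count=4,
                             ignored_features_indices=[0, 3])
print(kwargs)
```

With Spark available, make_quantization_params(**kwargs) would forward the same dict to the constructor documented above.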

Methods Summary

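getBorderCount()

Returns the number of splits for numerical features.

getFeatureBorderType()

Returns the quantization mode for numerical features.

getIgnoredFeaturesIndices()

Returns the feature indices to exclude from training.

getIgnoredFeaturesNames()

Returns the feature names to exclude from training.

getInputBorders()

Returns the file from which custom quantization borders and missing value modes are loaded.

getNanMode()

Returns the method for processing missing values in the input dataset.

getPerFloatFeatureQuantizaton()

Returns the quantization description for the given list of features.

getThreadCount()

Returns the number of CPU threads used for parallel operations on the client.

setBorderCount(value)

Sets the number of splits for numerical features.

setFeatureBorderType(value)

Sets the quantization mode for numerical features.

setIgnoredFeaturesIndices(value)

Sets the feature indices to exclude from training.

setIgnoredFeaturesNames(value)

Sets the feature names to exclude from training.

setInputBorders(value)

Sets the file from which custom quantization borders and missing value modes are loaded.

setNanMode(value)

Sets the method for processing missing values in the input dataset.

setParams([borderCount, featureBorderType, …])

Set the (keyword only) parameters.

setPerFloatFeatureQuantizaton(value)

Sets the quantization description for the given list of features.

setThreadCount(value)

Sets the number of CPU threads used for parallel operations on the client.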

Methods Documentation

getBorderCount()
Returns
int

The number of splits for numerical features. Allowed values are integers from 1 to 65535 inclusive. Default value is 254.

getFeatureBorderType()
Returns
EBorderSelectionType

The quantization mode for numerical features. See the CatBoost documentation for details. Default value is 'GreedyLogSum'.

getIgnoredFeaturesIndices()
Returns
list

Feature indices to exclude from training.

getIgnoredFeaturesNames()
Returns
list

Feature names to exclude from training.

getInputBorders()
Returns
str

Load custom quantization borders and missing value modes from a file instead of generating them.

getNanMode()
Returns
ENanMode

The method for processing missing values in the input dataset. See the CatBoost documentation for details. Default value is 'Min'.

getPerFloatFeatureQuantizaton()
Returns
list

The quantization description for the given list of features (one or more). Description format for a single feature: FeatureId[:border_count=BorderCount][:nan_mode=BorderType][:border_type=border_selection_method]
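
Example (sketch)

To illustrate the description format, the following sketch splits one description string back into its feature id and options. The function name parse_feature_quantization is hypothetical and not part of catboost_spark; it assumes each entry of the returned list is a string in the format shown above, and it keeps all values as strings.

```python
def parse_feature_quantization(description):
    """Split 'FeatureId[:key=value]...' into a dict of strings."""
    feature_id, *options = description.split(":")
    parsed = {"feature_id": feature_id}
    for option in options:
        # Each option has the form key=value, e.g. border_count=1024.
        key, _, value = option.partition("=")
        parsed[key] = value
    return parsed


parsed = parse_feature_quantization("0:border_count=1024:nan_mode=Min")
print(parsed)
```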

getThreadCount()
Returns
int

The number of CPU threads to use for parallel operations on the client.

setBorderCount(value)
Parameters
value : int

The number of splits for numerical features. Allowed values are integers from 1 to 65535 inclusive. Default value is 254.

setFeatureBorderType(value)
Parameters
value : EBorderSelectionType

The quantization mode for numerical features. See the CatBoost documentation for details. Default value is 'GreedyLogSum'.

setIgnoredFeaturesIndices(value)
Parameters
value : list

Feature indices to exclude from training.

setIgnoredFeaturesNames(value)
Parameters
value : list

Feature names to exclude from training.

setInputBorders(value)
Parameters
value : str

Load custom quantization borders and missing value modes from a file instead of generating them.

setNanMode(value)
Parameters
value : ENanMode

The method for processing missing values in the input dataset. See the CatBoost documentation for details. Default value is 'Min'.

setParams(borderCount=None, featureBorderType=None, ignoredFeaturesIndices=None, ignoredFeaturesNames=None, inputBorders=None, nanMode=None, perFloatFeatureQuantizaton=None, threadCount=None)

Set the (keyword only) parameters.

Parameters
borderCount : int

The number of splits for numerical features. Allowed values are integers from 1 to 65535 inclusive. Default value is 254.

featureBorderType : EBorderSelectionType

The quantization mode for numerical features. See the CatBoost documentation for details. Default value is 'GreedyLogSum'.

ignoredFeaturesIndices : list

Feature indices to exclude from training.

ignoredFeaturesNames : list

Feature names to exclude from training.

inputBorders : str

Load custom quantization borders and missing value modes from a file instead of generating them.

nanMode : ENanMode

The method for processing missing values in the input dataset. See the CatBoost documentation for details. Default value is 'Min'.

perFloatFeatureQuantizaton : list

The quantization description for the given list of features (one or more). Description format for a single feature: FeatureId[:border_count=BorderCount][:nan_mode=BorderType][:border_type=border_selection_method]

threadCount : int

The number of CPU threads to use for parallel operations on the client.
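
Example (sketch)

Because setParams accepts keyword arguments only, one possible pattern is to drop every argument left as None so that only explicitly chosen values are forwarded. The sketch below shows that filtering step. The helper set_non_default and the FakeParams stand-in are hypothetical and exist only so the example runs without Spark; a real QuantizationParams instance is assumed to be usable in place of the stand-in.

```python
def set_non_default(params, **overrides):
    """Forward only the overrides that are not None to params.setParams."""
    chosen = {name: value for name, value in overrides.items()
              if value is not None}
    params.setParams(**chosen)
    return chosen


class FakeParams:
    """Stand-in for this sketch only; not part of catboost_spark."""

    def __init__(self):
        self.received = {}

    def setParams(self, **kwargs):
        # Record what was passed so the effect of the filter is visible.
        self.received.update(kwargs)


fake = FakeParams()
chosen = set_non_default(fake, borderCount=64, inputBorders=None, threadCount=2)
print(fake.received)
```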

setPerFloatFeatureQuantizaton(value)
Parameters
value : list

The quantization description for the given list of features (one or more). Description format for a single feature: FeatureId[:border_count=BorderCount][:nan_mode=BorderType][:border_type=border_selection_method]
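
Example (sketch)

The description format above can be assembled mechanically. The following sketch builds one string per feature and collects them into the list that would be passed as value. The helper name feature_quantization is hypothetical and not part of catboost_spark; the option values used ('Min', 'GreedyLogSum') are the defaults named elsewhere on this page.

```python
def feature_quantization(feature_id, border_count=None, nan_mode=None,
                         border_type=None):
    """Build 'FeatureId[:border_count=..][:nan_mode=..][:border_type=..]'."""
    parts = [str(feature_id)]
    if border_count is not None:
        parts.append("border_count={}".format(border_count))
    if nan_mode is not None:
        parts.append("nan_mode={}".format(nan_mode))
    if border_type is not None:
        parts.append("border_type={}".format(border_type))
    # Optional sections are joined with ':' in the documented order.
    return ":".join(parts)


descriptions = [
    feature_quantization(0, border_count=1024),
    feature_quantization(2, nan_mode="Min", border_type="GreedyLogSum"),
]
print(descriptions)
```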

setThreadCount(value)
Parameters
value : int

The number of CPU threads to use for parallel operations on the client.