CleanMissingData

class CleanMissingData.CleanMissingData(cleaningMode='Mean', customValue=None, inputCols=None, outputCols=None)

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Replaces missing values in the input dataset.

The following modes are supported:

  • Mean - replaces missing values with the mean of the fit column
  • Median - replaces missing values with the approximate median of the fit column
  • Custom - replaces missing values with a custom value specified by the user

For mean and median modes, only numeric column types are supported, specifically:

  • int
  • long
  • float
  • double

Custom mode supports the types above and, in addition:

  • str
  • bool
Parameters:
  • cleaningMode (str) – Cleaning mode (default: Mean)
  • customValue (str) – Custom value for replacement
  • inputCols (list) – The names of the input columns
  • outputCols (list) – The names of the output columns
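
The three cleaning modes can be illustrated on a single column with a minimal pure-Python sketch. This is not the library's implementation: the helper names (is_missing, replacement_value, clean) are hypothetical, treating both None and NaN as missing is an assumption, and the median here is exact rather than approximate.

```python
import math
import statistics


def is_missing(value):
    """Treat None and float NaN as missing (an assumption for this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def replacement_value(column, cleaning_mode="Mean", custom_value=None):
    """Pick the value that stands in for missing entries of one column."""
    present = [v for v in column if not is_missing(v)]
    if cleaning_mode == "Mean":
        return statistics.mean(present)
    if cleaning_mode == "Median":
        # Exact median; the documented Median mode uses an approximation.
        return statistics.median(present)
    if cleaning_mode == "Custom":
        return custom_value
    raise ValueError(f"unsupported cleaning mode: {cleaning_mode!r}")


def clean(column, fill):
    """Return a copy of the column with missing entries replaced by fill."""
    return [fill if is_missing(v) else v for v in column]


scores = [1.0, None, 3.0, 8.0]
mean_filled = clean(scores, replacement_value(scores, "Mean"))
median_filled = clean(scores, replacement_value(scores, "Median"))

# Custom mode also applies to non-numeric columns such as strings.
names = ["a", None, "b"]
labels = clean(names, replacement_value(names, "Custom", "unknown"))
```

Mean and Median derive the replacement from the non-missing entries, which is why they are limited to numeric columns. Custom ignores the column contents entirely.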
getCleaningMode()
Returns: Cleaning mode (default: Mean)
Return type: str
getCustomValue()
Returns: Custom value for replacement
Return type: str
getInputCols()
Returns: The names of the input columns
Return type: list
static getJavaPackage()

Returns the package name as a string.

getOutputCols()
Returns: The names of the output columns
Return type: list
classmethod read()

Returns an MLReader instance for this class.

setCleaningMode(value)
Parameters: value (str) – Cleaning mode (default: Mean)
setCustomValue(value)
Parameters: value (str) – Custom value for replacement
setInputCols(value)
Parameters: value (list) – The names of the input columns
setOutputCols(value)
Parameters: value (list) – The names of the output columns
setParams(cleaningMode='Mean', customValue=None, inputCols=None, outputCols=None)

Sets the parameters, which must be passed as keyword arguments.

Parameters:
  • cleaningMode (str) – Cleaning mode (default: Mean)
  • customValue (str) – Custom value for replacement
  • inputCols (list) – The names of the input columns
  • outputCols (list) – The names of the output columns
class CleanMissingData.CleanMissingDataModel(java_model=None)

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by CleanMissingData.

This class is left empty on purpose. All necessary methods are exposed through inheritance.
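
The relationship between the estimator and its fitted model can be sketched in plain Python. The sketch assumes the replacement values are learned once at fit time and then reused unchanged on any dataset the model transforms. The names SketchModel and fit_mean are hypothetical and stand in for the real classes only conceptually.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SketchModel:
    """Hypothetical stand-in for a fitted model: it only stores learned values."""
    fill_values: dict  # input column name -> replacement learned during fit
    output_cols: dict  # input column name -> output column name

    def transform(self, table):
        # Copy the table, then write each cleaned column under its output name.
        out = dict(table)
        for col, fill in self.fill_values.items():
            out[self.output_cols[col]] = [fill if v is None else v for v in table[col]]
        return out


def fit_mean(table, input_cols, output_cols):
    """Hypothetical Mean-mode fit: learn one replacement value per input column."""
    fills = {c: mean(v for v in table[c] if v is not None) for c in input_cols}
    return SketchModel(fills, dict(zip(input_cols, output_cols)))


train = {"x": [2.0, None, 4.0]}
model = fit_mean(train, input_cols=["x"], output_cols=["x_clean"])

# The mean learned from the training data (3.0) fills gaps in new data.
cleaned = model.transform({"x": [None, 10.0]})
```

Keeping the learned values in the model means new data is cleaned with statistics from the data the estimator was fitted on, not recomputed per dataset.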

static getJavaPackage()

Returns the package name as a string.

classmethod read()

Returns an MLReader instance for this class.