CleanMissingData
class CleanMissingData.CleanMissingData(cleaningMode='Mean', customValue=None, inputCols=None, outputCols=None)
Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
Removes missing values from the input dataset by replacing them according to the selected cleaning mode.
The following modes are supported:
- Mean - replaces missing values with the mean of the fitted column
- Median - replaces missing values with the approximate median of the fitted column
- Custom - replaces missing values with a custom value specified by the user
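The three modes can be illustrated on a plain Python list. This is only a sketch of the documented semantics, not the library's implementation: the helper name fill_missing and its arguments are hypothetical, None stands in for a missing value, and the median here is exact rather than approximate.

```python
from statistics import mean, median


def fill_missing(values, mode="Mean", custom_value=None):
    """Return a copy of `values` with None entries replaced.

    Hypothetical helper: mirrors the three documented modes on a plain
    Python list instead of a Spark DataFrame column.
    """
    # The replacement is computed from the non-missing entries only.
    present = [v for v in values if v is not None]
    if mode == "Mean":
        fill = mean(present)
    elif mode == "Median":
        # Exact median; the documented mode uses an approximate one.
        fill = median(present)
    elif mode == "Custom":
        fill = custom_value
    else:
        raise ValueError(f"unsupported mode: {mode}")
    return [fill if v is None else v for v in values]


column = [1.0, None, 3.0, 8.0]
print(fill_missing(column, "Mean"))    # fills with the mean of 1, 3, 8
print(fill_missing(column, "Median"))  # fills with the median of 1, 3, 8
print(fill_missing(column, "Custom", custom_value=0.0))
```

In Mean and Median modes the fill value is derived from the data, which is why a fitting step is needed; in Custom mode the user supplies it directly.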
For mean and median modes, only numeric column types are supported, specifically:
- int
- long
- float
- double
For custom mode, the types above are supported and additionally:
- str
- bool
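Parameters:
- cleaningMode - cleaning mode: Mean, Median, or Custom (default: 'Mean')
- customValue - replacement value used in Custom mode (default: None)
- inputCols - input columns (default: None)
- outputCols - output columns (default: None)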
class CleanMissingData.CleanMissingDataModel(java_model=None)
Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable
Model fitted by CleanMissingData.
This class is left empty on purpose. All necessary methods are exposed through inheritance.
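A minimal usage sketch of the estimator and its fitted model follows. It relies only on the constructor signature shown above and on the fit and transform methods inherited from the PySpark base classes. The import path mmlspark.CleanMissingData is an assumption inferred from the module name on this page, and the helper clean_with_mean and its arguments are hypothetical.

```python
def clean_with_mean(df, input_cols, output_cols):
    """Fit CleanMissingData in Mean mode and return (model, cleaned_df).

    Hypothetical helper. `df` is expected to be a Spark DataFrame whose
    `input_cols` are numeric (int, long, float, or double).
    """
    # Assumed import path, inferred from the module name on this page.
    # Imported lazily so defining the helper does not require Spark.
    from mmlspark.CleanMissingData import CleanMissingData

    cleaner = CleanMissingData(
        cleaningMode="Mean",
        inputCols=input_cols,
        outputCols=output_cols,
    )
    # fit() computes the replacement values and returns the fitted model.
    model = cleaner.fit(df)
    # transform() applies the stored replacement values to the data.
    return model, model.transform(df)
```

Keeping the fitted model around allows the same replacement values to be applied to other DataFrames with the same columns.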