SummarizeData

class SummarizeData.SummarizeData(basic=True, counts=True, errorThreshold=0.0, percentiles=True, sample=True)
Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
Compute summary statistics for the dataset.
Statistics to be computed:
- counts
- basic
- sample
- percentiles
errorThreshold (default 0.0) is the error threshold used when computing quantiles; a value of 0 computes them exactly.
Parameters:
- basic (bool) – Compute basic statistics (default: true)
- counts (bool) – Compute count statistics (default: true)
- errorThreshold (double) – Threshold for quantiles - 0 is exact (default: 0.0)
- percentiles (bool) – Compute percentiles (default: true)
- sample (bool) – Compute sample statistics (default: true)
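A minimal usage sketch of the constructor described above. It assumes the class is importable as `mmlspark.SummarizeData.SummarizeData` (inferred from the module path shown on this page, not confirmed by it) and that `df` is an existing Spark DataFrame. The helper name `summarize` is hypothetical, and the import is deferred so that the definition loads even where Spark is not installed.

```python
def summarize(df, error_threshold=0.0):
    """Return a DataFrame of summary statistics for `df` (hypothetical helper)."""
    # Assumed import path, inferred from the documented module name.
    from mmlspark.SummarizeData import SummarizeData

    summarizer = SummarizeData(
        basic=True,
        counts=True,
        percentiles=True,
        sample=True,
        errorThreshold=error_threshold,  # 0.0 requests exact quantiles
    )
    # SummarizeData derives from JavaTransformer, so it exposes transform().
    return summarizer.transform(df)
```

Setting one of the boolean flags to False should skip the corresponding group of statistics, which may be useful when percentiles are expensive to compute on a large dataset.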
getErrorThreshold()
Returns: Threshold for quantiles - 0 is exact (default: 0.0)
Return type: double
setErrorThreshold(value)
Parameters: value (double) – Threshold for quantiles - 0 is exact (default: 0.0)
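To illustrate what a non-zero threshold could mean, the sketch below assumes the threshold is interpreted as a relative rank error, as documented for Spark's `DataFrame.approxQuantile`. This page does not confirm that SummarizeData uses that method, so treat it as an assumption. Under it, for N rows, target probability p and threshold e, any value whose rank lies between floor((p - e) * N) and ceil((p + e) * N) is an acceptable answer. The function name `acceptable_rank_range` is hypothetical, and the code is plain Python rather than part of mmlspark.

```python
import math


def acceptable_rank_range(n_rows, probability, error_threshold):
    """Inclusive rank bounds allowed for an approximate quantile.

    Assumes the relative-rank-error guarantee documented for Spark's
    approxQuantile; not taken from the SummarizeData source.
    """
    low = math.floor((probability - error_threshold) * n_rows)
    high = math.ceil((probability + error_threshold) * n_rows)
    # Clamp to the valid range of ranks.
    return max(low, 0), min(high, n_rows)
```

With a threshold of 0.0 and p * N a whole number, the range collapses to a single rank, which matches the "0 is exact" wording above. Larger thresholds widen the range in exchange for cheaper computation.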
setParams(basic=True, counts=True, errorThreshold=0.0, percentiles=True, sample=True)
Set the (keyword-only) parameters.
Parameters:
- basic (bool) – Compute basic statistics (default: true)
- counts (bool) – Compute count statistics (default: true)
- errorThreshold (double) – Threshold for quantiles - 0 is exact (default: 0.0)
- percentiles (bool) – Compute percentiles (default: true)
- sample (bool) – Compute sample statistics (default: true)
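A short sketch of adjusting an existing instance with the methods listed above. The helper name `reconfigure` is hypothetical; it only relies on `setParams` accepting keyword arguments and on `getErrorThreshold` returning the current threshold, both as documented on this page.

```python
def reconfigure(summarizer, error_threshold):
    """Update a SummarizeData-like object and return its threshold.

    Hypothetical helper: `summarizer` is assumed to expose setParams()
    and getErrorThreshold() as documented above.
    """
    # setParams is keyword-only, so every argument is passed by name.
    summarizer.setParams(percentiles=True, errorThreshold=error_threshold)
    return summarizer.getErrorThreshold()
```

For a single change, calling `setErrorThreshold(value)` directly is the simpler option.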