SummarizeData

class SummarizeData.SummarizeData(basic=True, counts=True, errorThreshold=0.0, percentiles=True, sample=True)

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Compute summary statistics for the dataset.

Statistics to be computed:

  • counts
  • basic
  • sample
  • percentiles

errorThreshold (default 0.0) sets the error tolerated when computing quantiles; a value of 0 gives exact quantiles.
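To make the effect of errorThreshold concrete, the sketch below assumes it behaves like the relative error of Spark's DataFrame.approxQuantile. This is an assumption; this page does not state how the threshold is applied. Under that definition, for N values, a probability p, and an error err, any value whose rank lies between floor((p - err) * N) and ceil((p + err) * N) is an acceptable answer. The helper name acceptable_rank_range is hypothetical.

```python
import math


def acceptable_rank_range(n, p, err):
    """Return the (lowest, highest) rank an approximate quantile may have.

    Assumes an approxQuantile-style guarantee:
    floor((p - err) * n) <= rank <= ceil((p + err) * n), clamped to [0, n].
    """
    low = max(0, math.floor((p - err) * n))
    high = min(n, math.ceil((p + err) * n))
    return low, high


# With err = 0 only the exact rank is acceptable.
exact = acceptable_rank_range(1000, 0.5, 0.0)
# A non-zero err widens the window of acceptable ranks around the median.
loose = acceptable_rank_range(1000, 0.5, 0.125)
print(exact, loose)
```

A larger threshold widens the window of acceptable ranks, which typically trades accuracy for less computation on large datasets.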

Parameters:
  • basic (bool) – Compute basic statistics (default: true)
  • counts (bool) – Compute count statistics (default: true)
  • errorThreshold (double) – Threshold for quantiles - 0 is exact (default: 0.0)
  • percentiles (bool) – Compute percentiles (default: true)
  • sample (bool) – Compute sample statistics (default: true)
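A minimal usage sketch, assuming a working PySpark session with the mmlspark package installed. The import path from mmlspark import SummarizeData and the helper name summarize are assumptions; adjust the import to match the installed version. The import is deferred into the function body so that the definition loads even where Spark is not available.

```python
def summarize(df, error_threshold=0.0):
    """Return a DataFrame of summary statistics for df (hypothetical helper)."""
    # Assumed import path; it may differ between mmlspark versions.
    from mmlspark import SummarizeData

    summarizer = SummarizeData(
        basic=True,
        counts=True,
        percentiles=True,
        sample=False,  # skip the sample statistics in this sketch
        errorThreshold=error_threshold,
    )
    # SummarizeData is a Transformer, so transform() produces the summary.
    return summarizer.transform(df)
```

Calling summarize(df).show() on a Spark DataFrame would then display the computed statistics.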
getBasic()
Returns: Whether basic statistics are computed (default: true)
Return type: bool
getCounts()
Returns: Whether count statistics are computed (default: true)
Return type: bool
getErrorThreshold()
Returns: The error threshold for quantiles; 0 is exact (default: 0.0)
Return type: double
static getJavaPackage()

Returns the package name as a string.

getPercentiles()
Returns: Whether percentiles are computed (default: true)
Return type: bool
getSample()
Returns: Whether sample statistics are computed (default: true)
Return type: bool
classmethod read()

Returns an MLReader instance for this class.

setBasic(value)
Parameters: value (bool) – Whether to compute basic statistics (default: true)
setCounts(value)
Parameters: value (bool) – Whether to compute count statistics (default: true)
setErrorThreshold(value)
Parameters: value (double) – Threshold for quantiles; 0 is exact (default: 0.0)
setParams(basic=True, counts=True, errorThreshold=0.0, percentiles=True, sample=True)

Sets the parameters, which must be passed as keyword arguments.

Parameters:
  • basic (bool) – Compute basic statistics (default: true)
  • counts (bool) – Compute count statistics (default: true)
  • errorThreshold (double) – Threshold for quantiles - 0 is exact (default: 0.0)
  • percentiles (bool) – Compute percentiles (default: true)
  • sample (bool) – Compute sample statistics (default: true)
setPercentiles(value)
Parameters: value (bool) – Whether to compute percentiles (default: true)
setSample(value)
Parameters: value (bool) – Whether to compute sample statistics (default: true)