EnsembleByKey

class EnsembleByKey.EnsembleByKey(colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

EnsembleByKey first groups rows by a set of key columns and then averages the selected columns within each group. It handles both scalar and vector columns. By default, the dimension of each vector column is inferred by materializing the first row of that column. To avoid this materialization, provide the dimensions through setVectorDims, which takes a mapping from column name (str) to dimension (int). The collapseGroup parameter controls whether each group is collapsed to a single entry or the original dataset is kept.
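
The group-then-average behavior described above can be illustrated without Spark. The following is a minimal pure-Python sketch, not the library's implementation. The function name `ensemble_by_key`, the rows-as-dicts representation, and the `mean(<col>)` output names are assumptions made for illustration. Reading `collapseGroup=False` as "keep every original row and attach its group's average" is likewise an interpretation of the description, not confirmed behavior.

```python
from collections import defaultdict
from statistics import fmean


def ensemble_by_key(rows, keys, cols, collapse_group=True):
    """Group rows (dicts) by `keys` and average each column in `cols`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)

    # One mean per (group, column); the output names are illustrative only.
    means = {
        group_key: {f"mean({c})": fmean(r[c] for r in members) for c in cols}
        for group_key, members in groups.items()
    }

    if collapse_group:
        # One output row per distinct key combination.
        return [{**dict(zip(keys, group_key)), **agg} for group_key, agg in means.items()]

    # Keep every original row and attach its group's averages.
    return [{**row, **means[tuple(row[k] for k in keys)]} for row in rows]


rows = [
    {"id": "a", "score": 1.0},
    {"id": "a", "score": 3.0},
    {"id": "b", "score": 5.0},
]
collapsed = ensemble_by_key(rows, keys=["id"], cols=["score"])
expanded = ensemble_by_key(rows, keys=["id"], cols=["score"], collapse_group=False)
```

Here `collapsed` holds one row per distinct `id`, while `expanded` keeps all three input rows and adds the group average to each.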

Parameters:
  • colNames (list) – Names of the output columns, one for each entry in cols
  • collapseGroup (bool) – Whether to collapse all items in a group to one entry (default: True)
  • cols (list) – Columns to ensemble
  • keys (list) – Keys to group by
  • strategy (str) – How to ensemble the scores, e.g. mean (default: mean)
  • vectorDims (dict) – The dimensions of any vector columns, used to avoid materialization
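
As a rough usage sketch, the constructor can be called with the keyword arguments listed above. The import path is an assumption inferred from the module name shown in the class signature on this page and may differ between releases. The column names (`user_id`, `score`, `avg_score`) are hypothetical. The import is deferred into a function because constructing the stage requires the library and an active Spark session.

```python
def build_ensembler():
    """Construct an EnsembleByKey stage from the documented keyword arguments."""
    # Assumed import path, inferred from the module name on this page.
    from mmlspark.EnsembleByKey import EnsembleByKey

    return EnsembleByKey(
        keys=["user_id"],        # hypothetical grouping column
        cols=["score"],          # hypothetical column to average
        colNames=["avg_score"],  # hypothetical output name, one per entry in cols
        strategy="mean",
        collapseGroup=True,
    )
```

Because the class derives from pyspark.ml.wrapper.JavaTransformer, the resulting stage would be applied with `build_ensembler().transform(df)`, where `df` is a DataFrame containing the key and value columns.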
getColNames()
Returns: Names of the output columns, one for each entry in cols
Return type: list
getCollapseGroup()
Returns: Whether to collapse all items in a group to one entry (default: True)
Return type: bool
getCols()
Returns: Columns to ensemble
Return type: list
static getJavaPackage()

Returns the package name as a string.

getKeys()
Returns: Keys to group by
Return type: list
getStrategy()
Returns: How to ensemble the scores, e.g. mean (default: mean)
Return type: str
getVectorDims()
Returns: The dimensions of any vector columns, used to avoid materialization
Return type: dict
classmethod read()

Returns an MLReader instance for this class.

setColNames(value)
Parameters: value (list) – Names of the output columns, one for each entry in cols
setCollapseGroup(value)
Parameters: value (bool) – Whether to collapse all items in a group to one entry (default: True)
setCols(value)
Parameters: value (list) – Columns to ensemble
setKeys(value)
Parameters: value (list) – Keys to group by
setParams(colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)

Sets the parameters (keyword-only).

Parameters:
  • colNames (list) – Names of the output columns, one for each entry in cols
  • collapseGroup (bool) – Whether to collapse all items in a group to one entry (default: True)
  • cols (list) – Columns to ensemble
  • keys (list) – Keys to group by
  • strategy (str) – How to ensemble the scores, e.g. mean (default: mean)
  • vectorDims (dict) – The dimensions of any vector columns, used to avoid materialization
setStrategy(value)
Parameters: value (str) – How to ensemble the scores, e.g. mean (default: mean)
setVectorDims(value)
Parameters: value (dict) – The dimensions of any vector columns, used to avoid materialization
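
To show why declaring vector dimensions avoids touching the data, here is a small pure-Python sketch. It is not the library's code: the helper names `resolve_vector_dim` and `mean_vector` and the column name `scores` are hypothetical, and indexing `rows[0]` merely stands in for materializing the first row of a distributed column.

```python
def resolve_vector_dim(column, rows, vector_dims=None):
    """Return the vector length for `column`, preferring a declared value."""
    if vector_dims and column in vector_dims:
        return vector_dims[column]   # declared up front: no data access needed
    return len(rows[0][column])      # stand-in for materializing the first row


def mean_vector(vectors, dim):
    """Element-wise mean of equal-length vectors."""
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


rows = [{"scores": [1.0, 3.0]}, {"scores": [3.0, 5.0]}]
declared = resolve_vector_dim("scores", rows, vector_dims={"scores": 2})
inferred = resolve_vector_dim("scores", rows)
averaged = mean_vector([r["scores"] for r in rows], declared)
```

With the documented setter, the equivalent declaration would be passed as a dict from column name to dimension, for example `setVectorDims({"scores": 2})`.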