EnsembleByKey
class EnsembleByKey.EnsembleByKey(colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)
Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
EnsembleByKey first groups the rows of a dataset by a set of key columns and then averages the selected columns within each group (the default 'mean' strategy). It handles both scalar and vector columns. The dimension of each vector column is inferred automatically by materializing the first row of that column; to avoid this materialization, supply the dimensions with the setVectorDims function, which takes a mapping from column name (str) to dimension (int). The collapseGroup parameter controls whether each group is collapsed into a single row or the original rows of the dataset are kept.
Parameters:
- colNames (list) – Names of the output columns, one for each column in cols
- collapseGroup (bool) – Whether to collapse all items in a group into one entry (default: True)
- cols (list) – Columns to ensemble
- keys (list) – Columns to group by
- strategy (str) – How to ensemble the scores, e.g. 'mean' (default: 'mean')
- vectorDims (dict) – The dimensions of any vector columns, used to avoid materialization
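To make the grouping-and-averaging semantics concrete, here is a plain-Python sketch that works on a list of dicts instead of a Spark DataFrame. The function `ensemble_by_key` and its arguments are hypothetical; it only mimics the default 'mean' strategy with collapseGroup=True and is not the library's implementation, which runs as a Spark transformer.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def ensemble_by_key(rows: List[Dict[str, object]],
                    keys: Sequence[str],
                    cols: Sequence[str],
                    col_names: Sequence[str],
                    strategy: str = "mean") -> List[Dict[str, object]]:
    """Hypothetical sketch: group rows by `keys`, then average each column in `cols`.

    Scalar values are averaged directly; vector values (Python lists) are
    averaged element-wise. One output row is produced per distinct key tuple,
    mirroring collapseGroup=True.
    """
    if strategy != "mean":
        raise ValueError("this sketch only implements the 'mean' strategy")
    if len(cols) != len(col_names):
        raise ValueError("col_names must name one output column per entry in cols")

    # Bucket the rows by their key tuple, preserving first-seen order.
    groups: Dict[Tuple[object, ...], List[Dict[str, object]]] = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)

    result = []
    for key_values, members in groups.items():
        out: Dict[str, object] = dict(zip(keys, key_values))
        for col, name in zip(cols, col_names):
            values = [m[col] for m in members]
            if isinstance(values[0], list):
                # Element-wise mean; assumes every vector in the column has the same length.
                out[name] = [sum(xs) / len(xs) for xs in zip(*values)]
            else:
                out[name] = sum(values) / len(values)
        result.append(out)
    return result


rows = [
    {"id": 1, "score": 0.25, "probs": [0.5, 0.5]},
    {"id": 1, "score": 0.75, "probs": [1.0, 0.0]},
    {"id": 2, "score": 1.0, "probs": [0.0, 1.0]},
]
print(ensemble_by_key(rows, ["id"], ["score", "probs"], ["avg_score", "avg_probs"]))
# prints [{'id': 1, 'avg_score': 0.5, 'avg_probs': [0.75, 0.25]}, {'id': 2, 'avg_score': 1.0, 'avg_probs': [0.0, 1.0]}]
```

The sketch returns groups in first-seen order; a Spark groupBy makes no such ordering guarantee.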
getCollapseGroup()
Returns: Whether to collapse all items in a group into one entry (default: True)
Return type: bool
getVectorDims()
Returns: The dimensions of any vector columns, used to avoid materialization
Return type: dict
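The trade-off behind vectorDims can be sketched as follows: a dimension supplied up front is used directly, and only columns without one trigger a look at the first row. In this sketch, `resolve_vector_dims` is a hypothetical helper and the `fetch_first_row` callable stands in for the expensive step of materializing the first row in Spark; neither is part of the library.

```python
from typing import Callable, Dict, Mapping, Optional, Sequence


def resolve_vector_dims(vector_cols: Sequence[str],
                        fetch_first_row: Callable[[], Mapping[str, Sequence[float]]],
                        vector_dims: Optional[Mapping[str, int]] = None) -> Dict[str, int]:
    """Hypothetical helper: map each vector column to its dimension.

    Dimensions given in `vector_dims` are trusted as-is. The first row is fetched
    at most once, and only if some column has no supplied dimension.
    """
    supplied = dict(vector_dims or {})
    dims: Dict[str, int] = {}
    first_row = None
    for col in vector_cols:
        if col in supplied:
            dims[col] = supplied[col]
        else:
            if first_row is None:
                first_row = fetch_first_row()  # stands in for materializing a row
            dims[col] = len(first_row[col])
    return dims


fetches = []


def fetch_first_row():
    fetches.append(1)  # record each simulated materialization
    return {"probs": [0.1, 0.9], "embedding": [0.0, 0.0, 0.0, 0.0]}


print(resolve_vector_dims(["probs", "embedding"], fetch_first_row, {"probs": 2, "embedding": 4}))
print(len(fetches))  # 0: every dimension was supplied
print(resolve_vector_dims(["probs", "embedding"], fetch_first_row, {"probs": 2}))
print(len(fetches))  # 1: "embedding" had to be inferred
```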
setCollapseGroup(value)
Parameters: value (bool) – New value for collapseGroup: whether to collapse all items in a group into one entry (default: True)
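The difference between the two settings shows up in the number of output rows. The plain-Python sketch below assumes that, with collapseGroup set to False, the per-group result is joined back onto every original row by the key columns; `ensemble_keep_rows` is a hypothetical name, and only a single scalar column with the mean strategy is handled.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def ensemble_keep_rows(rows: List[Dict[str, float]],
                       keys: Sequence[str],
                       col: str,
                       col_name: str) -> List[Dict[str, float]]:
    """Hypothetical sketch of collapseGroup=False: keep every input row and
    add the mean of `col` over the row's key group as `col_name`."""
    sums: Dict[Tuple[float, ...], float] = defaultdict(float)
    counts: Dict[Tuple[float, ...], int] = defaultdict(int)
    for row in rows:
        group = tuple(row[k] for k in keys)
        sums[group] += row[col]
        counts[group] += 1

    # "Join" step: each original row gains its group's mean.
    result = []
    for row in rows:
        group = tuple(row[k] for k in keys)
        result.append({**row, col_name: sums[group] / counts[group]})
    return result


rows = [{"id": 1, "score": 0.25}, {"id": 1, "score": 0.75}, {"id": 2, "score": 1.0}]
for row in ensemble_keep_rows(rows, ["id"], "score", "avg_score"):
    print(row)
# prints {'id': 1, 'score': 0.25, 'avg_score': 0.5}
#        {'id': 1, 'score': 0.75, 'avg_score': 0.5}
#        {'id': 2, 'score': 1.0, 'avg_score': 1.0}
```

Here all three input rows survive; collapsing the same input by id would instead yield two rows.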
setParams(colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)
Sets the (keyword-only) parameters.
Parameters:
- colNames (list) – Names of the output columns, one for each column in cols
- collapseGroup (bool) – Whether to collapse all items in a group into one entry (default: True)
- cols (list) – Columns to ensemble
- keys (list) – Columns to group by
- strategy (str) – How to ensemble the scores, e.g. 'mean' (default: 'mean')
- vectorDims (dict) – The dimensions of any vector columns, used to avoid materialization
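PySpark wrappers typically implement keyword-only setters with the `pyspark.keyword_only` decorator, which records only the arguments the caller actually passed, so unspecified parameters keep their current values. The class below is a hypothetical, dependency-free sketch of that behavior using `**kwargs`; it is not the EnsembleByKey implementation.

```python
from typing import Any, Dict


class KeywordOnlyParamsSketch:
    """Hypothetical stand-in mimicking a keyword-only setParams.

    Defaults mirror the EnsembleByKey signature documented in this section.
    """

    _defaults: Dict[str, Any] = {
        "colNames": None, "collapseGroup": True, "cols": None,
        "keys": None, "strategy": "mean", "vectorDims": None,
    }

    def __init__(self) -> None:
        self.params: Dict[str, Any] = dict(self._defaults)

    def setParams(self, **kwargs: Any) -> "KeywordOnlyParamsSketch":
        # **kwargs rejects positional arguments with a TypeError, and only the
        # parameters explicitly passed are updated.
        unknown = set(kwargs) - set(self._defaults)
        if unknown:
            raise TypeError(f"unknown parameters: {sorted(unknown)}")
        self.params.update(kwargs)
        return self  # returning self allows chained calls


p = KeywordOnlyParamsSketch()
p.setParams(keys=["id"], cols=["score"], colNames=["avg_score"])
p.setParams(collapseGroup=False)
print(p.params["keys"], p.params["collapseGroup"])  # prints ['id'] False
```

The second call changes only collapseGroup; keys, cols and colNames from the first call are left intact.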