EnsembleByKey
-
class EnsembleByKey.EnsembleByKey(colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)
Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

The EnsembleByKey transformer first groups rows by a set of keys and then averages the selected columns within each group. It can handle scalar or vector columns. The dimensions of vector columns are inferred automatically by materializing the first row of the column. To avoid materialization, you can provide the vector dimensions through the setVectorDims function, which takes a mapping from columns (String) to dimension (Int). The collapseGroup parameter controls whether each group is squashed to a single entry or the original dataset is kept.

Parameters:
- colNames (list) – Names of the result of each col
- collapseGroup (bool) – Whether to collapse all items in group to one entry (default: true)
- cols (list) – Cols to ensemble
- keys (list) – Keys to group by
- strategy (str) – How to ensemble the scores, e.g. mean (default: mean)
- vectorDims (dict) – the dimensions of any vector columns, used to avoid materialization
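
The group-then-average behaviour described above can be illustrated without Spark. What follows is a minimal plain-Python sketch of the mean strategy on scalar columns, not the library's implementation. The helper name ensemble_by_key, the default "mean(<col>)" result naming, and the behaviour shown for collapse_group=False (every original row kept, with its group's mean attached) are assumptions made for illustration.

```python
from collections import defaultdict


def ensemble_by_key(rows, keys, cols, col_names=None, collapse_group=True):
    """Group dict rows by `keys` and average each column in `cols`.

    Hypothetical helper that mirrors the documented parameter names.
    """
    # Assumed default naming for result columns; the real transformer may differ.
    col_names = col_names or ["mean({})".format(c) for c in cols]

    # Bucket rows by their key tuple (insertion order of groups is preserved).
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[k] for k in keys)].append(row)

    # Per-group mean of every selected column.
    means = {
        key: {
            name: sum(r[c] for r in members) / len(members)
            for c, name in zip(cols, col_names)
        }
        for key, members in groups.items()
    }

    if collapse_group:
        # One output row per group: the key values plus the averaged columns.
        return [{**dict(zip(keys, key)), **vals} for key, vals in means.items()]

    # Assumed non-collapsed behaviour: keep each row and attach its group's means.
    return [{**row, **means[tuple(row[k] for k in keys)]} for row in rows]


rows = [
    {"id": "a", "score": 1.0},
    {"id": "a", "score": 3.0},
    {"id": "b", "score": 5.0},
]
collapsed = ensemble_by_key(rows, keys=["id"], cols=["score"], col_names=["avg"])
expanded = ensemble_by_key(
    rows, keys=["id"], cols=["score"], col_names=["avg"], collapse_group=False
)
```

Here collapsed holds one row per distinct id, while expanded keeps all three input rows and adds the avg column to each.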
-
getCollapseGroup()
Returns: Whether to collapse all items in group to one entry (default: true)
Return type: bool
-
getVectorDims()
Returns: the dimensions of any vector columns, used to avoid materialization
Return type: dict
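
The vectorDims mapping goes from column name to vector length. The plain-Python sketch below only illustrates why supplying it helps: with a known dimension the accumulator can be allocated up front, whereas without it the first row has to be read to learn the length. It is not the library's code; the helper mean_vectors and the column name "features" are hypothetical.

```python
def mean_vectors(vectors, dim=None):
    """Element-wise mean of an iterable of equal-length vectors."""
    it = iter(vectors)
    if dim is None:
        # No dimension supplied: read ("materialize") the first row to learn it.
        first = next(it)
        dim = len(first)
        totals = list(first)
        count = 1
    else:
        # Dimension known in advance: allocate the accumulator immediately.
        totals = [0.0] * dim
        count = 0

    for vec in it:
        if len(vec) != dim:
            raise ValueError("expected dimension {}, got {}".format(dim, len(vec)))
        for i, x in enumerate(vec):
            totals[i] += x
        count += 1

    if count == 0:
        raise ValueError("cannot average an empty column")
    return [t / count for t in totals]


# Same shape as the vectorDims parameter: column name -> dimension.
vector_dims = {"features": 2}
column = [[1.0, 2.0], [3.0, 6.0]]

inferred = mean_vectors(column)
declared = mean_vectors(column, dim=vector_dims["features"])
```

Both calls produce the same averages; the only difference is whether the first row must be inspected before accumulation starts.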
-
setCollapseGroup(value)
Parameters: collapseGroup (bool) – Whether to collapse all items in group to one entry (default: true)
-
setParams(colNames=None, collapseGroup=True, cols=None, keys=None, strategy='mean', vectorDims=None)
Set the (keyword only) parameters
Parameters:
- colNames (list) – Names of the result of each col
- collapseGroup (bool) – Whether to collapse all items in group to one entry (default: true)
- cols (list) – Cols to ensemble
- keys (list) – Keys to group by
- strategy (str) – How to ensemble the scores, e.g. mean (default: mean)
- vectorDims (dict) – the dimensions of any vector columns, used to avoid materialization