AssembleFeatures

class AssembleFeatures.AssembleFeatures(allowImages=False, columnsToFeaturize=None, featuresCol='features', numberOfFeatures=None, oneHotEncodeCategoricals=True)

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Assembles feature columns into a vector column of features.

Parameters:
  • allowImages (bool) – Allow featurization of images (default: false)
  • columnsToFeaturize (list) – Columns to featurize
  • featuresCol (str) – The name of the features column (default: features)
  • numberOfFeatures (int) – Number of features to hash string columns to
  • oneHotEncodeCategoricals (bool) – One-hot encode categoricals (default: true)
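As a rough illustration of the constructor parameters listed above, the sketch below collects them as keyword arguments and passes them to the estimator. The import path `mmlspark.AssembleFeatures` is an assumption inferred from the class path on this page and may differ between releases. The helper names and the column names are hypothetical.

```python
def assembler_kwargs(columns, features_col="features", num_features=None,
                     allow_images=False, one_hot=True):
    """Collect constructor keyword arguments, skipping unset optional ones."""
    kwargs = {
        "allowImages": allow_images,
        "columnsToFeaturize": list(columns),
        "featuresCol": features_col,
        "oneHotEncodeCategoricals": one_hot,
    }
    if num_features is not None:
        kwargs["numberOfFeatures"] = int(num_features)
    return kwargs


def build_assembler(columns, **options):
    """Create the estimator; needs mmlspark and an active Spark session."""
    # Assumed import path, inferred from the class name documented above.
    from mmlspark.AssembleFeatures import AssembleFeatures
    return AssembleFeatures(**assembler_kwargs(columns, **options))


# Hypothetical column names, for illustration only.
params = assembler_kwargs(["age", "occupation"], num_features=4096)
print(params)
```

Keeping the argument dictionary separate from the construction step makes it possible to inspect the settings without a Spark installation.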
getAllowImages()
Returns: Allow featurization of images (default: false)
Return type: bool
getColumnsToFeaturize()
Returns: Columns to featurize
Return type: list
getFeaturesCol()
Returns: The name of the features column (default: features)
Return type: str
static getJavaPackage()

Returns the Java package name as a string.

getNumberOfFeatures()
Returns: Number of features to hash string columns to
Return type: int
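The idea of hashing string content into a fixed number of features can be sketched in plain Python. This is only an illustration of the general hashing trick, not the library's implementation: the hash function (`zlib.crc32`), the whitespace tokenization, and the function name are all assumptions.

```python
import zlib


def hash_tokens(tokens, number_of_features):
    """Map string tokens to a fixed-length count vector via hashing."""
    vector = [0.0] * number_of_features
    for token in tokens:
        # crc32 is a stable stand-in hash; the library may use another one.
        index = zlib.crc32(token.encode("utf-8")) % number_of_features
        vector[index] += 1.0
    return vector


# Five tokens are folded into eight slots; collisions share a slot.
vec = hash_tokens("the quick brown fox the".split(), 8)
print(vec)
```

A smaller `number_of_features` saves memory but makes collisions between distinct tokens more likely.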
getOneHotEncodeCategoricals()
Returns: One-hot encode categoricals (default: true)
Return type: bool
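For readers unfamiliar with the term, one-hot encoding turns each categorical value into a vector with a single 1.0 in the slot for its category. The plain-Python sketch below shows the concept only. The category ordering and the function name are assumptions, and the library's actual encoding (for example, whether it drops a category) is not described on this page.

```python
def one_hot(values):
    """Return the sorted categories and one indicator row per input value."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0.0] * len(categories)
        row[index[value]] = 1.0  # exactly one slot is set per value
        rows.append(row)
    return categories, rows


categories, rows = one_hot(["red", "green", "red"])
print(categories, rows)
```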
classmethod read()

Returns an MLReader instance for this class.

setAllowImages(value)
Parameters: value (bool) – Allow featurization of images (default: false)
setColumnsToFeaturize(value)
Parameters: value (list) – Columns to featurize
setFeaturesCol(value)
Parameters: value (str) – The name of the features column (default: features)
setNumberOfFeatures(value)
Parameters: value (int) – Number of features to hash string columns to
setOneHotEncodeCategoricals(value)
Parameters: value (bool) – One-hot encode categoricals (default: true)
setParams(allowImages=False, columnsToFeaturize=None, featuresCol='features', numberOfFeatures=None, oneHotEncodeCategoricals=True)

Sets the parameters. Arguments must be passed by keyword.

Parameters:
  • allowImages (bool) – Allow featurization of images (default: false)
  • columnsToFeaturize (list) – Columns to featurize
  • featuresCol (str) – The name of the features column (default: features)
  • numberOfFeatures (int) – Number of features to hash string columns to
  • oneHotEncodeCategoricals (bool) – One-hot encode categoricals (default: true)
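The "keyword only" restriction means positional arguments are rejected. The sketch below imitates that behavior with a small decorator and a stand-in class. It is a simplified illustration under the assumption that the wrapper enforces the rule with a decorator of this kind. `DemoParams` is hypothetical and is not part of the library.

```python
import functools


def keyword_only(func):
    """Reject positional arguments so every parameter is passed by name."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        return func(self, **kwargs)
    return wrapper


class DemoParams:
    """Hypothetical stand-in for an estimator with a setParams method."""

    def __init__(self):
        self.values = {}

    @keyword_only
    def setParams(self, featuresCol="features", numberOfFeatures=None):
        self.values.update(featuresCol=featuresCol,
                           numberOfFeatures=numberOfFeatures)
        return self


# Keyword arguments are accepted.
demo = DemoParams().setParams(featuresCol="vec", numberOfFeatures=256)

# A positional argument raises TypeError.
try:
    DemoParams().setParams("vec")
    rejected = False
except TypeError:
    rejected = True
```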
class AssembleFeatures.AssembleFeaturesModel(java_model=None)

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by AssembleFeatures.

This class is left empty on purpose. All necessary methods are exposed through inheritance.
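Since the model is produced by fitting the estimator, typical use follows the standard Spark ML pattern of `fit` followed by `transform`. The helper below is a minimal sketch of that pattern. The function name is hypothetical, and the `assembler` and `df` arguments are assumed to be an `AssembleFeatures` instance and a Spark DataFrame supplied by the caller.

```python
def fit_and_transform(assembler, df):
    """Fit the estimator on df, then apply the fitted model to the same data."""
    # Estimator.fit returns the fitted model (here, an AssembleFeaturesModel).
    model = assembler.fit(df)
    # Model.transform returns a new DataFrame with the features column added.
    return model, model.transform(df)
```

Returning the model alongside the transformed data lets the same fitted model be reused on other DataFrames, such as a held-out test set.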

static getJavaPackage()

Returns the Java package name as a string.

classmethod read()

Returns an MLReader instance for this class.
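Because both classes derive from `JavaMLWritable` and `JavaMLReadable`, a stage can presumably be saved and reloaded through the usual Spark ML writer and reader objects. The sketch below shows that round trip. The helper name and the `path` argument are hypothetical, and whether every parameter survives the round trip has not been verified here.

```python
def save_and_reload(stage, stage_cls, path):
    """Persist a stage to path, then load it back through the class reader."""
    # write() returns an MLWriter; overwrite() replaces any existing output.
    stage.write().overwrite().save(path)
    # read() returns an MLReader for the class; load() restores the stage.
    return stage_cls.read().load(path)
```

For example, `save_and_reload(model, AssembleFeaturesModel, some_path)` would be expected to return an equivalent fitted model, assuming `some_path` is writable from the Spark cluster.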