Featurize

class Featurize.Featurize(allowImages=False, featureColumns=None, numberOfFeatures=262144, oneHotEncodeCategoricals=True)

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Featurizes a dataset. Converts the specified columns to feature columns.

Parameters:
  • allowImages (bool) – Allow featurization of images (default: false)
  • featureColumns (dict) – Feature columns
  • numberOfFeatures (int) – Number of features to hash string columns to (default: 262144)
  • oneHotEncodeCategoricals (bool) – One-hot encode categoricals (default: true)
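
A minimal usage sketch for the constructor parameters listed above. Two things here are assumptions rather than facts from this page: the import path `mmlspark.Featurize`, and the layout of the `featureColumns` dict, taken here to map an output column name to a list of input column names. The helper name `featurize_dataframe` is hypothetical. `fit` and `transform` come from the `JavaEstimator` and `JavaModel` bases.

```python
def featurize_dataframe(df, input_columns, output_column="features"):
    """Fit a Featurize estimator on df and return the transformed DataFrame."""
    # Imported lazily so this helper can be defined where MMLSpark is absent.
    # The module path is an assumption based on the class name on this page.
    from mmlspark.Featurize import Featurize

    featurizer = Featurize(
        # Assumed layout: {output column name: [input column names]}.
        featureColumns={output_column: list(input_columns)},
        numberOfFeatures=262144,        # documented default
        oneHotEncodeCategoricals=True,  # documented default
        allowImages=False,              # documented default
    )
    model = featurizer.fit(df)  # fit() is inherited from JavaEstimator
    return model.transform(df)  # the fitted model adds the feature column
```

With a live Spark session, this could be called as `featurize_dataframe(df, ["age", "city"])`, where the column names are placeholders.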
getAllowImages()
Returns: Allow featurization of images (default: false)
Return type: bool
getFeatureColumns()
Returns: Feature columns
Return type: dict
static getJavaPackage()

Returns the package name as a string.

getNumberOfFeatures()
Returns: Number of features to hash string columns to (default: 262144)
Return type: int
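
To illustrate what "number of features to hash string columns to" means, the following is a pure-Python sketch of the hashing trick: each token is hashed to an index in a fixed-width vector. This is a conceptual illustration only. The function name `hash_tokens` is hypothetical, and `zlib.crc32` is a stand-in; this page does not say which hash function the library uses.

```python
import zlib


def hash_tokens(tokens, number_of_features=262144):
    """Map string tokens to a sparse {index: count} vector of fixed width."""
    vector = {}
    for token in tokens:
        # crc32 is a stand-in hash chosen because it is stable across runs,
        # unlike the built-in hash() on strings.
        index = zlib.crc32(token.encode("utf-8")) % number_of_features
        # Tokens that collide on the same index simply add up.
        vector[index] = vector.get(index, 0) + 1
    return vector
```

A smaller `number_of_features` saves memory but makes collisions between distinct tokens more likely, which is the trade-off this parameter controls.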
getOneHotEncodeCategoricals()
Returns: One-hot encode categoricals (default: true)
Return type: bool
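
For readers unfamiliar with one-hot encoding, here is a small pure-Python sketch of the idea: each distinct category gets its own position, and a value is represented by a vector with a single 1 at that position. The function name `one_hot_encode` is hypothetical, and the library's actual encoding (category ordering, dropped categories, sparse output) may differ.

```python
def one_hot_encode(values):
    """Return (categories, rows), one indicator row per input value."""
    categories = sorted(set(values))  # fixed, deterministic category order
    position = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1      # exactly one slot is set per value
        rows.append(row)
    return categories, rows
```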
classmethod read()

Returns an MLReader instance for this class.
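
A sketch of a save-and-reload round trip, assuming the standard PySpark persistence interface that `JavaMLWritable` and `JavaMLReadable` provide (`write().overwrite().save(path)` and `read().load(path)`). The helper name `save_and_reload` and the import path `mmlspark.Featurize` are assumptions.

```python
def save_and_reload(featurizer, path):
    """Persist a Featurize estimator to path and load it back."""
    # Lazy import; the module path is an assumption based on this page.
    from mmlspark.Featurize import Featurize

    # write() comes from JavaMLWritable; overwrite() replaces existing output.
    featurizer.write().overwrite().save(path)
    # read() returns an MLReader whose load() rebuilds the estimator.
    return Featurize.read().load(path)
```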

setAllowImages(value)
Parameters: value (bool) – Allow featurization of images (default: false)
setFeatureColumns(value)
Parameters: value (dict) – Feature columns
setNumberOfFeatures(value)
Parameters: value (int) – Number of features to hash string columns to (default: 262144)
setOneHotEncodeCategoricals(value)
Parameters: value (bool) – One-hot encode categoricals (default: true)
setParams(allowImages=False, featureColumns=None, numberOfFeatures=262144, oneHotEncodeCategoricals=True)

Sets the parameters. All arguments are keyword-only.

Parameters:
  • allowImages (bool) – Allow featurization of images (default: false)
  • featureColumns (dict) – Feature columns
  • numberOfFeatures (int) – Number of features to hash string columns to (default: 262144)
  • oneHotEncodeCategoricals (bool) – One-hot encode categoricals (default: true)
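
A short sketch of calling `setParams` with keyword arguments and reading the values back through the getters documented above. The helper name `reconfigure` and the chosen values are illustrative only.

```python
def reconfigure(featurizer):
    """Update two parameters by keyword and report the resulting values."""
    # Arguments are passed by name, since setParams is keyword-only.
    featurizer.setParams(numberOfFeatures=4096, oneHotEncodeCategoricals=False)
    return {
        "numberOfFeatures": featurizer.getNumberOfFeatures(),
        "oneHotEncodeCategoricals": featurizer.getOneHotEncodeCategoricals(),
    }
```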
class Featurize.PipelineModel(java_model=None)

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by Featurize.

This class is left empty on purpose. All necessary methods are exposed through inheritance.

static getJavaPackage()

Returns the package name as a string.

classmethod read()

Returns an MLReader instance for this class.