TrainClassifier

class TrainClassifier.TrainClassifier(featuresCol=None, labelCol=None, labels=None, model=None, numFeatures=0, reindexLabel=True)

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Trains a classification model.

The currently supported classifiers are:
  • Logistic Regression Classifier
  • Decision Tree Classifier
  • Random Forest Classifier
  • Gradient Boosted Trees Classifier
  • Naive Bayes Classifier
  • Multilayer Perceptron Classifier

Any generic learner that inherits from Predictor is supported as well.

This module featurizes the given data into a vector of doubles and passes it to the given learner.
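
As a rough illustration of this flow, the sketch below wraps a Spark ML learner in TrainClassifier and fits it on a DataFrame. It assumes an environment with pyspark and mmlspark installed; the helper name `fit_income_classifier`, the DataFrame `train_df`, and the label column `income` are hypothetical, and the `from mmlspark import TrainClassifier` import path is an assumption about how the package exposes the class.

```python
def fit_income_classifier(train_df, label_col="income"):
    """Fit a TrainClassifier on `train_df` (hypothetical helper).

    Imports are deferred so the sketch can be loaded without a Spark
    installation; calling it requires pyspark and mmlspark.
    """
    from pyspark.ml.classification import LogisticRegression
    from mmlspark import TrainClassifier  # assumed import path

    # The estimator featurizes the input data itself, so the raw
    # DataFrame is passed in without a separate assembling step.
    estimator = TrainClassifier(
        model=LogisticRegression(),
        labelCol=label_col,
        reindexLabel=True,
    )
    # fit() returns the model fitted by TrainClassifier.
    return estimator.fit(train_df)
```

Any of the classifiers listed above could be substituted for `LogisticRegression()` in the `model` argument.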

Note that the reindexLabel and labels parameters interact as follows:

  • reindexLabel false, labels empty: all label values are assumed to be doubles in their natural ordering; metadata is not used.
  • reindexLabel true, labels empty: the label column is indexed using the natural ordering of the string indexer.
  • reindexLabel false, labels specified: the user is assumed to know the indexing, and the given label values are applied. Currently only string type is supported.
  • reindexLabel true, labels specified: the labels are validated against the column type, recast to the label type where possible, and the label column is reindexed.
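
The four combinations above can be restated as a small decision function. This is only an illustrative summary in plain Python, not the library's code; the function name `label_handling` and the returned strings are made up for this sketch.

```python
def label_handling(reindex_label, labels=None):
    """Summarize how the label column is treated (illustrative only)."""
    labels_given = bool(labels)  # None or an empty list counts as "empty"
    if not reindex_label and not labels_given:
        return "assume double labels in natural order; ignore metadata"
    if reindex_label and not labels_given:
        return "index labels using the string indexer's natural ordering"
    if not reindex_label and labels_given:
        return "trust the existing indexing; apply the given label values"
    return "validate and recast labels to the column type, then reindex"


# Print the full decision table for a hypothetical two-class label.
for reindex in (False, True):
    for labels in (None, ["no", "yes"]):
        print(reindex, labels, "->", label_handling(reindex, labels))
```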

Parameters:
  • featuresCol (str) – The name of the features column (default: [self.uid]_features)
  • labelCol (str) – The name of the label column
  • labels (list) – Sorted label values on the labels column
  • model (object) – Classifier to run
  • numFeatures (int) – Number of features to hash to (default: 0)
  • reindexLabel (bool) – Re-index the label column (default: true)
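
To give a feel for what "number of features to hash to" means, here is a generic hashing-trick sketch in plain Python. It is not the hash function or the featurization used by the library; `hash_features` and the sample tokens are hypothetical, and MD5 is used only as a deterministic stand-in. How the library treats the default of 0 is not stated here, so the sketch simply rejects non-positive sizes.

```python
import hashlib


def hash_features(tokens, num_features):
    """Count tokens into `num_features` buckets (hashing-trick sketch)."""
    if num_features <= 0:
        raise ValueError("num_features must be positive in this sketch")
    vector = [0.0] * num_features
    for token in tokens:
        # Derive a stable integer from the token, then fold it into a bucket.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % num_features
        vector[index] += 1.0
    return vector


# Three tokens (one repeated) hashed into a 16-slot vector of doubles.
vec = hash_features(["age=39", "workclass=Private", "age=39"], 16)
print(len(vec), sum(vec))
```

A larger bucket count lowers the chance that two distinct tokens collide in the same slot, at the cost of a wider feature vector.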
getFeaturesCol()
Returns: The name of the features column (default: [self.uid]_features)
Return type: str
static getJavaPackage()

Returns the package name as a string.

getLabelCol()
Returns: The name of the label column
Return type: str
getLabels()
Returns: Sorted label values on the labels column
Return type: list
getModel()
Returns: Classifier to run
Return type: object
getNumFeatures()
Returns: Number of features to hash to (default: 0)
Return type: int
getReindexLabel()
Returns: Re-index the label column (default: true)
Return type: bool
classmethod read()

Returns an MLReader instance for this class.

setFeaturesCol(value)
Parameters: value (str) – The name of the features column (default: [self.uid]_features)
setLabelCol(value)
Parameters: value (str) – The name of the label column
setLabels(value)
Parameters: value (list) – Sorted label values on the labels column
setModel(value)
Parameters: value (object) – Classifier to run
setNumFeatures(value)
Parameters: value (int) – Number of features to hash to (default: 0)
setParams(featuresCol=None, labelCol=None, labels=None, model=None, numFeatures=0, reindexLabel=True)

Sets the (keyword-only) parameters.

Parameters:
  • featuresCol (str) – The name of the features column (default: [self.uid]_features)
  • labelCol (str) – The name of the label column
  • labels (list) – Sorted label values on the labels column
  • model (object) – Classifier to run
  • numFeatures (int) – Number of features to hash to (default: 0)
  • reindexLabel (bool) – Re-index the label column (default: true)
setReindexLabel(value)
Parameters: value (bool) – Re-index the label column (default: true)
class TrainClassifier.TrainedClassifierModel(java_model=None)

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by TrainClassifier.

This class is left empty on purpose. All necessary methods are exposed through inheritance.

static getJavaPackage()

Returns the package name as a string.

classmethod read()

Returns an MLReader instance for this class.
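
Because the model inherits from JavaMLWritable and JavaMLReadable, a fitted model can presumably be saved and restored through the standard Spark ML writer and reader. The sketch below assumes pyspark and mmlspark are installed; `save_and_reload` and its arguments are hypothetical, and the import path is inferred from the module name shown in this reference.

```python
def save_and_reload(model, path):
    """Persist a fitted model and load it back (hypothetical helper)."""
    # Import path assumed from the module name in this reference.
    from mmlspark.TrainClassifier import TrainedClassifierModel

    # write() comes from JavaMLWritable; overwrite() replaces any
    # existing output at `path` before saving.
    model.write().overwrite().save(path)

    # read() returns an MLReader; load(path) rebuilds the model.
    return TrainedClassifierModel.read().load(path)
```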