TrainClassifier
-
class TrainClassifier.TrainClassifier(featuresCol=None, labelCol=None, labels=None, model=None, numFeatures=0, reindexLabel=True)
Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
Trains a classification model.
The currently supported classifiers are:
- Logistic Regression Classifier
- Decision Tree Classifier
- Random Forest Classifier
- Gradient Boosted Trees Classifier
- Naive Bayes Classifier
- Multilayer Perceptron Classifier
Any generic learner that inherits from Predictor is also supported.
This module featurizes the given data into a vector of doubles and passes it to the given learner.
The reindexLabel and labels parameters interact as follows:
- reindexLabel false, labels empty: assume all label values are doubles, do not use metadata, and assume natural ordering.
- reindexLabel true, labels empty: index the label column, using the natural ordering of the string indexer.
- reindexLabel false, labels specified: assume the user knows the indexing and apply the given label values. Currently only the string type is supported.
- reindexLabel true, labels specified: validate that the labels match the column type, try to recast them to the label type, and reindex the label column.
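The four combinations above can be read as a small decision table. The sketch below is only an illustration of that table in plain Python; the function name `label_handling` and the wording of the returned descriptions are hypothetical and are not part of the library.

```python
from typing import Optional, Sequence


def label_handling(reindex_label: bool, labels: Optional[Sequence[str]]) -> str:
    """Describe how the label column is treated for one parameter combination.

    Hypothetical helper for illustration only; it mirrors the documented
    interaction of reindexLabel and labels, not the library's internals.
    """
    # None or an empty list both count as "labels not specified".
    labels_given = bool(labels)
    table = {
        (False, False): "assume doubles in natural order; ignore metadata",
        (True, False): "index labels using the string indexer's natural ordering",
        (False, True): "trust the caller's indexing; apply the given label values",
        (True, True): "validate labels against the column type, recast, then reindex",
    }
    return table[(bool(reindex_label), labels_given)]


if __name__ == "__main__":
    for reindex in (False, True):
        for labels in (None, ["no", "yes"]):
            print(reindex, labels, "->", label_handling(reindex, labels))
```

An empty list is treated the same as `None` here, on the assumption that "Empty" in the text covers both cases.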
Parameters:
- featuresCol (str) – The name of the features column (default: [self.uid]_features)
- labelCol (str) – The name of the label column
- labels (list) – Sorted label values on the labels column
- model (object) – Classifier to run
- numFeatures (int) – Number of features to hash to (default: 0)
- reindexLabel (bool) – Re-index the label column (default: true)
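A minimal sketch of how the constructor parameters listed above might be assembled is shown here. The import path `mmlspark.TrainClassifier` is an assumption inferred from the module name on this page, the helper names `trainer_kwargs` and `build_trainer` are hypothetical, and the Spark imports are deferred into the function so the keyword-building part can be read and run without a Spark installation.

```python
from typing import Any, Dict, Optional, Sequence


def trainer_kwargs(
    label_col: str,
    num_features: int = 0,
    reindex_label: bool = True,
    labels: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Build the keyword arguments documented for TrainClassifier.

    Hypothetical helper: it only collects the documented parameter names.
    """
    kwargs: Dict[str, Any] = {
        "labelCol": label_col,
        "numFeatures": num_features,
        "reindexLabel": reindex_label,
    }
    # Pass labels only when the caller specified them.
    if labels is not None:
        kwargs["labels"] = list(labels)
    return kwargs


def build_trainer(label_col: str):
    """Create a TrainClassifier wrapping a logistic regression learner.

    Requires pyspark and mmlspark; the mmlspark import path is an assumption.
    """
    from pyspark.ml.classification import LogisticRegression
    from mmlspark.TrainClassifier import TrainClassifier  # assumed import path

    return TrainClassifier(model=LogisticRegression(), **trainer_kwargs(label_col))
```

Because the class derives from JavaEstimator, calling `build_trainer("income").fit(train_df)` on a Spark DataFrame would be the usual way to train; `"income"` and `train_df` are placeholder names.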
-
getFeaturesCol()
Returns: The name of the features column (default: [self.uid]_features)
Return type: str
-
setFeaturesCol(value)
Parameters: value (str) – The name of the features column (default: [self.uid]_features)
-
setNumFeatures(value)
Parameters: value (int) – Number of features to hash to (default: 0)
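To give a sense of what "number of features to hash to" means, the sketch below shows the general feature-hashing idea in plain Python: each token is mapped to one of a fixed number of buckets and counted there. This is not the library's implementation; the function name `hash_features` and the choice of MD5 as the hash are assumptions made for illustration, and the meaning of the default value 0 is not described on this page, so the sketch simply rejects it.

```python
import hashlib
from typing import List, Sequence


def hash_features(tokens: Sequence[str], num_features: int) -> List[float]:
    """Map tokens into a fixed-length count vector via hashing.

    Illustrative only: the hash function and bucketing used by the
    library may differ.
    """
    if num_features <= 0:
        raise ValueError("num_features must be positive in this sketch")
    vector = [0.0] * num_features
    for token in tokens:
        # A stable hash, so the same token always lands in the same bucket.
        digest = hashlib.md5(token.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % num_features
        vector[index] += 1.0
    return vector


if __name__ == "__main__":
    print(hash_features(["red", "green", "red"], 8))
```

A smaller bucket count saves memory but makes collisions between distinct tokens more likely.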
-
setParams(featuresCol=None, labelCol=None, labels=None, model=None, numFeatures=0, reindexLabel=True)
Set the (keyword only) parameters.
Parameters:
- featuresCol (str) – The name of the features column (default: [self.uid]_features)
- labelCol (str) – The name of the label column
- labels (list) – Sorted label values on the labels column
- model (object) – Classifier to run
- numFeatures (int) – Number of features to hash to (default: 0)
- reindexLabel (bool) – Re-index the label column (default: true)
-
class TrainClassifier.TrainedClassifierModel(java_model=None)
Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable
Model fitted by TrainClassifier.
This class is left empty on purpose. All necessary methods are exposed through inheritance.