LightGBMClassifier
-
class LightGBMClassifier.LightGBMClassifier(baggingFraction=1.0, baggingFreq=0, baggingSeed=3, defaultListenPort=12400, featureFraction=1.0, featuresCol='features', labelCol='label', learningRate=0.1, maxBin=255, maxDepth=-1, minSumHessianInLeaf=0.001, numIterations=100, numLeaves=31, parallelism='data_parallel', predictionCol='prediction', probabilityCol='probability', rawPredictionCol='rawPrediction', thresholds=None)
Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator
Trains a LightGBM binary classification model. LightGBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms. For more information, see https://github.com/Microsoft/LightGBM.
Parameters: - baggingFraction (double) – Bagging fraction (default: 1.0)
- baggingFreq (int) – Bagging frequency (default: 0)
- baggingSeed (int) – Bagging seed (default: 3)
- defaultListenPort (int) – The default listen port on executors, used for testing (default: 12400)
- featureFraction (double) – Feature fraction (default: 1.0)
- featuresCol (str) – features column name (default: features)
- labelCol (str) – label column name (default: label)
- learningRate (double) – Learning rate or shrinkage rate (default: 0.1)
- maxBin (int) – Max bin (default: 255)
- maxDepth (int) – Max depth (default: -1)
- minSumHessianInLeaf (double) – minimal sum hessian in one leaf (default: 0.001)
- numIterations (int) – Number of iterations, LightGBM constructs num_class * num_iterations trees (default: 100)
- numLeaves (int) – Number of leaves (default: 31)
- parallelism (str) – Tree learner parallelism, can be set to data_parallel or voting_parallel (default: data_parallel)
- predictionCol (str) – prediction column name (default: prediction)
- probabilityCol (str) – Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)
- rawPredictionCol (str) – raw prediction (a.k.a. confidence) column name (default: rawPrediction)
- thresholds (object) – Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold
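The p/t rule described for thresholds can be illustrated with a small pure-Python sketch. This is not the library's implementation: the function name predict_with_thresholds is hypothetical, and the handling of a zero threshold (treated as an infinite score when p > 0, and skipped when p is also 0) is an assumption made for this example.

```python
from typing import Sequence


def predict_with_thresholds(probabilities: Sequence[float],
                            thresholds: Sequence[float]) -> int:
    """Return the index of the class with the largest p/t (illustrative only)."""
    if len(probabilities) != len(thresholds):
        raise ValueError("thresholds must have one entry per class")

    best_class = 0
    best_score = float("-inf")
    for i, (p, t) in enumerate(zip(probabilities, thresholds)):
        if t == 0:
            # Assumption: a zero threshold gives an infinite score when p > 0,
            # and the class is skipped when p is also 0.
            if p == 0:
                continue
            score = float("inf")
        else:
            score = p / t
        if score > best_score:
            best_class, best_score = i, score
    return best_class


# Equal thresholds leave the ranking by probability unchanged.
print(predict_with_thresholds([0.6, 0.4], [0.5, 0.5]))  # 0
# A lower threshold for class 1 lets it win despite a lower probability.
print(predict_with_thresholds([0.6, 0.4], [0.9, 0.3]))  # 1
```

In the second call the scores are 0.6/0.9 ≈ 0.67 and 0.4/0.3 ≈ 1.33, so class 1 is selected.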
-
getDefaultListenPort()
Returns: The default listen port on executors, used for testing (default: 12400)
Return type: int
-
getLearningRate()
Returns: Learning rate or shrinkage rate (default: 0.1)
Return type: double
-
getMinSumHessianInLeaf()
Returns: minimal sum hessian in one leaf (default: 0.001)
Return type: double
-
getNumIterations()
Returns: Number of iterations, LightGBM constructs num_class * num_iterations trees (default: 100)
Return type: int
-
getParallelism()
Returns: Tree learner parallelism, can be set to data_parallel or voting_parallel (default: data_parallel)
Return type: str
-
getProbabilityCol()
Returns: Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)
Return type: str
-
getRawPredictionCol()
Returns: raw prediction (a.k.a. confidence) column name (default: rawPrediction)
Return type: str
-
getThresholds()
Returns: Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold
Return type: object
-
setBaggingFraction(value)
Parameters: value (double) – Bagging fraction (default: 1.0)
-
setDefaultListenPort(value)
Parameters: value (int) – The default listen port on executors, used for testing (default: 12400)
-
setFeatureFraction(value)
Parameters: value (double) – Feature fraction (default: 1.0)
-
setFeaturesCol(value)
Parameters: value (str) – features column name (default: features)
-
setLearningRate(value)
Parameters: value (double) – Learning rate or shrinkage rate (default: 0.1)
-
setMinSumHessianInLeaf(value)
Parameters: value (double) – minimal sum hessian in one leaf (default: 0.001)
-
setNumIterations(value)
Parameters: value (int) – Number of iterations, LightGBM constructs num_class * num_iterations trees (default: 100)
-
setParallelism(value)
Parameters: value (str) – Tree learner parallelism, can be set to data_parallel or voting_parallel (default: data_parallel)
-
setParams(baggingFraction=1.0, baggingFreq=0, baggingSeed=3, defaultListenPort=12400, featureFraction=1.0, featuresCol='features', labelCol='label', learningRate=0.1, maxBin=255, maxDepth=-1, minSumHessianInLeaf=0.001, numIterations=100, numLeaves=31, parallelism='data_parallel', predictionCol='prediction', probabilityCol='probability', rawPredictionCol='rawPrediction', thresholds=None)
Sets the (keyword-only) parameters.
Parameters: - baggingFraction (double) – Bagging fraction (default: 1.0)
- baggingFreq (int) – Bagging frequency (default: 0)
- baggingSeed (int) – Bagging seed (default: 3)
- defaultListenPort (int) – The default listen port on executors, used for testing (default: 12400)
- featureFraction (double) – Feature fraction (default: 1.0)
- featuresCol (str) – features column name (default: features)
- labelCol (str) – label column name (default: label)
- learningRate (double) – Learning rate or shrinkage rate (default: 0.1)
- maxBin (int) – Max bin (default: 255)
- maxDepth (int) – Max depth (default: -1)
- minSumHessianInLeaf (double) – minimal sum hessian in one leaf (default: 0.001)
- numIterations (int) – Number of iterations, LightGBM constructs num_class * num_iterations trees (default: 100)
- numLeaves (int) – Number of leaves (default: 31)
- parallelism (str) – Tree learner parallelism, can be set to data_parallel or voting_parallel (default: data_parallel)
- predictionCol (str) – prediction column name (default: prediction)
- probabilityCol (str) – Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)
- rawPredictionCol (str) – raw prediction (a.k.a. confidence) column name (default: rawPrediction)
- thresholds (object) – Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold
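The constraints stated for thresholds (one entry per class, all values > 0 except that at most one may be 0) can be checked before calling setParams. The helper below is a hypothetical sketch of such a check, not part of the library; the name validate_thresholds and the num_classes argument are assumptions introduced for illustration.

```python
from typing import Sequence


def validate_thresholds(thresholds: Sequence[float], num_classes: int) -> None:
    """Raise ValueError if thresholds break the documented constraints."""
    # The array must have one threshold per class.
    if len(thresholds) != num_classes:
        raise ValueError(
            f"expected {num_classes} thresholds, got {len(thresholds)}"
        )
    # Negative values are never allowed.
    if any(t < 0 for t in thresholds):
        raise ValueError("thresholds must not be negative")
    # Values must be > 0, except that at most one may be exactly 0.
    zero_count = sum(1 for t in thresholds if t == 0)
    if zero_count > 1:
        raise ValueError("at most one threshold may be 0")


# Valid: three classes, a single zero threshold.
validate_thresholds([0.0, 0.3, 0.7], num_classes=3)
```

A call such as validate_thresholds([0.0, 0.0], num_classes=2) raises ValueError because two thresholds are 0.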
-
setPredictionCol(value)
Parameters: value (str) – prediction column name (default: prediction)
-
setProbabilityCol(value)
Parameters: value (str) – Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)
-
setRawPredictionCol(value)
Parameters: value (str) – raw prediction (a.k.a. confidence) column name (default: rawPrediction)
-
setThresholds(value)
Parameters: value (object) – Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold
-
class LightGBMClassifier.M(java_model=None)
Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable
Model fitted by LightGBMClassifier.
This class is left empty on purpose. All necessary methods are exposed through inheritance.