LightGBMClassifier

class LightGBMClassifier.LightGBMClassifier(baggingFraction=1.0, baggingFreq=0, baggingSeed=3, defaultListenPort=12400, featureFraction=1.0, featuresCol='features', labelCol='label', learningRate=0.1, maxBin=255, maxDepth=-1, minSumHessianInLeaf=0.001, numIterations=100, numLeaves=31, parallelism='data_parallel', predictionCol='prediction', probabilityCol='probability', rawPredictionCol='rawPrediction', thresholds=None)

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Trains a LightGBM binary classification model. LightGBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms. For more information, see https://github.com/Microsoft/LightGBM.

Parameters:
  • baggingFraction (double) – Fraction of the data sampled, without replacement, for each bagging round; takes effect only when baggingFreq > 0 (default: 1.0)
  • baggingFreq (int) – Bagging frequency: a value of k > 0 resamples the data every k iterations, and 0 disables bagging (default: 0)
  • baggingSeed (int) – Random seed for bagging (default: 3)
  • defaultListenPort (int) – The default listen port on executors, used for testing (default: 12400)
  • featureFraction (double) – Fraction of features randomly selected before training each tree (default: 1.0)
  • featuresCol (str) – Features column name (default: features)
  • labelCol (str) – Label column name (default: label)
  • learningRate (double) – Learning rate, also known as shrinkage rate (default: 0.1)
  • maxBin (int) – Maximum number of bins into which feature values are bucketed (default: 255)
  • maxDepth (int) – Maximum tree depth; a value of 0 or less means no limit (default: -1)
  • minSumHessianInLeaf (double) – Minimum sum of Hessians in one leaf (default: 0.001)
  • numIterations (int) – Number of boosting iterations; LightGBM constructs num_class * num_iterations trees (default: 100)
  • numLeaves (int) – Maximum number of leaves in one tree (default: 31)
  • parallelism (str) – Tree learner parallelism, either data_parallel or voting_parallel (default: data_parallel)
  • predictionCol (str) – Prediction column name (default: prediction)
  • probabilityCol (str) – Column name for predicted class conditional probabilities. Note: not all models output well-calibrated probability estimates; treat these probabilities as confidences, not precise probabilities (default: probability)
  • rawPredictionCol (str) – Raw prediction (a.k.a. confidence) column name (default: rawPrediction)
  • thresholds (object) – Thresholds in multi-class classification to adjust the probability of predicting each class. The array must have length equal to the number of classes, with values > 0, except that at most one value may be 0. The class with the largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold (default: None)
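The three bagging parameters work together: baggingFraction only has an effect when baggingFreq is positive, and baggingSeed makes the random draws reproducible. The following is a simplified pure-Python illustration of that schedule, not LightGBM's implementation. The helper name `bagging_schedule` is hypothetical, and the sketch assumes a fresh subsample is drawn at iteration 0 and every baggingFreq iterations after that, then reused in between.

```python
import random
from typing import List

def bagging_schedule(num_rows: int, num_iterations: int,
                     bagging_fraction: float = 1.0,
                     bagging_freq: int = 0,
                     bagging_seed: int = 3) -> List[List[int]]:
    """Row indices used to grow the tree at each iteration.

    Hypothetical, simplified helper: bagging is active only when
    bagging_freq > 0 and bagging_fraction < 1.0.
    """
    all_rows = list(range(num_rows))
    if bagging_freq <= 0 or bagging_fraction >= 1.0:
        # Bagging disabled: every tree sees every row.
        return [all_rows for _ in range(num_iterations)]

    rng = random.Random(bagging_seed)  # seeded, so the schedule is reproducible
    sample_size = max(1, int(num_rows * bagging_fraction))
    schedule: List[List[int]] = []
    current: List[int] = []
    for it in range(num_iterations):
        if it % bagging_freq == 0:
            # Draw a new subset without replacement every bagging_freq iterations.
            current = sorted(rng.sample(all_rows, sample_size))
        schedule.append(current)
    return schedule

# Resample 80% of 10 rows every 2 iterations, for 4 iterations.
sched = bagging_schedule(10, 4, bagging_fraction=0.8, bagging_freq=2)
print([len(rows) for rows in sched])  # [8, 8, 8, 8]
```

Iterations 0 and 1 share one subsample and iterations 2 and 3 share the next; with the default baggingFreq of 0, every tree uses all rows regardless of baggingFraction.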
getBaggingFraction()
Returns: Fraction of the data sampled, without replacement, for each bagging round; takes effect only when baggingFreq > 0 (default: 1.0)
Return type: double
getBaggingFreq()
Returns: Bagging frequency: a value of k > 0 resamples the data every k iterations, and 0 disables bagging (default: 0)
Return type: int
getBaggingSeed()
Returns: Random seed for bagging (default: 3)
Return type: int
getDefaultListenPort()
Returns: The default listen port on executors, used for testing (default: 12400)
Return type: int
getFeatureFraction()
Returns: Fraction of features randomly selected before training each tree (default: 1.0)
Return type: double
getFeaturesCol()
Returns: Features column name (default: features)
Return type: str
static getJavaPackage()

Returns the package name as a string.

getLabelCol()
Returns: Label column name (default: label)
Return type: str
getLearningRate()
Returns: Learning rate, also known as shrinkage rate (default: 0.1)
Return type: double
getMaxBin()
Returns: Maximum number of bins into which feature values are bucketed (default: 255)
Return type: int
getMaxDepth()
Returns: Maximum tree depth; a value of 0 or less means no limit (default: -1)
Return type: int
getMinSumHessianInLeaf()
Returns: Minimum sum of Hessians in one leaf (default: 0.001)
Return type: double
getNumIterations()
Returns: Number of boosting iterations; LightGBM constructs num_class * num_iterations trees (default: 100)
Return type: int
getNumLeaves()
Returns: Maximum number of leaves in one tree (default: 31)
Return type: int
getParallelism()
Returns: Tree learner parallelism, either data_parallel or voting_parallel (default: data_parallel)
Return type: str
getPredictionCol()
Returns: Prediction column name (default: prediction)
Return type: str
getProbabilityCol()
Returns: Column name for predicted class conditional probabilities. Note: not all models output well-calibrated probability estimates; treat these probabilities as confidences, not precise probabilities (default: probability)
Return type: str
getRawPredictionCol()
Returns: Raw prediction (a.k.a. confidence) column name (default: rawPrediction)
Return type: str
getThresholds()
Returns: Thresholds in multi-class classification to adjust the probability of predicting each class. The array must have length equal to the number of classes, with values > 0, except that at most one value may be 0. The class with the largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold (default: None)
Return type: object
classmethod read()

Returns an MLReader instance for this class.

setBaggingFraction(value)
Parameters: baggingFraction (double) – Fraction of the data sampled, without replacement, for each bagging round; takes effect only when baggingFreq > 0 (default: 1.0)
setBaggingFreq(value)
Parameters: baggingFreq (int) – Bagging frequency: a value of k > 0 resamples the data every k iterations, and 0 disables bagging (default: 0)
setBaggingSeed(value)
Parameters: baggingSeed (int) – Random seed for bagging (default: 3)
setDefaultListenPort(value)
Parameters: defaultListenPort (int) – The default listen port on executors, used for testing (default: 12400)
setFeatureFraction(value)
Parameters: featureFraction (double) – Fraction of features randomly selected before training each tree (default: 1.0)
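With featureFraction below 1.0, each tree may only split on a random subset of the features, which speeds up training and can reduce over-fitting. The sketch below illustrates that per-tree column sampling in plain Python. It is simplified and hypothetical: `features_per_tree` is not part of any library, and its `seed` argument exists only to make the illustration reproducible; this class does not document a feature-sampling seed parameter.

```python
import random
from typing import List

def features_per_tree(num_features: int, num_trees: int,
                      feature_fraction: float = 1.0,
                      seed: int = 0) -> List[List[int]]:
    """Feature indices each tree may split on (hypothetical, simplified helper)."""
    all_features = list(range(num_features))
    if feature_fraction >= 1.0:
        # Default: every tree may use every feature.
        return [all_features for _ in range(num_trees)]
    k = max(1, int(num_features * feature_fraction))
    rng = random.Random(seed)  # illustrative seed, not a LightGBMClassifier parameter
    # A fresh subset is drawn, without replacement, before each tree is grown.
    return [sorted(rng.sample(all_features, k)) for _ in range(num_trees)]

subsets = features_per_tree(10, 3, feature_fraction=0.8)
print([len(s) for s in subsets])  # [8, 8, 8]
```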
setFeaturesCol(value)
Parameters: featuresCol (str) – Features column name (default: features)
setLabelCol(value)
Parameters: labelCol (str) – Label column name (default: label)
setLearningRate(value)
Parameters: learningRate (double) – Learning rate, also known as shrinkage rate (default: 0.1)
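Conceptually, gradient boosting builds an additive model in which each new tree's output is scaled by the learning rate: F(x) = F0(x) + learningRate * (h1(x) + ... + hM(x)). Smaller values take smaller steps and usually need more iterations. For LightGBM's binary objective, the raw score is mapped to a probability with the logistic (sigmoid) function. The sketch below illustrates both steps; the helper names and the per-tree outputs are made up for illustration, and this is not how the trained model is evaluated internally.

```python
import math
from typing import Sequence

def raw_score(tree_outputs: Sequence[float], learning_rate: float = 0.1,
              init_score: float = 0.0) -> float:
    """Conceptual boosted score: init_score + learning_rate * sum of tree outputs."""
    return init_score + learning_rate * sum(tree_outputs)

def sigmoid(score: float) -> float:
    """Logistic function mapping a raw binary score to a probability."""
    return 1.0 / (1.0 + math.exp(-score))

# Made-up per-tree outputs for a single row, for illustration only.
outputs = [2.0, 1.5, 1.0, 0.5]
score = raw_score(outputs, learning_rate=0.1)
print(score)                     # 0.5
print(round(sigmoid(score), 4))  # 0.6225
```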
setMaxBin(value)
Parameters: maxBin (int) – Maximum number of bins into which feature values are bucketed (default: 255)
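Before training, LightGBM buckets each feature's values into at most maxBin discrete bins and searches for splits over bin boundaries instead of raw values. A smaller maxBin speeds up training and can reduce over-fitting, at some cost in precision. The sketch below uses plain equal-frequency binning to illustrate the idea. It is a simplification, not LightGBM's bin-construction algorithm, and both helper names are hypothetical.

```python
import math
from bisect import bisect_left
from typing import List, Sequence

def bin_upper_bounds(values: Sequence[float], max_bin: int = 255) -> List[float]:
    """Inclusive upper bounds of at most max_bin bins for one feature.

    Simplified equal-frequency binning, for illustration only.
    """
    distinct = sorted(set(values))
    if len(distinct) <= max_bin:
        # Few distinct values: one bin per value.
        cuts = distinct[:-1]
    else:
        ordered = sorted(values)
        n = len(ordered)
        # Cut at evenly spaced quantiles; duplicates collapse, so at most max_bin - 1 cuts.
        cuts = sorted({ordered[(b * n) // max_bin] for b in range(1, max_bin)})
    return cuts + [math.inf]  # the last bin is open-ended

def to_bin(x: float, bounds: List[float]) -> int:
    """Index of the first bin whose upper bound is >= x."""
    return bisect_left(bounds, x)

bounds = bin_upper_bounds([float(v) for v in range(100)], max_bin=4)
print(bounds)                                                  # [25.0, 50.0, 75.0, inf]
print([to_bin(x, bounds) for x in (0.0, 25.0, 26.0, 99.0)])    # [0, 0, 1, 3]
```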
setMaxDepth(value)
Parameters: maxDepth (int) – Maximum tree depth; a value of 0 or less means no limit (default: -1)
setMinSumHessianInLeaf(value)
Parameters: minSumHessianInLeaf (double) – Minimum sum of Hessians in one leaf (default: 0.001)
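minSumHessianInLeaf acts as a regularization constraint: a split is not made if it would leave a leaf whose rows' Hessians sum to less than this value. For binary log loss with predicted probability p, each row's Hessian with respect to the raw score is p(1 - p), so rows the model is already confident about contribute almost nothing. The helpers below are hypothetical and only illustrate that check under the assumption of a binary log-loss objective.

```python
from typing import Sequence

def logloss_hessian(p: float) -> float:
    """Second derivative of binary log loss with respect to the raw score."""
    return p * (1.0 - p)

def leaf_allowed(probs: Sequence[float], min_sum_hessian_in_leaf: float = 1e-3) -> bool:
    """True if the rows in a candidate leaf carry enough total Hessian."""
    return sum(logloss_hessian(p) for p in probs) >= min_sum_hessian_in_leaf

print(leaf_allowed([0.5]))             # True: 0.25 >= 0.001
print(leaf_allowed([0.9999, 0.9999]))  # False: about 0.0002 < 0.001
```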
setNumIterations(value)
Parameters: numIterations (int) – Number of boosting iterations; LightGBM constructs num_class * num_iterations trees (default: 100)
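The tree count is simple arithmetic: LightGBM builds one tree per class per boosting iteration, and a binary objective such as this classifier's uses a single tree per iteration, so with the defaults the model contains numIterations = 100 trees. The helper below is a hypothetical illustration of that formula.

```python
def total_trees(num_iterations: int = 100, num_class: int = 1) -> int:
    """Trees built: one tree per class per boosting iteration."""
    if num_iterations < 0 or num_class < 1:
        raise ValueError("num_iterations must be >= 0 and num_class >= 1")
    return num_class * num_iterations

print(total_trees())                  # 100: binary case, one tree per iteration
print(total_trees(100, num_class=3))  # 300: a multi-class problem with 3 classes
```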
setNumLeaves(value)
Parameters: numLeaves (int) – Maximum number of leaves in one tree (default: 31)
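LightGBM grows trees leaf-wise (best-first), so numLeaves is the main control on tree complexity and maxDepth acts as an additional cap, where a depth of 0 or less means no limit. Because a binary tree of depth d has at most 2^d leaves, the effective cap is the smaller of the two. The hypothetical helper below computes that bound.

```python
def max_leaves(num_leaves: int = 31, max_depth: int = -1) -> int:
    """Upper bound on leaves per tree given both numLeaves and maxDepth."""
    if max_depth <= 0:
        # No depth limit: numLeaves alone caps the tree.
        return num_leaves
    # A binary tree of depth d has at most 2**d leaves.
    return min(num_leaves, 2 ** max_depth)

print(max_leaves())                 # 31 (defaults: no depth limit)
print(max_leaves(31, max_depth=4))  # 16: depth is the binding limit
print(max_leaves(31, max_depth=6))  # 31: numLeaves is the binding limit
```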
setParallelism(value)
Parameters: parallelism (str) – Tree learner parallelism, either data_parallel or voting_parallel (default: data_parallel)
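The two accepted values correspond to LightGBM's distributed tree learners. In data-parallel training each worker holds a subset of the rows and the per-feature histograms are combined across workers; voting-parallel training reduces that communication by having workers vote on a small set of candidate features first. In LightGBM's own parameter list, data_parallel and voting_parallel are aliases of the tree_learner values data and voting. The hypothetical helper below only validates the value and maps it to that canonical name; it makes no claim about how this wrapper actually passes the setting to LightGBM.

```python
# Aliases of LightGBM's tree_learner parameter for the two values this class accepts.
_TREE_LEARNER_ALIASES = {
    "data_parallel": "data",
    "voting_parallel": "voting",
}

def tree_learner_for(parallelism: str = "data_parallel") -> str:
    """Validate a parallelism value and return LightGBM's canonical tree_learner name."""
    try:
        return _TREE_LEARNER_ALIASES[parallelism]
    except KeyError:
        raise ValueError(
            f"parallelism must be one of {sorted(_TREE_LEARNER_ALIASES)}, got {parallelism!r}"
        ) from None

print(tree_learner_for())                   # data
print(tree_learner_for("voting_parallel"))  # voting
```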
setParams(baggingFraction=1.0, baggingFreq=0, baggingSeed=3, defaultListenPort=12400, featureFraction=1.0, featuresCol='features', labelCol='label', learningRate=0.1, maxBin=255, maxDepth=-1, minSumHessianInLeaf=0.001, numIterations=100, numLeaves=31, parallelism='data_parallel', predictionCol='prediction', probabilityCol='probability', rawPredictionCol='rawPrediction', thresholds=None)

Sets the keyword-only parameters.

Parameters:
  • baggingFraction (double) – Fraction of the data sampled, without replacement, for each bagging round; takes effect only when baggingFreq > 0 (default: 1.0)
  • baggingFreq (int) – Bagging frequency: a value of k > 0 resamples the data every k iterations, and 0 disables bagging (default: 0)
  • baggingSeed (int) – Random seed for bagging (default: 3)
  • defaultListenPort (int) – The default listen port on executors, used for testing (default: 12400)
  • featureFraction (double) – Fraction of features randomly selected before training each tree (default: 1.0)
  • featuresCol (str) – Features column name (default: features)
  • labelCol (str) – Label column name (default: label)
  • learningRate (double) – Learning rate, also known as shrinkage rate (default: 0.1)
  • maxBin (int) – Maximum number of bins into which feature values are bucketed (default: 255)
  • maxDepth (int) – Maximum tree depth; a value of 0 or less means no limit (default: -1)
  • minSumHessianInLeaf (double) – Minimum sum of Hessians in one leaf (default: 0.001)
  • numIterations (int) – Number of boosting iterations; LightGBM constructs num_class * num_iterations trees (default: 100)
  • numLeaves (int) – Maximum number of leaves in one tree (default: 31)
  • parallelism (str) – Tree learner parallelism, either data_parallel or voting_parallel (default: data_parallel)
  • predictionCol (str) – Prediction column name (default: prediction)
  • probabilityCol (str) – Column name for predicted class conditional probabilities. Note: not all models output well-calibrated probability estimates; treat these probabilities as confidences, not precise probabilities (default: probability)
  • rawPredictionCol (str) – Raw prediction (a.k.a. confidence) column name (default: rawPrediction)
  • thresholds (object) – Thresholds in multi-class classification to adjust the probability of predicting each class. The array must have length equal to the number of classes, with values > 0, except that at most one value may be 0. The class with the largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold (default: None)
setPredictionCol(value)
Parameters: predictionCol (str) – Prediction column name (default: prediction)
setProbabilityCol(value)
Parameters: probabilityCol (str) – Column name for predicted class conditional probabilities. Note: not all models output well-calibrated probability estimates; treat these probabilities as confidences, not precise probabilities (default: probability)
setRawPredictionCol(value)
Parameters: rawPredictionCol (str) – Raw prediction (a.k.a. confidence) column name (default: rawPrediction)
setThresholds(value)
Parameters: thresholds (object) – Thresholds in multi-class classification to adjust the probability of predicting each class. The array must have length equal to the number of classes, with values > 0, except that at most one value may be 0. The class with the largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold (default: None)
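The thresholds rule can be illustrated for a single row of predicted probabilities. This hypothetical helper follows the p/t rule described above and, as we understand Spark's ProbabilisticClassificationModel, treats a zero threshold as an infinite score (so that class always wins) unless that class's probability is also 0. It is a sketch, not the code path this model actually runs.

```python
import math
from typing import Sequence

def predict_with_thresholds(probabilities: Sequence[float],
                            thresholds: Sequence[float]) -> int:
    """Index of the class with the largest p / t (hypothetical helper)."""
    if len(probabilities) != len(thresholds):
        raise ValueError("thresholds must have one entry per class")
    if any(t < 0 for t in thresholds) or sum(t == 0 for t in thresholds) > 1:
        raise ValueError("thresholds must be >= 0, with at most one value equal to 0")
    best, best_score = 0, -math.inf
    for i, (p, t) in enumerate(zip(probabilities, thresholds)):
        if t == 0:
            # Assumed Spark-like behaviour: a zero threshold wins outright,
            # unless that class's probability is also 0 (NaN never wins).
            score = math.inf if p > 0 else math.nan
        else:
            score = p / t
        # Strict '>' keeps the first class on ties.
        if score > best_score:
            best, best_score = i, score
    return best

probs = [0.6, 0.4]
print(predict_with_thresholds(probs, [0.5, 0.5]))  # 0  (1.2 vs 0.8)
print(predict_with_thresholds(probs, [0.7, 0.3]))  # 1  (about 0.857 vs 1.333)
```

Raising the threshold of a class makes it harder to predict: in the second call, class 0 has the higher probability but loses because its threshold is larger.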
class LightGBMClassifier.M(java_model=None)

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by LightGBMClassifier.

This class is left empty on purpose. All necessary methods are exposed through inheritance.

static getJavaPackage()

Returns the package name as a string.

classmethod read()

Returns an MLReader instance for this class.