LightGBMClassifier

class LightGBMClassifier.LightGBMClassifier(baggingFraction=1.0, baggingFreq=0, baggingSeed=3, defaultListenPort=12400, featureFraction=1.0, featuresCol='features', labelCol='label', learningRate=0.1, maxBin=255, maxDepth=-1, minSumHessianInLeaf=0.001, numIterations=100, numLeaves=31, parallelism='data_parallel', predictionCol='prediction', probabilityCol='probability', rawPredictionCol='rawPrediction', thresholds=None)

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaEstimator

Trains a LightGBM binary classification model. LightGBM is a fast, distributed, high-performance gradient boosting framework based on decision tree algorithms. For more information, see https://github.com/Microsoft/LightGBM.

Parameters:

baggingFraction (double) – Bagging fraction (default: 1.0)
baggingFreq (int) – Bagging frequency (default: 0)
baggingSeed (int) – Bagging seed (default: 3)
defaultListenPort (int) – The default listen port on executors, used for testing (default: 12400)
featureFraction (double) – Feature fraction (default: 1.0)
featuresCol (str) – features column name (default: features)
labelCol (str) – label column name (default: label)
learningRate (double) – Learning rate or shrinkage rate (default: 0.1)
maxBin (int) – Max bin (default: 255)
maxDepth (int) – Max depth (default: -1)
minSumHessianInLeaf (double) – minimal sum hessian in one leaf (default: 0.001)
numIterations (int) – Number of iterations, LightGBM constructs num_class * num_iterations trees (default: 100)
numLeaves (int) – Number of leaves (default: 31)
parallelism (str) – Tree learner parallelism, can be set to data_parallel or voting_parallel (default: data_parallel)
predictionCol (str) – prediction column name (default: prediction)
probabilityCol (str) – Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)
rawPredictionCol (str) – raw prediction (a.k.a. confidence) column name (default: rawPrediction)
thresholds (object) – Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold
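
The p/t rule described for thresholds can be sketched in plain Python. This is an illustration of the rule as stated above, not the library's implementation: the function name predict_with_thresholds is hypothetical, and treating a zero threshold as an infinite ratio (unless p is also 0) is an assumption made here to avoid dividing by zero.

```python
import math


def predict_with_thresholds(probabilities, thresholds):
    """Return the index of the class with the largest p/t ratio.

    Hypothetical helper: illustrates the documented rule only.
    """
    if len(probabilities) != len(thresholds):
        raise ValueError("thresholds must have one entry per class")
    if any(t < 0 for t in thresholds) or sum(t == 0 for t in thresholds) > 1:
        raise ValueError("thresholds must be > 0, with at most one equal to 0")

    best_class, best_ratio = 0, -math.inf
    for i, (p, t) in enumerate(zip(probabilities, thresholds)):
        if t == 0:
            # Assumption: a zero threshold always wins unless p is also 0.
            ratio = math.inf if p > 0 else -math.inf
        else:
            ratio = p / t
        if ratio > best_ratio:
            best_class, best_ratio = i, ratio
    return best_class
```

With probabilities [0.6, 0.4] and thresholds [0.8, 0.2], the ratios are roughly 0.75 and 2.0, so class 1 is predicted even though class 0 has the higher probability.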

getBaggingFraction()

Returns:	Bagging fraction (default: 1.0)
Return type:	double

getBaggingFreq()

Returns:	Bagging frequency (default: 0)
Return type:	int

getBaggingSeed()

Returns:	Bagging seed (default: 3)
Return type:	int
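
How the three bagging parameters interact can be sketched as follows. The sketch assumes the behaviour described in the LightGBM parameter documentation: a bagging frequency of k re-draws the row subset every k iterations, 0 disables bagging, and the subset is drawn without replacement. The function bagging_schedule is hypothetical and uses Python's random module, not LightGBM's own random number generator, so the selected rows will differ from a real training run.

```python
import random


def bagging_schedule(num_rows, num_iterations, bagging_fraction=1.0,
                     bagging_freq=0, bagging_seed=3):
    """Yield the row indices used at each boosting iteration (illustrative)."""
    rng = random.Random(bagging_seed)
    all_rows = list(range(num_rows))
    current = all_rows
    # Bagging only has an effect when both parameters are non-default.
    use_bagging = bagging_freq > 0 and bagging_fraction < 1.0
    for iteration in range(num_iterations):
        if use_bagging and iteration % bagging_freq == 0:
            # Re-draw the subset without replacement every bagging_freq iterations.
            sample_size = max(1, int(num_rows * bagging_fraction))
            current = sorted(rng.sample(all_rows, sample_size))
        yield current
```

With the defaults (fraction 1.0, frequency 0) every iteration sees all rows, which matches the default values listed above.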

getDefaultListenPort()

Returns:	The default listen port on executors, used for testing (default: 12400)
Return type:	int

getFeatureFraction()

Returns:	Feature fraction (default: 1.0)
Return type:	double

getFeaturesCol()

Returns:	features column name (default: features)
Return type:	str

static getJavaPackage(): Returns the package name as a string.

getLabelCol()

Returns:	label column name (default: label)
Return type:	str

getLearningRate()

Returns:	Learning rate or shrinkage rate (default: 0.1)
Return type:	double
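
The "shrinkage" role of the learning rate can be illustrated with a small sketch: each tree's output is scaled by the learning rate before being added to the running raw score, and a sigmoid maps that score to a probability for binary classification. The function name boosted_probability and the per-tree outputs are hypothetical values chosen for illustration, not output from a trained model.

```python
import math


def boosted_probability(tree_outputs, learning_rate=0.1, initial_score=0.0):
    """Combine per-tree outputs into a binary-class probability (illustrative)."""
    raw_score = initial_score
    for output in tree_outputs:
        # Shrinkage: each tree contributes only a fraction of its output.
        raw_score += learning_rate * output
    # Sigmoid converts the raw score into a probability.
    return 1.0 / (1.0 + math.exp(-raw_score))
```

A smaller learning rate moves the prediction away from 0.5 more slowly per tree, which is why it is usually paired with a larger number of iterations.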

getMaxBin()

Returns:	Max bin (default: 255)
Return type:	int

getMaxDepth()

Returns:	Max depth (default: -1)
Return type:	int

getMinSumHessianInLeaf()

Returns:	minimal sum hessian in one leaf (default: 0.001)
Return type:	double

getNumIterations()

Returns:	Number of iterations, LightGBM constructs num_class * num_iterations trees (default: 100)
Return type:	int
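
The num_class * num_iterations relationship is simple arithmetic, shown below as a sketch. The helper total_trees is hypothetical, and the default of num_class=1 reflects the assumption that a binary objective builds one tree per iteration.

```python
def total_trees(num_iterations=100, num_class=1):
    """Number of trees built: one per class per iteration (illustrative)."""
    if num_iterations < 0 or num_class < 1:
        raise ValueError("num_iterations must be >= 0 and num_class >= 1")
    return num_class * num_iterations
```

Under that assumption, the default of 100 iterations yields 100 trees for a binary model, while a hypothetical 3-class problem would yield 300.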

getNumLeaves()

Returns:	Number of leaves (default: 31)
Return type:	int

getParallelism()

Returns:	Tree learner parallelism, can be set to data_parallel or voting_parallel (default: data_parallel)
Return type:	str

getPredictionCol()

Returns:	prediction column name (default: prediction)
Return type:	str

getProbabilityCol()

Returns:	Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)
Return type:	str

getRawPredictionCol()

Returns:	raw prediction (a.k.a. confidence) column name (default: rawPrediction)
Return type:	str

getThresholds()

Returns:	Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold
Return type:	object

classmethod read(): Returns an MLReader instance for this class.

setBaggingFraction(value)

Parameters:	baggingFraction (double) – Bagging fraction (default: 1.0)

setBaggingFreq(value)

Parameters:	baggingFreq (int) – Bagging frequency (default: 0)

setBaggingSeed(value)

Parameters:	baggingSeed (int) – Bagging seed (default: 3)

setDefaultListenPort(value)

Parameters:	defaultListenPort (int) – The default listen port on executors, used for testing (default: 12400)

setFeatureFraction(value)

Parameters:	featureFraction (double) – Feature fraction (default: 1.0)

setFeaturesCol(value)

Parameters:	featuresCol (str) – features column name (default: features)

setLabelCol(value)

Parameters:	labelCol (str) – label column name (default: label)

setLearningRate(value)

Parameters:	learningRate (double) – Learning rate or shrinkage rate (default: 0.1)

setMaxBin(value)

Parameters:	maxBin (int) – Max bin (default: 255)

setMaxDepth(value)

Parameters:	maxDepth (int) – Max depth (default: -1)

setMinSumHessianInLeaf(value)

Parameters:	minSumHessianInLeaf (double) – minimal sum hessian in one leaf (default: 0.001)

setNumIterations(value)

Parameters:	numIterations (int) – Number of iterations, LightGBM constructs num_class * num_iterations trees (default: 100)

setNumLeaves(value)

Parameters:	numLeaves (int) – Number of leaves (default: 31)

setParallelism(value)

Parameters:	parallelism (str) – Tree learner parallelism, can be set to data_parallel or voting_parallel (default: data_parallel)

setParams(baggingFraction=1.0, baggingFreq=0, baggingSeed=3, defaultListenPort=12400, featureFraction=1.0, featuresCol='features', labelCol='label', learningRate=0.1, maxBin=255, maxDepth=-1, minSumHessianInLeaf=0.001, numIterations=100, numLeaves=31, parallelism='data_parallel', predictionCol='prediction', probabilityCol='probability', rawPredictionCol='rawPrediction', thresholds=None)

Sets the (keyword-only) parameters.

Parameters:

baggingFraction (double) – Bagging fraction (default: 1.0)
baggingFreq (int) – Bagging frequency (default: 0)
baggingSeed (int) – Bagging seed (default: 3)
defaultListenPort (int) – The default listen port on executors, used for testing (default: 12400)
featureFraction (double) – Feature fraction (default: 1.0)
featuresCol (str) – features column name (default: features)
labelCol (str) – label column name (default: label)
learningRate (double) – Learning rate or shrinkage rate (default: 0.1)
maxBin (int) – Max bin (default: 255)
maxDepth (int) – Max depth (default: -1)
minSumHessianInLeaf (double) – minimal sum hessian in one leaf (default: 0.001)
numIterations (int) – Number of iterations, LightGBM constructs num_class * num_iterations trees (default: 100)
numLeaves (int) – Number of leaves (default: 31)
parallelism (str) – Tree learner parallelism, can be set to data_parallel or voting_parallel (default: data_parallel)
predictionCol (str) – prediction column name (default: prediction)
probabilityCol (str) – Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)
rawPredictionCol (str) – raw prediction (a.k.a. confidence) column name (default: rawPrediction)
thresholds (object) – Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold
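
What "keyword only" means for callers can be shown with a stand-in class. This is a sketch, not the mmlspark implementation: ParamsSketch and getParam are hypothetical, only three of the parameters are modelled, and a bare * in the signature is used here in place of whatever mechanism the real class uses to reject positional arguments.

```python
class ParamsSketch:
    """Hypothetical stand-in showing keyword-only parameter setting."""

    def __init__(self):
        # Defaults copied from the parameter list above.
        self._params = {"learningRate": 0.1, "numLeaves": 31, "maxDepth": -1}

    def setParams(self, *, learningRate=None, numLeaves=None, maxDepth=None):
        # The bare * forces callers to name every argument they pass.
        updates = {
            "learningRate": learningRate,
            "numLeaves": numLeaves,
            "maxDepth": maxDepth,
        }
        for name, value in updates.items():
            if value is not None:
                self._params[name] = value
        return self  # returning self allows chained calls

    def getParam(self, name):
        return self._params[name]
```

Calling setParams(numLeaves=63) on the sketch changes only numLeaves, while setParams(63) raises a TypeError because the argument is not named.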

setPredictionCol(value)

Parameters:	predictionCol (str) – prediction column name (default: prediction)

setProbabilityCol(value)

Parameters:	probabilityCol (str) – Column name for predicted class conditional probabilities. Note: Not all models output well-calibrated probability estimates! These probabilities should be treated as confidences, not precise probabilities (default: probability)

setRawPredictionCol(value)

Parameters:	rawPredictionCol (str) – raw prediction (a.k.a. confidence) column name (default: rawPrediction)

setThresholds(value)

Parameters:	thresholds (object) – Thresholds in multi-class classification to adjust the probability of predicting each class. Array must have length equal to the number of classes, with values > 0 excepting that at most one value may be 0. The class with largest value p/t is predicted, where p is the original probability of that class and t is the class’s threshold

class LightGBMClassifier.M(java_model=None)

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.wrapper.JavaModel, pyspark.ml.util.JavaMLWritable, pyspark.ml.util.JavaMLReadable

Model fitted by LightGBMClassifier.

This class is left empty on purpose. All necessary methods are exposed through inheritance.

static getJavaPackage(): Returns the package name as a string.

classmethod read(): Returns an MLReader instance for this class.