RankingTrainValidationSplit

class RankingTrainValidationSplit.HasCollectSubMetrics[source]

Bases: pyspark.ml.param.Params

Mixin for param collectSubMetrics: Param for whether to collect a list of sub-model metrics computed during tuning.

collectSubMetrics = Param(parent='undefined', name='collectSubMetrics', doc='Param for whether to collect a list of sub-model metrics.')
getCollectSubMetrics()[source]

Gets the value of collectSubMetrics or its default value.

setCollectSubMetrics(value)[source]

Sets the value of collectSubMetrics.

class RankingTrainValidationSplit.HasCollectSubModels[source]

Bases: pyspark.ml.param.Params

Mixin for param collectSubModels: Param for whether to collect a list of sub-models trained during tuning. If set to false, then only the single best sub-model will be available after fitting. If set to true, then all sub-models will be available. Warning: For large models, collecting all sub-models can cause OOMs on the Spark driver.

collectSubModels = Param(parent='undefined', name='collectSubModels', doc='Param for whether to collect a list of sub-models trained during tuning. If set to false, then only the single best sub-model will be available after fitting. If set to true, then all sub-models will be available. Warning: For large models, collecting all sub-models can cause OOMs on the Spark driver.')
getCollectSubModels()[source]

Gets the value of collectSubModels or its default value.

setCollectSubModels(value)[source]

Sets the value of collectSubModels.
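
The effect of collectSubModels can be sketched in plain Python. The helper below is a hypothetical stand-in for the tuning loop, not SynapseML code: every candidate is scored, the best one is always returned, and the full list of fitted sub-models is retained only when requested.

```python
def tune(models, score, collect_sub_models=False):
    """Toy tuning loop illustrating collectSubModels semantics.

    `models` and `score` are illustrative stand-ins: each candidate model
    is scored, the best one is always returned, and the complete list of
    fitted sub-models is kept only when collect_sub_models is True.
    """
    scores = [score(m) for m in models]
    best = models[scores.index(max(scores))]
    # With collect_sub_models=False only the best model survives; the rest
    # can be garbage-collected, which is what avoids driver OOMs for
    # large models in the real validator.
    sub_models = list(models) if collect_sub_models else None
    return best, sub_models

best, subs = tune(["a", "bb", "ccc"], score=len)
```

With the default of False, `subs` is None and only `best` is kept.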

class RankingTrainValidationSplit.RankingTrainValidationSplit(estimator=None, estimatorParamMaps=None, evaluator=None, seed=None, trainRatio=0.8, java=False)[source]

Bases: pyspark.ml.base.Estimator, pyspark.ml.tuning.ValidatorParams, RankingTrainValidationSplit.HasCollectSubModels, RankingTrainValidationSplit.HasCollectSubMetrics, pyspark.ml.param.shared.HasParallelism

copy(extra=None)[source]

Creates a copy of this instance with a randomly generated uid and some extra params. This creates a deep copy of the embedded paramMap, and copies the embedded and extra parameters over.

Parameters: extra – Extra parameters to copy to the new instance
Returns: Copy of this instance
getItemCol()[source]
Returns: column name for item ids. Ids must be within the integer value range. (default: item)
Return type: str
getRatingCol()[source]
Returns: column name for ratings (default: rating)
Return type: str
getTrainRatio()[source]

Gets the value of trainRatio or its default value.

getUserCol()[source]
Returns: column name for user ids. Ids must be within the integer value range. (default: user)
Return type: str
itemCol = Param(parent='undefined', name='itemCol', doc='itemCol: column name for item ids. Ids must be within the integer value range. (default: item)')
ratingCol = Param(parent='undefined', name='ratingCol', doc='ratingCol: column name for ratings (default: rating)')
setItemCol(value)[source]
Parameters: itemCol (str) – column name for item ids. Ids must be within the integer value range. (default: item)
setParams(estimator=None, estimatorParamMaps=None, evaluator=None, seed=None)[source]

setParams(self, estimator=None, estimatorParamMaps=None, evaluator=None, seed=None): Sets params for the train validation split.

setRatingCol(value)[source]
Parameters: ratingCol (str) – column name for ratings (default: rating)
setTrainRatio(value)[source]

Sets the value of trainRatio.

setUserCol(value)[source]
Parameters: userCol (str) – column name for user ids. Ids must be within the integer value range. (default: user)
trainRatio = Param(parent='undefined', name='trainRatio', doc='Param for ratio between train and validation data. Must be between 0 and 1.')
userCol = Param(parent='undefined', name='userCol', doc='userCol: column name for user ids. Ids must be within the integer value range. (default: user)')
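
The trainRatio parameter controls a single random train/validation split rather than k cross-validation folds. A minimal pure-Python sketch of that behavior (the real validator splits a Spark DataFrame; the helper below is an illustrative stand-in mirroring randomSplit-style semantics, not SynapseML code):

```python
import random

def train_validation_split(rows, train_ratio=0.8, seed=None):
    """Illustrative stand-in for the single split performed during fitting:
    each row lands in the train set with probability train_ratio, and the
    remaining rows form the validation set."""
    rng = random.Random(seed)
    train, validation = [], []
    for row in rows:
        (train if rng.random() < train_ratio else validation).append(row)
    return train, validation

# With the default trainRatio of 0.8, roughly 80% of rows train the
# candidate models and the rest score them.
train, validation = train_validation_split(range(1000), train_ratio=0.8, seed=42)
```

Because the split is seeded, the same seed reproduces the same partition, which is why the estimator exposes a seed parameter.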
class RankingTrainValidationSplit.RankingTrainValidationSplitModel(bestModel, validationMetrics=[], subModels=None, subMetrics=None)[source]

Bases: pyspark.ml.base.Model, pyspark.ml.tuning.ValidatorParams

bestModel = None

best model from train validation split

copy(extra=None)[source]

Creates a copy of this instance with a randomly generated uid and some extra params. This copies the underlying bestModel, creates a deep copy of the embedded paramMap, and copies the embedded and extra parameters over. It also creates a shallow copy of the validationMetrics.

Parameters: extra – Extra parameters to copy to the new instance
Returns: Copy of this instance
recommendForAllItems(numItems)[source]

Returns the top recommendations for each item.

recommendForAllUsers(numItems)[source]

Returns the top numItems item recommendations for each user.
validationMetrics = None

evaluated validation metrics
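
validationMetrics holds one evaluator score per entry in estimatorParamMaps, and bestModel is the candidate with the winning score. A toy sketch of that correspondence (names and values are illustrative, not the SynapseML implementation):

```python
def select_best(param_maps, metrics, larger_is_better=True):
    """Pair each candidate param map with its validation metric and pick
    the index of the winning candidate, as a tuning validator would."""
    if len(param_maps) != len(metrics):
        raise ValueError("one metric per param map is expected")
    pick = max if larger_is_better else min
    best_index = metrics.index(pick(metrics))
    return best_index, param_maps[best_index]

# Hypothetical grid of three ALS ranks and their validation scores.
param_maps = [{"rank": 8}, {"rank": 16}, {"rank": 32}]
validation_metrics = [0.71, 0.78, 0.74]
i, best_params = select_best(param_maps, validation_metrics)
```

Whether larger or smaller is better depends on the evaluator's metric (e.g. NDCG vs. RMSE), so the sketch exposes that as a flag.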