RankingTrainValidationSplit

class RankingTrainValidationSplit.HasCollectSubMetrics[source]

Bases: pyspark.ml.param.Params

Mixin for param collectSubMetrics: Param for whether to collect a list of sub-model metrics computed during tuning.

collectSubMetrics = Param(parent='undefined', name='collectSubMetrics', doc='Param for whether to collect a list of sub-model metrics.')
getCollectSubMetrics()[source]

Gets the value of collectSubMetrics or its default value.

setCollectSubMetrics(value)[source]

Sets the value of collectSubMetrics.

class RankingTrainValidationSplit.HasCollectSubModels[source]

Bases: pyspark.ml.param.Params

Mixin for param collectSubModels: Param for whether to collect a list of sub-models trained during tuning. If set to false, then only the single best sub-model will be available after fitting. If set to true, then all sub-models will be available. Warning: For large models, collecting all sub-models can cause OOMs on the Spark driver.

collectSubModels = Param(parent='undefined', name='collectSubModels', doc='Param for whether to collect a list of sub-models trained during tuning. If set to false, then only the single best sub-model will be available after fitting. If set to true, then all sub-models will be available. Warning: For large models, collecting all sub-models can cause OOMs on the Spark driver.')
getCollectSubModels()[source]

Gets the value of collectSubModels or its default value.

setCollectSubModels(value)[source]

Sets the value of collectSubModels.
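
The effect of collectSubModels can be sketched in plain Python. The helper below is a hypothetical stand-in for the tuning loop, not SynapseML code: every candidate is scored, the best one is always returned, and the full list of fitted sub-models is retained only when requested.

```python
def tune(models, score, collect_sub_models=False):
    """Toy tuning loop illustrating collectSubModels semantics.

    `models` and `score` are illustrative stand-ins: each candidate model
    is scored, the best one is always returned, and the complete list of
    fitted sub-models is kept only when collect_sub_models is True.
    """
    scores = [score(m) for m in models]
    best = models[scores.index(max(scores))]
    # With collect_sub_models=False only the best model survives; the rest
    # can be garbage-collected, which is what avoids driver OOMs for
    # large models in the real validator.
    sub_models = list(models) if collect_sub_models else None
    return best, sub_models

best, subs = tune(["a", "bb", "ccc"], score=len)
```

With the default of False, `subs` is None and only `best` is kept.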

class RankingTrainValidationSplit.RankingTrainValidationSplit(estimator=None, estimatorParamMaps=None, evaluator=None, seed=None, trainRatio=0.8, java=False)[source]

Bases: pyspark.ml.base.Estimator, pyspark.ml.tuning.ValidatorParams, RankingTrainValidationSplit.HasCollectSubModels, RankingTrainValidationSplit.HasCollectSubMetrics, pyspark.ml.param.shared.HasParallelism

copy(extra=None)[source]

Creates a copy of this instance with a randomly generated uid and some extra params. This creates a deep copy of the embedded paramMap, and copies the embedded and extra parameters over.

Parameters: extra – Extra parameters to copy to the new instance
Returns: Copy of this instance
getItemCol()[source]
Returns: column name for item ids. Ids must be within the integer value range. (default: item)
Return type: str
getRatingCol()[source]
Returns: column name for ratings (default: rating)
Return type: str
getTrainRatio()[source]

Gets the value of trainRatio or its default value.

getUserCol()[source]
Returns: column name for user ids. Ids must be within the integer value range. (default: user)
Return type: str
itemCol = Param(parent='undefined', name='itemCol', doc='itemCol: column name for item ids. Ids must be within the integer value range. (default: item)')
ratingCol = Param(parent='undefined', name='ratingCol', doc='ratingCol: column name for ratings (default: rating)')
setItemCol(value)[source]
Parameters: itemCol (str) – column name for item ids. Ids must be within the integer value range. (default: item)
setParams(estimator=None, estimatorParamMaps=None, evaluator=None, seed=None)[source]

setParams(self, estimator=None, estimatorParamMaps=None, evaluator=None, seed=None): Sets params for the train validation split.

setRatingCol(value)[source]
Parameters: ratingCol (str) – column name for ratings (default: rating)
setTrainRatio(value)[source]

Sets the value of trainRatio.

setUserCol(value)[source]
Parameters: userCol (str) – column name for user ids. Ids must be within the integer value range. (default: user)
trainRatio = Param(parent='undefined', name='trainRatio', doc='Param for ratio between train and validation data. Must be between 0 and 1.')
userCol = Param(parent='undefined', name='userCol', doc='userCol: column name for user ids. Ids must be within the integer value range. (default: user)')
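
The trainRatio parameter controls a single random train/validation split rather than k cross-validation folds. A minimal pure-Python sketch of that behavior (the real validator splits a Spark DataFrame; the helper below is an illustrative stand-in mirroring randomSplit-style semantics, not SynapseML code):

```python
import random

def train_validation_split(rows, train_ratio=0.8, seed=None):
    """Illustrative stand-in for the single split performed during fitting:
    each row lands in the train set with probability train_ratio, and the
    remaining rows form the validation set."""
    rng = random.Random(seed)
    train, validation = [], []
    for row in rows:
        (train if rng.random() < train_ratio else validation).append(row)
    return train, validation

# With the default trainRatio of 0.8, roughly 80% of rows train the
# candidate models and the rest score them.
train, validation = train_validation_split(range(1000), train_ratio=0.8, seed=42)
```

Because the split is seeded, the same seed reproduces the same partition, which is why the estimator exposes a seed parameter.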
class RankingTrainValidationSplit.RankingTrainValidationSplitModel(bestModel, validationMetrics=[], subModels=None, subMetrics=None)[source]

Bases: pyspark.ml.base.Model, pyspark.ml.tuning.ValidatorParams

bestModel = None

best model from train validation split

copy(extra=None)[source]

Creates a copy of this instance with a randomly generated uid and some extra params. This copies the underlying bestModel, creates a deep copy of the embedded paramMap, and copies the embedded and extra parameters over. It also creates a shallow copy of the validationMetrics.

Parameters: extra – Extra parameters to copy to the new instance
Returns: Copy of this instance
recommendForAllItems(numItems)[source]

Returns the top recommendations for each item.

recommendForAllUsers(numItems)[source]

Returns the top numItems item recommendations for each user.
validationMetrics = None

evaluated validation metrics
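
validationMetrics holds one evaluator score per entry in estimatorParamMaps, and bestModel is the candidate with the winning score. A toy sketch of that correspondence (names and values are illustrative, not the SynapseML implementation):

```python
def select_best(param_maps, metrics, larger_is_better=True):
    """Pair each candidate param map with its validation metric and pick
    the index of the winning candidate, as a tuning validator would."""
    if len(param_maps) != len(metrics):
        raise ValueError("one metric per param map is expected")
    pick = max if larger_is_better else min
    best_index = metrics.index(pick(metrics))
    return best_index, param_maps[best_index]

# Hypothetical grid of three ALS ranks and their validation scores.
param_maps = [{"rank": 8}, {"rank": 16}, {"rank": 32}]
validation_metrics = [0.71, 0.78, 0.74]
i, best_params = select_best(param_maps, validation_metrics)
```

Whether larger or smaller is better depends on the evaluator's metric (e.g. NDCG vs. RMSE), so the sketch exposes that as a flag.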