PartitionSample

class PartitionSample.PartitionSample(count=1000, mode='RandomSample', newColName='Partition', numParts=10, percent=0.01, rsMode='Percentage', seed=-1)[source]

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Samples rows or assigns them to partitions, depending on the sampling mode. The options are:

  • AssignToPartition
  • RandomSample
  • Head

The default is RandomSample.

Relevant parameters for the different modes are:

  • When the mode is AssignToPartition:
    • seed - the seed for random partition assignment.
    • numParts - the number of partitions. The default is 10.
    • newColName - the name of the partition column. The default is “Partition”.
  • When the mode is RandomSample:
    • rsMode - Absolute or Percentage
    • count - the number of rows to assign to each partition when rsMode is Absolute
    • percent - the percentage per partition when rsMode is Percentage
  • When the mode is Head:
    • count - the number of rows
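
The mode-to-parameter mapping above can be written down as a small lookup. This is only a sketch in plain Python: `relevant_params` is a hypothetical helper, not part of mmlspark, and it encodes nothing beyond what the list above states.

```python
def relevant_params(mode="RandomSample", rsMode="Percentage"):
    """Hypothetical helper: parameter names documented as relevant for a mode."""
    if mode == "AssignToPartition":
        return ["seed", "numParts", "newColName"]
    if mode == "RandomSample":
        # rsMode selects which of count / percent is consulted.
        if rsMode == "Absolute":
            return ["rsMode", "count"]
        if rsMode == "Percentage":
            return ["rsMode", "percent"]
        raise ValueError("rsMode must be 'Absolute' or 'Percentage'")
    if mode == "Head":
        return ["count"]
    raise ValueError("mode must be 'AssignToPartition', 'RandomSample', or 'Head'")


# With the class defaults (mode='RandomSample', rsMode='Percentage'),
# percent is the parameter that matters and count is not listed.
default_relevant = relevant_params()
```

Parameters outside the returned list can still be set; per the list above they are simply not the ones documented for that mode.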
Parameters:
  • count (long) – Number of rows to return (default: 1000)
  • mode (str) – AssignToPartition, RandomSample, or Head (default: RandomSample)
  • newColName (str) – Name of the partition column (default: Partition)
  • numParts (int) – Number of partitions (default: 10)
  • percent (double) – Percent of rows to return (default: 0.01)
  • rsMode (str) – Absolute or Percentage (default: Percentage)
  • seed (long) – Seed for random operations (default: -1)
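
A minimal usage sketch, assuming a working Spark session with mmlspark installed and an existing DataFrame `df`. The import path `mmlspark.PartitionSample` is an assumption inferred from the class name and its mmlspark base classes; `take_head` and `assign_partitions` are hypothetical wrapper names used only for illustration. `transform` is inherited from `pyspark.ml.wrapper.JavaTransformer`.

```python
def take_head(df, n=5):
    """Apply PartitionSample in Head mode with count=n."""
    # Assumed import path; adjust to match the installed mmlspark version.
    from mmlspark.PartitionSample import PartitionSample

    sampler = PartitionSample(mode="Head", count=n)
    return sampler.transform(df)


def assign_partitions(df, parts=10, seed=42, col="Partition"):
    """Apply PartitionSample in AssignToPartition mode."""
    from mmlspark.PartitionSample import PartitionSample

    sampler = PartitionSample(
        mode="AssignToPartition",
        numParts=parts,      # number of partitions
        seed=seed,           # seed for random partition assignment
        newColName=col,      # name of the partition column
    )
    return sampler.transform(df)
```

The imports are placed inside the functions so that the definitions load even where Spark is unavailable; in a notebook, a single top-level import would be the more usual choice.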
getCount()[source]
Returns:Number of rows to return (default: 1000)
Return type:long
static getJavaPackage()[source]

Returns package name String.

getMode()[source]
Returns:AssignToPartition, RandomSample, or Head (default: RandomSample)
Return type:str
getNewColName()[source]
Returns:Name of the partition column (default: Partition)
Return type:str
getNumParts()[source]
Returns:Number of partitions (default: 10)
Return type:int
getPercent()[source]
Returns:Percent of rows to return (default: 0.01)
Return type:double
getRsMode()[source]
Returns:Absolute or Percentage (default: Percentage)
Return type:str
getSeed()[source]
Returns:Seed for random operations (default: -1)
Return type:long
classmethod read()[source]

Returns an MLReader instance for this class.

setCount(value)[source]
Parameters:count (long) – Number of rows to return (default: 1000)
setMode(value)[source]
Parameters:mode (str) – AssignToPartition, RandomSample, or Head (default: RandomSample)
setNewColName(value)[source]
Parameters:newColName (str) – Name of the partition column (default: Partition)
setNumParts(value)[source]
Parameters:numParts (int) – Number of partitions (default: 10)
setParams(count=1000, mode='RandomSample', newColName='Partition', numParts=10, percent=0.01, rsMode='Percentage', seed=-1)[source]

Set the (keyword only) parameters

Parameters:
  • count (long) – Number of rows to return (default: 1000)
  • mode (str) – AssignToPartition, RandomSample, or Head (default: RandomSample)
  • newColName (str) – Name of the partition column (default: Partition)
  • numParts (int) – Number of partitions (default: 10)
  • percent (double) – Percent of rows to return (default: 0.01)
  • rsMode (str) – Absolute or Percentage (default: Percentage)
  • seed (long) – Seed for random operations (default: -1)
setPercent(value)[source]
Parameters:percent (double) – Percent of rows to return (default: 0.01)
setRsMode(value)[source]
Parameters:rsMode (str) – Absolute or Percentage (default: Percentage)
setSeed(value)[source]
Parameters:seed (long) – Seed for random operations (default: -1)