PartitionSample
class PartitionSample.PartitionSample(count=1000, mode='RandomSample', newColName='Partition', numParts=10, percent=0.01, rsMode='Percentage', seed=-1)
Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
Sampling mode. The options are:
- AssignToPartition
- RandomSample
- Head
The default is RandomSample.
Relevant parameters for the different modes are:
- When the mode is AssignToPartition:
  - seed - the seed for random partition assignment. The default is -1.
  - numParts - the number of partitions. The default is 10.
  - newColName - the name of the partition column. The default is “Partition”.
- When the mode is RandomSample:
  - rsMode - Absolute or Percentage. The default is Percentage.
  - count - the number of rows to assign to each partition when rsMode is Absolute
  - percent - the percentage per partition when rsMode is Percentage
- When the mode is Head:
  - count - the number of rows
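The three modes can be illustrated with a small plain-Python sketch that works on a list of dictionaries instead of a Spark DataFrame. This is not the mmlspark implementation. The function name `partition_sample` and its snake_case arguments are hypothetical, and two interpretations are assumptions: `count` is treated as a total row count, and `percent` as a fraction between 0 and 1 applied independently to each row.

```python
import random


def partition_sample(rows, mode="RandomSample", rs_mode="Percentage",
                     count=1000, percent=0.01, num_parts=10,
                     new_col_name="Partition", seed=-1):
    """Hypothetical stand-in that mimics the documented modes on a list of dicts."""
    rng = random.Random(seed)  # seeded generator so results are repeatable

    if mode == "Head":
        # Head: keep the first `count` rows in their original order.
        return rows[:count]

    if mode == "RandomSample":
        if rs_mode == "Absolute":
            # Assumption: `count` is the total number of rows drawn.
            return rng.sample(rows, min(count, len(rows)))
        if rs_mode == "Percentage":
            # Assumption: `percent` is a fraction in [0, 1]; each row is kept
            # independently with that probability.
            return [row for row in rows if rng.random() < percent]
        raise ValueError("rs_mode must be 'Absolute' or 'Percentage'")

    if mode == "AssignToPartition":
        # Copy every row and add a random partition index in [0, num_parts).
        return [dict(row, **{new_col_name: rng.randrange(num_parts)})
                for row in rows]

    raise ValueError("mode must be 'AssignToPartition', 'RandomSample', or 'Head'")


rows = [{"id": i} for i in range(100)]
first_five = partition_sample(rows, mode="Head", count=5)
seven_rows = partition_sample(rows, mode="RandomSample", rs_mode="Absolute",
                              count=7, seed=1)
labelled = partition_sample(rows, mode="AssignToPartition", num_parts=4, seed=1)
```

In this sketch, Head and RandomSample return a subset of the input rows, while AssignToPartition returns every row with one extra column named by `new_col_name`.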
Parameters:
- count (long) – Number of rows to return (default: 1000)
- mode (str) – AssignToPartition, RandomSample, or Head (default: RandomSample)
- newColName (str) – Name of the partition column (default: Partition)
- numParts (int) – Number of partitions (default: 10)
- percent (double) – Percent of rows to return (default: 0.01)
- rsMode (str) – Absolute or Percentage (default: Percentage)
- seed (long) – Seed for random operations (default: -1)
getMode()
Returns: AssignToPartition, RandomSample, or Head (default: RandomSample)
Return type: str
setMode(value)
Parameters: mode (str) – AssignToPartition, RandomSample, or Head (default: RandomSample)
setNewColName(value)
Parameters: newColName (str) – Name of the partition column (default: Partition)
setParams(count=1000, mode='RandomSample', newColName='Partition', numParts=10, percent=0.01, rsMode='Percentage', seed=-1)
Set the (keyword only) parameters.
Parameters:
- count (long) – Number of rows to return (default: 1000)
- mode (str) – AssignToPartition, RandomSample, or Head (default: RandomSample)
- newColName (str) – Name of the partition column (default: Partition)
- numParts (int) – Number of partitions (default: 10)
- percent (double) – Percent of rows to return (default: 0.01)
- rsMode (str) – Absolute or Percentage (default: Percentage)
- seed (long) – Seed for random operations (default: -1)