SelectColumns

class SelectColumns.SelectColumns(cols=None)

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

SelectColumns is configured with a list of column names; its transform() returns a DataFrame consisting of only those columns. Any columns in the input DataFrame that are not in the selection list are dropped.
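The selection semantics can be modeled without Spark. The sketch below is a plain-Python stand-in, not the mmlspark implementation: it represents a table as a dict mapping column names to value lists, keeps only the requested columns in the order given by `cols`, and assumes (as Spark's `DataFrame.select` does) that naming a column that does not exist is an error. The function name `select_columns` is hypothetical.

```python
from typing import Dict, List, Sequence


def select_columns(table: Dict[str, List[int]], cols: Sequence[str]) -> Dict[str, List[int]]:
    """Hypothetical stand-in for SelectColumns.transform on a dict-of-columns table.

    Keeps only the names in `cols`, in that order; every other column is dropped.
    """
    missing = [c for c in cols if c not in table]
    if missing:
        # Assumption: mirror Spark, where selecting an unknown column fails.
        raise KeyError(f"columns not found: {missing}")
    return {c: table[c] for c in cols}


table = {"col1": [1, 2, 3, 4, 5],
         "col2": [6, 7, 8, 9, 10],
         "col3": [5, 4, 3, 2, 1]}
selected = select_columns(table, ["col1", "col2"])
print(list(selected))  # ['col1', 'col2']
```

Only the column names and their value lists are carried over; `col3` is absent from the result, which matches the behavior described above.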

Example:
>>> import pandas as pd
>>> from pyspark.sql import SparkSession
>>> from mmlspark import SelectColumns
>>> spark = SparkSession.builder.appName("Test SelectCol").getOrCreate()
>>> tmp1 = {"col1": [1, 2, 3, 4, 5],
...         "col2": [6, 7, 8, 9, 10],
...         "col3": [5, 4, 3, 2, 1]}
>>> pddf = pd.DataFrame(tmp1)
>>> list(pddf.columns)
['col1', 'col2', 'col3']
>>> data = spark.createDataFrame(pddf)  # convert the pandas DataFrame to a Spark DataFrame
>>> data2 = SelectColumns(cols=["col1", "col2"]).transform(data)
>>> data2.columns
['col1', 'col2']
Parameters: cols (list) – List of the names of the columns to keep
getCols()
Returns: List of the selected column names
Return type: list
static getJavaPackage()

Returns the package name as a String.

classmethod read()

Returns an MLReader instance for this class.

setCols(value)
Parameters: value (list) – List of the names of the columns to keep
setParams(cols=None)

Sets the (keyword-only) parameters.

Parameters: cols (list) – List of the names of the columns to keep
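The accessor methods above follow PySpark's usual parameter pattern. The following is a hypothetical, dependency-free mock (`SelectColumnsParamsSketch` is not part of mmlspark) illustrating the assumed behavior: `setParams` accepts `cols` only as a keyword argument, and the setters return the instance so calls can be chained, as PySpark setters conventionally do. It simplifies the real Params machinery; for example, `getCols` here returns `None` when nothing has been set.

```python
from typing import List, Optional


class SelectColumnsParamsSketch:
    """Hypothetical mock of the SelectColumns parameter accessors (no Spark needed)."""

    def __init__(self, *, cols: Optional[List[str]] = None) -> None:
        self._cols: Optional[List[str]] = None
        self.setParams(cols=cols)

    def setParams(self, *, cols: Optional[List[str]] = None) -> "SelectColumnsParamsSketch":
        # The bare `*` makes `cols` keyword-only, mirroring PySpark's keyword-only setParams.
        # Simplification: a None value is treated as "not provided" and leaves cols unchanged.
        if cols is not None:
            self.setCols(cols)
        return self

    def setCols(self, value: List[str]) -> "SelectColumnsParamsSketch":
        # Store a copy so later changes to the caller's list do not leak in.
        self._cols = list(value)
        return self

    def getCols(self) -> Optional[List[str]]:
        # Returns None if cols was never set (the real Params API behaves differently).
        return self._cols


params = SelectColumnsParamsSketch().setCols(["col1", "col2"])
print(params.getCols())  # ['col1', 'col2']
```

Calling `setParams(["col1"])` positionally raises a `TypeError` in this sketch, which is the point of making the parameter keyword-only.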