BinaryFileReader

BinaryFileReader.BinaryFileFields = ['path', 'bytes']

Field names of the binary file schema.

BinaryFileReader.BinaryFileSchema = StructType(List(StructField(path,StringType,true),StructField(bytes,BinaryType,true)))

Schema for Binary Files.
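To make the two-field layout concrete, the sketch below builds plain Python records with the same `path` and `bytes` fields from a local directory. It uses only the standard library rather than Spark, and `BINARY_FILE_FIELDS` and `load_records` are hypothetical names chosen for illustration; they are not part of the library.

```python
from pathlib import Path
import tempfile

# Field names copied from BinaryFileReader.BinaryFileFields.
BINARY_FILE_FIELDS = ["path", "bytes"]


def load_records(directory):
    """Hypothetical helper: one {'path', 'bytes'} dict per regular file."""
    records = []
    for file_path in sorted(Path(directory).iterdir()):
        if file_path.is_file():
            records.append(
                {"path": str(file_path), "bytes": file_path.read_bytes()}
            )
    return records


# Demo on a temporary directory holding two small files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.bin").write_bytes(b"\x00\x01")
    Path(tmp, "b.bin").write_bytes(b"\x02")
    demo_records = load_records(tmp)
```

Each record pairs the file's location with its raw contents, which is the shape the schema above describes for a single row.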

BinaryFileReader.isBinaryFile(df, column)

Returns True if the column contains binary files.

Parameters:	df (DataFrame) – The DataFrame to be processed
	column (str) – The name of the column being inspected
Returns:	True if the column is a binary files column
Return type:	bool
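One way such a check could work is to compare the column's type against the path/bytes struct. The sketch below is an assumption about the idea, not the library's implementation: it operates on a schema in the dictionary form that PySpark's `StructType.jsonValue()` produces, and `is_binary_file_column` is a hypothetical name.

```python
# Expected (name, type) pairs, mirroring BinaryFileSchema.
EXPECTED_FIELDS = [("path", "string"), ("bytes", "binary")]


def is_binary_file_column(schema_json, column):
    """Hypothetical check: is `column` a struct with path/bytes fields?"""
    for field in schema_json.get("fields", []):
        if field["name"] != column:
            continue
        col_type = field["type"]
        # Struct columns are nested dicts; scalar types are plain strings.
        if not isinstance(col_type, dict) or col_type.get("type") != "struct":
            return False
        found = [(f["name"], f["type"]) for f in col_type["fields"]]
        return found == EXPECTED_FIELDS
    # No column with that name exists in the schema.
    return False


# Example schema: a binary-file struct column plus an ordinary integer column.
demo_schema = {
    "type": "struct",
    "fields": [
        {
            "name": "value",
            "nullable": True,
            "metadata": {},
            "type": {
                "type": "struct",
                "fields": [
                    {"name": "path", "type": "string", "nullable": True, "metadata": {}},
                    {"name": "bytes", "type": "binary", "nullable": True, "metadata": {}},
                ],
            },
        },
        {"name": "label", "type": "integer", "nullable": True, "metadata": {}},
    ],
}
```

With this schema, the check accepts `value` and rejects both `label` and any column name that is absent.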

BinaryFileReader.readBinaryFiles(self, path, recursive=False, sampleRatio=1.0, inspectZip=True, seed=0)

Reads a directory of binary files from a local or remote (WASB) source. This function is attached to the SparkSession class.

Example:

>>> spark.readBinaryFiles(path, recursive, sampleRatio = 1.0, inspectZip = True)

Parameters:	path (str) – Path to the file directory
	recursive (bool) – Whether to also read files from subdirectories
	sampleRatio (double) – Fraction of the files loaded into the DataFrame
Returns:	DataFrame with a single column “value”; see BinaryFileSchema for details
Return type:	DataFrame
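The roles of `recursive`, `sampleRatio`, and `seed` can be illustrated with a small standard-library sketch. This is an assumption about the intended semantics, not how the Spark reader is implemented: `list_sampled_files` is a hypothetical helper, per-file Bernoulli sampling is one plausible reading of "fraction of the files", and `inspectZip` is left out entirely.

```python
import random
import tempfile
from pathlib import Path


def list_sampled_files(path, recursive=False, sample_ratio=1.0, seed=0):
    """Hypothetical helper: choose which files a read would include."""
    root = Path(path)
    # recursive=True descends into subdirectories; otherwise top level only.
    candidates = root.rglob("*") if recursive else root.iterdir()
    files = sorted(p for p in candidates if p.is_file())
    # A fixed seed makes the selection repeatable across calls.
    rng = random.Random(seed)
    # Keep each file independently with probability sample_ratio.
    return [p for p in files if rng.random() < sample_ratio]


# Demo: one top-level file and one file in a subdirectory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "top.bin").write_bytes(b"\x00")
    Path(tmp, "sub").mkdir()
    Path(tmp, "sub", "nested.bin").write_bytes(b"\x01")
    flat = [p.name for p in list_sampled_files(tmp)]
    deep = [p.name for p in list_sampled_files(tmp, recursive=True)]
    empty = list_sampled_files(tmp, recursive=True, sample_ratio=0.0)
    half_a = list_sampled_files(tmp, recursive=True, sample_ratio=0.5, seed=7)
    half_b = list_sampled_files(tmp, recursive=True, sample_ratio=0.5, seed=7)
```

A ratio of 1.0 keeps every file, a ratio of 0.0 keeps none, and repeating a call with the same seed yields the same selection.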

BinaryFileReader.streamBinaryFiles(self, path, sampleRatio=1.0, inspectZip=True, seed=0)

Streams a directory of binary files from a local or remote (WASB) source. This function is attached to the SparkSession class.

Example:

>>> spark.streamBinaryFiles(path, sampleRatio = 1.0, inspectZip = True)

Parameters:	path (str) – Path to the file directory
Returns:	DataFrame with a single column “value”; see BinaryFileSchema for details
Return type:	DataFrame