BinaryFileReader

BinaryFileReader.BinaryFileFields = ['path', 'bytes']

Field names of the Binary File Schema.

BinaryFileReader.BinaryFileSchema = StructType(List(StructField(path,StringType,true),StructField(bytes,BinaryType,true)))

Schema for Binary Files.

Each schema record consists of a BinaryFileFields name (path or bytes), a type, and a nullable flag.
BinaryFileReader.isBinaryFile(df, column)[source]

Returns True if the column contains binary files

Parameters:
  • df (DataFrame) – The DataFrame to be processed
  • column (str) – The name of the column being inspected
Returns:

True if the column is a binary file column

Return type:

bool

BinaryFileReader.readBinaryFiles(self, path, recursive=False, sampleRatio=1.0, inspectZip=True, seed=0)[source]

Reads a directory of binary files from a local or remote (WASB) source. This function is attached to the SparkSession class.

Example:
>>> spark.readBinaryFiles(path, recursive, sampleRatio = 1.0, inspectZip = True)
Parameters:
  • path (str) – Path to the file directory
  • recursive (bool) – Whether to search subdirectories recursively
  • sampleRatio (double) – Fraction of the files loaded into the dataframe
Returns:

DataFrame with a single column “value”; see BinaryFileSchema for details

Return type:

DataFrame
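To make the roles of path, recursive, sampleRatio, and seed concrete, here is a plain-Python sketch of the described behavior on a local directory. It is not the library's implementation: the function read_binary_files is hypothetical, it returns a list of dicts instead of a DataFrame, its sampling is a simple per-file coin flip that will not match Spark's, and it models neither inspectZip nor remote (WASB) sources.

```python
import os
import random
import tempfile


def read_binary_files(path, recursive=False, sample_ratio=1.0, seed=0):
    """Return [{"path": ..., "bytes": ...}] for files under `path` (illustrative only)."""
    if recursive:
        # Walk the whole tree below `path`.
        candidates = [
            os.path.join(root, name)
            for root, _dirs, names in os.walk(path)
            for name in names
        ]
    else:
        # Only files directly inside `path`.
        candidates = [entry.path for entry in os.scandir(path) if entry.is_file()]

    rng = random.Random(seed)  # seeded so the sample is reproducible
    records = []
    for file_path in sorted(candidates):
        # Keep each file with probability `sample_ratio`.
        if rng.random() >= sample_ratio:
            continue
        with open(file_path, "rb") as handle:
            records.append({"path": file_path, "bytes": handle.read()})
    return records


# Small demonstration on a temporary directory with one nested file.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "nested"))
    with open(os.path.join(root, "a.bin"), "wb") as handle:
        handle.write(b"\x00\x01")
    with open(os.path.join(root, "nested", "b.bin"), "wb") as handle:
        handle.write(b"\x02")

    flat = read_binary_files(root)
    deep = read_binary_files(root, recursive=True)
    empty = read_binary_files(root, recursive=True, sample_ratio=0.0)
```

Each record mirrors the two fields of the schema, path and bytes, which is the shape the “value” column is described as holding.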

BinaryFileReader.streamBinaryFiles(self, path, sampleRatio=1.0, inspectZip=True, seed=0)[source]

Streams a directory of binary files from a local or remote (WASB) source. This function is attached to the SparkSession class.

Example:
>>> spark.streamBinaryFiles(path, sampleRatio = 1.0, inspectZip = True)
Parameters:
  • path (str) – Path to the file directory
Returns:

DataFrame with a single column “value”; see BinaryFileSchema for details

Return type:

DataFrame