BinaryFileReader
-
BinaryFileReader.BinaryFileFields = ['path', 'bytes']
Names of the fields in the Binary File Schema.
-
BinaryFileReader.BinaryFileSchema = StructType(List(StructField(path,StringType,true),StructField(bytes,BinaryType,true)))
Schema for Binary Files.
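- Each schema record has the fields named in BinaryFileFields; every field is described by its name, its type, and a nullable flag:
- path (StringType, nullable)
- bytes (BinaryType, nullable)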
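To make the record shape concrete, here is a minimal plain-Python sketch of one record with the two fields listed above. It uses only the standard library, not Spark; `BINARY_FILE_FIELDS`, `BinaryFileRecord`, `to_record`, and `demo_record` are hypothetical names chosen for illustration and are not part of BinaryFileReader.

```python
import os
import tempfile
from collections import namedtuple

# Field names copied from BinaryFileFields in the reference above.
BINARY_FILE_FIELDS = ["path", "bytes"]

# Hypothetical plain-Python stand-in for one BinaryFileSchema record.
BinaryFileRecord = namedtuple("BinaryFileRecord", BINARY_FILE_FIELDS)


def to_record(file_path):
    """Read one file and return it as a (path, bytes) record."""
    with open(file_path, "rb") as handle:
        return BinaryFileRecord(path=file_path, bytes=handle.read())


def demo_record():
    """Write a small temporary file and load it back as a record."""
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.bin")
        with open(sample, "wb") as handle:
            handle.write(b"\x00\x01\x02")
        return to_record(sample)
```

In Spark the same information would be carried by a Row typed with BinaryFileSchema; the tuple here only mirrors the field names and their order.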
-
BinaryFileReader.isBinaryFile(df, column)
Returns True if the column contains binary files.
Parameters: - df (DataFrame) – The DataFrame to be processed
- column (str) – The name of the column being inspected
Returns: True if the column is a binary files column
Return type: bool
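One plausible reading of this check is that it compares the struct type of the named column against BinaryFileSchema. The sketch below illustrates that idea with plain tuples instead of Spark types. It is an assumption about the behaviour, not the library's implementation, and `BINARY_FILE_SCHEMA`, `is_binary_file_column`, and `columns` are hypothetical names.

```python
# Plain-tuple stand-in for BinaryFileSchema: (field name, type name, nullable).
BINARY_FILE_SCHEMA = (
    ("path", "StringType", True),
    ("bytes", "BinaryType", True),
)


def is_binary_file_column(columns, column):
    """Return True if `column` exists and its struct matches the schema.

    `columns` maps column names to a tuple of (name, type, nullable) fields;
    it is a hypothetical stand-in for a DataFrame's schema.
    """
    return columns.get(column) == BINARY_FILE_SCHEMA


# Example schema description: one binary-file column and one ordinary column.
columns = {
    "value": (("path", "StringType", True), ("bytes", "BinaryType", True)),
    "label": (("name", "StringType", True),),
}
```

Under this sketch, `is_binary_file_column(columns, "value")` is True, while a column with a different struct or a missing column gives False.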
-
BinaryFileReader.readBinaryFiles(self, path, recursive=False, sampleRatio=1.0, inspectZip=True, seed=0)
Reads the directory of binary files from the local or remote (WASB) source. This function is attached to the SparkSession class.
Example: >>> spark.readBinaryFiles(path, recursive, sampleRatio = 1.0, inspectZip = True)
Parameters: - path (str) – Path to the file directory
- recursive (bool) – Whether to search the directory recursively
- sampleRatio (double) – Fraction of the files loaded into the dataframe
Returns: DataFrame with a single column “value”; see BinaryFileSchema for details
Return type: DataFrame
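The interplay of the recursive flag, the sample ratio, and the seed can be sketched in plain Python. This is a standard-library illustration under stated assumptions, not the Spark implementation: it assumes each file is kept independently with probability equal to the sample ratio, and it does not model inspectZip or remote (WASB) paths. `sample_binary_files` and `demo_counts` are hypothetical helper names.

```python
import os
import random
import tempfile


def sample_binary_files(path, recursive=False, sample_ratio=1.0, seed=0):
    """Return a list of (path, bytes) pairs for a sample of files under `path`."""
    if recursive:
        # Walk the whole tree below `path`.
        found = [
            os.path.join(root, name)
            for root, _dirs, names in os.walk(path)
            for name in names
        ]
    else:
        # Only regular files directly inside `path`.
        found = [
            os.path.join(path, name)
            for name in os.listdir(path)
            if os.path.isfile(os.path.join(path, name))
        ]
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    records = []
    for file_path in sorted(found):
        # Assumption: keep each file independently with probability sample_ratio.
        if rng.random() < sample_ratio:
            with open(file_path, "rb") as handle:
                records.append((file_path, handle.read()))
    return records


def demo_counts():
    """Build a tiny directory tree and count files with and without recursion."""
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, "nested"))
        for rel in ("a.bin", "b.bin", os.path.join("nested", "c.bin")):
            with open(os.path.join(tmp, rel), "wb") as handle:
                handle.write(rel.encode("utf-8"))
        flat = sample_binary_files(tmp, recursive=False)
        deep = sample_binary_files(tmp, recursive=True)
        return len(flat), len(deep)
```

With the default ratio of 1.0 every file is kept, so the demo finds two files without recursion and three with it.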
-
BinaryFileReader.streamBinaryFiles(self, path, sampleRatio=1.0, inspectZip=True, seed=0)
Streams the directory of binary files from the local or remote (WASB) source. This function is attached to the SparkSession class.
Example: >>> spark.streamBinaryFiles(path, sampleRatio = 1.0, inspectZip = True)
Parameters: path (str) – Path to the file directory
Returns: DataFrame with a single column “value”; see BinaryFileSchema for details
Return type: DataFrame
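The difference between a one-off read and a stream is that a stream picks up files that arrive later. The plain-Python sketch below illustrates only that incremental idea by polling a directory and returning files it has not seen before. It is not how Spark streaming is implemented, and `poll_new_files` and `demo_stream` are hypothetical names.

```python
import os
import tempfile


def poll_new_files(path, seen):
    """Yield (path, bytes) for files in `path` not yet recorded in `seen`."""
    for name in sorted(os.listdir(path)):
        file_path = os.path.join(path, name)
        if os.path.isfile(file_path) and file_path not in seen:
            seen.add(file_path)  # remember the file so later polls skip it
            with open(file_path, "rb") as handle:
                yield file_path, handle.read()


def demo_stream():
    """Poll a directory twice, adding one file between the polls."""
    with tempfile.TemporaryDirectory() as tmp:
        seen = set()
        with open(os.path.join(tmp, "first.bin"), "wb") as handle:
            handle.write(b"1")
        first_batch = [os.path.basename(p) for p, _ in poll_new_files(tmp, seen)]
        with open(os.path.join(tmp, "second.bin"), "wb") as handle:
            handle.write(b"2")
        second_batch = [os.path.basename(p) for p, _ in poll_new_files(tmp, seen)]
        return first_batch, second_batch
```

Each poll returns only the files that appeared since the previous one, which is the behaviour a streaming DataFrame exposes as successive micro-batches.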