ImageReader
-
ImageReader.isImage(df, column) Returns True if the column contains images
Parameters: - df (DataFrame) – The DataFrame to be processed
- column (str) – The name of the column being inspected
Returns: True if the column is an image column
Return type: bool
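A minimal sketch of using isImage as a guard before image-specific processing. The import path of the ImageReader module is not stated on this page, so the module is passed in as the `reader` argument. `require_image_column` is a hypothetical helper name and is not part of the library.

```python
def require_image_column(reader, df, column):
    """Return df unchanged if `column` is an image column, else raise.

    `reader` is assumed to be the ImageReader module documented here,
    imported however your installation exposes it.
    """
    if not reader.isImage(df, column):
        raise ValueError(f"column {column!r} does not contain images")
    return df
```

Calling `require_image_column(ImageReader, df, "image")` at the top of a pipeline step surfaces a wrong column name immediately, rather than as a schema error further downstream.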
-
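ImageReader.readFromPaths(df, pathCol, imageCol='image') Reads images from a column of filenames
Parameters: - df (DataFrame) – The DataFrame to be processed
- pathCol (str) – The name of the column containing the image filenames
- imageCol (str) – The name of the output column for the loaded images
Returns: The DataFrame with loaded images
Return type: DataFrame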
-
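ImageReader.readFromStrings(df, bytesCol, imageCol='image', dropPrefix=False) Reads images from a column of strings containing the image data
Parameters: - df (DataFrame) – The DataFrame to be processed
- bytesCol (str) – The name of the column containing the image data
- imageCol (str) – The name of the output column for the loaded images
- dropPrefix (bool) – Whether to drop the prefix of each string
Returns: The DataFrame with loaded images
Return type: DataFrame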
-
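ImageReader.readImages(sparkSession, path, recursive=False, sampleRatio=1.0, inspectZip=True, seed=0) Reads a directory of images from a local or remote (WASB) source. This function is attached to the SparkSession class. Example: spark.readImages(path, recursive, …)
Parameters: - sparkSession (SparkSession) – Existing sparkSession
- path (str) – Path to the image directory
- recursive (bool) – Whether to search subdirectories recursively
- sampleRatio (double) – Fraction of the images loaded
- inspectZip (bool) – Whether to look inside zip archives
Returns: DataFrame with a single column of “images”; see imageSchema for details
Return type: DataFrame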
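A minimal sketch of calling readImages through the SparkSession, using only the keyword names from the signature above. The helper name `load_image_sample` and its default values are illustrative assumptions. It also assumes the library has already been imported, so that readImages is attached to the session.

```python
def load_image_sample(spark, path, ratio=0.1, seed=42):
    """Load a random sample of the images found under `path`.

    Assumes `spark` is a SparkSession with readImages attached,
    as described above. Helper name and defaults are illustrative.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    return spark.readImages(
        path,
        recursive=True,     # presumably also descends into subdirectories
        sampleRatio=ratio,  # fraction of the images loaded
        inspectZip=False,   # do not look inside zip archives in this sketch
        seed=seed,          # passed through; presumably makes the sample repeatable
    )
```

Passing the options by keyword avoids mixing up the positional order of `recursive`, `sampleRatio`, and `inspectZip`.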
-
ImageReader.streamImages(sparkSession, path, sampleRatio=1.0, inspectZip=True, seed=0) Reads a directory of images from a local or remote (WASB) source. This function is attached to the SparkSession class. Example: spark.streamImages(path, .5, …)
Parameters: - sparkSession (SparkSession) – Existing sparkSession
- path (str) – Path to the image directory
- sampleRatio (double) – Fraction of the images loaded
- inspectZip (bool) – Whether to look inside zip archives
Returns: DataFrame with a single column of “images”; see imageSchema for details
Return type: DataFrame
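To make the sampleRatio, seed, and inspectZip parameters more concrete, here is a pure-Python illustration of one plausible reading of them: list the files under a directory, optionally expand zip archives into their members, then keep each entry with a fixed probability using a seeded generator. This is not the library's implementation, and its actual sampling strategy is not documented on this page. The function names and the `archive!member` naming are assumptions made for the sketch.

```python
import os
import random
import zipfile


def list_image_candidates(root, inspect_zip=True):
    """Yield file names under `root`, optionally listing zip archive members."""
    for dirpath, dirs, files in os.walk(root):
        dirs.sort()  # visit subdirectories in a stable order
        for name in sorted(files):
            full = os.path.join(dirpath, name)
            if inspect_zip and zipfile.is_zipfile(full):
                with zipfile.ZipFile(full) as archive:
                    for member in archive.namelist():
                        if not member.endswith("/"):  # skip directory entries
                            yield f"{full}!{member}"
            else:
                yield full


def sample_paths(paths, sample_ratio=1.0, seed=0):
    """Keep each path independently with probability `sample_ratio`."""
    rng = random.Random(seed)  # same seed, same input order: same selection
    return [p for p in paths if rng.random() < sample_ratio]
```

With `sample_ratio=1.0` every entry is kept, and repeating a call with the same seed over the same listing returns the same subset.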