ImageReader

ImageReader.isImage(df, column)[source]

Returns True if the column contains images

Parameters:
  • df (DataFrame) – The DataFrame to be processed
  • column (str) – The name of the column being inspected
Returns:

True if the column is an image column

Return type:

bool
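
As a rough illustration of isImage, the sketch below collects the names of every image column in a DataFrame. The helper name image_columns and its image_reader argument are hypothetical. image_reader stands in for the ImageReader module documented here and is passed in explicitly so that no import path is assumed.

```python
def image_columns(df, image_reader):
    """List the columns of df that image_reader.isImage reports as images.

    image_reader is assumed to be the ImageReader module documented above;
    image_columns is a hypothetical helper, not part of the library.
    """
    # df.columns is the standard PySpark list of column names.
    return [name for name in df.columns if image_reader.isImage(df, name)]
```

With a live session this might be called as image_columns(df, ImageReader), after importing ImageReader from wherever the installed package exposes it.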

ImageReader.readFromPaths(df, pathCol, imageCol='image')[source]

Reads images from a column of filenames

Parameters:
  • df (DataFrame) – The DataFrame to be processed
  • pathCol (str) – The name of the column containing filenames
  • imageCol (str) – The name of the added column of images
Returns:

The DataFrame with loaded images

Return type:

DataFrame
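
A minimal sketch of calling readFromPaths, assuming the input DataFrame already holds one filename per row. The wrapper load_images, the default column name "filename", and the image_reader argument (standing in for the ImageReader module) are illustrative assumptions, not part of the documented API.

```python
def load_images(df, image_reader, path_col="filename", image_col="image"):
    """Return df with an extra column of images loaded from path_col.

    Hypothetical wrapper: image_reader is assumed to be the ImageReader
    module, and "filename" is only an example column name.
    """
    # Fail early with a clear message if the path column is absent.
    if path_col not in df.columns:
        raise ValueError(f"missing path column: {path_col!r}")
    # Delegates to the documented readFromPaths(df, pathCol, imageCol='image').
    return image_reader.readFromPaths(df, path_col, imageCol=image_col)
```

Checking for the path column first keeps the error on the Python side instead of surfacing later inside the Spark job.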

ImageReader.readFromStrings(df, bytesCol, imageCol='image', dropPrefix=False)[source]

Reads images from a column of encoded image strings

Parameters:
  • df (DataFrame) – The DataFrame to be processed
  • bytesCol (str) – The name of the column containing the encoded image data
  • imageCol (str) – The name of the added column of images
  • dropPrefix (bool) – Whether to strip a prefix from each string before decoding
Returns:

The DataFrame with loaded images

Return type:

DataFrame

ImageReader.readImages(sparkSession, path, recursive=False, sampleRatio=1.0, inspectZip=True, seed=0)[source]

Reads a directory of images from a local or remote (WASB) source. This function is attached to the SparkSession class. Example: spark.readImages(path, recursive, …)

Parameters:
  • sparkSession (SparkSession) – Existing sparkSession
  • path (str) – Path to the image directory
  • recursive (bool) – Recursive search flag
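  • sampleRatio (double) – Fraction of the images loaded
  • inspectZip (bool) – Whether to look inside zip folders
  • seed (int) – Random seed used when sampling images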
Returns:

DataFrame with a single column of “images”, see imageSchema for details

Return type:

DataFrame
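
The call pattern for readImages can be sketched as follows, using keyword arguments taken from the signature above. The wrapper sample_images and its fraction check are hypothetical, and the sketch assumes that seed drives the random sampling selected by sampleRatio.

```python
def sample_images(spark, path, fraction=0.1, seed=0):
    """Load a sampled subset of the images under path, searching recursively.

    Hypothetical wrapper around spark.readImages, which the documentation
    says is attached to SparkSession. Assumes seed controls the sampling.
    """
    # sampleRatio is a fraction of the images, so keep it within (0, 1].
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in the interval (0, 1]")
    return spark.readImages(
        path,
        recursive=True,
        sampleRatio=fraction,
        inspectZip=True,
        seed=seed,
    )
```

Passing the options by keyword avoids mixing up the positional order of recursive, sampleRatio, inspectZip, and seed.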

ImageReader.streamImages(sparkSession, path, sampleRatio=1.0, inspectZip=True, seed=0)[source]

Reads a directory of images from a local or remote (WASB) source as a stream. This function is attached to the SparkSession class. Example: spark.streamImages(path, .5, …)

Parameters:
  • sparkSession (SparkSession) – Existing sparkSession
  • path (str) – Path to the image directory
  • sampleRatio (double) – Fraction of the images loaded
  • inspectZip (bool) – Whether to look inside zip folders
  • seed (int) – Random seed used when sampling images
Returns:

DataFrame with a single column of “images”, see imageSchema for details

Return type:

DataFrame
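
A hedged sketch of opening an image stream with streamImages. It assumes the returned DataFrame is a streaming DataFrame, which the function name suggests but the text above does not state. The wrapper open_image_stream is hypothetical, and isStreaming is the standard PySpark DataFrame property.

```python
def open_image_stream(spark, path, fraction=1.0):
    """Open a stream of images from path and check that it is streaming.

    Hypothetical wrapper around spark.streamImages. The isStreaming check
    encodes an assumption about the result, not documented behavior.
    """
    stream_df = spark.streamImages(
        path,
        sampleRatio=fraction,
        inspectZip=True,
        seed=0,
    )
    # Assumption: streamImages yields a streaming DataFrame.
    if not stream_df.isStreaming:
        raise RuntimeError("expected a streaming DataFrame from streamImages")
    return stream_df
```

If the assumption holds, the result would be consumed through the usual Structured Streaming writer (stream_df.writeStream) instead of batch actions such as collect.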