RecognizeText

class RecognizeText.RecognizeText(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=100.0, errorCol=None, imageBytes=None, imageUrl=None, maxPollingRetries=1000, mode=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters:
  • backoffs (list) – array of backoffs to use in the handler (default: [100, 500, 1000])
  • concurrency (int) – max number of concurrent calls (default: 1)
  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
  • errorCol (str) – column to hold HTTP errors (default: [self.uid]_error)
  • imageBytes (object) – bytestream of the image to use
  • imageUrl (object) – the url of the image to use
  • maxPollingRetries (int) – number of times to poll (default: 1000)
  • mode (object) – If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed
  • outputCol (str) – The name of the output column (default: [self.uid]_output)
  • subscriptionKey (object) – the API key to use
  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)
  • url (str) – URL of the service
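A minimal configuration sketch using only setters listed on this page. The helper name `configure_recognizer`, the column names `url`, `ocr` and `ocr_error`, and the import path in the trailing comment are assumptions for illustration; they are not taken from this reference.

```python
def configure_recognizer(recognizer, subscription_key, service_url,
                         image_url_col="url", mode="Printed"):
    """Apply a typical configuration to a RecognizeText-like object.

    `recognizer` is expected to expose the setters documented on this page.
    Each setter is called on its own line, so the helper does not rely on
    setters returning the transformer.
    """
    if mode not in ("Printed", "Handwritten"):
        raise ValueError("mode must be 'Printed' or 'Handwritten'")
    recognizer.setSubscriptionKey(subscription_key)  # one key for all rows
    recognizer.setUrl(service_url)                   # endpoint of the service
    recognizer.setMode(mode)                         # same mode for all rows
    recognizer.setImageUrlCol(image_url_col)         # image URL read per row
    recognizer.setOutputCol("ocr")                   # hypothetical column name
    recognizer.setErrorCol("ocr_error")              # hypothetical column name
    return recognizer


# Hypothetical usage; requires pyspark and mmlspark, and the import path
# is an assumption:
#   from mmlspark import RecognizeText
#   ocr = configure_recognizer(RecognizeText(), my_key, my_service_url)
#   result = ocr.transform(df)   # df has a string column named "url"
```

Here the subscription key and mode are set as fixed values, while the image location is taken from a DataFrame column through the `...Col` variant of the setter.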
getBackoffs()[source]
Returns:array of backoffs to use in the handler (default: [100, 500, 1000])
Return type:list
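This page does not say how the handler applies the backoffs. One plausible reading, sketched below in plain Python, is that each entry is a delay in milliseconds before the next retry of a failed request. The function `call_with_backoffs`, the millisecond unit, and the choice of `IOError` as the retryable error are all assumptions, not the library's implementation.

```python
import time


def call_with_backoffs(request, backoffs=(100, 500, 1000), sleep=time.sleep):
    """Call request() once, then retry once after each backoff delay.

    Delays are interpreted as milliseconds (an assumption). The last error
    is re-raised when every attempt fails.
    """
    last_error = None
    for delay_ms in (0, *backoffs):
        if delay_ms:
            sleep(delay_ms / 1000.0)  # wait before the retry
        try:
            return request()
        except IOError as err:        # treated as a transient failure here
            last_error = err
    raise last_error
```

With the default list this makes at most four attempts: the initial call plus one retry per backoff entry.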
getConcurrency()[source]
Returns:max number of concurrent calls (default: 1)
Return type:int
getConcurrentTimeout()[source]
Returns:max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
Return type:double
getErrorCol()[source]
Returns:column to hold HTTP errors (default: [self.uid]_error)
Return type:str
getImageBytes()[source]
Returns:bytestream of the image to use
Return type:object
getImageUrl()[source]
Returns:the url of the image to use
Return type:object
static getJavaPackage()[source]

Returns package name String.

getMaxPollingRetries()[source]
Returns:number of times to poll (default: 1000)
Return type:int
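The polling loop itself is not described on this page. As a rough illustration of what a bounded poll count means, the sketch below checks for a result up to a fixed number of times and gives up afterwards. The names `poll_until_done`, `fetch_result` and `wait` are hypothetical, and returning `None` on exhaustion is an assumption.

```python
def poll_until_done(fetch_result, max_polling_retries=1000, wait=lambda: None):
    """Poll fetch_result() until it returns a non-None value.

    Returns the result, or None if nothing arrived within
    max_polling_retries polls. wait() runs between polls.
    """
    for _ in range(max_polling_retries):
        result = fetch_result()
        if result is not None:
            return result
        wait()  # pause before polling again
    return None
```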
getMode()[source]
Returns:If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed
Return type:object
getOutputCol()[source]
Returns:The name of the output column (default: [self.uid]_output)
Return type:str
getSubscriptionKey()[source]
Returns:the API key to use
Return type:object
getTimeout()[source]
Returns:number of seconds to wait before closing the connection (default: 60.0)
Return type:double
getUrl()[source]
Returns:URL of the service
Return type:str
classmethod read()[source]

Returns an MLReader instance for this class.

setBackoffs(value)[source]
Parameters:backoffs (list) – array of backoffs to use in the handler (default: [100, 500, 1000])
setConcurrency(value)[source]
Parameters:concurrency (int) – max number of concurrent calls (default: 1)
setConcurrentTimeout(value)[source]
Parameters:concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
setErrorCol(value)[source]
Parameters:errorCol (str) – column to hold HTTP errors (default: [self.uid]_error)
setImageBytes(value)[source]
Parameters:imageBytes (object) – bytestream of the image to use
setImageBytesCol(value)[source]
Parameters:imageBytes (str) – name of the column that holds the bytestream of the image to use
setImageUrl(value)[source]
Parameters:imageUrl (object) – the url of the image to use
setImageUrlCol(value)[source]
Parameters:imageUrl (str) – name of the column that holds the URL of the image to use
setMaxPollingRetries(value)[source]
Parameters:maxPollingRetries (int) – number of times to poll (default: 1000)
setMode(value)[source]
Parameters:mode (object) – If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed
setModeCol(value)[source]
Parameters:mode (str) – name of the column that holds the mode. If the value is ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed
setOutputCol(value)[source]
Parameters:outputCol (str) – The name of the output column (default: [self.uid]_output)
setParams(backoffs=[100, 500, 1000], concurrency=1, concurrentTimeout=100.0, errorCol=None, imageBytes=None, imageUrl=None, maxPollingRetries=1000, mode=None, outputCol=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Set the (keyword only) parameters

Parameters:
  • backoffs (list) – array of backoffs to use in the handler (default: [100, 500, 1000])
  • concurrency (int) – max number of concurrent calls (default: 1)
  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
  • errorCol (str) – column to hold HTTP errors (default: [self.uid]_error)
  • imageBytes (object) – bytestream of the image to use
  • imageUrl (object) – the url of the image to use
  • maxPollingRetries (int) – number of times to poll (default: 1000)
  • mode (object) – If this parameter is set to ‘Printed’, printed text recognition is performed. If ‘Handwritten’ is specified, handwriting recognition is performed
  • outputCol (str) – The name of the output column (default: [self.uid]_output)
  • subscriptionKey (object) – the API key to use
  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)
  • url (str) – URL of the service
setSubscriptionKey(value)[source]
Parameters:subscriptionKey (object) – the API key to use
setSubscriptionKeyCol(value)[source]
Parameters:subscriptionKey (str) – name of the column that holds the API key to use
setTimeout(value)[source]
Parameters:timeout (double) – number of seconds to wait before closing the connection (default: 60.0)
setUrl(value)[source]
Parameters:url (str) – URL of the service