AddDocuments

class AddDocuments.AddDocuments(actionCol='@search.action', batchSize=100, concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, indexName=None, outputCol=None, serviceName=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer

Parameters:
  • actionCol (str) – Name of the column that holds the index action for each document. You can combine actions, such as an upload and a delete, in the same batch. (default: @search.action)
      • upload: Similar to an ‘upsert’: the document is inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
      • merge: Updates an existing document with the specified fields. If the document doesn’t exist, the merge fails. Any field you specify in a merge replaces the existing field in the document, including fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field is [‘economy’, ‘pool’], not [‘budget’, ‘economy’, ‘pool’].
      • mergeOrUpload: Behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document.
      • delete: Removes the specified document from the index. Any field you specify in a delete operation, other than the key field, is ignored. To remove an individual field from a document, use merge instead and set the field explicitly to null.
  • batchSize (int) – The max size of the buffer (default: 100)
  • concurrency (int) – max number of concurrent calls (default: 1)
  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
  • errorCol (str) – column to hold http errors (default: [self.uid]_error)
  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))
  • indexName (str) –
  • outputCol (str) – The name of the output column (default: [self.uid]_output)
  • serviceName (str) –
  • subscriptionKey (object) – the API key to use
  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)
  • url (str) – Url of the service
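The four action values described for actionCol can be illustrated with a small in-memory simulation. This is only a sketch of the documented semantics, not how AddDocuments or the search service is implemented; `apply_batch`, the `id` key field, and the dict-based index are hypothetical names introduced for illustration.

```python
from typing import Any, Dict, List

Index = Dict[str, Dict[str, Any]]


def apply_batch(index: Index, batch: List[Dict[str, Any]],
                key_field: str = "id",
                action_col: str = "@search.action") -> List[bool]:
    """Apply a mixed batch of actions to a dict-based stand-in for an index.

    Returns one success flag per document, in batch order.
    """
    results = []
    for doc in batch:
        action = doc[action_col]
        key = doc[key_field]
        # Everything except the action marker is treated as document fields.
        fields = {k: v for k, v in doc.items() if k != action_col}
        if action == "upload":
            index[key] = fields              # insert, or replace all fields
            results.append(True)
        elif action == "merge":
            if key not in index:
                results.append(False)        # merge fails if the document is missing
            else:
                index[key].update(fields)    # given fields replace existing values
                results.append(True)
        elif action == "mergeOrUpload":
            if key in index:
                index[key].update(fields)
            else:
                index[key] = fields
            results.append(True)
        elif action == "delete":
            index.pop(key, None)             # only the key field matters here
            results.append(True)
        else:
            raise ValueError(f"unknown action: {action!r}")
    return results
```

With an index that holds a document whose ‘tags’ field is [‘budget’], a merge carrying [‘economy’, ‘pool’] leaves ‘tags’ as [‘economy’, ‘pool’], while the document's other fields are untouched.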
getActionCol()[source]
Returns: Name of the column that holds the index action for each document. You can combine actions, such as an upload and a delete, in the same batch. (default: @search.action)
  • upload: Similar to an ‘upsert’: the document is inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
  • merge: Updates an existing document with the specified fields. If the document doesn’t exist, the merge fails. Any field you specify in a merge replaces the existing field in the document, including fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field is [‘economy’, ‘pool’], not [‘budget’, ‘economy’, ‘pool’].
  • mergeOrUpload: Behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document.
  • delete: Removes the specified document from the index. Any field you specify in a delete operation, other than the key field, is ignored. To remove an individual field from a document, use merge instead and set the field explicitly to null.
Return type: str
getBatchSize()[source]
Returns: The max size of the buffer (default: 100)
Return type: int
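One plausible reading of batchSize is a cap on how many rows are buffered before a request is sent. The sketch below shows that kind of chunking in plain Python; it assumes this interpretation and does not reflect the class's actual buffering code. The helper name `batches` is hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(rows: Iterable[T], batch_size: int = 100) -> Iterator[List[T]]:
    """Yield consecutive lists of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))  # take up to batch_size rows
        if not chunk:
            return                            # input exhausted
        yield chunk
```

Under this reading, 250 rows with the default size of 100 would be sent as three requests, the last one partially filled.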
getConcurrency()[source]
Returns: max number of concurrent calls (default: 1)
Return type: int
getConcurrentTimeout()[source]
Returns: max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
Return type: double
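The interaction between concurrency and concurrentTimeout can be pictured with Python's standard `concurrent.futures`: a bounded pool issues the calls, and each future is awaited for at most the timeout. This is an analogy under assumed semantics, not the library's implementation; `send_all` and its `send` callback are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

B = TypeVar("B")
R = TypeVar("R")


def send_all(batches: Sequence[B], send: Callable[[B], R],
             concurrency: int = 1,
             concurrent_timeout: float = 100.0) -> List[R]:
    """Run send(batch) for every batch with at most `concurrency` calls in flight.

    Waiting on a future longer than concurrent_timeout seconds raises
    concurrent.futures.TimeoutError.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send, batch) for batch in batches]
        # Results are collected in submission order.
        return [f.result(timeout=concurrent_timeout) for f in futures]
```

Raising concurrency trades request ordering guarantees and service load for throughput, so a low default is a conservative choice.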
getErrorCol()[source]
Returns: column to hold HTTP errors (default: [self.uid]_error)
Return type: str
getHandler()[source]
Returns: Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))
Return type: object
getIndexName()[source]
Returns:
Return type: str
static getJavaPackage()[source]

Returns the package name as a string.

getOutputCol()[source]
Returns: The name of the output column (default: [self.uid]_output)
Return type: str
getServiceName()[source]
Returns:
Return type: str
getSubscriptionKey()[source]
Returns: the API key to use
Return type: object
getTimeout()[source]
Returns: number of seconds to wait before closing the connection (default: 60.0)
Return type: double
getUrl()[source]
Returns: URL of the service
Return type: str
classmethod read()[source]

Returns an MLReader instance for this class.

setActionCol(value)[source]
Parameters: actionCol (str) – Name of the column that holds the index action for each document. You can combine actions, such as an upload and a delete, in the same batch. (default: @search.action)
  • upload: Similar to an ‘upsert’: the document is inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
  • merge: Updates an existing document with the specified fields. If the document doesn’t exist, the merge fails. Any field you specify in a merge replaces the existing field in the document, including fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field is [‘economy’, ‘pool’], not [‘budget’, ‘economy’, ‘pool’].
  • mergeOrUpload: Behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document.
  • delete: Removes the specified document from the index. Any field you specify in a delete operation, other than the key field, is ignored. To remove an individual field from a document, use merge instead and set the field explicitly to null.
setBatchSize(value)[source]
Parameters: batchSize (int) – The max size of the buffer (default: 100)
setConcurrency(value)[source]
Parameters: concurrency (int) – max number of concurrent calls (default: 1)
setConcurrentTimeout(value)[source]
Parameters: concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
setErrorCol(value)[source]
Parameters: errorCol (str) – column to hold HTTP errors (default: [self.uid]_error)
setHandler(value)[source]
Parameters: handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))
setIndexName(value)[source]
Parameters: indexName (str) –
setOutputCol(value)[source]
Parameters: outputCol (str) – The name of the output column (default: [self.uid]_output)
setParams(actionCol='@search.action', batchSize=100, concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, indexName=None, outputCol=None, serviceName=None, subscriptionKey=None, timeout=60.0, url=None)[source]

Sets the (keyword-only) parameters.

Parameters:
  • actionCol (str) – Name of the column that holds the index action for each document. You can combine actions, such as an upload and a delete, in the same batch. (default: @search.action)
      • upload: Similar to an ‘upsert’: the document is inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
      • merge: Updates an existing document with the specified fields. If the document doesn’t exist, the merge fails. Any field you specify in a merge replaces the existing field in the document, including fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field is [‘economy’, ‘pool’], not [‘budget’, ‘economy’, ‘pool’].
      • mergeOrUpload: Behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document.
      • delete: Removes the specified document from the index. Any field you specify in a delete operation, other than the key field, is ignored. To remove an individual field from a document, use merge instead and set the field explicitly to null.
  • batchSize (int) – The max size of the buffer (default: 100)
  • concurrency (int) – max number of concurrent calls (default: 1)
  • concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
  • errorCol (str) – column to hold http errors (default: [self.uid]_error)
  • handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))
  • indexName (str) –
  • outputCol (str) – The name of the output column (default: [self.uid]_output)
  • serviceName (str) –
  • subscriptionKey (object) – the API key to use
  • timeout (double) – number of seconds to wait before closing the connection (default: 60.0)
  • url (str) – Url of the service
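To show how serviceName, indexName, subscriptionKey, and the action column might fit together, the sketch below assembles a request in the shape used by the public Azure Search "add, update or delete documents" REST operation. Whether AddDocuments builds its requests exactly this way is not stated on this page; `build_index_request` is a hypothetical helper, and the `api_version` default is a placeholder assumption.

```python
import json
from typing import Any, Dict, List, Tuple


def build_index_request(service_name: str, index_name: str, api_key: str,
                        docs: List[Dict[str, Any]],
                        api_version: str = "2017-11-11"
                        ) -> Tuple[str, Dict[str, str], str]:
    """Return (url, headers, body) for one batch of documents.

    Each document is expected to carry its action under "@search.action".
    The api_version default is an assumed placeholder.
    """
    url = (f"https://{service_name}.search.windows.net"
           f"/indexes/{index_name}/docs/index?api-version={api_version}")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    body = json.dumps({"value": docs})  # the batch goes under the "value" key
    return url, headers, body
```

Nothing is sent here; the function only builds the pieces, which keeps the example free of network access and real credentials.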
setServiceName(value)[source]
Parameters: serviceName (str) –
setSubscriptionKey(value)[source]
Parameters: subscriptionKey (object) – the API key to use
setSubscriptionKeyCol(value)[source]
Parameters: subscriptionKey (object) – the API key to use
setTimeout(value)[source]
Parameters: timeout (double) – number of seconds to wait before closing the connection (default: 60.0)
setUrl(value)[source]
Parameters: url (str) – URL of the service