AddDocuments
-
class AddDocuments.AddDocuments(actionCol='@search.action', batchSize=100, concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, indexName=None, outputCol=None, serviceName=None, subscriptionKey=None, timeout=60.0, url=None)
Bases: mmlspark.Utils.ComplexParamsMixin, pyspark.ml.util.JavaMLReadable, pyspark.ml.util.JavaMLWritable, pyspark.ml.wrapper.JavaTransformer
Parameters: - actionCol (str) – You can combine actions, such as an upload and a delete, in the same batch (default: @search.action). The available actions are:
    - upload: An upload action is similar to an ‘upsert’ where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
    - merge: Merge updates an existing document with the specified fields. If the document doesn’t exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field will be [‘economy’, ‘pool’]. It will not be [‘budget’, ‘economy’, ‘pool’].
    - mergeOrUpload: This action behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document.
    - delete: Delete removes the specified document from the index. Note that any field you specify in a delete operation, other than the key field, will be ignored. If you want to remove an individual field from a document, use merge instead and simply set the field explicitly to null.
- batchSize (int) – The max size of the buffer (default: 100)
- concurrency (int) – max number of concurrent calls (default: 1)
- concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
- errorCol (str) – column to hold http errors (default: [self.uid]_error)
- handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))
- indexName (str) –
- outputCol (str) – The name of the output column (default: [self.uid]_output)
- serviceName (str) –
- subscriptionKey (object) – the API key to use
- timeout (double) – number of seconds to wait before closing the connection (default: 60.0)
- url (str) – URL of the service
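A minimal sketch of how these parameters might be gathered before constructing the transformer. The import path `mmlspark.AddDocuments`, the service name, index name, and key are placeholders or assumptions for illustration, as is passing a plain string for `subscriptionKey`. Construction is wrapped in a function so the block runs even where Spark and mmlspark are not installed.

```python
# Keyword arguments mirroring the constructor signature documented above.
# Values marked "placeholder" are hypothetical and must be replaced.
params = {
    "subscriptionKey": "<your-api-key>",  # placeholder
    "serviceName": "my-search-service",   # placeholder
    "indexName": "my-index",              # placeholder
    "actionCol": "@search.action",        # documented default
    "batchSize": 100,                     # documented default
    "concurrency": 1,                     # documented default
    "timeout": 60.0,                      # documented default
}


def build_add_documents(kwargs):
    """Construct the transformer lazily.

    Assumption: the class is importable from ``mmlspark.AddDocuments``
    and an active Spark session is available when this is called.
    """
    from mmlspark.AddDocuments import AddDocuments
    return AddDocuments(**kwargs)
```

Calling `build_add_documents(params)` would then return a transformer whose `transform` method can be applied to a DataFrame that contains the action column.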
-
getActionCol()
Returns: You can combine actions, such as an upload and a delete, in the same batch (default: @search.action). The available actions are:
    - upload: An upload action is similar to an ‘upsert’ where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
    - merge: Merge updates an existing document with the specified fields. If the document doesn’t exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field will be [‘economy’, ‘pool’]. It will not be [‘budget’, ‘economy’, ‘pool’].
    - mergeOrUpload: This action behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document.
    - delete: Delete removes the specified document from the index. Note that any field you specify in a delete operation, other than the key field, will be ignored. If you want to remove an individual field from a document, use merge instead and simply set the field explicitly to null.
Return type: str
-
getConcurrentTimeout()
Returns: max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
Return type: double
-
getErrorCol()
Returns: column to hold http errors (default: [self.uid]_error)
Return type: str
-
getHandler()
Returns: Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))
Return type: object
-
getOutputCol()
Returns: The name of the output column (default: [self.uid]_output)
Return type: str
-
getTimeout()
Returns: number of seconds to wait before closing the connection (default: 60.0)
Return type: double
-
setActionCol(value)
Parameters: actionCol (str) – You can combine actions, such as an upload and a delete, in the same batch (default: @search.action). The available actions are:
    - upload: An upload action is similar to an ‘upsert’ where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
    - merge: Merge updates an existing document with the specified fields. If the document doesn’t exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field will be [‘economy’, ‘pool’]. It will not be [‘budget’, ‘economy’, ‘pool’].
    - mergeOrUpload: This action behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document.
    - delete: Delete removes the specified document from the index. Note that any field you specify in a delete operation, other than the key field, will be ignored. If you want to remove an individual field from a document, use merge instead and simply set the field explicitly to null.
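The action semantics described for `actionCol` can be illustrated with a small in-memory simulation. This is only a sketch of the documented behavior, not the service's or the library's implementation: the `apply_actions` helper, the `id` key field, and the sample documents are hypothetical, and a failed merge raises here for simplicity.

```python
def apply_actions(index, batch, key="id", action_col="@search.action"):
    """Apply a mixed batch of actions to a dict that maps key -> document."""
    for doc in batch:
        action = doc[action_col]
        # Everything except the action marker is document content.
        fields = {k: v for k, v in doc.items() if k != action_col}
        doc_key = fields[key]
        exists = doc_key in index
        if action == "upload":
            # Insert if new; otherwise replace the whole document.
            index[doc_key] = fields
        elif action == "merge":
            if not exists:
                raise KeyError("merge failed: no document with key %r" % doc_key)
            # Each given field replaces the stored field, collections included.
            index[doc_key].update(fields)
        elif action == "mergeOrUpload":
            if exists:
                index[doc_key].update(fields)
            else:
                index[doc_key] = fields
        elif action == "delete":
            # Only the key matters; any other fields are ignored.
            index.pop(doc_key, None)
        else:
            raise ValueError("unknown action: %r" % action)
    return index


index = {"1": {"id": "1", "name": "Inn", "tags": ["budget"]}}
batch = [
    {"@search.action": "merge", "id": "1", "tags": ["economy", "pool"]},
    {"@search.action": "upload", "id": "2", "name": "Lodge"},
    {"@search.action": "delete", "id": "2", "name": "ignored"},
]
apply_actions(index, batch)
```

After this batch, document `1` keeps its `name` but its `tags` are replaced rather than appended to, and document `2` is uploaded and then deleted within the same batch.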
-
setConcurrency(value)
Parameters: concurrency (int) – max number of concurrent calls (default: 1)
-
setConcurrentTimeout(value)
Parameters: concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
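How `concurrency` and `concurrentTimeout` interact can be pictured with a plain-Python analogy using `concurrent.futures`. This is an assumption-laden sketch: the library's own implementation is not shown in this reference, and `send_batch` is a hypothetical stand-in for one HTTP call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send_batch(batch):
    """Hypothetical stand-in for one request; returns the batch size."""
    time.sleep(0.01)
    return len(batch)


def send_all(batches, concurrency=1, concurrent_timeout=100.0):
    """Run at most `concurrency` calls at once, waiting up to
    `concurrent_timeout` seconds on each future."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(send_batch, b) for b in batches]
        # result() raises concurrent.futures.TimeoutError if the wait expires.
        return [f.result(timeout=concurrent_timeout) for f in futures]


results = send_all([[1, 2], [3], [4, 5, 6]], concurrency=2, concurrent_timeout=5.0)
```

Results come back in submission order because the futures are collected in the order they were created.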
-
setErrorCol(value)
Parameters: errorCol (str) – column to hold http errors (default: [self.uid]_error)
-
setHandler(value)
Parameters: handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))
-
setOutputCol(value)
Parameters: outputCol (str) – The name of the output column (default: [self.uid]_output)
-
setParams(actionCol='@search.action', batchSize=100, concurrency=1, concurrentTimeout=100.0, errorCol=None, handler=None, indexName=None, outputCol=None, serviceName=None, subscriptionKey=None, timeout=60.0, url=None)
Set the (keyword only) parameters
Parameters: - actionCol (str) – You can combine actions, such as an upload and a delete, in the same batch (default: @search.action). The available actions are:
    - upload: An upload action is similar to an ‘upsert’ where the document will be inserted if it is new and updated/replaced if it exists. Note that all fields are replaced in the update case.
    - merge: Merge updates an existing document with the specified fields. If the document doesn’t exist, the merge will fail. Any field you specify in a merge will replace the existing field in the document. This includes fields of type Collection(Edm.String). For example, if the document contains a field ‘tags’ with value [‘budget’] and you execute a merge with value [‘economy’, ‘pool’] for ‘tags’, the final value of the ‘tags’ field will be [‘economy’, ‘pool’]. It will not be [‘budget’, ‘economy’, ‘pool’].
    - mergeOrUpload: This action behaves like merge if a document with the given key already exists in the index. If the document does not exist, it behaves like upload with a new document.
    - delete: Delete removes the specified document from the index. Note that any field you specify in a delete operation, other than the key field, will be ignored. If you want to remove an individual field from a document, use merge instead and simply set the field explicitly to null.
- batchSize (int) – The max size of the buffer (default: 100)
- concurrency (int) – max number of concurrent calls (default: 1)
- concurrentTimeout (double) – max number of seconds to wait on futures if concurrency >= 1 (default: 100.0)
- errorCol (str) – column to hold http errors (default: [self.uid]_error)
- handler (object) – Which strategy to use when handling requests (default: UserDefinedFunction(<function2>,StringType,None))
- indexName (str) –
- outputCol (str) – The name of the output column (default: [self.uid]_output)
- serviceName (str) –
- subscriptionKey (object) – the API key to use
- timeout (double) – number of seconds to wait before closing the connection (default: 60.0)
- url (str) – URL of the service
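The effect of `batchSize` as a maximum buffer size can be sketched in plain Python. This is an illustration of buffering rows into groups of at most `batchSize`, under the assumption that a partial final buffer is still sent; the `batched` helper is hypothetical and not part of the library.

```python
def batched(rows, batch_size=100):
    """Yield lists of at most `batch_size` rows, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    buffer = []
    for row in rows:
        buffer.append(row)
        if len(buffer) == batch_size:
            yield buffer
            buffer = []
    if buffer:
        # Flush whatever is left once the input is exhausted.
        yield buffer


sizes = [len(b) for b in batched(range(250), batch_size=100)]
```

With 250 rows and the default buffer size of 100, this yields two full batches followed by one batch of 50.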