com.microsoft.ml.spark

TextFeaturizer

class TextFeaturizer extends Estimator[TextFeaturizerModel] with TextFeaturizerParams with HasInputCol with HasOutputCol

Featurize text.

Linear Supertypes
HasOutputCol, HasInputCol, TextFeaturizerParams, DefaultParamsWritable, MLWritable, Wrappable, Estimator[TextFeaturizerModel], PipelineStage, org.apache.spark.internal.Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new TextFeaturizer()

  2. new TextFeaturizer(uid: String)

    uid

    The id of the module

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T

    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  5. def BooleanParam(i: Identifiable, name: String, description: String, default: Boolean): BooleanParam

    Definition Classes
    Wrappable
  6. def BooleanParam(i: Identifiable, name: String, description: String): BooleanParam

    Definition Classes
    Wrappable
  7. def DoubleParam(i: Identifiable, name: String, description: String, default: Double): DoubleParam

    Definition Classes
    Wrappable
  8. def DoubleParam(i: Identifiable, name: String, description: String): DoubleParam

    Definition Classes
    Wrappable
  9. def IntParam(i: Identifiable, name: String, description: String, validation: (Int) ⇒ Boolean): IntParam

    Definition Classes
    Wrappable
  10. def IntParam(i: Identifiable, name: String, description: String, default: Int): IntParam

    Definition Classes
    Wrappable
  11. def IntParam(i: Identifiable, name: String, description: String): IntParam

    Definition Classes
    Wrappable
  12. def LongParam(i: Identifiable, name: String, description: String, default: Long): LongParam

    Definition Classes
    Wrappable
  13. def LongParam(i: Identifiable, name: String, description: String): LongParam

    Definition Classes
    Wrappable
  14. def StringParam(i: Identifiable, name: String, description: String, default: String, domain: Seq[String]): Param[String]

    Definition Classes
    Wrappable
  15. def StringParam(i: Identifiable, name: String, description: String, default: String): Param[String]

    Definition Classes
    Wrappable
  16. def StringParam(i: Identifiable, name: String, description: String, validation: (String) ⇒ Boolean): Param[String]

    Definition Classes
    Wrappable
  17. def StringParam(i: Identifiable, name: String, description: String): Param[String]

    Definition Classes
    Wrappable
  18. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  19. val binary: BooleanParam

    All nonzero word counts are set to 1 when set to true.

    Definition Classes
    TextFeaturizerParams
  20. val caseSensitiveStopWords: BooleanParam

    Indicates whether a case sensitive comparison is performed on stop words.

    Definition Classes
    TextFeaturizerParams
  21. def chainedUid(origin: String): String

    Definition Classes
    Wrappable
  22. final def clear(param: Param[_]): TextFeaturizer.this.type

    Definition Classes
    Params
  23. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  24. def copy(extra: ParamMap): Estimator[TextFeaturizerModel]

    Definition Classes
    TextFeaturizer → Estimator → PipelineStage → Params
  25. def copyValues[T <: Params](to: T, extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  26. final def defaultCopy[T <: Params](extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  27. val defaultStopWordLanguage: Param[String]

    Specify the language to use for stop word removal. Use the custom setting when using the stopWords input.

    Definition Classes
    TextFeaturizerParams
  28. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  29. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  30. def explainParam(param: Param[_]): String

    Definition Classes
    Params
  31. def explainParams(): String

    Definition Classes
    Params
  32. final def extractParamMap(): ParamMap

    Definition Classes
    Params
  33. final def extractParamMap(extra: ParamMap): ParamMap

    Definition Classes
    Params
  34. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  35. def fit(dataset: Dataset[_]): TextFeaturizerModel

    Definition Classes
    TextFeaturizer → Estimator
  36. def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[TextFeaturizerModel]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  37. def fit(dataset: Dataset[_], paramMap: ParamMap): TextFeaturizerModel

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  38. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): TextFeaturizerModel

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  39. final def get[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  40. final def getBinary: Boolean

    Definition Classes
    TextFeaturizerParams
  41. final def getCaseSensitiveStopWords: Boolean

    Definition Classes
    TextFeaturizerParams
  42. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  43. final def getDefault[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  44. final def getDefaultStopWordLanguage: String

    Definition Classes
    TextFeaturizerParams
  45. def getInputCol: String

    Definition Classes
    HasInputCol
  46. final def getMinDocFreq: Int

    Definition Classes
    TextFeaturizerParams
  47. final def getMinTokenLength: Int

    Definition Classes
    TextFeaturizerParams
  48. final def getNGramLength: Int

    Definition Classes
    TextFeaturizerParams
  49. final def getNumFeatures: Int

    Definition Classes
    TextFeaturizerParams
  50. final def getOrDefault[T](param: Param[T]): T

    Definition Classes
    Params
  51. def getOutputCol: String

    Definition Classes
    HasOutputCol
  52. def getParam(paramName: String): Param[Any]

    Definition Classes
    Params
  53. final def getStopWords: String

    Definition Classes
    TextFeaturizerParams
  54. final def getToLowercase: Boolean

    Definition Classes
    TextFeaturizerParams
  55. final def getTokenizerGaps: Boolean

    Definition Classes
    TextFeaturizerParams
  56. final def getTokenizerPattern: String

    Definition Classes
    TextFeaturizerParams
  57. final def getUseIDF: Boolean

    Definition Classes
    TextFeaturizerParams
  58. final def getUseNGram: Boolean

    Definition Classes
    TextFeaturizerParams
  59. final def getUseStopWordsRemover: Boolean

    Definition Classes
    TextFeaturizerParams
  60. final def getUseTokenizer: Boolean

    Definition Classes
    TextFeaturizerParams
  61. final def hasDefault[T](param: Param[T]): Boolean

    Definition Classes
    Params
  62. def hasParam(paramName: String): Boolean

    Definition Classes
    Params
  63. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  64. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  65. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  66. val inputCol: Param[String]

    The name of the input column

    Definition Classes
    HasInputCol
  67. final def isDefined(param: Param[_]): Boolean

    Definition Classes
    Params
  68. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  69. final def isSet(param: Param[_]): Boolean

    Definition Classes
    Params
  70. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  71. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  72. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  73. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  74. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  75. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  76. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  77. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  78. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  79. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  80. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  81. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  82. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  83. val minDocFreq: IntParam

    Minimum number of documents in which a term should appear.

    Definition Classes
    TextFeaturizerParams
  84. val minTokenLength: IntParam

    Minimum token length; must be 0 or greater.

    Definition Classes
    TextFeaturizerParams
  85. val nGramLength: IntParam

    The size of the n-grams.

    Definition Classes
    TextFeaturizerParams
  86. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  87. final def notify(): Unit

    Definition Classes
    AnyRef
  88. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  89. val numFeatures: IntParam

    Set the number of features to hash each document to

    Definition Classes
    TextFeaturizerParams
  90. val outputCol: Param[String]

    The name of the output column

    Definition Classes
    HasOutputCol
  91. val paramDomains: Map[String, Seq[String]]

    Definition Classes
    Wrappable
  92. lazy val params: Array[Param[_]]

    Definition Classes
    Params
  93. def save(path: String): Unit

    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  94. final def set(paramPair: ParamPair[_]): TextFeaturizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  95. final def set(param: String, value: Any): TextFeaturizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  96. final def set[T](param: Param[T], value: T): TextFeaturizer.this.type

    Definition Classes
    Params
  97. def setBinary(value: Boolean): TextFeaturizer.this.type

  98. def setCaseSensitiveStopWords(value: Boolean): TextFeaturizer.this.type

  99. final def setDefault(paramPairs: ParamPair[_]*): TextFeaturizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  100. final def setDefault[T](param: Param[T], value: T): TextFeaturizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  101. def setDefaultStopWordLanguage(value: String): TextFeaturizer.this.type

  102. def setInputCol(value: String): TextFeaturizer.this.type

    Definition Classes
    HasInputCol
  103. def setMinDocFreq(value: Int): TextFeaturizer.this.type

  104. def setMinTokenLength(value: Int): TextFeaturizer.this.type

  105. def setNGramLength(value: Int): TextFeaturizer.this.type

  106. def setNumFeatures(value: Int): TextFeaturizer.this.type

  107. def setOutputCol(value: String): TextFeaturizer.this.type

    Definition Classes
    HasOutputCol
  108. def setStopWords(value: String): TextFeaturizer.this.type

  109. def setToLowercase(value: Boolean): TextFeaturizer.this.type

  110. def setTokenizerGaps(value: Boolean): TextFeaturizer.this.type

  111. def setTokenizerPattern(value: String): TextFeaturizer.this.type

  112. def setUseIDF(value: Boolean): TextFeaturizer.this.type

  113. def setUseNGram(value: Boolean): TextFeaturizer.this.type

  114. def setUseStopWordsRemover(value: Boolean): TextFeaturizer.this.type

  115. def setUseTokenizer(value: Boolean): TextFeaturizer.this.type

  116. val stopWords: Param[String]

    The words to be filtered out. This is a comma separated list of words, encoded as a single string. For example, "a, the, and"

    Definition Classes
    TextFeaturizerParams
  117. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  118. val toLowercase: BooleanParam

    Indicates whether to convert all characters to lowercase before tokenizing.

    Definition Classes
    TextFeaturizerParams
  119. def toString(): String

    Definition Classes
    Identifiable → AnyRef → Any
  120. val tokenizerGaps: BooleanParam

    Indicates whether the regex splits on gaps (true) or matches tokens (false)

    Definition Classes
    TextFeaturizerParams
  121. val tokenizerPattern: Param[String]

    Regex pattern used to match delimiters if tokenizerGaps is true, or tokens if it is false.

    Definition Classes
    TextFeaturizerParams
  122. def transformSchema(schema: StructType): StructType

    Definition Classes
    TextFeaturizer → PipelineStage
  123. def transformSchema(schema: StructType, logging: Boolean): StructType

    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  124. val uid: String

    The id of the module

    Definition Classes
    TextFeaturizer → Identifiable
  125. val useIDF: BooleanParam

    Scale the Term Frequencies by IDF when set to true

    Definition Classes
    TextFeaturizerParams
  126. val useNGram: BooleanParam

    Enumerate n-grams when set to true.

    Definition Classes
    TextFeaturizerParams
  127. val useStopWordsRemover: BooleanParam

    Indicates whether to remove stop words from tokenized data.

    Definition Classes
    TextFeaturizerParams
  128. val useTokenizer: BooleanParam

    Tokenize the input when set to true

    Definition Classes
    TextFeaturizerParams
  129. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  130. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  131. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  132. def write: MLWriter

    Definition Classes
    DefaultParamsWritable → MLWritable
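
Example: tokenizer and stop word semantics

The descriptions of tokenizerPattern, tokenizerGaps, toLowercase, minTokenLength, stopWords and caseSensitiveStopWords above can be illustrated with a small plain-Scala sketch. This is not the library's implementation. The object TokenizeSketch and its helpers (tokenize, parseStopWords, removeStopWords) are hypothetical names introduced here, and the order in which the steps are applied is an assumption based only on the parameter descriptions.

```scala
object TokenizeSketch {
  // Hypothetical helper (not part of TextFeaturizer): mirrors the documented
  // meaning of tokenizerPattern, tokenizerGaps, toLowercase and minTokenLength.
  def tokenize(text: String, pattern: String, gaps: Boolean,
               toLowercase: Boolean, minTokenLength: Int): Seq[String] = {
    // Lowercasing happens before tokenizing, as the toLowercase description states.
    val input = if (toLowercase) text.toLowerCase else text
    val regex = pattern.r
    val tokens =
      if (gaps) regex.split(input).toList   // the pattern matches the delimiters
      else regex.findAllIn(input).toList    // the pattern matches the tokens
    // Assumption: empty tokens and tokens shorter than minTokenLength are dropped.
    tokens.filter(t => t.nonEmpty && t.length >= minTokenLength)
  }

  // Hypothetical helper: parses the single comma-separated stopWords string.
  def parseStopWords(stopWords: String): Set[String] =
    stopWords.split(",").map(_.trim).filter(_.nonEmpty).toSet

  // Hypothetical helper: removes stop words with or without case sensitivity.
  def removeStopWords(tokens: Seq[String], stop: Set[String],
                      caseSensitive: Boolean): Seq[String] =
    if (caseSensitive) tokens.filterNot(t => stop.contains(t))
    else {
      val lowered = stop.map(_.toLowerCase)
      tokens.filterNot(t => lowered.contains(t.toLowerCase))
    }

  def main(args: Array[String]): Unit = {
    val text = "The cat and the hat"
    // gaps = true: the pattern describes whitespace between tokens.
    val byGaps = tokenize(text, "\\s+", gaps = true, toLowercase = true, minTokenLength = 0)
    // gaps = false: the pattern describes the tokens themselves.
    val byMatch = tokenize(text, "\\w+", gaps = false, toLowercase = true, minTokenLength = 0)
    println(byGaps)  // List(the, cat, and, the, hat)
    println(byMatch) // List(the, cat, and, the, hat)
    println(removeStopWords(byGaps, parseStopWords("a, the, and"), caseSensitive = false)) // List(cat, hat)
  }
}
```

For this input both modes yield the same tokens; they differ when the text contains characters that neither pattern treats as part of a token or a delimiter.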

Parameters

A list of parameter keys this algorithm can take. Users can set and get the parameter values through setters and getters
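
A minimal usage sketch of the setters and getters listed on this page follows. It assumes Spark and the library that provides com.microsoft.ml.spark are on the classpath and that a local SparkSession is acceptable. The object name TextFeaturizerUsage, the toy data, and the column names "text" and "features" are illustrative and not taken from the original documentation. Only members that appear in the listing above are called on TextFeaturizer; transform is assumed to be available because Spark's Estimator contract requires the fitted TextFeaturizerModel to be a Model.

```scala
import com.microsoft.ml.spark.TextFeaturizer
import org.apache.spark.sql.SparkSession

object TextFeaturizerUsage {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("TextFeaturizerUsage")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy data with a single string column.
    val df = Seq("The cat and the hat", "A dog and a log").toDF("text")

    // Each setter returns this.type, so calls can be chained.
    val featurizer = new TextFeaturizer()
      .setInputCol("text")
      .setOutputCol("features")
      .setUseStopWordsRemover(true)
      .setUseIDF(true)
      .setNumFeatures(1 << 10)

    // Getters read back the configured values; explainParams() lists all parameters.
    println(featurizer.getNumFeatures)
    println(featurizer.explainParams())

    // fit returns a TextFeaturizerModel, which is then applied to the data.
    val model = featurizer.fit(df)
    model.transform(df).show(truncate = false)

    spark.stop()
  }
}
```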
