ExtractionOptions

Inheritance: java.lang.Object

public class ExtractionOptions

Provides options for extracting data from documents.

Constructors

Constructor Description
ExtractionOptions() Initializes a new instance of the ExtractionOptions class.
ExtractionOptions(Object data) Initializes a new instance of the ExtractionOptions class from serialized data.
ExtractionOptions(IndexingOptions options, IFieldExtractor customExtractor, IOcrConnector ocrConnector) Initializes a new instance of the ExtractionOptions class with the specified indexing options, custom extractor, and OCR connector.

Methods

Method Description
getCustomExtractor() Gets the custom text extractor.
setCustomExtractor(IFieldExtractor value) Sets the custom text extractor.
getAutoDetectEncoding() Gets a value indicating whether to detect encoding automatically or not.
setAutoDetectEncoding(boolean value) Sets a value indicating whether to detect encoding automatically or not.
getEncoding() Gets the encoding used to extract text from text documents.
setEncoding(String value) Sets the encoding used to extract text from text documents.
getUseRawTextExtraction() Gets a value indicating whether the raw mode is used for text extraction if possible.
setUseRawTextExtraction(boolean value) Sets a value indicating whether the raw mode is used for text extraction if possible.
getMetadataIndexingOptions() Gets the options for indexing metadata fields.
getOcrIndexingOptions() Gets the options for OCR processing and indexing recognized text.
getImageIndexingOptions() Gets the image indexing options for reverse image search.
getCore()

ExtractionOptions()

public ExtractionOptions()

Initializes a new instance of the ExtractionOptions class.

ExtractionOptions(Object data)

public ExtractionOptions(Object data)

Initializes a new instance of the ExtractionOptions class from serialized data.

Parameters:

Parameter Type Description
data java.lang.Object The serialized data.

ExtractionOptions(IndexingOptions options, IFieldExtractor customExtractor, IOcrConnector ocrConnector)

public ExtractionOptions(IndexingOptions options, IFieldExtractor customExtractor, IOcrConnector ocrConnector)

Initializes a new instance of the ExtractionOptions class with the specified indexing options, custom extractor, and OCR connector.

Parameters:

Parameter Type Description
options IndexingOptions The options.
customExtractor IFieldExtractor The custom extractor.
ocrConnector IOcrConnector The OCR connector.

getCustomExtractor()

public IFieldExtractor getCustomExtractor()

Gets the custom text extractor. The default value is null.

Returns: IFieldExtractor - The custom text extractor.

setCustomExtractor(IFieldExtractor value)

public void setCustomExtractor(IFieldExtractor value)

Sets the custom text extractor. The default value is null.

Parameters:

Parameter Type Description
value IFieldExtractor The custom text extractor.

getAutoDetectEncoding()

public boolean getAutoDetectEncoding()

Gets a value indicating whether to detect encoding automatically or not. The default value is false.

Returns: boolean - A value indicating whether to detect encoding automatically or not.

setAutoDetectEncoding(boolean value)

public void setAutoDetectEncoding(boolean value)

Sets a value indicating whether to detect encoding automatically or not. The default value is false.

Parameters:

Parameter Type Description
value boolean A value indicating whether to detect encoding automatically or not.

getEncoding()

public String getEncoding()

Gets the encoding used to extract text from text documents. The default value is null, which means that the default encoding, UTF-8, is used. If AutoDetectEncoding is true, this value is used as the default encoding.

Returns: java.lang.String - The encoding used to extract text from text documents.

setEncoding(String value)

public void setEncoding(String value)

Sets the encoding used to extract text from text documents. The default value is null, which means that the default encoding, UTF-8, is used. If AutoDetectEncoding is true, this value is used as the default encoding.

Parameters:

Parameter Type Description
value java.lang.String The encoding used to extract text from text documents.
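The "null means UTF-8" rule described above can be pictured with a minimal sketch. It uses only the standard java.nio.charset API; the class EncodingFallback and its resolve method are hypothetical names introduced for illustration, and this is not the library's actual implementation.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: mirrors the documented rule that a null encoding means UTF-8.
// EncodingFallback and resolve are hypothetical names, not part of the library.
class EncodingFallback {

    // Returns the charset to use for a configured encoding name.
    // A null name falls back to UTF-8, as the documentation describes.
    static Charset resolve(String encodingName) {
        if (encodingName == null) {
            return StandardCharsets.UTF_8;
        }
        // Charset.forName throws an unchecked exception for unknown or malformed names.
        return Charset.forName(encodingName);
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));          // prints UTF-8
        System.out.println(resolve("ISO-8859-1"));  // prints ISO-8859-1
    }
}
```

Checking a name with Charset.forName before passing it to setEncoding is one way to catch a misspelled encoding early, assuming the library accepts standard Java charset names.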

getUseRawTextExtraction()

public boolean getUseRawTextExtraction()

Gets a value indicating whether the raw mode is used for text extraction if possible. The default value is true. Raw mode can significantly increase indexing speed, but normal mode improves the formatting of the extracted text.

Returns: boolean - A value indicating whether the raw mode is used for text extraction if possible.

setUseRawTextExtraction(boolean value)

public void setUseRawTextExtraction(boolean value)

Sets a value indicating whether the raw mode is used for text extraction if possible. The default value is true. Raw mode can significantly increase indexing speed, but normal mode improves the formatting of the extracted text.

Parameters:

Parameter Type Description
value boolean A value indicating whether the raw mode is used for text extraction if possible.
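The setters documented above can be combined as in the following sketch. So that it compiles without the library on the classpath, it uses a hypothetical stand-in class, ExtractionOptionsStandIn, that copies only the accessor names and default values stated on this page; with the real library, the stand-in would be replaced by ExtractionOptions itself. The encoding name "windows-1251" is an arbitrary example value.

```java
// Hypothetical stand-in so the sketch compiles without the library.
// It copies only the accessor names and defaults documented on this page.
class ExtractionOptionsStandIn {
    private boolean autoDetectEncoding = false;   // documented default
    private String encoding = null;               // documented default (UTF-8 is then used)
    private boolean useRawTextExtraction = true;  // documented default

    boolean getAutoDetectEncoding() { return autoDetectEncoding; }
    void setAutoDetectEncoding(boolean value) { autoDetectEncoding = value; }
    String getEncoding() { return encoding; }
    void setEncoding(String value) { encoding = value; }
    boolean getUseRawTextExtraction() { return useRawTextExtraction; }
    void setUseRawTextExtraction(boolean value) { useRawTextExtraction = value; }
}

class ExtractionOptionsUsage {

    // Builds options that favor text formatting over indexing speed
    // and enable automatic encoding detection.
    static ExtractionOptionsStandIn formattedTextOptions() {
        ExtractionOptionsStandIn options = new ExtractionOptionsStandIn();
        options.setUseRawTextExtraction(false);  // normal mode: better formatting, slower
        options.setAutoDetectEncoding(true);     // detect the encoding automatically
        options.setEncoding("windows-1251");     // used as the default encoding (example value)
        return options;
    }

    public static void main(String[] args) {
        ExtractionOptionsStandIn options = formattedTextOptions();
        System.out.println("raw mode: " + options.getUseRawTextExtraction());
        System.out.println("auto-detect: " + options.getAutoDetectEncoding());
        System.out.println("encoding: " + options.getEncoding());
    }
}
```

Leaving useRawTextExtraction at its default of true is the faster choice when the formatting of the extracted text does not matter.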

getMetadataIndexingOptions()

public MetadataIndexingOptions getMetadataIndexingOptions()

Gets the options for indexing metadata fields.

Returns: MetadataIndexingOptions - The options for indexing metadata fields.

getOcrIndexingOptions()

public OcrIndexingOptions getOcrIndexingOptions()

Gets the options for OCR processing and indexing recognized text.

Returns: OcrIndexingOptions - The options for OCR processing and indexing recognized text.

getImageIndexingOptions()

public ImageIndexingOptions getImageIndexingOptions()

Gets the image indexing options for reverse image search.

Returns: ImageIndexingOptions - The image indexing options for reverse image search.

getCore()

public Object getCore()

Returns: java.lang.Object