public final class SlidesTextExtractor extends SlidesTextExtractorBase implements ISearchable, IHighlightExtractor, IRegexSearchable, IStructuredExtractor, IPageTextExtractor, IDocumentContentExtractor, IFastTextExtractor
Provides the text extractor for presentations.
Supported formats:
.PPT | Microsoft PowerPoint Presentation |
.PPTX | Microsoft Office Open XML Presentation |
.PPS | Microsoft PowerPoint Slideshow |
.PPSX | Microsoft Office Open XML Auto-Play Presentation |
.PPSM | PowerPoint Open XML Macro-Enabled Slideshow |
.ODP | OpenDocument presentation |
Extracting all text from a presentation:
// Create a text extractor for presentations
SlidesTextExtractor extractor = new SlidesTextExtractor(stream);
// Extract all text and print it
System.out.println(extractor.extractAll());
Extracting text by slides:
// Create a text extractor for presentations
SlidesTextExtractor extractor = new SlidesTextExtractor(stream);
// Iterate over the slides
for (int slideIndex = 0; slideIndex < extractor.getSlideCount(); slideIndex++) {
    // Extract the text of the slide at slideIndex
    System.out.println(extractor.extractSlide(slideIndex));
}
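The per-slide loop above can be wrapped in a small helper that labels each slide's text. The following is a minimal sketch, not part of the library: `SlideSource` is a hypothetical stand-in interface that mirrors only the `getSlideCount()` and `extractSlide(int)` methods listed in this reference, so the example runs without the library on the classpath. With the real extractor, the two calls would presumably be delegated to a `SlidesTextExtractor` instance.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in (an assumption for illustration) mirroring two methods
// that this reference lists as inherited by SlidesTextExtractor.
interface SlideSource {
    int getSlideCount();
    String extractSlide(int slideIndex);
}

class SlideTextSketch {
    // Returns one labeled entry per slide, in slide order.
    static List<String> labeledSlides(SlideSource source) {
        List<String> result = new ArrayList<>();
        int count = source.getSlideCount(); // read once instead of on every iteration
        for (int slideIndex = 0; slideIndex < count; slideIndex++) {
            String text = source.extractSlide(slideIndex);
            // Labels are 1-based for readers; the index itself is 0-based.
            result.add("Slide " + (slideIndex + 1) + ": " + (text == null ? "" : text.trim()));
        }
        return result;
    }

    public static void main(String[] args) {
        // In-memory fake standing in for a real presentation.
        final String[] slides = {"Title\n", "Agenda\n"};
        SlideSource fake = new SlideSource() {
            public int getSlideCount() { return slides.length; }
            public String extractSlide(int slideIndex) { return slides[slideIndex]; }
        };
        for (String line : labeledSlides(fake)) {
            System.out.println(line);
        }
    }
}
```

Keeping the loop behind a narrow interface also makes the slide handling testable without a presentation file.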
Constructor and Description |
---|
SlidesTextExtractor(InputStream stream): Initializes a new instance of the SlidesTextExtractor class from a stream. |
SlidesTextExtractor(InputStream stream, LoadOptions loadOptions): Initializes a new instance of the SlidesTextExtractor class from a stream, using the given load options. |
SlidesTextExtractor(String fileName): Initializes a new instance of the SlidesTextExtractor class from a file path. |
SlidesTextExtractor(String fileName, LoadOptions loadOptions): Initializes a new instance of the SlidesTextExtractor class from a file path, using the given load options. |
Modifier and Type | Method and Description |
---|---|
protected void | dispose(boolean disposing): Releases the unmanaged resources used by the extractor. |
List<String> | extractHighlights(HighlightOptions... highlightOptions): Extracts highlights. |
String | extractPage(int pageIndex): Extracts all characters from the page at pageIndex and returns them as a string. |
void | extractStructured(StructuredHandler handler): Extracts structured text. |
protected String | extractText(): Extracts all characters from the current position to the end of the text extractor and returns them as one string. |
DocumentContent | getDocumentContent(): Gets access to the document's content. |
int | getExtractMode(): Gets a value indicating the mode of text extraction. |
int | getPageCount(): Gets the total number of pages. |
void | search(SearchOptions options, ISearchHandler handler, ISearchEngine searchEngine, List<String> keywords): Searches for the keywords using the given search engine. |
void | search(SearchOptions options, ISearchHandler handler, List<String> keywords): Searches for the keywords. |
void | searchWithRegex(String expression, ISearchHandler handler, RegexSearchOptions searchOptions): Searches for matches of the regular expression. |
void | setExtractMode(int value): Sets a value indicating the mode of text extraction. |
Methods inherited from class SlidesTextExtractorBase: extractSlide, getSlideCount, nextSlide, prepareLine, reset
Methods inherited from class TextExtractor: checkDisposed, close, dispose, extractAll, extractLine, extractTextLine, getEncoding, getMediaType, getPassword, isDisposed, setEncoding, setMediaType
public SlidesTextExtractor(String fileName)
Initializes a new instance of the SlidesTextExtractor class.
Parameters:
fileName - The path to the file.

public SlidesTextExtractor(String fileName, LoadOptions loadOptions)
Initializes a new instance of the SlidesTextExtractor class.
Parameters:
fileName - The path to the file.
loadOptions - The options for loading the file.

public SlidesTextExtractor(InputStream stream)
Initializes a new instance of the SlidesTextExtractor class.
Parameters:
stream - The stream of the document.

public SlidesTextExtractor(InputStream stream, LoadOptions loadOptions)
Initializes a new instance of the SlidesTextExtractor class.
Parameters:
stream - The stream of the document.
loadOptions - The options for loading the file.

public DocumentContent getDocumentContent()
Gets access to the document's content.
Specified by:
getDocumentContent in interface IDocumentContentExtractor
Returns:
An instance of the DocumentContent class.

public int getExtractMode()
Gets a value indicating the mode of text extraction.
Specified by:
getExtractMode in interface IFastTextExtractor
Returns:
The mode of text extraction. The default is Standard.

public void setExtractMode(int value)
Sets a value indicating the mode of text extraction.
Specified by:
setExtractMode in interface IFastTextExtractor
Parameters:
value - The mode of text extraction. The default is Standard.

public int getPageCount()
Description copied from interface: IPageTextExtractor
Gets the total number of pages.
Specified by:
getPageCount in interface IPageTextExtractor
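As an illustration of how `getPageCount()` and `extractPage(int)` combine, the sketch below finds the pages whose text contains a keyword. It is not the library's own search: `PageSource` is a hypothetical stand-in interface that mirrors only these two methods so the example runs on its own, and zero-based page indexes are an assumption made by analogy with the slide loop in the class description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in (an assumption for illustration) mirroring the two
// IPageTextExtractor methods documented in this reference.
interface PageSource {
    int getPageCount();
    String extractPage(int pageIndex);
}

class PageScanSketch {
    // Returns the indexes of pages whose text contains the keyword, ignoring case.
    // Indexes are assumed to be zero-based.
    static List<Integer> pagesContaining(PageSource source, String keyword) {
        List<Integer> hits = new ArrayList<>();
        String needle = keyword.toLowerCase(Locale.ROOT);
        int count = source.getPageCount();
        for (int pageIndex = 0; pageIndex < count; pageIndex++) {
            String text = source.extractPage(pageIndex);
            if (text != null && text.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(pageIndex);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // In-memory fake standing in for a real presentation.
        final String[] pages = {"Intro", "Budget 2024", "budget notes"};
        PageSource fake = new PageSource() {
            public int getPageCount() { return pages.length; }
            public String extractPage(int pageIndex) { return pages[pageIndex]; }
        };
        System.out.println(pagesContaining(fake, "budget"));
    }
}
```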
public void extractStructured(StructuredHandler handler)
Extracts structured text.
Specified by:
extractStructured in interface IStructuredExtractor
Parameters:
handler - The structured text extraction handler.

public void search(SearchOptions options, ISearchHandler handler, List<String> keywords)
Searches for the keywords.
Specified by:
search in interface ISearchable
Parameters:
options - Options for searching.
handler - An instance of the search handler.
keywords - A collection of words to search for.

public void search(SearchOptions options, ISearchHandler handler, ISearchEngine searchEngine, List<String> keywords)
Searches for the keywords using the given search engine.
Specified by:
search in interface ISearchable
Parameters:
options - Options for searching.
handler - An instance of the search handler.
searchEngine - An instance of the search engine.
keywords - A collection of words to search for.

public void searchWithRegex(String expression, ISearchHandler handler, RegexSearchOptions searchOptions)
Searches for matches of the regular expression.
Specified by:
searchWithRegex in interface IRegexSearchable
Parameters:
expression - A regular expression.
handler - An instance of the search handler.
searchOptions - Options for searching.

public List<String> extractHighlights(HighlightOptions... highlightOptions)
Extracts highlights.
Specified by:
extractHighlights in interface IHighlightExtractor
Parameters:
highlightOptions - A collection of HighlightOptions.

public String extractPage(int pageIndex)
Description copied from interface: IPageTextExtractor
Extracts all characters from the page at pageIndex and returns them as a string.
Specified by:
extractPage in interface IPageTextExtractor
Parameters:
pageIndex - The index of the page.

protected void dispose(boolean disposing)
Releases the unmanaged resources used by the extractor.
Overrides:
dispose in class SlidesTextExtractorBase
Parameters:
disposing - true if invoked from dispose(); otherwise, false.

protected String extractText()
Extracts all characters from the current position to the end of the text extractor and returns them as one string.
extractText in class TextExtractor
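To illustrate the kind of matching that `searchWithRegex` performs, the sketch below runs a regular expression over text that has already been extracted (for example, the string returned by `extractAll()`). It uses the standard `java.util.regex` package rather than the library's `ISearchHandler` and `RegexSearchOptions` types, whose members are not described in this reference; `RegexScanSketch` and `findAll` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexScanSketch {
    // Returns every match of the expression in the text, in order of appearance.
    static List<String> findAll(String text, String expression) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = Pattern.compile(expression).matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Sample text standing in for the output of an extractor.
        String text = "Q1 revenue: 120\nQ2 revenue: 135\n";
        for (String match : findAll(text, "Q\\d revenue: \\d+")) {
            System.out.println(match);
        }
    }
}
```

This plain-text approach loses any position or slide information that a handler-based search could report, so it suits quick checks more than result navigation.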