com.groupdocs.parser

Class SlidesTextExtractor

  • All Implemented Interfaces:
    IHighlightExtractor, IPageTextExtractor, IRegexSearchable, ISearchable, IStructuredExtractor, AutoCloseable


    public final class SlidesTextExtractor
    extends SlidesTextExtractorBase
    implements ISearchable, IHighlightExtractor, IRegexSearchable, IStructuredExtractor, IPageTextExtractor

    Provides the text extractor for presentations.


    Supported formats:

    .PPT    Microsoft PowerPoint Presentation
    .PPTX   Microsoft Office Open XML Presentation
    .PPS    Microsoft PowerPoint Slideshow
    .PPSX   Microsoft Office Open XML Auto-Play Presentation
    .PPSM   PowerPoint Open XML Macro-Enabled Slideshow
    .ODP    OpenDocument Presentation
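
A caller may want to reject unsupported files before constructing an extractor. The following is a minimal sketch of such a check, assuming the file extension is a reliable indicator of the format. `PresentationFormats` and `isSupported` are hypothetical names and are not part of com.groupdocs.parser; only the extension list comes from the table above.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper; NOT part of com.groupdocs.parser. */
final class PresentationFormats {
    // Extensions copied from the "Supported formats" list of SlidesTextExtractor.
    private static final Set<String> SUPPORTED =
            Set.of("ppt", "pptx", "pps", "ppsx", "ppsm", "odp");

    private PresentationFormats() {
    }

    /** Returns true if fileName ends with one of the listed extensions, ignoring case. */
    static boolean isSupported(String fileName) {
        if (fileName == null) {
            return false;
        }
        int dot = fileName.lastIndexOf('.');
        // No dot, or a trailing dot, means there is no extension to check.
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SUPPORTED.contains(extension);
    }
}
```

For example, `PresentationFormats.isSupported("deck.PPTX")` is true and `PresentationFormats.isSupported("notes.docx")` is false. The check looks only at the name, so a misnamed file would still pass it.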

    Extracting text from a presentation:

     // Create a text extractor for presentations (stream is an InputStream for the document)
     SlidesTextExtractor extractor = new SlidesTextExtractor(stream);
     // Extract all the text
     System.out.println(extractor.extractAll());
      

    Extracting text by slides:

     // Create a text extractor for presentations
     SlidesTextExtractor extractor = new SlidesTextExtractor(stream);
     // Iterate over the slides
     for (int slideIndex = 0; slideIndex < extractor.getSlideCount(); slideIndex++) {
         // Extract the text from the slide whose index is slideIndex
         System.out.println(extractor.extractSlide(slideIndex));
     }
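
When the per-slide text is needed later rather than printed immediately, the loop above can be wrapped in a small collector. This is a sketch only: `SlideTextCollector` and `collect` are hypothetical names and are not part of com.groupdocs.parser. The helper takes the slide count and a slide-reading function as plain parameters, so it does not depend on the library and makes no claim about how the extractor behaves internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical helper; NOT part of com.groupdocs.parser. */
final class SlideTextCollector {
    private SlideTextCollector() {
    }

    /**
     * Calls extractSlide for every index in [0, slideCount) and returns the results in order.
     * A null result is stored as an empty string so callers never see null entries.
     */
    static List<String> collect(int slideCount, IntFunction<String> extractSlide) {
        List<String> slides = new ArrayList<>();
        for (int slideIndex = 0; slideIndex < slideCount; slideIndex++) {
            String text = extractSlide.apply(slideIndex);
            slides.add(text == null ? "" : text);
        }
        return slides;
    }
}
```

Assuming `extractSlide(int)` returns a `String`, as the example above suggests, the helper could be wired up as `SlideTextCollector.collect(extractor.getSlideCount(), extractor::extractSlide)`. Because the class implements AutoCloseable, creating the extractor in a try-with-resources statement releases it once the text has been collected.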
      
    • Constructor Detail

      • SlidesTextExtractor

        public SlidesTextExtractor(String fileName)

        Initializes a new instance of the SlidesTextExtractor class.

        Parameters:
        fileName - The path to the file.
      • SlidesTextExtractor

        public SlidesTextExtractor(String fileName,
                           LoadOptions loadOptions)

        Initializes a new instance of the SlidesTextExtractor class.

        Parameters:
        fileName - The path to the file.
        loadOptions - The options for loading the file.
      • SlidesTextExtractor

        public SlidesTextExtractor(InputStream stream)

        Initializes a new instance of the SlidesTextExtractor class.

        Parameters:
        stream - The stream of the document.
      • SlidesTextExtractor

        public SlidesTextExtractor(InputStream stream,
                           LoadOptions loadOptions)

        Initializes a new instance of the SlidesTextExtractor class.

        Parameters:
        stream - The stream of the document.
        loadOptions - The options for loading the file.
    • Method Detail

      • getExtractMode

        public int getExtractMode()

        Gets a value indicating the mode of text extraction.

        Returns:
        The mode of text extraction. The default is Standard.
      • setExtractMode

        public void setExtractMode(int value)

        Sets a value indicating the mode of text extraction.

        Parameters:
        value - The mode of text extraction. The default is Standard.
      • search

        public void search(SearchOptions options,
                  ISearchHandler handler,
                  List<String> keywords)

        Searches the text for the specified keywords.

        Specified by:
        search in interface ISearchable
        Parameters:
        options - Options for searching.
        handler - An instance of the search handler.
        keywords - A collection of words to search.
      • search

        public void search(SearchOptions options,
                  ISearchHandler handler,
                  ISearchEngine searchEngine,
                  List<String> keywords)

        Searches the text for the specified keywords using the given search engine.

        Specified by:
        search in interface ISearchable
        Parameters:
        options - Options for searching.
        handler - An instance of the search handler.
        searchEngine - An instance of the search engine.
        keywords - A collection of words to search.
      • extractHighlights

        public List<String> extractHighlights(HighlightOptions... highlightOptions)

        Extracts highlights.

        Specified by:
        extractHighlights in interface IHighlightExtractor
        Parameters:
        highlightOptions - A collection of HighlightOptions.
        Returns:
        A collection of strings that represent the highlights. If no highlights are found, the collection is empty.
      • extractPage

        public String extractPage(int pageIndex)
        Description copied from interface: IPageTextExtractor

        Extracts all characters from the page at index pageIndex and returns them as a string.

        Specified by:
        extractPage in interface IPageTextExtractor
        Parameters:
        pageIndex - The index of the page.
        Returns:
        A string that contains all characters from the page, or null if all characters have been extracted.
      • extractText

        protected String extractText()

        Extracts all characters from the current position to the end of the text extractor and returns them as one string.

        Overrides:
        extractText in class TextExtractor
        Returns:
        A string that contains all characters from the current position to the end of the text extractor.