com.groupdocs.parser

Class SlidesFormattedTextExtractor

  • All Implemented Interfaces:
    IHighlightExtractor, IPageTextExtractor, ITextExtractorWithFormatter, AutoCloseable


    public final class SlidesFormattedTextExtractor
    extends SlidesTextExtractorBase
    implements IHighlightExtractor, IPageTextExtractor, ITextExtractorWithFormatter

    Provides the formatted text extractor for presentations.


    Supported formats:

    .PPT    Microsoft PowerPoint Presentation
    .PPTX   Microsoft Office Open XML Presentation
    .PPS    Microsoft PowerPoint Slideshow
    .PPSX   Microsoft Office Open XML Auto-Play Presentation
    .PPSM   PowerPoint Open XML Macro-Enabled Slideshow
    .ODP    OpenDocument Presentation
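    Before choosing this extractor, a caller may want to check a file name against the formats listed above. The following is a minimal sketch, not part of the GroupDocs.Parser API: the class PresentationFormats and its method isSupported are hypothetical helpers, and the check looks only at the file extension, not at the file contents.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of GroupDocs.Parser): checks whether a file name
// has one of the extensions listed as supported by SlidesFormattedTextExtractor.
final class PresentationFormats {
    // Extensions copied from the "Supported formats" list, lower-cased.
    private static final Set<String> EXTENSIONS =
            Set.of("ppt", "pptx", "pps", "ppsx", "ppsm", "odp");

    private PresentationFormats() {
    }

    static boolean isSupported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        // No extension, or the name ends with a dot: nothing to match.
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return EXTENSIONS.contains(extension);
    }
}
```

    An extension check is only a cheap first filter; a misnamed or damaged file can still fail when the extractor loads it.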

    Extracting text from a presentation:

     // Create a formatted text extractor for presentations (stream is an InputStream with the document)
     try (SlidesFormattedTextExtractor extractor = new SlidesFormattedTextExtractor(stream)) {
         // Extract the formatted text of the whole document
         System.out.println(extractor.extractAll());
     }
      

    Extracting text by slides:

     // Create a formatted text extractor for presentations
     try (SlidesFormattedTextExtractor extractor = new SlidesFormattedTextExtractor(stream)) {
         // Iterate over the slides
         for (int slideIndex = 0; slideIndex < extractor.getSlideCount(); slideIndex++) {
             // Extract the formatted text of the slide whose index is slideIndex
             System.out.println(extractor.extractSlide(slideIndex));
         }
     }
      
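    The per-slide loop relies on two members, getSlideCount() and extractSlide(int). The sketch below shows the same index-based pattern collecting each slide's text into a list. To stay self-contained it uses a hypothetical SlideTextSource interface that mirrors those two methods, plus an in-memory stand-in; neither is part of GroupDocs.Parser. With the real class, you would call the same two methods on the extractor directly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface mirroring the two SlidesFormattedTextExtractor members
// used in the per-slide loop; it is not part of GroupDocs.Parser.
interface SlideTextSource {
    int getSlideCount();

    String extractSlide(int slideIndex);
}

final class SlideCollector {
    private SlideCollector() {
    }

    // Collects the text of every slide, in slide order (index 0 first).
    static List<String> collectSlides(SlideTextSource source) {
        List<String> slides = new ArrayList<>();
        for (int slideIndex = 0; slideIndex < source.getSlideCount(); slideIndex++) {
            slides.add(source.extractSlide(slideIndex));
        }
        return slides;
    }

    // Hypothetical in-memory stand-in, used here only for illustration and testing.
    static SlideTextSource fromArray(String... slideTexts) {
        return new SlideTextSource() {
            @Override
            public int getSlideCount() {
                return slideTexts.length;
            }

            @Override
            public String extractSlide(int slideIndex) {
                return slideTexts[slideIndex];
            }
        };
    }
}
```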

    To set a formatter, use the DocumentFormatter property (the setDocumentFormatter method).

     // Create a formatted text extractor for presentations
     try (SlidesFormattedTextExtractor extractor = new SlidesFormattedTextExtractor(stream)) {
         // Set a Markdown formatter: all the text will be formatted as Markdown
         extractor.setDocumentFormatter(new MarkdownDocumentFormatter());
         System.out.println(extractor.extractAll());
     }
      

    By default, text is formatted as plain text by Formatters.Plain.PlainDocumentFormatter.

    • Constructor Detail

      • SlidesFormattedTextExtractor

        public SlidesFormattedTextExtractor(String fileName)

        Initializes a new instance of the SlidesFormattedTextExtractor class.

        Parameters:
        fileName - The path to the file.
      • SlidesFormattedTextExtractor

        public SlidesFormattedTextExtractor(String fileName,
                                    LoadOptions loadOptions)

        Initializes a new instance of the SlidesFormattedTextExtractor class.

        Parameters:
        fileName - The path to the file.
        loadOptions - The options for loading the file.
      • SlidesFormattedTextExtractor

        public SlidesFormattedTextExtractor(InputStream stream)

        Initializes a new instance of the SlidesFormattedTextExtractor class.

        Parameters:
        stream - The stream of the document.
      • SlidesFormattedTextExtractor

        public SlidesFormattedTextExtractor(InputStream stream,
                                    LoadOptions loadOptions)

        Initializes a new instance of the SlidesFormattedTextExtractor class.

        Parameters:
        stream - The stream of the document.
        loadOptions - The options for loading the document.
    • Method Detail

      • getDocumentFormatter

        public DocumentFormatter getDocumentFormatter()

        Gets the DocumentFormatter used to format the extracted text.

        Specified by:
        getDocumentFormatter in interface ITextExtractorWithFormatter
        Returns:
        An instance of the DocumentFormatter. The default is PlainDocumentFormatter.


        By default, the value is an instance of the PlainDocumentFormatter class. You can set any other formatter, or set null to use the default formatter.

      • setDocumentFormatter

        public void setDocumentFormatter(DocumentFormatter value)

        Sets the DocumentFormatter used to format the extracted text.

        Specified by:
        setDocumentFormatter in interface ITextExtractorWithFormatter
        Parameters:
        value - An instance of the DocumentFormatter. The default is PlainDocumentFormatter.


        By default, the value is an instance of the PlainDocumentFormatter class. You can set any other formatter, or set null to use the default formatter.
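        The getter and setter above document a "null means default" contract: passing null to setDocumentFormatter restores PlainDocumentFormatter. A minimal sketch of that contract follows. TextFormatter, FormatterSetting and PLAIN are hypothetical names standing in for DocumentFormatter and PlainDocumentFormatter; the library's real formatting logic is not modelled.

```java
import java.util.Objects;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for DocumentFormatter: turns extracted text into output text.
interface TextFormatter extends UnaryOperator<String> {
}

final class FormatterSetting {
    // Hypothetical default playing the role of PlainDocumentFormatter:
    // it returns the text unchanged.
    static final TextFormatter PLAIN = text -> text;

    private TextFormatter formatter = PLAIN;

    TextFormatter getFormatter() {
        return formatter;
    }

    // Passing null falls back to the default formatter instead of storing null.
    void setFormatter(TextFormatter value) {
        formatter = Objects.requireNonNullElse(value, PLAIN);
    }

    String format(String text) {
        return formatter.apply(text);
    }
}
```

        Normalising null in the setter means the getter never returns null, so callers do not need their own fallback.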

      • extractHighlights

        public List<String> extractHighlights(HighlightOptions... highlightOptions)

        Extracts highlights.

        Specified by:
        extractHighlights in interface IHighlightExtractor
        Parameters:
        highlightOptions - A collection of HighlightOptions.


        Only extraction with Mode = FixedWidth is supported.

        Returns:
        A collection of strings that represent highlights. If no highlights are found, the collection is empty.
        Throws:
        UnsupportedOperationException - if Mode is not FixedWidth.
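        The contract above has two parts: only Mode = FixedWidth is accepted, and any other mode raises UnsupportedOperationException. The sketch below illustrates that guard together with one assumed reading of a fixed-width highlight: a substring of a fixed number of characters starting at a given position. Both the reading and the names (HighlightMode, HighlightRequest, FixedWidthHighlights) are hypothetical; consult the HighlightOptions documentation for the real parameters.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types for illustration only; they are not the GroupDocs.Parser
// HighlightOptions API.
enum HighlightMode {
    FIXED_WIDTH,
    OTHER // stands for any mode that is not fixed-width
}

final class HighlightRequest {
    final HighlightMode mode;
    final int position; // index of the first character of the highlight (assumed meaning)
    final int width;    // number of characters in the highlight (assumed meaning)

    HighlightRequest(HighlightMode mode, int position, int width) {
        this.mode = mode;
        this.position = position;
        this.width = width;
    }
}

final class FixedWidthHighlights {
    private FixedWidthHighlights() {
    }

    // Returns one highlight per request, or an empty list when there are no requests.
    static List<String> extract(String text, HighlightRequest... requests) {
        List<String> highlights = new ArrayList<>();
        for (HighlightRequest request : requests) {
            // Mirrors the documented guard: only fixed-width mode is supported.
            if (request.mode != HighlightMode.FIXED_WIDTH) {
                throw new UnsupportedOperationException("Mode is not FixedWidth: " + request.mode);
            }
            // Clamp the window to the text bounds (long math avoids int overflow).
            int start = Math.max(0, Math.min(request.position, text.length()));
            int end = (int) Math.min(text.length(), (long) start + Math.max(0, request.width));
            highlights.add(text.substring(start, end));
        }
        return highlights;
    }
}
```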
      • reset

        public void reset()

        Resets the current document.


        Resets the cursor position, so the next call to the ExtractLine method returns the first line of the document.

        Overrides:
        reset in class SlidesTextExtractorBase
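        reset() and the line-based extraction methods describe a cursor: each line request returns the next line, null signals that all characters have been extracted, and reset() moves the cursor back so the first line is returned again. A minimal sketch of that protocol over an in-memory string follows; LineCursor and its methods are hypothetical and model only the documented behaviour, not the library's implementation.

```java
// Hypothetical model of the documented cursor behaviour; not GroupDocs.Parser code.
final class LineCursor {
    private final String[] lines;
    private int position;

    LineCursor(String text) {
        // A limit of -1 keeps trailing empty lines; empty text has no lines at all.
        this.lines = text.isEmpty() ? new String[0] : text.split("\r?\n", -1);
    }

    // Returns the next line, or null once every line has been returned.
    String extractLine() {
        return position < lines.length ? lines[position++] : null;
    }

    // Moves the cursor back to the start, so extractLine returns the first line again.
    void reset() {
        position = 0;
    }
}
```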
      • extractPage

        public String extractPage(int pageIndex)
        Description copied from interface: IPageTextExtractor

        Extracts all characters from the page with pageIndex and returns the data as a string.

        Specified by:
        extractPage in interface IPageTextExtractor
        Parameters:
        pageIndex - The index of the page.
        Returns:
        A string that contains all characters from the page, or null if all characters have been extracted.
      • extractText

        protected String extractText()

        Extracts all characters from the current position to the end of the text extractor and returns them as one string.

        Overrides:
        extractText in class TextExtractor
        Returns:
        A string that contains all characters from the current position to the end of the text extractor.
      • extractTextLine

        protected String extractTextLine()

        Extracts a line of characters from the text extractor and returns the data as a string.

        Overrides:
        extractTextLine in class TextExtractor
        Returns:
        The next line from the extractor, or null if all characters have been extracted.