com.groupdocs.parser

Class ChmFormattedTextExtractor

    • Constructor Detail

      • ChmFormattedTextExtractor

        public ChmFormattedTextExtractor(String fileName)

        Initializes a new instance of the ChmFormattedTextExtractor class.

        Parameters:
        fileName - The path to the file.
      • ChmFormattedTextExtractor

        public ChmFormattedTextExtractor(InputStream stream)

        Initializes a new instance of the ChmFormattedTextExtractor class.

        Parameters:
        stream - The stream of the document.
    • Method Detail

      • getPageCount

        public int getPageCount()

        Gets the total number of pages.

        Specified by:
        getPageCount in interface IPageTextExtractor
        Returns:
        The total number of pages.
      • getTableOfContents

        public List<TableOfContentsItem> getTableOfContents()

        Gets a collection of table of contents items.

        Returns:
        A collection of table of contents items.
      • getDocumentFormatter

        public DocumentFormatter getDocumentFormatter()

        Gets a DocumentFormatter.

        Specified by:
        getDocumentFormatter in interface ITextExtractorWithFormatter
        Returns:
        The current DocumentFormatter instance.

        By default, the value is an instance of the PlainDocumentFormatter class. You can set any other formatter, or pass null to setDocumentFormatter to use the default formatter.

      • setDocumentFormatter

        public void setDocumentFormatter(DocumentFormatter value)

        Sets a DocumentFormatter.

        Specified by:
        setDocumentFormatter in interface ITextExtractorWithFormatter
        Parameters:
        value - The DocumentFormatter instance to use.

        By default, the value is an instance of the PlainDocumentFormatter class. You can set any other formatter, or pass null to use the default formatter.
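
        The null-means-default rule described above can be illustrated with a small, self-contained sketch. It does not use the library: Formatter and FormatterHolder are hypothetical stand-ins, and the choice to have the getter return the default instance after null was set is an assumption, since the text above does not say what the real getter returns in that case.

```java
class FormatterDefaultSketch {

    /** Hypothetical stand-in for DocumentFormatter; not the library class. */
    interface Formatter {
        String name();
    }

    /** Hypothetical holder that mimics the documented getter/setter pair. */
    static class FormatterHolder {
        // Plays the role of PlainDocumentFormatter in this sketch.
        private static final Formatter PLAIN = () -> "plain";

        private Formatter formatter = PLAIN;

        Formatter getDocumentFormatter() {
            return formatter;
        }

        void setDocumentFormatter(Formatter value) {
            // Assumption: null is mapped back to the default formatter right away.
            formatter = (value == null) ? PLAIN : value;
        }
    }

    public static void main(String[] args) {
        FormatterHolder holder = new FormatterHolder();
        System.out.println(holder.getDocumentFormatter().name()); // prints "plain"

        holder.setDocumentFormatter(() -> "markdown");
        System.out.println(holder.getDocumentFormatter().name()); // prints "markdown"

        holder.setDocumentFormatter(null);
        System.out.println(holder.getDocumentFormatter().name()); // prints "plain"
    }
}
```

        With the real class, the same calls would be made on a ChmFormattedTextExtractor instance with a DocumentFormatter argument.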

      • extractPage

        public String extractPage(int pageIndex)

        Extracts all characters from the page at pageIndex and returns them as a string.

        Specified by:
        extractPage in interface IPageTextExtractor
        Parameters:
        pageIndex - The index of the page.
        Returns:
        A string that contains all characters from the page, or null if all characters have been extracted.
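
        A typical use of getPageCount and extractPage is a loop over all pages. The sketch below is self-contained and does not use the library: PageSource is a hypothetical stand-in that mirrors only the two method signatures documented above, and zero-based page indexes are an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class PageDumpSketch {

    /** Hypothetical stand-in mirroring getPageCount() and extractPage(int). */
    interface PageSource {
        int getPageCount();
        String extractPage(int pageIndex);
    }

    /** Collects the text of every page, assuming indexes run from 0 to getPageCount() - 1. */
    static List<String> extractAllPages(PageSource source) {
        List<String> pages = new ArrayList<>();
        for (int i = 0; i < source.getPageCount(); i++) {
            String text = source.extractPage(i);
            // extractPage is documented as possibly returning null; store an empty string instead.
            pages.add(text == null ? "" : text);
        }
        return pages;
    }

    public static void main(String[] args) {
        // In-memory fake used in place of a real CHM document.
        final String[] fake = {"Introduction", "Chapter 1"};
        PageSource source = new PageSource() {
            public int getPageCount() { return fake.length; }
            public String extractPage(int pageIndex) { return fake[pageIndex]; }
        };

        List<String> pages = extractAllPages(source);
        for (int i = 0; i < pages.size(); i++) {
            System.out.println("Page " + i + ": " + pages.get(i));
        }
    }
}
```

        With the real class, the loop body would call the same two methods on a ChmFormattedTextExtractor instance.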
      • reset

        public void reset()

        Resets the current document.

        Resets the cursor position, so that the next call to extractLine returns the first line of the document.

        Overrides:
        reset in class TextExtractor
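
        The cursor behaviour described above can be sketched without the library. LineSource and InMemoryLines are hypothetical stand-ins; the sketch assumes the public line-reading method is named extractLine, takes no arguments, and returns null once all lines have been read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LineCursorSketch {

    /** Hypothetical stand-in: a line cursor with extractLine() and reset(). */
    interface LineSource {
        String extractLine();
        void reset();
    }

    /** In-memory fake used in place of a real extractor. */
    static class InMemoryLines implements LineSource {
        private final List<String> lines;
        private int cursor = 0;

        InMemoryLines(String... lines) {
            this.lines = Arrays.asList(lines);
        }

        public String extractLine() {
            // Returns null once the cursor has moved past the last line.
            return cursor < lines.size() ? lines.get(cursor++) : null;
        }

        public void reset() {
            cursor = 0;
        }
    }

    /** Reads lines until the source signals the end with null. */
    static List<String> readAll(LineSource source) {
        List<String> result = new ArrayList<>();
        String line;
        while ((line = source.extractLine()) != null) {
            result.add(line);
        }
        return result;
    }

    public static void main(String[] args) {
        LineSource source = new InMemoryLines("first line", "second line");
        System.out.println(readAll(source));      // prints "[first line, second line]"
        System.out.println(source.extractLine()); // prints "null"

        source.reset();
        System.out.println(source.extractLine()); // prints "first line"
    }
}
```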
      • dispose

        protected void dispose(boolean disposing)

        Releases the unmanaged resources used by the extractor.

        Overrides:
        dispose in class TextExtractor
        Parameters:
        disposing - true if invoked from Dispose; otherwise, false.
      • prepareLine

        protected String prepareLine()

        Returns a line of the text.

        Specified by:
        prepareLine in class TextExtractor
        Returns:
        A string that represents a line of the text, or null if all characters have been read.