com.groupdocs.parser

Class EpubTextExtractor

  • All Implemented Interfaces:
    IHighlightExtractor, IRegexSearchable, ISearchable, IStructuredExtractor, AutoCloseable


    public final class EpubTextExtractor
    extends EpubTextExtractorBase
    implements IHighlightExtractor, ISearchable, IRegexSearchable, IStructuredExtractor

    Provides the text extractor for EPUB documents.

    Extracts a line of characters from a document:

     // Create a text extractor for EPUB documents
     EpubTextExtractor extractor = new EpubTextExtractor(stream);
     // Extract a line of the text
     String line = extractor.extractLine();
     // If the line is null, then the end of the file is reached
     while (line != null) {
         // Print a line to the console
         System.out.println(line);
         // Extract another line
         line = extractor.extractLine();
     } 
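
    Because the class implements AutoCloseable, the same loop can be wrapped in try-with-resources so the extractor is released even if extraction fails. The following is a minimal sketch of that pattern, not the library's own sample: StubExtractor is a hypothetical stand-in that models only extractLine() and close(), so the code runs without GroupDocs.Parser on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class LineLoopSketch {
    // Hypothetical stand-in for EpubTextExtractor (assumption, not the real class).
    // Only extractLine() and close() are modeled.
    static final class StubExtractor implements AutoCloseable {
        private final Iterator<String> lines;
        boolean closed = false;

        StubExtractor(List<String> lines) {
            this.lines = lines.iterator();
        }

        // Returns the next line, or null once the end of the document is reached.
        String extractLine() {
            return lines.hasNext() ? lines.next() : null;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Reads lines until extractLine() returns null, then closes the extractor.
    static List<String> readAllAndClose(StubExtractor extractor) {
        List<String> result = new ArrayList<>();
        // try-with-resources calls close() on exit, including on exceptions
        try (StubExtractor e = extractor) {
            String line;
            while ((line = e.extractLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        StubExtractor extractor =
                new StubExtractor(Arrays.asList("Chapter 1", "It was a dark night."));
        for (String line : readAllAndClose(extractor)) {
            System.out.println(line);
        }
    }
}
```

    With the real class, the resource declaration would presumably be the EpubTextExtractor constructed from a stream or file name, as in the sample above.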
      

    Extracts all characters from a document:

     // Create a text extractor for EPUB documents
     EpubTextExtractor extractor = new EpubTextExtractor(stream);
     // Extract the whole text and print it to the console
     System.out.println(extractor.extractAll());
      

    For more detailed work with a document, use the EpubPackage class. Each EPUB document contains one or more packages. The getCount() method returns the total number of packages:

     int packageCount = extractor.getCount();
      

    The get_Item(int) method (the indexer) returns the package at the given index:

     EpubPackage epubPackage = extractor.get_Item(0);
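
    Taken together, getCount() and get_Item(int) allow every package to be visited with an index loop. The sketch below illustrates that loop under stated assumptions: StubExtractor is a hypothetical stand-in, and its packages are plain strings because the members of EpubPackage are not described on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PackageLoopSketch {
    // Hypothetical stand-in exposing the two accessors shown above.
    // The element type is String only for illustration (assumption).
    static final class StubExtractor {
        private final List<String> packages;

        StubExtractor(List<String> packages) {
            this.packages = packages;
        }

        // Total number of packages in the document.
        int getCount() {
            return packages.size();
        }

        // Package at the given zero-based index.
        String get_Item(int index) {
            return packages.get(index);
        }
    }

    // Visits every package from index 0 to getCount() - 1.
    static List<String> collectPackages(StubExtractor extractor) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < extractor.getCount(); i++) {
            result.add(extractor.get_Item(i));
        }
        return result;
    }

    public static void main(String[] args) {
        StubExtractor extractor =
                new StubExtractor(Arrays.asList("package-0", "package-1"));
        for (String p : collectPackages(extractor)) {
            System.out.println(p);
        }
    }
}
```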
      
    • Constructor Detail

      • EpubTextExtractor

        public EpubTextExtractor(String fileName)

        Initializes a new instance of the EpubTextExtractor class.

        Parameters:
        fileName - The path to the file.
      • EpubTextExtractor

        public EpubTextExtractor(InputStream stream)

        Initializes a new instance of the EpubTextExtractor class.

        Parameters:
        stream - The stream of the document.
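
        The stream overload expects an InputStream that is already open. One way to obtain such a stream from a path and guarantee that it is closed afterwards is sketched below using only the Java standard library. StreamConsumer and withStream are hypothetical helper names introduced for this illustration; whether the extractor closes the stream itself is not stated on this page, so the sketch closes it explicitly.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamSourceSketch {
    // Hypothetical callback standing in for "construct an extractor and use it".
    interface StreamConsumer<T> {
        T apply(InputStream stream) throws IOException;
    }

    // Opens the file, hands the stream to the consumer, and always closes it.
    static <T> T withStream(Path path, StreamConsumer<T> consumer) throws IOException {
        try (InputStream stream = Files.newInputStream(path)) {
            return consumer.apply(stream);
        }
    }

    public static void main(String[] args) throws IOException {
        // A temporary file with placeholder bytes; it is not a valid EPUB.
        Path tmp = Files.createTempFile("sample", ".epub");
        try {
            Files.write(tmp, new byte[] {1, 2, 3});
            // With the real library, the lambda body would construct
            // new EpubTextExtractor(stream) and call its extraction methods.
            int firstByte = withStream(tmp, s -> s.read());
            System.out.println(firstByte);
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```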
    • Method Detail

      • search

        public void search(SearchOptions options,
                  ISearchHandler handler,
                  List<String> keywords)

        Searches the document for the specified keywords.

        Specified by:
        search in interface ISearchable
        Parameters:
        options - Options for searching.
        handler - An instance of the search handler.
        keywords - A collection of words to search for.
      • search

        public void search(SearchOptions options,
                  ISearchHandler handler,
                  ISearchEngine searchEngine,
                  List<String> keywords)

        Searches the document for the specified keywords using the given search engine.

        Specified by:
        search in interface ISearchable
        Parameters:
        options - Options for searching.
        handler - An instance of the search handler.
        searchEngine - An instance of the search engine.
        keywords - A collection of words to search for.
      • extractHighlights

        public List<String> extractHighlights(HighlightOptions... highlightOptions)

        Extracts highlights.

        Specified by:
        extractHighlights in interface IHighlightExtractor
        Parameters:
        highlightOptions - A collection of HighlightOptions.
        Returns:
        A collection of strings that represent highlights. If no highlights are found, the collection is empty.
      • extractItem

        protected String extractItem(String itemPath)

        Extracts the text of a document item.

        Specified by:
        extractItem in class EpubTextExtractorBase
        Parameters:
        itemPath - A path to the document's item.
        Returns:
        A string that contains all characters from the document's item.