com.groupdocs.parser

Class EmailTextExtractor

  • All Implemented Interfaces:
    IContainer, IHighlightExtractor, IRegexSearchable, ISearchable, IStructuredExtractor, AutoCloseable


    public final class EmailTextExtractor
    extends EmailTextExtractorBase
    implements ISearchable, IHighlightExtractor, IRegexSearchable, IStructuredExtractor

    Provides a text extractor for email messages.


    Supported formats:

    .MSG     Microsoft Outlook message
    .EML     Email message
    .EMLX    Apple macOS Mail message
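
    Because only the extensions listed above are supported, it can help to check a file name before constructing the extractor. The following is a minimal sketch of such a check. EmailFormats and isSupported are hypothetical names introduced for illustration and are not part of com.groupdocs.parser; the check looks only at the file name, not at the file content.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of com.groupdocs.parser. */
final class EmailFormats {
    // Extensions copied from the "Supported formats" list of this page.
    private static final Set<String> SUPPORTED = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList(".msg", ".eml", ".emlx")));

    private EmailFormats() {
    }

    /** Returns true when fileName ends with a listed extension (case-insensitive). */
    static boolean isSupported(String fileName) {
        if (fileName == null) {
            return false;
        }
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return false; // no extension at all
        }
        String extension = fileName.substring(dot).toLowerCase(Locale.ROOT);
        return SUPPORTED.contains(extension);
    }
}
```

    A caller could guard the constructor with it, for example by calling EmailFormats.isSupported(fileName) before new EmailTextExtractor(fileName).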

    Extracting text from an email:

     // `stream` (an InputStream with the message) and `extractorFactory`
     // are assumed to have been created earlier
     // Create a text extractor for the email
     EmailTextExtractor extractor = new EmailTextExtractor(stream);
     // Extract the body of the message
     System.out.println(extractor.extractAll());
     // Iterate over the attachments
     for (int i = 0; i < extractor.getAttachmentCount(); i++) {
         // Get the attachment
         Container.Entity entity = extractor.getEntities().get(i);
         // Print the content type of the attachment
         System.out.println(entity.getMediaType());
         // Create a text extractor for the attachment's stream
         TextExtractor attachmentExtractor = extractorFactory.createTextExtractor(entity.openStream());
         // A null result means the content type is not supported
         if (attachmentExtractor != null) {
             // Extract the text from the attachment
             System.out.println(attachmentExtractor.extractAll());
         }
     }
      
    • Constructor Detail

      • EmailTextExtractor

        public EmailTextExtractor(String fileName)

        Initializes a new instance of the EmailTextExtractor class.

        Parameters:
        fileName - The path to the file.
      • EmailTextExtractor

        public EmailTextExtractor(String fileName,
                          LoadOptions loadOptions)

        Initializes a new instance of the EmailTextExtractor class.

        Parameters:
        fileName - The path to the file.
        loadOptions - The options for loading the file.
      • EmailTextExtractor

        public EmailTextExtractor(InputStream stream)

        Initializes a new instance of the EmailTextExtractor class.

        Parameters:
        stream - The stream of the document.
      • EmailTextExtractor

        public EmailTextExtractor(InputStream stream,
                          LoadOptions loadOptions)

        Initializes a new instance of the EmailTextExtractor class.

        Parameters:
        stream - The stream of the document.
        loadOptions - The options for loading the document.
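
        The stream-based constructors need an InputStream that the caller opens and closes. The following is a minimal sketch of one way to manage that stream with try-with-resources, using only the Java standard library. EmailStreams, StreamAction and withStream are hypothetical names introduced for illustration and are not part of com.groupdocs.parser.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of com.groupdocs.parser. */
final class EmailStreams {

    /** Work to perform while the stream is open. */
    interface StreamAction<T> {
        T apply(InputStream stream) throws Exception;
    }

    private EmailStreams() {
    }

    /**
     * Opens a buffered stream for the given path, passes it to the action,
     * and closes the stream when the action returns or throws.
     */
    static <T> T withStream(Path path, StreamAction<T> action) throws Exception {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(path))) {
            return action.apply(in);
        }
    }
}
```

        Since the class is listed as implementing AutoCloseable, the action passed to withStream could itself create the extractor in a try-with-resources statement, for example try (EmailTextExtractor extractor = new EmailTextExtractor(stream)) { return extractor.extractAll(); }, so that both the extractor and the stream are released.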
    • Method Detail

      • search

        public void search(SearchOptions options,
                  ISearchHandler handler,
                  List<String> keywords)

        Searches the text for the specified keywords.

        Specified by:
        search in interface ISearchable
        Parameters:
        options - Options for searching.
        handler - An instance of the search handler.
        keywords - A collection of words to search for.
      • search

        public void search(SearchOptions options,
                  ISearchHandler handler,
                  ISearchEngine searchEngine,
                  List<String> keywords)

        Searches the text for the specified keywords using the given search engine.

        Specified by:
        search in interface ISearchable
        Parameters:
        options - Options for searching.
        handler - An instance of the search handler.
        searchEngine - An instance of the search engine.
        keywords - A collection of words to search for.
      • extractHighlights

        public List<String> extractHighlights(HighlightOptions... highlightOptions)

        Extracts highlights.

        Specified by:
        extractHighlights in interface IHighlightExtractor
        Parameters:
        highlightOptions - A collection of HighlightOptions.
        Returns:
        A collection of strings that represent the highlights. If no highlights are found, the collection is empty.
      • extractText

        protected String extractText()

        Extracts all characters from the current position to the end of the text extractor and returns them as one string.

        Overrides:
        extractText in class TextExtractor
        Returns:
        A string that contains all characters from the current position to the end of the text extractor.