com.groupdocs.parser


Class EmailFormattedTextExtractor

  • All Implemented Interfaces:
    IContainer, IHighlightExtractor, ITextExtractorWithFormatter, AutoCloseable


    public final class EmailFormattedTextExtractor
    extends EmailTextExtractorBase
    implements IHighlightExtractor, ITextExtractorWithFormatter

    Provides the formatted text extractor for email messages.


    Supported formats:

    .MSG    Microsoft Outlook message
    .EML    Email Message
    .EMLX   Apple's macOS Mail message
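
    As a rough illustration of the list above, a caller could check a file name against these three extensions before constructing the extractor. This is only a sketch: the class and method names are hypothetical and not part of com.groupdocs.parser, and an extension check is a heuristic that says nothing about how the library itself detects formats.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of com.groupdocs.parser.
class EmailFormatCheckSketch {

    // Extensions taken from the "Supported formats" list above.
    private static final List<String> SUPPORTED_EXTENSIONS =
            Arrays.asList(".msg", ".eml", ".emlx");

    /** Returns true if the file name ends with one of the listed extensions (case-insensitive). */
    static boolean hasSupportedExtension(String fileName) {
        if (fileName == null) {
            return false;
        }
        String lower = fileName.toLowerCase(Locale.ROOT);
        for (String extension : SUPPORTED_EXTENSIONS) {
            if (lower.endsWith(extension)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasSupportedExtension("Inbox/Hello.MSG"));
        System.out.println(hasSupportedExtension("notes.txt"));
    }
}
```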

    Extracting an email:

     // Create a text extractor for emails ("stream" is an InputStream of the message, opened elsewhere)
     EmailFormattedTextExtractor extractor = new EmailFormattedTextExtractor(stream);
     // Extract the formatted body of the message
     System.out.println(extractor.extractAll());
     // Iterate over the attachments
     for (int i = 0; i < extractor.getAttachmentCount(); i++) {
         // Get the attachment
         Container.Entity entity = extractor.getEntities().get(i);
         // Print the media type of the attachment
         System.out.println(entity.getMediaType());
         // Create a text extractor for the attachment's stream ("extractorFactory" is assumed to be created elsewhere)
         TextExtractor attachmentExtractor = extractorFactory.createTextExtractor(entity.openStream());
         // A null result means the content type is not supported
         if (attachmentExtractor != null) {
             // Extract the text from the attachment
             System.out.println(attachmentExtractor.extractAll());
         }
     }
      

    To set a formatter, use the DocumentFormatter property:

     EmailFormattedTextExtractor extractor = new EmailFormattedTextExtractor(stream);
     extractor.setDocumentFormatter(new MarkdownDocumentFormatter());
      

    By default, text is formatted as plain text by Formatters.Plain.PlainDocumentFormatter.

    • Constructor Detail

      • EmailFormattedTextExtractor

        public EmailFormattedTextExtractor(String fileName)

        Initializes a new instance of the EmailFormattedTextExtractor class.

        Parameters:
        fileName - The path to the file.
      • EmailFormattedTextExtractor

        public EmailFormattedTextExtractor(String fileName,
                                   LoadOptions loadOptions)

        Initializes a new instance of the EmailFormattedTextExtractor class.

        Parameters:
        fileName - The path to the file.
        loadOptions - The options for loading the file.
      • EmailFormattedTextExtractor

        public EmailFormattedTextExtractor(InputStream stream)

        Initializes a new instance of the EmailFormattedTextExtractor class.

        Parameters:
        stream - The stream of the document.
      • EmailFormattedTextExtractor

        public EmailFormattedTextExtractor(InputStream stream,
                                   LoadOptions loadOptions)

        Initializes a new instance of the EmailFormattedTextExtractor class.

        Parameters:
        stream - The stream of the document.
        loadOptions - The options for loading the document.
    • Method Detail

      • getDocumentFormatter

        public DocumentFormatter getDocumentFormatter()

        Gets a DocumentFormatter.

        Specified by:
        getDocumentFormatter in interface ITextExtractorWithFormatter
        Returns:
        An instance of the DocumentFormatter. The default is PlainDocumentFormatter.


        By default, the value is an instance of the PlainDocumentFormatter class. You can set any other formatter, or set null to use the default formatter.

      • setDocumentFormatter

        public void setDocumentFormatter(DocumentFormatter value)

        Sets a DocumentFormatter.

        Specified by:
        setDocumentFormatter in interface ITextExtractorWithFormatter
        Parameters:
        value - An instance of the DocumentFormatter. The default is PlainDocumentFormatter.


        By default, the value is an instance of the PlainDocumentFormatter class. You can set any other formatter, or set null to use the default formatter.
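
        The null-means-default behaviour described above can be pictured with a small stand-alone sketch. All types here are hypothetical stand-ins, not the com.groupdocs.parser classes, and the sketch assumes that setting null simply restores a plain formatter; the library's internals may differ.

```java
// Hypothetical stand-ins; none of these types belong to com.groupdocs.parser.
class FormatterFallbackSketch {

    /** Minimal formatter contract used only for this illustration. */
    interface Formatter {
        String format(String text);
    }

    /** Returns the text unchanged, playing the role of a plain formatter. */
    static final class PlainFormatter implements Formatter {
        @Override
        public String format(String text) {
            return text;
        }
    }

    /** Wraps the text in Markdown bold markers, playing the role of a custom formatter. */
    static final class BoldFormatter implements Formatter {
        @Override
        public String format(String text) {
            return "**" + text + "**";
        }
    }

    /** Holder whose setter treats null as "use the default formatter" (an assumption). */
    static final class Extractor {
        private Formatter formatter = new PlainFormatter();

        Formatter getFormatter() {
            return formatter;
        }

        void setFormatter(Formatter value) {
            formatter = (value != null) ? value : new PlainFormatter();
        }
    }

    public static void main(String[] args) {
        Extractor extractor = new Extractor();
        extractor.setFormatter(new BoldFormatter());
        System.out.println(extractor.getFormatter().format("Subject"));
        extractor.setFormatter(null); // falls back to the plain formatter
        System.out.println(extractor.getFormatter().format("Subject"));
    }
}
```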

      • extractHighlights

        public List<String> extractHighlights(HighlightOptions... highlightOptions)

        Extracts highlights.

        Specified by:
        extractHighlights in interface IHighlightExtractor
        Parameters:
        highlightOptions - A collection of HighlightOptions.


        Only extraction with Mode = FixedWidth is supported.

        Returns:
        A collection of strings that represent highlights. If no highlights are found, the collection is empty.
        Throws:
        UnsupportedOperationException - Mode is not FixedWidth.
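
        To make the idea of a fixed-width highlight concrete, the sketch below takes a fixed number of characters to the left or right of a position in a string. It is a conceptual illustration only: the class, enum, and method are hypothetical, it does not use HighlightOptions, and it is not the library's implementation.

```java
// Hypothetical illustration of a fixed-width highlight; not part of com.groupdocs.parser.
class FixedWidthHighlightSketch {

    /** Side of the position from which characters are taken. */
    enum Direction { LEFT, RIGHT }

    /**
     * Returns up to {@code width} characters of {@code text} immediately before
     * (LEFT) or starting at (RIGHT) {@code position}, clamped to the text bounds.
     */
    static String highlight(String text, int position, int width, Direction direction) {
        if (position < 0 || position > text.length() || width < 0) {
            throw new IllegalArgumentException("position or width out of range");
        }
        if (direction == Direction.LEFT) {
            int start = Math.max(0, position - width);
            return text.substring(start, position);
        }
        int end = Math.min(text.length(), position + width);
        return text.substring(position, end);
    }

    public static void main(String[] args) {
        String body = "Meeting moved to Friday at noon";
        // Six characters starting at index 17
        System.out.println(highlight(body, 17, 6, Direction.RIGHT));
        // Nine characters ending just before index 17
        System.out.println(highlight(body, 17, 9, Direction.LEFT));
    }
}
```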