public final class EmailFormattedTextExtractor extends EmailTextExtractorBase implements IHighlightExtractor, ITextExtractorWithFormatter
Provides the formatted text extractor for email messages.
Supported formats:
Extension | Description |
---|---|
.MSG | Microsoft Outlook message |
.EML | Email message |
.EMLX | Apple macOS Mail message |
Extracting an email:
// Create a text extractor for an email message
EmailFormattedTextExtractor extractor = new EmailFormattedTextExtractor(stream);
// Extract the formatted body of the message
System.out.println(extractor.extractAll());
// Iterate over the attachments
for (int i = 0; i < extractor.getAttachmentCount(); i++) {
    // Get the attachment
    Container.Entity entity = extractor.getEntities().get(i);
    // Print the media type of the attachment
    System.out.println(entity.getMediaType());
    // Create a text extractor for the attachment's stream
    // (extractorFactory is assumed to have been created beforehand)
    TextExtractor attachmentExtractor = extractorFactory.createTextExtractor(entity.openStream());
    // Proceed only if the content type is supported
    if (attachmentExtractor != null) {
        // Extract the text of the attachment
        System.out.println(attachmentExtractor.extractAll());
    }
}
To set a formatter, use the DocumentFormatter property:
EmailFormattedTextExtractor extractor = new EmailFormattedTextExtractor(stream);
extractor.setDocumentFormatter(new MarkdownDocumentFormatter());
By default, text is formatted as plain text by Formatters.Plain.PlainDocumentFormatter.
Constructor and Description |
---|
EmailFormattedTextExtractor(InputStream stream): Initializes a new instance of the EmailFormattedTextExtractor class. |
EmailFormattedTextExtractor(InputStream stream, LoadOptions loadOptions): Initializes a new instance of the EmailFormattedTextExtractor class. |
EmailFormattedTextExtractor(String fileName): Initializes a new instance of the EmailFormattedTextExtractor class. |
EmailFormattedTextExtractor(String fileName, LoadOptions loadOptions): Initializes a new instance of the EmailFormattedTextExtractor class. |
Modifier and Type | Method and Description |
---|---|
List<String> | extractHighlights(HighlightOptions... highlightOptions): Extracts highlights. |
DocumentFormatter | getDocumentFormatter(): Gets a DocumentFormatter. |
void | setDocumentFormatter(DocumentFormatter value): Sets a DocumentFormatter. |
dispose, getAttachmentCount, getEntities, getStream, openEntityStream, prepareLine, reset
checkDisposed, close, dispose, extractAll, extractLine, extractText, extractTextLine, getEncoding, getMediaType, getPassword, isDisposed, setEncoding, setMediaType
public EmailFormattedTextExtractor(String fileName)
Initializes a new instance of the EmailFormattedTextExtractor class.
Parameters:
fileName - The path to the file.
public EmailFormattedTextExtractor(String fileName, LoadOptions loadOptions)
Initializes a new instance of the EmailFormattedTextExtractor class.
Parameters:
fileName - The path to the file.
loadOptions - The options for loading the file.
public EmailFormattedTextExtractor(InputStream stream)
Initializes a new instance of the EmailFormattedTextExtractor class.
Parameters:
stream - The stream of the document.
public EmailFormattedTextExtractor(InputStream stream, LoadOptions loadOptions)
Initializes a new instance of the EmailFormattedTextExtractor class.
Parameters:
stream - The stream of the document.
loadOptions - The options for loading the file.
public DocumentFormatter getDocumentFormatter()
Gets a DocumentFormatter.
Specified by:
getDocumentFormatter in interface ITextExtractorWithFormatter
Returns:
A DocumentFormatter. The default is PlainDocumentFormatter. You can set any other formatter, or null to use the default formatter.
public void setDocumentFormatter(DocumentFormatter value)
Sets a DocumentFormatter.
Specified by:
setDocumentFormatter in interface ITextExtractorWithFormatter
Parameters:
value - An instance of the DocumentFormatter. The default is PlainDocumentFormatter. You can set any other formatter, or null to use the default formatter.
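The "null means default formatter" behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use the library: SketchFormatter, PlainSketchFormatter, and FormatterHolder are hypothetical stand-ins introduced only for illustration, and the library's actual implementation may differ.

```java
// Hypothetical stand-in for a document formatter; not a library type.
interface SketchFormatter {
    String format(String text);
}

// Hypothetical default formatter that returns the text unchanged.
final class PlainSketchFormatter implements SketchFormatter {
    @Override
    public String format(String text) {
        return text;
    }
}

// Hypothetical holder showing a getter/setter pair where null restores the default.
final class FormatterHolder {
    private SketchFormatter formatter = new PlainSketchFormatter();

    SketchFormatter getFormatter() {
        return formatter;
    }

    void setFormatter(SketchFormatter value) {
        // Assumption: passing null falls back to the plain default formatter.
        this.formatter = (value != null) ? value : new PlainSketchFormatter();
    }
}

final class FormatterHolderDemo {
    public static void main(String[] args) {
        FormatterHolder holder = new FormatterHolder();

        // Install a custom formatter that wraps the text in bold markers.
        holder.setFormatter(text -> "**" + text + "**");
        System.out.println(holder.getFormatter().format("hello")); // prints **hello**

        // Passing null restores the default plain formatter.
        holder.setFormatter(null);
        System.out.println(holder.getFormatter().format("hello")); // prints hello
    }
}
```

With this design the getter never returns null, so callers can format text without a null check.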
public List<String> extractHighlights(HighlightOptions... highlightOptions)
Extracts highlights.
Specified by:
extractHighlights in interface IHighlightExtractor
Parameters:
highlightOptions - A collection of HighlightOptions. Each must use Mode = FixedWidth.
Throws:
UnsupportedOperationException - Mode is not FixedWidth.
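To make the fixed-width restriction concrete, here is a small self-contained sketch that does not call the library. Everything in it (HighlightSketch, its Mode, Direction, and Options types, and the position/length semantics) is a hypothetical assumption chosen for illustration; it only mirrors the documented contract that a mode other than fixed width raises UnsupportedOperationException.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of fixed-width highlight extraction; not the library's code.
final class HighlightSketch {
    enum Mode { FIXED_WIDTH, LINE }
    enum Direction { LEFT, RIGHT }

    // Hypothetical options: a character position, a direction, and a maximum length.
    static final class Options {
        final Mode mode;
        final Direction direction;
        final int position;
        final int length;

        Options(Mode mode, Direction direction, int position, int length) {
            this.mode = mode;
            this.direction = direction;
            this.position = position;
            this.length = length;
        }
    }

    // Returns one highlight per option; rejects any mode other than FIXED_WIDTH.
    static List<String> extractHighlights(String text, Options... options) {
        List<String> result = new ArrayList<>();
        for (Options o : options) {
            if (o.mode != Mode.FIXED_WIDTH) {
                throw new UnsupportedOperationException("Mode is not FixedWidth");
            }
            // Clamp the position and length so substring never goes out of range.
            int pos = Math.max(0, Math.min(o.position, text.length()));
            int len = Math.max(0, o.length);
            int start = (o.direction == Direction.LEFT) ? pos - Math.min(len, pos) : pos;
            int end = (o.direction == Direction.LEFT) ? pos : pos + Math.min(len, text.length() - pos);
            result.add(text.substring(start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> highlights = extractHighlights(
                "Hello, world",
                new Options(Mode.FIXED_WIDTH, Direction.LEFT, 5, 5),
                new Options(Mode.FIXED_WIDTH, Direction.RIGHT, 7, 5));
        for (String h : highlights) {
            System.out.println(h);
        }
    }
}
```

In this sketch, a left highlight takes up to `length` characters before the position and a right highlight takes up to `length` characters from it; the real HighlightOptions semantics should be checked against the library documentation.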