com.groupdocs.parser

Class MarkdownFormattedTextExtractor

  • All Implemented Interfaces:
    ITextExtractorWithFormatter, AutoCloseable


    public final class MarkdownFormattedTextExtractor
    extends TextExtractor
    implements ITextExtractorWithFormatter

    Provides the formatted text extractor for Markdown (.md) documents.

    To extract text from a document line by line:

     // Create a text extractor for Markdown documents
     TextExtractor extractor = new MarkdownFormattedTextExtractor(stream);
     // Extract a line of the text
     String line = extractor.extractLine();
     // If the line is null, then the end of the file is reached
     while (line != null) {
         // Print a line to the console
         System.out.println(line);
         // Extract another line
         line = extractor.extractLine();
     }
      

    To extract all the text from a document at once:

     // Create a text extractor for Markdown documents
     TextExtractor extractor = new MarkdownFormattedTextExtractor(stream);
     // Extract a text
     System.out.println(extractor.extractAll());
      

    To set a formatter, use the DocumentFormatter property (setDocumentFormatter):

     // Create a formatted text extractor for Markdown documents
     MarkdownFormattedTextExtractor extractor = new MarkdownFormattedTextExtractor(stream);
     // Set an HTML formatter
     extractor.setDocumentFormatter(new HtmlDocumentFormatter()); // all the text will be formatted as HTML
      

    By default, text is formatted as plain text by PlainDocumentFormatter.

    • Constructor Detail

      • MarkdownFormattedTextExtractor

        public MarkdownFormattedTextExtractor(String fileName)

        Initializes a new instance of the MarkdownFormattedTextExtractor class.

        Parameters:
        fileName - The path to the file.
      • MarkdownFormattedTextExtractor

        public MarkdownFormattedTextExtractor(InputStream stream)

        Initializes a new instance of the MarkdownFormattedTextExtractor class.

        Parameters:
        stream - The stream of the document.
    • Method Detail

      • getDocumentFormatter

        public DocumentFormatter getDocumentFormatter()

        Gets a DocumentFormatter.

        Specified by:
        getDocumentFormatter in interface ITextExtractorWithFormatter
        Returns:
        An instance of the DocumentFormatter. The default is PlainDocumentFormatter.


        By default, the value is an instance of the PlainDocumentFormatter class. You can set any other formatter, or set null to use the default formatter.

      • setDocumentFormatter

        public void setDocumentFormatter(DocumentFormatter value)

        Sets a DocumentFormatter.

        Specified by:
        setDocumentFormatter in interface ITextExtractorWithFormatter
        Parameters:
        value - An instance of the DocumentFormatter. The default is PlainDocumentFormatter.


        By default, the value is an instance of the PlainDocumentFormatter class. You can pass any other formatter, or pass null to use the default formatter.
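
        The null-means-default behavior described for setDocumentFormatter can be sketched with a few stand-in types. Everything in this block (Formatter, PlainFormatter, HtmlFormatter, FormatterHolder) is hypothetical and written only for illustration; it is not the GroupDocs implementation, and the fallback logic is an assumption drawn from the description above.

```java
public class FormatterFallbackSketch {

    /** Hypothetical stand-in for DocumentFormatter; not the GroupDocs type. */
    interface Formatter {
        String name();
    }

    /** Hypothetical stand-in for PlainDocumentFormatter. */
    static final class PlainFormatter implements Formatter {
        @Override
        public String name() {
            return "plain";
        }
    }

    /** Hypothetical stand-in for an HTML formatter. */
    static final class HtmlFormatter implements Formatter {
        @Override
        public String name() {
            return "html";
        }
    }

    /** Holds the current formatter and starts out with the plain one. */
    static final class FormatterHolder {
        private Formatter formatter = new PlainFormatter();

        Formatter getDocumentFormatter() {
            return formatter;
        }

        void setDocumentFormatter(Formatter value) {
            // Assumed behavior: null means "go back to the default formatter".
            this.formatter = (value != null) ? value : new PlainFormatter();
        }
    }

    public static void main(String[] args) {
        FormatterHolder holder = new FormatterHolder();
        System.out.println(holder.getDocumentFormatter().name());

        holder.setDocumentFormatter(new HtmlFormatter());
        System.out.println(holder.getDocumentFormatter().name());

        // Passing null restores the default rather than leaving the field null.
        holder.setDocumentFormatter(null);
        System.out.println(holder.getDocumentFormatter().name());
    }
}
```

        Under this assumption the getter never returns null, so calling code does not need a null check before using the formatter.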

      • reset

        public void reset()

        Resets the current document.


        Resets the cursor position. After a reset, the next call to extractLine returns the first line of the document.

        Overrides:
        reset in class TextExtractor
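
        The cursor contract described for reset can be illustrated with a small stand-in class. LineCursor below is hypothetical and backed by an in-memory list; it only mirrors the documented behavior (extractLine returns null at the end of the document, reset rewinds to the first line) and says nothing about how the real extractor is implemented.

```java
import java.util.Arrays;
import java.util.List;

public class ResetCursorSketch {

    /** Hypothetical stand-in for a line-oriented extractor; not the GroupDocs class. */
    static final class LineCursor {
        private final List<String> lines;
        private int position = 0;

        LineCursor(List<String> lines) {
            this.lines = lines;
        }

        /** Returns the next line, or null once every line has been read. */
        String extractLine() {
            return position < lines.size() ? lines.get(position++) : null;
        }

        /** Moves the cursor back to the start of the document. */
        void reset() {
            position = 0;
        }
    }

    public static void main(String[] args) {
        LineCursor cursor = new LineCursor(Arrays.asList("# Title", "Body text"));

        // Read to the end; the loop stops when extractLine returns null.
        String line = cursor.extractLine();
        while (line != null) {
            System.out.println(line);
            line = cursor.extractLine();
        }

        // After reset, reading starts again from the first line.
        cursor.reset();
        System.out.println(cursor.extractLine());
    }
}
```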
      • dispose

        protected void dispose(boolean disposing)

        Releases the unmanaged resources used by the extractor.

        Overrides:
        dispose in class TextExtractor
        Parameters:
        disposing - true if invoked from Dispose; otherwise, false.
      • prepareLine

        protected String prepareLine()

        Returns a line of the text.

        Specified by:
        prepareLine in class TextExtractor
        Returns:
        A string that represents a line of the text, or null if all characters have been read.
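
        Because prepareLine is a protected method specified by the base class, one plausible arrangement is a template method: the public extraction methods live in the base class and call prepareLine to obtain each line. The sketch below shows that pattern with hypothetical types (LineSource, ArrayLineSource). That extractLine and extractAll delegate to prepareLine is an assumption made for illustration, not something this page states.

```java
public class PrepareLineSketch {

    /** Hypothetical base class; the delegation to prepareLine is an assumption. */
    abstract static class LineSource {

        /** Subclasses return the next line, or null when all characters have been read. */
        protected abstract String prepareLine();

        /** Public entry point that hands out one line at a time. */
        public final String extractLine() {
            return prepareLine();
        }

        /** Reads every remaining line and joins them with newline characters. */
        public final String extractAll() {
            StringBuilder text = new StringBuilder();
            String line;
            while ((line = extractLine()) != null) {
                text.append(line).append('\n');
            }
            return text.toString();
        }
    }

    /** Hypothetical subclass that serves lines from an in-memory array. */
    static final class ArrayLineSource extends LineSource {
        private final String[] lines;
        private int next = 0;

        ArrayLineSource(String... lines) {
            this.lines = lines;
        }

        @Override
        protected String prepareLine() {
            return next < lines.length ? lines[next++] : null;
        }
    }

    public static void main(String[] args) {
        LineSource source = new ArrayLineSource("# Title", "Body text");
        System.out.print(source.extractAll());
    }
}
```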