com.groupdocs.parser

Class Extractor



  • public class Extractor
    extends Object

    Provides functionality for quickly extracting text and metadata from documents.

    Extracting metadata:

     // Path to the document to inspect (example value)
     String fileName = "document.docx";
     // Create an extractor
     Extractor extractor = new Extractor();
     // Extract metadata; null is returned for unsupported formats
     MetadataCollection metadata = extractor.extractMetadata(fileName);
     if (metadata == null) {
         // Report that the file format is not supported
         System.out.println("The document format is not supported");
     }

    • Field Detail

      • DEFAULT

        public static final Extractor DEFAULT

        A default extractor.

    • Constructor Detail

      • Extractor

        public Extractor()

        Initializes a new instance of the Extractor class.

      • Extractor

        public Extractor(MediaTypeDetector mediaTypeDetector,
                 EncodingDetector encodingDetector,
                 INotificationReceiver notificationReceiver)

        Initializes a new instance of the Extractor class.

        Parameters:
        mediaTypeDetector - An instance of the MediaTypeDetector.
        encodingDetector - An instance of the EncodingDetector.
        notificationReceiver - An INotificationReceiver that processes notification messages.

      • Extractor

        public Extractor(MediaTypeDetector mediaTypeDetector,
                 EncodingDetector encodingDetector,
                 INotificationReceiver notificationReceiver,
                 DocumentFormatter documentFormatter)

        Initializes a new instance of the Extractor class.

        Parameters:
        mediaTypeDetector - An instance of the MediaTypeDetector.
        encodingDetector - An instance of the EncodingDetector.
        notificationReceiver - An INotificationReceiver that processes notification messages.
        documentFormatter - An instance of the DocumentFormatter.

    • Method Detail

      • getMediaTypeDetector

        public MediaTypeDetector getMediaTypeDetector()

        Gets the media type detector.

        Returns:
        An instance of the MediaTypeDetector.

      • getEncodingDetector

        public EncodingDetector getEncodingDetector()

        Gets the encoding detector.

        Returns:
        An instance of the EncodingDetector.

      • extractMetadata

        public MetadataCollection extractMetadata(String fileName)

        Extracts metadata.

        Parameters:
        fileName - The name of the file.

        The media type is detected from the file extension or from the file content.

        Returns:
        A collection of metadata, or null if the media type is not supported.

      • extractMetadata

        public MetadataCollection extractMetadata(String fileName,
                                         LoadOptions loadOptions)

        Extracts metadata.

        Parameters:
        fileName - The name of the file.
        loadOptions - The options for loading the file.

        If the media type in loadOptions is null, it is detected from the file extension or from the file content.

        Returns:
        A collection of metadata, or null if the media type is not supported.

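        The precedence described for this overload (an explicit media type wins, otherwise the type is detected) can be modelled with JDK facilities. In this minimal sketch, explicitType stands in for the media type carried by loadOptions, URLConnection.guessContentTypeFromName plays the role of extension-based detection, and URLConnection.guessContentTypeFromStream plays the role of content-based detection. MediaTypeFallback and resolve are hypothetical names, checking the extension before the content is an assumption (the reference does not state an order), and the JDK's small built-in type tables are only a stand-in for the library's MediaTypeDetector.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URLConnection;

// Illustrative model of the documented precedence; not GroupDocs' detector.
final class MediaTypeFallback {

    private MediaTypeFallback() {
    }

    // Returns the explicit media type when one is given; otherwise tries the
    // file name (extension lookup), then the leading bytes of the content.
    // The extension-before-content order is an assumption.
    // Returns null when nothing matches, mirroring the "not supported" case.
    static String resolve(String explicitType, String fileName, InputStream content)
            throws IOException {
        if (explicitType != null) {
            return explicitType;
        }
        if (fileName != null) {
            String byName = URLConnection.guessContentTypeFromName(fileName);
            if (byName != null) {
                return byName;
            }
        }
        if (content != null) {
            // guessContentTypeFromStream needs a stream with mark/reset support.
            InputStream markable = content.markSupported()
                    ? content
                    : new BufferedInputStream(content);
            return URLConnection.guessContentTypeFromStream(markable);
        }
        return null;
    }
}
```

        For example, resolve(null, "report.pdf", null) should yield "application/pdf" from the JDK's default extension table, while an explicit type is returned unchanged regardless of the file name.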
      • extractMetadata

        public MetadataCollection extractMetadata(InputStream stream)

        Extracts metadata.

        Parameters:
        stream - The stream of the document.

        The media type is detected from the stream content.

        Returns:
        A collection of metadata, or null if the media type is not supported.

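        The stream overloads leave opening and closing the stream to the caller; this page does not say whether the extractor closes it. A minimal sketch, assuming the caller owns the stream, opens a file with try-with-resources and turns the documented null result for unsupported formats into an Optional. ExtractorStreams and extractFromFile are hypothetical names, and the extraction step is passed in as a java.util.function.Function so the sketch compiles without GroupDocs.Parser on the classpath (this assumes the extract methods declare no checked exceptions, as none are listed here).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper class; not part of GroupDocs.Parser.
final class ExtractorStreams {

    private ExtractorStreams() {
    }

    // Opens the file, hands the stream to the extraction function and always
    // closes the stream afterwards (assumption: the caller owns the stream).
    // A null result, returned for unsupported media types, becomes Optional.empty().
    static <T> Optional<T> extractFromFile(Path path, Function<InputStream, T> extract)
            throws IOException {
        try (InputStream stream = Files.newInputStream(path)) {
            return Optional.ofNullable(extract.apply(stream));
        }
    }
}
```

        With the library on the classpath, a call could look like ExtractorStreams.extractFromFile(Paths.get("document.docx"), extractor::extractMetadata); the method reference resolves to the extractMetadata(InputStream) overload because the target type is Function<InputStream, T>. The file name is only an example.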
      • extractMetadata

        public MetadataCollection extractMetadata(InputStream stream,
                                         LoadOptions loadOptions)

        Extracts metadata.

        Parameters:
        stream - The stream of the document.
        loadOptions - The options for loading the file.

        If the media type in loadOptions is null, it is detected from the stream content.

        Returns:
        A collection of metadata, or null if the media type is not supported.

      • extractText

        public String extractText(String fileName)

        Extracts text.

        Parameters:
        fileName - The name of the file.

        The media type is detected from the file content.

        Returns:
        A string that contains all characters of the document.

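        Because the single-argument overloads take only a file name, they fit naturally into batch processing. A minimal sketch, assuming extraction is passed in as a java.util.function.Function<String, String> so the code compiles without GroupDocs.Parser, and that extractText declares no checked exceptions (none are listed on this page). BatchExtraction and extractAll are hypothetical names. Null results are skipped defensively, since this page documents a null return only for the extractMetadata overloads.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical batch helper; not part of GroupDocs.Parser.
final class BatchExtraction {

    private BatchExtraction() {
    }

    // Applies the extraction function to every regular file in the directory
    // (subdirectories are skipped) and keeps the non-null results, keyed by
    // file name in sorted order.
    static Map<String, String> extractAll(Path directory, Function<String, String> extractText)
            throws IOException {
        Map<String, String> results = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(directory)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue;
                }
                String text = extractText.apply(file.toString());
                if (text != null) {
                    results.put(file.getFileName().toString(), text);
                }
            }
        }
        return results;
    }
}
```

        With the library available, BatchExtraction.extractAll(Paths.get("docs"), extractor::extractText) would select the extractText(String) overload, since the target type is Function<String, String>. The directory name is only an example.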
      • extractText

        public String extractText(String fileName,
                         LoadOptions loadOptions)

        Extracts text.

        Parameters:
        fileName - The name of the file.
        loadOptions - The options for loading the file.

        If the media type in loadOptions is null, it is detected from the file content.

        Returns:
        A string that contains all characters of the document.

      • extractText

        public String extractText(InputStream stream)

        Extracts text.

        Parameters:
        stream - The stream of the document.

        The media type is detected from the stream content.

        Returns:
        A string that contains all characters of the document.

      • extractText

        public String extractText(InputStream stream,
                         LoadOptions loadOptions)

        Extracts text.

        Parameters:
        stream - The stream of the document.
        loadOptions - The options for loading the file.

        If the media type in loadOptions is null, it is detected from the stream content.

        Returns:
        A string that contains all characters of the document.

      • extractFormattedText

        public String extractFormattedText(String fileName)

        Extracts formatted text.

        Parameters:
        fileName - The name of the file.

        The media type is detected from the file content.

        Returns:
        A string that contains all characters of the document, with formatting.

      • extractFormattedText

        public String extractFormattedText(String fileName,
                                  LoadOptions loadOptions)

        Extracts formatted text.

        Parameters:
        fileName - The name of the file.
        loadOptions - The options for loading the file.

        If the media type in loadOptions is null, it is detected from the file content.

        Returns:
        A string that contains all characters of the document, with formatting.

      • extractFormattedText

        public String extractFormattedText(InputStream stream)

        Extracts formatted text.

        Parameters:
        stream - The stream of the document.

        The media type is detected from the stream content.

        Returns:
        A string that contains all characters of the document, with formatting.

      • extractFormattedText

        public String extractFormattedText(InputStream stream,
                                  LoadOptions loadOptions)

        Extracts formatted text.

        Parameters:
        stream - The stream of the document.
        loadOptions - The options for loading the file.

        If the media type in loadOptions is null, it is detected from the stream content.

        Returns:
        A string that contains all characters of the document, with formatting.

      • sendNotificationMessage

        protected void sendNotificationMessage(INotificationReceiver receiver,
                                   NotificationMessage message)

        Sends a notification message to the receiver and to the factory receiver (if present).

        Parameters:
        receiver - The notification receiver.
        message - The message with a notification.
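        The delivery rule stated for sendNotificationMessage (the given receiver, plus a factory receiver when one is present) can be modelled as a simple fan-out. This is an illustrative sketch, not the library's implementation: NotificationFanOut is a hypothetical class, Consumer<String> and String stand in for INotificationReceiver and NotificationMessage, and both the delivery order and the null check on the per-call receiver are assumptions.

```java
import java.util.function.Consumer;

// Illustrative model only: Consumer<String> stands in for INotificationReceiver
// and String for NotificationMessage; neither is a GroupDocs.Parser type.
final class NotificationFanOut {

    // Optional second receiver; null when none is configured.
    private final Consumer<String> factoryReceiver;

    NotificationFanOut(Consumer<String> factoryReceiver) {
        this.factoryReceiver = factoryReceiver;
    }

    // Delivers the message to the per-call receiver (skipped if null, an
    // assumption) and then to the factory receiver when one is present.
    void send(Consumer<String> receiver, String message) {
        if (receiver != null) {
            receiver.accept(message);
        }
        if (factoryReceiver != null) {
            factoryReceiver.accept(message);
        }
    }
}
```

        Keeping the factory receiver optional means an extractor created without one simply delivers each message to the per-call receiver alone.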