com.groupdocs.parser


Class ExtractorFactory

  • All Implemented Interfaces:
    IContainerFactory


    public class ExtractorFactory
    extends Object
    implements IContainerFactory

    Provides the functionality for creating extractors for documents.


    ExtractorFactory creates instances of extractor classes. It contains the following methods:

    createTextExtractor - Creates a text extractor for the file. If the document format is not detected, the method returns null.
    createFormattedTextExtractor - Creates a formatted text extractor for the file. If the document format is not detected, the method returns null.
    createContainer - Creates a container object for the file. If the document format is not detected, the method returns null.
    createMetadataExtractor - Creates a metadata extractor for the file. If the document format is not detected, the method returns null.

    The document format is detected by a MediaTypeDetector. By default, all supported document formats are detected; to change this behavior, pass a custom MediaTypeDetector instance to the factory constructor.
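    The null-return contract and the effect of a restricted detector can be illustrated with a small, self-contained sketch. It does not use the GroupDocs.Parser API: Detector, FactorySketch, and DetectorDemo are hypothetical names, and the spreadsheet extension table is an illustrative subset, not the list CellsMediaTypeDetector actually recognizes.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical detector contract: returns a media type, or null when the format is not recognized.
interface Detector {
    String detect(String fileName);
}

// Hypothetical factory: returns null whenever the detector cannot identify the format.
class FactorySketch {
    private final Detector detector;

    FactorySketch(Detector detector) {
        this.detector = detector;
    }

    String createExtractorFor(String fileName) {
        String mediaType = detector.detect(fileName);
        // No media type means no extractor, so the caller must check for null.
        return mediaType == null ? null : "extractor for " + mediaType;
    }
}

class DetectorDemo {
    // Illustrative subset of spreadsheet extensions; not the library's real table.
    static final Map<String, String> SPREADSHEETS = Map.of(
            "xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            "xls", "application/vnd.ms-excel",
            "csv", "text/csv");

    // A restricted detector: only spreadsheet extensions are recognized.
    static String detectSpreadsheet(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        return SPREADSHEETS.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        FactorySketch factory = new FactorySketch(DetectorDemo::detectSpreadsheet);
        String extractor = factory.createExtractorFor("letter.docx");
        // A Word file is not a spreadsheet, so the restricted factory yields null.
        System.out.println(extractor != null ? extractor : "The document format is not supported");
    }
}
```

    The design point is that narrowing the detector, rather than the factory, is enough to make every create method decline unsupported files.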

    By default, formatted text extractors use a PlainDocumentFormatter. To use a different formatter, pass a formatter instance to the factory constructor.
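    As a rough illustration of why the formatter is a constructor argument, the sketch below applies the strategy pattern: the same document structure is rendered by interchangeable formatters. TextFormatter, PlainFormatter, MarkdownFormatter, and FormatterDemo are hypothetical stand-ins, not the DocumentFormatter API, and real formatters handle far more structure (tables, lists, links).

```java
import java.util.List;

// Hypothetical formatter contract: converts structural elements into output text.
interface TextFormatter {
    String heading(int level, String text);
    String paragraph(String text);
}

// Plain output: structure is dropped and only the text remains.
class PlainFormatter implements TextFormatter {
    public String heading(int level, String text) { return text + "\n"; }
    public String paragraph(String text) { return text + "\n"; }
}

// Markdown output: a heading of level n gets n '#' characters; paragraphs end with a blank line.
class MarkdownFormatter implements TextFormatter {
    public String heading(int level, String text) { return "#".repeat(level) + " " + text + "\n"; }
    public String paragraph(String text) { return text + "\n\n"; }
}

class FormatterDemo {
    // Hypothetical document element: level 0 is a paragraph, level > 0 a heading.
    static final class Block {
        final int level;
        final String text;

        Block(int level, String text) {
            this.level = level;
            this.text = text;
        }
    }

    // The same content is rendered differently depending only on the injected formatter.
    static String render(List<Block> blocks, TextFormatter formatter) {
        StringBuilder out = new StringBuilder();
        for (Block b : blocks) {
            out.append(b.level > 0 ? formatter.heading(b.level, b.text) : formatter.paragraph(b.text));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Block> doc = List.of(new Block(1, "Title"), new Block(0, "Body text."));
        System.out.print(render(doc, new PlainFormatter()));
        System.out.print(render(doc, new MarkdownFormatter()));
    }
}
```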

    Creating a text extractor:

     // Create a factory
     ExtractorFactory factory = new ExtractorFactory();
     // Create a text extractor
     TextExtractor extractor = factory.createTextExtractor(fileName);
     // Print the document text, or a message if the file format isn't supported
     System.out.println(extractor != null ? extractor.extractAll() : "The document format is not supported");
      

    Creating a formatted text extractor:

     // Create a factory
     ExtractorFactory factory = new ExtractorFactory();
     // Create a formatted text extractor
     TextExtractor extractor = factory.createFormattedTextExtractor(fileName);
     // Print the formatted text, or a message if the file format isn't supported
     System.out.println(extractor != null ? extractor.extractAll() : "The document format is not supported");
      

    Creating a formatted text extractor with Markdown formatter:

     // Create a factory with MarkdownDocumentFormatter as the default formatter
     ExtractorFactory factory = new ExtractorFactory(new MarkdownDocumentFormatter());
     // Create a formatted text extractor
     TextExtractor extractor = factory.createFormattedTextExtractor(fileName);
     // Print the Markdown-formatted text, or a message if the file format isn't supported
     System.out.println(extractor != null ? extractor.extractAll() : "The document format is not supported");
      

    Creating a text extractor only for spreadsheets:

     // Create a factory that detects only spreadsheet media types
     ExtractorFactory factory = new ExtractorFactory(null, new CellsMediaTypeDetector());
     // Create a text extractor
     TextExtractor extractor = factory.createTextExtractor(fileName);
     // Print the document text, or a message if the file format isn't supported
     System.out.println(extractor != null ? extractor.extractAll() : "The document format is not supported");
      

    Creating a container:

     // Create a factory that detects all supported formats
     ExtractorFactory factory = new ExtractorFactory();
     // Create a container
     Container container = factory.createContainer(fileName);
     // If a file format isn't supported
     if (container == null) {
         // Print a message
         System.out.println("The document format is not supported");
     }
      
    • Constructor Detail

      • ExtractorFactory

        public ExtractorFactory()

        Initializes a new instance of the ExtractorFactory class.

      • ExtractorFactory

        public ExtractorFactory(DocumentFormatter documentFormatter)

        Initializes a new instance of the ExtractorFactory class with the specified document formatter.

        Parameters:
        documentFormatter - An instance of the DocumentFormatter.
      • ExtractorFactory

        public ExtractorFactory(DocumentFormatter documentFormatter,
                        MediaTypeDetector mediaTypeDetector)

        Initializes a new instance of the ExtractorFactory class with the specified document formatter and media type detector.

        Parameters:
        documentFormatter - An instance of the DocumentFormatter.
        mediaTypeDetector - An instance of the MediaTypeDetector.
      • ExtractorFactory

        public ExtractorFactory(DocumentFormatter documentFormatter,
                        MediaTypeDetector mediaTypeDetector,
                        EncodingDetector encodingDetector)

        Initializes a new instance of the ExtractorFactory class with the specified document formatter, media type detector, and encoding detector.

        Parameters:
        documentFormatter - An instance of the DocumentFormatter.
        mediaTypeDetector - An instance of the MediaTypeDetector.
        encodingDetector - An instance of the EncodingDetector.
      • ExtractorFactory

        public ExtractorFactory(DocumentFormatter documentFormatter,
                        MediaTypeDetector mediaTypeDetector,
                        EncodingDetector encodingDetector,
                        INotificationReceiver notificationReceiver)

        Initializes a new instance of the ExtractorFactory class with the specified document formatter, media type detector, encoding detector, and notification receiver.

        Parameters:
        documentFormatter - An instance of the DocumentFormatter.
        mediaTypeDetector - An instance of the MediaTypeDetector.
        encodingDetector - An instance of the EncodingDetector.
        notificationReceiver - An instance of the INotificationReceiver that processes notification messages.
    • Method Detail

      • getDocumentFormatter

        public DocumentFormatter getDocumentFormatter()

        Gets the document formatter used by this factory.

        Returns:
        An instance of the DocumentFormatter.
      • getMediaTypeDetector

        public MediaTypeDetector getMediaTypeDetector()

        Gets the media type detector used by this factory.

        Returns:
        An instance of the MediaTypeDetector.
      • getEncodingDetector

        public EncodingDetector getEncodingDetector()

        Gets the encoding detector used by this factory.

        Returns:
        An instance of the EncodingDetector.
      • createTextExtractor

        public TextExtractor createTextExtractor(String fileName)
                                          throws FileNotFoundException

        Creates a text extractor.

        Parameters:
        fileName - The name of the file.


        The media type is detected from the file extension or from the file content.

        Returns:
        An instance of the text extractor, or null if the media type is not supported.
        Throws:
        FileNotFoundException - if the file does not exist or cannot be opened.
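        A minimal sketch of detection "by extension or by content", assuming the extension is checked first and the leading bytes (magic numbers) are consulted only when the extension is missing or unknown. The order and both lookup tables are assumptions for illustration, and ExtensionThenContent is a hypothetical class. Note that Files.newInputStream reports a missing file with NoSuchFileException rather than FileNotFoundException.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;

class ExtensionThenContent {
    // Illustrative extension table (assumption), not the library's real mapping.
    static final Map<String, String> BY_EXTENSION = Map.of(
            "pdf", "application/pdf",
            "txt", "text/plain",
            "zip", "application/zip");

    // Tries the extension first, then falls back to the file's leading bytes.
    static String detect(Path file) throws IOException {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        if (dot >= 0) {
            String byExtension = BY_EXTENSION.get(name.substring(dot + 1).toLowerCase(Locale.ROOT));
            if (byExtension != null) {
                return byExtension;
            }
        }
        byte[] head;
        try (InputStream in = Files.newInputStream(file)) {
            head = in.readNBytes(4); // may return fewer bytes for short files
        }
        return fromMagic(head);
    }

    // Matches two well-known signatures; returns null when nothing matches.
    static String fromMagic(byte[] head) {
        if (startsWith(head, new byte[] {'%', 'P', 'D', 'F'})) {
            return "application/pdf";
        }
        if (startsWith(head, new byte[] {'P', 'K', 3, 4})) {
            return "application/zip";
        }
        return null;
    }

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) throws IOException {
        // A file without an extension, so detection has to look at the content.
        Path file = Files.createTempFile("upload", "");
        Files.write(file, "%PDF-1.7".getBytes(StandardCharsets.US_ASCII));
        System.out.println(detect(file)); // application/pdf
        Files.delete(file);
    }
}
```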
      • createTextExtractor

        public TextExtractor createTextExtractor(String fileName,
                                        LoadOptions loadOptions)
                                          throws FileNotFoundException

        Creates a text extractor.

        Parameters:
        fileName - The name of the file.
        loadOptions - The options of loading the file.


        If the media type in loadOptions is null, it is detected from the file extension or from the file content.

        Returns:
        An instance of the text extractor, or null if the media type is not supported.
        Throws:
        FileNotFoundException - if the file does not exist or cannot be opened.
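        The precedence rule stated above (an explicit media type in loadOptions wins; otherwise the media type is detected) reduces to a few lines. Options is a hypothetical stand-in for LoadOptions that models only a nullable media type, and treating a null options object like an empty one is an assumption of this sketch. Supplying the media type explicitly helps when a file has a misleading or missing extension, such as an upload saved under a temporary name.

```java
import java.util.function.Function;

class MediaTypeResolution {
    // Hypothetical stand-in for LoadOptions: only a nullable media type is modeled.
    static final class Options {
        final String mediaType;

        Options(String mediaType) {
            this.mediaType = mediaType;
        }
    }

    // An explicit media type wins; otherwise the detector decides (and may return null).
    static String resolve(String fileName, Options options, Function<String, String> detector) {
        if (options != null && options.mediaType != null) {
            return options.mediaType;
        }
        return detector.apply(fileName);
    }

    public static void main(String[] args) {
        // Extension-only detector for the demo: it cannot identify a ".tmp" file.
        Function<String, String> detector = name -> name.endsWith(".pdf") ? "application/pdf" : null;
        System.out.println(resolve("upload.tmp", new Options(null), detector));              // null
        System.out.println(resolve("upload.tmp", new Options("application/pdf"), detector)); // application/pdf
    }
}
```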
      • createTextExtractor

        public TextExtractor createTextExtractor(InputStream stream)

        Creates a text extractor.

        Parameters:
        stream - The stream of the document.


        The media type is detected from the content of the stream.

        Returns:
        An instance of the text extractor, or null if the media type is not supported.
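        For a stream, only content-based detection is available. One common way to inspect the leading bytes without losing them for the later extraction step is BufferedInputStream with mark and reset, sketched below. StreamSniffer is a hypothetical helper, and the two signatures it checks (%PDF and the ZIP local-file header PK\3\4) are examples, not the detector's real coverage.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

class StreamSniffer {
    // Reads up to four leading bytes, then rewinds so the caller can still read the whole stream.
    static String sniff(BufferedInputStream in) throws IOException {
        in.mark(4);
        byte[] head = in.readNBytes(4);
        in.reset();
        if (head.length < 4) {
            return null; // too short to match either signature
        }
        if (head[0] == '%' && head[1] == 'P' && head[2] == 'D' && head[3] == 'F') {
            return "application/pdf";
        }
        if (head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4) {
            return "application/zip";
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        BufferedInputStream in = new BufferedInputStream(
                new ByteArrayInputStream("%PDF-1.7".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(sniff(in));        // application/pdf
        System.out.println((char) in.read()); // % (the stream was rewound)
    }
}
```

        The design point is that the stream still starts at its first byte after sniffing, so detection and extraction can share one pass over the data.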
      • createTextExtractor

        public TextExtractor createTextExtractor(InputStream stream,
                                        LoadOptions loadOptions)

        Creates a text extractor.

        Parameters:
        stream - The stream of the document.
        loadOptions - The options of loading the file.


        If the media type in loadOptions is null, it is detected from the content of the stream.

        Returns:
        An instance of the text extractor, or null if the media type is not supported.
      • createFormattedTextExtractor

        public TextExtractor createFormattedTextExtractor(String fileName)
                                                   throws FileNotFoundException

        Creates a formatted text extractor.

        Parameters:
        fileName - The name of the file.


        The media type is detected from the file extension or from the file content.

        Returns:
        An instance of the formatted text extractor, or null if the media type is not supported.
        Throws:
        FileNotFoundException - if the file does not exist or cannot be opened.
      • createFormattedTextExtractor

        public TextExtractor createFormattedTextExtractor(String fileName,
                                                 LoadOptions loadOptions)
                                                   throws FileNotFoundException

        Creates a formatted text extractor.

        Parameters:
        fileName - The name of the file.
        loadOptions - The options of loading the file.


        If the media type in loadOptions is null, it is detected from the file extension or from the file content.

        Returns:
        An instance of the formatted text extractor, or null if the media type is not supported.
        Throws:
        FileNotFoundException - if the file does not exist or cannot be opened.
      • createFormattedTextExtractor

        public TextExtractor createFormattedTextExtractor(InputStream stream)

        Creates a formatted text extractor.

        Parameters:
        stream - The stream of the document.


        The media type is detected from the content of the stream.

        Returns:
        An instance of the formatted text extractor, or null if the media type is not supported.
      • createFormattedTextExtractor

        public TextExtractor createFormattedTextExtractor(InputStream stream,
                                                 LoadOptions loadOptions)

        Creates a formatted text extractor.

        Parameters:
        stream - The stream of the document.
        loadOptions - The options of loading the file.


        If the media type in loadOptions is null, it is detected from the content of the stream.

        Returns:
        An instance of the formatted text extractor, or null if the media type is not supported.
      • createMetadataExtractor

        public MetadataExtractor createMetadataExtractor(String fileName)
                                                  throws FileNotFoundException

        Creates a metadata extractor.

        Parameters:
        fileName - The name of the file.


        The media type is detected from the file extension or from the file content.

        Returns:
        An instance of the metadata extractor, or null if the media type is not supported.
        Throws:
        FileNotFoundException - if the file does not exist or cannot be opened.
      • createMetadataExtractor

        public MetadataExtractor createMetadataExtractor(String fileName,
                                                LoadOptions loadOptions)
                                                  throws FileNotFoundException

        Creates a metadata extractor.

        Parameters:
        fileName - The name of the file.
        loadOptions - The options of loading the file.


        If the media type in loadOptions is null, it is detected from the file extension or from the file content.

        Returns:
        An instance of the metadata extractor, or null if the media type is not supported.
        Throws:
        FileNotFoundException - if the file does not exist or cannot be opened.
      • createMetadataExtractor

        public MetadataExtractor createMetadataExtractor(InputStream stream)

        Creates a metadata extractor.

        Parameters:
        stream - The stream of the document.


        The media type is detected from the content of the stream.

        Returns:
        An instance of the metadata extractor, or null if the media type is not supported.
      • createMetadataExtractor

        public MetadataExtractor createMetadataExtractor(InputStream stream,
                                                LoadOptions loadOptions)

        Creates a metadata extractor.

        Parameters:
        stream - The stream of the document.
        loadOptions - The options of loading the file.


        If the media type in loadOptions is null, it is detected from the content of the stream.

        Returns:
        An instance of the metadata extractor, or null if the media type is not supported.
      • createContainer

        public Container createContainer(String fileName,
                                LoadOptions loadOptions)
                                  throws FileNotFoundException

        Creates a container.

        Specified by:
        createContainer in interface IContainerFactory
        Parameters:
        fileName - The name of the file.
        loadOptions - The options of loading the file.


        If the media type in loadOptions is null, it is detected from the file extension or from the file content.

        Returns:
        An instance of the container, or null if the media type is not supported.
        Throws:
        FileNotFoundException - if the file does not exist or cannot be opened.
      • createContainer

        public Container createContainer(InputStream stream)

        Creates a container.

        Specified by:
        createContainer in interface IContainerFactory
        Parameters:
        stream - The stream of the document.


        The media type is detected from the content of the stream.

        Returns:
        An instance of the container, or null if the media type is not supported.
      • createContainer

        public Container createContainer(InputStream stream,
                                LoadOptions loadOptions)

        Creates a container.

        Specified by:
        createContainer in interface IContainerFactory
        Parameters:
        stream - The stream of the document.
        loadOptions - The options of loading the file.


        If the media type in loadOptions is null, it is detected from the content of the stream.

        Returns:
        An instance of the container, or null if the media type is not supported.
      • getDocumentInfo

        public DocumentInfo getDocumentInfo(String fileName)

        Returns information about the extractors that are supported for a document.

        Parameters:
        fileName - The path to the document.
        Returns:
        An instance of the DocumentInfo.
      • getDocumentInfo

        public DocumentInfo getDocumentInfo(InputStream stream)

        Returns information about the extractors that are supported for a document.

        Parameters:
        stream - A stream of the document.
        Returns:
        An instance of the DocumentInfo.
      • isTextSupported

        protected boolean isTextSupported(String mediaType)

        Checks whether a text extractor exists for mediaType.

        Parameters:
        mediaType - A media type of the file.
        Returns:
        true if a text extractor exists for mediaType; otherwise, false.
      • isFormattedTextSupported

        protected boolean isFormattedTextSupported(String mediaType)

        Checks whether a formatted text extractor exists for mediaType.

        Parameters:
        mediaType - A media type of the file.
        Returns:
        true if a formatted text extractor exists for mediaType; otherwise, false.
      • isMetadataSupported

        protected boolean isMetadataSupported(String mediaType)

        Checks whether a metadata extractor exists for mediaType.

        Parameters:
        mediaType - A media type of the file.
        Returns:
        true if a metadata extractor exists for mediaType; otherwise, false.
      • isContainerSupported

        protected boolean isContainerSupported(String mediaType)

        Checks whether a container exists for mediaType.

        Parameters:
        mediaType - A media type of the file.
        Returns:
        true if a container exists for mediaType; otherwise, false.
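        Because the is...Supported checks are protected, a subclass can presumably adjust them. The sketch below shows that template-method idea in isolation: a public query is built from protected hooks, and a subclass narrows one of them. CapabilityFactory, PdfOnlyTextFactory, and describe are hypothetical; the real getDocumentInfo returns a DocumentInfo object, not a string.

```java
import java.util.Set;

// Hypothetical base class: a public query built on protected capability hooks.
class CapabilityFactory {
    protected boolean isTextSupported(String mediaType) {
        return mediaType != null;
    }

    protected boolean isMetadataSupported(String mediaType) {
        return mediaType != null;
    }

    // Summarizes which extractor kinds are available for a media type.
    public String describe(String mediaType) {
        return "text=" + isTextSupported(mediaType) + ", metadata=" + isMetadataSupported(mediaType);
    }
}

// Narrows text extraction to an allow-list by overriding a single protected hook.
class PdfOnlyTextFactory extends CapabilityFactory {
    private static final Set<String> TEXT_TYPES = Set.of("application/pdf");

    @Override
    protected boolean isTextSupported(String mediaType) {
        // Set.of collections reject null lookups, so guard before calling contains.
        return mediaType != null && TEXT_TYPES.contains(mediaType);
    }

    public static void main(String[] args) {
        System.out.println(new PdfOnlyTextFactory().describe("text/plain"));      // text=false, metadata=true
        System.out.println(new PdfOnlyTextFactory().describe("application/pdf")); // text=true, metadata=true
    }
}
```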
      • sendNotificationMessage

        protected void sendNotificationMessage(INotificationReceiver receiver,
                                   NotificationMessage message)

        Sends a notification message to the receiver and to the factory's notification receiver (if present).

        Parameters:
        receiver - The notification receiver.
        message - The message with a notification.
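        The dispatch rule described above (deliver to the given receiver and, if present, to the factory's own receiver, presumably the one passed to the constructor) can be sketched as follows. Receiver and NotifyingFactory are hypothetical; the real INotificationReceiver and NotificationMessage types may expose different methods, and messages are plain strings here for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical receiver contract; the real INotificationReceiver may look different.
interface Receiver {
    void processMessage(String message);
}

class NotifyingFactory {
    private final Receiver factoryReceiver; // optional: may be null

    NotifyingFactory(Receiver factoryReceiver) {
        this.factoryReceiver = factoryReceiver;
    }

    // Delivers the message to the per-call receiver and to the factory-level receiver, skipping absent ones.
    protected void sendNotificationMessage(Receiver receiver, String message) {
        if (receiver != null) {
            receiver.processMessage(message);
        }
        if (factoryReceiver != null) {
            factoryReceiver.processMessage(message);
        }
    }

    public static void main(String[] args) {
        List<String> perCall = new ArrayList<>();
        List<String> shared = new ArrayList<>();
        new NotifyingFactory(shared::add).sendNotificationMessage(perCall::add, "Document loaded");
        System.out.println(perCall + " " + shared); // [Document loaded] [Document loaded]
    }
}
```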