com.groupdocs.parser

Class EncodingDetector



  • public final class EncodingDetector
    extends Object

    Provides functionality for detecting the character encoding of a java.io.InputStream.

    The constructor accepts the default charset to use for non-Unicode (ANSI) encodings:

     EncodingDetector detector = new EncodingDetector(java.nio.charset.Charset.forName("windows-1251"));
      

    Detect the encoding only by BOM:

     // Create an encoding detector
     EncodingDetector detector = new EncodingDetector(java.nio.charset.Charset.forName("windows-1251"));
     // Detect a charset
     java.nio.charset.Charset charset = detector.detect(stream);
      

    Detect the encoding by BOM, or by content if no BOM is present:

     // Create an encoding detector
     EncodingDetector detector = new EncodingDetector(java.nio.charset.Charset.forName("windows-1251"));
     // Detect a charset
     java.nio.charset.Charset charset = detector.detect(stream, true);
      

    If a BOM is present, this works like the previous method. If no BOM is present, it tries to detect the encoding from the content instead. Content-based detection relies on indirect (heuristic) methods, so it is slower and less accurate than detection by BOM.
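    To make "detection by BOM" concrete, here is a minimal JDK-only sketch of BOM sniffing. It is not the GroupDocs implementation: the class BomSniffer and its method detectByBom are hypothetical names, and only the UTF-8 (EF BB BF), UTF-16BE (FE FF) and UTF-16LE (FF FE) signatures are checked. Returning null when no signature matches mirrors the documented behavior of detect(InputStream).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of GroupDocs.Parser.
final class BomSniffer {

    // Returns the charset signalled by a byte order mark at the start of
    // 'head', or null if no recognized BOM is present.
    // UTF-32 is deliberately not handled in this sketch, so a UTF-32LE BOM
    // (FF FE 00 00) would be reported as UTF-16LE here.
    static Charset detectByBom(byte[] head) {
        if (startsWith(head, 0xEF, 0xBB, 0xBF)) {
            return StandardCharsets.UTF_8;
        }
        if (startsWith(head, 0xFE, 0xFF)) {
            return StandardCharsets.UTF_16BE;
        }
        if (startsWith(head, 0xFF, 0xFE)) {
            return StandardCharsets.UTF_16LE;
        }
        return null; // no BOM: a BOM-only detector has nothing to report
    }

    // True if 'data' begins with the given unsigned byte values.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

    For example, detectByBom applied to the bytes EF BB BF 68 69 returns UTF-8, while plain ASCII bytes return null. A fuller detector would check the UTF-32 signatures before UTF-16LE, because the UTF-32LE BOM begins with the UTF-16LE one.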

    • Constructor Detail

      • EncodingDetector

        public EncodingDetector(Charset defaultAnsiCharset)

        Initializes a new instance of the EncodingDetector class.

        Parameters:
        defaultAnsiCharset - The charset used for ANSI (non-Unicode) encodings.
        Throws:
        ArgumentNullException - defaultAnsiCharset is null.
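        The default ANSI charset matters because BOM-less single-byte text does not identify its own code page. The JDK-only sketch below (the class AnsiDefaultDemo and method decodeAnsi are hypothetical, not part of GroupDocs.Parser) decodes the same byte with two different defaults: 0xC0 is the Cyrillic letter U+0410 in windows-1251 but the Latin letter U+00C0 in windows-1252. Its null check throws NullPointerException, the usual Java idiom, rather than the documented ArgumentNullException.

```java
import java.nio.charset.Charset;
import java.util.Objects;

// Hypothetical illustration, not part of GroupDocs.Parser.
final class AnsiDefaultDemo {

    // Decodes BOM-less single-byte ("ANSI") text with a caller-chosen charset.
    // The bytes alone do not identify the code page, so a default is required.
    static String decodeAnsi(byte[] bytes, Charset defaultAnsiCharset) {
        // Reject a missing default up front, as the EncodingDetector constructor
        // is documented to do; this sketch throws NullPointerException instead.
        Objects.requireNonNull(defaultAnsiCharset, "defaultAnsiCharset");
        return new String(bytes, defaultAnsiCharset);
    }
}
```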
    • Method Detail

      • getDefaultAnsiCharset

        public Charset getDefaultAnsiCharset()

        Gets the character encoding that is used for ANSI encodings.

        Returns:
        The character encoding that is used for ANSI encodings.
      • detect

        public Charset detect(InputStream stream)

        Detects the character encoding of the stream by byte order mark (BOM).

        Parameters:
        stream - The stream for which the character encoding must be detected.
        Returns:
        The character encoding of the stream, or null if the encoding cannot be detected.
        Throws:
        ArgumentNullException - stream is null.
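        This page does not say whether detect(stream) leaves the stream positioned at its first byte. If the caller needs to read the text afterwards, one common JDK pattern, offered here as an assumption rather than a documented requirement, is to wrap the stream in a BufferedInputStream, call mark() before inspecting it and reset() afterwards. The hypothetical helper StreamPeek.peekHead shows the pattern on the first few bytes (it uses readNBytes(int), available since Java 11).

```java
import java.io.BufferedInputStream;
import java.io.IOException;

// Hypothetical helper, not part of GroupDocs.Parser.
final class StreamPeek {

    // Reads up to 'n' leading bytes (e.g. for BOM inspection), then rewinds
    // so the caller can still decode the stream from its first byte.
    static byte[] peekHead(BufferedInputStream in, int n) throws IOException {
        in.mark(n);                      // remember the current position
        byte[] head = in.readNBytes(n);  // may return fewer bytes at end of stream
        in.reset();                      // rewind to the marked position
        return head;
    }
}
```

    The mark limit must be at least the number of bytes read before reset(). When wrapping a call to detect(stream, true) instead, that number is not documented, so a generous limit would be needed.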
      • detect

        public Charset detect(InputStream stream,
                     boolean detectByContent)

        Detects the character encoding of the stream.

        Parameters:
        stream - The stream for which the character encoding must be detected.
        detectByContent - true to fall back to content-based detection when no BOM is recognized; false to detect the encoding only by byte order mark (BOM).


        The method first tries to detect the encoding by byte order mark (BOM). If no BOM is present or its signature is not recognized, and detectByContent is true, the method then tries to detect the encoding from the content. Content-based detection is not always accurate.

        Returns:
        The character encoding of the stream, or null if the encoding cannot be detected.
        Throws:
        ArgumentNullException - stream is null.
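        The content-detection algorithm itself is not documented here. As one simple illustration of the idea, and not GroupDocs' method, the sketch below treats bytes that decode strictly as UTF-8 as UTF-8 and otherwise falls back to the default ANSI charset. ContentGuess, isValidUtf8 and guessByContent are hypothetical names. Like any content heuristic it can be wrong: short single-byte text can occasionally form valid UTF-8, and pure ASCII is reported as UTF-8 even if it was written in an ANSI code page.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical heuristic, not part of GroupDocs.Parser.
final class ContentGuess {

    // Returns true if 'data' is well-formed UTF-8 (strict decoding, no replacement chars).
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Bytes that decode cleanly as UTF-8 are treated as UTF-8;
    // anything else falls back to the default ANSI charset.
    static Charset guessByContent(byte[] data, Charset defaultAnsiCharset) {
        return isValidUtf8(data) ? StandardCharsets.UTF_8 : defaultAnsiCharset;
    }
}
```

    For example, the Russian word for "hello" encoded in windows-1251 starts with byte 0xCF, which in UTF-8 must be followed by a continuation byte (0x80 to 0xBF); the next byte is 0xF0, so strict UTF-8 decoding fails and guessByContent returns the windows-1251 default.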