com.groupdocs.parser

Class StructuredHandler

  • Direct Known Subclasses:
    ModelStructuredHandler, XmlStructuredHandler


    public class StructuredHandler
    extends Object

    Represents a handler for extracting structured text from a document.

    Extracting headers from a document:

     class Headers {
         private class Handler extends StructuredHandler {
             // Handle List event to prevent processing of lists
             protected void onStartList(ListProperties properties) {
                 properties.setSkipElement(true); // ignore lists
             }
             // Handle Table event to prevent processing of tables
             protected void onStartTable(TableProperties properties) {
                 properties.setSkipElement(true); // ignore tables
             }
             // Handle ElementText event to process a text
             protected void onText(TextProperties properties, String value) {
                 sb.append(value);
             }
             // Handle Paragraph event to process a paragraph
             protected void onStartParagraph(ParagraphProperties properties) {
                 int h1 = (int) ParagraphStyle.Heading1;
                 int h6 = (int) ParagraphStyle.Heading6;
                 int style = properties.getStyle();
                 if (h1 <= style && style <= h6) {
                     if (sb.length() > 0) {
                         sb.append("\r\n");
                     }
                     // indent the header by its level (h1 gets no indentation)
                     sb.append(new String(new char[style - h1]).replace('\0', ' '));
                 } else {
                     // skip paragraph if it's not a header or a title
                     properties.setSkipElement(properties.getStyle() != ParagraphStyle.Title);
                 }
             }
         }
         private StringBuilder sb = new StringBuilder();
         public void extract(java.io.InputStream stream) {
             IStructuredExtractor extractor = new WordsTextExtractor(stream);
             Handler handler = new Handler();
             // Extract a text with its structure
             extractor.extractStructured(handler);
             System.out.println(sb.toString());
         }
     }
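
    The indentation idiom in onStartParagraph can be checked in isolation. The sketch below is only an illustration: the class HeadingIndentSketch, its methods, and the numeric heading values are hypothetical stand-ins and are not part of GroupDocs.Parser. It assumes that heading styles are consecutive integers, as the example above implies.

```java
// Hypothetical stand-in, not part of the GroupDocs.Parser API.
class HeadingIndentSketch {

    // Returns 'level' spaces. A new char[] is filled with '\0' characters,
    // which are then replaced by spaces (the same idiom as in the example above).
    static String indent(int level) {
        return new String(new char[level]).replace('\0', ' ');
    }

    // Prefixes a heading with one space per level below h1.
    // 'h1' is the numeric value of the first heading style (assumed here).
    static String formatHeading(int style, int h1, String text) {
        return indent(style - h1) + text;
    }

    public static void main(String[] args) {
        int h1 = 1; // assumed value, for illustration only
        System.out.println(formatHeading(1, h1, "Chapter"));
        System.out.println(formatHeading(3, h1, "Detail"));
    }
}
```

    With these assumed values, a level-1 heading gets no indentation and a level-3 heading is prefixed with two spaces.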
      

    Extracting hyperlinks from a document:

     class Hyperlinks {
         private class Handler extends StructuredHandler {
             // Handle Hyperlink event to process the start of a hyperlink
             protected void onStartHyperlink(HyperlinkProperties properties) {
                 sb = new StringBuilder();
                 currentLink = properties.getLink();
             }
             // Handle ElementClose event to process the closing of a hyperlink
             protected void onEndElement() {
                 if (get_Item(0).getClass() == HyperlinkProperties.class) // closing of hyperlink
                 {
                     if (sb != null) {
                         hyperlinks.add(String.format("%s (%s)", sb.toString(), currentLink));
                     }
                     sb = null;
                     currentLink = null;
                 }
             }
             // Handle ElementText event to process a text
             protected void onText(TextProperties properties, String value) {
                 if (sb != null) // if hyperlink is open
                 {
                     sb.append(value);
                 }
             }
         }
         java.util.List<String> hyperlinks = new java.util.ArrayList<String>();
         StringBuilder sb = null;
         String currentLink = null;
         public void extract(java.io.InputStream stream) {
             IStructuredExtractor extractor = new WordsTextExtractor(stream);
             Handler handler = new Handler();
             // Extract a text with its structure
             extractor.extractStructured(handler);
             for(String hl : hyperlinks)
             {
                 System.out.println(hl);
             }
         }
     }
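
    The buffering logic of this example (open a buffer when a hyperlink starts, append text while it is open, flush it when the hyperlink closes) can be exercised without a document. The following is a minimal sketch under stated assumptions: HyperlinkBufferSketch and all of its methods are hypothetical stand-ins that are driven by hand, and the element stack of plain strings only imitates the check against get_Item(0). It does not call GroupDocs.Parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in, not part of the GroupDocs.Parser API.
class HyperlinkBufferSketch {
    // Open elements, innermost on top (assumed model of the element hierarchy).
    private final Deque<String> open = new ArrayDeque<>();
    private final List<String> hyperlinks = new ArrayList<>();
    private StringBuilder sb = null;   // non-null only while a hyperlink is open
    private String currentLink = null;

    void startParagraph() {
        open.push("paragraph");
    }

    void startHyperlink(String link) {
        open.push("hyperlink");
        sb = new StringBuilder();
        currentLink = link;
    }

    void text(String value) {
        if (sb != null) { // collect text only inside a hyperlink
            sb.append(value);
        }
    }

    void endElement() {
        String closing = open.pop();
        if (closing.equals("hyperlink")) {
            hyperlinks.add(String.format("%s (%s)", sb, currentLink));
            sb = null;
            currentLink = null;
        }
    }

    List<String> getHyperlinks() {
        return hyperlinks;
    }

    // Feeds a hand-written event sequence: a paragraph containing one hyperlink.
    static List<String> demo() {
        HyperlinkBufferSketch h = new HyperlinkBufferSketch();
        h.startParagraph();
        h.text("See ");
        h.startHyperlink("https://example.com");
        h.text("the ");
        h.text("docs");
        h.endElement(); // closes the hyperlink
        h.text(".");
        h.endElement(); // closes the paragraph
        return h.getHyperlinks();
    }
}
```

    Text outside the hyperlink ("See " and ".") is ignored because the buffer is null at that point, so demo() yields the single entry "the docs (https://example.com)".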
      
    • Constructor Detail

      • StructuredHandler

        public StructuredHandler()

        Initializes a new instance of the StructuredHandler class.

    • Method Detail

      • getDepth

        public int getDepth()

        Gets the depth of the current element.

        Returns:
        An integer value that represents the depth of the current element.
      • get_Item

        public StructuredElementProperties get_Item(int index)

        Gets an element by its depth index.

        Parameters:
        index - Depth index of the element.

        Elements are indexed in reverse order of depth: the deepest element has index 0, and the top element in the hierarchy has the largest index.

        Returns:
        An instance of StructuredElementProperties.
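
        The reverse indexing described above can be pictured with a small stand-in. This is a sketch under assumptions, not the library's implementation: ElementStackSketch, its methods, and the use of plain strings in place of StructuredElementProperties are all hypothetical, and it assumes the depth equals the number of open elements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not part of the GroupDocs.Parser API.
class ElementStackSketch {
    // Open elements in opening order: the top of the hierarchy comes first.
    private final List<String> open = new ArrayList<>();

    void start(String name) {
        open.add(name);
    }

    void end() {
        open.remove(open.size() - 1);
    }

    // Number of currently open elements (assumed meaning of "depth").
    int getDepth() {
        return open.size();
    }

    // Index 0 is the deepest element; the largest index is the top element.
    String getItem(int index) {
        return open.get(open.size() - 1 - index);
    }
}
```

        After opening "document", "paragraph" and "hyperlink" in that order, getItem(0) returns "hyperlink" and getItem(2) returns "document"; once the hyperlink is closed, getItem(0) returns "paragraph".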
      • startElement

        public void startElement(StructuredElementProperties properties)

        Processes the element.

        Parameters:
        properties - Properties of the element.
      • endElement

        public void endElement()

        Processes the closing of the element.

      • text

        public void text(TextProperties properties,
                String value)

        Processes the element's text.

        Parameters:
        properties - Properties of the element's text.
        value - The text of the element.
      • onStartingElement

        protected void onStartingElement(StructuredElementProperties properties)

        Prepares to process the element.

        Parameters:
        properties - Properties of the element.
      • onStartElement

        protected void onStartElement(StructuredElementProperties properties)

        Starts to process the element.

        Parameters:
        properties - Properties of the element.
      • onStartDocument

        protected void onStartDocument(DocumentProperties properties)

        Starts to process the document.

        Parameters:
        properties - Properties of the document.
      • onStartPage

        protected void onStartPage(PageProperties properties)

        Starts to process the page.

        Parameters:
        properties - Properties of the page.
      • onStartSlide

        protected void onStartSlide(SlideProperties properties)

        Starts to process the slide.

        Parameters:
        properties - Properties of the slide.
      • onStartParagraph

        protected void onStartParagraph(ParagraphProperties properties)

        Starts to process the paragraph element.

        Parameters:
        properties - Properties of the paragraph element.
      • onStartHyperlink

        protected void onStartHyperlink(HyperlinkProperties properties)

        Starts to process the hyperlink element.

        Parameters:
        properties - Properties of the hyperlink element.
      • onStartList

        protected void onStartList(ListProperties properties)

        Starts to process the list element.

        Parameters:
        properties - Properties of the list element.
      • onStartListItem

        protected void onStartListItem(ListItemProperties properties)

        Starts to process the list item element.

        Parameters:
        properties - Properties of the list item element.
      • onStartTable

        protected void onStartTable(TableProperties properties)

        Starts to process the table element.

        Parameters:
        properties - Properties of the table element.
      • onStartTableRow

        protected void onStartTableRow(TableRowProperties properties)

        Starts to process the table row element.

        Parameters:
        properties - Properties of the table row element.
      • onStartTableCell

        protected void onStartTableCell(TableCellProperties properties)

        Starts to process the table cell element.

        Parameters:
        properties - Properties of the table cell element.
      • onLineBreak

        protected void onLineBreak(LineBreakProperties properties)

        Starts to process the line break element.

        Parameters:
        properties - Properties of the line break element.
      • onStartGroup

        protected void onStartGroup(GroupProperties properties)

        Starts to process the group element.

        Parameters:
        properties - Properties of the group element.
      • onStartSection

        protected void onStartSection(SectionProperties properties)

        Starts to process the section element.

        Parameters:
        properties - Properties of the section element.
      • onText

        protected void onText(TextProperties properties,
                  String value)

        Starts to process the element's text.

        Parameters:
        properties - Properties of the element's text.
        value - The text of the element.
      • onEndElement

        protected void onEndElement()

        Starts to process the closing of the element.