com.groupdocs.parser


Class CellsFormattedTextExtractor

  • All Implemented Interfaces:
    IHighlightExtractor, IPageTextExtractor, ITextExtractorWithFormatter, AutoCloseable


    public final class CellsFormattedTextExtractor
    extends CellsTextExtractorBase
    implements IHighlightExtractor, IPageTextExtractor, ITextExtractorWithFormatter

    Provides the formatted text extractor for spreadsheets.


    Supported formats:

    .XLS   Microsoft Excel Spreadsheet
    .XLSX  Microsoft Office Open XML Workbook
    .XLSM  Microsoft Excel 2007 Macro-Enabled Workbook
    .XLSB  Microsoft Excel 2007 Binary Workbook
    .ODS   OpenDocument spreadsheet
    .CSV   Comma Separated Values text file

    Extracting text from a spreadsheet:

     // Create a formatted text extractor for spreadsheets
     CellsFormattedTextExtractor extractor = new CellsFormattedTextExtractor(stream);
     // Extract the formatted text
     System.out.println(extractor.extractAll());
      

    Extracting by sheets:

     // Create a formatted text extractor for spreadsheets
     CellsFormattedTextExtractor extractor = new CellsFormattedTextExtractor(stream);
     // Extract formatted text from the sheet whose index is sheetIndex
     System.out.println(extractor.extractSheet(sheetIndex));
      

    Extracting information about a sheet:

     // Create a formatted text extractor for spreadsheets
     CellsFormattedTextExtractor extractor = new CellsFormattedTextExtractor(stream);
     // Get the sheet info for the sheet whose index is sheetIndex
     CellsSheetInfo sheetInfo = extractor.getSheetInfo(sheetIndex);
     System.out.println(String.format("Name: %s", sheetInfo.getName())); // the sheet's name
     System.out.println(String.format("Index: %d", sheetInfo.getIndex())); // the sheet's index
     System.out.println(String.format("Rows Count: %d", sheetInfo.getRowCount())); // the total number of rows in the sheet
     System.out.println("Columns");
     // Iterate over the sheet's columns
     for(int i = 0; i < sheetInfo.getColumnNames().size(); i++)
     {
       // Print the name of the column whose index is i, followed by ";" unless it is the last column
       System.out.println(String.format("%s%s",
         sheetInfo.getColumnNames().get(i), i + 1 < sheetInfo.getColumnNames().size() ? ";" : ""));
     }
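
    To make the separator logic in the loop above concrete, the following sketch applies the same expression to a plain list. The list is a hypothetical stand-in for the value returned by getColumnNames(), and ColumnNamesSketch and formatColumns are illustrative names that are not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ColumnNamesSketch {
    // Builds one output line per column name: every name except the last
    // gets a trailing ";", mirroring the expression used in the loop above.
    static List<String> formatColumns(List<String> columnNames) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < columnNames.size(); i++) {
            String separator = i + 1 < columnNames.size() ? ";" : "";
            lines.add(String.format("%s%s", columnNames.get(i), separator));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical column names; a real sheet would supply its own.
        for (String line : formatColumns(Arrays.asList("A", "B", "C"))) {
            System.out.println(line);
        }
    }
}
```

    For the three sample names, the lines printed are "A;", "B;" and "C".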
      

    Extracting by rows:

     // Create a formatted text extractor for spreadsheets
     CellsFormattedTextExtractor extractor = new CellsFormattedTextExtractor(stream);
     // Get the sheet info for the sheet whose index is sheetIndex
     CellsSheetInfo sheetInfo = extractor.getSheetInfo(sheetIndex);
     // Extract formatted text from the row whose index is rowIndex
     System.out.println(sheetInfo.extractRow(rowIndex));
      

    Extracting only selected columns:

     // Create a formatted text extractor for spreadsheets
     CellsFormattedTextExtractor extractor = new CellsFormattedTextExtractor(stream);
     // Get the sheet info for the sheet whose index is sheetIndex
     CellsSheetInfo sheetInfo = extractor.getSheetInfo(sheetIndex);
      
     // Extract formatted text from the row whose index is rowIndex (only the A1 and C1 columns)
     System.out.println(sheetInfo.extractRow(rowIndex, "A1", "C1"));
     // Extract formatted text from the entire sheet (only the A1 and C1 columns)
     System.out.println(sheetInfo.extractSheet("A1", "C1"));
      

    To set a formatter, use the DocumentFormatter property:

     // Create a formatted text extractor for spreadsheets
     CellsFormattedTextExtractor extractor = new CellsFormattedTextExtractor(stream);
     // Set a markdown formatter for formatting
     extractor.setDocumentFormatter(new MarkdownDocumentFormatter()); // all the text will be formatted as Markdown
      

    By default, text is formatted as plain text by PlainDocumentFormatter.
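
    Because the class implements AutoCloseable, an extractor can be managed with a try-with-resources statement so that it is closed once extraction finishes. The sketch below shows the pattern with a hypothetical stand-in class (StandInExtractor, not part of the library) so that it runs without the library on the classpath. With the real class, the resource expression would presumably be new CellsFormattedTextExtractor(stream), and any checked exceptions it declares would need handling.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtractorLifecycleSketch {
    // Hypothetical stand-in for an extractor; NOT the library class.
    // It records each call so the order of operations can be inspected.
    static final class StandInExtractor implements AutoCloseable {
        private final List<String> log;

        StandInExtractor(List<String> log) {
            this.log = log;
        }

        String extractAll() {
            log.add("extractAll");
            return "cell text";
        }

        @Override
        public void close() {
            log.add("close");
        }
    }

    // Runs one extraction inside try-with-resources and returns the call log.
    static List<String> run() {
        List<String> log = new ArrayList<>();
        try (StandInExtractor extractor = new StandInExtractor(log)) {
            extractor.extractAll();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

    The log shows extractAll followed by close: the resource is closed automatically when the try block exits, including when the block exits with an exception.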

    • Constructor Detail

      • CellsFormattedTextExtractor

        public CellsFormattedTextExtractor(String fileName)

        Initializes a new instance of the CellsFormattedTextExtractor class.

        Parameters:
        fileName - The path to the file.
      • CellsFormattedTextExtractor

        public CellsFormattedTextExtractor(String fileName,
                                   LoadOptions loadOptions)

        Initializes a new instance of the CellsFormattedTextExtractor class.

        Parameters:
        fileName - The path to the file.
        loadOptions - The options of loading the file.
      • CellsFormattedTextExtractor

        public CellsFormattedTextExtractor(InputStream stream)

        Initializes a new instance of the CellsFormattedTextExtractor class.

        Parameters:
        stream - The stream of the document.
      • CellsFormattedTextExtractor

        public CellsFormattedTextExtractor(InputStream stream,
                                   LoadOptions loadOptions)

        Initializes a new instance of the CellsFormattedTextExtractor class.

        Parameters:
        stream - The stream of the document.
        loadOptions - The options of loading the file.
    • Method Detail

      • getDocumentFormatter

        public DocumentFormatter getDocumentFormatter()

        Gets a DocumentFormatter.

        Specified by:
        getDocumentFormatter in interface ITextExtractorWithFormatter
        Returns:
        An instance of the DocumentFormatter. The default is PlainDocumentFormatter.


        By default, the value is an instance of the PlainDocumentFormatter class. You can set any other formatter, or set null to use the default formatter.

      • setDocumentFormatter

        public void setDocumentFormatter(DocumentFormatter value)

        Sets a DocumentFormatter.

        Specified by:
        setDocumentFormatter in interface ITextExtractorWithFormatter
        Parameters:
        value - An instance of the DocumentFormatter. The default is PlainDocumentFormatter.


        By default, the value is an instance of the PlainDocumentFormatter class. You can set any other formatter, or set null to use the default formatter.
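
        The null-means-default behavior described above can be pictured with a small sketch. All types here (Formatter, PlainFormatter, MarkdownFormatter, Holder) are hypothetical stand-ins, and the fallback in setFormatter is only one way such a setter might be written. This page does not show how the library implements it.

```java
public class FormatterFallbackSketch {
    // Hypothetical stand-ins; NOT the library's DocumentFormatter types.
    interface Formatter {
        String name();
    }

    static final class PlainFormatter implements Formatter {
        public String name() { return "plain"; }
    }

    static final class MarkdownFormatter implements Formatter {
        public String name() { return "markdown"; }
    }

    // Hypothetical holder with a getter/setter pair that falls back to
    // the plain formatter whenever null is passed to the setter.
    static final class Holder {
        private Formatter formatter = new PlainFormatter();

        Formatter getFormatter() {
            return formatter;
        }

        void setFormatter(Formatter value) {
            formatter = (value != null) ? value : new PlainFormatter();
        }
    }

    // Returns the formatter names seen initially, after setting a
    // Markdown formatter, and after setting null.
    static String demo() {
        Holder holder = new Holder();
        String initial = holder.getFormatter().name();
        holder.setFormatter(new MarkdownFormatter());
        String afterMarkdown = holder.getFormatter().name();
        holder.setFormatter(null);
        String afterNull = holder.getFormatter().name();
        return initial + "," + afterMarkdown + "," + afterNull;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "plain,markdown,plain"
    }
}
```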

      • extractHighlights

        public List<String> extractHighlights(HighlightOptions... highlightOptions)

        Extracts highlights.

        Specified by:
        extractHighlights in interface IHighlightExtractor
        Parameters:
        highlightOptions - A collection of HighlightOptions.


        Only extraction with Mode = FixedWidth is supported.

        Returns:
        A collection of strings that represent highlights. If no highlight is found, the collection is empty.
        Throws:
        UnsupportedOperationException - Mode is not FixedWidth.
      • reset

        public void reset()

        Resets the current document.


        Resets the cursor's position. After a reset, the extractLine method returns the first line of the document.

        Overrides:
        reset in class CellsTextExtractorBase
      • extractPage

        public String extractPage(int pageIndex)
        Description copied from interface: IPageTextExtractor

        Extracts all characters from the page with pageIndex and returns the data as a string.

        Specified by:
        extractPage in interface IPageTextExtractor
        Parameters:
        pageIndex - The index of the page.
        Returns:
        A string that contains all characters from the page, or null if all characters have been extracted.