com.groupdocs.parser

Class CellsTextExtractor

  • All Implemented Interfaces:
    IHighlightExtractor, IPageTextExtractor, IRegexSearchable, ISearchable, IStructuredExtractor, AutoCloseable


    public final class CellsTextExtractor
    extends CellsTextExtractorBase
    implements ISearchable, IHighlightExtractor, IRegexSearchable, IStructuredExtractor, IPageTextExtractor

    Provides the text extractor for spreadsheets.


    Supported formats:

    .XLS    Microsoft Excel Spreadsheet
    .XLSX   Microsoft Office Open XML Workbook
    .XLSM   Microsoft Excel 2007 Macro-Enabled Workbook
    .XLSB   Microsoft Excel 2007 Binary Workbook
    .ODS    OpenDocument Spreadsheet
    .CSV    Comma-Separated Values text file
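If an application has to decide which extractor to create for an incoming file, one simple option is to check the file extension against the list above. The helper below is a hypothetical sketch (`SpreadsheetFormats` is not part of GroupDocs.Parser). It looks only at the file name, so a file with a misleading extension would still be routed to the spreadsheet extractor.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of GroupDocs.Parser: checks whether a file name
// ends with one of the extensions listed as supported by CellsTextExtractor.
final class SpreadsheetFormats {
    // Extensions from the "Supported formats" list, lower-case and without the dot.
    private static final Set<String> SUPPORTED =
            Set.of("xls", "xlsx", "xlsm", "xlsb", "ods", "csv");

    private SpreadsheetFormats() {}

    static boolean isSupported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension, or the name ends with a bare dot
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SUPPORTED.contains(ext);
    }

    public static void main(String[] args) {
        System.out.println(isSupported("report.XLSX")); // true
        System.out.println(isSupported("notes.docx"));  // false
    }
}
```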

    Extracting text from a spreadsheet:

     // Create a text extractor for spreadsheets (it implements AutoCloseable)
     try (CellsTextExtractor extractor = new CellsTextExtractor(stream)) {
         // Extract all text
         System.out.println(extractor.extractAll());
     }

    Extracting by sheets:

     // Create a text extractor for spreadsheets
     try (CellsTextExtractor extractor = new CellsTextExtractor(stream)) {
         // Extract text from the sheet at index sheetIndex
         System.out.println(extractor.extractSheet(sheetIndex));
     }

    Extracting the information about the sheet:

     // Create a text extractor for spreadsheets
     try (CellsTextExtractor extractor = new CellsTextExtractor(stream)) {
         // Get information about the sheet at index sheetIndex
         CellsSheetInfo sheetInfo = extractor.getSheetInfo(sheetIndex);
         System.out.println(String.format("Name: %s", sheetInfo.getName())); // sheet's name
         System.out.println(String.format("Index: %d", sheetInfo.getIndex())); // sheet's index
         System.out.println(String.format("Rows Count: %d", sheetInfo.getRowCount())); // total number of rows in the sheet
         System.out.println("Columns");
         // Print the sheet's column names on one line, separated by ";"
         for (int i = 0; i < sheetInfo.getColumnNames().size(); i++) {
             // Column names are strings, so use %s (%d would throw at runtime)
             System.out.print(String.format("%s%s",
                     sheetInfo.getColumnNames().get(i),
                     i + 1 < sheetInfo.getColumnNames().size() ? ";" : ""));
         }
         System.out.println();
     }
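The separator logic in the column loop appends ";" after every column name except the last. When the whole line is needed as a single string, `String.join` gives the same separator placement without the index check. This is a minimal sketch assuming the names are available as a `List<String>`; `ColumnList` and the sample names are illustrative only.

```java
import java.util.List;

// Illustrative helper: joins column names with ";" and no trailing separator,
// matching the separator placement used in the loop.
final class ColumnList {
    private ColumnList() {}

    static String format(List<String> columnNames) {
        return String.join(";", columnNames);
    }

    public static void main(String[] args) {
        // Made-up sample names
        System.out.println(format(List.of("Name", "Age", "City"))); // prints Name;Age;City
    }
}
```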

    Extracting by rows:

     // Create a text extractor for spreadsheets
     try (CellsTextExtractor extractor = new CellsTextExtractor(stream)) {
         // Get information about the sheet at index sheetIndex
         CellsSheetInfo sheetInfo = extractor.getSheetInfo(sheetIndex);
         // Extract text from the row at index rowIndex
         System.out.println(sheetInfo.extractRow(rowIndex));
     }

    Extracting only the selected columns:

     // Create a text extractor for spreadsheets
     try (CellsTextExtractor extractor = new CellsTextExtractor(stream)) {
         // Get information about the sheet at index sheetIndex
         CellsSheetInfo sheetInfo = extractor.getSheetInfo(sheetIndex);

         // Extract text from the row at index rowIndex (only the A1 and C1 columns)
         System.out.println(sheetInfo.extractRow(rowIndex, "A1", "C1"));
         // Extract text from the entire sheet (only the A1 and C1 columns)
         System.out.println(sheetInfo.extractSheet("A1", "C1"));
     }

    • Constructor Detail

      • CellsTextExtractor

        public CellsTextExtractor(String fileName)

        Initializes a new instance of the CellsTextExtractor class.

        Parameters:
        fileName - The path to the file.
      • CellsTextExtractor

        public CellsTextExtractor(String fileName,
                          LoadOptions loadOptions)

        Initializes a new instance of the CellsTextExtractor class.

        Parameters:
        fileName - The path to the file.
        loadOptions - The options for loading the file.
      • CellsTextExtractor

        public CellsTextExtractor(InputStream stream)

        Initializes a new instance of the CellsTextExtractor class.

        Parameters:
        stream - The stream of the document.
      • CellsTextExtractor

        public CellsTextExtractor(InputStream stream,
                          LoadOptions loadOptions)

        Initializes a new instance of the CellsTextExtractor class.

        Parameters:
        stream - The stream of the document.
        loadOptions - The options for loading the document.
    • Method Detail

      • getExtractMode

        public int getExtractMode()

        Gets a value indicating the mode of text extraction.

        Returns:
        The mode of text extraction. The default is Standard.
      • setExtractMode

        public void setExtractMode(int value)

        Sets a value indicating the mode of text extraction.

        Parameters:
        value - The mode of text extraction. The default is Standard.
      • extractSheet

        public String extractSheet(int sheetIndex)

        Extracts all characters from the sheet at index sheetIndex and returns them as a string.

        Overrides:
        extractSheet in class CellsTextExtractorBase
        Parameters:
        sheetIndex - The index of the sheet.
        Returns:
        A string that contains all characters from the sheet, or null if all characters have been extracted.
      • search

        public void search(SearchOptions options,
                  ISearchHandler handler,
                  List<String> keywords)

        Searches for the keywords.

        Specified by:
        search in interface ISearchable
        Parameters:
        options - Options for searching.
        handler - An instance of the search handler.
        keywords - A collection of words to search.
      • search

        public void search(SearchOptions options,
                  ISearchHandler handler,
                  ISearchEngine searchEngine,
                  List<String> keywords)

        Searches for the keywords.

        Specified by:
        search in interface ISearchable
        Parameters:
        options - Options for searching.
        handler - An instance of the search handler.
        searchEngine - An instance of the search engine.
        keywords - A collection of words to search.
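Both `search` overloads report matches to an `ISearchHandler` instead of returning them. The handler's methods are not listed on this page, so the sketch below uses a hypothetical `MatchHandler` callback and a plain case-insensitive scan to illustrate that callback flow over a piece of text. It is not how GroupDocs.Parser or an `ISearchEngine` implements search, and it ignores `SearchOptions`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback standing in for a search handler: it receives each
// keyword match together with its character offset in the text.
@FunctionalInterface
interface MatchHandler {
    void onFound(String keyword, int index);
}

final class KeywordSearch {
    private KeywordSearch() {}

    // Reports every case-insensitive occurrence of each keyword (overlaps included).
    static void search(String text, List<String> keywords, MatchHandler handler) {
        for (String keyword : keywords) {
            if (keyword.isEmpty()) {
                continue; // an empty keyword would match at every position
            }
            for (int i = 0; i + keyword.length() <= text.length(); i++) {
                // regionMatches with ignoreCase keeps offsets relative to the original text
                if (text.regionMatches(true, i, keyword, 0, keyword.length())) {
                    handler.onFound(keyword, i);
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> hits = new ArrayList<>();
        search("Total: 10; total: 20", List.of("total"),
                (kw, idx) -> hits.add(kw + "@" + idx));
        System.out.println(hits); // [total@0, total@11]
    }
}
```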
      • extractHighlights

        public List<String> extractHighlights(HighlightOptions... highlightOptions)

        Extracts highlights.

        Specified by:
        extractHighlights in interface IHighlightExtractor
        Parameters:
        highlightOptions - A collection of HighlightOptions.
        Returns:
        A collection of strings that represent highlights. If no highlights are found, the collection is empty.
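The settings carried by `HighlightOptions` are not described on this page. Assuming a highlight is a fragment of the surrounding text around each keyword match, the sketch below cuts a fixed number of characters on either side of every occurrence. `HighlightSketch` and its `radius` parameter are hypothetical. Like `extractHighlights`, it returns an empty list, not null, when nothing matches.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: for every case-insensitive occurrence of `keyword`,
// take up to `radius` characters before and after it.
final class HighlightSketch {
    private HighlightSketch() {}

    static List<String> highlights(String text, String keyword, int radius) {
        List<String> result = new ArrayList<>();
        if (keyword.isEmpty()) {
            return result; // empty, never null
        }
        for (int i = 0; i + keyword.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, keyword, 0, keyword.length())) {
                int start = Math.max(0, i - radius);
                int end = Math.min(text.length(), i + keyword.length() + radius);
                result.add(text.substring(start, end));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(highlights("Revenue grew in Q3 while costs fell", "q3", 5)); // [w in Q3 whil]
    }
}
```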
      • extractPage

        public String extractPage(int pageIndex)
        Description copied from interface: IPageTextExtractor

        Extracts all characters from the page at index pageIndex and returns them as a string.

        Specified by:
        extractPage in interface IPageTextExtractor
        Parameters:
        pageIndex - The index of the page.
        Returns:
        A string that contains all characters from the page, or null if all characters have been extracted.
      • dispose

        protected void dispose(boolean disposing)

        Releases the unmanaged resources used by the extractor.

        Overrides:
        dispose in class CellsTextExtractorBase
        Parameters:
        disposing - true if invoked from Dispose; otherwise, false.
      • extractText

        protected String extractText()

        Extracts all characters from the current position to the end of the text extractor and returns them as one string.

        Overrides:
        extractText in class CellsTextExtractorBase
        Returns:
        A string that contains all characters from the current position to the end of the text extractor.
      • extractTextLine

        protected String extractTextLine()
        Description copied from class: CellsTextExtractorBase

        Extracts a line of characters from the text extractor and returns the data as a string.

        Overrides:
        extractTextLine in class CellsTextExtractorBase
        Returns:
        The next line from the extractor, or null if all characters have been extracted.
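`extractTextLine` returns the next line, or null once all characters have been extracted, so a caller reads in a loop until that null appears. Because `extractTextLine` is protected, the sketch below shows the loop against a hypothetical `LineSource` stand-in backed by a list; neither `LineSource` nor `DrainLines` is part of GroupDocs.Parser.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an extractor: returns one line per call and
// null once every line has been returned.
final class LineSource {
    private final Iterator<String> lines;

    LineSource(List<String> lines) {
        this.lines = lines.iterator();
    }

    String nextLine() {
        return lines.hasNext() ? lines.next() : null;
    }
}

final class DrainLines {
    private DrainLines() {}

    // Reads lines until the source signals the end by returning null.
    static List<String> readAll(LineSource source) {
        List<String> out = new ArrayList<>();
        String line;
        while ((line = source.nextLine()) != null) {
            out.add(line);
        }
        return out;
    }

    public static void main(String[] args) {
        LineSource source = new LineSource(List.of("A1;B1", "A2;B2"));
        System.out.println(readAll(source)); // [A1;B1, A2;B2]
    }
}
```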