PdfTextExtractor Class
Provides the text extractor for PDF documents.
Inheritance Hierarchy

Namespace: GroupDocs.Parser.Extractors.Text
Assembly: GroupDocs.Parser (in GroupDocs.Parser.dll) Version: 18.7
Syntax
public sealed class PdfTextExtractor : TextExtractor, 
	IContainer, IHighlightExtractor, IPageTextExtractor, IRegexSearchable, ISearchable

The PdfTextExtractor type exposes the following members.

Constructors
Name
  Description
PdfTextExtractor(Stream)
  Initializes a new instance of the PdfTextExtractor class.
PdfTextExtractor(String)
  Initializes a new instance of the PdfTextExtractor class.
PdfTextExtractor(Stream, LoadOptions)
  Initializes a new instance of the PdfTextExtractor class.
PdfTextExtractor(String, LoadOptions)
  Initializes a new instance of the PdfTextExtractor class.
Properties
Name
  Description
DocumentContent
  Gets access to the document's content.
Encoding
  Gets or sets the encoding for the document.
  (Inherited from TextExtractor.)
Entities
  Gets a collection of the container's entities.
ExtractMode
  Gets or sets a value indicating the mode of text extraction.
IsDisposed
  Gets a value indicating whether the extractor is disposed.
  (Inherited from TextExtractor.)
MediaType
  Gets or sets the media type for the document.
  (Inherited from TextExtractor.)
PageCount
  Gets the total number of pages.
Methods
Name
  Description
Dispose
  Releases the unmanaged resources used by the extractor.
  (Inherited from TextExtractor.)
Equals
  Determines whether the specified Object is equal to the current Object.
  (Inherited from Object.)
ExtractAll
  Extracts all characters from the current position to the end of the text extractor and returns them as one string.
  (Inherited from TextExtractor.)
ExtractHighlights
  Extracts highlights.
ExtractLine
  Extracts a line of characters from the text extractor and returns the data as a string.
  (Inherited from TextExtractor.)
ExtractPage
  Reads all characters from the page at index pageIndex and returns them as a string.
GetHashCode
  Serves as a hash function for a particular type.
  (Inherited from Object.)
GetType
  Gets the type of the current instance.
  (Inherited from Object.)
OpenEntityStream
  Opens a stream with the content of the container's entity.
Reset
  Resets the current document.
  (Overrides TextExtractor.Reset().)
Search(SearchOptions, ISearchHandler, IList<String>)
  Searches for the keywords.
Search(SearchOptions, ISearchHandler, ISearchEngine, IList<String>)
  Searches for the keywords.
SearchWithRegex
  Searches for matches of the expression.
ToString
  Returns a string that represents the current object.
  (Inherited from Object.)
Examples

Extracting text from a PDF:

C#
// Create a text extractor for the PDF; 'stream' is an open, readable Stream containing the document
using (PdfTextExtractor extractor = new PdfTextExtractor(stream))
{
  // Extract all text and print it
  Console.WriteLine(extractor.ExtractAll());
}

Extracting text page by page:

C#
// Create a text extractor for the PDF; 'stream' is an open, readable Stream containing the document
using (PdfTextExtractor extractor = new PdfTextExtractor(stream))
{
  // Iterate over the pages (page indexes are zero-based)
  for (int pageIndex = 0; pageIndex < extractor.PageCount; pageIndex++)
  {
    // Extract the text from the page whose index is pageIndex
    Console.WriteLine(extractor.ExtractPage(pageIndex));
  }
}
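
Extracting line by line:

The members table also lists ExtractLine, which can be used to read a document incrementally instead of loading all text at once. The following is a minimal sketch, not a definitive implementation. It assumes that ExtractLine returns null once the end of the document is reached, and it uses a hypothetical file name ("sample.pdf") with the PdfTextExtractor(String) constructor. It needs a reference to GroupDocs.Parser.dll to compile.

```csharp
using System;
using GroupDocs.Parser.Extractors.Text;

class ExtractLinesExample
{
    static void Main(string[] args)
    {
        // Hypothetical path used for illustration; pass a real PDF path as the first argument.
        string fileName = args.Length > 0 ? args[0] : "sample.pdf";

        // The using statement calls Dispose, releasing the extractor's resources.
        using (PdfTextExtractor extractor = new PdfTextExtractor(fileName))
        {
            // Assumption: ExtractLine returns null when there are no more lines.
            string line;
            while ((line = extractor.ExtractLine()) != null)
            {
                Console.WriteLine(line);
            }
        }
    }
}
```

Reading one line at a time keeps only the current line in memory, which may be preferable to ExtractAll for large documents.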
See Also