Class Extractor in namespace setasign\SetaPDF2\Extractor in SetaPDF v2 API Reference

$page : \setasign\SetaPDF2\Core\Document\Page
$boundaryBox : ?string: If set, the result is limited to the rectangle of the given page boundary. See \setasign\SetaPDF2\Core\PageBoundaries::XXX_BOX constants for possible values.

Exceptions

Throws \setasign\SetaPDF2\Core\Exception

Throws \setasign\SetaPDF2\Core\Parser\Pdf\InvalidTokenException

Throws \setasign\SetaPDF2\Core\Type\Exception

getResultByPageNumber()

public Extractor::getResultByPageNumber (

int $pageNumber,
?string $boundaryBox = null

): Result\Collection|Result\Words|Result\WordGroups|string|string[]

Get the result for a specific page, identified by its page number, using the default or an individually set strategy.

Parameters

$pageNumber : int
$boundaryBox : ?string: If set, the result is limited to the rectangle of the given page boundary. See \setasign\SetaPDF2\Core\PageBoundaries::XXX_BOX constants for possible values.
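
A minimal sketch of calling this method with a boundary restriction. It assumes the library is autoloadable (for example via Composer), that the class is reachable as \setasign\SetaPDF2\Extractor\Extractor, and that CROP_BOX is one of the XXX_BOX constants mentioned above; the helper name extractCropBoxResult() is hypothetical.

```php
<?php
use setasign\SetaPDF2\Core\PageBoundaries;
use setasign\SetaPDF2\Extractor\Extractor;

/**
 * Hypothetical helper: get the result of one page, limited to its crop box.
 * The return type depends on the strategy set on the extractor, so it is
 * left undeclared here.
 */
function extractCropBoxResult(Extractor $extractor, int $pageNumber)
{
    // Assumption: PageBoundaries::CROP_BOX is one of the XXX_BOX constants.
    // Omitting the second argument (null) applies no boundary restriction.
    return $extractor->getResultByPageNumber($pageNumber, PageBoundaries::CROP_BOX);
}
```

The helper takes an already constructed Extractor instance, so how the document is loaded stays outside the sketch.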

getStrategy()

public Extractor::getStrategy (

void

): Strategy\AbstractStrategy

Get the extraction strategy.

getTextItemsByPage()

public Extractor::getTextItemsByPage (

\setasign\SetaPDF2\Core\Document\Page $page,
?string $boundaryBox = null

): TextItem[]

Get all text items of a specific page, identified by its page object, using the default or an individually set strategy.

These text items can be used to get a result by an individual method of a strategy (e.g. the Strategy\PlainStrategy::getResultByTextItems() method). By using this intermediate state it is possible to use several filters, which may collect the same text items.

Parameters

$page : \setasign\SetaPDF2\Core\Document\Page
$boundaryBox : ?string: If set, the result is limited to the rectangle of the given page boundary. See \setasign\SetaPDF2\Core\PageBoundaries::XXX_BOX constants for possible values.

Exceptions

Throws \setasign\SetaPDF2\Core\Exception

Throws \setasign\SetaPDF2\Core\Parser\Pdf\InvalidTokenException

Throws \setasign\SetaPDF2\Core\Type\Exception
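
The intermediate step described above could look roughly like the following sketch. It assumes that Strategy\PlainStrategy::getResultByTextItems() accepts the TextItem[] array returned by getTextItemsByPage() directly; the helper name plainResultFromTextItems() and the decision to wrap the three documented exceptions in a \RuntimeException are illustrative choices, not part of the library.

```php
<?php
use setasign\SetaPDF2\Core\Document\Page;
use setasign\SetaPDF2\Core\Exception as CoreException;
use setasign\SetaPDF2\Core\Parser\Pdf\InvalidTokenException;
use setasign\SetaPDF2\Core\Type\Exception as TypeException;
use setasign\SetaPDF2\Extractor\Extractor;
use setasign\SetaPDF2\Extractor\Strategy\PlainStrategy;

/**
 * Hypothetical helper: collect the text items of a page once and pass them
 * to a plain strategy to build the result.
 */
function plainResultFromTextItems(Extractor $extractor, Page $page, PlainStrategy $plain)
{
    try {
        // Text items are gathered once; the same array could be reused
        // for several filters or strategies afterwards.
        $textItems = $extractor->getTextItemsByPage($page);
    } catch (CoreException | InvalidTokenException | TypeException $e) {
        // The three exception types listed above, wrapped for the caller.
        throw new \RuntimeException('Text extraction failed: ' . $e->getMessage(), 0, $e);
    }

    // Assumption: getResultByTextItems() takes the TextItem[] array as is.
    return $plain->getResultByTextItems($textItems);
}
```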

setStrategy()

public Extractor::setStrategy (

Strategy\AbstractStrategy $strategy

): void

Set the extraction strategy.

Parameters

$strategy : Strategy\AbstractStrategy
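
As a sketch of how getStrategy() and setStrategy() might be combined, the following hypothetical helper switches the strategy for a single call and restores the previous one afterwards. Only methods documented on this page are used; the fully qualified class names are assumed from the relative names shown here.

```php
<?php
use setasign\SetaPDF2\Extractor\Extractor;
use setasign\SetaPDF2\Extractor\Strategy\AbstractStrategy;

/**
 * Hypothetical helper: run one extraction with a temporary strategy.
 */
function getResultWithStrategy(Extractor $extractor, AbstractStrategy $strategy, int $pageNumber)
{
    // Remember the currently active strategy so it can be restored.
    $previous = $extractor->getStrategy();
    $extractor->setStrategy($strategy);

    try {
        return $extractor->getResultByPageNumber($pageNumber);
    } finally {
        // Restore the previous strategy even if the extraction throws.
        $extractor->setStrategy($previous);
    }
}
```

Restoring in a finally block keeps the extractor in its original state for later calls, which matters when the same instance is shared across several extraction steps.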

Index

setasign\SetaPDF2\Extractor

Extractor The main class of the SetaPDF-Extractor Component
