setasign\SetaPDF2\Extractor\Strategy

GlyphStrategy

Extraction strategy for single glyphs.

File: /SetaPDF v2/Extractor/Strategy/GlyphStrategy.php
Old class name (alias): \SetaPDF_Extractor_Strategy_Glyph

The result of this strategy is not sorted.
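
Because the result is not sorted, a caller that needs reading order has to sort the glyphs itself. The following is only a minimal sketch under stated assumptions: it works on plain arrays with hypothetical `char`, `x` and `y` keys (not the library's result objects) and assumes PDF user space, where larger `y` values are higher on the page.

```php
<?php
/**
 * Sort glyph records top to bottom, then left to right.
 * Hypothetical helper and data shape, not part of SetaPDF.
 *
 * @param array<int, array{char: string, x: float, y: float}> $glyphs
 * @return array<int, array{char: string, x: float, y: float}>
 */
function sortGlyphs(array $glyphs): array
{
    usort($glyphs, static function (array $a, array $b): int {
        // Compare y descending first, then x ascending.
        return [$b['y'], $a['x']] <=> [$a['y'], $b['x']];
    });

    return $glyphs;
}

$unsorted = [
    ['char' => 'b', 'x' => 10.0, 'y' => 700.0],
    ['char' => 'c', 'x' => 0.0,  'y' => 680.0],
    ['char' => 'a', 'x' => 0.0,  'y' => 700.0],
];

$sorted = sortGlyphs($unsorted);
```

The comparison uses exact `y` values, so glyphs on slightly different baselines are treated as separate lines; real documents usually need a tolerance.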

Properties

$_boundaryFilter

$_cleanStreamCallback

A callback that is called before processing a stream.

$_graphicState

The graphic state instance.

$_ignoreFaultyStreams

protected bool AbstractStrategy::$_ignoreFaultyStreams = false

Defines whether to continue when a stream cannot be decoded or not.

$_ignoreSpaceCharacter

protected bool GlyphStrategy::$_ignoreSpaceCharacter = false

Defines whether space characters should be ignored or not.

$_items

The text items.

$_keepIntersectingSpaces

protected bool PlainStrategy::$_keepIntersectingSpaces = false

Defines whether intersecting spaces should be ignored or not.

$_resources

The stream resources dictionary.

$_sorter

The sorter instance.

$_textCount

protected int PlainStrategy::$_textCount = 0

A text item counter.

$spaceWidthFactor

public float PlainStrategy::$spaceWidthFactor = 2.0

A factor to calculate whether a distance can be seen as a character separator.

The font's space character width is divided by this factor to define the minimum space for a character separator.
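
The division described above can be sketched in plain PHP. The helper names are hypothetical and not part of SetaPDF, and whether the library treats a gap exactly equal to the threshold as a separator is an assumption.

```php
<?php
/**
 * Minimum gap that counts as a character separator.
 * Hypothetical helper that illustrates the arithmetic only.
 */
function minSeparatorGap(float $spaceWidth, float $spaceWidthFactor = 2.0): float
{
    if ($spaceWidthFactor <= 0.0) {
        throw new InvalidArgumentException('The factor must be greater than zero.');
    }

    return $spaceWidth / $spaceWidthFactor;
}

/**
 * Assumption: a gap equal to the threshold already counts as a separator.
 */
function isSeparator(float $gap, float $spaceWidth, float $spaceWidthFactor = 2.0): bool
{
    return $gap >= minSeparatorGap($spaceWidth, $spaceWidthFactor);
}

// A space glyph 3.0 units wide with the default factor of 2.0
// gives a threshold of 1.5 units.
$threshold = minSeparatorGap(3.0);
```

A larger factor lowers the threshold, so smaller gaps are treated as separators; a smaller factor has the opposite effect.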


Methods

__construct()

public AbstractStrategy::__construct (
void
)

The constructor.

_accept()

protected GlyphStrategy::_accept (): bool|string

Proxy method that forwards the call to a filter instance if available.

This strategy filters space characters automatically if specified (see setIgnoreSpaceCharacter()).

Parameters
$textItem : \SetaPDF_Extractor_TextItem
 
Exceptions

Throws \setasign\SetaPDF2\Core\Exception

Throws \setasign\SetaPDF2\Extractor\Exception
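
The space filtering described for this method can be pictured with a small stand-in. This is a sketch only: `acceptGlyph()` is a hypothetical function working on plain strings, while the real method receives a text item and may also forward to a filter instance.

```php
<?php
/**
 * Hypothetical stand-in for the accept decision:
 * reject a space character when spaces are to be ignored.
 */
function acceptGlyph(string $glyph, bool $ignoreSpaceCharacter): bool
{
    return !($ignoreSpaceCharacter && $glyph === ' ');
}

$glyphs = ['H', 'i', ' ', 'y', 'o', 'u'];

// Keep only the glyphs that pass the check, re-indexed from zero.
$kept = array_values(array_filter(
    $glyphs,
    fn (string $glyph): bool => acceptGlyph($glyph, true)
));
```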


_cleanResult()

public PlainStrategy::_cleanResult (
string $result
): string

Callback to clean up the resulting text.

Parameters
$result : string
 

_getParser()

Creates the content stream parser.

Parameters
$stream : string
 

_getSubInstance()

Get an instance of the same strategy for processing another stream (e.g. a Form XObject stream).

Parameters
$gs : \SetaPDF_Core_Canvas_GraphicState
 

_ignore()

protected PlainStrategy::_ignore (
string $string,
string $prevString,
\SetaPDF_Extractor_TextItem $item,
\SetaPDF_Extractor_TextItem $prevItem
): bool

Method to allow implementation of individual logic.

Parameters
$string : string
 
$prevString : string
 
$item : \SetaPDF_Extractor_TextItem
 
$prevItem : \SetaPDF_Extractor_TextItem
 

_onBeginOrEndText()

public PlainStrategy::_onBeginOrEndText (
array $arguments,
string $operator
): void

Callback for begin or end text operators (BT/ET).

Parameters
$arguments : array
 
$operator : string
 

_onCurrentTransformationMatrix()

public PlainStrategy::_onCurrentTransformationMatrix (
array $arguments,
string $operator
): void

Callback for current transformation matrix (CTM) changes (cm).

Parameters
$arguments : array
 
$operator : string
 

_onFormXObject()

public PlainStrategy::_onFormXObject (
array $arguments,
string $operator
): void

Callback for painting a specified XObject.

Parameters
$arguments : array
 
$operator : string
 
Exceptions

Throws \setasign\SetaPDF2\Core\Exception

Throws \setasign\SetaPDF2\Core\Filter\Exception

Throws \setasign\SetaPDF2\Core\Parser\Pdf\InvalidTokenException

Throws \setasign\SetaPDF2\Core\Type\Exception

Throws \setasign\SetaPDF2\Exception

Throws \setasign\SetaPDF2\NotImplementedException

_onGraphicStateChange()

public PlainStrategy::_onGraphicStateChange (
array $arguments,
string $operator
): void

Callback for graphic state change operators (q/Q).

Parameters
$arguments : array
 
$operator : string
 

_onInlineImage()

public PlainStrategy::_onInlineImage (
array $arguments,
string $operator
): false|void

Callback for the inline image operator.

Parameters
$arguments : array
 
$operator : string
 

_onTextPosition()

public PlainStrategy::_onTextPosition (
array $arguments,
string $operator
): void

Callback for text position operators.

Parameters
$arguments : array
 
$operator : string
 

_onTextShow()

public PlainStrategy::_onTextShow (
array $arguments,
string $operator
): void

Callback for text show operators.

Parameters
$arguments : array
 
$operator : string
 
Exceptions

Throws \setasign\SetaPDF2\Core\Exception

_onTextState()

public PlainStrategy::_onTextState (
array $arguments,
string $operator
): void

Callback for text state operators.

All states have to be passed to the current graphic state, as defined in PDF 32000-1:2008, Table 52 on page 121.

Parameters
$arguments : array
 
$operator : string
 
Exceptions

Throws \setasign\SetaPDF2\Extractor\Exception

_showText()

protected GlyphStrategy::_showText (
string $string
): void

Method that shows text.

Parameters
$string : string
 
Exceptions

Throws \setasign\SetaPDF2\Core\Exception

_showTextStrings()

public PlainStrategy::_showTextStrings (
array $textStrings
): void

Callback that is called if text strings should be shown.

Parameters
$textStrings : array
 
Exceptions

Throws \setasign\SetaPDF2\Core\Exception

getCleanStreamCallback()

public AbstractStrategy::getCleanStreamCallback (
void
): ?callable

Get the callback that is called before a stream is processed.

getFilter()

getGraphicState()

Get the graphic state.

getIgnoreSpaceCharacter()

public GlyphStrategy::getIgnoreSpaceCharacter (
void
): bool

Gets whether a space character should be fetched or not.

getKeepIntersecingSpaces()

WARNING: This method is marked as deprecated!

Use getKeepIntersectingSpaces() instead.

getKeepIntersectingSpaces()

Get a flag which defines whether intersecting spaces are ignored or not.

getResult()

Get all resolved glyphs.

Parameters
$stream : string
 
$resources : \SetaPDF_Core_Type_Dictionary
 
Exceptions

Throws \setasign\SetaPDF2\Core\Exception

getSorter()

Get the sorter instance.

If none was set a baseline sorter is created automatically.

process()

Processes a stream through the plain text strategy.

Parameters
$stream : string
 
$resources : \SetaPDF_Core_Type_Dictionary
 
Exceptions

Throws \setasign\SetaPDF2\Core\Exception

Throws \setasign\SetaPDF2\Core\Parser\Pdf\InvalidTokenException

setBoundary()

Sets the boundary for the current strategy.

Parameters
$boundary : ?\SetaPDF_Core_Geometry_Rectangle
 

setCleanStreamCallback()

public AbstractStrategy::setCleanStreamCallback (
?callable $callback = null
): void

Set a callback that is called before processing a stream.

Parameters
$callback : ?callable
 

setFilter()

Set a filter.

Parameters
$filter : ?\SetaPDF_Extractor_Filter_FilterInterface
 

setGraphicState()

Set the graphic state.

Parameters
$graphicState : \SetaPDF_Core_Canvas_GraphicState
 

setIgnoreFaultyStreams()

public AbstractStrategy::setIgnoreFaultyStreams (
bool $ignoreFaultyStreams
): void

Define whether to continue when a stream cannot be decoded or not.

Parameters
$ignoreFaultyStreams : bool
 

setIgnoreSpaceCharacter()

public GlyphStrategy::setIgnoreSpaceCharacter (
bool $ignoreSpaceCharacter = true
): void

Defines whether a space character should be fetched or not.

If this is set to true, the strategy will use the found space character as a delimiter. If this is set to false (default), the strategy will calculate a delimiter from the distance between two characters/glyphs.

Parameters
$ignoreSpaceCharacter : bool
 

setKeepIntersectingSpaces()

public PlainStrategy::setKeepIntersectingSpaces (
bool $keep = true
): void

Set a flag which defines whether intersecting spaces are ignored or not.

By default, this is set to false, which removes a space or white-space character that intersects with another character by more than 55 percent.

Parameters
$keep : bool
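
The 55 percent rule can be illustrated with a one-dimensional sketch. How the library actually measures the intersection is not stated here; the code below assumes horizontal overlap relative to the width of the space, and both function names are hypothetical.

```php
<?php
/**
 * Fraction of the space's width that is covered by another glyph.
 * Assumption: only the horizontal extent is compared.
 */
function overlapRatio(float $spaceLeft, float $spaceRight, float $otherLeft, float $otherRight): float
{
    $width = $spaceRight - $spaceLeft;
    if ($width <= 0.0) {
        return 0.0;
    }

    // Negative values mean the two ranges do not touch at all.
    $overlap = min($spaceRight, $otherRight) - max($spaceLeft, $otherLeft);

    return max(0.0, $overlap) / $width;
}

/**
 * A space is dropped only when the flag is off and the overlap exceeds 55 percent.
 */
function shouldDropSpace(float $ratio, bool $keepIntersectingSpaces = false): bool
{
    return !$keepIntersectingSpaces && $ratio > 0.55;
}

// A space from 0 to 10 that is covered from 4 onwards overlaps by 60 percent.
$ratio = overlapRatio(0.0, 10.0, 4.0, 20.0);
$drop = shouldDropSpace($ratio);
```

With the flag set to true, `shouldDropSpace()` returns false regardless of the ratio, which mirrors keeping intersecting spaces.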
 

setSorter()

Set a sorter instance.

Parameters
$sorter : \SetaPDF_Extractor_Sorter