SetaPDF_Extractor_Strategy_Plain

Extraction strategy for plain text.

File: /SetaPDF/Extractor/Strategy/Plain.php

Properties

$_cleanStreamCallback

A callback that is called before processing a stream.

$_graphicState

$_items

$_lastMatrix

$_resources

The stream resources dictionary.

$_sorter

$_textCount

A text item counter.

$spaceWidthFactor

The font's space character width is divided by this factor to determine the minimum gap that is treated as a character separator.
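The division described above can be sketched as follows. This is a minimal illustration and not SetaPDF code: the function names `minSeparatorGap()` and `isSeparated()` are hypothetical, and the use of `>=` for the comparison is an assumption.

```php
<?php
// Hypothetical helpers, not part of SetaPDF: they only illustrate the
// "space width divided by factor" rule described for $spaceWidthFactor.

/**
 * Returns the minimum gap that counts as a character separator.
 */
function minSeparatorGap(float $spaceWidth, float $spaceWidthFactor): float
{
    if ($spaceWidthFactor <= 0) {
        throw new InvalidArgumentException('The factor must be greater than zero.');
    }

    return $spaceWidth / $spaceWidthFactor;
}

/**
 * Decides whether the gap between two glyphs is wide enough to separate them.
 * Assumption: a gap equal to the threshold already counts as a separator.
 */
function isSeparated(float $gap, float $spaceWidth, float $spaceWidthFactor): bool
{
    return $gap >= minSeparatorGap($spaceWidth, $spaceWidthFactor);
}
```

With a space width of 250 units and a factor of 2, the threshold is 125 units, so a larger factor makes the check more sensitive to small gaps.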


Methods

__construct()

The constructor.

_accept()

Proxy method that forwards the call to a filter instance if available.

Parameters
$textItem : SetaPDF_Extractor_TextItem
 
_cleanResult()

public string SetaPDF_Extractor_Strategy_Plain::_cleanResult ( $result $result )

Callback to clean up the resulting text.

Parameters
$result : $result
 

_getParser()

Creates the content stream parser.

Parameters
$stream : string
 

_getSubInstance()

Get an instance of the same strategy for processing another stream (e.g. a Form XObject stream).

Parameters
$gs : SetaPDF_Core_Canvas_GraphicState
 

_onAfterShowText()

public void SetaPDF_Extractor_Strategy_Plain::_onAfterShowText ( string $rawString )

Callback that is called after a show text operation was invoked.

Parameters
$rawString : string
 

_onBeforeShowText()

Callback that is called before a show text operation is invoked.

_onBeginOrEndText()

public void SetaPDF_Extractor_Strategy_Plain::_onBeginOrEndText ( array $arguments, string $operator )

Callback for begin or end text operators (BT/ET).

Parameters
$arguments : array
 
$operator : string
 

_onCurrentTransformationMatrix()

public void SetaPDF_Extractor_Strategy_Plain::_onCurrentTransformationMatrix ( array $arguments, string $operator )

Callback for changes to the current transformation matrix (cm operator).

Parameters
$arguments : array
 
$operator : string
 

_onFormXObject()

public void SetaPDF_Extractor_Strategy_Plain::_onFormXObject ( array $arguments, string $operator )

Callback for painting a specified XObject.

Parameters
$arguments : array
 
$operator : string
 
Exceptions

Throws SetaPDF_Exception_NotImplemented

_onGraphicStateChange()

public void SetaPDF_Extractor_Strategy_Plain::_onGraphicStateChange ( array $arguments, string $operator )

Callback for the graphic state save and restore operators (q/Q).

Parameters
$arguments : array
 
$operator : string
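
In PDF content streams, `q` saves the current graphic state and `Q` restores the most recently saved one. A handler for these operators might look like the sketch below. The class `GraphicStateStack` and its methods are hypothetical stand-ins and do not reflect how SetaPDF implements this internally; silently ignoring an unbalanced `Q` is also an assumption.

```php
<?php
// Hypothetical stand-in for a graphic state stack; this is not SetaPDF code.
final class GraphicStateStack
{
    /** @var array<int, array<string, mixed>> Saved states, most recent last. */
    private array $saved = [];

    /** @var array<string, mixed> The state currently in effect. */
    private array $current;

    public function __construct(array $initial = [])
    {
        $this->current = $initial;
    }

    /**
     * Handles the q (save) and Q (restore) operators.
     * Assumption: a Q without a matching q is ignored.
     */
    public function onGraphicStateChange(string $operator): void
    {
        if ($operator === 'q') {
            // Push a copy of the current state (PHP arrays are copied by value).
            $this->saved[] = $this->current;
        } elseif ($operator === 'Q' && $this->saved !== []) {
            // Pop the most recently saved state and make it current again.
            $this->current = array_pop($this->saved);
        }
    }

    public function set(string $key, $value): void
    {
        $this->current[$key] = $value;
    }

    public function get(string $key)
    {
        return $this->current[$key] ?? null;
    }
}
```

Any change made between a `q` and its matching `Q` is discarded on restore, which is why state such as the transformation matrix has to be tracked per stack level.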
 

_onInlineImage()

public void SetaPDF_Extractor_Strategy_Plain::_onInlineImage ( array $arguments, string $operator )

Callback for the inline image operator.

Parameters
$arguments : array
 
$operator : string
 

_onTextPosition()

public void SetaPDF_Extractor_Strategy_Plain::_onTextPosition ( array $arguments, string $operator )

Callback for text position operators.

Parameters
$arguments : array
 
$operator : string
 

_onTextShow()

public void SetaPDF_Extractor_Strategy_Plain::_onTextShow ( array $arguments, string $operator )

Callback for text show operators.

Parameters
$arguments : array
 
$operator : string
 

_onTextState()

public void SetaPDF_Extractor_Strategy_Plain::_onTextState ( array $arguments, string $operator )

Callback for text state operators.

All states have to be passed to the current graphic state, as defined in PDF 32000-1:2008, Table 52 on page 121.

Parameters
$arguments : array
 
$operator : string
 
Exceptions

Throws SetaPDF_Extractor_Exception

_saveLastMatrix()

protected void SetaPDF_Extractor_Strategy_Plain::_saveLastMatrix ( string $type )

Saves the last matrix by a specific type.

Parameters
$type : string
 

getCleanStreamCallback()

Get the callback that is called before a stream is processed.

getGraphicState()

getResult()

public string SetaPDF_Extractor_Strategy_Plain::getResult ( string $stream, SetaPDF_Core_Type_Dictionary $resources [, boolean $cleanUp = false ] )

Get the plain text from a stream.

Parameters
$stream : string
 
$resources : SetaPDF_Core_Type_Dictionary
 
$cleanUp : boolean

Defines whether the cleanUp() method of the text items should be called to release cyclic references and memory. This parameter should only be used with PHP versions below 5.3.

getSorter()

Get the sorter instance.

If none was set, a baseline sorter is created automatically.
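
The fallback behaviour described for getSorter() is a lazy default: the getter creates an instance only when none was set before. A minimal sketch of that pattern follows. The names `Sorter`, `BaselineSorter` and `StrategySketch` are hypothetical placeholders, not the SetaPDF classes, and the sorter itself is left empty on purpose.

```php
<?php
// Hypothetical names throughout; this only illustrates lazy default creation.
interface Sorter
{
}

// Placeholder for the default sorter; the real sorting logic is omitted.
final class BaselineSorter implements Sorter
{
}

final class StrategySketch
{
    /** @var Sorter|null The sorter, or null if none was set yet. */
    private ?Sorter $sorter = null;

    public function setSorter(Sorter $sorter): void
    {
        $this->sorter = $sorter;
    }

    public function getSorter(): Sorter
    {
        // Create the default only on first access and reuse it afterwards.
        if ($this->sorter === null) {
            $this->sorter = new BaselineSorter();
        }

        return $this->sorter;
    }
}
```

A sorter passed to `setSorter()` beforehand is returned unchanged, so the default is only created when the caller did not choose one.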

process()

Processes a stream through the plain text strategy.

Parameters
$stream : string
 
$resources : SetaPDF_Core_Type_Dictionary
 

setCleanStreamCallback()

public void SetaPDF_Extractor_Strategy_AbstractStrategy::setCleanStreamCallback ( [ callable|null $callback = null ] )

Set a callback that is called before processing a stream.

Parameters
$callback : callable|null
 

setFilter()

setGraphicState()

Set the graphic state.

Parameters
$graphicState : SetaPDF_Core_Canvas_GraphicState
 

setSorter()

Set a sorter instance.

Parameters
$sorter : SetaPDF_Extractor_Sorter