SetaPDF_Extractor_Strategy_Plain Extraction strategy for plain text.

File: /SetaPDF/Extractor/Strategy/Plain.php

Properties

$_graphicState

$_items

$_lastMatrix

$_resources

The stream resources dictionary.

$_sorter

$_textCount

A text item counter.

$spaceWidthFactor

The font's space character width is divided by this factor to define the minimum space for a character separator.
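
To illustrate how such a factor could translate into a separator decision, here is a minimal, self-contained sketch. It is not the library's implementation: the function name `needsSeparator()`, the `>=` comparison and the sample numbers are assumptions made for illustration. Only the division of the space width by the factor comes from the description above.

```php
<?php
/**
 * Hypothetical helper (not part of SetaPDF): decides whether the horizontal
 * gap between two text items is wide enough to count as a separator.
 */
function needsSeparator(float $gap, float $spaceWidth, float $spaceWidthFactor): bool
{
    // Minimum gap = width of the font's space character divided by the factor.
    $minimumGap = $spaceWidth / $spaceWidthFactor;

    // Assumption: a gap equal to the minimum already counts as a separator.
    return $gap >= $minimumGap;
}

// Illustrative numbers only: a space width of 250 units and a factor of 2
// give a minimum gap of 125 units.
var_dump(needsSeparator(130.0, 250.0, 2.0)); // bool(true)
var_dump(needsSeparator(100.0, 250.0, 2.0)); // bool(false)
```

In this sketch a larger factor lowers the minimum gap, so smaller gaps are treated as separators. A smaller factor has the opposite effect.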


Methods

_accept()

protected bool SetaPDF_Extractor_Strategy_AbstractStrategy::_accept ( SetaPDF_Extractor_TextItem $textItem )

Proxy method that forwards the call to a filter instance if available.
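
The proxy pattern described here can be sketched in a few lines. Everything in the following example is hypothetical: `SketchFilter`, `SketchMinLengthFilter` and `SketchStrategy` are stand-ins that work on plain strings rather than on `SetaPDF_Extractor_TextItem` instances, and the behaviour when no filter is set (accept everything) is an assumption, not something this page states.

```php
<?php
// Hypothetical stand-ins for illustration; these are not the SetaPDF classes.
interface SketchFilter
{
    public function accept(string $text): bool;
}

class SketchMinLengthFilter implements SketchFilter
{
    private $minLength;

    public function __construct(int $minLength)
    {
        $this->minLength = $minLength;
    }

    public function accept(string $text): bool
    {
        return strlen($text) >= $this->minLength;
    }
}

class SketchStrategy
{
    /** @var SketchFilter|null */
    private $filter = null;

    public function setFilter(?SketchFilter $filter = null): void
    {
        $this->filter = $filter;
    }

    // Public here so the sketch can be called directly; the documented method is protected.
    public function accept(string $text): bool
    {
        // Assumption: with no filter set, every item is accepted.
        if ($this->filter === null) {
            return true;
        }

        // Otherwise the decision is forwarded to the filter instance.
        return $this->filter->accept($text);
    }
}

$strategy = new SketchStrategy();
var_dump($strategy->accept('a'));   // bool(true), no filter set

$strategy->setFilter(new SketchMinLengthFilter(3));
var_dump($strategy->accept('a'));   // bool(false), rejected by the filter
var_dump($strategy->accept('abc')); // bool(true)
```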

Parameters
$textItem : SetaPDF_Extractor_TextItem
 

_cleanResult()

public string SetaPDF_Extractor_Strategy_Plain::_cleanResult ( $result $result )

Callback to clean up the resulting text.

Parameters
$result : $result
 

_getParser()

protected SetaPDF_Core_Parser_Content SetaPDF_Extractor_Strategy_Plain::_getParser ( string $stream )

Creates the content stream parser.

Parameters
$stream : string
 

_onAfterShowText()

public void SetaPDF_Extractor_Strategy_Plain::_onAfterShowText ( string $rawString )

Callback that is called after a show text operation was invoked.

Parameters
$rawString : string
 

_onBeforeShowText()

public void SetaPDF_Extractor_Strategy_Plain::_onBeforeShowText ( void )

Callback that is called before a show text operation is invoked.

_onBeginOrEndText()

public void SetaPDF_Extractor_Strategy_Plain::_onBeginOrEndText ( array $arguments, string $operator )

Callback for begin or end text operators (BT/ET).

Parameters
$arguments : array
 
$operator : string
 

_onCurrentTransformationMatrix()

public void SetaPDF_Extractor_Strategy_Plain::_onCurrentTransformationMatrix ( array $arguments, string $operator )

Callback for current transformation matrix (CTM) changes (cm).

Parameters
$arguments : array
 
$operator : string
 

_onFormXObject()

public void SetaPDF_Extractor_Strategy_Plain::_onFormXObject ( array $arguments, string $operator )

Callback for painting a specified XObject.

Parameters
$arguments : array
 
$operator : string
 
Exceptions

Throws SetaPDF_Exception_NotImplemented

_onGraphicStateChange()

public void SetaPDF_Extractor_Strategy_Plain::_onGraphicStateChange ( array $arguments, string $operator )

Callback for graphic state change operators (q/Q).
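
In PDF content streams, `q` saves the current graphic state on a stack and `Q` restores the most recently saved one. The sketch below shows that save/restore behaviour in isolation. It is a simplified, hypothetical model: `SketchGraphicStateStack` is not the `SetaPDF_Core_Canvas_GraphicState` class, the state is reduced to a single matrix array, and silently ignoring an unbalanced `Q` is an assumption.

```php
<?php
/**
 * Hypothetical, simplified graphic state stack (not the SetaPDF class).
 * The state is reduced to a plain array holding one matrix.
 */
class SketchGraphicStateStack
{
    private $current = ['ctm' => [1, 0, 0, 1, 0, 0]];
    private $stack = [];

    public function onGraphicStateChange(string $operator): void
    {
        if ($operator === 'q') {
            // Save: push a copy of the current state (PHP arrays copy by value).
            $this->stack[] = $this->current;
        } elseif ($operator === 'Q' && count($this->stack) > 0) {
            // Restore: pop the most recently saved state.
            // Assumption: an unbalanced Q is ignored.
            $this->current = array_pop($this->stack);
        }
    }

    public function setCtm(array $ctm): void
    {
        $this->current['ctm'] = $ctm;
    }

    public function getCtm(): array
    {
        return $this->current['ctm'];
    }
}

$state = new SketchGraphicStateStack();
$state->onGraphicStateChange('q');     // save the identity matrix
$state->setCtm([2, 0, 0, 2, 10, 20]);  // change the state between q and Q
$state->onGraphicStateChange('Q');     // restore the saved state
var_dump($state->getCtm() === [1, 0, 0, 1, 0, 0]); // bool(true)
```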

Parameters
$arguments : array
 
$operator : string
 

_onInlineImage()

public void SetaPDF_Extractor_Strategy_Plain::_onInlineImage ( array $arguments, string $operator )

Callback for the inline image operator.

Parameters
$arguments : array
 
$operator : string
 

_onTextPosition()

public void SetaPDF_Extractor_Strategy_Plain::_onTextPosition ( array $arguments, string $operator )

Callback for text position operators.

Parameters
$arguments : array
 
$operator : string
 

_onTextShow()

public void SetaPDF_Extractor_Strategy_Plain::_onTextShow ( array $arguments, string $operator )

Callback for text show operators.

Parameters
$arguments : array
 
$operator : string
 

_onTextState()

public void SetaPDF_Extractor_Strategy_Plain::_onTextState ( array $arguments, string $operator )

Callback for text state operators.

All states have to be passed to the current graphic state as defined in PDF 32000-1:2008, Table 52 on page 121.

Parameters
$arguments : array
 
$operator : string
 
Exceptions

Throws SetaPDF_Extractor_Exception

_saveLastMatrix()

protected void SetaPDF_Extractor_Strategy_Plain::_saveLastMatrix ( string $type )

Saves the last matrix by a specific type.

Parameters
$type : string
 

getFilter()

public null|SetaPDF_Extractor_Filter_FilterInterface SetaPDF_Extractor_Strategy_AbstractStrategy::getFilter ( void )

Get the filter.

getGraphicState()

public SetaPDF_Core_Canvas_GraphicState SetaPDF_Extractor_Strategy_Plain::getGraphicState ( void )

Get the graphic state.

getResult()

public string SetaPDF_Extractor_Strategy_Plain::getResult ( string $stream, SetaPDF_Core_Type_Dictionary $resources [, boolean $cleanUp = false ] )

Get the plain text from a stream.

Parameters
$stream : string
 
$resources : SetaPDF_Core_Type_Dictionary
 
$cleanUp : boolean

Defines whether the cleanUp() method of the text items should be called to release cyclic references and memory. This parameter should only be used with PHP versions below 5.3.
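
The restriction to PHP below 5.3 follows from how PHP frees memory: those versions rely on reference counting alone, so objects that reference each other are never released unless the cycle is broken by hand. The sketch below illustrates the general idea with two hypothetical classes. How the real text items are linked internally is not described on this page, so the `SketchResource` and `SketchTextItem` relationship is purely an assumption.

```php
<?php
// Hypothetical classes illustrating a reference cycle; not SetaPDF code.
class SketchResource
{
    /** Counts destructor calls of both sketch classes. */
    public static $destroyed = 0;

    public $item = null;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

class SketchTextItem
{
    public $resource;

    public function __construct(SketchResource $resource)
    {
        $this->resource = $resource;
        $resource->item = $this; // resource -> item -> resource: a cycle
    }

    public function cleanUp(): void
    {
        // Break the cycle in both directions so that plain reference
        // counting can free the objects.
        $this->resource->item = null;
        $this->resource = null;
    }

    public function __destruct()
    {
        SketchResource::$destroyed++;
    }
}

$resource = new SketchResource();
$item = new SketchTextItem($resource);

$item->cleanUp();
unset($item, $resource);

// Both objects were destroyed as soon as their last reference went away.
echo SketchResource::$destroyed, "\n"; // 2
```

From PHP 5.3 onwards the engine's cycle collector handles such structures on its own, which is consistent with the parameter defaulting to `false`.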

getSorter()

public SetaPDF_Extractor_Sorter|SetaPDF_Extractor_Sorter_Baseline SetaPDF_Extractor_Strategy_Plain::getSorter ( void )

Get the sorter instance.

If none was set, a baseline sorter is created automatically.

process()

public SetaPDF_Extractor_TextItem[] SetaPDF_Extractor_Strategy_Plain::process ( string $stream, SetaPDF_Core_Type_Dictionary $resources )

Processes a stream through the plain text strategy.

Parameters
$stream : string
 
$resources : SetaPDF_Core_Type_Dictionary
 

setFilter()

public void SetaPDF_Extractor_Strategy_AbstractStrategy::setFilter ( [ SetaPDF_Extractor_Filter_FilterInterface|null $filter = null ] )

Set a filter.

Parameters
$filter : SetaPDF_Extractor_Filter_FilterInterface|null
 

setGraphicState()

public void SetaPDF_Extractor_Strategy_Plain::setGraphicState ( SetaPDF_Core_Canvas_GraphicState $graphicState )

Set the graphic state.

Parameters
$graphicState : SetaPDF_Core_Canvas_GraphicState
 

setSorter()

public void SetaPDF_Extractor_Strategy_Plain::setSorter ( SetaPDF_Extractor_Sorter $sorter )

Set a sorter instance.

Parameters
$sorter : SetaPDF_Extractor_Sorter