SetaPDF_Extractor_ContentStreamCleaner

Helper class to clean up content streams.

File: /SetaPDF/Extractor/ContentStreamCleaner.php

Constants

REGEX_COLORS

const string SetaPDF_Extractor_ContentStreamCleaner::REGEX_COLORS = '/(?<=[}\\]\\x00\\x09\\x0A\\x0C\\x0D\\x20]|^)([\\d\\.\\-]+[\\x00\\x09\\x0A\\x0C\\x0D\\x20]+){1,4}(k|K|SC|sc|SCN|scn|rg|RG|g|G)(?=[\\x00\\x09\\x0A\\x0C\\x0D\\x20{\\[\\/]|$)/S'

Constant defining a regex for color operators.
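As a rough illustration of what this pattern matches, the sketch below applies it with plain preg_replace() to a made-up content stream. The pattern is copied from the constant above so that the snippet runs without the library; the sample stream and variable names are assumptions for illustration only.

```php
<?php
// Value copied from the REGEX_COLORS constant documented above.
$regexColors = '/(?<=[}\\]\\x00\\x09\\x0A\\x0C\\x0D\\x20]|^)([\\d\\.\\-]+[\\x00\\x09\\x0A\\x0C\\x0D\\x20]+){1,4}(k|K|SC|sc|SCN|scn|rg|RG|g|G)(?=[\\x00\\x09\\x0A\\x0C\\x0D\\x20{\\[\\/]|$)/S';

// Hypothetical stream: set an RGB fill color, then draw and fill a rectangle.
$stream = '0.2 0.4 0.6 rg 10 10 100 50 re f';

// Remove the color operator together with its one to four numeric operands.
$withoutColors = preg_replace($regexColors, '', $stream);

var_dump($withoutColors); // string(18) " 10 10 100 50 re f"
```

The lookbehind and lookahead only accept whitespace, a few delimiters, or the string boundaries around the match, so an operator name embedded in a longer token is left alone.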

REGEX_PATHOPERATORS

const string SetaPDF_Extractor_ContentStreamCleaner::REGEX_PATHOPERATORS = '/(?<=[}\\]\\x00\\x09\\x0A\\x0C\\x0D\\x20]|^)([\\d\\.\\-]+[\\x00\\x09\\x0A\\x0C\\x0D\\x20]+){0,6}(m|l|c|v|y|re|h|S|s|f|F|f\\*|B|B\\*|b|b\\*|n|W|W\\*)(?=[\\x00\\x09\\x0A\\x0C\\x0D\\x20{\\[\\/]|$)/S'

Constant defining a regex for path operators.
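To see which parts of a stream this pattern picks up, the following sketch runs it through preg_match_all() on a made-up stream. The pattern is copied from the constant above; the sample stream and variable names are illustrative assumptions.

```php
<?php
// Value copied from the REGEX_PATHOPERATORS constant documented above.
$regexPathOperators = '/(?<=[}\\]\\x00\\x09\\x0A\\x0C\\x0D\\x20]|^)([\\d\\.\\-]+[\\x00\\x09\\x0A\\x0C\\x0D\\x20]+){0,6}(m|l|c|v|y|re|h|S|s|f|F|f\\*|B|B\\*|b|b\\*|n|W|W\\*)(?=[\\x00\\x09\\x0A\\x0C\\x0D\\x20{\\[\\/]|$)/S';

// Hypothetical stream: a color operator followed by a rectangle and an even-odd fill.
$stream = '0.2 0.4 0.6 rg 10 10 100 50 re f*';

// Collect every path operator together with its zero to six numeric operands.
preg_match_all($regexPathOperators, $stream, $matches);

print_r($matches[0]); // "10 10 100 50 re" and "f*"
```

The color operator and its operands are not matched, because "rg" is not among the path operators listed in the pattern.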

TYPE_ALL

Constant defining all content types.

TYPE_INLINE_IMAGE

Constant defining the inline image content type.

TYPE_NONE

Constant representing no content type.

TYPE_OPERATOR

Constant defining the operator content type.

TYPE_STRING

Constant defining the literal string content type.


Static Methods

_strposa()

static private bool|int SetaPDF_Extractor_ContentStreamCleaner::_strposa ( string $haystack, array $needles [, int $offset = 0 ] )

Searches the string for the closest needle, i.e. the needle with the lowest position.

Returns false if none of the needles is found in the string.

Parameters
$haystack : string
 
$needles : array
 
$offset : int
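
Because the method is private, it cannot be called from outside the class. The sketch below is a hypothetical stand-alone re-implementation of the described behavior, not the library's code. The function name strposaSketch and the assumption that $offset is passed straight to strpos() are illustrative.

```php
<?php
/**
 * Hypothetical re-implementation for illustration only.
 * Returns the lowest position at which any needle occurs, or false.
 */
function strposaSketch(string $haystack, array $needles, int $offset = 0)
{
    $closest = false;

    foreach ($needles as $needle) {
        $position = strpos($haystack, $needle, $offset);
        // Keep the smallest position found so far.
        if ($position !== false && ($closest === false || $position < $closest)) {
            $closest = $position;
        }
    }

    return $closest;
}

$stream = 'BT (Hello) Tj <48> Tj ET';

var_dump(strposaSketch($stream, ['(', '<']));    // int(3)
var_dump(strposaSketch($stream, ['(', '<'], 4)); // int(14)
var_dump(strposaSketch($stream, ['[', '{']));    // bool(false)
```

As with strpos(), a result of 0 is a valid position, so callers should compare against false with the strict === operator.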
 

clean()

static public string SetaPDF_Extractor_ContentStreamCleaner::clean ( string|array $data, array $regexes [, int $target = self::TYPE_OPERATOR ] )

Cleans a content stream string by using regexes on the chosen targets.

The regexes will NOT affect literal string objects.

Parameters
$data : string|array
 
$regexes : array
 
$target : int
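
The sketch below illustrates why literal strings are protected. It is a simplified, hypothetical stand-in that runs without the library, not the actual implementation. It assumes that matches are removed, and it ignores nested parentheses, hex strings and inline images. The function name cleanSketch and the sample regex are made up for the example.

```php
<?php
/**
 * Hypothetical stand-in for clean(): applies the regexes to everything
 * except literal strings "(...)".
 */
function cleanSketch(string $stream, array $regexes): string
{
    // Capture literal strings (including backslash escapes) as delimiters,
    // so they end up at the odd indexes of $pieces.
    $pieces = preg_split(
        '/(\((?:[^()\\\\]|\\\\.)*\))/s',
        $stream,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    foreach ($pieces as $i => $piece) {
        if ($i % 2 === 0) {
            // Even indexes hold operator data: only these are cleaned.
            $pieces[$i] = preg_replace($regexes, '', $piece);
        }
    }

    return implode('', $pieces);
}

$stream = 'BT 1 0 0 rg (1 0 0 rg stays) Tj ET';
$regexes = ['/\b1 0 0 rg\s*/'];

$naive   = preg_replace($regexes, '', $stream); // also alters the text string
$cleaned = cleanSketch($stream, $regexes);      // leaves the text string intact

var_dump($naive);   // "BT (stays) Tj ET"
var_dump($cleaned); // "BT (1 0 0 rg stays) Tj ET"
```

Running a regex over the raw stream would corrupt any text that happens to look like an operator sequence, which is the situation the split into targets avoids.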
 

splitStream()

static public array SetaPDF_Extractor_ContentStreamCleaner::splitStream ( string $string [, $ignore = self::TYPE_INLINE_IMAGE ] )

Splits a content stream string into literal strings, inline images and operators (everything that remains).

Each piece carries information about its type.

Parameters
$string : string
 
$ignore : int
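
To make the idea of typed pieces concrete, here is a minimal, hypothetical splitter that runs without the library. It only separates literal strings from everything else and does not detect inline images or hex strings. The function name splitStreamSketch and the piece layout ('type' and 'data' keys with string type names) are assumptions, since the actual return structure is not documented here.

```php
<?php
/**
 * Hypothetical, simplified splitter: separates literal strings "(...)"
 * from the remaining operator data.
 */
function splitStreamSketch(string $stream): array
{
    $pieces = [];
    $buffer = '';
    $length = strlen($stream);

    for ($i = 0; $i < $length; $i++) {
        if ($stream[$i] !== '(') {
            $buffer .= $stream[$i];
            continue;
        }

        // Flush pending operator data before the string starts.
        if ($buffer !== '') {
            $pieces[] = ['type' => 'operator', 'data' => $buffer];
            $buffer = '';
        }

        // Walk to the matching ")" while honoring escapes and nesting.
        $start = $i;
        $depth = 0;
        for (; $i < $length; $i++) {
            if ($stream[$i] === '\\') {
                $i++; // skip the escaped character
            } elseif ($stream[$i] === '(') {
                $depth++;
            } elseif ($stream[$i] === ')' && --$depth === 0) {
                break;
            }
        }
        $pieces[] = ['type' => 'string', 'data' => substr($stream, $start, $i - $start + 1)];
    }

    if ($buffer !== '') {
        $pieces[] = ['type' => 'operator', 'data' => $buffer];
    }

    return $pieces;
}

$pieces = splitStreamSketch('BT (a \( b (c)) Tj ET');
print_r($pieces);
```

For the sample input this yields three pieces: the operator data before the string, the complete literal string including its escaped and nested parentheses, and the operator data after it.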