SetaPDF_Extractor_ContentStreamCleaner

Helper class to clean up content streams.

File: /SetaPDF v2/Extractor/ContentStreamCleaner.php

Constants

REGEX_COLORS

const string SetaPDF_Extractor_ContentStreamCleaner::REGEX_COLORS = '/(?<=[}\\]\\x00\\x09\\x0A\\x0C\\x0D\\x20]|^)([\\d\\.\\-]+[\\x00\\x09\\x0A\\x0C\\x0D\\x20]+){1,4}(k|K|SC|sc|SCN|scn|rg|RG|g|G)(?=[\\x00\\x09\\x0A\\x0C\\x0D\\x20{\\[\\/]|$)/S'

Regular expression matching PDF color operators (k, K, SC, sc, SCN, scn, rg, RG, g, G) together with one to four preceding numeric operands.
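To illustrate what this pattern matches, the following minimal sketch applies a local copy of the regex to a short, made-up content stream using plain preg_match_all(). The variable names and the sample stream are assumptions for illustration only; in real code the pattern would be read from the class constant.

```php
<?php
// Local copy of the documented REGEX_COLORS value (illustration only).
$regexColors = '/(?<=[}\\]\\x00\\x09\\x0A\\x0C\\x0D\\x20]|^)([\\d\\.\\-]+[\\x00\\x09\\x0A\\x0C\\x0D\\x20]+){1,4}(k|K|SC|sc|SCN|scn|rg|RG|g|G)(?=[\\x00\\x09\\x0A\\x0C\\x0D\\x20{\\[\\/]|$)/S';

// Hypothetical content stream: two color operators followed by a text block.
$stream = "1 0 0 rg\n0.5 g\nBT /F1 12 Tf (Hello) Tj ET";

// Collect every color operator together with its numeric operands.
preg_match_all($regexColors, $stream, $matches);
$found = $matches[0];

foreach ($found as $operatorSequence) {
    echo $operatorSequence, PHP_EOL;
}
```

Applied directly like this, the pattern would also match inside literal strings such as "(1 0 0 rg)", which is the case clean() is documented to avoid.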

REGEX_PATHOPERATORS

const string SetaPDF_Extractor_ContentStreamCleaner::REGEX_PATHOPERATORS = '/(?<=[}\\]\\x00\\x09\\x0A\\x0C\\x0D\\x20]|^)([\\d\\.\\-]+[\\x00\\x09\\x0A\\x0C\\x0D\\x20]+){0,6}(m|l|c|v|y|re|h|S|s|f|F|f\\*|B|B\\*|b|b\\*|n|W|W\\*)(?=[\\x00\\x09\\x0A\\x0C\\x0D\\x20{\\[\\/]|$)/S'

Regular expression matching PDF path operators (construction, painting and clipping) together with up to six preceding numeric operands.

TYPE_ALL

const integer SetaPDF_Extractor_ContentStreamCleaner::TYPE_ALL = 7

Constant covering all content types (TYPE_STRING | TYPE_OPERATOR | TYPE_INLINE_IMAGE).

TYPE_INLINE_IMAGE

const integer SetaPDF_Extractor_ContentStreamCleaner::TYPE_INLINE_IMAGE = 4

Content type constant for inline images.

TYPE_NONE

const integer SetaPDF_Extractor_ContentStreamCleaner::TYPE_NONE = 0

Content type constant representing no content type.

TYPE_OPERATOR

const integer SetaPDF_Extractor_ContentStreamCleaner::TYPE_OPERATOR = 2

Content type constant for operators.

TYPE_STRING

const integer SetaPDF_Extractor_ContentStreamCleaner::TYPE_STRING = 1

Content type constant for literal strings.
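The values 1, 2, 4 and 7 suggest that the TYPE_* constants can be combined as bit flags. The sketch below uses local copies of the documented values to show that arithmetic; whether the library accepts arbitrary combinations for $target or $ignore is an assumption, not something this page states.

```php
<?php
// Local copies of the documented constant values (not the library's class).
const TYPE_NONE = 0;
const TYPE_STRING = 1;
const TYPE_OPERATOR = 2;
const TYPE_INLINE_IMAGE = 4;
const TYPE_ALL = 7;

// OR-ing the three individual types yields the documented TYPE_ALL value.
$combined = TYPE_STRING | TYPE_OPERATOR | TYPE_INLINE_IMAGE;

// A hypothetical target mask selecting strings and operators only.
$target = TYPE_STRING | TYPE_OPERATOR;

// Bitwise AND tests whether a given type is part of the mask.
$includesOperators = ($target & TYPE_OPERATOR) !== 0;
$includesImages = ($target & TYPE_INLINE_IMAGE) !== 0;
```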


Static Methods

_strposa()

private static SetaPDF_Extractor_ContentStreamCleaner::_strposa (
string $haystack, array $needles [, int $offset = 0 ]
): bool|int

Searches the string for the closest occurrence of any of the given needles.

Returns false if none of the needles is found in the string.

Parameters
$haystack : string
 
$needles : array
 
$offset : int
 

clean()

public static SetaPDF_Extractor_ContentStreamCleaner::clean (
string|array $data, array $regexes [, int $target = self::TYPE_OPERATOR ]
): string

Cleans a content stream string by using regexes on the chosen targets.

The regexes will NOT affect literal string objects.

Parameters
$data : string|array
 
$regexes : array
 
$target : int
 

splitStream()

public static SetaPDF_Extractor_ContentStreamCleaner::splitStream (
string $string [, $ignore = self::TYPE_INLINE_IMAGE ]
): array

Splits a content stream string into literal strings, inline images and operators (everything that remains).

The pieces offer information about their type.

Parameters
$string : string
 
$ignore