SetaPDF_Extractor_ContentStreamCleaner

Helper class to clean up content streams.

File: /SetaPDF v2/Extractor/ContentStreamCleaner.php

Constants

REGEX_COLORS

public const string SetaPDF_Extractor_ContentStreamCleaner::REGEX_COLORS = '/(?<=[}\\]\\x00\\x09\\x0A\\x0C\\x0D\\x20]|^)([\\d\\.\\-]+[\\x00\\x09\\x0A\\x0C\\x0D\\x20]+){1,4}(k|K|SC|sc|SCN|scn|rg|RG|g|G)(?=[\\x00\\x09\\x0A\\x0C\\x0D\\x20{\\[\\/]|$)/S'

Constant defining a regex for color operators.
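
As a rough illustration of what this pattern matches, the sketch below applies it with plain `preg_replace()`. The pattern is copied from the constant above so the example runs without the library; the sample content stream and the variable names are assumptions made for illustration only.

```php
<?php
// Pattern copied verbatim from the REGEX_COLORS constant, so that this
// sketch does not depend on the SetaPDF library being installed.
$regexColors = '/(?<=[}\\]\\x00\\x09\\x0A\\x0C\\x0D\\x20]|^)([\\d\\.\\-]+[\\x00\\x09\\x0A\\x0C\\x0D\\x20]+){1,4}(k|K|SC|sc|SCN|scn|rg|RG|g|G)(?=[\\x00\\x09\\x0A\\x0C\\x0D\\x20{\\[\\/]|$)/S';

// Hypothetical content stream fragment: set an RGB fill color ("1 0 0 rg"),
// then define and fill a rectangle.
$stream = "1 0 0 rg 10 10 100 50 re f";

// Replacing every match with an empty string removes the color operator
// together with its numeric operands. The path operators stay untouched.
$withoutColors = preg_replace($regexColors, '', $stream);

// $withoutColors is now " 10 10 100 50 re f" (note the leading space).
echo $withoutColors, PHP_EOL;
```

The pattern requires one to four numeric operands in front of the operator, so a bare operator name without operands is not matched.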

REGEX_PATHOPERATORS

public const string SetaPDF_Extractor_ContentStreamCleaner::REGEX_PATHOPERATORS = '/(?<=[}\\]\\x00\\x09\\x0A\\x0C\\x0D\\x20]|^)([\\d\\.\\-]+[\\x00\\x09\\x0A\\x0C\\x0D\\x20]+){0,6}(m|l|c|v|y|re|h|S|s|f|F|f\\*|B|B\\*|b|b\\*|n|W|W\\*)(?=[\\x00\\x09\\x0A\\x0C\\x0D\\x20{\\[\\/]|$)/S'

Constant defining a regex for path operators.
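
To show which tokens this pattern picks up, the following sketch lists the matches with plain `preg_match_all()`. The pattern is copied from the constant above; the sample path data and the variable names are hypothetical.

```php
<?php
// Pattern copied verbatim from the REGEX_PATHOPERATORS constant, so that
// this sketch does not depend on the SetaPDF library being installed.
$regexPath = '/(?<=[}\\]\\x00\\x09\\x0A\\x0C\\x0D\\x20]|^)([\\d\\.\\-]+[\\x00\\x09\\x0A\\x0C\\x0D\\x20]+){0,6}(m|l|c|v|y|re|h|S|s|f|F|f\\*|B|B\\*|b|b\\*|n|W|W\\*)(?=[\\x00\\x09\\x0A\\x0C\\x0D\\x20{\\[\\/]|$)/S';

// Hypothetical fragment: move to a point, draw a line, stroke the path.
$stream = "0 0 m 100 0 l S";

// Collect every full match (operands plus operator).
preg_match_all($regexPath, $stream, $matches);
$found = $matches[0];

// $found contains "0 0 m", "100 0 l" and "S".
print_r($found);
```

Unlike the color pattern, this one accepts zero to six operands, which is why the operand-less `S` is matched as well.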

TYPE_ALL

Constant defining all content types.

TYPE_INLINE_IMAGE

Constant defining the content type for inline images.

TYPE_NONE

Constant defining no content type.

TYPE_OPERATOR

Constant defining the content type for operators.

TYPE_STRING

Constant defining the content type for literal strings.


Static Methods

_strposa()

private static SetaPDF_Extractor_ContentStreamCleaner::_strposa (
string $haystack, array $needles [, int $offset = 0 ]
): bool|int

Searches the string for the closest occurrence of any of the given needles.

Returns false if none of the needles occurs in the string.

Parameters
$haystack : string
 
$needles : array
 
$offset : int
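
Because the method is private, it cannot be called from outside the class. The following is a minimal sketch of the described behaviour, not the library's implementation: `closestNeedlePosition()` is a hypothetical stand-in, and it assumes that "closest" means the lowest position at or after `$offset`.

```php
<?php
/**
 * Hypothetical stand-in for the private _strposa() helper.
 *
 * Returns the lowest position at which any of the needles occurs at or
 * after $offset, or false if none of them occurs.
 */
function closestNeedlePosition(string $haystack, array $needles, int $offset = 0)
{
    $closest = false;
    foreach ($needles as $needle) {
        $position = strpos($haystack, $needle, $offset);
        // Keep the smallest position found so far.
        if ($position !== false && ($closest === false || $position < $closest)) {
            $closest = $position;
        }
    }

    return $closest;
}

// Hypothetical content stream containing a literal string and an inline image.
$stream = "BT (Hello) Tj ET BI /W 1 ID x EI";

$first   = closestNeedlePosition($stream, ['BI', '(']);     // 3, the "("
$later   = closestNeedlePosition($stream, ['BI', '('], 10); // 17, the "BI"
$missing = closestNeedlePosition($stream, ['<<']);          // false

var_dump($first, $later, $missing);
```

As with `strpos()`, the result has to be compared with `=== false`, because a match at position 0 is a valid result.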
 

clean()

public static SetaPDF_Extractor_ContentStreamCleaner::clean (
string|array $data, array $regexes [, int $target = SetaPDF_Extractor_ContentStreamCleaner::TYPE_OPERATOR ]
): string

Cleans a content stream string by applying the given regexes to the chosen targets.

The regexes will NOT affect literal string objects.

Parameters
$data : string|array
 
$regexes : array
 
$target : int
 

splitStream()

Splits a content stream string into literal strings, inline images and operators (everything that remains).

Each piece carries information about its type.

Parameters
$string : string
 
$ignore : int