SetaPDF_Extractor_Sorter_Baseline
A sorter class that sorts lines by comparing the baselines of text items.
File: /SetaPDF v2/Extractor/Sorter/Baseline.php
Class hierarchy
SetaPDF_Extractor_Sorter
  SetaPDF_Extractor_Sorter_Baseline
Properties
$_matrix
protected null|SetaPDF_Core_Geometry_Matrix SetaPDF_Extractor_Sorter::$_matrix
A temporary matrix used in the sort process.
Methods
_getItemsByOrientation()
protected SetaPDF_Extractor_Sorter::_getItemsByOrientation (
SetaPDF_Extractor_TextItem[] $textItems
): array<string, array{items: array<string, SetaPDF_Extractor_TextItem[]>, matrix: SetaPDF_Core_Geometry_Matrix}>
Groups items by their orientation.
Parameters
- $textItems : SetaPDF_Extractor_TextItem[]
Return Values
An array keyed by orientation; each entry holds that orientation's items (grouped by their ordinate) and the associated orientation matrix.
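The return shape above can be illustrated with a minimal sketch in plain PHP. It assumes that an item's orientation is the direction of its baseline and that its ordinate is the baseline's y coordinate after rotating that direction back to horizontal. The function name groupByOrientation, the array keys start/end/text and the plain [a, b, c, d, e, f] array standing in for SetaPDF_Core_Geometry_Matrix are all hypothetical; this is not SetaPDF's implementation.

```php
<?php
/**
 * Hypothetical sketch: groups text items (plain arrays holding the start and end
 * points of their baseline) by orientation, then by ordinate within that orientation.
 */
function groupByOrientation(array $items): array
{
    $groups = [];
    foreach ($items as $item) {
        [$x1, $y1] = $item['start'];
        [$x2, $y2] = $item['end'];
        // Orientation: direction of the baseline, rounded to whole degrees.
        $angle = (int) round(rad2deg(atan2($y2 - $y1, $x2 - $x1)));
        $rad = deg2rad($angle);
        $key = (string) $angle;
        if (!isset($groups[$key])) {
            // PDF-style [a b c d e f] matrix rotating by -angle, so the lines become horizontal.
            $groups[$key] = [
                'items'  => [],
                'matrix' => [cos($rad), -sin($rad), sin($rad), cos($rad), 0.0, 0.0],
            ];
        }
        // Ordinate: y' = b*x + d*y + f of the baseline start under that matrix.
        $ordinate = -sin($rad) * $x1 + cos($rad) * $y1;
        $groups[$key]['items'][sprintf('%.1f', $ordinate)][] = $item;
    }
    return $groups;
}

$items = [
    ['text' => 'Hello', 'start' => [50, 700], 'end' => [80, 700]],
    ['text' => 'World', 'start' => [85, 700], 'end' => [120, 700]],
    ['text' => 'Next',  'start' => [50, 680], 'end' => [75, 680]],
    ['text' => 'Up',    'start' => [300, 100], 'end' => [300, 120]], // vertical text
];
$groups = groupByOrientation($items);
print_r(array_keys($groups));               // orientations 0 and 90
print_r(array_keys($groups['0']['items'])); // ordinates "700.0" and "680.0"
```

Rounding the angle and the ordinate is one simple way to make nearly identical values share a key; the precision used here is an arbitrary choice for the example.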
_sortLinesVerticallyThenHorizontally()
protected SetaPDF_Extractor_Sorter::_sortLinesVerticallyThenHorizontally (
array<string, array{items: array<string, SetaPDF_Extractor_TextItem[]>, matrix: SetaPDF_Core_Geometry_Matrix}> $lines
): array<int, SetaPDF_Extractor_TextItem>
Sorts lines vertically, then horizontally.
Parameters
- $lines : array<string, array{items: array<string, SetaPDF_Extractor_TextItem[]>, matrix: SetaPDF_Core_Geometry_Matrix}>
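A minimal sketch of how such grouped lines could be flattened into a single list, assuming that "vertically" means top to bottom (PDF user space has its y axis pointing upwards, so the largest ordinate is the top line) and "horizontally" means left to right by x. The function name and the item keys text and x are hypothetical; the input mirrors the documented $lines shape with plain arrays and omits the matrix entry, which the sketch does not need.

```php
<?php
/**
 * Hypothetical sketch: flattens lines grouped as orientation => ['items' => ordinate => items[]]
 * into one list, ordering lines top to bottom and items within a line left to right.
 */
function sortLinesVerticallyThenHorizontally(array $lines): array
{
    $result = [];
    foreach ($lines as $orientation) {
        $byOrdinate = $orientation['items'];
        // The largest ordinate is the topmost line, so sort the keys numerically, descending.
        krsort($byOrdinate, SORT_NUMERIC);
        foreach ($byOrdinate as $lineItems) {
            usort($lineItems, fn(array $a, array $b): int => $a['x'] <=> $b['x']);
            foreach ($lineItems as $item) {
                $result[] = $item;
            }
        }
    }
    return $result;
}

$lines = [
    '0' => ['items' => [
        '680.0' => [['text' => 'line', 'x' => 80], ['text' => 'Second', 'x' => 50]],
        '700.0' => [['text' => 'World', 'x' => 85], ['text' => 'Hello', 'x' => 50]],
    ]],
];
$sorted = sortLinesVerticallyThenHorizontally($lines);
echo implode(' ', array_column($sorted, 'text')); // Hello World Second line
```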
groupByLines()
public SetaPDF_Extractor_Sorter_Baseline::groupByLines (
SetaPDF_Extractor_TextItem[] $textItems
): array
Groups all text items into lines.
Parameters
- $textItems : SetaPDF_Extractor_TextItem[]
The text items.
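The idea of grouping by lines can be sketched as follows, assuming horizontal text and a fixed tolerance: items are visited from the top of the page downwards, and an item starts a new line only when its baseline differs from the current line's first baseline by more than the tolerance. The function, its $threshold default of 1.0 and the item keys text and y are hypothetical and not taken from SetaPDF.

```php
<?php
/**
 * Hypothetical sketch: groups horizontal text items into lines. An item joins the current
 * line if its baseline y lies within $threshold of that line's first baseline.
 */
function groupByLines(array $items, float $threshold = 1.0): array
{
    // Top of the page first (PDF user space y grows upwards).
    usort($items, fn(array $a, array $b): int => $b['y'] <=> $a['y']);
    $lines = [];
    $lineY = null;
    foreach ($items as $item) {
        if ($lineY === null || abs($item['y'] - $lineY) > $threshold) {
            $lines[] = [];          // start a new line
            $lineY = $item['y'];
        }
        $lines[count($lines) - 1][] = $item;
    }
    return $lines;
}

$items = [
    ['text' => 'Second', 'y' => 680.0],
    ['text' => 'Hello',  'y' => 700.0],
    ['text' => 'World',  'y' => 699.6], // baseline slightly off, still the same line
];
$lines = groupByLines($items);
echo count($lines); // 2
```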
horizontallyThenVertically()
A sort callback that sorts first horizontally, then vertically.
Parameters
Exceptions
Throws SetaPDF_Core_Exception
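The parameters of this callback are not listed here. A minimal sketch, assuming a usort()-style comparison callback that receives two items and returns a negative, zero or positive integer: items are ordered by x first and by vertical position (top to bottom) only when their x positions are equal. The item arrays with text, x and y keys are hypothetical stand-ins for SetaPDF_Extractor_TextItem.

```php
<?php
/**
 * Hypothetical usort() callback: orders items left to right by x and only falls back
 * to the vertical position (top to bottom) when two items start at the same x.
 */
function horizontallyThenVertically(array $a, array $b): int
{
    $byX = $a['x'] <=> $b['x'];
    // PDF user space y grows upwards, so the larger y is the higher item and comes first.
    return $byX !== 0 ? $byX : $b['y'] <=> $a['y'];
}

$items = [
    ['text' => 'B1', 'x' => 300.0, 'y' => 700.0],
    ['text' => 'A2', 'x' => 50.0,  'y' => 680.0],
    ['text' => 'A1', 'x' => 50.0,  'y' => 700.0],
];
usort($items, 'horizontallyThenVertically');
echo implode(' ', array_column($items, 'text')); // A1 A2 B1
```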
isOnSameLine()
public SetaPDF_Extractor_Sorter_Baseline::isOnSameLine (
SetaPDF_Extractor_TextItem $a, SetaPDF_Extractor_TextItem $b [, SetaPDF_Core_Geometry_Matrix $matrix = null ]
): bool
Checks whether two items are on the same line.
Parameters
- $a : SetaPDF_Extractor_TextItem
- $b : SetaPDF_Extractor_TextItem
- $matrix : SetaPDF_Core_Geometry_Matrix
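A minimal sketch of a same-line check with an optional matrix, assuming that the matrix maps both items into an orientation-aligned space (like the orientation matrix returned by _getItemsByOrientation()) before their baselines are compared within a tolerance. Items are reduced to hypothetical [x, y] baseline origins, the matrix is a plain PDF-style [a, b, c, d, e, f] array, and the $threshold default of 1.0 is an assumption.

```php
<?php
/**
 * Applies a PDF-style [a, b, c, d, e, f] matrix to a point:
 * x' = a*x + c*y + e, y' = b*x + d*y + f.
 */
function applyMatrix(array $m, float $x, float $y): array
{
    return [$m[0] * $x + $m[2] * $y + $m[4], $m[1] * $x + $m[3] * $y + $m[5]];
}

/**
 * Hypothetical same-line check: optionally transforms both baseline origins,
 * then compares their y coordinates within $threshold.
 */
function isOnSameLine(array $a, array $b, ?array $matrix = null, float $threshold = 1.0): bool
{
    [$ax, $ay] = $a;
    [$bx, $by] = $b;
    if ($matrix !== null) {
        [$ax, $ay] = applyMatrix($matrix, $ax, $ay);
        [$bx, $by] = applyMatrix($matrix, $bx, $by);
    }
    return abs($ay - $by) <= $threshold;
}

// Two items of vertical text (baseline direction 90 degrees) at the same x.
$a = [300.0, 100.0];
$b = [300.0, 150.0];
$rotateBack = [0.0, -1.0, 1.0, 0.0, 0.0, 0.0]; // rotation by -90 degrees
var_dump(isOnSameLine($a, $b));              // bool(false)
var_dump(isOnSameLine($a, $b, $rotateBack)); // bool(true)
```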
itemsJoining()
Checks whether two items adjoin each other.
Parameters
- $a : SetaPDF_Extractor_TextItem
Item A.
- $b : SetaPDF_Extractor_TextItem
Item B.
- $spaceWidthFactor : float
The space width factor.
Exceptions
Throws SetaPDF_Core_Exception
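How $spaceWidthFactor is applied is not described here. A minimal sketch, assuming it scales the width of a space character to give the largest horizontal gap at which two items still count as joined (for example, fragments of one word). The item keys startX, endX and spaceWidth and the default factor of 0.5 are hypothetical.

```php
<?php
/**
 * Hypothetical adjacency test: item B joins item A when the horizontal gap between
 * A's end and B's start is smaller than A's space width times $spaceWidthFactor.
 */
function itemsJoining(array $a, array $b, float $spaceWidthFactor = 0.5): bool
{
    $gap = $b['startX'] - $a['endX'];
    return $gap < $a['spaceWidth'] * $spaceWidthFactor;
}

$hel = ['startX' => 50.0, 'endX' => 60.0, 'spaceWidth' => 3.0];
$lo  = ['startX' => 60.2, 'endX' => 66.0, 'spaceWidth' => 3.0]; // gap of about 0.2
$you = ['startX' => 69.0, 'endX' => 80.0, 'spaceWidth' => 3.0]; // gap of 3.0
var_dump(itemsJoining($hel, $lo)); // bool(true): 0.2 < 1.5, same word
var_dump(itemsJoining($lo, $you)); // bool(false): 3.0 >= 1.5, separated by a space
```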
setBaselineThreshold()
Sets the threshold that determines whether items are kept on the same line.
Parameters
- $threshold : float
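The effect of such a threshold can be sketched with a small hypothetical class, assuming the threshold is the maximum baseline distance at which two items still share a line. The class name BaselineThresholdDemo and its default of 1.0 are illustrative assumptions, not SetaPDF's values.

```php
<?php
/**
 * Hypothetical sorter showing how a baseline threshold changes the same-line decision.
 */
final class BaselineThresholdDemo
{
    private float $baselineThreshold = 1.0; // assumed default for this sketch

    public function setBaselineThreshold(float $threshold): void
    {
        $this->baselineThreshold = $threshold;
    }

    public function isOnSameLine(float $baselineA, float $baselineB): bool
    {
        return abs($baselineA - $baselineB) <= $this->baselineThreshold;
    }
}

$sorter = new BaselineThresholdDemo();
// A superscript whose baseline sits 3 units above the surrounding text.
var_dump($sorter->isOnSameLine(700.0, 703.0)); // bool(false) with threshold 1.0
$sorter->setBaselineThreshold(4.0);
var_dump($sorter->isOnSameLine(700.0, 703.0)); // bool(true) with threshold 4.0
```

A larger threshold keeps raised or lowered text such as superscripts on its line, at the cost of possibly merging tightly spaced separate lines.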
verticallyThenHorizontally()
A sort callback that sorts first vertically, then horizontally.
Parameters
Exceptions
Throws SetaPDF_Core_Exception
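A minimal sketch of a vertical-then-horizontal callback, assuming a usort()-style comparator in which items whose baselines differ by no more than a tolerance are treated as one line and ordered left to right; otherwise the higher baseline comes first. The item keys and the $threshold default of 1.0 are hypothetical.

```php
<?php
/**
 * Hypothetical usort() callback: top to bottom first; items whose baselines differ by
 * no more than $threshold count as one line and are ordered left to right.
 */
function verticallyThenHorizontally(array $a, array $b, float $threshold = 1.0): int
{
    if (abs($a['y'] - $b['y']) > $threshold) {
        return $b['y'] <=> $a['y']; // higher baseline (larger y) first
    }
    return $a['x'] <=> $b['x'];
}

$items = [
    ['text' => 'line',   'x' => 80, 'y' => 680.0],
    ['text' => 'World',  'x' => 85, 'y' => 699.6],
    ['text' => 'Hello',  'x' => 50, 'y' => 700.0],
    ['text' => 'Second', 'x' => 50, 'y' => 680.2],
];
usort($items, 'verticallyThenHorizontally');
echo implode(' ', array_column($items, 'text')); // Hello World Second line
```

A tolerance-based comparator is not strictly transitive (a chain of baselines each within the tolerance of the next can span more than the tolerance), so for robust results it can help to group items into lines first and sort within each line.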