setasign\SetaPDF2\Extractor\Sorter

BaselineSorter

A sorter class that sorts lines by comparing the baseline of text items.

File: /SetaPDF v2/Extractor/Sorter/BaselineSorter.php
Old class name (alias): \SetaPDF_Extractor_Sorter_Baseline

Properties

$_baselineThreshold

protected float BaselineSorter::$_baselineThreshold = 0.7

Threshold which keeps items on the same line.

$_matrix

A temporary matrix used in the sort process.


Methods

_getItemsByOrientation()

Groups items by their orientation.

Parameters
$textItems : \setasign\SetaPDF2\Extractor\TextItem[]
 
Return Values

An array keyed by orientation. Each entry holds the items of that orientation (grouped by their ordinate) and the associated orientation matrix.

_sortLinesVerticallyThenHorizontally()

Sorts lines vertically, then horizontally.

Parameters
$lines : array<string, array{items: array<string, \setasign\SetaPDF2\Extractor\TextItem[]>, matrix: \setasign\SetaPDF2\Core\Geometry\Matrix}>
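
The ordering named by this method can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the function name sortTopDownThenLeftRight() and the plain-array item shape (text, x, y) are hypothetical, and it assumes PDF user space, where a larger y value lies higher on the page.

```php
<?php
/**
 * Hypothetical helper, not part of SetaPDF: orders plain arrays top to bottom,
 * then left to right. Assumes a larger y means "higher on the page".
 *
 * @param array<int, array{text: string, x: float, y: float}> $items
 * @return array<int, array{text: string, x: float, y: float}>
 */
function sortTopDownThenLeftRight(array $items): array
{
    usort($items, static function (array $a, array $b): int {
        // Descending y first (operands swapped), ascending x as the tie-breaker.
        return [$b['y'], $a['x']] <=> [$a['y'], $b['x']];
    });

    return $items;
}

$sorted = sortTopDownThenLeftRight([
    ['text' => 'world', 'x' => 50.0, 'y' => 700.0],
    ['text' => 'Hello', 'x' => 10.0, 'y' => 700.0],
    ['text' => 'Title', 'x' => 10.0, 'y' => 720.0],
]);

echo implode(' ', array_column($sorted, 'text')), PHP_EOL; // prints "Title Hello world"
```

The real method works on the orientation-grouped structure described in its parameter type and takes the orientation matrix into account; the sketch ignores rotation entirely.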
 

getBaselineThreshold()

public BaselineSorter::getBaselineThreshold (
void
): float

Get the threshold which keeps items on the same line.

groupByLines()

Groups all text items by lines.

Parameters
$textItems : \setasign\SetaPDF2\Extractor\TextItem[]

The text items
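
As a rough illustration of grouping by baseline with a threshold, the following self-contained sketch may help. It does not use the library: groupByBaseline() and the plain-array item shape (text, x, baseline) are hypothetical. The source does not say in which unit the threshold is measured, so the sketch treats it as an absolute distance between baselines and borrows the documented default of 0.7 only as a sample value.

```php
<?php
/**
 * Hypothetical helper, not part of SetaPDF: puts items on the same line when
 * their baseline differs by at most $threshold from the line's first item.
 *
 * @param array<int, array{text: string, x: float, baseline: float}> $items
 * @return array<int, array<int, array{text: string, x: float, baseline: float}>>
 */
function groupByBaseline(array $items, float $threshold = 0.7): array
{
    // Walk the items from the top of the page down (larger baseline first).
    usort($items, static fn(array $a, array $b): int => $b['baseline'] <=> $a['baseline']);

    $lines = [];
    $lineBaseline = null;
    foreach ($items as $item) {
        // Start a new line when the baseline leaves the tolerance band.
        if ($lineBaseline === null || abs($lineBaseline - $item['baseline']) > $threshold) {
            $lines[] = [];
            $lineBaseline = $item['baseline'];
        }
        $lines[count($lines) - 1][] = $item;
    }

    // Order the items inside each line from left to right.
    foreach ($lines as &$line) {
        usort($line, static fn(array $a, array $b): int => $a['x'] <=> $b['x']);
    }
    unset($line);

    return $lines;
}

$lines = groupByBaseline([
    ['text' => 'Hello', 'x' => 10.0, 'baseline' => 700.0],
    ['text' => 'world', 'x' => 50.0, 'baseline' => 700.5],
    ['text' => 'Next',  'x' => 10.0, 'baseline' => 680.0],
]);

foreach ($lines as $line) {
    echo implode(' ', array_column($line, 'text')), PHP_EOL;
}
```

With these sample values the first two items share a line because their baselines differ by 0.5, which is within the tolerance; a smaller threshold such as 0.3 would split them into separate lines.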

horizontallyThenVertically()

itemsJoining()

Checks whether two items join each other.

Parameters
$a : \setasign\SetaPDF2\Extractor\TextItem

Item A.

$b : \setasign\SetaPDF2\Extractor\TextItem

Item B.

$spaceWidthFactor : float

The space width factor.

Exceptions

Throws \setasign\SetaPDF2\Core\Exception
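
One plausible reading of a space width factor is sketched below, purely as an assumption: two items are treated as joining when the horizontal gap between them is smaller than the width of a space multiplied by the factor. The function itemsLookJoined() and the array keys x, width and spaceWidth are hypothetical and are not taken from the library, whose actual criterion is not described here.

```php
<?php
/**
 * Hypothetical check, not SetaPDF's implementation: item B joins item A when
 * the gap between A's right edge and B's left edge is smaller than
 * A's space width scaled by $spaceWidthFactor.
 *
 * @param array{x: float, width: float, spaceWidth: float} $a
 * @param array{x: float, width: float, spaceWidth: float} $b
 */
function itemsLookJoined(array $a, array $b, float $spaceWidthFactor): bool
{
    // A negative gap means the items overlap, which also counts as joining.
    $gap = $b['x'] - ($a['x'] + $a['width']);

    return $gap < $a['spaceWidth'] * $spaceWidthFactor;
}

$a     = ['x' => 0.0,  'width' => 10.0, 'spaceWidth' => 2.0];
$close = ['x' => 10.5, 'width' => 8.0,  'spaceWidth' => 2.0];
$far   = ['x' => 13.0, 'width' => 8.0,  'spaceWidth' => 2.0];

var_dump(itemsLookJoined($a, $close, 0.5)); // bool(true):  gap 0.5 < 1.0
var_dump(itemsLookJoined($a, $far, 0.5));   // bool(false): gap 3.0 >= 1.0
```

Under this assumption, a larger factor tolerates wider gaps before two items are considered separate words.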

setBaselineThreshold()

public BaselineSorter::setBaselineThreshold (
float $threshold
): void

Set the threshold which keeps items on the same line.

Parameters
$threshold : float
 

verticallyThenHorizontally()