Glyph Strategy Extracts Glyphs and Metrics
Introduction
The glyph strategy allows you to extract individual glyphs from PDF documents. It is represented by the class SetaPDF_Extractor_Strategy_Glyph.
The result is an instance of SetaPDF_Extractor_Result_Collection. Each glyph in the collection is represented by an instance of SetaPDF_Extractor_Result_Glyph.
Process
This strategy extracts every glyph, together with its metrics, in the order in which it appears in the PDF data stream. The result is NOT sorted.
The result can be used for further processing by another strategy or for text analysis.
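Because the glyphs arrive in stream order, a consumer that needs reading order has to sort them itself. The following is a minimal, self-contained sketch of that idea. It works on plain arrays rather than library objects: the `char`, `x` and `y` keys and the `sortGlyphsInReadingOrder()` function are hypothetical and stand in for values you would read from each glyph and its bounds. It also assumes an unrotated page in default PDF user space, where a larger y value is higher on the page.

```php
<?php
// Hypothetical stand-in records, deliberately out of reading order.
// In real code these values would come from the extracted glyph objects.
$glyphs = [
    ['char' => 'b', 'x' => 20.0, 'y' => 700.0],
    ['char' => 'c', 'x' => 10.0, 'y' => 680.0],
    ['char' => 'a', 'x' => 10.0, 'y' => 700.0],
];

/**
 * Sort glyph records top to bottom, then left to right.
 *
 * @param array $glyphs List of arrays with the keys 'char', 'x' and 'y'.
 * @return array The same records in reading order.
 */
function sortGlyphsInReadingOrder(array $glyphs): array
{
    usort($glyphs, function (array $a, array $b): int {
        // Higher y first (top of the page), because y grows upwards.
        if ($a['y'] !== $b['y']) {
            return $b['y'] <=> $a['y'];
        }
        // Same line: smaller x first (left to right).
        return $a['x'] <=> $b['x'];
    });

    return $glyphs;
}

$sorted = sortGlyphsInReadingOrder($glyphs);
echo implode('', array_column($sorted, 'char')), PHP_EOL; // prints "abc"
```

The sketch compares y values exactly, which is enough to show the principle. Real documents usually need glyphs grouped into lines with some tolerance, because baselines on the same line rarely match to the last decimal.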
Usage
An instance has to be created individually and passed to the main class:
$glyphStrategy = new \SetaPDF_Extractor_Strategy_Glyph();
$extractor = new \SetaPDF_Extractor($document);
$extractor->setStrategy($glyphStrategy);
You can get the result of this strategy by calling the getResultByPageNumber() method for each individual page. Each glyph is represented by an instance of SetaPDF_Extractor_Result_Glyph, which implements both the SetaPDF_Extractor_Result_CompareableInterface and SetaPDF_Extractor_Result_HasBoundsInterface interfaces.
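A page-by-page loop could look like the sketch below. It is an illustration under several assumptions, not a verified listing: the Composer autoloader path and the file name document.pdf are placeholders, and the calls loadByFilename(), getCatalog()->getPages()->count() and getString() are used as recalled from the SetaPDF API and should be checked against the API reference for your version. The sketch also assumes that the result collection can be iterated with foreach.

```php
<?php
// Assumption: SetaPDF is installed via Composer; adjust the path otherwise.
require_once 'vendor/autoload.php';

// 'document.pdf' is a placeholder file name.
$document = \SetaPDF_Core_Document::loadByFilename('document.pdf');

$extractor = new \SetaPDF_Extractor($document);
$extractor->setStrategy(new \SetaPDF_Extractor_Strategy_Glyph());

// Assumption: the page count is available through the document catalog.
$pageCount = $document->getCatalog()->getPages()->count();

for ($pageNo = 1; $pageNo <= $pageCount; $pageNo++) {
    // One result collection per page, in stream order (not sorted).
    $glyphs = $extractor->getResultByPageNumber($pageNo);

    foreach ($glyphs as $glyph) {
        // Assumption: getString() returns the text of the glyph.
        echo $glyph->getString();
    }

    echo PHP_EOL;
}
```

Since the glyphs are not sorted, the printed text follows the order of the PDF data stream and may differ from the visual reading order.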
The strategy allows you to pass a filter instance to limit the result, e.g., to a specific area on a page.
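As a rough sketch, limiting the extraction to a rectangular area might look like this. The class names SetaPDF_Extractor_Filter_Rectangle and SetaPDF_Core_Geometry_Rectangle, the MODE_CONTAINS constant and the setFilter() method are assumptions based on the SetaPDF API as recalled here, not something this page confirms, so verify them against the API reference. The coordinates are arbitrary example values in PDF user space units.

```php
<?php
// Assumption: SetaPDF is installed via Composer; adjust the path otherwise.
require_once 'vendor/autoload.php';

// Assumption: the rectangle is given as lower-left x/y and upper-right x/y.
$area = new \SetaPDF_Core_Geometry_Rectangle(40, 700, 300, 760);

// Assumption: MODE_CONTAINS keeps only items that lie completely inside the area.
$filter = new \SetaPDF_Extractor_Filter_Rectangle(
    $area,
    \SetaPDF_Extractor_Filter_Rectangle::MODE_CONTAINS
);

$glyphStrategy = new \SetaPDF_Extractor_Strategy_Glyph();
// Assumption: the filter is attached to the strategy via setFilter().
$glyphStrategy->setFilter($filter);
```

The strategy is then passed to the extractor as usual, and getResultByPageNumber() should return only the glyphs that match the filter.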