SetaPDF-Extractor

SetaPDF-Extractor Manual

Getting Started

SetaPDF-Extractor Component Introduction and Index

The SetaPDF-Extractor component allows PHP developers to extract textual content from existing PDF documents. Besides plain text, it can also extract glyphs, words, or word groups together with their positions and bounding boxes.

Like all SetaPDF components, it is based on the Core component. This manual describes only the high-level functionality specific to this component and may refer to the manual of the Core component for details.

  1. Getting Started
    1. System Requirements and Installation
    2. Limitations
    3. Loading the Component
    4. Error Handling
  2. The Main Class
    1. Introduction
    2. Get an Instance
    3. Setting a Strategy
    4. Getting Results
    5. Process a Result Several Times
  3. Strategies
    1. Overview
    2. Result Types
      1. Words Result
    3. Encoding
    4. Sorters
    5. Plain Text Strategy
      1. Introduction
      2. Process
      3. Usage
    6. Exact Plain Text Strategy
      1. Introduction
      2. Process
      3. Usage
    7. Glyph Strategy
      1. Introduction
      2. Process
      3. Usage
    8. Word Strategy
      1. Introduction
      2. Process
      3. Usage
    9. Word Group Strategy
      1. Introduction
      2. Process
      3. Usage
  4. Filter
    1. Introduction
    2. Predefined Filter Classes
      1. Rectangle
      2. Font Size
      3. Multi
    3. Individual Filter
    4. Filter Text Items Several Times
  5. Hints
    1. Individual Glyph Names
  6. Migrating
    1. From Version 2.46 to >=2.47 (Namespaces)
  7. API Reference
© 2025 Setasign GmbH & Co. KG