Tagged PDF

Introduction

As of version 2.49 the SetaPDF-Merger component allows you to handle tagged PDF files.

Tagged PDF files, also known as "accessible PDF files" or "PDF files with tags", include structural information that improves their accessibility for people with disabilities. While the PDF specification itself defines "Logical Structure" and "Tagged PDF", separate standards evolved to define how to create accessible PDF documents in the real world. These standards are known as ISO 14289-1 / PDF/UA-1 and ISO 14289-2 / PDF/UA-2 (PDF 2.0). As the SetaPDF-Merger component is not a converter, the source PDF documents need to have been created in conformance with the expected standard already.

Handling PDF files with tag structures can be a performance-intensive task, as these kinds of structures may consist of several thousand objects. Because of this, we suggest not processing foreign PDF files if you want to keep their tag structures.

We also suggest using compressed cross-reference streams for these kinds of structures to reduce the output size.

Enable Handling of Tags

The handling of tags can be enabled by calling the following method on the merger instance:

Description
public \setasign\SetaPDF2\Merger\Merger::setHandleTags (
bool $handleTags = true,
?string $subTag = 'Part'
): void

Sets whether tag structures should be handled during the merge process.

Parameters
$handleTags : bool
 
$subTag : ?string

The default sub-tag name for each addDocument()/addFile() call. If null, all tags are put on the same level.

Exceptions

Throws \setasign\SetaPDF2\Merger\Exception

Then you can simply add the documents or PDF files through the addDocument() or addFile() methods and finally call merge() to create a well-tagged PDF document.

Please note that merging pages of the same document instance several times is not possible if tags are handled. Also, the document instances should not be reused after the whole process, as the structure tree may have changed.
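As a minimal sketch of the $subTag parameter and the documented exception, the following script passes a custom sub-tag name and catches \setasign\SetaPDF2\Merger\Exception. The sub-tag name 'Sect', the input file names and the output file name are assumptions made for illustration only. Whether 'Sect' is an appropriate structure type for your documents, and under which conditions the exception is actually thrown, is not covered here.

```php
<?php

use setasign\SetaPDF2\Core\Writer\HttpWriter;
use setasign\SetaPDF2\Merger\Exception as MergerException;
use setasign\SetaPDF2\Merger\Merger;

require_once('library/SetaPDF/Autoload.php');

$merger = new Merger();

try {
    // enable handling of tags; 'Sect' is an illustrative sub-tag
    // name (assumption), the default would be 'Part'
    $merger->setHandleTags(true, 'Sect');

    // hypothetical input files; each file is added exactly once
    $merger->addFile('files/first-tagged.pdf');
    $merger->addFile('files/second-tagged.pdf');

    $merger->merge();
} catch (MergerException $e) {
    // only the merger exception is caught here,
    // any other exception type will still propagate
    error_log('Merging tagged PDFs failed: ' . $e->getMessage());
    exit(1);
}

// get the resulting document instance and send it to the client
$document = $merger->getDocument();
$document->setWriter(new HttpWriter('tagged-sect.pdf', true));
$document->save()->finish();
```

Each source is added only once in this sketch, which is in line with the restriction on merging the same document instance several times.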

Examples

Merge Simple Tagged PDFs

The following script simply merges two tagged PDF documents. The new tag structure will create a sub-tag (Part by default) for each merged document:

PHP
<?php

use setasign\SetaPDF2\Core\Document\ObjectStreamCompressor;
use setasign\SetaPDF2\Core\Writer\HttpWriter;
use setasign\SetaPDF2\Merger\Merger;

require_once('library/SetaPDF/Autoload.php');

$merger = new Merger();

// enable handling of tags
$merger->setHandleTags();

$merger->addFile('files/pdfs/camtown/Cover.pdf');
$merger->addFile('files/pdfs/camtown/Terms-and-Conditions - Tagged.pdf');

$merger->merge();

// get the resulting document instance
$document = $merger->getDocument();

// as tag structures can be very large, we should compress the document structure
$compressor = new ObjectStreamCompressor($document);
$compressor->register();

// add a writer
$document->setWriter(new HttpWriter('tagged-1.pdf', true));
// save and finish
$document->save()->finish();

Merge PDF/UA-1 PDF files

The following script merges two PDF/UA-1 conforming PDF files, keeps all tags on the same level and updates the result to PDF/UA-1, too:

PHP
<?php

use setasign\SetaPDF2\Core\Document\ObjectStreamCompressor;
use setasign\SetaPDF2\Core\Writer\HttpWriter;
use setasign\SetaPDF2\Core\Xmp\PdfUa;
use setasign\SetaPDF2\Merger\Merger;

require_once('library/SetaPDF/Autoload.php');

$merger = new Merger();

// enable handling of tags and do not create sub-tags per
// file but keep all tags on the same level
$merger->setHandleTags(true, null);

$merger->addFile('files/pdfs/camtown/Cover-PDF-UA1.pdf');
$merger->addFile('files/pdfs/camtown/Terms-and-Conditions - PDF-UA1.pdf');

$merger->merge();

// get the resulting document instance
$document = $merger->getDocument();

// now let's add some information required by PDF/UA-1:
$document->getInfo()->setTitle('The Cover + Terms & Conditions');
$document->getCatalog()->setLang('en');

// now update the XMP metadata for PDF/UA-1
PdfUa::update($document, 1);

// as tag structures can be very large, we should compress the document structure
$compressor = new ObjectStreamCompressor($document);
$compressor->register();

// add a writer
$document->setWriter(new HttpWriter('pdf-ua1.pdf', true));
// save and finish
$document->save()->finish();

Split PDF/UA-2 PDF files

The following script extracts a single page of a PDF/UA-2 conforming document and results in a PDF/UA-2 conforming PDF.

The document we use for demonstration purposes is an example file from the LaTeX Project.

PHP
<?php

use setasign\SetaPDF2\Core\Document;
use setasign\SetaPDF2\Core\Document\ObjectStreamCompressor;
use setasign\SetaPDF2\Core\Writer\HttpWriter;
use setasign\SetaPDF2\Core\Xmp\PdfUa;
use setasign\SetaPDF2\Merger\Merger;

require_once('library/SetaPDF/Autoload.php');

$merger = new Merger();

// enable handling of tags and do not create sub-tags per
// file but keep all tags on the same level
$merger->setHandleTags(true, null);

$inDocument = Document::loadByFilename('files/pdfs/misc/tagged/pdfa-art.pdf');

// we only want to extract page 3
$merger->addDocument($inDocument, 3);

$merger->merge();

// get the resulting document instance
$document = $merger->getDocument();

// now let's add some information required by PDF/UA-2:
$document->getInfo()->setTitle('Page 3 of ' . $inDocument->getInfo()->getTitle());
$document->getCatalog()->setLang($inDocument->getCatalog()->getLang());

// now update the XMP metadata for PDF/UA-2
PdfUa::update($document, 2);

// as tag structures can be very large, we should compress the document structure
$compressor = new ObjectStreamCompressor($document);
$compressor->register();

// add a writer
$document->setWriter(new HttpWriter('pdf-ua2.pdf', true));
// save and finish
$document->save()->finish();