Tagged PDFs

Handling of tagged PDF form fields

Introduction

The SetaPDF-FormFiller component supports handling of tagged PDFs as of version 2.51.

Tagged PDF files, also known as "accessible PDF files" or "PDF files with tags", include structural information that improves their accessibility for people with disabilities. While the PDF specification itself defines "Logical Structures" and "Tagged PDF", separate standards evolved to define how to create accessible PDF documents in practice. These standards are ISO 14289-1 (PDF/UA-1) and ISO 14289-2 (PDF/UA-2, for PDF 2.0). The original PDF document needs to conform to one of these standards if you want the SetaPDF-FormFiller component to process it at that level.

If a PDF form is only filled by the SetaPDF-FormFiller component, neither the structure tree nor anything else related to the tagging structure is touched or modified. This happens only when a form field is flattened or deleted.

Flatten Tagged Form Fields

If a PDF form field is flattened into the page's content stream, the appearance of its widget annotation is transformed into an isolated form XObject whose drawing operators are added to the page's content stream. Finally, the widget annotation is deleted.

If the widget annotation is tagged appropriately, the SetaPDF-FormFiller component marks the drawing operation in the page's content stream and replaces the original object reference dictionary in the structure tree with a marked-content reference dictionary that points to the new marked-content sequence.
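To picture what changes in the structure tree, the following sketch models the K entry of a structure element as a plain PHP array and swaps the object reference (OBJR) for a marked-content reference (MCR). It does not use the SetaPDF API: the function name replaceObjrWithMcr, the array layout, and the object and page identifiers are illustrative assumptions only.

```php
<?php
declare(strict_types=1);

/**
 * Conceptual model only (not the component's implementation): replaces the
 * OBJR entry that points to a widget annotation with an MCR entry that points
 * to a marked-content sequence in the page's content stream.
 *
 * @param array $kids        Modelled K entry of a structure element
 * @param int   $widgetObjId Hypothetical object id of the flattened widget annotation
 * @param int   $pageObjId   Hypothetical object id of the page
 * @param int   $mcid        Marked-content identifier of the new sequence
 */
function replaceObjrWithMcr(array $kids, int $widgetObjId, int $pageObjId, int $mcid): array
{
    foreach ($kids as $index => $kid) {
        $isWidgetObjr = is_array($kid)
            && ($kid['Type'] ?? null) === 'OBJR'
            && ($kid['Obj'] ?? null) === $widgetObjId;

        if ($isWidgetObjr) {
            // same position in K, but now referring to marked content
            $kids[$index] = ['Type' => 'MCR', 'Pg' => $pageObjId, 'MCID' => $mcid];
        }
    }

    return $kids;
}

// a structure element whose only kid is the widget annotation (object 12 on page object 3)
$kids = [
    ['Type' => 'OBJR', 'Obj' => 12, 'Pg' => 3],
];
$kids = replaceObjrWithMcr($kids, 12, 3, 7);
```

After the call, the modelled K entry holds a single MCR entry with MCID 7, which mirrors the replacement described above.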

If the structure element is tagged as a Form, the component adds so-called PrintField attributes to the structure element. These hold information about the field's role, its value (e.g. the state of a check box or radio button group) and a description (taken from the TU entry of the original field).
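As an illustration of what such an attribute object contains, the sketch below builds a PrintField attribute set as a plain PHP array. The key names (O, Role, checked, Desc) and the role values (rb, cb, pb, tv) follow the PrintField attribute table in ISO 32000-1; the function name buildPrintFieldAttributes and the array representation are assumptions for this example and are not part of the SetaPDF API.

```php
<?php
declare(strict_types=1);

/**
 * Conceptual model of a PrintField attribute object.
 *
 * @param string      $role        rb (radio button), cb (check box), pb (push button) or tv (text value)
 * @param bool|null   $checked     State of a radio button or check box, null if not applicable
 * @param string|null $description Alternate name of the field, e.g. taken from its TU entry
 */
function buildPrintFieldAttributes(string $role, ?bool $checked, ?string $description): array
{
    $allowedRoles = ['rb', 'cb', 'pb', 'tv'];
    if (!in_array($role, $allowedRoles, true)) {
        throw new InvalidArgumentException('Unknown PrintField role: ' . $role);
    }

    $attributes = ['O' => 'PrintField', 'Role' => $role];

    // the checked state only applies to radio buttons and check boxes
    if ($checked !== null && in_array($role, ['rb', 'cb'], true)) {
        $attributes['checked'] = $checked ? 'on' : 'off';
    }

    if ($description !== null && $description !== '') {
        $attributes['Desc'] = $description;
    }

    return $attributes;
}

// a ticked check box with a description
$attributes = buildPrintFieldAttributes('cb', true, 'Accept terms and conditions');
```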

How to Adjust the Tagging Behavior

We implemented the whole process in a very open and low-level way so that it can be extended or modified to match specific requirements. For this we introduced the FlattenTaggingHandlerInterface and an AbstractFlattenTaggingHandler. The default handler, TagInPagesContentStreamHandler, is built on these.

The default handler accepts a callback in its constructor that allows you to access or modify the parent structure element during the flattening process:

Description
Parameters
$callback: ?callable(
\setasign\SetaPDF2\FormFiller\Field\AbstractField $field, \setasign\SetaPDF2\Core\Type\IndirectObjectInterface $parentObject
): bool

A callback that allows you to access/modify the parent structure element during the flattening process. If it returns true, the internal _updateAttributes() method is executed. If false, it is not executed.

Let's say you want to add an Alt entry to the structure element. You can do so as follows:

PHP
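$callback = static function(AbstractField $field, IndirectObjectInterface $parentObject): bool {
    $parentDict = PdfDictionary::ensureType($parentObject);
    // the structure type (e.g. "Form") is available for individual logic
    $parentTagName = PdfName::ensureType(DictionaryHelper::getValue($parentDict, 'S'));

    // add an alternate description that names the original field
    $parentDict['Alt'] = new PdfString(
        Encoding::toPdfString(
            'The original field name was: ' . $field->getQualifiedName()
        )
    );

    // true: the handler also updates the attributes of the structure element
    return true;
};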

$flatteningTaggingHandler = new TagInPagesContentStreamHandler($callback);

$field->setFlatteningTaggingHandler($flatteningTaggingHandler);

Another way to gain control over the whole process is to write your own implementation of the FlattenTaggingHandlerInterface.

Flatten Form Fields as Figures

Sometimes a form field is used as a kind of placeholder for an individual appearance such as an image. Such elements may be tagged as a Figure. To complete this structure element with e.g. a BBox value in its Layout attribute, the component comes with a TagAsFigureHandler which extends the default TagInPagesContentStreamHandler.
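To show what a BBox value in a Layout attribute looks like, the sketch below derives one from a rectangle. It assumes that the bounding box is taken from the rectangle of the flattened widget, which is not stated here, and it models the attribute object as a plain PHP array. The function name buildFigureLayoutAttributes is hypothetical and not part of the SetaPDF API.

```php
<?php
declare(strict_types=1);

/**
 * Conceptual model of a Layout attribute object with a BBox entry.
 *
 * @param array $rect Four numbers describing two opposite corners of a rectangle
 */
function buildFigureLayoutAttributes(array $rect): array
{
    if (count($rect) !== 4) {
        throw new InvalidArgumentException('A rectangle needs exactly four numbers.');
    }

    [$x1, $y1, $x2, $y2] = array_map('floatval', array_values($rect));

    return [
        'O' => 'Layout',
        // normalised to lower-left x/y followed by upper-right x/y
        'BBox' => [min($x1, $x2), min($y1, $y2), max($x1, $x2), max($y1, $y2)],
    ];
}

// corners given in "wrong" order are normalised
$layout = buildFigureLayoutAttributes([110, 70, 10, 20]);
```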

The following snippet shows how to use the TagAsFigureHandler with a callback that dynamically updates the Alt entry of the Figure structure element:

PHP
$flatteningTaggingHandler = new TagAsFigureHandler(
    function(AbstractField $field, IndirectObjectInterface $parentObject): bool {
        $parentDict = PdfDictionary::ensureType($parentObject);
        $parentTagName = DictionaryHelper::getValue($parentDict, 'S', null, true);
        if ($parentTagName !== 'Figure') {
            // not a Figure: leave the element as it is and skip the attribute update
            return false;
        }

        $replacements = [
            '{CompanyName}' => 'tektown',
            //...
        ];

        // can be used for individual logic
        $fieldName = $field->getQualifiedName();

        $alt = DictionaryHelper::getValue(
            $parentDict, 
            'Alt', 
            new PdfString(Encoding::toPdfString('Image in field ' . $fieldName))
        );
        
        $alt = Encoding::convertPdfString(
            AbstractType::ensureWithType(PdfStringInterface::class, $alt)->getValue()
        );

        $alt = \str_replace(
            \array_keys($replacements), \array_values($replacements), $alt
        );

        $parentDict['Alt'] = new PdfString(Encoding::toPdfString($alt));

        return true;
    }
);
$field->setFlatteningTaggingHandler($flatteningTaggingHandler);

Delete Tagged Form Fields

If a form field is deleted, the component searches for its object reference dictionary (OBJR) in the structure tree and simply removes it from the K entry of its structure element.

The structure element itself is currently left in place and not modified any further, which may result in documents that no longer conform to PDF/UA.
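The effect on the structure tree can be pictured with a small model in which the K entry is a plain PHP array. This is a sketch only and does not use the SetaPDF API: the function name removeObjrFromKids, the array layout, and the object identifiers are assumptions for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Conceptual model only: removes the OBJR entry that points to a deleted
 * field from the modelled K entry of a structure element.
 *
 * @param array $kids       Modelled K entry of a structure element
 * @param int   $fieldObjId Hypothetical object id of the deleted field/widget
 */
function removeObjrFromKids(array $kids, int $fieldObjId): array
{
    $remaining = array_filter(
        $kids,
        static function ($kid) use ($fieldObjId): bool {
            $isFieldObjr = is_array($kid)
                && ($kid['Type'] ?? null) === 'OBJR'
                && ($kid['Obj'] ?? null) === $fieldObjId;

            // keep everything except the reference to the deleted field
            return !$isFieldObjr;
        }
    );

    // re-index so that the result is a list again
    return array_values($remaining);
}

// a Form structure element whose only kid is the deleted field (object 12)
$formElement = ['S' => 'Form', 'K' => [['Type' => 'OBJR', 'Obj' => 12, 'Pg' => 3]]];
$formElement['K'] = removeObjrFromKids($formElement['K'], 12);
```

In this model the Form element is still present after the call but its K entry is empty, which is the kind of leftover that a PDF/UA validator may flag.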