PDF Portfolios (aka PDF Packages or Collections)

Introduction

Since PDF 1.4, it is possible to embed external files in the body of a PDF document and link them, for example, through file attachment annotations or through the embedded files name tree.

PDF 1.7 introduced a feature that allows an enhanced presentation of file attachments stored in a PDF document: the document may specify how a conforming reader application should present them. The PDF specification names such a presentation a "portable collection" or, more generally, a "collection". Sadly, none of these terms made it into any viewer or creator application. Acrobat 8, for example, called a file that makes use of collections a PDF Package, while Acrobat 9 called it a PDF Portfolio. PDF Portfolios in Acrobat 9 were also enriched with a compiled ActionScript program.

Other reader and creator applications also use the term PDF Portfolio when it comes to collections. We use this term in the documentation as well, while our code uses the terms of the PDF specification.

The SetaPDF-Merger component allows you to create and interact with PDF Portfolios in a very intuitive way.

Create a Collection Instance

A PDF Portfolio starts with a PDF document that represents the container. This document could, for example, display a message that a conforming reader application is needed to display PDF Portfolios (some applications call it a cover sheet). It can be an existing PDF document or a completely new document.

The SetaPDF_Merger_Collection class is the main entry point for handling PDF Portfolios.

Its constructor requires a document instance that represents the container PDF or, if you want to edit an existing PDF Portfolio, the loaded document instance:

PHP
$document = \SetaPDF_Core_Document::load(...);
$collection = new \SetaPDF_Merger_Collection($document);

To simply check if a document is a PDF Portfolio, you can use the isCollection() method:

PHP
$isCollection = $collection->isCollection();

The class sets the appropriate entries in the document structure automatically as soon as you add at least one file or folder. To force the creation of a PDF Portfolio without adding files or folders, a simple call to getDictionary(true) is needed. The following example creates a simple cover sheet and marks the document as a PDF Portfolio:

PHP
<?php
require_once('library/SetaPDF/Autoload.php');

// create a document
$writer = new \SetaPDF_Core_Writer_Http('empty-portfolio.pdf', true);
$document = new \SetaPDF_Core_Document($writer);

// create the cover sheet
$page = $document->getCatalog()->getPages()->create(\SetaPDF_Core_PageFormats::A4);
$canvas = $page->getCanvas();
$font = \SetaPDF_Core_Font_Standard_Helvetica::create($document);
$text = new \SetaPDF_Core_Text_Block($font, 16);
$text->setAlign(\SetaPDF_Core_Text::ALIGN_CENTER);
$text->setText('For the best experience, open this PDF portfolio in ...');
$text->setWidth($page->getWidth());
$text->draw($canvas, 0, $page->getHeight() - 100);

// create the Collection entry in the document's catalog
$collection = new \SetaPDF_Merger_Collection($document);
$collection->getDictionary(true);

// save and finish
$document->save()->finish();

Files

Add Files

A PDF Portfolio isn't restricted to PDF files; you can add files of any type. Adding files is straightforward with the addFile() method:

Description
public SetaPDF_Merger_Collection::addFile (
SetaPDF_Core_Reader_ReaderInterface|string $pathOrReader, string $filename [, null|string $description = null [, array $fileStreamParams = array ( ) [, null|string $mimeType = null [, null|array|SetaPDF_Merger_Collection_Item $collectionItem = null ]]]]
): string

Add a file to the collection.

Parameters
$pathOrReader : SetaPDF_Core_Reader_ReaderInterface|string

A reader instance or a path to a file.

$filename : string

The filename in UTF-8 encoding.

$description : null|string

The description of the file in UTF-8 encoding.

$fileStreamParams : array

See SetaPDF_Core_EmbeddedFileStream::setParams() method.

$mimeType : null|string

The subtype of the embedded file. Shall conform to the MIME media type names defined in Internet RFC 2046.

$collectionItem : null|array|SetaPDF_Merger_Collection_Item

The data described by the collection schema.

Return Values

The name that was used to register the file specification in the embedded files name tree.

The following example adds an existing PDF file from a local path and a dynamically created text file:

PHP
<?php
require_once('library/SetaPDF/Autoload.php');

// create a document as the cover sheet
$writer = new \SetaPDF_Core_Writer_Http('simple-portfolio.pdf', true);
$document = new \SetaPDF_Core_Document($writer);
$document->getCatalog()->getPages()->create(\SetaPDF_Core_PageFormats::A4);
// we leave it empty for demonstration purpose...

// create a collection instance
$collection = new \SetaPDF_Merger_Collection($document);

// add a file through a local path
$collection->addFile(
    'files/pdfs/tektown/Laboratory-Report.pdf',
    'Laboratory-Report.pdf',
    'Description of Laboratory-Report.pdf'
);

// add a dynamically created text file
$textFile = 'A simple text content';
$collection->addFile(
    new \SetaPDF_Core_Reader_String($textFile),
    'text-file.txt',
    'The description of the text file.'
);

// save and finish
$document->save()->finish();

If you pass a file through a reader instance, as shown with the text file in the previous example, you may want to add additional parameters for the generated embedded file stream. This is possible through the $fileStreamParams parameter or by resolving the file specification through the returned name:

PHP
<?php
require_once('library/SetaPDF/Autoload.php');

// create a document as the cover sheet
$writer = new \SetaPDF_Core_Writer_Http('dynamic-portfolio.pdf', true);
$document = new \SetaPDF_Core_Document($writer);
$document->getCatalog()->getPages()->create(\SetaPDF_Core_PageFormats::A4);
// we leave it empty for demonstration purpose...

// create a collection instance
$collection = new \SetaPDF_Merger_Collection($document);

// add a dynamically created text file
$textFile = 'A simple text content';
$collection->addFile(
    new \SetaPDF_Core_Reader_String($textFile),
    'text-file.txt',
    'The description of the text file.',
    [
        // an optional check sum
        \SetaPDF_Core_EmbeddedFileStream::PARAM_CHECK_SUM => md5($textFile, true),
        // modification and creation date are default columns and set automatically
        // to the current date time. If you want to define them manually:
        \SetaPDF_Core_EmbeddedFileStream::PARAM_MODIFICATION_DATE => new DateTime('yesterday'),
        \SetaPDF_Core_EmbeddedFileStream::PARAM_CREATION_DATE => new DateTime('-1 week')
    ]
);

// add another dynamically created text file
$textFile = 'Another simple text content';
$name = $collection->addFile(
    new \SetaPDF_Core_Reader_String($textFile),
    'another-text-file.txt',
    'The description of the other text file.'
);
// get the file specification by its name
$fileSpecification = $collection->getFile($name);

// get the embedded file stream and add additional parameters
$fileSpecification->getEmbeddedFileStream()->setParams([
    \SetaPDF_Core_EmbeddedFileStream::PARAM_CHECK_SUM => md5($textFile, true),
    \SetaPDF_Core_EmbeddedFileStream::PARAM_MODIFICATION_DATE => new DateTime('yesterday'),
    \SetaPDF_Core_EmbeddedFileStream::PARAM_CREATION_DATE => new DateTime('last Wednesday')
], false);

// save and finish
$document->save()->finish();
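
The check sum in the example above is passed as the raw binary MD5 digest (note the second argument of md5()), not as the usual hexadecimal string. The following plain-PHP sketch shows the difference; the helper name rawMd5CheckSum() is hypothetical and not part of SetaPDF:

```php
<?php
/**
 * Hypothetical helper (not part of SetaPDF): returns the raw 16-byte MD5
 * digest of a string, in the form used for PARAM_CHECK_SUM above.
 */
function rawMd5CheckSum(string $content): string
{
    // the second argument "true" requests raw binary output instead of hex
    return md5($content, true);
}

$content = 'A simple text content';
$checkSum = rawMd5CheckSum($content);

// the raw digest is 16 bytes long, the hex representation 32 characters
echo strlen($checkSum), ' bytes', "\n";
// converting the raw digest to hex yields the familiar md5() string
echo bin2hex($checkSum) === md5($content) ? 'same digest' : 'different digest', "\n";
```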

Get Files

PDF Portfolios use the files attached to a PDF document in the global embedded files name tree. The collection class offers the proxy method getFiles(), which returns all embedded file specifications. Their names are the keys of the returned array:

PHP
<?php
require_once('library/SetaPDF/Autoload.php');

// load the document
$document = \SetaPDF_Core_Document::loadByFilename('files/pdfs/tektown/products/All-Portfolio.pdf');

// get the collection instance
$collection = new \SetaPDF_Merger_Collection($document);

// get all files
$files = $collection->getFiles();

// extract the file
if (isset($_GET['f']) && isset($files[$_GET['f']])) {
    $file = $files[$_GET['f']];
    if ($file instanceof \SetaPDF_Core_FileSpecification) {
        // resolve the filename
        $filename = $file->getFileSpecification();
        // resolve the file stream
        $embeddedFileStream = $file->getEmbeddedFileStream();

        // get the content type
        $contentType = $embeddedFileStream->getMimeType();
        // or set a default content type
        if ($contentType === null) {
            $contentType = 'application/force-download';
        }

        // pass the file to the client
        $stream = $embeddedFileStream->getStream();
        header('Content-Type: ' . $contentType);
        header('Content-Disposition: attachment; filename="' . $filename . '";');
        header('Content-Transfer-Encoding: binary');
        header('Content-Length: ' . strlen($stream));
        echo $stream;
        die();
    }
}

foreach ($files AS $name => $file) {
    $filename = $file->getFileSpecification();
    echo '<a href="?f=' . urlencode($name) . '">' . htmlspecialchars($filename) . '</a><br />';
}

A single file specification can be resolved by its name with the getFile() method.
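
The download branch in the example above writes the stored filename directly into the Content-Disposition header. Filenames inside a PDF come from whoever created the document, so it may be worth normalizing them first. The following is a minimal sketch in plain PHP, assuming the filename is available in UTF-8; buildContentDisposition() is a hypothetical helper and not part of SetaPDF:

```php
<?php
/**
 * Hypothetical helper (not part of SetaPDF): builds a Content-Disposition
 * header value with an ASCII fallback and an RFC 5987 encoded UTF-8 name.
 */
function buildContentDisposition(string $filename): string
{
    // keep only the last path segment, whatever separator was used
    $segments = explode('/', str_replace('\\', '/', $filename));
    $filename = end($segments);

    // remove control characters such as line breaks
    $filename = preg_replace('/[\x00-\x1F\x7F]/', '', $filename);

    // ASCII fallback: replace non-ASCII bytes, quotes and backslashes
    $fallback = preg_replace('/[^\x20-\x7E]|["\\\\]/', '_', $filename);

    return 'attachment; filename="' . $fallback . '"'
        . "; filename*=UTF-8''" . rawurlencode($filename);
}

echo buildContentDisposition('Laboratory "Report".pdf'), "\n";
```

In the example above, the result could be sent with header('Content-Disposition: ' . buildContentDisposition($filename));.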

Delete Files

As all files in a PDF Portfolio are located in the global embedded files name tree, the collection instance offers a proxy method deleteFile() which proxies SetaPDF_Core_Document_Catalog_Names_EmbeddedFiles::remove().

PHP
$collection->deleteFile('registered-filename.pdf');

// is the same as calling

$document->getCatalog()
    ->getNames()
    ->getEmbeddedFiles()
    ->remove('registered-filename.pdf');

The name is the one under which the file specification is registered in the embedded files name tree of the PDF document. It doesn't need to be identical to the filename of the embedded file itself.
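
Because the registered name and the filename can differ, you may need to look up the name first if you only know the filename. The following sketch relies only on getFiles() and getFileSpecification() as used in the examples above; findRegisteredNames() is a hypothetical helper and not part of SetaPDF:

```php
<?php
/**
 * Hypothetical helper (not part of SetaPDF): returns all names under which
 * file specifications with the given filename are registered.
 *
 * @param object $collection A SetaPDF_Merger_Collection instance
 * @param string $filename   The filename of the embedded file
 * @return array             The matching names in the name tree
 */
function findRegisteredNames($collection, string $filename): array
{
    $names = [];
    // the keys of getFiles() are the registered names
    foreach ($collection->getFiles() as $name => $file) {
        if ($file->getFileSpecification() === $filename) {
            $names[] = $name;
        }
    }

    return $names;
}

// Usage with a real collection instance:
// foreach (findRegisteredNames($collection, 'Laboratory-Report.pdf') as $name) {
//     $collection->deleteFile($name);
// }
```

Several files may share the same filename, for example in different folders, which is why the helper returns an array.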

Folders

Folders in PDF Portfolios are an extension to the PDF specification (ExtensionLevel 3 by Adobe) and were later adopted in PDF 2.0.

With folders you can organize files into a hierarchical tree structure.

The collection instance offers a simple method which allows you to check if folders are in use or not:

PHP
$hasFolders = $collection->hasFolders();

Folders are represented by the SetaPDF_Merger_Collection_Folder class.

Add Folders

Folders can be added through the addFolder() method of the collection instance or with the addFolder() method of a folder instance.

Internally the addFolder() method of the collection instance just ensures that a root folder exists (through getRootFolder() method) and forwards the call to its addFolder() method.

PHP
<?php
require_once('library/SetaPDF/Autoload.php');

// create a document as the cover sheet
$writer = new \SetaPDF_Core_Writer_Http('portfolio-with-folders.pdf', true);
$document = new \SetaPDF_Core_Document($writer);
$document->getCatalog()->getPages()->create(\SetaPDF_Core_PageFormats::A4);
// we leave it empty for demonstration purpose...

// create a collection instance
$collection = new \SetaPDF_Merger_Collection($document);

// through the proxy method
$folderA = $collection->addFolder('Folder (A)');
// add more sub folders
$folderA->addFolder('Folder (AA)');
$folderA->addFolder('Folder (AB)')->addFolder('Folder (ABA)');
$folderA->addFolder('Folder (AC)')->addFolder('Folder (ACA)');

// through the root folder
$rootFolder = $collection->getRootFolder();
$folderB = $rootFolder->addFolder('Folder (B)');
// add more sub folders
$folderB->addFolder('Folder (BA)')->addFolder('Folder (BAA)');
$folderB->addFolder('Folder (BB)');
$folderB->addFolder('Folder (BC)');

// save and finish
$document->save()->finish();

Add Files

A folder instance also offers an addFile() method with the same signature as the collection instance. So adding files to folders is straightforward:

PHP
<?php
require_once('library/SetaPDF/Autoload.php');

// create a document as the cover sheet
$writer = new \SetaPDF_Core_Writer_Http('portfolio-with-folders-and-files.pdf', true);
$document = new \SetaPDF_Core_Document($writer);
$document->getCatalog()->getPages()->create(\SetaPDF_Core_PageFormats::A4);
// we leave it empty for demonstration purpose...

// create a collection instance
$collection = new \SetaPDF_Merger_Collection($document);

// through the proxy method
$folderA = $collection->addFolder('tektown');
$folderA->addFile(
    'files/pdfs/tektown/Laboratory-Report.pdf',
    'Laboratory-Report.pdf'
);
$folderA->addFile(
    'files/pdfs/tektown/Terms-and-Conditions.pdf',
    'Terms-and-Conditions.pdf'
);

$folderB = $collection->addFolder('camtown');
$folderB->addFile(
    'files/pdfs/camtown/Laboratory-Report.pdf',
    'Laboratory-Report.pdf'
);
$folderB->addFile(
    'files/pdfs/camtown/Terms-and-Conditions.pdf',
    'Terms-and-Conditions.pdf'
);


// save and finish
$document->save()->finish();

Get Files And Subfolders

To get all files located in a folder, just use the getFiles() method of the folder instance.

Subfolders can be retrieved with the getSubfolders() method, while you can check for their existence with the hasSubfolders() method.

The following example shows all files and folders in a PDF Portfolio (without sorting):

PHP
<?php
require_once('library/SetaPDF/Autoload.php');

// load the document
$document = \SetaPDF_Core_Document::loadByFilename('files/pdfs/Logos-Portfolio.pdf');

// get the collection instance
$collection = new \SetaPDF_Merger_Collection($document);

// extract the file
if (isset($_GET['f'])) {
    // get the file specification
    $file = $collection->getFile($_GET['f']);
    if ($file instanceof \SetaPDF_Core_FileSpecification) {
        // resolve the filename
        $filename = $file->getFileSpecification();
        // resolve the file stream
        $embeddedFileStream = $file->getEmbeddedFileStream();

        // we force a content type
        $contentType = 'application/force-download';

        // pass the file to the client
        $stream = $embeddedFileStream->getStream();
        header('Content-Type: ' . $contentType);
        header('Content-Disposition: attachment; filename="' . $filename . '";');
        header('Content-Transfer-Encoding: binary');
        header('Content-Length: ' . strlen($stream));
        echo $stream;
        die();
    }
}

// function which is called recursively to print out all folders and files
function printFolder(\SetaPDF_Merger_Collection_Folder $folder, $level = 0) {
    $files = $folder->getFiles();

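    // indent by level; note the terminating semicolon of the HTML entity
    echo str_repeat('&nbsp;', $level++ * 4);
    // escape the folder name, as it comes from the PDF document
    echo htmlspecialchars((string) $folder->getName()) . '/<br />';
    foreach ($files AS $name => $file) {
        $filename = $file->getFileSpecification();
        echo str_repeat('&nbsp;', $level * 4);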
        echo '<a href="?f=' . urlencode($name) . '">' . htmlspecialchars($filename) . '</a><br />';
    }

    // get sub folders and print them out, too
    foreach ($folder->getSubfolders() AS $subFolder) {
        printFolder($subFolder, $level);
    }
}

printFolder($collection->getRootFolder());

As shown in the previous example you can also use the getFile() method to resolve a single file specification by its name in a folder.

To access a subfolder by its name just use the getSubfolder() method.
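
If you need the folder tree as data rather than as HTML output, the same traversal can collect paths into an array. The following sketch uses only the folder methods mentioned above (getName(), getFiles(), hasSubfolders(), getSubfolders()); listFolderPaths() is a hypothetical helper and not part of SetaPDF:

```php
<?php
/**
 * Hypothetical helper (not part of SetaPDF): returns the paths of a folder,
 * its files and all nested subfolders as a flat list of strings.
 *
 * @param object $folder A SetaPDF_Merger_Collection_Folder instance
 * @param string $prefix The path of the parent folder
 * @return array
 */
function listFolderPaths($folder, string $prefix = ''): array
{
    $path = $prefix . $folder->getName() . '/';
    $paths = [$path];

    // files that are located directly in this folder
    foreach ($folder->getFiles() as $file) {
        $paths[] = $path . $file->getFileSpecification();
    }

    // descend into subfolders, if there are any
    if ($folder->hasSubfolders()) {
        foreach ($folder->getSubfolders() as $subfolder) {
            $paths = array_merge($paths, listFolderPaths($subfolder, $path));
        }
    }

    return $paths;
}

// Usage with a real collection instance:
// print_r(listFolderPaths($collection->getRootFolder()));
```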

Move a Folder

Moving a folder is done by calling its setParent() method:

PHP
<?php
require_once('library/SetaPDF/Autoload.php');

// create a writer for the resulting document
$writer = new \SetaPDF_Core_Writer_Http('moved-folders.pdf', true);
// load the existing portfolio
$document = \SetaPDF_Core_Document::loadByFilename('files/pdfs/Logos-Portfolio.pdf', $writer);

// get the collection instance
$collection = new \SetaPDF_Merger_Collection($document);
$rootFolder = $collection->getRootFolder();

// get all existing folders
$camtown = $rootFolder->getSubfolder('camtown');
$etown = $rootFolder->getSubfolder('etown');
$lenstown = $rootFolder->getSubfolder('lenstown');
$tektown = $rootFolder->getSubfolder('tektown');

// create a new folder
$newFolder = $rootFolder->addFolder('New Folder');

// move all folders to this new folder
$camtown->setParent($newFolder);
$etown->setParent($newFolder);
$lenstown->setParent($newFolder);
$tektown->setParent($newFolder);

$document->save()->finish();

The Collection Schema

A PDF Portfolio can be presented in a table view with individual fields. By default a reader application will use the standard fields available in a file specification.

By using a schema, it is possible to define all fields and their types individually. A schema can reference standard file-related fields such as the filename or the description, but it also allows you to define completely individual fields. Individual fields refer to data in a collection item, which can be assigned to a file specification or to a folder.

A schema is defined through a SetaPDF_Merger_Collection_Schema instance, which can be retrieved as follows:

PHP
$schema = $collection->getSchema();

The schema class offers various methods that allow you to interact with the schema and its fields:

addField()

Add a field to the schema.

addFields()

Adds several fields to the schema.

getCollection()

Get the collection instance.

getField()

Get a field instance by its name.

getFields()

Get all field instances.

hasField()

Check if a field exists.

removeField()

Remove a field from the schema.

A field is represented by a SetaPDF_Merger_Collection_Schema_Field instance. Most of the above methods also accept plain strings and constants, in which case the field instances are created internally. The following example shows some ways to create fields with default or individual data:

PHP
<?php
require_once('library/SetaPDF/Autoload.php');

// create a document as the cover sheet
$writer = new \SetaPDF_Core_Writer_Http('portfolio-with-schema.pdf', true);
$document = new \SetaPDF_Core_Document($writer);
$document->getCatalog()->getPages()->create(\SetaPDF_Core_PageFormats::A4);
// we leave it empty for demonstration purpose...

// create a collection instance
$collection = new \SetaPDF_Merger_Collection($document);

// get the schema instance
$schema = $collection->getSchema();

// create a field instance manually
$filenameField = \SetaPDF_Merger_Collection_Schema_Field::create(
    'Filename', // the visible field name
    \SetaPDF_Merger_Collection_Schema::DATA_FILE_NAME // refer to the file name
);
$filenameField->setOrder(1);
// add it to the schema
$schema->addField('filename', $filenameField);

// let addField() do the field creation
$schema->addField(
    'description',
    'Description',
    \SetaPDF_Merger_Collection_Schema::DATA_DESCRIPTION,
    2
);

// let's create an individual field
$schema->addField(
    'company', 'Company Name', \SetaPDF_Merger_Collection_Schema::TYPE_STRING, 3
);

// let's create another individual field
$schema->addField(
    'order', 'Order', \SetaPDF_Merger_Collection_Schema::TYPE_NUMBER, 4
);

// for demonstration purpose, we add some files now...
$collection->addFile(
    'files/pdfs/tektown/Logo.pdf',
    'tektown-logo.pdf',
    'The logo of tektown',
    [],
    'application/pdf',
    [
        'company' => \SetaPDF_Core_Encoding::toPdfString('tektown'),
        'order'   => 1
    ]
);

$collection->addFile(
    'files/pdfs/etown/Logo.pdf',
    'etown-logo.pdf',
    'The logo of etown',
    [],
    'application/pdf',
    [
        'company' => \SetaPDF_Core_Encoding::toPdfString('etown'),
        'order'   => 2
    ]
);

$collection->addFile(
    'files/pdfs/lenstown/Logo.pdf',
    'lenstown-logo.pdf',
    'The logo of lenstown',
    [],
    'application/pdf',
    [
        'company' => \SetaPDF_Core_Encoding::toPdfString('lenstown'),
        'order'   => 3
    ]
);

// save and finish
$document->save()->finish();

As you may have noticed, the constants prefixed with DATA_ refer to data that is available by default in a file specification or folder. The constants prefixed with TYPE_ define a data type. All available constants are:

public const string SetaPDF_Merger_Collection_Schema::DATA_COMPRESSED_SIZE = 'CompressedSize'

Constant defining the compressed size property

public const string SetaPDF_Merger_Collection_Schema::DATA_CREATION_DATE = 'CreationDate'

Constant defining the creation date property

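public const string SetaPDF_Merger_Collection_Schema::DATA_DESCRIPTION = 'Desc'

Constant defining the description property

public const string SetaPDF_Merger_Collection_Schema::DATA_FILE_NAME = 'F'

Constant defining the file name property

public const string SetaPDF_Merger_Collection_Schema::DATA_MODIFICATION_DATE = 'ModDate'

Constant defining the modification date property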

public const string SetaPDF_Merger_Collection_Schema::DATA_SIZE = 'Size'

Constant defining the size property

public const string SetaPDF_Merger_Collection_Schema::TYPE_DATE = 'D'

Constant defining a date data type

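public const string SetaPDF_Merger_Collection_Schema::TYPE_NUMBER = 'N'

Constant defining a number type

public const string SetaPDF_Merger_Collection_Schema::TYPE_STRING = 'S'

Constant defining a string data type (value needs to be in PdfDocEncoding or UTF-16BE)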

Collection Items

Collection items are used to assign data described by the collection schema to a particular file or folder. The data or a collection item instance can be passed as the $collectionItem parameter of the addFile() or addFolder() method of either the collection or a folder instance.

A collection item instance is a wrapper around the collection item dictionary. It optionally validates the data against a given collection schema:

PHP
// create a collection item
$collectionItem = new SetaPDF_Merger_Collection_Item();
// add the company value
$collectionItem->setEntry('company', SetaPDF_Core_Encoding::toPdfString('tektown'), $schema);

// ignore the schema
$collectionItem->setEntry(
    'secret', 'value', SetaPDF_Merger_Collection_Schema::TYPE_STRING
);

// add several entries
$collectionItem->setData([
    'company' => SetaPDF_Core_Encoding::toPdfString('lenstown'),
    'order' => 5
], $schema);

If you set the collection item data through the addFile() or addFolder() methods you can pass an instance of a collection item or an array, which will be forwarded to the setData() method of a newly created instance.

Note that string values need to be passed in PdfDocEncoding or UTF-16BE. You can use the SetaPDF_Core_Encoding::toPdfString() method to convert them from your local encoding.
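
To make the encoding requirement more tangible, the following plain-PHP sketch builds the UTF-16BE form of a PDF text string by hand: the byte order mark FE FF followed by the UTF-16BE bytes. It only illustrates the byte layout and is not the implementation of toPdfString(). The helper name is hypothetical, and the sketch assumes that the mbstring extension is available and that the input is UTF-8:

```php
<?php
/**
 * Hypothetical helper for illustration only: converts a UTF-8 string into
 * a UTF-16BE string with a leading byte order mark.
 */
function toUtf16BeTextString(string $utf8): string
{
    // "\xFE\xFF" is the byte order mark that identifies UTF-16BE text
    return "\xFE\xFF" . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
}

// show the resulting bytes in hexadecimal notation
echo bin2hex(toUtf16BeTextString('tektown')), "\n";
echo bin2hex(toUtf16BeTextString("\u{00E9}town")), "\n";
```

In real code, SetaPDF_Core_Encoding::toPdfString() remains the method to use.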

Various Settings

Initial View and Document

A PDF Portfolio can be viewed in different modes. The initial mode can be defined by using the setView() method of the collection instance:

Description
public SetaPDF_Merger_Collection::setView (
string $view
): void

Set the initial view.

Parameters
$view : string

A view constant.

The initial document that should be presented can be set using the setInitialDocument() method:

Description
public SetaPDF_Merger_Collection::setInitialDocument (
string $name
): void

Set the name of the document that should be initially presented.

If you want to open a document that is located in a subfolder, you need to pass the id of the subfolder as a prefix to the name:

$collection->setInitialDocument('<' . $folder->getId() . '>' . $name);

Parameters
$name : string
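
The prefix rule described above can be wrapped in a small function. This is a sketch only: initialDocumentName() is a hypothetical helper, not part of SetaPDF, and it assumes that names of files outside of subfolders are passed without a prefix:

```php
<?php
/**
 * Hypothetical helper (not part of SetaPDF): builds the value for
 * setInitialDocument() from a registered name and an optional folder id.
 *
 * @param string          $name     The name of the file
 * @param int|string|null $folderId The result of $folder->getId() or null
 */
function initialDocumentName(string $name, $folderId = null): string
{
    if ($folderId === null) {
        // assumption: no prefix is needed outside of subfolders
        return $name;
    }

    // the folder id is wrapped in angle brackets and prepended to the name
    return '<' . $folderId . '>' . $name;
}

// Usage with real instances:
// $collection->setInitialDocument(initialDocumentName($name, $folder->getId()));
```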
 

Configure the Splitter Bar

The splitter bar can be configured through the following methods of the collection instance:

getSplitterDirection()

Get the orientation of the splitter bar.

getSplitterPosition()

Get the initial position of the splitter bar.

setSplitterDirection()

Set the orientation of the splitter bar.

setSplitterPosition()

Set the initial position of the splitter bar.

Sorting

The sorting can be defined by using the setSort() method:

Description
public SetaPDF_Merger_Collection::setSort (
array $sort
): void

Set the data that specifies the order in which the collection shall be sorted in the user interface.

Parameters
$sort : array

The key is the field name, while the value defines the direction. Valid key names are field names defined in the schema or SetaPDF_Merger_Collection_Schema::DATA_* constants.
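
The sorting itself is done by the reader application. To illustrate what a definition with several fields expresses, the following plain-PHP sketch orders rows by the given fields one after another. It does not use SetaPDF; sortRows() is a hypothetical helper, and the boolean direction (true for ascending) is only a placeholder for the direction values that setSort() expects:

```php
<?php
/**
 * Hypothetical helper for illustration: sorts rows by several fields.
 *
 * @param array $rows A list of associative arrays
 * @param array $sort Field name => true (ascending) or false (descending)
 */
function sortRows(array $rows, array $sort): array
{
    usort($rows, function (array $a, array $b) use ($sort) {
        // later fields are only consulted if all earlier fields are equal
        foreach ($sort as $field => $ascending) {
            $result = $a[$field] <=> $b[$field];
            if ($result !== 0) {
                return $ascending ? $result : -$result;
            }
        }

        return 0;
    });

    return $rows;
}

$rows = [
    ['company' => 'tektown', 'order' => 1],
    ['company' => 'etown', 'order' => 2],
    ['company' => 'etown', 'order' => 1],
];

// company ascending, then order descending
foreach (sortRows($rows, ['company' => true, 'order' => false]) as $row) {
    echo $row['company'], ' ', $row['order'], "\n";
}
```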
