PDF Portfolios (aka PDF Packages or Collections)
Introduction
Since PDF 1.4 it is possible to embed external files in the body of a PDF document and link them through e.g. file attachment annotations or through the embedded files name tree.
In PDF 1.7 a new feature was introduced that allows an enhanced presentation of file attachments stored in a PDF document. It can specify how a conforming reader application should present the file attachments. The PDF specification calls such a presentation a "portable collection" or, more generally, a "collection". Sadly, none of these terms made it into any viewer or creator application. Acrobat 8, for example, called a file that makes use of collections a PDF Package, while Acrobat 9 called it a PDF Portfolio. PDF Portfolios in Acrobat 9 were also enriched with a compiled ActionScript program.
Other reader and creator applications also use the term PDF Portfolio when it comes to collections. We use this term in the documentation as well, while our code uses the terms of the PDF specification.
The SetaPDF-Merger component allows you to create and interact with PDF Portfolios in a very intuitive way.
Create a Collection Instance
A PDF Portfolio starts with a PDF document that represents the container. This document could display, for example, a message that a conforming reader application is needed to display PDF Portfolios (some applications call it a cover sheet). It can be an existing PDF document or a completely new document.
The SetaPDF_Merger_Collection class is the main class to use when you work with PDF Portfolios.
Its constructor requires a document instance that represents the container PDF or, if you want to edit an existing PDF Portfolio, the loaded document instance:
$document = \SetaPDF_Core_Document::load(...);
$collection = new \SetaPDF_Merger_Collection($document);
To simply check if a document is a PDF Portfolio, you can use the isCollection() method:
$isCollection = $collection->isCollection();
The class sets the appropriate entries in the document structure automatically as soon as you add at least one file or folder. To force the creation of a PDF Portfolio, a simple call to getDictionary(true) is needed (this is not needed if you plan to add files or folders). The following example creates a simple cover sheet and defines the document as a PDF Portfolio:
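A minimal sketch of such a script. The autoloader location and the output filename are placeholders, and the writer, page-creation, and save calls belong to the SetaPDF-Core component rather than to anything described in this section, so treat them as assumptions about your installation:

```php
<?php
// Assumed autoloader location; adjust it to your installation.
require_once 'library/SetaPDF/Autoload.php';

// Create a new document that acts as the cover sheet (container).
$writer = new SetaPDF_Core_Writer_File('portfolio.pdf'); // placeholder output path
$document = new SetaPDF_Core_Document($writer);

// A single empty A4 page is enough for demonstration purposes.
$document->getCatalog()->getPages()->create(SetaPDF_Core_PageFormats::A4);

// Mark the document as a PDF Portfolio without adding files or folders.
$collection = new SetaPDF_Merger_Collection($document);
$collection->getDictionary(true);

// Check the result before the document is written and released.
$isCollection = $collection->isCollection();

$document->save()->finish();
```

In a real cover sheet you would draw a short notice onto the page before saving, so that viewers without portfolio support show something meaningful.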
Files
Add Files
A PDF Portfolio isn't restricted to PDF files; you can add any file type. Adding files is straightforward with the addFile() method:
Description
Add a file to the collection.
Parameters
- $pathOrReader : SetaPDF_Core_Reader_ReaderInterface|string
A reader instance or a path to a file.
- $filename : string
The filename in UTF-8 encoding.
- $description : null|string
The description of the file in UTF-8 encoding.
- $fileStreamParams : array
See the SetaPDF_Core_EmbeddedFileStream::setParams() method.
- $mimeType : null|string
The subtype of the embedded file. Shall conform to the MIME media type names defined in Internet RFC 2046.
- $collectionItem : null|array|SetaPDF_Merger_Collection_Item
The data described by the collection schema.
Return Values
The name that was used to register the file specification in the embedded files name tree.
Exceptions
Throws SetaPDF_Core_DataStructure_Tree_KeyAlreadyExistsException
Throws SetaPDF_Core_SecHandler_Exception
Throws SetaPDF_Core_Type_Exception
The following example adds an existing PDF file from a local path and a dynamically created text file:
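A minimal sketch, wrapped in a hypothetical helper function so that it works with any collection instance. The filenames and descriptions are made up for illustration, and SetaPDF_Core_Reader_String is used here as the reader for in-memory content:

```php
<?php
/**
 * Adds one file from disk and one in-memory text file to a portfolio.
 *
 * @param SetaPDF_Merger_Collection $collection
 * @param string $pdfPath Path to an existing PDF file (supplied by the caller).
 * @return string[] The names under which both files were registered.
 */
function addExampleFiles(SetaPDF_Merger_Collection $collection, $pdfPath)
{
    // A file referenced by its local path.
    $pdfName = $collection->addFile(
        $pdfPath,
        basename($pdfPath),       // filename in UTF-8
        'An existing PDF file',   // description in UTF-8
        [],                       // no additional embedded file stream parameters
        'application/pdf'
    );

    // A file created on the fly and passed through a reader instance.
    $textName = $collection->addFile(
        new SetaPDF_Core_Reader_String('A simple text file.'),
        'notes.txt',
        'A dynamically created text file',
        [],
        'text/plain'
    );

    return [$pdfName, $textName];
}
```

The returned names are the keys in the embedded files name tree, so keep them if you want to resolve or delete the files later.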
If you pass files through a reader instance, as shown with the text file in the previous example, you can add additional parameters for the generated embedded file stream. This is possible through the $fileStreamParams parameter or by resolving the file specification by the returned name:
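A sketch of the first variant, passing the parameters directly to addFile(). The helper function is hypothetical, and the PARAM_CREATION_DATE and PARAM_MODIFICATION_DATE constant names, as well as the use of DateTime values, are assumptions that should be checked against the SetaPDF_Core_EmbeddedFileStream::setParams() documentation:

```php
<?php
/**
 * Adds an in-memory text file and sets dates on its embedded file stream.
 *
 * @param SetaPDF_Merger_Collection $collection
 * @param string $text Content of the text file.
 * @param string $filename Filename in UTF-8.
 * @param DateTime $date Used as both creation and modification date.
 * @return string The name under which the file was registered.
 */
function addTextFileWithDates(SetaPDF_Merger_Collection $collection, $text, $filename, DateTime $date)
{
    return $collection->addFile(
        new SetaPDF_Core_Reader_String($text),
        $filename,
        null, // no description
        [
            // Assumed constant names of SetaPDF_Core_EmbeddedFileStream.
            SetaPDF_Core_EmbeddedFileStream::PARAM_CREATION_DATE => $date,
            SetaPDF_Core_EmbeddedFileStream::PARAM_MODIFICATION_DATE => $date,
        ],
        'text/plain'
    );
}
```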
Get Files
PDF Portfolios use the files attached to a PDF document in the global embedded files name tree. The collection class offers a proxy method, which will return all embedded file specifications. Their names are the keys of the returned array:
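A short sketch that collects the registered names, assuming the proxy method is called getFiles(), as it is on folder instances. The helper function name is hypothetical:

```php
<?php
/**
 * Returns the names under which all files of a portfolio are registered.
 *
 * @param SetaPDF_Merger_Collection $collection
 * @return string[]
 */
function listRegisteredNames(SetaPDF_Merger_Collection $collection)
{
    $names = [];
    // The array keys are the names in the embedded files name tree,
    // the values are the file specification instances.
    foreach ($collection->getFiles() as $name => $fileSpecification) {
        // Cast because PHP turns purely numeric string keys into integers.
        $names[] = (string) $name;
    }

    return $names;
}
```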
A single file specification can be resolved by its name with the getFile() method.
Delete Files
As all files in a PDF Portfolio are located in the global embedded files name tree, the collection instance offers a proxy method deleteFile(), which proxies SetaPDF_Core_Document_Catalog_Names_EmbeddedFiles::remove().
$collection->deleteFile('registered-filename.pdf');
// is the same as calling
$document->getCatalog()
    ->getNames()
    ->getEmbeddedFiles()
    ->remove('registered-filename.pdf');
The filename is the name under which the file specification is registered in the embedded files name tree of the PDF document. It doesn't need to be identical to the filename of the embedded file itself.
Folders
Folders in PDF Portfolios are an extension to the PDF specification (ExtensionLevel 3 by Adobe) and were also adopted in PDF 2.0.
With folders you can organize files into a hierarchical tree structure.
The collection instance offers a simple method which allows you to check if folders are in use or not:
$hasFolders = $collection->hasFolders();
Folders are represented by the SetaPDF_Merger_Collection_Folder class.
Add Folders
Folders can be added through the addFolder() method of the collection instance or with the addFolder() method of a folder instance.
Internally, the addFolder() method of the collection instance just ensures that a root folder exists (through the getRootFolder() method) and forwards the call to its addFolder() method.
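A minimal sketch of both variants. It assumes that addFolder() takes the folder name as its first argument and returns the new folder instance; the helper function and the folder names are hypothetical:

```php
<?php
/**
 * Creates a top-level folder and a subfolder inside it.
 *
 * @param SetaPDF_Merger_Collection $collection
 * @return SetaPDF_Merger_Collection_Folder[] The top-level folder and its subfolder.
 */
function addNestedFolders(SetaPDF_Merger_Collection $collection)
{
    // Added through the collection: ends up below the root folder.
    $invoices = $collection->addFolder('Invoices');

    // Added through a folder instance: becomes a subfolder of it.
    $year = $invoices->addFolder('2024');

    return [$invoices, $year];
}
```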
Add Files
A folder instance also offers an addFile() method with the same signature as the one of the collection instance, so adding files to folders is straightforward:
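A minimal sketch, using a hypothetical helper that adds an in-memory text file to a given folder. The arguments mirror those of addFile() on the collection instance:

```php
<?php
/**
 * Adds an in-memory text file to a folder.
 *
 * @param SetaPDF_Merger_Collection_Folder $folder
 * @param string $text Content of the text file.
 * @param string $filename Filename in UTF-8.
 * @return string The name under which the file was registered.
 */
function addTextFileToFolder(SetaPDF_Merger_Collection_Folder $folder, $text, $filename)
{
    return $folder->addFile(
        new SetaPDF_Core_Reader_String($text),
        $filename,
        null,        // no description
        [],          // no additional embedded file stream parameters
        'text/plain'
    );
}
```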
Get Files And Subfolders
To get all files located in a folder, use the getFiles() method of the folder instance.
Subfolders can be retrieved with the getSubfolders() method, while you can check for their existence with the hasSubfolders() method.
The following example shows all files and folders in a PDF Portfolio (without sorting):
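A sketch of such a listing as two hypothetical helper functions that return indented text lines. It assumes that a folder exposes its name through getName() and that the keys returned by a folder's getFiles() are the registered file names, as with the collection instance:

```php
<?php
/**
 * Describes a folder, its files, and its subfolders recursively.
 *
 * @param SetaPDF_Merger_Collection_Folder $folder
 * @param int $level Nesting depth, used for indentation.
 * @return string[] One line per file or folder.
 */
function describeFolder(SetaPDF_Merger_Collection_Folder $folder, $level = 0)
{
    $indent = str_repeat('    ', $level);
    $lines = [];

    foreach ($folder->getFiles() as $name => $fileSpecification) {
        $lines[] = $indent . $name;
    }

    if ($folder->hasSubfolders()) {
        foreach ($folder->getSubfolders() as $subfolder) {
            // getName() is an assumed accessor of the folder class.
            $lines[] = $indent . '[' . $subfolder->getName() . ']';
            $lines = array_merge($lines, describeFolder($subfolder, $level + 1));
        }
    }

    return $lines;
}

/**
 * Describes a whole portfolio, with or without folders.
 *
 * @param SetaPDF_Merger_Collection $collection
 * @return string[]
 */
function describePortfolio(SetaPDF_Merger_Collection $collection)
{
    if (!$collection->hasFolders()) {
        // No folders in use: list the registered names as strings.
        return array_map('strval', array_keys($collection->getFiles()));
    }

    return describeFolder($collection->getRootFolder());
}
```

The result can be printed with echo implode("\n", describePortfolio($collection));.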
As shown in the previous example, you can also use the getFile() method to resolve a single file specification by its name in a folder.
To access a subfolder by its name, use the getSubfolder() method.
Move a Folder
Moving a folder is done by calling its setParent() method:
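A minimal sketch, assuming setParent() accepts the folder instance that should become the new parent. The helper function is hypothetical:

```php
<?php
/**
 * Moves a folder below another folder.
 *
 * @param SetaPDF_Merger_Collection_Folder $folder The folder to move.
 * @param SetaPDF_Merger_Collection_Folder $newParent The folder that becomes its parent.
 */
function moveFolder(SetaPDF_Merger_Collection_Folder $folder, SetaPDF_Merger_Collection_Folder $newParent)
{
    $folder->setParent($newParent);
}
```

The target could, for example, be resolved by name with getSubfolder() on the root folder.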
The Collection Schema
A PDF Portfolio can be presented in a table view with individual fields. By default a reader application will use the standard fields available in a file specification.
By using a schema it is possible to define all fields and their types individually. A schema can reference standard file-related fields such as the filename or its description, but it also allows you to define completely individual fields. These fields refer to data in a collection item, whose dictionary can be assigned to a file specification and whose instance can be assigned to a folder instance.
A schema is defined through a SetaPDF_Merger_Collection_Schema instance, which can be obtained as follows:
$schema = $collection->getSchema();
The schema class offers various methods that allow you to interact with the schema and its fields:
addField()
Add a field to the schema.
addFields()
Adds several fields to the schema.
getCollection()
Get the collection instance.
getField()
Get a field instance by its name.
getFields()
Get all field instances.
hasField()
Check if a field exists.
removeField()
Remove a field from the schema.
A field is represented by a SetaPDF_Merger_Collection_Schema_Field instance. Most of the above methods allow you to just pass strings and constants, while the field instances are created internally. The following example shows some ways to create fields with default or individual data:
As you may have noticed, the constants prefixed with DATA_* refer to data that is available by default in a file specification or folder. The constants prefixed with TYPE_* define a data type. All available constants are:
Constant defining the compressed size property
Constant defining the creation date property
Constant defining the description property
Constant defining the file name property
Constant defining the modification date property
Constant defining the size property
Constant defining a date data type
Constant defining a number type
Constant defining a string data type (value needs to be in PdfDocEncoding or UTF-16BE)
Collection Items
Collection items are used to assign data described by the collection schema to a particular file or folder. The data or a collection item instance can be passed as the $collectionItem parameter of the addFile() or addFolder() method of either the collection or a folder instance.
A collection item instance is a wrapper class around the collection item dictionary and optionally validates the data against a given collection schema:
// create a collection item
$collectionItem = new SetaPDF_Merger_Collection_Item();
// add the company value
$collectionItem->setEntry('company', SetaPDF_Core_Encoding::toPdfString('tektown'), $schema);
// ignore the schema
$collectionItem->setEntry(
    'secret',
    'value',
    SetaPDF_Merger_Collection_Schema::TYPE_STRING
);
// add several entries
$collectionItem->setData([
    'company' => SetaPDF_Core_Encoding::toPdfString('lenstown'),
    'order' => 5
], $schema);
If you set the collection item data through the addFile() or addFolder() methods, you can pass an instance of a collection item or an array, which will be forwarded to the setData() method of a newly created instance.
Notice that string values need to be passed in PdfDocEncoding or UTF-16BE. You can use the SetaPDF_Core_Encoding::toPdfString() method to convert them from your local encoding.
Various Settings
Initial View and Document
A PDF Portfolio can be viewed in different modes. The initial mode can be defined with the setView() method of the collection instance:
Description
Set the initial view.
Parameters
- $view : string
A view constant.
Exceptions
Throws SetaPDF_Core_SecHandler_Exception
Throws SetaPDF_Core_Type_Exception
The initial document that should be presented can be set with the setInitialDocument() method:
Description
Set the name of the document, that should be initially presented.
If you want to open a document, that is located in a subfolder, you will need to pass the id of the subfolder as a prefix to the name:
$collection->setInitialDocument('<' . $folder->getId() . '>' . $name);
Parameters
- $name : string
Exceptions
Throws SetaPDF_Core_SecHandler_Exception
Throws SetaPDF_Core_Type_Exception
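The prefix rule described above can be captured in a small helper. The function below is a hypothetical convenience written in plain PHP and is not part of the library; it only builds the string that is then passed to setInitialDocument():

```php
<?php
/**
 * Builds the name expected by setInitialDocument().
 *
 * @param string $name The name of the document.
 * @param int|string|null $folderId The id of the subfolder, or null if the
 *                                  document is not located in a subfolder.
 * @return string
 */
function buildInitialDocumentName($name, $folderId = null)
{
    if ($folderId === null) {
        return $name;
    }

    // Files in subfolders are addressed as "<folder id>name".
    return '<' . $folderId . '>' . $name;
}

echo buildInitialDocumentName('report.pdf', 3), "\n"; // prints "<3>report.pdf"
```

With a folder instance at hand, the call would look like $collection->setInitialDocument(buildInitialDocumentName($name, $folder->getId()));.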
Configure the Splitter Bar
The splitter bar can be configured through the following methods of the collection instance:
getSplitterDirection()
Get the orientation of the splitter bar.
getSplitterPosition()
Get the initial position of the splitter bar.
setSplitterDirection()
Set the orientation of the splitter bar.
setSplitterPosition()
Set the initial position of the splitter bar.
Sorting
The sorting can be defined with the setSort() method:
Description
Set the data that specifies the order in which the collection shall be sorted in the user interface.
Parameters
- $sort : array
The key is the field name, while the value defines the direction. Valid key names are field names defined in the schema or SetaPDF_Merger_Collection_Schema::DATA_* constants.
Exceptions
Throws SetaPDF_Core_SecHandler_Exception
Throws SetaPDF_Core_Type_Exception
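A sketch of a sort definition that combines a standard field with an individual one. The constant names DATA_FILE_NAME, SORT_ASC, and SORT_DESC are assumptions and should be verified against the API reference; 'order' stands for a hypothetical field that has been added to the schema beforehand:

```php
<?php
/**
 * Sorts a portfolio by filename (ascending), then by a custom field (descending).
 *
 * @param SetaPDF_Merger_Collection $collection
 */
function sortByFilenameThenOrder(SetaPDF_Merger_Collection $collection)
{
    $collection->setSort([
        // Key: field name or DATA_* constant, value: direction (assumed constants).
        SetaPDF_Merger_Collection_Schema::DATA_FILE_NAME => SetaPDF_Merger_Collection::SORT_ASC,
        'order' => SetaPDF_Merger_Collection::SORT_DESC,
    ]);
}
```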