Metadata

Reading and writing of document information and XMP metadata

Introduction

Metadata such as the document's title, author, and creation and modification dates may be included in a PDF document. This general information is intended to assist in cataloguing and searching for documents in external databases.

This information is stored in the document's information dictionary and/or in the document metadata stream as XML (XMP).

Get an Instance of the Info Object

The SetaPDF-Core component offers easy access to this information through the SetaPDF_Core_Document_Info class. An instance can be obtained from a document instance via its getInfo() method:

PHP
$document = new \SetaPDF_Core_Document();
$info = $document->getInfo();

Set and Get Common Entries

The SetaPDF_Core_Document_Info class offers getter and setter methods for all common entries defined in the PDF specification:

__construct()

The constructor.

cleanUp()

Release memory.

getAuthor()

Get the name of the person who created the document.

getCreationDate()

Get the date and time the document was created.

getCreator()

Get the name of the product that created the original document from which it was converted.

getDictionary()

Get or create the info dictionary.

getKeywords()

Get keywords associated with the document.

getModDate()

Get the date and time the document was most recently modified.

getProducer()

Get the name of the product that converted the original document to PDF.

getSubject()

Get the subject of the document.

getTitle()

Get the document's title.

getTrapped()

Get information on whether the document has been modified to include trapping information.

getXmp()

Get the XMP helper instance.

setAuthor()

Set the name of the person who created the document.

setCreationDate()

Set the date and time the document was created.

setCreator()

Set the name of the product that created the original document from which it was converted.

setKeywords()

Set keywords associated with the document.

setModDate()

Set the date and time the document was most recently modified.

setProducer()

Set the name of the product that converted the original document to PDF.

setSubject()

Set the subject of the document.

setTitle()

Set the document's title.

setTrapped()

Set information on whether the document has been modified to include trapping information.

PHP
// get and set a string value in default encoding
$title = $info->getTitle();
$info->setTitle('NEW: ' . $title);

// get the modification date
$modDate = $info->getModDate(false);
if ($modDate !== null) { // it's not a mandatory entry
    echo $modDate->getAsDateTime()->format('d.m.Y H:i'); // "i" = minutes ("m" would print the month)
}

// set a date value by a DateTime instance
$info->setCreationDate(new DateTime('tomorrow'));

// set the trapped entry
$info->setTrapped(SetaPDF_Core_Document_Info::TRAPPED_TRUE);
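When a date is handled as a string rather than as a date object, it is expected to follow the PDF date format D:YYYYMMDDHHmmSSOHH'mm'. The following is only a rough sketch in plain PHP that does not use SetaPDF at all: parsePdfDate() is a hypothetical helper (not part of the library) that converts such a string into a DateTimeImmutable. It also shows the formatting pattern for minutes, which is "i" in PHP, while "m" stands for the month.

```php
<?php
/**
 * Hypothetical helper (not part of SetaPDF): parse a PDF date string such as
 * "D:20240131154500+01'00'" into a DateTimeImmutable.
 * Returns null if the string does not match the expected pattern.
 * Out-of-range values (e.g. month 13) make the DateTimeImmutable constructor throw.
 */
function parsePdfDate(string $value): ?DateTimeImmutable
{
    $pattern = '/^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?'
             . '(?:([+\-Z])(?:(\d{2})\'?(?:(\d{2})\'?)?)?)?$/';
    if (preg_match($pattern, $value, $m) !== 1) {
        return null;
    }

    // Optional groups that did not match are missing or empty: use defaults.
    $part = static function (int $index, string $default) use ($m): string {
        return (isset($m[$index]) && $m[$index] !== '') ? $m[$index] : $default;
    };

    // A missing time zone is treated as UTC in this sketch.
    $sign = $part(7, 'Z');
    $offset = $sign === 'Z'
        ? '+00:00'
        : $sign . $part(8, '00') . ':' . $part(9, '00');

    $iso = sprintf(
        '%s-%s-%sT%s:%s:%s%s',
        $m[1], $part(2, '01'), $part(3, '01'),
        $part(4, '00'), $part(5, '00'), $part(6, '00'),
        $offset
    );

    return new DateTimeImmutable($iso);
}

$date = parsePdfDate("D:20240131154500+01'00'");
echo $date->format('d.m.Y H:i'), "\n"; // prints 31.01.2024 15:45
```

Missing trailing parts fall back to their defaults, so "D:2024" is read as 1 January 2024, 00:00 UTC.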

Set and Get Custom Entries

It is also possible to add custom metadata to a document. These values are stored in the document information dictionary as well and must be text strings:

getAllCustomMetadata()

Get all custom metadata.

getCustomMetadata()

Get a custom metadata value.

setCustomMetadata()

Set custom metadata.

PHP
// set a custom entry in default UTF-8 encoding
$info->setCustomMetadata('SetaDocumentId', '1234');

// get a custom entry in a specific encoding
$setaDocumentId = $info->getCustomMetadata('SetaDocumentId', 'UTF-16BE');

// get all custom entries
$allCustomMetadata = $info->getAllCustomMetadata();
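To make the encoding parameter more tangible, the following sketch shows how one and the same text value differs between UTF-8 and UTF-16BE. It uses only PHP's mbstring extension and no SetaPDF calls; the sample value is made up for illustration.

```php
<?php
// The same value in two encodings (plain PHP, no SetaPDF involved).
$utf8 = "M\u{00FC}ller"; // "Müller" in UTF-8
$utf16be = mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');

echo strlen($utf8), "\n";     // prints 7: "ü" takes two bytes in UTF-8
echo strlen($utf16be), "\n";  // prints 12: each of these characters takes two bytes
echo bin2hex($utf16be), "\n"; // prints 004d00fc006c006c00650072

// Converting back yields the original string.
$roundTrip = mb_convert_encoding($utf16be, 'UTF-8', 'UTF-16BE');
var_dump($roundTrip === $utf8); // bool(true)
```

A value requested in one encoding should therefore not be compared byte by byte with a string held in another encoding.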

Get and Set Several Values at a Time

To get or set all metadata at once, the class offers the getAll() and setAll() methods, which allow you to receive and pass PHP arrays:

getAll()

Get all data from the info dictionary.

getAllCustomMetadata()

Get all custom metadata.

setAll()

Set all data via an array parameter.

PHP
$metadata = $info->getAll();
$metadata['Title'] = 'New Title';
$info->setAll($metadata);
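When several values change at once, the read, modify and write pattern can be reduced to a single array merge. The sketch below uses a plain array as a stand-in for the result of getAll(); the keys other than Title and all values are assumptions made for illustration.

```php
<?php
// Stand-in for the array returned by $info->getAll() (keys are assumptions).
$metadata = ['Title' => 'Old Title', 'Author' => 'Jane Doe'];

// The entries that should be changed or added.
$changes = ['Title' => 'New Title', 'Keywords' => 'invoice, 2024'];

// array_replace() overwrites matching keys and keeps all untouched entries.
$merged = array_replace($metadata, $changes);

print_r($merged);
// With a real info instance, $merged would then be passed to $info->setAll().
```

Entries that are not mentioned in $changes, such as Author here, survive the merge unchanged.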

The Document Metadata Stream

The contents of the document metadata stream shall be the metadata represented in Extensible Markup Language (XML).

The format of the XML package is defined as part of the Extensible Metadata Platform (XMP) framework.

You can get access to a DOMDocument instance of the XMP metadata package this way:

PHP
$metadata = $info->getMetadata();
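Once a DOMDocument is available, its values can be read with PHP's DOM and XPath classes. The following sketch is self-contained: it loads a small, hand-written XMP package (a made-up sample, not output of SetaPDF) in place of the instance returned above and reads the dc:title value.

```php
<?php
// Hand-written sample package; a real one would come from the document.
$xml = <<<'XML'
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about="" xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>
        <rdf:Alt>
          <rdf:li xml:lang="x-default">New Title</rdf:li>
        </rdf:Alt>
      </dc:title>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
XML;

$xmp = new DOMDocument();
$xmp->loadXML($xml);

// XPath needs the namespaces to be registered under a prefix of our choice.
$xpath = new DOMXPath($xmp);
$xpath->registerNamespace('rdf', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#');
$xpath->registerNamespace('dc', 'http://purl.org/dc/elements/1.1/');

// Read the first language alternative of the title as a string.
$title = $xpath->evaluate('string(//dc:title/rdf:Alt/rdf:li[1])');
echo $title, "\n"; // prints New Title
```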

If you need the raw stream of the document metadata instead, it is available through the document catalog instance:

PHP
$metadata = $document->getCatalog()->getMetadata();

Synchronize Document Information with Metadata Stream

The SetaPDF-Core component can automatically synchronize changes made to the document information data with the XMP data in the document metadata stream.

This feature has to be enabled first. After all changes are done, the data has to be synced with a single method call:

PHP
// enable the synchronization
$info->setSyncMetadata(true);

// change some metadata
$info->setTitle('New Title');
$info->setCustomMetadata('Client Name', 'John Doe');

// pass the new XMP package to the metadata stream
$info->syncMetadata();
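Conceptually, synchronizing means that each entry of the information dictionary is written to a corresponding XMP property. The sketch below shows one commonly used correspondence, as found in the PDF/A conventions. It is an assumption for illustration and not taken from SetaPDF's implementation; mapInfoToXmp() is a hypothetical helper, and how the library treats custom entries is not covered here.

```php
<?php
// Assumed mapping (common PDF/A convention, not taken from SetaPDF's source).
const INFO_TO_XMP = [
    'Title'        => ['http://purl.org/dc/elements/1.1/', 'dc:title'],
    'Author'       => ['http://purl.org/dc/elements/1.1/', 'dc:creator'],
    'Subject'      => ['http://purl.org/dc/elements/1.1/', 'dc:description'],
    'Keywords'     => ['http://ns.adobe.com/pdf/1.3/', 'pdf:Keywords'],
    'Creator'      => ['http://ns.adobe.com/xap/1.0/', 'xmp:CreatorTool'],
    'Producer'     => ['http://ns.adobe.com/pdf/1.3/', 'pdf:Producer'],
    'CreationDate' => ['http://ns.adobe.com/xap/1.0/', 'xmp:CreateDate'],
    'ModDate'      => ['http://ns.adobe.com/xap/1.0/', 'xmp:ModifyDate'],
];

/**
 * Hypothetical helper: translate info dictionary entries into XMP properties.
 * Keys without a known counterpart (e.g. custom entries) are skipped here.
 */
function mapInfoToXmp(array $info): array
{
    $properties = [];
    foreach ($info as $key => $value) {
        if (isset(INFO_TO_XMP[$key])) {
            [$namespace, $name] = INFO_TO_XMP[$key];
            $properties[$name] = ['namespace' => $namespace, 'value' => $value];
        }
    }

    return $properties;
}

$result = mapInfoToXmp(['Title' => 'New Title', 'Client Name' => 'John Doe']);
print_r(array_keys($result)); // only "dc:title" is mapped in this sketch
```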

Add or Update the XMP Data Manually

To add individual data to the XMP package, the updateXmp() method can be used:

updateXmp()

Updates a single field in the XMP package.

PHP
// Copyrights
$info->updateXmp('http://ns.adobe.com/xap/1.0/rights/', 'Marked', 'True');
$info->updateXmp('http://purl.org/dc/elements/1.1/', 'rights', "Test\nNotice");
$info->updateXmp('http://ns.adobe.com/xap/1.0/rights/', 'WebStatement', 'http://www.setasign.com');

// Author Title
$info->updateXmp('http://ns.adobe.com/photoshop/1.0/', 'AuthorsPosition', 'Dr. Author Title');

// Description Writer
$info->updateXmp('http://ns.adobe.com/photoshop/1.0/', 'CaptionWriter', 'The description of the writer');

// Write the XMP package to the document metadata stream
$info->syncMetadata();
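The three arguments used above (namespace, property name and value) correspond to a namespaced element in the XMP package. As a rough illustration of what updating a single simple property means at the XML level, the following sketch uses PHP's DOM extension only. setSimpleXmpProperty() is a hypothetical helper and does not claim to reproduce how updateXmp() works internally.

```php
<?php
/**
 * Hypothetical helper: set a simple text property on the first rdf:Description
 * element of an XMP package, replacing an existing property of the same name.
 */
function setSimpleXmpProperty(
    DOMDocument $xmp,
    string $namespace,
    string $qualifiedName,
    string $value
): void {
    $rdfNs = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#';
    $description = $xmp->getElementsByTagNameNS($rdfNs, 'Description')->item(0);
    if ($description === null) {
        throw new RuntimeException('The XMP package has no rdf:Description element.');
    }

    // "xmpRights:Marked" -> "Marked"; names without a prefix are used as-is.
    $pos = strpos($qualifiedName, ':');
    $localName = $pos === false ? $qualifiedName : substr($qualifiedName, $pos + 1);

    // Remove earlier copies of the property so that the call acts as an update.
    $existing = iterator_to_array(
        $description->getElementsByTagNameNS($namespace, $localName)
    );
    foreach ($existing as $old) {
        $old->parentNode->removeChild($old);
    }

    $node = $xmp->createElementNS($namespace, $qualifiedName);
    $node->appendChild($xmp->createTextNode($value));
    $description->appendChild($node);
}

// Minimal, hand-written package used as test input.
$xmp = new DOMDocument();
$xmp->loadXML(
    '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    . '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    . '<rdf:Description rdf:about=""/>'
    . '</rdf:RDF></x:xmpmeta>'
);

$rightsNs = 'http://ns.adobe.com/xap/1.0/rights/';
setSimpleXmpProperty($xmp, $rightsNs, 'xmpRights:Marked', 'False');
setSimpleXmpProperty($xmp, $rightsNs, 'xmpRights:Marked', 'True');

$nodes = $xmp->getElementsByTagNameNS($rightsNs, 'Marked');
echo $nodes->length, ' ', $nodes->item(0)->textContent, "\n"; // prints 1 True
```

Calling the helper twice with the same property leaves exactly one element behind, which is the behaviour one would expect from an update operation.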