The Document Class: Creating and loading PDF documents

Introduction

In SetaPDF 2, a PDF document is represented by a SetaPDF_Core_Document instance.

All individual components use such instances to work with PDF documents. For example, an instance can be passed from component to component without saving a temporary version of the document.

A document instance also exposes a lot of low-level functionality, as described in the following chapters of this manual.

Creating Instances

The simplest way to create an instance is to call the constructor without arguments:

PHP
$document = new \SetaPDF_Core_Document();

Creating a document instance this way is rare, because most SetaPDF components work with existing documents and need to write the resulting document to a destination, which is done with a writer instance. A writer can be passed to the constructor or set with a setter method:

PHP
// create a writer
$writer = new \SetaPDF_Core_Writer_File('path/to/result.pdf');

// pass it in the constructor
$document = new \SetaPDF_Core_Document($writer);

// or pass it with the setWriter() method
$document = new \SetaPDF_Core_Document();
$document->setWriter($writer);

Loading an Existing Document

The most common task for SetaPDF components is to open an existing PDF document. This can be done with the static load() method of the SetaPDF_Core_Document class:

Description
public static SetaPDF_Core_Document::load (
SetaPDF_Core_Reader_ReaderInterface $reader [, SetaPDF_Core_Writer_WriterInterface $writer = null [, string $className = 'SetaPDF_Core_Document' ]]
): SetaPDF_Core_Document

Creates an instance of a document based on an existing PDF.

Parameters
$reader : SetaPDF_Core_Reader_ReaderInterface

A reader instance

$writer : SetaPDF_Core_Writer_WriterInterface

A writer instance

$className : string

The class name to instantiate

Return Values

Returns a SetaPDF_Core_Document instance

Exceptions

Throws SetaPDF_Core_Parser_CrossReferenceTable_Exception, Exception

To open a PDF file for reading and writing, the following PHP code is common:

PHP
$reader = new \SetaPDF_Core_Reader_File('path/to/original.pdf');
$writer = new \SetaPDF_Core_Writer_File('path/to/new.pdf');
$document = \SetaPDF_Core_Document::load($reader, $writer);
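
Because load() is documented to throw exceptions, a caller may want to guard the call. The following is only a minimal sketch under assumptions: the file paths are placeholders, and logging the message and exiting is just one way an application might react to a failure.

PHP
try {
    // create the reader and writer inside the try block as well,
    // so that any failure while accessing the files is caught, too
    $reader = new \SetaPDF_Core_Reader_File('path/to/original.pdf');
    $writer = new \SetaPDF_Core_Writer_File('path/to/new.pdf');
    $document = \SetaPDF_Core_Document::load($reader, $writer);
} catch (\SetaPDF_Core_Parser_CrossReferenceTable_Exception $e) {
    // the cross-reference information of the source file could not be parsed
    error_log('Cross-reference error: ' . $e->getMessage());
    exit(1);
} catch (\Exception $e) {
    // any other failure while loading the document
    error_log('Unable to load the document: ' . $e->getMessage());
    exit(1);
}

The more specific exception is caught first so that a damaged cross-reference table can be reported separately from other errors.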

Two additional helper methods create the reader instance internally: loadByFilename() creates a document instance from a filename, and loadByString() creates one from a string:

PHP
$document = \SetaPDF_Core_Document::loadByFilename('path/to/original.pdf');
// or
$document = \SetaPDF_Core_Document::loadByString(file_get_contents('path/to/original.pdf'));

If the PDF document is available in the filesystem, you should always use the file reader. Parsing the document from a string variable needs considerably more memory and is slower because of the many string operations involved.

Loading Encrypted/Protected Documents

The SetaPDF-Core component natively supports reading and writing PDF documents that are encrypted with standard security (protected with an owner password and, optionally, a user password) or with public-key security.

An example of how to check for and authenticate against a security handler is explained here.

Saving a Document

A document instance can be saved by calling its save() method. This writes the current document structure, or the changes made to it, to the writer instance:

Description
public SetaPDF_Core_Document::save (
[ boolean|integer $method = true ]
): SetaPDF_Core_Document

Saves the document.

The PDF format offers a way to add changes to a document by simply appending them to the end of the file. This mechanism is called an incremental update and has the advantage that it is very fast, because only changed objects have to be written. It is the default behavior when calling the save() method. The drawback is that the document can easily be reverted to its previous state by cutting off the bytes of the last revision.

The parameter of the save() method allows you to define that the document should be rebuilt from scratch by resolving the complete object structure. Just pass SetaPDF_Core_Document::SAVE_METHOD_REWRITE to it. This task is very performance intensive, because the complete document has to be parsed, interpreted and rewritten.

Additionally, it is possible to rewrite the whole document with all available objects by passing SetaPDF_Core_Document::SAVE_METHOD_REWRITE_ALL. The benefit of this approach is that it keeps compressed object streams intact. The disadvantage is that unused objects may be copied and written, too.

Parameters
$method : boolean|integer

Update or rewrite the document

Exceptions

Throws SetaPDF_Core_Document_ObjectNotDefinedException

Throws SetaPDF_Core_Document_ObjectNotFoundException

Throws SetaPDF_Core_Exception

Throws SetaPDF_Core_Parser_CrossReferenceTable_Exception

Throws SetaPDF_Core_Parser_Exception

Throws SetaPDF_Core_SecHandler_Exception

Throws SetaPDF_Core_Type_Exception

Throws SetaPDF_Core_Type_IndirectReference_Exception

Throws SetaPDF_Exception

Throws SetaPDF_Exception_NotImplemented

Throws BadMethodCallException
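
As a sketch of how the save modes described above might be selected: the file paths are placeholders, and the three save() calls are meant as alternatives rather than as a sequence, which is why two of them are commented out.

PHP
$reader = new \SetaPDF_Core_Reader_File('path/to/original.pdf');
$writer = new \SetaPDF_Core_Writer_File('path/to/new.pdf');
$document = \SetaPDF_Core_Document::load($reader, $writer);

// default: append only the changed objects as an incremental update
$document->save();

// alternative: rebuild the document from scratch by resolving
// the complete object structure
// $document->save(\SetaPDF_Core_Document::SAVE_METHOD_REWRITE);

// alternative: rewrite the document with all available objects,
// keeping compressed object streams intact
// $document->save(\SetaPDF_Core_Document::SAVE_METHOD_REWRITE_ALL);

Which mode fits best depends on the trade-off described above: the incremental update is fast but keeps the previous revision in the file, while a rewrite costs more processing time.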

A very simple example: 

PHP
$writer = new SetaPDF_Core_Writer_Echo();
$document = new SetaPDF_Core_Document($writer);
$document->save();

The snippet above will output something like:

%PDF-1.3
%????
xref
0 1
0000000000 65535 f 
trailer
<</Size 1/ID[<b6fa53ca3c8054733520750c81b0981f><b6fa53ca3c8054733520750c81b0981f>]>>
startxref
15
%%EOF

That's a raw document structure without any content. Actually, it's not even a valid PDF document, because the catalog, pages and page objects are missing.

To create a minimal valid PDF document, the following PHP code could be used:

PHP
<?php
require_once('library/SetaPDF/Autoload.php');

$writer = new \SetaPDF_Core_Writer_Http('blank.pdf', true);
$document = new \SetaPDF_Core_Document($writer);

// let's create at least a single page
$pages = $document->getCatalog()->getPages();
$pages->create(\SetaPDF_Core_PageFormats::A4);

$document->save()->finish();

As you can see, the demo not only calls save() but additionally calls the finish() method of the SetaPDF_Core_Document instance. This call marks the document as finished and forwards the call to the writer, which will then e.g. send a header and the PDF content or close a file handle. For demonstration purposes, the finish() call was left out of the initial example; the echo writer doesn't do anything in its finish() method.

It is possible to call the save() method several times. Each save() call will write a new incremental update section to the writer if changes were detected between the save() calls.

After calling finish() on the document instance, it must not be used any further. Additional method calls may result in fatal errors. Only the optional cleanUp() method can be called afterwards.
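
The life cycle described above could look like the following sketch. The file paths are placeholders, and appending blank A4 pages is used here only to give each save() call a change to write.

PHP
$reader = new \SetaPDF_Core_Reader_File('path/to/original.pdf');
$writer = new \SetaPDF_Core_Writer_File('path/to/new.pdf');
$document = \SetaPDF_Core_Document::load($reader, $writer);

$pages = $document->getCatalog()->getPages();

// first change, written as an incremental update
$pages->create(\SetaPDF_Core_PageFormats::A4);
$document->save();

// second change, written as a further incremental update
$pages->create(\SetaPDF_Core_PageFormats::A4);
$document->save();

// mark the document as finished and let the writer complete its work
$document->finish();

// optional: the only call that is allowed after finish()
$document->cleanUp();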

Cross-Reference Information

The cross-reference table of a PDF document contains information that permits random access to objects within the file. Beginning with PDF 1.5, an alternative way of storing the cross-reference information was introduced: "Cross-Reference Streams".

By default, the SetaPDF-Core component writes the cross-reference in the standard format or, if a source document is available, in the format used by that document.

To change this behavior, the setCompressXref() method is available:

Description
public SetaPDF_Core_Document::setCompressXref (
bool $compressXref
): void

Define whether the cross-reference should be compressed or not.

By default, the SetaPDF-Core component writes the cross-reference in the standard format or, if a source document is available, in the format used by that document.

Parameters
$compressXref : bool

Pass true to enforce that the cross-reference will be compressed. Pass false to enforce a standard uncompressed cross-reference table.

Exceptions

Throws SetaPDF_Core_SecHandler_Exception

Throws SetaPDF_Core_Type_Exception

Throws BadMethodCallException

PHP
// Ensure that the file is written with a default cross-reference table
$document->setCompressXref(false);