The Document Class
Creating and loading PDF documents
Introduction
In SetaPDF 2 a PDF document is represented by a SetaPDF_Core_Document instance.
All individual components use such instances to work with PDF documents. An instance can, for example, be forwarded from component to component without saving a temporary version of the document.
A great deal of low-level functionality is available through a document instance, as described in the following chapters of this manual.
Creating Instances
The simplest way to create an instance is to call the constructor without arguments:
$document = new \SetaPDF_Core_Document();
Creating a document instance this way is rare, because most SetaPDF components work with existing documents and need to write the resulting document to a destination, which is done with a writer instance. A writer can be passed to the constructor or set with a setter method:
// create a writer
$writer = new \SetaPDF_Core_Writer_File('path/to/result.pdf');
// pass it in the constructor
$document = new \SetaPDF_Core_Document($writer);
// or pass it with the setWriter() method
$document = new \SetaPDF_Core_Document();
$document->setWriter($writer);
Loading an Existing Document
The most common task for SetaPDF components is to open an existing PDF document. This could be done with the static load() method of the SetaPDF_Core_Document class:
Description
Creates an instance of a document based on an existing PDF.
Parameters
- $reader : SetaPDF_Core_Reader_ReaderInterface
A reader instance
- $writer : SetaPDF_Core_Writer_WriterInterface
A writer instance
- $className : string
The class name to initiate
Return Values
Returns a SetaPDF_Core_Document instance
Exceptions
Throws SetaPDF_Core_Parser_CrossReferenceTable_Exception, Exception
To open a PDF file for reading and writing, the following PHP code is common:
$reader = new \SetaPDF_Core_Reader_File('path/to/original.pdf');
$writer = new \SetaPDF_Core_Writer_File('path/to/new.pdf');
$document = \SetaPDF_Core_Document::load($reader, $writer);
There are two additional helper methods that create a reader instance internally: loadByFilename() creates an instance from a filename and loadByString() creates one from a string:
$document = \SetaPDF_Core_Document::loadByFilename('path/to/original.pdf');
// or
$document = \SetaPDF_Core_Document::loadByString(file_get_contents('path/to/original.pdf'));
If the PDF document is available in the filesystem, you should always use the file reader. Parsing the document in memory through a string variable needs much more memory and is slower because of the many string operations involved.
Loading Encrypted/Protected Documents
The SetaPDF-Core component natively supports reading and writing PDF documents that are encrypted with PDF standard security (protected with an owner password and, optionally, a user password) or with public-key security.
An example of how to check for and authenticate against a security handler is given in the chapter Standard and Public Key Encryption.
Saving a Document
A document instance can be saved by simply calling the save() method of the SetaPDF_Core_Document instance. This will write the current document structure or changes to the writer instance:
Description
Saves the document.
The PDF format offers a way to add changes to a document by simply appending them to the end of the file. This method is called incremental update and has the advantage that it is very fast, because only changed objects have to be written. It is the default behavior of the save() method. The drawback is that the document can easily be reverted to its previous state by cutting off the bytes of the last revision.
The parameter of the save() method allows you to define that the document should be rebuilt from scratch by resolving the complete object structure. Just pass SetaPDF_Core_Document::SAVE_METHOD_REWRITE to it. This task is very performance intensive, because the complete document has to be parsed, interpreted and rewritten.
Additionally, it is possible to rewrite the whole document with all available objects by passing SetaPDF_Core_Document::SAVE_METHOD_REWRITE_ALL. The benefit of this solution is that it will keep compressed object streams intact. The disadvantage is that unused objects may be copied/written, too.
Parameters
- $method : boolean|integer
Update or rewrite the document
Exceptions
Throws SetaPDF_Core_Document_ObjectNotDefinedException
Throws SetaPDF_Core_Document_ObjectNotFoundException
Throws SetaPDF_Core_Exception
Throws SetaPDF_Core_Parser_CrossReferenceTable_Exception
Throws SetaPDF_Core_Parser_Exception
Throws SetaPDF_Core_SecHandler_Exception
Throws SetaPDF_Core_Type_Exception
Throws SetaPDF_Core_Type_IndirectReference_Exception
Throws SetaPDF_Exception
Throws SetaPDF_Exception_NotImplemented
Throws BadMethodCallException
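As a sketch of how a save method might be selected in practice, the helper below loads a file and writes a fully rewritten copy. The function name rewriteCopy() and both path parameters are hypothetical, and the sketch assumes that SetaPDF-Core has already been loaded, for example through its autoloader.

```php
<?php
// Sketch only: assumes SetaPDF-Core is already loaded (e.g. via its autoloader).

/**
 * Load $sourcePath and write a copy rebuilt from scratch to $targetPath.
 * rewriteCopy() and its parameters are hypothetical names for this example.
 */
function rewriteCopy(string $sourcePath, string $targetPath): void
{
    $reader = new \SetaPDF_Core_Reader_File($sourcePath);
    $writer = new \SetaPDF_Core_Writer_File($targetPath);
    $document = \SetaPDF_Core_Document::load($reader, $writer);

    // Resolve the complete object structure instead of appending an
    // incremental update (the default behavior of save()).
    $document->save(\SetaPDF_Core_Document::SAVE_METHOD_REWRITE);

    // Let the writer complete its work, e.g. close the file handle.
    $document->finish();
}
```

Passing SetaPDF_Core_Document::SAVE_METHOD_REWRITE_ALL instead would keep compressed object streams intact, at the cost of possibly copying unused objects.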
A very simple example:
$writer = new SetaPDF_Core_Writer_Echo();
$document = new SetaPDF_Core_Document($writer);
$document->save();
The snippet above will output something like:
%PDF-1.3
%????
xref
0 1
0000000000 65535 f
trailer
<</Size 1/ID[<b6fa53ca3c8054733520750c81b0981f><b6fa53ca3c8054733520750c81b0981f>]>>
startxref
15
%%EOF
That's a raw document structure without any content. Strictly speaking it is not even a valid PDF document, because the catalog, the pages object and a page object are missing.
To create a minimal valid PDF document, the following PHP code could be used:
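A minimal sketch of such a script follows. It assumes that SetaPDF-Core is already loaded and that a blank page can be added through the create() method of the catalog's pages helper with the SetaPDF_Core_PageFormats::A4 format. The function name writeMinimalDocument() is hypothetical.

```php
<?php
// Sketch only: assumes SetaPDF-Core is already loaded (e.g. via its autoloader).

/**
 * Write a PDF with a single blank page to the given writer.
 * writeMinimalDocument() is a hypothetical name used for this example.
 */
function writeMinimalDocument(\SetaPDF_Core_Writer_WriterInterface $writer): void
{
    $document = new \SetaPDF_Core_Document($writer);

    // Assumption: creating a page through the pages helper also sets up
    // the catalog and pages objects that a valid document requires.
    $document->getCatalog()->getPages()->create(\SetaPDF_Core_PageFormats::A4);

    // Write the document structure to the writer ...
    $document->save();
    // ... and tell the document and the writer that the work is done.
    $document->finish();
}

// Usage (the target path is only an example):
// writeMinimalDocument(new \SetaPDF_Core_Writer_File('path/to/result.pdf'));
```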
As you can see, the demo not only calls save() but additionally calls the finish() method of the SetaPDF_Core_Document instance. This call marks the document as finished and is forwarded to the writer, which will then, for example, send a header and the PDF content or close a file handle. For demonstration purposes the finish() call was left out of the initial example, because the echo writer doesn't do anything in its finish() method.
It is possible to call the save() method several times. Each save() call will write a new incremental update section to the writer if changes were detected since the previous save() call.
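To make the effect of incremental updates visible at the byte level, the following sketch works on a PDF held in a string and uses plain PHP only. It assumes that every revision ends with a %%EOF marker and that this byte sequence does not occur inside stream data, which would otherwise be miscounted. The function names countRevisions() and dropLastRevision() are hypothetical.

```php
<?php
/**
 * Count the revisions of a PDF by counting its end-of-file markers.
 * Assumption: "%%EOF" never appears inside stream data.
 */
function countRevisions(string $pdf): int
{
    return substr_count($pdf, '%%EOF');
}

/**
 * Cut off the bytes of the last revision, which reverts the document
 * to its previous state. A document with a single revision (or none)
 * is returned unchanged.
 */
function dropLastRevision(string $pdf): string
{
    $marker = '%%EOF';

    $last = strrpos($pdf, $marker);
    if ($last === false) {
        return $pdf;
    }

    // Look for the marker that closes the revision before the last one.
    $previous = strrpos(substr($pdf, 0, $last), $marker);
    if ($previous === false) {
        return $pdf;
    }

    return substr($pdf, 0, $previous + strlen($marker)) . "\n";
}
```

This is what makes incremental updates easy to undo, and it is one reason to choose a rewrite when earlier revisions should not remain in the file.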
After calling finish() on the document instance, it must not be used any further. Additional method calls may result in fatal errors. Only the optional cleanUp() method can be called afterwards.
Cross-Reference Information
The cross-reference table of a PDF document contains the information that permits random access to object positions within the file. PDF 1.5 introduced an alternative way of storing the cross-reference information: "Cross-Reference Streams".
By default, the SetaPDF-Core component writes the cross-reference in the standard format or, if a source document is available, in the format used by that document.
To change this behavior, the setCompressXref() method is available:
Description
Define whether the cross-reference should be compressed or not.
Parameters
- $compressXref : bool
Pass true to enforce that the cross-reference will be compressed. Pass false to enforce a standard uncompressed cross-reference table.
Exceptions
Throws SetaPDF_Core_SecHandler_Exception
Throws SetaPDF_Core_Type_Exception
Throws BadMethodCallException
// Ensure that the file is written with a default cross-reference table
$document->setCompressXref(false);
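To check which cross-reference format a written file actually uses, the following plain PHP sketch inspects the bytes that the last startxref entry points to. A classic table starts with the keyword xref, while a cross-reference stream is an ordinary indirect object. The function name detectXrefStyle() is hypothetical, and the sketch ignores hybrid files that contain both forms.

```php
<?php
/**
 * Report how the newest revision of a PDF stores its cross-reference data.
 * Returns 'table', 'stream' or 'unknown'.
 */
function detectXrefStyle(string $pdf): string
{
    // The last "startxref" keyword belongs to the newest revision.
    $pos = strrpos($pdf, 'startxref');
    if ($pos === false) {
        return 'unknown';
    }

    // The byte offset of the cross-reference section follows the keyword.
    if (!preg_match('/\Gstartxref\s+(\d+)/', $pdf, $match, 0, $pos)) {
        return 'unknown';
    }
    $target = (string) substr($pdf, (int) $match[1], 32);

    // Standard format: the section starts with the "xref" keyword.
    if (strncmp($target, 'xref', 4) === 0) {
        return 'table';
    }

    // Cross-reference stream: an indirect object such as "12 0 obj".
    if (preg_match('/^\d+\s+\d+\s+obj/', $target)) {
        return 'stream';
    }

    return 'unknown';
}
```

Applied to the result of file_get_contents() on a file written after setCompressXref(false), such a check would be expected to report a table, although this has not been verified against the library here.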