The Document Class
Creating and loading PDF documents
In SetaPDF 2 a PDF document is represented by an instance of the SetaPDF_Core_Document class.
All SetaPDF components use such instances to work with PDF documents. An instance can, for example, be passed from one component to another without saving a temporary version of the document.
Many low-level functionalities are available through a document instance, as described in the following chapters of this manual.
The simplest way to create an instance is to instantiate the class directly:
$document = new SetaPDF_Core_Document();
Creating a document instance this way is rarely needed, because most SetaPDF components work with existing documents. In any case, the resulting document has to be written to a destination, which is done through a writer instance. A writer can be passed to the constructor or set with a setter method:
// create a writer
$writer = new SetaPDF_Core_Writer_File('path/to/result.pdf');
// pass it in the constructor
$document = new SetaPDF_Core_Document($writer);
// or pass it with the setWriter() method
$document = new SetaPDF_Core_Document();
$document->setWriter($writer);
Loading an Existing Document
The most common task for SetaPDF components is to open an existing PDF document. This is done with the static load() method of the SetaPDF_Core_Document class:
Creates an instance of a document based on an existing PDF.
- $reader : SetaPDF_Core_Reader_ReaderInterface
A reader instance
- $writer : SetaPDF_Core_Writer_WriterInterface
A writer instance
- $className : string
The name of the class to instantiate
Returns a SetaPDF_Core_Document instance
The following PHP code is a common way to open a PDF file for reading and writing:
$reader = new SetaPDF_Core_Reader_File('path/to/original.pdf');
$writer = new SetaPDF_Core_Writer_File('path/to/new.pdf');
$document = SetaPDF_Core_Document::load($reader, $writer);
As there are actually only two reader classes, the document class offers helper methods to create an instance from a filename via loadByFilename() or from a string via loadByString():
$document = SetaPDF_Core_Document::loadByFilename('path/to/original.pdf');
// or
$document = SetaPDF_Core_Document::loadByString(file_get_contents('path/to/original.pdf'));
If the PDF document is available in the file system, always use the file reader! Parsing the document in memory from a string variable needs much more memory and is slower due to the many string operations involved.
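The advice above can be wrapped in a small helper that only falls back to in-memory parsing when no file is available. This is a minimal sketch: the function names isExistingFilePath() and loadDocumentPreferFile() are hypothetical and not part of SetaPDF, and it assumes the SetaPDF-Core classes are autoloaded when the loader is actually called.

```php
<?php
// Hypothetical helper (not part of SetaPDF): decides whether a string is the
// path of an existing file. Strings that cannot be a path (empty, too long for
// the platform, or containing NUL bytes) are never passed to is_file().
function isExistingFilePath(string $source): bool
{
    if ($source === '' || strlen($source) >= PHP_MAXPATHLEN || strpos($source, "\0") !== false) {
        return false;
    }
    return is_file($source);
}

// Hypothetical loader: prefers the file reader and parses from a string only
// when the caller passes raw PDF data (e.g. fetched from a database).
function loadDocumentPreferFile(string $source): SetaPDF_Core_Document
{
    if (isExistingFilePath($source)) {
        // File reader: the recommended way for documents on disk.
        return SetaPDF_Core_Document::loadByFilename($source);
    }
    // String reader: the complete PDF is kept in memory as a PHP string.
    return SetaPDF_Core_Document::loadByString($source);
}
```

Keeping the path check in its own function makes the decision testable without loading any PDF.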
Loading Encrypted/Protected Documents
The SetaPDF-Core component natively supports reading and writing PDF documents that are encrypted with the PDF standard security handler (protected with an owner password and optionally a user password) or with public-key security.
An example of how to check for and authenticate against a security handler is given in the chapter "Standard and Public Key Encryption".
Saving a Document
A document instance is saved by calling the save() method of the SetaPDF_Core_Document instance. This writes the current document structure, or the changes made to it, to the writer instance:
Saves the document.
The PDF format allows changes to be added to a document by appending them to the end of the file. This method is called an incremental update. Its advantage is speed, because only changed objects have to be written. It is the default behavior of the save() method. The drawback is that it makes it easy to revert the document to its previous state by simply cutting off the bytes of the last revision.
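To illustrate why an incremental update is easy to revert: every revision ends with an "%%EOF" marker, so truncating the file right after the second-to-last marker restores the previous revision. The following is a naive sketch in plain PHP, not a SetaPDF API; the function name removeLastRevision() is hypothetical, and it assumes no "%%EOF" sequence appears inside stream data (a real tool would parse the cross-reference sections instead).

```php
<?php
// Naive sketch: returns the document without its last incremental update,
// or null if the data has fewer than two revisions.
function removeLastRevision(string $pdf): ?string
{
    // Ignore whitespace and line breaks after the final marker.
    $trimmed = rtrim($pdf);
    $last = strrpos($trimmed, '%%EOF');
    if ($last === false) {
        return null; // no marker at all
    }

    // Search for the previous marker in the bytes before the final one.
    $previous = strrpos(substr($trimmed, 0, $last), '%%EOF');
    if ($previous === false) {
        return null; // only a single revision
    }

    // Keep everything up to the previous marker plus its end-of-line bytes.
    $end = $previous + strlen('%%EOF');
    $end += strspn($pdf, "\r\n", $end, 2);

    return substr($pdf, 0, $end);
}
```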
The parameter of the save() method allows you to define that the document should be rebuilt from scratch by resolving the complete object structure. Just pass SetaPDF_Core_Document::SAVE_METHOD_REWRITE to it. This task is very performance intensive, because the complete document has to be parsed, interpreted and rewritten.
It is also possible to rewrite the whole document including all available objects by passing SetaPDF_Core_Document::SAVE_METHOD_REWRITE_ALL. The benefit of this solution is that compressed object streams are kept intact. The disadvantage is that unused objects may be written, too.
- $method : boolean|integer
Update or rewrite the document
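Both rewrite variants are selected through the save() parameter, while calling save() without an argument keeps the default incremental update. A minimal sketch, assuming the SetaPDF-Core classes are autoloaded; the function name rewriteDocument() and its $keepObjectStreams flag are hypothetical:

```php
<?php
// Hypothetical helper: rewrites an existing document instead of appending an
// incremental update.
function rewriteDocument(string $source, string $target, bool $keepObjectStreams = false): void
{
    $reader = new SetaPDF_Core_Reader_File($source);
    $writer = new SetaPDF_Core_Writer_File($target);
    $document = SetaPDF_Core_Document::load($reader, $writer);

    // SAVE_METHOD_REWRITE rebuilds the resolved object structure from scratch;
    // SAVE_METHOD_REWRITE_ALL writes all objects and keeps object streams intact.
    $method = $keepObjectStreams
        ? SetaPDF_Core_Document::SAVE_METHOD_REWRITE_ALL
        : SetaPDF_Core_Document::SAVE_METHOD_REWRITE;

    $document->save($method);
    $document->finish();
}
```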
A very simple example:
$writer = new SetaPDF_Core_Writer_Echo();
$document = new SetaPDF_Core_Document($writer);
$document->save();
The snippet above will output something like:
%PDF-1.3
%????
xref
0 1
0000000000 65535 f
trailer
<</Size 1/ID[<b6fa53ca3c8054733520750c81b0981f><b6fa53ca3c8054733520750c81b0981f>]>>
startxref
15
%%EOF
This is a raw document structure without any content. In fact, it is not even a valid PDF document, because the catalog, pages and page objects are missing.
To create at least a valid PDF document, the following PHP code could be used:
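A minimal sketch of such code, assuming the page tree helper of the catalog (getCatalog()->getPages()->create()) and the SetaPDF_Core_PageFormats::A4 constant as listed in the SetaPDF-Core API reference; the function name createMinimalPdf() and the output path are hypothetical:

```php
<?php
// Hypothetical helper: writes a document that contains a catalog, a page tree
// and a single empty A4 page.
function createMinimalPdf(string $path): void
{
    $writer = new SetaPDF_Core_Writer_File($path);
    $document = new SetaPDF_Core_Document($writer);

    // Assumed API: creating a page also creates the catalog and page tree.
    $pages = $document->getCatalog()->getPages();
    $pages->create(SetaPDF_Core_PageFormats::A4);

    // Write the document structure, then let the writer close its file handle.
    $document->save();
    $document->finish();
}
```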
As you can see, the demo does not only call save() but additionally calls the finish() method of the SetaPDF_Core_Document instance. This call marks the document as finished and forwards the call to the writer, which then, for example, sends a header and the PDF content or closes a file handle. For demonstration purposes the finish() call was left out of the initial example: the echo writer doesn't do anything in its finish() method.
It is possible to call the save() method several times. Each save() call writes a new incremental update section to the writer if changes were recognized since the previous call. After calling finish() on the document instance, it must not be used any further! Additional method calls may result in fatal errors. Only the optional cleanUp() method can be called afterwards.
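The call order described above can be sketched as follows. This assumes the same page creation API as in the minimal document example (getCatalog()->getPages()->create() with SetaPDF_Core_PageFormats::A4, taken from the API reference); the function name saveInTwoSteps() is hypothetical:

```php
<?php
// Hypothetical helper: saves a document twice, then finishes and cleans up.
function saveInTwoSteps(string $path): void
{
    $writer = new SetaPDF_Core_Writer_File($path);
    $document = new SetaPDF_Core_Document($writer);
    $pages = $document->getCatalog()->getPages();

    $pages->create(SetaPDF_Core_PageFormats::A4);
    $document->save(); // writes the initial document structure

    $pages->create(SetaPDF_Core_PageFormats::A4);
    $document->save(); // a change was made, so an incremental update is appended

    $document->finish(); // the instance must not be used after this call...
    $document->cleanUp(); // ...except for the optional cleanUp()
}
```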
The cross-reference table of a PDF document contains information that permits random access to objects within the file. PDF 1.5 introduced an alternative way to store the cross-reference information: cross-reference streams.
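Random access starts at the end of the file: the number after the "startxref" keyword is the byte offset of the cross-reference section. The following plain PHP sketch (not a SetaPDF API; the function name findXrefOffset() is hypothetical) reads that offset. Applied to the empty document output shown earlier, it yields 15, which matches the header "%PDF-1.3" plus a line break (9 bytes) followed by "%", four binary bytes and a line break (6 bytes), assuming each "?" in that output stands for a single byte.

```php
<?php
// Sketch: returns the byte offset stored after the last "startxref" keyword.
// For a classic table the offset points to the "xref" keyword; for a
// cross-reference stream (PDF 1.5+) it points to the stream object instead.
function findXrefOffset(string $pdf): ?int
{
    $pos = strrpos($pdf, 'startxref');
    if ($pos === false) {
        return null;
    }
    // \G anchors the match right after the keyword; skip whitespace, read digits.
    if (!preg_match('/\G\s*(\d+)/', $pdf, $m, 0, $pos + strlen('startxref'))) {
        return null;
    }
    return (int) $m[1];
}
```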
By default, the SetaPDF-Core component writes the cross-reference in the standard format, or in the format used by the source document if there is one.
To change this behavior, use the setCompressXref() method:
Define whether the cross-reference should be compressed or not.
- $compressXref : bool
Pass true to enforce that the cross-reference will be compressed. Pass false to enforce a standard uncompressed cross-reference table.
// Ensure that the file is written with a default cross-reference table
$document->setCompressXref(false);
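The opposite case enforces a cross-reference stream, the compressed format introduced with PDF 1.5. A minimal sketch, assuming the SetaPDF-Core classes are autoloaded; the function name saveWithXrefStream() and the paths are hypothetical:

```php
<?php
// Hypothetical helper: saves an existing document with a compressed
// cross-reference (cross-reference stream).
function saveWithXrefStream(string $source, string $target): void
{
    $document = SetaPDF_Core_Document::load(
        new SetaPDF_Core_Reader_File($source),
        new SetaPDF_Core_Writer_File($target)
    );

    // true enforces compression, false a standard cross-reference table.
    $document->setCompressXref(true);

    $document->save();
    $document->finish();
}
```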