Readers and Writers
Working with reader and writer classes
In SetaPDF version 1, reading and writing was only possible via local paths. Version 2 removes this limitation completely by introducing separate reader and writer classes.
Readers
SetaPDF 2 handles PDF data sources in a flexible way: each kind of source is represented by a reader class.
Stream
The SetaPDF_Core_Reader_Stream class reads a PDF document directly from a seekable stream using random byte access. The entire document is never read into memory at once, which makes it possible to work with PDF documents of virtually any size.
    $stream = fopen('data:text/plain,' . urlencode($pdfContent), 'rb');
    $reader = new \SetaPDF_Core_Reader_Stream($stream);
File
The file reader extends the stream reader and allows you to pass a file path to its constructor. The file handle is opened and closed internally. This reader is also serializable.
$reader = new \SetaPDF_Core_Reader_File('path/to/a/pdf/document.pdf');
String
Sometimes the data source is not located in the file system but in a variable. For this situation another reader is available: SetaPDF_Core_Reader_String.
$reader = new \SetaPDF_Core_Reader_String($pdfString);
Max File Reader and Handler
In specific situations it is necessary to open several hundred or even thousands of files in a single process. Depending on the operating system, such a process can hit the limit of allowed open file handles/descriptors. To overcome this limitation we built a reader class that uses a handler class, which opens and closes the handles so that a given maximum number is not exceeded.
A reader instance can be created this way:
    $maxOpenFiles = 100;
    $handler = new \SetaPDF_Core_Reader_MaxFileHandler($maxOpenFiles);
    $reader = $handler->createReader('path/to/document.pdf');
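The idea of capping the number of open handles can be sketched in plain PHP. The following is only an illustration of the technique and not the SetaPDF_Core_Reader_MaxFileHandler implementation: the class name HandlePool and its methods are hypothetical, and the eviction strategy (closing the least recently used handle) is an assumption.

```php
<?php
// Hypothetical illustration only: NOT the SetaPDF implementation.
final class HandlePool
{
    /** @var resource[] open handles keyed by path; least recently used first */
    private $handles = [];

    /** @var int maximum number of simultaneously open handles */
    private $maxOpen;

    public function __construct($maxOpen)
    {
        // At least one handle must be allowed, otherwise nothing could be opened.
        $this->maxOpen = max(1, (int) $maxOpen);
    }

    /** Returns a read handle for $path, closing the least recently used one if needed. */
    public function get($path)
    {
        if (isset($this->handles[$path])) {
            // Move the handle to the end of the array to mark it as most recently used.
            $handle = $this->handles[$path];
            unset($this->handles[$path]);
            $this->handles[$path] = $handle;

            return $handle;
        }

        if (count($this->handles) >= $this->maxOpen) {
            // Evict the least recently used handle (the first array element).
            reset($this->handles);
            $oldest = key($this->handles);
            fclose($this->handles[$oldest]);
            unset($this->handles[$oldest]);
        }

        $handle = fopen($path, 'rb');
        if ($handle === false) {
            throw new RuntimeException("Unable to open {$path}");
        }
        $this->handles[$path] = $handle;

        return $handle;
    }

    /** Number of handles that are currently open. */
    public function countOpen()
    {
        return count($this->handles);
    }
}
```

A real handler would additionally have to remember and restore the stream position of a file that was closed and is reopened later; the sketch leaves this out for brevity.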
Writers
To allow as much flexibility as possible, SetaPDF 2 does not output any content directly but makes use of writer classes. These PHP classes can easily be extended or used as a base for your own writers if needed.
The writer classes implement the SetaPDF_Core_Writer_WriterInterface.
The methods of a writer instance are invoked by the SetaPDF_Core_Document class when the resulting PDF document is saved.
Echo
The SetaPDF_Core_Writer_Echo class simply echoes the PDF content without sending any headers. It is the recommended writer if you set the headers manually in your PHP script.
File
The SetaPDF_Core_Writer_File class should be used to save the resulting PDF to a local path.
HTTP
The SetaPDF_Core_Writer_Http class sends the resulting PDF document to the browser/client via standard HTTP headers after it has been completely assembled. Its constructor lets you define whether the document should be displayed inline or whether a download should be forced:
Description
The constructor.
Parameters
- $filename : string
  The document filename in UTF-8 encoding
- $inline : boolean
  Defines if the document should be displayed inline or if a download should be forced
HTTP Stream
The SetaPDF_Core_Writer_HttpStream class works similarly to the SetaPDF_Core_Writer_Http writer but starts sending the data without a Content-Length header. The resulting bytes are sent as soon as they are available, without assembling the whole document in memory.
This writer saves memory and immediately triggers a download dialog or starts the inline viewer on the client.
Stream
The SetaPDF_Core_Writer_Stream writer class forwards the result to a given stream handle. This writer is mostly used internally.
TempStream
The SetaPDF_Core_Writer_TempStream class combines an internal handle to php://temp with a string buffer. This combination gives the best results in terms of memory and CPU usage.
    $writer = new \SetaPDF_Core_Writer_TempStream();

    // create a document instance and save it to the $writer instance
    // ...

    // and re-read from it
    $reader = new \SetaPDF_Core_Reader_Stream($writer->getHandle());
Such logic can be used, for example, in loops over several documents that are merged through the SetaPDF-Merger component, creating intermediate results to save memory.
The TempStream writer also fits well into environments using PSR-7 and PSR-17 implementations:
    $writer = new \SetaPDF_Core_Writer_TempStream();
    $document = \SetaPDF_Core_Document::load(..., $writer);
    $document->save()->finish();

    /** @var \Psr\Http\Message\StreamFactoryInterface $factory **/
    $stream = $factory->createStreamFromResource($writer->getHandle());

    /** @var \Psr\Http\Message\ResponseInterface $response **/
    $response = (
        $response
            ->withBody($stream)
            ->withHeader('Content-Type', 'application/pdf')
            ...
    );
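To illustrate why a php://temp handle suits intermediate results, here is a minimal plain PHP sketch of the stream itself. It does not use any SetaPDF class; the 1 MiB threshold and the dummy content are assumptions chosen for the example. PHP keeps the data in memory up to the configured limit and transparently moves it to a temporary file beyond that.

```php
<?php
// Plain PHP illustration of the php://temp stream; no SetaPDF classes are involved.
// Up to 1 MiB is held in memory, anything beyond that spills to a temporary file.
$handle = fopen('php://temp/maxmemory:' . (1024 * 1024), 'w+b');

// Stand-in for the bytes a writer would produce (hypothetical content).
$bytes = "%PDF-1.4\n" . str_repeat('0', 2 * 1024 * 1024);
fwrite($handle, $bytes);

// The handle is seekable, so it can be rewound and read again.
rewind($handle);
$header = fread($handle, 8);
$meta = stream_get_meta_data($handle);

fclose($handle);
```

Because the handle stays seekable after the data has moved to disk, it can be handed to a stream based reader regardless of the document size.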
String
The SetaPDF_Core_Writer_String class holds the assembled PDF document internally. It can be accessed via the SetaPDF_Core_Writer_String::__toString() method or the SetaPDF_Core_Writer_String::getBuffer() method.
Variable
The SetaPDF_Core_Writer_Var class can be used to save the assembled PDF document in a string variable. The variable is simply passed by reference to the constructor.
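Binding a variable by reference in a constructor can be sketched as follows. The class ReferenceSink is a hypothetical stand-in written for this illustration; it is not the SetaPDF_Core_Writer_Var implementation and only shows the PHP mechanism that makes the caller's variable receive the written bytes.

```php
<?php
// Hypothetical stand-in that mimics binding a variable by reference;
// it is not the SetaPDF_Core_Writer_Var implementation.
final class ReferenceSink
{
    /** @var string bound to the caller's variable */
    private $target;

    public function __construct(&$target)
    {
        // Keep a reference to the caller's variable instead of a copy.
        $this->target = &$target;
        $this->target = '';
    }

    public function write($bytes)
    {
        $this->target .= $bytes;
    }
}

$pdf = null;
$sink = new ReferenceSink($pdf);
$sink->write('%PDF-1.4');
$sink->write("\n%%EOF");
// $pdf now holds everything that was written through the sink.
```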
Temporary File
The SetaPDF_Core_Writer_TempFile class writes to a temporary file. It acts as a kind of proxy to the File writer but creates and deletes temporary files automatically. Temporary files are deleted when the writer instance is destructed.
The class uses the path returned by sys_get_temp_dir() as its default folder for temporary files (as of revision 809). If you want to configure this individually, or control whether the temporary files are deleted automatically, the following static methods are available:
getFilePrefix()
Get the file prefix for temporary files.
getKeepFile()
Get whether files should be kept or deleted automatically when an instance is destructed.
getTempDir()
Get the current temporary directory path.
setFilePrefix()
Set the file prefix for temporary files.
setKeepFile()
Set whether files should be kept or deleted automatically when an instance is destructed.
setTempDir()
Set the temporary directory path.
Furthermore, the class offers static methods to create temporary files with defined content or to prepare a path for a temporary file:
createTempFile()
Creates a temporary file and returns the temporary path to it.
createTempPath()
Creates a temporary path.
Chaining Writers
It is possible to chain several writer instances with the SetaPDF_Core_Writer_Chain class. This writer can be used, for example, if the same document should be written to disk and sent to the client at the same time:
    $writer = new SetaPDF_Core_Writer_Chain(array(
        new SetaPDF_Core_Writer_File('path/to/target.pdf'),
        new SetaPDF_Core_Writer_HttpStream('target.pdf', false)
    ));
Reading and Writing From/To the Same File
Sometimes the original file needs to be overwritten after it has been modified. However, the same local path cannot be used for reading and writing at the same time, because the document instance needs to read from the input file while it writes to the output file in its save() method.
It is also bad practice: if an error occurs or the process is interrupted while writing, both the output file and the input file may be destroyed.
You should therefore avoid writing to a file that you are reading from. Use, for example, a temporary file writer and copy the file over when the process has finished:
    try {
        $file = 'document.pdf';

        // create a reader
        $reader = new \SetaPDF_Core_Reader_File($file);
        // create a temporary file writer
        $tempWriter = new \SetaPDF_Core_Writer_TempFile();

        $document = \SetaPDF_Core_Document::load($reader, $tempWriter);

        // modify the document...

        // save it
        $document->save()->finish();

        // copy it over
        copy($tempWriter->getPath(), $file);
    } catch (\Exception $e) {
        // something went wrong... but our main document is safe!
    }
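Note that copy() overwrites the target in place, so an interruption during the copy step could still leave a truncated file. A possible variation, which is a different technique from the one shown above, is to write into a temporary file located in the same directory and then call rename(), which replaces the target in a single step on POSIX file systems. The helper name replaceFileAtomically is hypothetical and the sketch uses plain PHP only:

```php
<?php
/**
 * Hypothetical helper: replaces $target with $newContent via a sibling temporary file.
 *
 * @param string $target     path of the file to replace
 * @param string $newContent bytes that should end up in $target
 */
function replaceFileAtomically($target, $newContent)
{
    // Create the temporary file next to the target so rename() stays on one file system.
    $tmp = tempnam(dirname($target), 'tmp');
    if ($tmp === false) {
        throw new RuntimeException('Unable to create a temporary file.');
    }

    // Write the new content first; only a successful write is moved over the target.
    if (file_put_contents($tmp, $newContent) === false || !rename($tmp, $target)) {
        @unlink($tmp);
        throw new RuntimeException("Unable to replace {$target}.");
    }
}
```

Be aware that tempnam() creates the file with restrictive permissions, so the permissions of the replaced file may need to be restored afterwards.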
Stream Wrappers
If the source or target is accessed through custom stream wrappers registered via stream_wrapper_register(), the stream needs to be seekable for reading.
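Whether a stream supports seeking can be checked with stream_get_meta_data() before it is handed to a reader. The guard function below is a hypothetical helper for illustration, not part of SetaPDF:

```php
<?php
/**
 * Hypothetical guard: throws if the given stream does not support seeking.
 *
 * @param resource $stream an open stream resource
 */
function assertSeekable($stream)
{
    $meta = stream_get_meta_data($stream);
    if (empty($meta['seekable'])) {
        throw new InvalidArgumentException(
            sprintf('Stream "%s" is not seekable.', $meta['uri'] ?? 'unknown')
        );
    }
}

$stream = fopen('php://memory', 'w+b');
assertSeekable($stream); // passes: memory streams are seekable
fclose($stream);
```

Calling such a guard right after fopen() surfaces a missing seek capability early instead of during parsing.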
Amazon S3
The AWS SDK for PHP provides an official Amazon S3 PHP stream wrapper. To use this wrapper, seeking has to be enabled explicitly.
This can be done globally with the stream_context_set_default() function:
    use Aws\S3\S3Client;

    // register the stream wrapper
    $s3 = new S3Client(...);
    $s3->registerStreamWrapper();

    // make all files for the s3 protocol seekable
    stream_context_set_default(['s3' => ['seekable' => true]]);

    // now the file can be opened by the File reader
    $reader = new \SetaPDF_Core_Reader_File("s3://{$bucket}/{$key}");
If you don't want to make all s3 streams seekable, you have to open the streams manually and pass a stream context:
    use Aws\S3\S3Client;

    // register the stream wrapper
    $s3 = new S3Client(...);
    $s3->registerStreamWrapper();

    // create a stream context
    $context = stream_context_create(['s3' => ['seekable' => true]]);

    // and open the stream
    $stream = fopen("s3://{$bucket}/{$key}", 'r', false, $context);

    // then we have to use the Stream reader
    $reader = new \SetaPDF_Core_Reader_Stream($stream);

    // ...

    // finally close the stream
    fclose($stream);