Readers and Writers
Working with reader and writer classes

In SetaPDF version 1, reading and writing was only possible via local paths. Version 2 removes this limitation completely by introducing separate reader and writer classes.

Readers

SetaPDF 2 handles PDF data sources flexibly: each kind of source is represented by a reader class.

Stream

The SetaPDF_Core_Reader_Stream class reads a PDF document directly from a seekable stream, using random byte access. The entire document is never read into memory at once, which makes it possible to work with PDF documents of virtually any size.

PHP
$stream = fopen('data:text/plain,' . urlencode($pdfContent), 'rb');
$reader = new \SetaPDF_Core_Reader_Stream($stream);

File

The file reader extends the stream reader and allows you to pass a file path to its constructor. The file handle is opened and closed internally. This reader is also serializable.

PHP
$reader = new \SetaPDF_Core_Reader_File('path/to/a/pdf/document.pdf');

String

Sometimes the data source is not located in the filesystem but in a variable. For this situation another reader is available: SetaPDF_Core_Reader_String.

PHP
$reader = new \SetaPDF_Core_Reader_String($pdfString);

Max File Reader and Handler

In some situations several hundred or even thousands of files have to be opened in a single process. Depending on the operating system, such a process can hit the limit of allowed open file handles/descriptors. To overcome this limitation we built a reader class that uses a handler class, which opens and closes the handles so that a given maximum number is never exceeded.

A reader instance can be created this way: 

PHP
$maxOpenFiles = 100;
$handler = new \SetaPDF_Core_Reader_MaxFileHandler($maxOpenFiles);
$reader = $handler->createReader('path/to/document.pdf');

Writers

To allow as much flexibility as possible, SetaPDF 2 does not output any content directly but makes use of writer classes. These PHP classes can easily be extended or used as a base for your own writers if needed.

The writer classes implement the SetaPDF_Core_Writer_WriterInterface.

The methods of a writer instance are invoked by the SetaPDF_Core_Document class when the resulting PDF document is saved. For more details, please see here.

Echo

The SetaPDF_Core_Writer_Echo class simply echoes the PDF content without sending any header. It is the recommended writer if the headers are set manually in your PHP script.
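As a rough sketch of that use case, the function below sets the headers by hand and then lets the Echo writer output the body. The function name, the chosen header values and the parameterless Echo constructor are assumptions made for illustration; only the class names and the load()/save()/finish() calls are taken from this page.

```php
<?php
/**
 * Hypothetical helper (not part of SetaPDF): sends the headers manually
 * and outputs the PDF body through the Echo writer.
 *
 * @param object $reader   Any SetaPDF reader instance
 * @param string $filename Name suggested to the client
 */
function sendWithManualHeaders($reader, string $filename): void
{
    // Strip directory parts and quotes so the header value stays well-formed
    $safeName = str_replace('"', '', basename($filename));

    // Headers are our responsibility: the Echo writer sends none
    header('Content-Type: application/pdf');
    header('Content-Disposition: inline; filename="' . $safeName . '"');

    // Assumption: the Echo writer needs no constructor arguments
    $writer = new \SetaPDF_Core_Writer_Echo();
    $document = \SetaPDF_Core_Document::load($reader, $writer);

    // ... modify the document ...

    $document->save()->finish();
}
```

No output must be sent before the function is called, otherwise PHP cannot send the headers anymore.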

File

The SetaPDF_Core_Writer_File class should be used to save the resulting PDF to a local path.

Description
public SetaPDF_Core_Writer_File::__construct (
string $path
)

The constructor.

Parameters
$path : string

The path to the output file

HTTP

The SetaPDF_Core_Writer_Http class sends the resulting PDF document to the browser/client via standard HTTP headers after it is completely assembled. It lets you define whether the document is displayed inline or a download is forced:

Description
public SetaPDF_Core_Writer_Http::__construct (
[ string $filename = 'document.pdf' [, boolean $inline = false ]]
)

The constructor.

Parameters
$filename : string

The document filename in UTF-8 encoding

$inline : boolean

Defines if the document should be displayed inline or if a download should be forced

HTTP Stream

The SetaPDF_Core_Writer_HttpStream class works similarly to the SetaPDF_Core_Writer_Http writer but starts sending the data without a Content-Length header. The resulting bytes are sent as soon as they are available, without assembling the whole resulting document in memory.

This writer saves memory and will immediately cause a download dialog or start the inline viewer at the client.

Stream

The SetaPDF_Core_Writer_Stream writer class forwards the result to a given stream handle. This writer is mostly used internally.

TempStream

The SetaPDF_Core_Writer_TempStream class combines an internal handle to php://temp with a string buffer. This combination gives the best trade-off between memory and CPU usage.

PHP
$writer = new \SetaPDF_Core_Writer_TempStream();

// create a document instance and save it to the $writer instance
// ...

// and re-read from it
$reader = new \SetaPDF_Core_Reader_Stream($writer->getHandle());

Such logic can be used, for example, in loops that merge several documents through the SetaPDF-Merger component: writing intermediate results to a temporary stream saves memory.

It also fits well into environments using PSR-7 + PSR-17 implementations:

PHP
$writer = new \SetaPDF_Core_Writer_TempStream();

$document = \SetaPDF_Core_Document::load(..., $writer);
$document->save()->finish();

/** @var \Psr\Http\Message\StreamFactoryInterface $factory **/
$stream = $factory->createStreamFromResource($writer->getHandle());

/** @var \Psr\Http\Message\ResponseInterface $response **/
$response = (
    $response
    ->withBody($stream)
    ->withHeader('Content-Type', 'application/pdf')
    ...
);

String

The SetaPDF_Core_Writer_String class holds the assembled PDF document internally. It can be accessed via the SetaPDF_Core_Writer_String::__toString() method or the SetaPDF_Core_Writer_String::getBuffer() method.

Variable

The SetaPDF_Core_Writer_Var class can be used to save the assembled PDF document in a string variable. The variable is simply passed by reference to the constructor.

Description
public SetaPDF_Core_Writer_Var::__construct (
string &$var
)

The constructor.

Parameters
$var : string

A reference to the variable to write to
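A minimal sketch of how the by-reference constructor might be used is shown below. The function name is hypothetical; the class names and the load()/save()/finish() calls are the ones used elsewhere on this page.

```php
<?php
/**
 * Hypothetical helper (not part of SetaPDF): saves a document into a
 * plain PHP string by handing a variable to the Var writer by reference.
 *
 * @param object $reader Any SetaPDF reader instance
 * @return string The assembled PDF document
 */
function renderToString($reader): string
{
    // The writer appends the PDF bytes to this variable
    $pdf = '';
    $writer = new \SetaPDF_Core_Writer_Var($pdf);

    $document = \SetaPDF_Core_Document::load($reader, $writer);

    // ... modify the document ...

    $document->save()->finish();

    return $pdf;
}
```

The returned string can then be stored in a database field or passed to another reader, for example SetaPDF_Core_Reader_String.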

Temporary File

The SetaPDF_Core_Writer_TempFile class will write to a temporary file. It acts as a kind of proxy to the File writer but will create and delete temporary files automatically. Temporary files will be deleted when the writer instance is destructed.

The class uses the path returned by sys_get_temp_dir() as its default folder for temporary files (as of revision 809). If you want to configure this yourself, or control whether the temporary files are deleted automatically, the following static methods are available:

getFilePrefix()

Get the file prefix for temporary files.

getKeepFile()

Get whether files should be kept or deleted automatically when an instance is destructed.

getTempDir()

Get the current temporary directory path.

setFilePrefix()

Set the file prefix for temporary files.

setKeepFile()

Set whether files should be kept or deleted automatically when an instance is destructed.

setTempDir()

Set the temporary directory path.

Furthermore, the class offers static methods to create temporary files with a defined content or to prepare a path for a temporary file:

createTempFile()

Creates a temporary file and returns the temporary path to it.

createTempPath()

Creates a temporary path.
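The static methods listed above could be combined as in the following sketch. The method names come from this page, but the argument shapes (a directory string, a prefix string, a boolean flag, a content string) are assumptions derived from the descriptions and should be checked against the API reference. The function name and the prefix value are hypothetical.

```php
<?php
/**
 * Hypothetical helper (not part of SetaPDF): configures the TempFile
 * writer and creates a temporary file with the given content.
 *
 * @param string $tempDir Directory that should hold the temporary files
 * @param string $content Content for the temporary file
 * @return string Path to the created temporary file
 */
function createConfiguredTempFile(string $tempDir, string $content): string
{
    // Assumption: each setter takes a single argument of the obvious type
    \SetaPDF_Core_Writer_TempFile::setTempDir($tempDir);
    \SetaPDF_Core_Writer_TempFile::setFilePrefix('myapp_');

    // Keep the files after the writer instance is destructed
    \SetaPDF_Core_Writer_TempFile::setKeepFile(true);

    // Assumption: createTempFile() accepts the content and returns the path
    return \SetaPDF_Core_Writer_TempFile::createTempFile($content);
}
```

Because the settings are static, they apply to every TempFile writer created afterwards in the same process.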

Chaining Writers

It is possible to chain several writer instances with the SetaPDF_Core_Writer_Chain class. This writer can be used, for example, if the same document should be written to disk and sent to the client at the same time:

PHP
$writer = new \SetaPDF_Core_Writer_Chain(array(
    new \SetaPDF_Core_Writer_File('path/to/target.pdf'),
    new \SetaPDF_Core_Writer_HttpStream('target.pdf', false)
));

Reading and Writing From/To the Same File

Sometimes the original file has to be overwritten after it was modified. However, it is impossible to use the same local path for reading and writing at the same time: the document instance needs to read from the input file while it writes to the output file in its save() method.

It is also bad practice: if an error occurs during the process, both the output and the input file may be destroyed.

You should therefore avoid writing to a file from which you are reading. Use, for example, a temporary writer instance and copy the file over when the process is finished:

PHP
try {
    $file = 'document.pdf';

    // create a reader
    $reader = new \SetaPDF_Core_Reader_File($file);
    // create a temporary file writer
    $tempWriter = new \SetaPDF_Core_Writer_TempFile();

    $document = \SetaPDF_Core_Document::load($reader, $tempWriter);

    // modify the document...

    // save it
    $document->save()->finish();
    // copy it over
    copy($tempWriter->getPath(), $file);
} catch (\Exception $e) {
    // something went wrong... but our main document is safe!
}

Stream Wrappers

If the source or target is accessed through a custom stream wrapper registered via stream_wrapper_register(), the stream must be seekable for reading.
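Whether a given handle is seekable can be checked up front with PHP's stream_get_meta_data() function. The helper below is a small sketch using only PHP built-ins; its name is hypothetical and it is not part of SetaPDF.

```php
<?php
/**
 * Hypothetical helper (not part of SetaPDF): returns true if the given
 * stream resource reports itself as seekable.
 *
 * @param mixed $stream A stream resource, e.g. from fopen()
 */
function isSeekableStream($stream): bool
{
    // Closed handles and non-resources cannot be used by a reader
    if (!is_resource($stream)) {
        return false;
    }

    // The "seekable" flag is reported by the underlying stream wrapper
    $meta = stream_get_meta_data($stream);

    return !empty($meta['seekable']);
}

// Usage: php://temp streams support seeking
$handle = fopen('php://temp', 'r+b');
var_dump(isSeekableStream($handle));
fclose($handle);
```

Running such a check before creating a SetaPDF_Core_Reader_Stream instance gives a clear error early instead of a failure while the document is being parsed.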

Amazon S3

The AWS SDK for PHP provides an official Amazon S3 PHP stream wrapper. To use this wrapper with a reader, seeking must be enabled explicitly.

This can be done globally with the stream_context_set_default() function:

PHP
// register the stream wrapper
$s3 = new S3Client(...);
$s3->registerStreamWrapper();

// make all files for the s3 protocol seekable
stream_context_set_default(['s3' => ['seekable' => true]]);

// now the file can be opened by the File reader
$reader = new \SetaPDF_Core_Reader_File("s3://{$bucket}/{$key}");

If you don't want to make all S3 streams seekable, you have to open the streams manually and pass a stream context:

PHP
// register the stream wrapper
$s3 = new S3Client(...);
$s3->registerStreamWrapper();

// create a stream context
$context = stream_context_create(['s3' => ['seekable' => true]]);
// and open the stream
$stream = fopen("s3://{$bucket}/{$key}", 'r', false, $context);

// then we have to use the Stream reader 
$reader = new \SetaPDF_Core_Reader_Stream($stream);

// ...

// finally close the stream
fclose($stream);