Fonts and Encodings
Using fonts and understanding the text encoding in SetaPDF
Introduction
The PDF format offers wide-ranging support for fonts and encodings.
The SetaPDF-Core component handles them all transparently in the background for you. The default input encoding in all SetaPDF components is UTF-8.
Fonts
A font is represented in SetaPDF as a class instance implementing the SetaPDF_Core_Font_FontInterface interface.
PDF Standard Fonts
The SetaPDF-Core component provides all 14 PDF standard fonts, which are represented by the following classes:
SetaPDF_Core_Font_Standard_Courier
SetaPDF_Core_Font_Standard_CourierBold
SetaPDF_Core_Font_Standard_CourierBoldOblique
SetaPDF_Core_Font_Standard_CourierOblique
SetaPDF_Core_Font_Standard_Helvetica
SetaPDF_Core_Font_Standard_HelveticaBold
SetaPDF_Core_Font_Standard_HelveticaBoldOblique
SetaPDF_Core_Font_Standard_HelveticaOblique
SetaPDF_Core_Font_Standard_Symbol
SetaPDF_Core_Font_Standard_TimesBold
SetaPDF_Core_Font_Standard_TimesBoldItalic
SetaPDF_Core_Font_Standard_TimesItalic
SetaPDF_Core_Font_Standard_TimesRoman
SetaPDF_Core_Font_Standard_ZapfDingbats
Standard fonts have to be initialized either with an explicit encoding or with the default encoding. It is also possible to define differences to a base encoding in order to build individual encodings:
// Replace "uacute" by "Lslash" and "ucircumflex" with "fi"
$font = \SetaPDF_Core_Font_Standard_Helvetica::create(
    $document,
    \SetaPDF_Core_Encoding::WIN_ANSI,
    array(250 => 'Lslash', 251 => 'fi')
);
The names are defined in the Adobe Glyph List. This list is also included in the core component, which makes it possible to resolve a glyph's name with the following code:
$name = \SetaPDF_Core_Font_Glyph_List::byCode(chr(251), \SetaPDF_Core_Encoding::WIN_ANSI); // ucircumflex
TrueType Fonts
SetaPDF comes with a TrueType parser and subset engine which allows you to use any character from the Unicode range as long as it is available in the given TrueType or OpenType (with TrueType outlines) font program.
A font subset has the advantage that it reduces the size of the embedded font program to a minimum, which results in very small PDF files.
A TrueType font subset can be created with the SetaPDF_Core_Font_TrueType_Subset class. With this font instance you can use up to 255 individual characters. The font program that is embedded in the resulting PDF document will automatically be subset to only the glyphs that are actually used.
$font = new \SetaPDF_Core_Font_TrueType_Subset($document, 'path/to/font/file.ttf');
If your text uses no more than 255 different characters, this font class is sufficient.
If you need to use more than 255 different characters, you can use the SetaPDF_Core_Font_Type0_Subset class, which represents a Type0 font with a TrueType font program as its descendant font.
$font = new \SetaPDF_Core_Font_Type0_Subset($document, 'path/to/font/file.ttf');
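The choice between the two subset classes depends on how many different characters the text contains. The following is a minimal sketch in plain PHP, not part of the SetaPDF API: the functions countDistinctCharacters() and suggestSubsetFontClass() are hypothetical helpers introduced here for illustration. The sketch counts Unicode code points, which only approximates the number of glyphs a font instance will actually need.

```php
<?php
/**
 * Count the distinct Unicode code points in a UTF-8 string.
 * Hypothetical helper, not part of SetaPDF.
 */
function countDistinctCharacters(string $utf8Text): int
{
    // The "u" modifier makes PCRE treat the input as UTF-8,
    // so the string is split into whole characters, not bytes.
    $characters = preg_split('//u', $utf8Text, -1, PREG_SPLIT_NO_EMPTY);
    if ($characters === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }

    return count(array_unique($characters));
}

/**
 * Suggest one of the two subset font class names mentioned above,
 * based on the 255 character limit. Hypothetical helper.
 */
function suggestSubsetFontClass(string $utf8Text, int $limit = 255): string
{
    return countDistinctCharacters($utf8Text) <= $limit
        ? 'SetaPDF_Core_Font_TrueType_Subset'
        : 'SetaPDF_Core_Font_Type0_Subset';
}
```

The returned class name could then be instantiated with the document instance and the font path, in the same way as the two constructor calls shown above.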
If you need to embed the complete font program, you can still use the SetaPDF_Core_Font_TrueType font class. An instance of this class can be used like any other font type and has to be created by its static create() method.
This font instance supports automated encoding by simply passing "auto" to the $diffEncoding parameter. This way the Differences entry will be built automatically from the glyphs that are used. A single font instance can utilize up to 255 different glyphs, which can cover a wide range of text and languages, too:
$font = \SetaPDF_Core_Font_TrueType::create($document, 'path/to/font/file.ttf', 'WinAnsiEncoding', 'auto');
Generally you need legal permission to embed a font, or a subset of it, into a PDF document. Some fonts have a permission flag set, which says that the font "[...]must not be modified, embedded or exchanged in any manner without first obtaining permission of the legal owner." If this flag is set, all font classes will throw a SetaPDF_Core_Font_Exception. If you have the permission, you can disable this exception by passing true to the $ignoreLicenseRestrictions parameter of the desired method.
Please note that the font classes currently do not support scripts and languages that need pre-processing such as glyph substitution or glyph reordering (for example Arabic or Hebrew).
Encodings
The PDF format defines two encodings for the internal representation of strings: PDFDocEncoding and UTF-16BE. These encodings affect strings only at their lowest level. They are used, for example, for metadata such as author or creator. Form field values are also saved in one of these encodings.
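To make the UTF-16BE variant more concrete, here is a minimal sketch in plain PHP that builds the byte sequence of such a string from UTF-8 input. It relies on the rule from the PDF specification that a Unicode text string starts with the byte order marker FE FF. The function name toPdfUnicodeTextString() is a hypothetical helper for illustration and is not part of the SetaPDF API. The sketch also requires the mbstring extension.

```php
<?php
/**
 * Convert a UTF-8 string into the bytes of a PDF text string in UTF-16BE.
 * Hypothetical helper, not part of SetaPDF.
 */
function toPdfUnicodeTextString(string $utf8Text): string
{
    // The leading byte order marker (0xFE 0xFF) tells a PDF consumer that
    // the string is UTF-16BE rather than PDFDocEncoding.
    return "\xFE\xFF" . mb_convert_encoding($utf8Text, 'UTF-16BE', 'UTF-8');
}
```

For the single character "Ä" (U+00C4), bin2hex(toPdfUnicodeTextString("\u{00C4}")) yields feff00c4: two bytes for the marker followed by two bytes for the character.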
The SetaPDF-Core component offers an encoding class which is a wrapper around mbstring (used by default) and iconv with support for PDF specific encodings.
All components make use of this class, so that the handling of different encodings is done seamlessly in the background. If a method accepts a text string, it offers an encoding parameter with which you can define the input encoding if it differs from UTF-8.
When fonts come into play, encoding issues become much more complex, because it is not guaranteed that a font covers a complete encoding scheme. This can result in replacement characters (?) or the "missing glyph" being shown if a glyph is not available in the desired font.
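One way to spot potential replacement characters before writing text is to check which characters fall outside an 8-bit encoding. The sketch below is plain PHP using the mbstring extension and is not part of the SetaPDF API: findCharactersOutsideEncoding() is a hypothetical helper. Windows-1252 is used as an assumed approximation of WinAnsiEncoding. The check covers only the encoding. Whether a particular font program really contains a glyph for each remaining character is a separate question.

```php
<?php
/**
 * Return the distinct characters of a UTF-8 string that cannot be
 * represented in the given 8-bit encoding.
 * Hypothetical helper, not part of SetaPDF.
 *
 * @return string[]
 */
function findCharactersOutsideEncoding(string $utf8Text, string $encoding = 'Windows-1252'): array
{
    // Split into whole UTF-8 characters; fall back to an empty list on invalid input.
    $characters = preg_split('//u', $utf8Text, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $missing = [];
    foreach ($characters as $character) {
        // Convert to the target encoding and back. A character that has no
        // representation there is substituted and will not survive the round trip.
        $roundTrip = mb_convert_encoding(
            mb_convert_encoding($character, $encoding, 'UTF-8'),
            'UTF-8',
            $encoding
        );
        if ($roundTrip !== $character) {
            $missing[$character] = true; // array key keeps the list distinct
        }
    }

    return array_map('strval', array_keys($missing));
}
```

For the input "Łódź" the helper reports "Ł" and "ź", because neither character is part of Windows-1252, while "ó" and "d" are.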