Fonts and Encodings
Using fonts and understanding the text encoding in SetaPDF
The PDF format offers wide-ranging support for fonts and encodings.
The SetaPDF-Core component handles them all transparently in the background for you. The default input encoding in all SetaPDF components is UTF-8.
A font is represented in SetaPDF as an instance of a SetaPDF_Core_Font class.
The SetaPDF-Core component provides all 14 PDF standard fonts, each represented by its own class, for example SetaPDF_Core_Font_Standard_Helvetica.
Standard fonts currently have to be instantiated with an explicit encoding or with the default encoding. It is also possible to define differences to a base encoding in order to build individual encodings:
// Replace "uacute" with "Lslash" and "ucircumflex" with "fi"
$font = SetaPDF_Core_Font_Standard_Helvetica::create(
    $document,
    SetaPDF_Core_Encoding::WIN_ANSI,
    array(250 => 'Lslash', 251 => 'fi')
);
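To make the idea of a differences array more concrete, the following plain PHP sketch overlays such an array on an excerpt of a base encoding table. It does not use SetaPDF at all: the function name applyDifferences() and the four-entry WinAnsiEncoding excerpt are assumptions made for illustration only, and the component may handle this differently internally.

```php
<?php
// Hypothetical helper, not part of SetaPDF: overlay a differences array
// on a base encoding table (byte code => Adobe glyph name).
function applyDifferences(array $base, array $differences): array
{
    foreach ($differences as $code => $glyphName) {
        if (!is_int($code) || $code < 0 || $code > 255) {
            throw new InvalidArgumentException("Code $code is outside the single-byte range.");
        }
        // A difference simply replaces the glyph name at that code.
        $base[$code] = $glyphName;
    }
    ksort($base);

    return $base;
}

// Excerpt of WinAnsiEncoding; a real table covers all defined codes.
$winAnsiExcerpt = array(
    249 => 'ugrave',
    250 => 'uacute',
    251 => 'ucircumflex',
    252 => 'udieresis',
);

$custom = applyDifferences($winAnsiExcerpt, array(250 => 'Lslash', 251 => 'fi'));
```

Codes that are not listed in the differences array keep the glyph of the base encoding, which is why only the two replaced codes have to be passed.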
The glyph names are defined in the Adobe Glyph List. This list is also included in the Core component, which makes it possible to resolve a glyph's name with the following code:
$name = SetaPDF_Core_Font_Glyph_List::byCode(chr(251), SetaPDF_Core_Encoding::WIN_ANSI); // ucircumflex
A mechanism is planned for standard fonts that will try to adjust the differences automatically when characters are used that the base encoding does not cover. Currently, the differences have to be defined manually.
The Core component offers a parser for TrueType fonts that is used by the SetaPDF_Core_Font_TrueType font class. An instance of this class can be used like any standard font.
Using such a font object makes it possible to embed the full font program into the PDF file (subsetting is currently not supported).
Furthermore, this font class supports automated encoding: simply pass "auto" to the $diffEncoding parameter and the Differences entry will be built automatically from the glyphs that are actually used. A single font instance can address up to 255 different glyphs, which is enough to cover a wide range of text and languages:
$font = SetaPDF_Core_Font_TrueType::create($document, 'path/to/font/file.ttf', 'WinAnsiEncoding', 'auto');
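How an automatically built table of up to 255 glyphs might come about can be sketched in plain PHP. This is only an illustration under assumptions, not SetaPDF's implementation: the function name buildAutoCodes(), the choice to start at code 1, and the assignment order are all hypothetical. It needs the mbstring extension and PHP 7.4 or later for mb_str_split().

```php
<?php
// Hypothetical sketch, not SetaPDF's implementation: assign single-byte
// codes to characters on first use, up to a fixed number of glyphs.
function buildAutoCodes(string $utf8Text, int $maxGlyphs = 255): array
{
    $codes = array();  // character => assigned code
    $nextCode = 1;     // assumption: code 0 stays unused in this sketch

    foreach (mb_str_split($utf8Text, 1, 'UTF-8') as $char) {
        if (isset($codes[$char])) {
            continue; // this character already has a code
        }
        if ($nextCode > $maxGlyphs) {
            throw new OverflowException(
                'More than ' . $maxGlyphs . ' distinct glyphs are needed.'
            );
        }
        $codes[$char] = $nextCode++;
    }

    return $codes;
}

$codes = buildAutoCodes('abca');
```

Once all available codes are taken, a further distinct character cannot be encoded by the same font instance, which is why the sketch throws an exception at that point.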
The create() method is defined as follows:
Creates a font object based on a TrueType font file.
- $document : SetaPDF_Core_Document
The document instance in which the font will be used
- $fontFile : string
A path to the TTF font file
- $baseEncoding : string
The base encoding
- $diffEncoding : array|string
A translation table to adjust individual char codes to different glyphs or "auto" to build this table dynamically.
- $embedded : boolean
Defines whether the font program will be embedded in the document
- $forceLicenseRestrictions : boolean
Can be used to disable the font license check
Returns the SetaPDF_Core_Font_TrueType instance.
The PDF format defines two encodings for the internal representation of text strings: PDFDocEncoding and UTF-16BE. These encodings only affect strings at their lowest level. They are used, for example, for metadata such as the author or creator. Form field values are also saved in one of these encodings.
All components make use of the SetaPDF_Core_Encoding class, so that different encodings are handled seamlessly in the background. Any method that accepts a text string offers an encoding parameter with which you can define the input encoding if it differs from UTF-8.
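As a rough illustration of what a conversion into one of the two internal encodings involves, the following plain PHP sketch turns a UTF-8 string into a PDF text string. It is not the SetaPDF implementation: the function name encodePdfTextString() is hypothetical, and the sketch deliberately treats only printable ASCII plus tab, line feed and carriage return as PDFDocEncoding, because those bytes are identical in both encodings. Everything else is written as UTF-16BE with the leading byte order mark FE FF. The mbstring extension is required.

```php
<?php
// Hypothetical helper, not part of SetaPDF: pick an internal encoding
// for a UTF-8 input string.
function encodePdfTextString(string $utf8): string
{
    // Assumption: only printable ASCII (plus tab, LF, CR) is treated as
    // PDFDocEncoding here; a full implementation would map the whole table.
    if (preg_match('/^[\x20-\x7E\t\n\r]*\z/', $utf8) === 1) {
        return $utf8;
    }

    // UTF-16BE text strings are marked by the byte order mark FE FF.
    return "\xFE\xFF" . mb_convert_encoding($utf8, 'UTF-16BE', 'UTF-8');
}

$author  = encodePdfTextString('Author');
$accent  = bin2hex(encodePdfTextString("\xC3\xA9"));     // "é" as UTF-8 bytes
$euro    = bin2hex(encodePdfTextString("\xE2\x82\xAC")); // euro sign as UTF-8 bytes
```

Falling back to UTF-16BE for everything outside the ASCII range keeps the sketch short at the cost of slightly larger strings for characters that PDFDocEncoding could represent in a single byte.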
When fonts come into play, encoding issues become much more complex, because a font is not guaranteed to cover a complete encoding scheme. This can result in replacement characters (?) if a glyph is not available in the desired font.
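One way to anticipate such replacement characters is to compare a text against the set of characters a font covers before writing it. The sketch below is plain PHP and makes no claim about SetaPDF's API: findMissingCharacters() is a hypothetical helper, and $coveredCodePoints merely stands in for the list of Unicode code points a font provides. How that list is obtained from a real font is outside the scope of this sketch. It needs the mbstring extension and PHP 7.4 or later.

```php
<?php
// Hypothetical helper, not part of SetaPDF: return the distinct characters
// of a UTF-8 text whose code points are not in the covered set.
function findMissingCharacters(string $utf8Text, array $coveredCodePoints): array
{
    $covered = array_flip($coveredCodePoints); // code point => index, for fast lookup
    $missing = array();

    foreach (mb_str_split($utf8Text, 1, 'UTF-8') as $char) {
        if (!isset($covered[mb_ord($char, 'UTF-8')])) {
            $missing[$char] = true; // keyed by character to drop duplicates
        }
    }

    // Digit characters become integer keys in PHP arrays, so cast back to strings.
    return array_map('strval', array_keys($missing));
}

// Assumed coverage for the example: printable ASCII only.
$asciiOnly = range(0x20, 0x7E);
$missing = findMissingCharacters("Gr\xC3\xBC\xC3\x9Fe", $asciiOnly); // "Grüße" as UTF-8 bytes
```

A non-empty result indicates that another font, or an additional font instance, would be needed to render the text without replacement characters.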