Character Mapping
Synopsis
This section holds functions and structures that are related to mapping character input codes to glyph indices.
Note that for many scripts the simplistic approach used by FreeType of mapping a single character to a single glyph is not valid or possible! In general, a higher-level library like HarfBuzz or ICU should be used for handling text strings.
FT_CharMap
Defined in FT_FREETYPE_H (freetype/freetype.h).
typedef struct FT_CharMapRec_* FT_CharMap;
A handle to a character map (usually abbreviated to ‘charmap’). A charmap is used to translate character codes in a given encoding into glyph indexes for its parent's face. Some font formats may provide several charmaps per font.
Each face object owns zero or more charmaps, but only one of them can be ‘active’, providing the data used by FT_Get_Char_Index or FT_Load_Char.
The list of available charmaps in a face is available through the face->num_charmaps and face->charmaps fields of FT_FaceRec.
The currently active charmap is available as face->charmap. You should call FT_Set_Charmap to change it.
note
When a new face is created (either through FT_New_Face or FT_Open_Face), the library looks for a Unicode charmap within the list and automatically activates it. If there is no Unicode charmap, FreeType doesn't set an ‘active’ charmap.
also
See FT_CharMapRec for the publicly accessible fields of a given character map.
FT_CharMapRec
Defined in FT_FREETYPE_H (freetype/freetype.h).
typedef struct  FT_CharMapRec_
{
  FT_Face      face;
  FT_Encoding  encoding;
  FT_UShort    platform_id;
  FT_UShort    encoding_id;

} FT_CharMapRec;
The base charmap structure.
fields
face |
A handle to the parent face object. |
encoding |
An FT_Encoding tag identifying the charmap. Use this with FT_Select_Charmap. |
platform_id |
An ID number describing the platform for the following encoding ID. This comes directly from the TrueType specification and gets emulated for other formats. |
encoding_id |
A platform-specific encoding number. This also comes from the TrueType specification and gets emulated similarly. |
FT_Encoding
Defined in FT_FREETYPE_H (freetype/freetype.h).
typedef enum  FT_Encoding_
{
  FT_ENC_TAG( FT_ENCODING_NONE, 0, 0, 0, 0 ),

  FT_ENC_TAG( FT_ENCODING_MS_SYMBOL, 's', 'y', 'm', 'b' ),
  FT_ENC_TAG( FT_ENCODING_UNICODE,   'u', 'n', 'i', 'c' ),

  FT_ENC_TAG( FT_ENCODING_SJIS,    's', 'j', 'i', 's' ),
  FT_ENC_TAG( FT_ENCODING_PRC,     'g', 'b', ' ', ' ' ),
  FT_ENC_TAG( FT_ENCODING_BIG5,    'b', 'i', 'g', '5' ),
  FT_ENC_TAG( FT_ENCODING_WANSUNG, 'w', 'a', 'n', 's' ),
  FT_ENC_TAG( FT_ENCODING_JOHAB,   'j', 'o', 'h', 'a' ),

  /* for backward compatibility */
  FT_ENCODING_GB2312     = FT_ENCODING_PRC,
  FT_ENCODING_MS_SJIS    = FT_ENCODING_SJIS,
  FT_ENCODING_MS_GB2312  = FT_ENCODING_PRC,
  FT_ENCODING_MS_BIG5    = FT_ENCODING_BIG5,
  FT_ENCODING_MS_WANSUNG = FT_ENCODING_WANSUNG,
  FT_ENCODING_MS_JOHAB   = FT_ENCODING_JOHAB,

  FT_ENC_TAG( FT_ENCODING_ADOBE_STANDARD, 'A', 'D', 'O', 'B' ),
  FT_ENC_TAG( FT_ENCODING_ADOBE_EXPERT,   'A', 'D', 'B', 'E' ),
  FT_ENC_TAG( FT_ENCODING_ADOBE_CUSTOM,   'A', 'D', 'B', 'C' ),
  FT_ENC_TAG( FT_ENCODING_ADOBE_LATIN_1,  'l', 'a', 't', '1' ),

  FT_ENC_TAG( FT_ENCODING_OLD_LATIN_2, 'l', 'a', 't', '2' ),

  FT_ENC_TAG( FT_ENCODING_APPLE_ROMAN, 'a', 'r', 'm', 'n' )

} FT_Encoding;
/* these constants are deprecated; use the corresponding `FT_Encoding` */
/* values instead */
#define ft_encoding_none FT_ENCODING_NONE
#define ft_encoding_unicode FT_ENCODING_UNICODE
#define ft_encoding_symbol FT_ENCODING_MS_SYMBOL
#define ft_encoding_latin_1 FT_ENCODING_ADOBE_LATIN_1
#define ft_encoding_latin_2 FT_ENCODING_OLD_LATIN_2
#define ft_encoding_sjis FT_ENCODING_SJIS
#define ft_encoding_gb2312 FT_ENCODING_PRC
#define ft_encoding_big5 FT_ENCODING_BIG5
#define ft_encoding_wansung FT_ENCODING_WANSUNG
#define ft_encoding_johab FT_ENCODING_JOHAB
#define ft_encoding_adobe_standard FT_ENCODING_ADOBE_STANDARD
#define ft_encoding_adobe_expert FT_ENCODING_ADOBE_EXPERT
#define ft_encoding_adobe_custom FT_ENCODING_ADOBE_CUSTOM
#define ft_encoding_apple_roman FT_ENCODING_APPLE_ROMAN
An enumeration to specify character sets supported by charmaps. Used in the FT_Select_Charmap API function.
note
Despite the name, this enumeration lists specific character repertoires (i.e., charsets), and not text encoding methods (e.g., UTF-8, UTF-16, etc.).
Other encodings might be defined in the future.
values
FT_ENCODING_NONE |
The encoding value 0 is reserved for all formats except BDF, PCF, and Windows FNT; see below for more information. |
FT_ENCODING_UNICODE |
The Unicode character set. This value covers all versions of the Unicode repertoire, including ASCII and Latin-1. Most fonts include a Unicode charmap, but not all of them. For example, if you want to access Unicode value U+1F028 (and the font contains it), use value 0x1F028 as the input value for FT_Get_Char_Index. |
FT_ENCODING_MS_SYMBOL |
Microsoft Symbol encoding, used to encode mathematical symbols and wingdings. For more information, see ‘https://www.microsoft.com/typography/otspec/recom.htm#non-standard-symbol-fonts’, ‘http://www.kostis.net/charsets/symbol.htm’, and ‘http://www.kostis.net/charsets/wingding.htm’. This encoding uses character codes from the PUA (the Unicode Private Use Area) in the range U+F020-U+F0FF. |
FT_ENCODING_SJIS |
Shift JIS encoding for Japanese. More info at ‘https://en.wikipedia.org/wiki/Shift_JIS’. See note on multi-byte encodings below. |
FT_ENCODING_PRC |
Corresponds to encoding systems mainly for Simplified Chinese as used in People's Republic of China (PRC). The encoding layout is based on GB 2312 and its supersets GBK and GB 18030. |
FT_ENCODING_BIG5 |
Corresponds to an encoding system for Traditional Chinese as used in Taiwan and Hong Kong. |
FT_ENCODING_WANSUNG |
Corresponds to the Korean encoding system known as Extended Wansung (MS Windows code page 949). For more information see ‘https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WindowsBestFit/bestfit949.txt’. |
FT_ENCODING_JOHAB |
The Korean standard character set (KS C 5601-1992), which corresponds to MS Windows code page 1361. This character set includes all possible Hangul character combinations. |
FT_ENCODING_ADOBE_LATIN_1 |
Corresponds to a Latin-1 encoding as defined in a Type 1 PostScript font. It is limited to 256 character codes. |
FT_ENCODING_ADOBE_STANDARD |
Adobe Standard encoding, as found in Type 1, CFF, and OpenType/CFF fonts. It is limited to 256 character codes. |
FT_ENCODING_ADOBE_EXPERT |
Adobe Expert encoding, as found in Type 1, CFF, and OpenType/CFF fonts. It is limited to 256 character codes. |
FT_ENCODING_ADOBE_CUSTOM |
Corresponds to a custom encoding, as found in Type 1, CFF, and OpenType/CFF fonts. It is limited to 256 character codes. |
FT_ENCODING_APPLE_ROMAN |
Apple roman encoding. Many TrueType and OpenType fonts contain a charmap for this 8-bit encoding, since older versions of Mac OS are able to use it. |
FT_ENCODING_OLD_LATIN_2 |
This value is deprecated and was neither used nor reported by FreeType. Don't use or test for it. |
FT_ENCODING_MS_SJIS |
Same as FT_ENCODING_SJIS. Deprecated. |
FT_ENCODING_MS_GB2312 |
Same as FT_ENCODING_PRC. Deprecated. |
FT_ENCODING_MS_BIG5 |
Same as FT_ENCODING_BIG5. Deprecated. |
FT_ENCODING_MS_WANSUNG |
Same as FT_ENCODING_WANSUNG. Deprecated. |
FT_ENCODING_MS_JOHAB |
Same as FT_ENCODING_JOHAB. Deprecated. |
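As an illustration of the FT_ENCODING_MS_SYMBOL entry in the table above, the sketch below shifts an 8-bit symbol code into the documented range U+F020-U+F0FF. The helper name symbol_pua_code is hypothetical, and adding 0xF000 to the byte value is an assumption based on common practice for symbol fonts; it is not something FreeType does for you.

```c
#include <stdint.h>

/* Hypothetical helper: map an 8-bit symbol code (0x20..0xFF) to the    */
/* character code expected by an FT_ENCODING_MS_SYMBOL charmap.         */
/* Assumption: the font follows the usual 0xF000 + code convention.     */
/* Returns 0 for codes below 0x20, which fall outside U+F020-U+F0FF.    */
uint32_t
symbol_pua_code( uint8_t  code )
{
  if ( code < 0x20 )
    return 0;

  return 0xF000u + code;
}
```

The result would then be passed as the character code to FT_Get_Char_Index after selecting the symbol charmap with FT_Select_Charmap.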
note
When loading a font, FreeType makes a Unicode charmap active if possible (either if the font provides such a charmap, or if FreeType can synthesize one from PostScript glyph name dictionaries; in either case, the charmap is tagged with FT_ENCODING_UNICODE). If such a charmap is synthesized, it is placed at the first position of the charmap array.
All other encodings are considered legacy and tagged only if explicitly defined in the font file. Otherwise, FT_ENCODING_NONE is used.
FT_ENCODING_NONE is set by the BDF and PCF drivers if the charmap is neither Unicode nor ISO-8859-1 (otherwise it is set to FT_ENCODING_UNICODE). Use FT_Get_BDF_Charset_ID to find out which encoding is really present. If, for example, the cs_registry field is ‘KOI8’ and the cs_encoding field is ‘R’, the font is encoded in KOI8-R.
FT_ENCODING_NONE is always set (with a single exception) by the winfonts driver. Use FT_Get_WinFNT_Header and examine the charset field of the FT_WinFNT_HeaderRec structure to find out which encoding is really present. For example, FT_WinFNT_ID_CP1251 (204) means Windows code page 1251 (for Russian).
FT_ENCODING_NONE is set if platform_id is TT_PLATFORM_MACINTOSH and encoding_id is not TT_MAC_ID_ROMAN (otherwise it is set to FT_ENCODING_APPLE_ROMAN).
If platform_id is TT_PLATFORM_MACINTOSH, use the function FT_Get_CMap_Language_ID to query the Mac language ID, which may be needed to distinguish Apple encoding variants. See https://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/Readme.txt to get an idea of how to do that. Basically, if the language ID is 0, don't use it; otherwise subtract 1 from the language ID. Then examine encoding_id. If, for example, encoding_id is TT_MAC_ID_ROMAN and the language ID (minus 1) is TT_MAC_LANGID_GREEK, it is the Greek encoding, not Roman. TT_MAC_ID_ARABIC with TT_MAC_LANGID_FARSI means the Farsi variant of the Arabic encoding.
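The "0 means unused, otherwise subtract 1" rule for the Mac language ID can be captured in a small helper. This is a sketch only: mac_language_from_cmap_id is a hypothetical name, and the input is assumed to be the raw value returned by FT_Get_CMap_Language_ID for a charmap whose platform_id is TT_PLATFORM_MACINTOSH.

```c
#include <stdbool.h>

/* Hypothetical helper: convert the raw cmap language ID into a value */
/* that can be compared against the TT_MAC_LANGID_XXX constants.      */
/* Returns false (and leaves *language untouched) if the ID is 0,     */
/* meaning the charmap is not language-specific.                      */
bool
mac_language_from_cmap_id( unsigned long   cmap_language_id,
                           unsigned long  *language )
{
  if ( cmap_language_id == 0 || !language )
    return false;

  *language = cmap_language_id - 1;
  return true;
}
```

If the helper returns true, the caller would compare *language with, say, TT_MAC_LANGID_GREEK while also checking encoding_id, as described above.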
FT_ENC_TAG
Defined in FT_FREETYPE_H (freetype/freetype.h).
#ifndef FT_ENC_TAG
#define FT_ENC_TAG( value, a, b, c, d )                             \
          value = ( ( FT_STATIC_BYTE_CAST( FT_UInt32, a ) << 24 ) | \
                    ( FT_STATIC_BYTE_CAST( FT_UInt32, b ) << 16 ) | \
                    ( FT_STATIC_BYTE_CAST( FT_UInt32, c ) <<  8 ) | \
                      FT_STATIC_BYTE_CAST( FT_UInt32, d )         )
#endif /* FT_ENC_TAG */
This macro converts four-letter tags into an unsigned long. It is used to define ‘encoding’ identifiers (see FT_Encoding).
note
Since many 16-bit compilers don't like 32-bit enumerations, you should redefine this macro in case of problems to something like this:
#define FT_ENC_TAG( value, a, b, c, d ) value
to get a simple enumeration without assigning special numbers.
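To make the packing performed by the default macro concrete, here is a standalone sketch that reproduces the same arithmetic without the FreeType headers. The function names pack_enc_tag and unpack_enc_tag are hypothetical; they only mirror the shifts shown in the macro definition.

```c
#include <stdint.h>

/* Pack four characters into a 32-bit tag, first character in the */
/* most significant byte (same layout as the FT_ENC_TAG macro).   */
uint32_t
pack_enc_tag( char  a,
              char  b,
              char  c,
              char  d )
{
  return ( (uint32_t)(unsigned char)a << 24 ) |
         ( (uint32_t)(unsigned char)b << 16 ) |
         ( (uint32_t)(unsigned char)c <<  8 ) |
           (uint32_t)(unsigned char)d;
}

/* Reverse operation: write the four tag characters plus a */
/* terminating NUL into `out`, which must hold 5 bytes.    */
void
unpack_enc_tag( uint32_t  tag,
                char      out[5] )
{
  out[0] = (char)( ( tag >> 24 ) & 0xFF );
  out[1] = (char)( ( tag >> 16 ) & 0xFF );
  out[2] = (char)( ( tag >>  8 ) & 0xFF );
  out[3] = (char)(   tag         & 0xFF );
  out[4] = '\0';
}
```

With this layout, the tag 'u', 'n', 'i', 'c' used for FT_ENCODING_UNICODE packs to 0x756E6963. Note that FT_ENCODING_NONE is 0, so unpacking it yields an empty string.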
FT_Select_Charmap
Defined in FT_FREETYPE_H (freetype/freetype.h).
FT_EXPORT( FT_Error )
FT_Select_Charmap( FT_Face      face,
                   FT_Encoding  encoding );
Select a given charmap by its encoding tag (as listed in freetype.h).
inout
face |
A handle to the source face object. |
input
encoding |
The encoding tag (an FT_Encoding value) of the charmap to select. |
return
FreeType error code. 0 means success.
note
This function returns an error if no charmap in the face corresponds to the encoding queried here.
Because many fonts contain more than a single cmap for Unicode encoding, this function has some special code to select the one that covers Unicode best (‘best’ in the sense that a UCS-4 cmap is preferred to a UCS-2 cmap). It is thus preferable to FT_Set_Charmap in this case.
FT_Set_Charmap
Defined in FT_FREETYPE_H (freetype/freetype.h).
FT_EXPORT( FT_Error )
FT_Set_Charmap( FT_Face     face,
                FT_CharMap  charmap );
Select a given charmap for character code to glyph index mapping.
inout
face |
A handle to the source face object. |
input
charmap |
A handle to the selected charmap. |
return
FreeType error code. 0 means success.
note
This function returns an error if the charmap is not part of the face (i.e., if it is not listed in the face->charmaps table).
It also fails if an OpenType type 14 charmap is selected (which doesn't map character codes to glyph indices at all).
FT_Get_Charmap_Index
Defined in FT_FREETYPE_H (freetype/freetype.h).
FT_EXPORT( FT_Int )
FT_Get_Charmap_Index( FT_CharMap  charmap );
Retrieve index of a given charmap.
input
charmap |
A handle to a charmap. |
return
The index into the array of character maps within the face to which charmap belongs. If an error occurs, -1 is returned.
FT_Get_Char_Index
Defined in FT_FREETYPE_H (freetype/freetype.h).
FT_EXPORT( FT_UInt )
FT_Get_Char_Index( FT_Face   face,
                   FT_ULong  charcode );
Return the glyph index of a given character code. This function uses the currently selected charmap to do the mapping.
input
face |
A handle to the source face object. |
charcode |
The character code. |
return
The glyph index. 0 means ‘undefined character code’.
note
If you use FreeType to manipulate the contents of font files directly, be aware that the glyph index returned by this function doesn't always correspond to the internal indices used within the file. This is done to ensure that value 0 always corresponds to the ‘missing glyph’. If the first glyph is not named ‘.notdef’, then for Type 1 and Type 42 fonts, ‘.notdef’ will be moved into the glyph ID 0 position, and whatever was there will be moved to the position ‘.notdef’ had. For Type 1 fonts, if there is no ‘.notdef’ glyph at all, then one will be created at index 0 and whatever was there will be moved to the last index – Type 42 fonts are considered invalid under this condition.
FT_Get_First_Char
Defined in FT_FREETYPE_H (freetype/freetype.h).
FT_EXPORT( FT_ULong )
FT_Get_First_Char( FT_Face   face,
                   FT_UInt  *agindex );
Return the first character code in the current charmap of a given face, together with its corresponding glyph index.
input
face |
A handle to the source face object. |
output
agindex |
Glyph index of first character code. 0 if charmap is empty. |
return
The charmap's first character code.
note
You should use this function together with FT_Get_Next_Char to parse all character codes available in a given charmap. The code should look like this:
  FT_ULong  charcode;
  FT_UInt   gindex;

  charcode = FT_Get_First_Char( face, &gindex );
  while ( gindex != 0 )
  {
    /* ... do something with the (charcode, gindex) pair ... */

    charcode = FT_Get_Next_Char( face, charcode, &gindex );
  }
Be aware that character codes can have values up to 0xFFFFFFFF; this might happen for non-Unicode or malformed cmaps. However, even with regular Unicode encoding, so-called ‘last resort fonts’ (using SFNT cmap format 13, see function FT_Get_CMap_Format) normally have entries for all Unicode characters up to 0x1FFFFF, which can cause a lot of iterations.
Note that *agindex is set to 0 if the charmap is empty. The result itself can be 0 in two cases: if the charmap is empty or if the value 0 is the first valid character code.
FT_Get_Next_Char
Defined in FT_FREETYPE_H (freetype/freetype.h).
FT_EXPORT( FT_ULong )
FT_Get_Next_Char( FT_Face    face,
                  FT_ULong   char_code,
                  FT_UInt   *agindex );
Return the next character code in the current charmap of a given face following the value char_code, as well as the corresponding glyph index.
input
face |
A handle to the source face object. |
char_code |
The starting character code. |
output
agindex |
Glyph index of next character code. 0 if the charmap is empty or has no more character codes. |
return
The charmap's next character code.
note
You should use this function with FT_Get_First_Char to walk over all character codes available in a given charmap. See the note for that function for a simple code example.
Note that *agindex is set to 0 when there are no more codes in the charmap.
FT_Load_Char
Defined in FT_FREETYPE_H (freetype/freetype.h).
FT_EXPORT( FT_Error )
FT_Load_Char( FT_Face   face,
              FT_ULong  char_code,
              FT_Int32  load_flags );
Load a glyph into the glyph slot of a face object, accessed by its character code.
inout
face |
A handle to a target face object where the glyph is loaded. |
input
char_code |
The glyph's character code, according to the current charmap used in the face. |
load_flags |
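A flag indicating what to load for this glyph. The FT_LOAD_XXX constants can be used to control the glyph loading process (e.g., whether the outline should be scaled, whether to load bitmaps or not, whether to hint the outline, etc). |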
return
FreeType error code. 0 means success.
note
This function simply calls FT_Get_Char_Index and FT_Load_Glyph.
Many fonts contain glyphs that can't be loaded by this function since their glyph indices are not listed in any of the font's charmaps.
If no active cmap is set up (i.e., face->charmap is zero), the call to FT_Get_Char_Index is omitted, and the function behaves identically to FT_Load_Glyph.