This document discusses the terms used when talking about fonts, characters, and related concepts. While these terms have specific, well-understood meanings, laypeople often misuse or misunderstand them. It is important to use the proper terms and to understand the mechanism behind them.
First, here are the terms I'll be using and the definitions that I think they have:
A glyph is a visual representation of a character. Its only purpose is to present the character information in a form that the human visual system can readily interpret. For example, the glyphs that different fonts use for g, the seventh letter of the English alphabet, can look quite different from one another, yet each of them represents the same letter.
While there are some conventions for selecting glyphs (you wouldn't want a house-shaped glyph for the letter g), the choice is usually left to the author of the font. Conventional fonts use glyphs that at least represent the letters, digits, and symbols of the ASCII or ISO-Latin-1 character set. Custom fonts may use the character codes merely as an index into an assortment of unrelated glyphs, such as electrical symbols or geometric shapes. ASCII code 0x67, traditionally used for a lowercase g, may select a floppy-disk-shaped icon in a custom font.
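To make the point concrete, here is a minimal Python sketch. It collapses each font's glyphs and its default encoding into a single table that maps a character code to a described shape. Both fonts and all of their entries are hypothetical. The same code, 0x67, selects a letter in one font and an icon in the other.

```python
# Two hypothetical fonts, each reduced to "character code -> shape".
# The shapes are only descriptions; a real font stores outlines or bitmaps,
# and usually keeps its glyphs separate from the encoding that selects them.
CONVENTIONAL_FONT = {
    0x47: "capital letter G",
    0x67: "lowercase letter g",
}

CUSTOM_FONT = {
    0x47: "resistor symbol",
    0x67: "floppy-disk icon",
}


def shape_for(font: dict[int, str], code: int) -> str:
    """Return the shape a font draws for a character code.

    The code has no meaning of its own: it is only an index into whatever
    glyphs the font author chose to put at that position.
    """
    return font.get(code, "(no glyph)")


for name, font in [("conventional", CONVENTIONAL_FONT), ("custom", CUSTOM_FONT)]:
    print(f"{name}: {hex(0x67)} -> {shape_for(font, 0x67)}")
```

Nothing in the code 0x67 itself says "g". Only the font's choice of glyph gives it that meaning.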
The important thing to remember is that a glyph, by definition, is merely a shape and has no intrinsic meaning on its own.
To make glyphs easier to work with, some fonts give them names. Glyphs corresponding to letters of the alphabet may be given one-letter names (the appropriate letter, of course), but other names may be longer. For example, the glyph $ may be named dollarsign or usmoney. In fonts that claim to provide a standard collection of glyphs (for example, ISO-Latin-1 fonts), the standard dictates which names are used.
Most fonts number their glyphs starting with one. A font may choose these numbers so that they happen to match the ASCII or ISO-Latin-1 character codes in its default encoding, but there is no need to do so. Some font formats do not let programs use the glyph id directly.
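Here is a small Python sketch of a font as a numbered, named collection of glyphs. The `Glyph` class, the glyph ids, and the outline placeholders are all hypothetical. The ids start at one, as described above, and deliberately do not match the ASCII codes of the characters the glyphs resemble.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Glyph:
    glyph_id: int  # the glyph's number inside the font
    name: str      # human-readable name chosen by the font (or by a standard)
    outline: str   # placeholder for the real shape data


# Hypothetical font contents. Ids start at one and are unrelated to the
# character codes (0x24, 0x47, 0x67) that these shapes usually stand for.
FONT = [
    Glyph(1, "dollarsign", "<outline of $>"),
    Glyph(2, "G", "<outline of G>"),
    Glyph(3, "g", "<outline of g>"),
]

# Index the glyphs by name so callers can ask for "g" instead of a number.
GLYPHS_BY_NAME = {glyph.name: glyph for glyph in FONT}


def glyph_id_for_name(name: str) -> int:
    """Return the id of the named glyph; raises KeyError if the font lacks it."""
    return GLYPHS_BY_NAME[name].glyph_id


print(glyph_id_for_name("g"), ord("g"))  # 3 103
```

Asking for the glyph named g yields id 3, while the character code for g is 103 (0x67). The two numbering schemes are independent, so something has to translate between them.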
When a program wishes to display characters to the user, it represents those characters as integers, usually in the range 0..255 for Western languages. The system must take those integers and choose suitable glyphs to display. If it chooses the wrong glyphs, the user will not understand what the program is trying to say. The primary tool for performing this mapping is the encoding, which is essentially a table that maps character codes to glyphs.
For example, suppose the program wants to print the string Hello. The letter e in that string is stored as the integer 0x65 (101 decimal). The encoding maps that value to glyph number 425 in the font, and glyph 425 is the shape that actually appears on the screen.
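The lookup just described can be sketched in Python as a dictionary from character code to glyph id. Only the mapping of 0x65 to glyph 425 comes from the example. The other glyph ids, and the fallback id used for unmapped codes, are invented for illustration. `ord()` returns each character's code, which for these characters equals its ASCII value.

```python
# Hypothetical encoding for one particular font: character code -> glyph id.
# 0x65 -> 425 matches the example in the text; the other ids are invented.
ENCODING = {
    0x48: 17,   # 'H'
    0x65: 425,  # 'e'
    0x6C: 230,  # 'l'
    0x6F: 301,  # 'o'
}

# Hypothetical id of a "missing character" glyph, drawn for unmapped codes.
FALLBACK_GLYPH = 0


def glyphs_for(text: str, encoding: dict[int, int]) -> list[int]:
    """Translate a string into the glyph ids a renderer would draw."""
    glyph_ids = []
    for ch in text:
        code = ord(ch)  # the integer the program actually stores
        glyph_ids.append(encoding.get(code, FALLBACK_GLYPH))
    return glyph_ids


print(glyphs_for("Hello", ENCODING))  # [17, 425, 230, 230, 301]
```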
I've put together a complete list of ISO-Latin-1 encodings and HTML entities for reference.
While the font may provide a default encoding for the first 256 character codes, the system may install a different encoding for its own purposes. For example, if the application is dealing with ISO-Latin-1 data, it may specify an encoding that maps each character code to the glyph appropriate for that data.
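Here is a Python sketch of the same idea: one font, two interchangeable encodings, and a renderer that draws through whichever encoding is currently installed. The `Renderer` class and the font's glyph names and ids are hypothetical. The two tables borrow the real positions of e-acute in ISO-Latin-1 (0xE9) and of e-acute and Theta in IBM PC code page 437 (0x82 and 0xE9), but they model only those few entries.

```python
# Hypothetical font: glyph name -> glyph id.
FONT = {"notdef": 0, "eacute": 112, "Theta": 205}

# Two encodings for the same font (only a few entries shown).
LATIN1_ENCODING = {0xE9: "eacute"}                # ISO-Latin-1: 0xE9 is e-acute
CP437_ENCODING = {0x82: "eacute", 0xE9: "Theta"}  # IBM PC code page 437


class Renderer:
    """Maps character codes to glyph ids through the installed encoding."""

    def __init__(self, font: dict[str, int], encoding: dict[int, str]):
        self.font = font
        self.encoding = encoding

    def install_encoding(self, encoding: dict[int, str]) -> None:
        # Replacing the table changes which glyph every code selects,
        # without touching the font itself.
        self.encoding = encoding

    def glyph_for(self, code: int) -> int:
        name = self.encoding.get(code, "notdef")
        return self.font.get(name, self.font["notdef"])


renderer = Renderer(FONT, LATIN1_ENCODING)
print(renderer.glyph_for(0xE9))  # 112 (eacute)
renderer.install_encoding(CP437_ENCODING)
print(renderer.glyph_for(0xE9))  # 205 (Theta)
```

The font never changes. Only the table consulted between character code and glyph does, which is why the same byte can appear as two different letters.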
One case where this becomes important in web design concerns the back-quote (or back-tick) character. Most fonts provide a default back-quote glyph shaped like an opening single quote that pairs with the forward quote (apostrophe), so you can type two of each, ``like this'', to make English quotation marks. Microsoft Windows, however, uses an encoding that maps the back-quote to a back-tick (grave accent) glyph instead, which makes it useless for quotation marks.
For reference, here is what your browser uses for those characters:
Proportional | Monospaced
` ' " | ` ' "
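The back-quote problem comes down to which glyph an encoding assigns to codes 0x60 (back-quote) and 0x27 (apostrophe). The Python sketch below models two hypothetical encodings:

- one follows the typographic convention described above;
- the other maps 0x60 to a plain grave accent, as the text says Windows does. Its mapping of 0x27 to a straight apostrophe is an assumption.

The glyph names follow PostScript-style naming (quoteleft, quoteright, grave, quotesingle). Unicode characters stand in for the drawn shapes.

```python
# Unicode stand-ins for the shapes each named glyph draws.
SHAPES = {
    "quoteleft": "\u2018",   # opening single quote
    "quoteright": "\u2019",  # closing single quote / typographic apostrophe
    "grave": "`",            # spacing grave accent (back-tick)
    "quotesingle": "'",      # straight apostrophe
}

# Hypothetical encodings; only the two codes that matter here are listed.
TYPOGRAPHIC = {0x60: "quoteleft", 0x27: "quoteright"}
WINDOWS_STYLE = {0x60: "grave", 0x27: "quotesingle"}  # 0x27 entry is assumed


def render(text: str, encoding: dict[int, str]) -> str:
    """Replace each character covered by the encoding with its glyph's shape."""
    out = []
    for ch in text:
        name = encoding.get(ord(ch))
        out.append(SHAPES[name] if name else ch)  # other characters pass through
    return "".join(out)


source = "``Hello''"
print(render(source, TYPOGRAPHIC))    # ‘‘Hello’’
print(render(source, WINDOWS_STYLE))  # ``Hello''
```

The author's text is identical in both cases. Only the encoding decides whether the reader sees matched quotation marks or a pair of back-ticks.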
Copyright © 1996 | Updated Nov 1996