From Bits to Glyphs

This document explains the terms used when discussing fonts, characters, and related topics. Although these terms have specific, well-understood meanings, laypersons often misuse or misunderstand them. It is important to use the proper terms and to understand the mechanism behind them.

First, here are the terms I'll be using and the definitions that I think they have:

glyph
A graphical shape or figure that is used to represent a character to the user.
glyph name
A textual name that describes a specific glyph.
glyph id
An integer that identifies a single glyph within a font.
font
A collection of glyphs with a default encoding.
encoding
A table that maps character codes to glyphs.
character code
An integer that tells the computer which character you intend.
character
A glyph that represents a character code, when used as such to convey the character code's meaning to the user.
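To make the relationships between these terms concrete, here is a minimal Python sketch of them. The class and method names (`Glyph`, `Font`, `glyph_for_code`) are hypothetical, chosen only for illustration; real font formats store far more (outlines, metrics, hints), so a glyph's shape is reduced here to a placeholder string. A "character" in the sense above exists only when a user reads the resulting glyph, so it has no counterpart in the code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Glyph:
    """A shape with no intrinsic meaning; `shape` stands in for real outline data."""
    name: str   # glyph name, e.g. "g"
    shape: str  # placeholder for the actual outline or bitmap


@dataclass
class Font:
    """A collection of glyphs plus a default encoding (hypothetical model)."""
    glyphs: list[Glyph]  # a glyph's id is its index in this list
    default_encoding: dict[int, int] = field(default_factory=dict)  # code -> glyph id

    def glyph_for_code(self, code: int, encoding: dict[int, int] | None = None) -> Glyph | None:
        """Map a character code to a glyph, using the default encoding unless
        another encoding is supplied. Returns None for unmapped codes."""
        table = self.default_encoding if encoding is None else encoding
        glyph_id = table.get(code)
        return None if glyph_id is None else self.glyphs[glyph_id]


# Usage: a two-glyph font whose default encoding maps code 0x67 to glyph id 1.
font = Font(
    glyphs=[Glyph(".notdef", "empty box"), Glyph("g", "single-storey g")],
    default_encoding={0x67: 1},
)
print(font.glyph_for_code(0x67).name)  # g
print(font.glyph_for_code(0x41))       # None
```

Keeping the encoding separate from the glyph list mirrors the definitions: the same font can be paired with a different encoding without touching its glyphs.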

Glyphs

A glyph is a visual representation of a character. Its only purpose is to format the character information into a form that can be readily interpreted by the human visual system. For example, here are a number of different glyphs that could all be used to represent the seventh letter of the English alphabet:

[Figure: several different glyphs for the letter g]

While there are some conventions for selecting glyphs (you wouldn't want a house-shaped glyph for the letter g), the choice is usually left to the author of the font. Conventional fonts use glyphs that at least represent the letters, digits, and symbols of the ASCII or ISO-Latin-1 character set. Custom fonts may use the character codes merely as indices into an assortment of unrelated glyphs, such as electrical symbols or geometric shapes. ASCII code 0x67, traditionally used for a lowercase g, may select a floppy-disk icon in a custom font.

The important thing to remember is that a glyph, by definition, is merely a shape and has no intrinsic meaning on its own.
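The following sketch illustrates that point: the same character code, 0x67, selects a letter in one font and an unrelated icon in another. Both fonts are hypothetical and are reduced to their default encodings, written as dictionaries from character code to a description of the selected glyph; the `render` function is likewise invented for illustration.

```python
from __future__ import annotations

# Two hypothetical fonts, each reduced to its default encoding:
# character code -> description of the glyph it selects.
conventional_font = {0x67: "lowercase letter g"}
symbol_font = {0x67: "floppy-disk icon"}


def render(code: int, font: dict[int, str]) -> str:
    """Return the glyph the font selects for `code`. The code carries no
    shape of its own; what appears on screen depends entirely on the font."""
    return font.get(code, "missing glyph")


assert ord("g") == 0x67  # 0x67 is the ASCII code for lowercase g
for label, font in [("conventional", conventional_font), ("symbol", symbol_font)]:
    print(f"{label}: 0x67 -> {render(0x67, font)}")
# conventional: 0x67 -> lowercase letter g
# symbol: 0x67 -> floppy-disk icon
```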

Glyph Names

To make glyphs easier to refer to, some fonts give them names. Glyphs corresponding to letters of the alphabet may be given one-letter names (the appropriate letter, of course), but other names may be longer. For example, the glyph $ may be named dollarsign or usmoney. In fonts that claim to provide a standard collection of glyphs (for example, ISO-Latin-1 fonts), the standard dictates the names that will be used.
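A short sketch of why standard names matter: code that asks for a glyph by name works only with fonts that use the expected name. Both fonts below are hypothetical, with placeholder shapes; "dollar" is the name Adobe's standard PostScript encoding uses for $, while "usmoney" is the kind of private name a font author might pick. The helper `glyph_by_name` is invented for illustration.

```python
# Two hypothetical fonts that store glyphs by name; shapes are placeholders.
custom_font = {"g": "<outline of g>", "usmoney": "<outline of $>"}
standard_font = {"g": "<outline of g>", "dollar": "<outline of $>"}


def glyph_by_name(font, name):
    """Look up a glyph by name; None means the font has no glyph by that name."""
    return font.get(name)


# A program written against the standard name finds $ only in the standard font.
print(glyph_by_name(standard_font, "dollar"))  # <outline of $>
print(glyph_by_name(custom_font, "dollar"))    # None
```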

Glyph IDs

Fonts number their glyphs sequentially. Many formats start at zero: in TrueType fonts, for example, glyph 0 is reserved for the .notdef (missing character) glyph. Fonts may choose these numbers so that they happen to match the ASCII or ISO-Latin-1 character codes in the default encoding, but there is no need to do so. Some font formats do not let you use the glyph id directly at all.
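A glyph id is often just a position in the font's glyph table, which is why it need not match any character code. In this sketch the table and its ordering are hypothetical; placing a ".notdef" (missing-glyph) entry at position 0 follows the TrueType convention and is an assumption here.

```python
# Hypothetical glyph table: a glyph's id is its position in the list.
# Position 0 holds ".notdef", the missing-glyph box (TrueType-style; an assumption).
glyph_table = [".notdef", "space", "A", "B", "e", "g"]

# Build a name -> glyph id index from the table order.
glyph_id = {name: index for index, name in enumerate(glyph_table)}

print(glyph_id["e"])  # 4
print(ord("e"))       # 101, i.e. 0x65: the character code is unrelated to the glyph id
```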

Encodings

When a program wishes to display characters to the user, it represents those characters as integers, usually in the range of 0..255 for Western languages. The system must take those integers and choose suitable glyphs to display. If the wrong glyphs are chosen, the user will not understand what the program is trying to say. The primary tool for performing this mapping is called the encoding. This encoding is basically a table that maps the character codes to glyphs. Here is an example:

[Figure: encoding example]

In this example, the program wants to print the string Hello. The letter e in that string is stored as the integer 0x65 (101 decimal). The encoding maps that integer value to glyph number 425 in the font. That glyph is shown in the example.
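The mapping in that example can be sketched as a dictionary lookup. Only the entry mapping 0x65 to glyph 425 comes from the text; the other glyph ids are invented for illustration, and the function name `to_glyph_ids` is hypothetical.

```python
from __future__ import annotations

# Hypothetical encoding: character code -> glyph id. Only 0x65 -> 425 is
# taken from the example; the other ids are invented.
encoding = {
    0x48: 312,  # H
    0x65: 425,  # e
    0x6C: 433,  # l
    0x6F: 440,  # o
}


def to_glyph_ids(text: str, encoding: dict[int, int]) -> list[int]:
    """Translate each character code in `text` into a glyph id.
    Raises KeyError for codes the encoding does not map."""
    # Encoding as ISO-Latin-1 yields exactly one byte (a code 0..255) per
    # character; iterating over a bytes object yields those codes as ints.
    return [encoding[code] for code in text.encode("latin-1")]


print(to_glyph_ids("Hello", encoding))  # [312, 425, 433, 433, 440]
```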

I've put together a complete list of the ISO-Latin-1 character codes and their HTML entities for reference.

While the font may provide a default encoding for the first 256 character codes, the system may install a different encoding for its own purposes. For example, an application dealing with ISO-Latin-1 data may specify an encoding that maps character codes to suitable glyphs for that data.
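The sketch below uses one font with two encodings. It assumes, hypothetically, that the font's default encoding covers only ASCII, so the ISO-Latin-1 code 0xE9 (é) has no entry until the application installs its own table. Every glyph id here is invented, and sending unmapped codes to glyph 0 is an assumption.

```python
from __future__ import annotations

# Hypothetical encodings for the same font (character code -> glyph id).
default_encoding = {0x65: 425}            # the font's built-in table: ASCII only
latin1_encoding = {0x65: 425, 0xE9: 512}  # installed by an ISO-Latin-1 application


def glyph_ids(data: bytes, encoding: dict[int, int], missing: int = 0) -> list[int]:
    """Map each character code to a glyph id. Codes the encoding does not
    cover fall back to `missing` (assumed to be a missing-glyph box)."""
    return [encoding.get(code, missing) for code in data]


data = "\u00e9e".encode("latin-1")  # "ée" -> bytes 0xE9, 0x65
print(glyph_ids(data, default_encoding))  # [0, 425]
print(glyph_ids(data, latin1_encoding))   # [512, 425]
```

The font's glyphs are untouched; only the table that turns codes into glyph ids changes.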

One case where this matters in web design involves the back-quote (or back-tick) character. Most fonts provide a default back-quote glyph that mirrors the forward quote, so you can use two of each to make English spoken quotes. Microsoft Windows, however, changes the encoding so that the back-tick variant is used instead, making it useless for spoken quotes.

[Figure: single quote examples]

a
This is the back-tick glyph that Microsoft uses for character code 0x60 (96 decimal).
b
This is the back-quote glyph that everyone else uses for character code 0x60.
c
This is the single-quote glyph that everyone uses for character code 0x27 (39 decimal). Note that it is symmetrical with b.
d
This is what everyone uses for character code 0x22 (34 decimal). Note that it is an inch symbol, which is not quite the same as a double quote used for quoting spoken text.
e
You can use pairs of the single quotes to represent spoken quotes, but only if they're symmetrical.
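The character codes themselves do not settle which glyph is "right". As a quick check, this sketch prints the official Unicode names of the three codes discussed above (Unicode's first 256 code points coincide with ISO-Latin-1), alongside the separate curly quotation marks that Unicode defines outside that range. It uses only Python's standard `unicodedata` module.

```python
import unicodedata

# The three ASCII / ISO-Latin-1 codes from the examples above.
for ch in ["`", "'", '"']:
    print(f"0x{ord(ch):02X}  {unicodedata.name(ch)}")
# 0x60  GRAVE ACCENT
# 0x27  APOSTROPHE
# 0x22  QUOTATION MARK

# Dedicated opening and closing quotes live outside the 0..255 range.
for ch in ["\u2018", "\u2019", "\u201c", "\u201d"]:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")
# U+2018  LEFT SINGLE QUOTATION MARK
# U+2019  RIGHT SINGLE QUOTATION MARK
# U+201C  LEFT DOUBLE QUOTATION MARK
# U+201D  RIGHT DOUBLE QUOTATION MARK
```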

For reference, here is what your browser uses for those characters:

Proportional: ` ' "
Monospaced: ` ' "


Copyright © 1996. Updated Nov 1996.